[00:00:36] Camille Morhardt: Hi, and welcome to today’s podcast, What That Means, part of InTechnology. I have with me today, Andres Rodriguez. He is a fellow at Intel and knows pretty much everything there is to know about deep learning, which is going to be the topic of focus today. Welcome, Andres.
[00:00:54] Andres Rodriguez: Thanks for the invitation. It’s a pleasure being here.
[00:00:57] Camille Morhardt: So, can you define deep learning for us, for people who aren’t so familiar with it and also, its relationship in the broader AI or artificial intelligence that I think almost everybody has heard of?
[00:01:11] Andres Rodriguez: Yeah. So, deep learning is a branch of machine learning, and it is the branch that has grown exponentially over the past decade. The reason it’s called deep learning is that it has multiple layers. And what I mean by multiple layers is multiple transformations. So, the way machine learning typically works is you have some input data.
For example, let’s say you’re trying to predict the price of a home. And so, you have a bunch of data for the features of the home, like the number of rooms, the size of the home. And you take these features and then you put them through a machine learning algorithm and the output is a price. With machine learning algorithms, you take your input data, and you do some processing. You pass it through a machine learning algorithm that transforms the data into the output. And in traditional machine learning, you can also think of this as shallow deep learning, where you only do a few transformations to the input data in order to come up with the output.
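As a rough illustration of the home price example, a shallow model can be sketched in Python as a single weighted sum of the features. The feature names, weights, and bias below are made-up values for illustration, not anything stated in the conversation; a real model would learn them from past sales data.

```python
# Hypothetical weights and bias; a real model would learn these from data.
WEIGHTS = {"rooms": 10_000.0, "size_sqft": 150.0}
BIAS = 50_000.0


def predict_price(features):
    """Shallow model: one linear transformation from features to a price."""
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())


home = {"rooms": 3, "size_sqft": 1500}
print(predict_price(home))  # 305000.0
```

The input goes through one transformation and out comes the price, which is what makes this model shallow rather than deep.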
In deep learning, you have multiple layers of transformations that you’re doing to the input data. And what’s cool about deep learning is you can actually put the raw data or close to the raw data as input into your deep learning model to come out with the output. And so, to differentiate this from traditional machine learning, for example, take a problem of image classification. You pass as an input a set of pixels that correspond to an image.
In the past, you had to do some feature extraction. You required somebody that had expertise in computer vision to extract features about the image, and then you would pass those features into the machine learning algorithm. But today, having multiple layers in your deep learning model, you can pass the raw pixels into the model and out comes the class of the image. So, for example, this is a cat or a person or a fruit, et cetera. And this has been possible more recently because of the computational advancements. So, you can train larger models, deeper models, as well as access to much larger datasets, which are needed in order to train the multiple layers that deep learning models have.
[00:03:55] Camille Morhardt: So, what are the limitations to the deep learning model? I think you just said one, right? It requires a lot of data.
[00:04:00] Andres Rodriguez: Large amounts of data, although there are ways around that, and I’ll talk about those in a moment. Another one is large amounts of computations. But again, there are some ways around that as well. Nowadays, there are very large models that are being trained from scratch. So, you take a model. Usually, in the beginning, the model is composed of weights that are random. And as you go through multiple iterations of the training, then the model converges. Now, this requires a lot of data and a lot of compute.
But once you have a trained model, let’s suppose you want to use that model for a similar application, not exactly the same one that it was trained for, but one that shares some of the similar characteristics. What you can do is you can take that already pre-trained model and apply a smaller dataset that is specific for your problem that you’re trying to solve, and you can retrain it with the pre-trained model as a starting point. This is often called fine-tuning or transfer learning, and this process requires less compute than when you’re training a model from scratch.
So, the idea that you need humongous amounts of data or humongous amounts of compute, it’s only true when you’re training large models from scratch. There are a number of pre-trained models available in the open-source domains that you can pull from and do fine-tuning on these models for your particular problem that you’re trying to solve.
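The fine-tuning idea described here can be sketched in a few lines of plain Python. This is a simplified sketch under stated assumptions, not how any particular framework does it: the "pre-trained" layer is a small hypothetical weight matrix that stays frozen, the task is a toy regression rather than image classification, and only a new output layer is trained on a tiny task-specific dataset.

```python
# Hypothetical pre-trained weights, kept frozen during fine-tuning.
PRETRAINED_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def extract_features(x):
    """Frozen pre-trained layer: linear transformation followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]


def fine_tune(dataset, learning_rate=0.1, epochs=200):
    """Train only a new output layer on a small task-specific dataset."""
    head = [0.0] * len(PRETRAINED_WEIGHTS)  # new output layer weights
    for _ in range(epochs):
        for x, target in dataset:
            features = extract_features(x)
            prediction = sum(h * f for h, f in zip(head, features))
            error = prediction - target
            # Gradient step on the squared error, applied to the head only.
            head = [h - learning_rate * 2 * error * f
                    for h, f in zip(head, features)]
    return head


def predict(head, x):
    return sum(h * f for h, f in zip(head, extract_features(x)))


# Tiny made-up dataset standing in for the "smaller dataset".
small_dataset = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 3.0)]
head = fine_tune(small_dataset)
print([round(predict(head, x), 3) for x, _ in small_dataset])
```

Because only the small output layer is updated, each training step touches far fewer weights than training the whole model from scratch, which is where the savings in compute come from.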
[00:05:44] Camille Morhardt: Can you give an example?
[00:05:46] Andres Rodriguez: A problem that I did 10 years ago was taking a model that was trained on what is known as the ImageNet dataset. A dataset that had a thousand classes, about a thousand samples per class, so roughly a million samples, and you train it through hundreds of thousands of iterations. I took the trained model and then I took a smaller dataset that was composed of cars, people and a few other objects of interest that was a much smaller dataset. And then I fine-tuned that model with the smaller dataset for an application that, at that time, I was interested in using.
So, my model was then able to be deployed and was fine-tuned to just a few classes. Nowadays, you can do fine-tuning for a number of applications like language applications, other computer vision applications and recommendation models.
Now, I mentioned how the training process works. The way it typically works is you have data that is labeled. So, a human took a number of images and labeled one as cat, one as dog, et cetera. And then you take a model and initially, the model has random weights. So, a model is just composed of weights organized in a particular architecture.
[00:07:11] Camille Morhardt: Explain more about weights as you’re going through here. Really, what are those?
[00:07:16] Andres Rodriguez: Yeah, sure. So, when you take your input image, you pass it through the various layers of the deep learning model. And the layers are essentially a number of non-linear and linear transformations. And a linear transformation is you take an input and you multiply it by weight. Sometimes, you add a bias. And so, the weights that make up the model are the weights in these linear transformations and the biases are also considered weights in deep learning lingo.
And then you take the output of the linear transformation and you apply a non-linear transformation. After you pass the data through a number of linear and non-linear transformations, then you get your output. Initially, the output is going to be essentially garbage because the model has not been trained well. But then you compare the output that you get with the expected output, and you know the expected output because a human labeled what the output should have been.
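The pass through the layers described above can be sketched in plain Python. This is a minimal sketch, assuming ReLU as the non-linear transformation (the conversation does not name one) and using small fixed weights so the result is reproducible; as noted, a real model would start from random weights.

```python
def linear(inputs, weights, biases):
    """Linear transformation: multiply inputs by weights and add a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Non-linear transformation (assumed here): negatives become zero."""
    return [max(0.0, v) for v in values]


def forward(inputs, layers):
    """Pass the input through each layer in turn to get the model output."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = linear(activations, weights, biases)
        if index < len(layers) - 1:  # no non-linearity after the last layer
            activations = relu(activations)
    return activations


# Hypothetical fixed weights: layer 1 maps 2 inputs to 2 units,
# layer 2 maps those 2 units to 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),
    ([[2.0, 1.0]], [0.5]),
]
print(forward([3.0, 1.0], layers))  # [5.5]
```

The weights and biases in `layers` are the only things that change during training; the structure of the pass stays the same.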
And then you compute a measure of error, a metric of error. Then you back propagate the error through the model in order to know how to adjust the model weights, so that the error decreases. And you do this through multiple iterations with all the samples. Sometimes, you might iterate hundreds of times through the entire dataset.
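The training loop described here can be sketched with the smallest possible model: one weight and one bias. This is a simplified sketch, not a full deep learning implementation. With a single linear layer, back propagation reduces to one application of the chain rule, and the dataset, learning rate, and number of passes are made-up values for illustration.

```python
import random


def train(samples, learning_rate=0.05, epochs=200, seed=0):
    """Fit output = w * x + b by repeatedly reducing the squared error."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random initial weights
    history = []  # average error per pass through the dataset
    for _ in range(epochs):
        total_error = 0.0
        for x, expected in samples:
            output = w * x + b         # forward pass
            error = output - expected  # compare with the labeled output
            total_error += error ** 2
            # Chain rule: d(error^2)/dw = 2 * error * x, d(error^2)/db = 2 * error
            w -= learning_rate * 2 * error * x
            b -= learning_rate * 2 * error
        history.append(total_error / len(samples))
    return w, b, history


# Labeled samples generated from expected = 2 * x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b, history = train(samples)
print(round(w, 3), round(b, 3))
print(history[0] > history[-1])  # the error decreases over the iterations
```

The `history` list is the kind of training error curve a data scientist would watch to see whether the model is converging.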
Now, this is what’s known as supervised learning. It’s called supervised because a human somewhat supervised it in the sense that a human labeled the data samples used in the training process. But nowadays, there’s been a renewal of older methods where you take a dataset that has not been labeled and you do some initial unsupervised training. And then you finish the training cycles with a smaller dataset that is labeled. So, you do this semi-supervised approach using both unlabeled data and labeled data.
[00:09:26] Camille Morhardt: So, one question for you, as you’re back propagating the error or the weight, you said you’re going to push it now back up through the model, so that the model can figure out where it got it wrong and fix that. Does a human know how the computer is making an adjustment? Like, oops, it looks like most cats have tails. So, I was mistaken when I classified something as a cat most of the time if it doesn’t have a tail or whatever. Is it something that a human can follow? Or does the data scientist not actually know how the model is correcting itself, how it’s making that adjustment?
[00:10:08] Andres Rodriguez: So, the human can track how the weights are being changed. Back propagation is just a bunch of linear algebra equations. So, there is no magic behind that. There are ways of observing how certain changes in the weights affect the output, and it’s typically not necessarily tail versus no tail. But there are tools that can help a human visualize changes in the weights in a particular layer and the effect they’re having on the model output.
Although most data scientists, when they’re training models, don’t necessarily examine these types of effects. They pass the data. They give the model some training parameters. And they are observing whether or not the error is decreasing over time as the number of iterations grows. So, that’s one of the key metrics that the data scientist tends to observe, this decrease in the training error.
[00:11:15] Camille Morhardt: What I’m sort of dancing around is this notion of inexplicability of a model. And I’m trying to figure out, is that real? Data scientists, as you’re describing it, could actually go in and look and very clearly understand which weights are leading to different outcomes. Is it more a matter of how a data scientist would communicate this? Or is it that they can’t necessarily figure out what’s happening? What is the real problem with data explainability?
[00:11:45] Andres Rodriguez: Yeah, AI explainability is a field of study on its own. In some areas, you can explain how certain changes in the weights generate output or modify the output, but the weights themselves can be a bit mysterious. For example, when you design an architecture, meaning a certain pattern of how the weights are going to be arranged, you don’t necessarily have typically an intuition of, hey, these weights in the middle of the neural network, these are going to control whether the image has a tail or not a tail. So, this level of granularity, it’s not something that the data scientist designs. And so, this is part of the inexplicability of deep learning.
But on the other hand, typically, assuming you have access to sufficient data and compute, you tend to get higher performance. And so, part of the debate has been, on the one hand, traditional machine learning models offer certain theoretical guarantees. Deep learning models do not typically have those same guarantees in performance, but the performance is better, the accuracy is better. It might be more challenging to explain. On the other hand, you get better performance.
And so, if you were to ask me, would I rather go to a surgeon that is 80% successful but can explain to me exactly what goes right and wrong versus a surgeon that is 99% successful, that cannot explain in detail why it went right or wrong? I would choose the 99%. But I know there are mixed opinions in this debate.
[00:13:52] Camille Morhardt: Let me ask you something. It seems like deep learning has a lot of, I’ll just say, press right now. Do you think that there’s going to be, over time, a convergence of AI approaches maybe into a handful of distinct ones, like maybe there’s some sort of distributed learning, like federated learning and then there’s deep learning and a couple others? Or is the world and the industry in AI moving towards multiple different kinds of models for every different approach and it matters what you’re doing?
[00:14:25] Andres Rodriguez: That’s a great question. There are techniques for training a model. There are techniques that are centralized, where you’re going to train everything in the cloud, and decentralized, where you might do some training using federated learning. And I think those will continue. Federated learning is important, particularly for security when you don’t want to put your data in the cloud. So, both techniques will continue to gain popularity.
The types of models that are getting trained have evolved. Initially, when deep learning resurfaced roughly 12 years ago, it particularly impacted the computer vision domain. And the types of models were known as convolutional neural networks. Convolutional neural networks or CNNs, they became very, very popular until deep learning started expanding to the language domain. In the language domain, recurrent neural networks started to become much more popular.
What has happened over the past three years is more of a convergence in the models towards a new type of architecture called transformer models. I don’t think CNNs or RNNs will disappear, but I think transformers will become the dominant type of model across the various domains. And I don’t think necessarily one type of transformer model, but the class of models will become the dominant type. Now, the techniques to train the models, I think those will not necessarily converge given that some compute will continue to be done in the cloud and other compute will be done at the edge.
[00:16:37] Camille Morhardt: Okay, that’s interesting. Now, I want to ask you about the functionality or the result of the models. Do you think that models will be trained more in the future towards specialized functionality, or you even say pre-trained models that you can then fine-tune? Or do you think things are going to gravitate more towards this artificial general intelligence, AGI, that we keep hearing about as a Holy Grail in the media?
[00:17:06] Andres Rodriguez: So, AGI-
[00:17:07] Camille Morhardt: Or maybe they’re not mutually exclusive. I don’t know.
[00:17:09] Andres Rodriguez: Yeah, I agree, AGI is the Holy Grail. I don’t think they are mutually exclusive. I don’t think you can take a large jump away from what we’re doing in deep learning and all of a sudden get AGI. There are actually debates in the community on whether AGI is possible.
But to the first part of your question, I think what’s going to happen is you’re going to have these extremely large models that are going to be trained that have the capacity to be fine-tuned for various types of applications. And the reason I think this is the future is models are growing in size and complexity at an extremely rapid pace. And most companies don’t have the data or the resources to be training these models from scratch, but they do have the resources to fine-tune them for their particular application.
So, I anticipate the future will be a few extremely large models trained by likely hyperscaler companies that have access to other compute and data. And then those models might either be freely available or be offered for a fee for companies to then take them and fine-tune them for their particular application.
[00:18:37] Camille Morhardt: This is fascinating. Has this business model emerged yet that you can see?
[00:18:43] Andres Rodriguez: Not the renting out models, widely. What has emerged is pre-trained models are put into model zoos, repositories, that are freely available on the one hand. On the other hand, some extremely large models are trained by a large company that then provides services directly based on those models. So, rather than renting out the models, they provide APIs for end-users to leverage those models. So, this service or AI as a service, it’s been growing.
[00:19:30] Camille Morhardt: For this to happen, you said hyperscalers. So, let’s just define more specifically the kinds of companies that you think will be able to either rent out models or provide APIs or services based on their access to large quantities of data. How do they get that data? Why is that data going through them? We’ll start with that.
[00:19:51] Andres Rodriguez: Google, Amazon, Meta, Microsoft, Alibaba, Tencent, Baidu, just to give a few examples, these companies have access to a lot of the data from the end users. They have the resources to label large amounts of data, and most of them have huge data centers. So, they have the computational capacity to train extremely large models from scratch.
[00:20:28] Camille Morhardt: I was talking with Ashwin Ram at Google who told me that, on your phone, you start to type, respond back to a text message and it’ll give you some pre-fill options, and those are all customized to you. So, the pre-fill options, after a while, they’re really going to start to look like things that I personally type and they would be very different than what would show up on your phone, perhaps. Whereas they’re beginning with a model that gets delivered to both of our phones. And then it’s like you said, it’s fine-tuning around the edge. But at the same time, they’re collecting weights back from each of us to improve that overall model that they send out with a new phone.
[00:21:06] Andres Rodriguez: This is a great example of federated learning where the training is happening on the device.
[00:21:13] Camille Morhardt: Consumers are actually doing the work, right? I may be saying, “Yes, this is accurate or no, this is not accurate.” And actually, helping improve that model.
[00:21:22] Andres Rodriguez: That’s correct, yeah.
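The exchange above, where a shared model is tuned on each phone and the weights are collected back, can be sketched as simple weight averaging. This is a toy sketch under stated assumptions, not how any phone keyboard actually works: the model is a single weight, each device's data is made up, and the server just averages the weights it gets back.

```python
def local_update(weight, local_data, learning_rate=0.05, steps=10):
    """Train on the device; the raw data never leaves this function."""
    for _ in range(steps):
        for x, y in local_data:
            error = weight * x - y
            weight -= learning_rate * 2 * error * x
    return weight


def federated_round(global_weight, devices):
    """Send the model out, collect the updated weights, and average them."""
    updates = [local_update(global_weight, data) for data in devices]
    return sum(updates) / len(updates)


# Made-up private datasets, one per device, roughly following y = 2 * x.
devices = [
    [(1.0, 2.1), (2.0, 3.9)],
    [(1.0, 1.9), (3.0, 6.2)],
]

global_weight = 0.0
for _ in range(5):
    global_weight = federated_round(global_weight, devices)
print(round(global_weight, 2))
```

Only the weights travel between the devices and the server, which is why this approach is attractive when you don't want to put your data in the cloud.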
[00:21:23] Camille Morhardt: Okay. So, I will just ask you what your opinion is on what sorts of things need to be contemplated from a policy and privacy standpoint. I know you’re not a lawyer and not an expert in this area, but just as a researcher who deals with AI on a daily basis.
[00:21:43] Andres Rodriguez: It’s not just my private information to generate a profit and how do you deal with that, even in some of the new types of models that can generate content. So, for example, there are some examples of AI generating beautiful art or amazing music, things that you would have thought would be impossible 10 years ago. Training of these models is with human content, so with music that human artists wrote. And yet, I’m not sure if the human artists are getting any compensation for their work being used to train these new AI models. Or similar in the art field, I don’t know if the artists are getting any compensation. People that are working in public policy should be paying attention and figuring out how to tackle these issues.
I’ll tell you though the biggest issue that I see with AI. And before mentioning this, I do love AI and all the many benefits that it brings our society, but one area that I am quite concerned about is personalized content. So, it is great when I go to Netflix or Amazon, and they know what I like, and I get products that they predict that I’m going to want. What I find potentially dangerous is when it comes to news sources or media content that is extremely personalized to me, that reinforces my beliefs rather than challenging them. And you can see how society can become quite divided when you are always being fed what you already believe. And so, I see this as dangerous because it can break the empathy in people towards those that have different viewpoints.
[00:24:07] Camille Morhardt: One more question for you is how do you consider the relationship between deep learning and security? And I’ll leave that pretty broad.
[00:24:18] Andres Rodriguez: I see deep learning algorithms being studied for various security applications, so how to apply deep learning to make your IT system more robust against hacker attacks. So, that’s one area. As far as security of your own personal data, I think there are a lot of new algorithms and actual hardware features that can encrypt your data to protect it from attackers that are trying to gather your data.
And lastly, as you mentioned earlier, the work on federated learning where the data is kept on your device and doesn’t have to be transferred to a centralized location, it’s another area of increased security. But I do see AI overall benefiting security for many companies. I don’t see it yet widely adopted. It is still in early stages. But in the early stages, it looks promising.
[00:25:27] Camille Morhardt: Thank you, Andres Rodriguez, fellow at Intel who does specific work in deep learning and general guru of artificial intelligence and also, author of a book on deep learning. Thank you so much for joining me today.
[00:25:41] Andres Rodriguez: Thanks for the invitation.