[00:00:36] Camille Morhardt: Hi, and welcome to today’s podcast of What That Means. We’re going to talk about scalability in artificial intelligence at the edge, using the example of Audi. I’m very happy to have with us today Rita Wouhaybi. She is Senior AI Principal Engineer within the Industrial Solutions Division at Intel in the Internet of Things group. Welcome Rita.
[00:00:58] Rita Wouhaybi: Thanks for having me.
[00:01:00] Camille Morhardt: You did the very first podcast on What That Means, just a general definition of artificial intelligence, which I’ve now subsequently learned is too big of a topic. It’s so enormous, so now we’re trying to scope down just a little bit. You’re an expert at the edge and AI algorithms, and I’m hoping you can start by just giving us a general sense of the difference between AI at the edge versus AI elsewhere, in the cloud.
[00:01:29] Rita Wouhaybi: Well first of all, it’s exciting that AI has become so big that you have to slice it and dice it. That’s a lot of fun. I remember when computer science used to be one thing, and now it’s its own crazy big topic. So what is AI at the edge? Many of us are used to AI in the cloud. We actually carry AI in the cloud on our bodies, in our pockets, on our computers. Every time we’re asking for directions from point A to point B, we’re accessing AI in the cloud.
AI at the edge is when the AI does not run in the cloud. It runs where the user is, or where the usage of that AI is. So I’ll give some examples. What does that mean? That means if you are in a hospital and there is an expert system, or there is a recommendation for a treatment plan based on your diagnosis, that’s not happening in the cloud, it’s actually happening on that tablet that the medical professional is carrying.
Another example from my own field is industrial. If AI is going to control the robotic arm, is going to fix what the robotic arm is about to do, or is going to use cameras to figure out whether there are defects in the item that’s being manufactured on the factory floor, the AI is running on a device right there next to the cell where the robotic arm is operating. It is not running in the cloud.
[00:02:51] Camille Morhardt: Does that mean the model itself is sitting at or near the edge as well, or?
[00:02:57] Rita Wouhaybi: Absolutely. Sometimes the model is even being created and trained using unsupervised learning at the edge as well, or using reinforcement learning from robotics. Now, why? Why would we want to do this? Right? I mean, we’re happy. Why change? Why not run everything in the cloud? Often, if you look at the literature or talk to customers and people who are creating and deploying those kinds of applications, they will cite three different reasons why.
The first one is, that’s a lot of data sometimes to ship to the cloud. Imagine you have 20 or 30 cameras that are looking for defects in an item that is being manufactured on a factory floor. Shipping every stream to the cloud to look for defects is just a lot of networking traffic for no good reason. And we’re engineers, we’re all excited about efficiency. So why not run it at the edge? That’s first.
Second, there might be privacy issues. Medical is a great example of this. In many countries, there are regulations on where the medical data can go. It can’t go to the cloud, where it will end up in a different geographical location. Industrial is another one. A lot of times, this data actually has intellectual property. We’re going to talk about Audi. Audi was a great example of saying, “My data’s my IP. You cannot take it somewhere else. You cannot put it in the cloud.”
And the third one is latency. It still takes time to send data from somewhere on a factory floor or in a hospital or in a smart city or in a home. If I’m a robotic arm and I’m going to do welding on a car, I can’t just have the robot hanging out in space, waiting for the result to come back from the cloud. Or even worse, if I am a robot that’s assisting in a medical operation, gosh, you cannot be waiting, right? There are requirements on your latency that are sometimes on the order of nanoseconds, so sending that data to the cloud does not make any sense.
[00:05:01] Camille Morhardt: Can you tell us a little bit about what exactly you did in partnership with Audi? And then we can talk about… I know that you’ve said it’s difficult to scale AI at the edge, and there’ve been a lot of hiccups in the industry, in the world, and maybe some of those you’d overcome working with Audi. But just let us know what actually happens there. Are we talking car manufacturing?
[00:05:25] Rita Wouhaybi: Yes. The request came from Audi that they were having some issues with quality on their factory floor, mostly related to welding. What Audi does is something called spot welding, and spot welding is basically when you have several metal sheets. Think about each one of those as a metal sheet, each one of those papers is a metal sheet, and you’re going to squeeze those metal sheets in specific spots to create a bond. That’s basically how they create a chassis with very little human intervention, and it’s state of the art… and I was honored to be able to walk that factory floor and help it get more efficient.
With this welding, what was happening is that every car, depending on the car model, will have several thousand of those welding spots applied to it. And this is basically what attaches those different parts together to create a chassis. Now with every operation, a robot is holding a welding gun. The car would come into the cell and the welding gun would lower and start squeezing at those welding spots. After every operation you get what us data geeks call a data object.
When the welding gun does every welding spot, the controller spits out a text file that contains a hundred and forty-something streams, key-value pairs. What robot name was it? What configuration was used? What was the resulting temperature? What was… blah blah blah. Including some streams that contain, what is the status of that welding spot? Did it work? Was there an error generated? What kind of error? And so on and so forth. We found out that sometimes, some of these welding spots would not generate any error, but the bond wouldn’t be there. When they would take it for quality inspection, those welding spots would fail quality inspection.
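The data object Rita describes, a text file of roughly 140 key-value streams per welding spot, can be sketched in a few lines. The field names below are invented for illustration; the real keys belong to Audi's controller output.

```python
# Hypothetical sketch: parse one welding "data object" (a text file of
# key: value pairs) into a Python dict. Field names are made up; the real
# streams are defined by the welding controller.

def parse_data_object(text: str) -> dict:
    """Turn 'key: value' lines into a dict, coercing numbers when possible."""
    record = {}
    for line in text.strip().splitlines():
        if ":" not in line:
            continue  # skip malformed lines
        key, _, value = line.partition(":")
        value = value.strip()
        try:
            value = float(value)  # numeric streams (temperature, current, ...)
        except ValueError:
            pass  # leave strings (robot name, status, ...) as-is
        record[key.strip()] = value
    return record

sample = """robot_name: R07
config_id: 212
peak_temperature_c: 1490.5
status: OK
error_code: 0"""

spot = parse_data_object(sample)
```

The key point from the transcript is that some spots report `status: OK` with no error code, yet the bond is missing, which is exactly why the labels from manual inspection were needed.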
[00:07:20] Camille Morhardt: And in that case, quality inspection is a human?
[00:07:23] Rita Wouhaybi: It’s human operated, which means it cannot scale. They cannot take every car for the human quality inspection, so they would sample out of their factory, out of their assembly line, and they would take that sample car. Now, we would find some welding spots that did not take, right? But there was no error generated. And they wanted to be able to figure out, “Why are these not taking?” and to flag them. To be able to figure out, there is a car that has 20 welding spots that are not good.
Now, what Audi did to combat this problem: they were overwelding by up to 20%, which means they can tolerate a few welding spots here and there that didn’t take. The problem, though, was that some cars, a very tiny percentage but it still existed, would have bad welding spots concentrated in one area. And even the overwelding would basically not compensate for these bad welding spots. So they wanted to be able to figure out which cars, because by sampling, they weren’t catching those cars. Which cars had this issue? And they wanted something in-line, not a human going and poking around. Quality inspection was done very manually, by ultrasound. There were engineers carrying little ultrasound probes and poking at every welding spot manually, one by one, and taking notes.
This was the problem that we decided to work with them in order to solve. And very naively, like many AI experts a few years back, I thought, “Okay, well you give me enough data, I can solve any problem. Give me data and I can create a model.” Well, it turns out that there is a big problem for scaling, especially in industrial, which is the lack of data.
Not all the data points are going to be labeled, because creating labels, saying, “This welding spot is good, this welding spot is bad,” means it went through that manual quality inspection. It means you had to invest human time to create those labels. When Audi handed us that first data set, it literally contained millions of data points, but only a few hundred of them were labeled.
To make the matter even worse, and this is very specific to industrial but also applies in health, most people are healthy, right? Luckily, thankfully. And most things that are created in factories are not defective. So even out of the few hundred that we got that were labeled, more than 90% were not defective welding spots. Well, how on earth am I going to find the defective ones if all you show me is not defective? It’s a little bit of a challenge, and that’s what we call in AI unbalanced classes. Think of me as a toddler and you’re training me, and you show me a lot of not-defective and only very few defective. I’m not going to be very good at identifying and classifying the defective. I’m going to be way better at classifying, identifying, the not-defective.
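The unbalanced-classes problem Rita describes has a standard mitigation: weight the rare class more heavily during training. A minimal sketch, with illustrative numbers rather than Audi's, using inverse-frequency weights:

```python
# Sketch of the "unbalanced classes" problem: with ~90% good labels, one
# common mitigation is to give the rare class a larger training weight.
# The 270/30 split is illustrative, not Audi's data.

from collections import Counter

labels = ["good"] * 270 + ["defective"] * 30  # 90% good, 10% defective
counts = Counter(labels)

# Inverse-frequency weights: the rarer a class, the larger its weight.
n = len(labels)
weights = {cls: n / (len(counts) * cnt) for cls, cnt in counts.items()}
```

Here the defective class ends up weighted about nine times more than the good class, so each rare example counts proportionally more when the model is trained.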
[00:10:30] Camille Morhardt: Can’t you just set a threshold of, “It needs to look somehow like all of these other good ones, otherwise, I’m not sure why, but it is defective somehow?”
[00:10:39] Rita Wouhaybi: It really depends on the use case and how complicated the data is. Often, thresholds are not enough. You have to look for patterns. And actually, that’s one approach right now that we are very much focused on, which is semi-supervised or unsupervised learning, of saying, “Can I learn what a good product looks like? And hopefully that means when I see an outlier, it’s bad.”
But as a human, you can tell from my statement right now that it’s not always going to be true. You might see an outlier that’s still good; you just happened not to have seen it before. So yes and no, it’s a little bit of a trade-off. But the interesting piece about the Audi case is, we crunched the numbers, we went and deployed, and lo and behold, we were detecting the non-defective at over 90% accuracy, but on the defective we were at around 60%. And 60% in AI is actually a little embarrassing, because it’s slightly better than a coin toss. 50 is a coin toss, right? So I was very embarrassed, and I went back to Germany and hung out with the domain expert, and spent like three days with him chatting and harassing him about explaining welding to me and explaining the process.
And that is the second point about scaling, which is what we call a data-centric approach. Which means: you, AI expert, you, data junkie, don’t just take the data and look at it blindly as numbers. Understand what the data is representing. Focus on what your data is trying to tell you before you just start crunching it with some AI algo. So, spending time with this domain expert, one day after lunch, he starts telling me about the phases of welding. I was like, “What?” It actually goes through four phases, and I’m like, “Okay, so how long is every phase?” “No no no no no, it’s not a specific duration of time. It’s actually the process, the behavior.”
So this is where you start getting that feel, that gut feel that us humans, that a domain expert has. Which is like, “Yeah. You know, in the first phase, what you’re trying to do is melt the glue,” because there is glue between the sheets that was applied several cells before. And again, this is another eye-opener, right? For AI, it’s never just that well-documented problem. It’s what happens in the world next to it. Right? What happened in those 20 or 30 cells before welding starts, which is, they applied glue. Sometimes the glue, because it’s too humid, didn’t spread as well, or spread too much. So you end up with different blobs.
So, okay. That first phase, we’re melting the glue. The second phase, we’re ramping up the welding. The third phase… actually, it was pre-heat, and then melting the glue, and then doing the welding. And then in the fourth phase, we are cooling down, because if we just finish welding and boom, we open it, the sudden change in temperature will crack the steel. And it’s like, “Whoa, that is so cool. Okay. Tell me more.” Obviously I can’t share all the details for intellectual property reasons, like we said a little earlier, but we ended up creating what we called a heuristic. So it’s not an AI algo, it’s an algo that tries to break down the data into those phases and understand the characteristics of those phases.
So I came back and worked with the data scientist who works for me; we were developing the algo together. She and I got into a room, we worked out how we were going to create the heuristic, she created the code, and then we attached that to filter the data, kind of like a funnel. To say, “Hey, data, I’m going to get to meet you right now. I’m going to extract some patterns from you before I feed you into the AI.” And lo and behold, with that updated model, our accuracy jumped to over 94%.
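The "funnel" idea, extracting per-phase characteristics before the model ever sees the data, can be sketched like this. The actual phase boundaries and features are Audi IP, so everything here (the fake signal, the split points, the two features) is invented to show the shape of the technique:

```python
# Toy version of the heuristic "funnel": segment a welding signal into its
# phases and emit compact per-phase features (duration, mean level) that a
# classifier consumes instead of the raw stream. All values are made up.

def phase_features(signal, boundaries):
    """signal: list of samples; boundaries: indices splitting it into phases.
    Returns duration and mean per phase."""
    feats = {}
    edges = [0] + list(boundaries) + [len(signal)]
    for i, (start, end) in enumerate(zip(edges, edges[1:]), start=1):
        chunk = signal[start:end]
        feats[f"phase{i}_duration"] = len(chunk)
        feats[f"phase{i}_mean"] = sum(chunk) / len(chunk)
    return feats

# A fake sensor trace split into four phases at samples 3, 6, and 10,
# mimicking pre-heat / glue melt / weld / cool-down.
trace = [1, 1, 2, 5, 6, 5, 9, 9, 8, 9, 3, 2]
feats = phase_features(trace, [3, 6, 10])
```

The point, as in the transcript, is that "too long in phase one" or "too short in phase one" becomes an explicit feature the model can learn from, rather than a pattern it has to rediscover from raw samples.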
[00:14:42] Camille Morhardt: Oh, wow.
[00:14:43] Rita Wouhaybi: And stayed at that level until we did the handoff. According to Audi, it has continued to stay at that level when they took it from pilot to full-scale deployment. So this was a very humbling experience, and the big aha moment is that we talk about AI and the big promises, but really, one big thing about scaling is also understanding, what is that problem you’re trying to solve, and how would you solve it?
This data-centric movement, obviously I’m not the only one who’s talking about it right now. There are a lot of AI experts who are now focused on it, because it does help you break down and simplify your AI. And when you simplify your AI, you can create more models, they can run faster, they can run on less compute. So it’s also about scaling it on the device itself.
[00:15:33] Camille Morhardt: But when you say breaking it down, you’re talking about, okay, there’s four distinct phases according to this domain expert. And you’re going to have AI look for variations or potential causes of defect at each one of those phases, as opposed to just giving it everything all at once and saying, “Go crunch and look for patterns.” Is that what you mean when you say break it down?
[00:15:55] Rita Wouhaybi: You could. That is definitely one approach. The approach that we took was to actually identify how these four phases are happening and what are their characteristics. For example, how long every phase is. If you spend too long in melting the glue, that might be an indicator that something is wrong, or if melting the glue happened way too quickly, also that might be an indicator that something is off the norm.
What’s interesting about that is, now, instead of just dumping the data as is and having the AI trying to figure out those patterns, you kind of gave it a tip, right? Again, it’s like teaching a toddler. If you show pictures of cats and dogs and your toddler sees five of each and they’re like, “I don’t know, cat, dog, whatever.” And you’re like, “But hold on, look, the cat has a different tail than the dog does.” It’s about providing those tips to the AI, rather than just dumping the data and saying, “Hey, good luck figuring out what are the patterns.”
[00:17:28] Camille Morhardt: I feel like one of the things the world talks about, the magic and amazingness of AI, is that you can just dump data, and it may come to the same conclusion that, let’s say, an expert human after decades of work would. And it can perhaps explain how it’s coming to that conclusion, but that might not be the same way a human comes to the conclusion. It may make a different distinction between dog or cat, not based on the tail or the ears, but on something completely different that humans don’t even look at.
[00:17:28] Rita Wouhaybi: So yes and no. Actually, you said two things in there, and I want to address them separately. The first one, you said, “Hey, I thought the whole promise of AI is that you dump data and don’t have to worry about it.” That’s true if you have the data, but when it comes to expertise in industrial and medical, a lot of domains, you don’t have the luxury. I don’t have the luxury of collecting data for 40 years, but I had an expert who had been looking at those patterns for 40 years. So I bootstrapped my AI with his knowledge, and then I was able, with AI, to find patterns that he could never find. By the way, this domain expert really did not enjoy meeting me the first time I met him, because he had the old preconception of, “Oh, AI’s going to take my job.” And then at the end, he realized that no, no, no, AI actually is going to help you find those patterns that your human brain isn’t able to find. It’s going to complement you. And I think that’s a powerful message, right?
So I didn’t have the luxury of having 40 years of data, but I had the domain expert and some of the patterns that he found. You’re spot on, if I had enough data, perhaps I could have found these patterns using AI. But this got me to solving my problem way, way quicker. Right?
The second one, you said AI’s going to find perhaps different reasons. Yes, absolutely. Even when I started with the assumptions of the human, AI was able to find additional patterns. And, by the way, where you can take this is very powerful. It started with Audi, where Audi said, “Okay, I want to be able to flag these welding spots that my controller can’t see are problematic.” Now we’re able to flag them. But the power of this is not just being able to flag them. The power of it is being able to push toward what we call autonomous behavior.
Meaning, now I can not only flag, after the fact, welds that used a specific configuration. If a controller is about to use configuration number 212, I can say, “Hey, I’m going to predict that this is going to be problematic. Why don’t you use 172 instead? That configuration has a better reputation in this particular situation.” So you can even eliminate some of the errors, which AI can do and a human is not able to do, at least not in 10 milliseconds: make that correction before the operation happens.
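The configuration-reputation idea can be sketched as a small lookup: track recent outcomes per welding configuration, and swap a configuration predicted to be problematic for one with a better track record before the operation runs. The config IDs come from the transcript; the history and the 0.5 pass-rate threshold are illustrative assumptions:

```python
# Hedged sketch of "configuration reputation": record pass/fail outcomes per
# welding configuration, and before an operation, swap a configuration whose
# recent pass rate is poor for a better-performing fallback. The threshold
# and the recorded outcomes below are invented for illustration.

from collections import defaultdict

outcomes = defaultdict(list)  # config_id -> recent pass/fail history

def record(config_id, ok):
    outcomes[config_id].append(ok)

def pick_config(requested, fallback, min_pass_rate=0.5):
    history = outcomes[requested]
    if history and sum(history) / len(history) < min_pass_rate:
        return fallback  # predicted problematic: switch before welding
    return requested

# Configuration 212 has been failing; 172 has been behaving.
record(212, False); record(212, False); record(212, True)
record(172, True); record(172, True)

choice = pick_config(212, 172)
```

This also captures the "robots gossiping" point later in the conversation: because the history is shared across arms, a few failures on one arm downgrade configuration 212 for all of them.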
[00:19:50] Camille Morhardt: That makes sense. So it can take into account something, I’ll make it up, but like the humidity in the gluing cell 30 cells prior, and then make an adjustment to the welding arm later on in the process?
[00:20:04] Rita Wouhaybi: Exactly. It can learn from what happened. Because you don’t have one welding arm, you have a few hundred of them, right? And you can learn that, “Oh, today all the problems I’m getting are from configuration 212.” So after it errors two or three times and AI flags it, “All right, everybody.” All the robots can start gossiping with each other, right? “Stop using 212, it’s problematic. Let’s switch to 179,” whatever it is.
[00:20:32] Camille Morhardt: You had some tips, and I’ll say them back, but if you could just help make sure we’ve captured them. You’re saying for scaling AI at the edge, and particularly one of the main concerns is not having a lot of error data. Because as you said, manufacturing often, or even medicine, there’s a lot more rights than wrongs, I guess, or goods than bads, so training becomes harder.
So a couple of shortcuts or tips is, one, you said batch AI into smaller subsets that you’re going to go look at, rather than everything all at once, you’re going to break it down. Another thing you said is, utilize the human domain expert that has that gut knowledge, gut instinct, and also experiential knowledge from working on something for decades or years, however long. And give a jump-start, give some tips. And that won’t necessarily bias the AI to look at things from the human perspective, but it will help get it rolling. Is there anything else or any adjustments to those two?
[00:21:38] Rita Wouhaybi: Yeah, absolutely. I think one big one is actually an approach that we started looking at more recently, after the Audi work and after we’ve learned from multiple engagements with industrial partners. And this is where AI is headed, at least in our domain at the edge. For a while, it felt to me that we were like a bunch of kids, and we were very whiny. “Oh, AI in industrial is so hard because of unbalanced classes. We have a lot of goods, and not enough bads.” And interestingly enough, maybe I should have chatted with you, because you kind of said it before. You gave that tip. “Then why don’t you learn what’s good, and you can identify outliers as perhaps bad?”
This is where a lot of AI effort is happening right now, in the sense of saying, “Can I create unsupervised learning that says, ‘Hey, if I sit down and I watch, and I see a lot of good things happening, then can I learn what a good thing is and then identify outliers?’” That also is going to help in scaling. Because when I told you about the welding use case from Audi, actually in that first workshop I did with Audi, they came in with a very long list of use cases. It was like 20. And those guys are so creative, I’m sure right now it’s like 200 instead of 20 use cases where they can use AI. And all of their use cases were valid and exciting, but a lot of times, and that was one of the problems for scaling, they didn’t have an ROI. Return on investment.
I remember one guy came in, and he was very excited about this use case where he wanted to figure out, when do you change the electrodes on the welding gun? And I got very excited with his use case and his excitement was rubbing off. And at some point I said, “How expensive is each one of those electrodes?” And he very sheepishly said, “It’s like two cents.” I was like, “No!”
[00:23:35] Camille Morhardt: “Why are we solving this?”
[00:23:37] Rita Wouhaybi: Exactly! “No, we’re not going to be working on your use case, sorry.” So the ROI bar is very, very high if you’re going to do what we did with welding, meaning sit with the domain expert, learn what the data is, collect data, label it, test it and so on. So this is where scaling using unsupervised learning becomes really important. Can AI sit and watch your data and watch your processes and learn what is good, what is right, and what is not? Or can you even push it farther and say, “I’m going to watch for patterns. And if I see things happening differently, I might surface it to a user and say, ‘Why is this process all of a sudden taking you two hours today versus 15 minutes on regular days? Did something go wrong earlier?’”
This is what we call unsupervised learning or semi-supervised learning, and what we’re doing in industrial is kind of using that data we started out complaining about to our advantage. This is not like autonomous driving, where you get to an intersection and, do you turn right or turn left, you’ve got a 50-50 chance, so you have to have enough data. This is a factory. If they’re doing something right, it should be more than 90% a positive outcome, a good outcome, hence you should look for the outlier.
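The "learn what good looks like, then flag outliers" approach can be illustrated in its simplest possible form: fit a normal envelope on mostly-good measurements and flag anything far outside it. Real systems would use richer models (isolation forests, autoencoders); the two-hour-versus-15-minutes example echoes the one in the conversation, and the numbers are invented:

```python
# Minimal sketch of unsupervised outlier detection: learn the envelope of
# "good" behavior from mostly-good data, then flag points far outside it.
# Data values and the 3-sigma rule are illustrative choices.

import statistics

def fit_normal(samples):
    """Learn the center and spread of normal ("good") behavior."""
    return statistics.mean(samples), statistics.stdev(samples)

def is_outlier(x, mean, std, k=3.0):
    """Flag values more than k standard deviations from the learned norm."""
    return abs(x - mean) > k * std

# "Good" process cycle times cluster around 15 minutes...
good_minutes = [14.8, 15.1, 15.0, 14.9, 15.2, 15.0, 14.7, 15.3]
mu, sigma = fit_normal(good_minutes)

flag_two_hours = is_outlier(120.0, mu, sigma)  # a two-hour run: surfaced
flag_typical = is_outlier(15.05, mu, sigma)    # a typical run: ignored
```

Note the caveat Rita raises earlier in the conversation: an outlier is not necessarily bad, only unseen, so flagged items are surfaced to a user rather than rejected automatically.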
So there is definitely a lot of effort in industrial with that kind of a philosophy in mind, thinking about these problems as such. Now, that doesn’t solve all problems, right? It doesn’t solve things like worker safety, because it’s unpredictable. You might have workers who pay very close attention not to get close to a robot where they’d be unsafe. But there will be errors, and you can’t treat them similarly. Because if you don’t make the right call on an error there, it might cost a human life, and that’s not a joke. Versus if you don’t catch a defect, it might not be as big of a problem.
So there are use cases where, yes, taking advantage of these special characteristics in the data could get us to scaling, and could get us to more use cases solved much, much quicker.
[00:25:50] Camille Morhardt: Okay, well, Rita Wouhaybi, Senior AI Principal Engineer in the Industrial Solutions Division of the Internet of Things group at Intel, working on AI at the edge. Thanks, Rita.
[00:26:02] Rita Wouhaybi: Thank you.