Camille Morhardt 00:11
Hi, I’m Camille Morhardt, host of InTechnology podcast. And today I have a really interesting conversation with Pradeep Dubey. He is Intel Senior Fellow, and he’s also a superstar when it comes to parallel computing.
Now the driving workload for parallel computing these days is no surprise, artificial intelligence or really anything that does massive quantities of digital data processing. So he has, for that reason, quite a lot of perspective on what artificial intelligence is capable of. And we talk a little bit in this conversation about whether artificial intelligence or computers generally are actually ready to transition from what they’ve been acting as a “system one” in helping people with memory and helping people with computing so that people can then make decisions and take actions. Are computers and artificial intelligence actually ready now to transition over to “system two” where they can actually make perceptual decisions on our behalf and then take actions.
Before Pradeep came over to Intel, he actually worked at IBM Watson lab. And he has worked on parallel computing architecture for the IBM PowerPC, Intel 386, Intel 486, Intel Pentium, and Intel Xeon. He’s an IEEE Fellow. And he’s also recently been named an ACM Fellow. Just a really interesting human being and supremely technical; you’ll get a little bit of insight into that when he describes some of the mathematics behind parallel computing. Welcome to the podcast, Pradeep. Can you please kick us off by just explaining what is parallel computing? And why does it matter?
Pradeep Dubey 01:59
So parallel computing is the ability to do both parallel instructions and parallel data. Now you can do more and more instructions each cycle on more and more data items each cycle. At a higher level, it means doing parallel tasks, doing things in parallel.
Now, just to make one fine distinction: sometimes you’re simply serving multiple users; that’s a throughput kind of parallelism, right? Each user is not getting served any faster, but you’re just serving multiple users. That’s one kind of parallel computing. Often, that’s called some sort of weak scaling. Strong scaling is where you’re actually devoting more and more resources, doing things more and more in parallel, to make one user happier and happier; meaning the latency for the single user, the latency that you see, say the search time, is constantly improving. In one case, the parallelism is being applied to make things 10 times faster for one user; in the other case, it is being applied to serve 10 times more users.
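As a rough illustration of the distinction drawn here, the sketch below contrasts the two uses of the same ten workers: splitting one user's job to cut latency, versus giving each worker a whole request to raise throughput. The function names, the 10-second job, and the `serial_fraction` parameter are assumptions made for the example, not figures from the conversation.

```python
def strong_scaling_latency(work_seconds: float, workers: int,
                           serial_fraction: float = 0.0) -> float:
    """Latency seen by ONE user when the parallelizable part of the job
    is split evenly across `workers` (an Amdahl-style estimate)."""
    serial = work_seconds * serial_fraction
    parallel = work_seconds * (1.0 - serial_fraction)
    return serial + parallel / workers


def throughput_users_served(work_seconds: float, workers: int,
                            window_seconds: float) -> int:
    """Users served in a time window when each worker handles one whole
    request at a time; nobody's individual request gets any faster."""
    requests_per_worker = int(window_seconds // work_seconds)
    return requests_per_worker * workers


if __name__ == "__main__":
    # One user, ten workers: the 10-second job takes about 1 second.
    print(strong_scaling_latency(10.0, workers=10))
    # Ten workers, one request each: ten times more users per minute,
    # but every user still waits the full 10 seconds.
    print(throughput_users_served(10.0, workers=10, window_seconds=60.0))
```

Setting `serial_fraction` above zero shows why the first kind of scaling is harder: whatever part of the job cannot be split puts a floor under the single user's latency.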
Camille Morhardt 02:57
And you can do them together, right?
Pradeep Dubey 02:58
You do a combination of those together. So that’s at the high level, right? And we used to think that most things are serial and parallelizing things is hard. That’s not true. In fact, most problems can be parallelized. And it’s really very few problems which are essentially serial, meaning there’s nothing you can do; you just have to wait for one process to finish to start the next process, because the input to the next process depends on the output of the previous process. There are very few things which are essentially strictly serial. In fact, there are many parallel things right now: I’m looking at you, I’m thinking something, “What are you going to say next?” right? You’re doing something while you’re talking to me, right? There’s a whole bunch of things going on in the world in parallel, nothing to do with us, right? So things happen in parallel; the world is very highly parallel. That’s how billions of people go on living their lives.
Camille Morhardt 03:47
But why design compute that way? I mean, I understand you can, but why?
Pradeep Dubey 03:55
We design compute that way because that’s the way to get things done faster. And there are things that need to happen very fast. If we don’t do that, if we all serialize things unnecessarily, then things will happen slowly. It’s good in that we’ll only need one bus to carry all of us. But we’ll all wait for that bus. And that wait may be so long that we all get frustrated, because we have one life to live. Things need to happen in real time, before the time that we need them to happen, right? So if you’re actually trying to predict the weather for tomorrow, you don’t want to wait till the day after tomorrow. That doesn’t make much sense. And the only way to ensure that it happens under the time constraint is if you do the best in terms of the speed at which you do it, so that if four things can happen in parallel, why serialize them and unnecessarily take long?
Camille Morhardt 04:39
What are some of the workloads or what is one of the main computational drivers for parallel computing?
Pradeep Dubey 04:47
So the drivers for parallel computing are literally any workload that needs a lot of compute. Now, which applications need a lot of compute? In the old days, quote unquote, it was mainly scientific applications that we knew needed a lot of compute: fluid simulations or some hard optimization problems. These days, the primary applications that need a lot of compute are the ones that are rooted in dealing with huge amounts of data. Because data is really what has grown faster than compute could grow. And this massive amount of data has happened because of digitization. Most things are now digital. So they are immediately available to the digital world to crunch and make some sense out of.
So pretty much everything–every transaction, every image, every video, every speech, every payroll–has become digital. And that has led to immense growth of data. But there’s no point growing that data if we cannot do something with it; if you cannot find the photo you’re looking for, then you might as well go back to the old-school albums, right, the paper albums. So you must be able to find the things that you’re looking for; you must be able to mine the data, understand the data, make sense of the data. Otherwise, it just sits there.
Camille Morhardt 06:02
And is this where you’re essentially leading us: toward machine learning or artificial intelligence, which is then used to process or train on large quantities of data?
Pradeep Dubey 06:12
Yes, machine learning, where somehow you figure out the patterns in data, and actionable patterns in data, meaning patterns where, once we see that kind of pattern, we know what to do next. That’s even better. So that’s what we can now do much more efficiently. Why? Because we have enough data, enough compute, and smart algorithms that can make sense out of that data. And we are at the point where machines can do a very good job, quote unquote, making sense out of at least a class of data, what we call “perceptual data.” So machines can actually understand images better, videos better, speech better. These are all senses, the way we do things, we see things, right? Machines can also do that, almost as well, sometimes even better.
But now that you sense the world, what do you do next? Because you have a goal in life: you want to cross the street, or you want to make a portfolio go up. Once you sense the data, once you know exactly where everyone’s chess pieces are, you have to make your own move. That thinking, reasoning, decision-making is where machines’ role will come next, logically, right? So far, machines have mostly helped us perceive the data, meaning sense it as well as anyone can. But the decision-making part is still left to humans. And that’s good. But machines are claiming that “No, I can not just crunch numbers, I am not just good at simply perceiving this or sensing the world. Based on that, I can also reason and make a decision for the goal you have in life and take you there faster, make the decision on your behalf, and not only that, actually make the move, actually act.”
So sensing, reasoning, and acting is the loop that we humans do all the time. We sense who’s where; we decide, okay, based on this, I should cross the road, I should make this turn. And we do that–everyone does that–then we sense again: okay, did I do it right? Should I amend this, or must I just discard it and backtrack? But we keep going. We keep doing that sense-reason-act, sense-reason-act; that loop goes on forever. Machines have not quite been able to do that loop; now they are increasingly able to do all of that stuff. And there are agents now coming up which are acting on your behalf, leasing agents, hiring agents, all AI. So machines are beginning to do very simple tasks of decision-making, not necessarily chess playing, but they’re beginning to do that. And that has serious, significant implications.
Camille Morhardt 08:37
Given that machines are starting to take over the decision making and the acting, then we have to now provide them with a different type of compute. How do you design systems or architecture with that in mind?
Pradeep Dubey 08:53
It’s beginning to happen, but there are still a lot of hard challenges. It’s only happening for some tasks, the “what should you buy next” type of stuff. But in many cases, the decision-making is not easy. It’s highly complex, especially when you want it done in real time. And therefore we go back to the experts, the human experts, for that kind of decision-making. And think again: in most of this stuff, whether it’s teaching or singing or dancing, there’s a huge difference between a bad singer, an average singer, and a good singer. Huge difference. Bad teacher, average teacher, and a good teacher. So it’s probably easier to beat a bad singer. But for machines to actually beat the good singer may be very, very difficult. So in many cases, you can do the typical stuff, but that still doesn’t mean that you can be the best–the real expert, the best decision maker. That’s not easy.
So the point being that we will increasingly do the things that we are really good at. And the kinds of things we’re not good at, machines will take over. That’s nothing new. That’s always happened, right? We’ll increasingly leave more things to machines; just like we have left computations to machines, we’ll leave some mundane forms of decision-making to machines. But there’ll still be plenty for us to do. What we have to constantly figure out is how we get machines to become better and better at such tasks, and someday, maybe, even at more complex decision-making; someday maybe they’ll be better than us at everything. We’re not there yet.
Camille Morhardt 10:20
One of the things you said is that some of the hardest or most complex decisions for a machine are among the simpler ones for humans–like a standard Mom and Pop decision would maybe be extremely hard for a machine, whereas, you know, weather prediction is really hard for a human but a machine can pull the model.
Pradeep Dubey 10:39
Yes. So it has been the case, but it’s changing now. Think how humans make tough decisions. Humans don’t make tough decisions by actually reasoning through all possibilities. That would be ideal, but that takes a lot of work. That takes a lot of energy. That’s what Daniel Kahneman called System 2. And the human brain probably sucks up more than 25% of the energy from what you eat. So it’s not a cheap task, right? So humans mostly make decisions based on what we call System 1, based on gut feel, which is cheap. That’s the good part: you quickly make a decision; you quickly decide that “I’m going to buy the stock,” right? The bad part is that most of the time, you’re wrong.
If you look, nearly all accidents happen not because some machine went wrong, or because of brake failures. It’s that human decision-making was flawed, and not some complex decision-making where the traffic conditions were really very complex and somehow this guy just missed something. And why was he texting while turning? “I have no idea. That’s so dumb, I should have never done that.” So we sometimes do very many dumb things. And most of the accidents are caused by highly avoidable human errors. So humans do make lots of mistakes, sometimes expensive mistakes. Does that mean that they’re not capable of making better decisions? Most of the time they are capable of making better decisions, but it’s not possible to make the decision in real time with the amount of reasoning, the amount of thinking it will take. And this is why System 1 developed.
Camille Morhardt 12:03
Is there ever a scenario you think where System 1 is better than System 2?
Pradeep Dubey 12:08
System 1 is better for us because it’s much more real time, much faster, much lower energy. At the same time, you pay a price for it, right? That is, it can often be wrong; often, in hindsight, it may seem dumb. But that’s what we’re trying to somehow figure out: how do we, more often than not, make a reasoned decision, a justifiable decision, an informed decision? Who would not want to do that? We all want to do that. But at the same time, I have limitations of time, limitations of energy, limitations of data or information. So under my limitations, help me take the most informed decision.
And this is where machines come in naturally. Because, of course, when a doctor sees you, they just ask you five questions: “What did you eat yesterday? What did you do? This? Okay.” It’s quite possible that the doctor could do more if told more of your history, and they try to look up history, right, but how can they look up a lifetime of your history? But the machines could find some things and think that “oh, maybe his sugar has to do with his stress level, which has to do with the meetings he’s recently been having” or some life challenge that he’s been having. Knowing that, the doctor says, “Of course, I see now what you should be doing.” In the absence of that data, the physician is doing the best they can based on what you just told them.
So machines can help see some correlations, something in your day-to-day schedule, or all the way to your life history or the life history of other people who have been like you, and find some correlations. “There are these five other things that I’ve noticed based on the data access that I had.” And leave it to the physician still, right? But access to that kind of information will help that manager or that physician make a much better decision, right? And that’s what we’re trying to do: machines can definitely help because they have so much data access, and they can process and crunch so much data in real time. We humans have a limited memory. “I see hundreds or thousands of patients, so I don’t necessarily remember what you told me three years back; maybe you have shared some things with me, but I forgot and you forgot.”
Camille Morhardt 14:19
Uh-huh. Has AI got to the point where you’re able to use it to help it help itself? Is AI helping to design like parallel computing architecture that will help it improve?
Pradeep Dubey 14:32
These days, we’re in this age of AI, so some things are, of course, a little bit hyped-up expectations of AI, but also we have come a long way. So one of the hard problems is how do you design for AI? Right? How do we write software easier, better, less buggy, more secure, more performant, quicker? So that I can validate it much sooner, right? So we’re invoking AI and machine learning for such problems: “Okay, maybe it can go through my repository of the kinds of challenges, bugs, and fixes that I’ve had. I know my code, but I don’t necessarily know how many other people have patched this code in the past. And maybe someone else ran into the same bug that I’m running into, and fixed it.” But there’s no documentation. We are all notorious for not documenting things. Often, we can’t even recognize our own code.
Camille Morhardt 15:24
You’re talking about that I would say more in that, like, old-school sense of using compute power, where it has perfect memory and it can scan literature for things that you know, CVEs or known vulnerabilities. But this pivot like to System 2 that you’re talking about, we’d be using compute differently, it would be—
Pradeep Dubey 15:42
Yes. So we’ve always had this idea at a high level that if we could go back and remember everything, learn from all our mistakes, look up anything and everything, and not forget anything, we could do better. Yes. The question’s always been “how?” Is there a simple and efficient way to go back and find something similar to what I’m looking for? Find the essence of the data, right? It’s not easy, because you have to do what’s called, in the technical lingo, “high-dimensional” similarity search, high-dimensional indexing, because most problems are very high dimensional, meaning you’re not just your looks; your attributes include your parents, your friends, your social network, your history, your kids, your likings, all kinds of stuff. So you have many dimensions; which part of your dimensionality is really critical to what we’re talking about now?
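To make the idea of similarity search concrete, here is a minimal sketch that represents each record as a vector of attributes and returns the stored records closest to a query by cosine similarity. The record names, the four-dimensional vectors, and the brute-force scan are all assumptions for illustration; real high-dimensional indexes avoid comparing against every record.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, records, k=1):
    """Brute-force search: score every record, return the k best names."""
    scored = [(cosine_similarity(query, vec), name)
              for name, vec in records.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    # Hypothetical records; each position is one attribute ("dimension").
    records = {
        "record_a": [1.0, 0.0, 0.0, 1.0],
        "record_b": [0.0, 1.0, 1.0, 0.0],
        "record_c": [0.9, 0.1, 0.0, 0.8],
    }
    print(most_similar([1.0, 0.0, 0.0, 1.0], records, k=2))
```

Each comparison is independent of the others, which is one reason this kind of search maps well onto parallel hardware.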
So what we have become better at is helping machines help us: how do we really go into this high-dimensional data and find the things that are relevant in my present context? So maybe in the present context, the color of the shoe she’s wearing is the best predictor. But I never knew. You’ve probably seen many such things in the news, where people saw this correlation: people who buy diapers buy beer bottles. Now, that’s a very bizarre correlation. But in hindsight, it makes sense. So there are such correlations that we will have; there is a huge difference between causality and correlation, right? I may not know who’s causing whom. But the two things occur together.
Maybe every time I wear a red shirt, it rains. I don’t know whether my red shirt is causing the rain or the rain is causing me to wear a red shirt, but the two happen together. That correlation is enough, in many cases, for me to say “come on, I’ll take an umbrella.” So people can make their decision without understanding causality, just based on correlation. And correlation is a much easier problem; machines can simply go into the mounds and mounds of data and say, “Look, I have no idea why. But these two things always happen together.” Someday, with more and more data analysis, we’ll actually be able to figure out why those two things are correlated, and who is causing whom. But that may be a long time coming. When we understand that, we’ll actually be able to do much more; we will be able to design some kind of a fix, right? One that stops the rain. So machines are now becoming much better at quickly going through mounds of data and finding these correlations in a very high-dimensional space, which maybe humans cannot do. And that’s how they’re able to make a better decision.
Of course, over time, for machines to actually give you better and better predictions, they need to understand the correlation better and better; otherwise, they’ll not always be right. Because if there is no genuine reason for that correlation, sooner or later you will find data that will start questioning that correlation. “No, no, red shirt, there’s no rain.” Right? It may take some time. But the point being that you’re constantly looking: “Yes, most of the time, I saw that. I don’t know why.” Next time, with more data and better modeling, I begin to see more insight, made up of these other in-between things; we call them “hidden parameters.” And you begin to understand things better as you have more and more data or a better and better model. That understanding develops over time. Till then, the fact that the machine is finding such correlations is, many times, good enough for me to make a decision. I know it’s the same dog, curled up very differently. That’s fine. That’s good enough, right? Why did the dog curl up like this? It never curled up like this before. I don’t know the reason. But it’s my dog. The machine was correct.
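The red-shirt example can be sketched with a plain Pearson correlation coefficient: a short history in which the shirt and the rain always coincide scores a perfect correlation, and adding days where they disagree pulls the score down. The day-by-day data below is invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # 1 = wore a red shirt / it rained, 0 = otherwise (made-up days).
    red_shirt = [1, 0, 1, 1, 0, 0, 1, 0]
    rain      = [1, 0, 1, 1, 0, 0, 1, 0]
    print(pearson(red_shirt, rain))  # the two always happen together

    # Four more days on which the pattern breaks.
    red_shirt_more = red_shirt + [1, 1, 1, 0]
    rain_more      = rain      + [0, 0, 0, 1]
    print(pearson(red_shirt_more, rain_more))  # noticeably weaker
```

Nothing in the calculation says which variable causes which; it only measures how often the two move together, which is exactly the limitation described above.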
Camille Morhardt 19:11
Does parallel computing work on CPU as well as GPU?
Pradeep Dubey 19:17
Yes, that’s a system aspect. What it takes to do such computing is that a certain class of problems, a certain class of algorithms, needs to run fast. At the core of it, it’s dense matrix algebra, more importantly, small dense matrix algebra, call them “systolic” sometimes, right? Matrix multiplication, that’s the kind of stuff that’s really needed more often than not. So we design machines which can do that better. Similarly, lower-precision math, meaning I don’t necessarily need to be precise; I need to capture the range much better. The fact that the numbers have this kind of range is more important. How I represent, say, the average of this range, there I don’t have to be very precise; but if I miss the range, I have lost something much more. So I have to capture the range. But once I capture the range, I don’t have to be very precise in what the average of -100 and plus a million is. So such things have made us smarter in terms of how to design such machines.
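One way to picture “small dense matrix algebra” is a matrix multiplication broken into small tiles, where each tile is a tiny dense multiply that hardware can be built to do quickly. The sketch below is a plain software tiling under that assumption; it is not a model of a systolic array, and the tile size of 2 is arbitrary.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply matrices a (n x k) and b (k x m), given as lists of rows,
    by accumulating the products of small tile x tile blocks."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of the output
        for j0 in range(0, m, tile):      # tile column of the output
            for k0 in range(0, k, tile):  # walk along the shared dimension
                # One small dense block multiply-accumulate.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3],
         [4, 5, 6]]
    b = [[7, 8],
         [9, 10],
         [11, 12]]
    print(matmul_tiled(a, b))
```

Output tiles at different `(i0, j0)` positions do not depend on one another, so they could be computed at the same time; only the accumulation along `k0` within one tile has to be ordered.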
Camille Morhardt 20:27
Are you saying, if I start with– because I don’t, I don’t know this math. So you’re saying if I start with a broad range–I just have to make sure to hit the range–that enables me to do parallel computing?
Pradeep Dubey 20:46
No. That enables you to do the right kind of arithmetic. When you have a finite number of bits to represent a number–in floating point especially–you want to make sure that you divide them up properly. So if the range is more important, I’ll devote more bits to capturing the range, and fewer bits to capturing the precision. Knowing that AI cares more about the range than the mantissa, right, we will trade off those bits, that finite bit budget, differently. That’s what we have done. So that’s how the AI numerics were created: BF16, and now FP8 to FP4; so, with shared exponents, we keep creating these new numerics, driven by this insight, right? Unaware of this, we were doing them differently, meaning we were not quite trading off, taking some bits of each.
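The trade-off between range and precision can be sketched by comparing two well-known 16-bit layouts: IEEE half precision (5 exponent bits, 10 mantissa bits) and bfloat16 (8 exponent bits, 7 mantissa bits). The helper names below are hypothetical, the formulas assume an IEEE-style layout with the top exponent code reserved, and the conversion truncates rather than rounds, which real hardware may not do.

```python
import struct


def max_finite(exp_bits: int, man_bits: int) -> float:
    """Largest finite value of an IEEE-style format (assumed layout:
    top exponent code reserved for infinity and NaN)."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = (2 ** exp_bits - 2) - bias
    return (2.0 - 2.0 ** -man_bits) * 2.0 ** max_exp


def relative_step(man_bits: int) -> float:
    """Spacing between neighbouring values relative to their size."""
    return 2.0 ** -man_bits


def to_bfloat16_truncate(x: float) -> float:
    """Keep only the top 16 bits of the float32 encoding of x
    (sign, 8 exponent bits, 7 mantissa bits); truncation, not rounding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


if __name__ == "__main__":
    print("fp16 max:", max_finite(5, 10), "step:", relative_step(10))
    print("bf16 max:", max_finite(8, 7), "step:", relative_step(7))
    # A million does not fit in fp16's range, but bf16 keeps it, coarsely.
    print(to_bfloat16_truncate(1_000_000.0))
    print(to_bfloat16_truncate(-100.0))
```

Both formats spend the same 16 bits: half precision buys finer steps, while bfloat16 gives up three mantissa bits to cover roughly the same range as a 32-bit float.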
So AI is teaching us what kinds of things matter more, and how we actually deal with the finite budget of our, quote unquote, “bit budget” for representing a number, right? And that’s how we do parallel processing, meaning in a finite width of data–say a cache line–we have to pack as many data items as we can. If each data item takes 64 bits, you can only pack a few of them. So you can only operate on a few of them each cycle. When that same 64-bit item becomes 4 bits or 8 bits, you can pack eight times more. But how do you do this eight times more each cycle? You have to make the 64-bit item become an 8-bit item, or the 16-bit item become a 4-bit item. But you have to do it smartly; otherwise you lose the data. And then you’re never going to get the right answer. You lose accuracy.
So to maintain the accuracy and still get the benefit of parallel processing, whereby I can operate on more data every cycle, right, you have to make the data items smaller without losing inferencing accuracy at the end. So it’s tied to parallel processing; it is one element of parallel processing, right? By itself it doesn’t give you parallelism; it gives you balance, because now you have the ability to crunch more data within the same finite budget of how many bits you can process each cycle.
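The packing arithmetic is simple enough to write down. The sketch below assumes a 64-byte (512-bit) cache line, which is common but is not a figure given in the conversation.

```python
CACHE_LINE_BITS = 512  # assumption: a 64-byte cache line


def items_per_line(item_bits: int, line_bits: int = CACHE_LINE_BITS) -> int:
    """How many data items of a given width fit in one cache line."""
    return line_bits // item_bits


if __name__ == "__main__":
    for width in (64, 16, 8, 4):
        print(f"{width:>2}-bit items per line: {items_per_line(width)}")
    # Going from 64-bit to 8-bit items packs eight times as many per line.
    print(items_per_line(8) // items_per_line(64))
```

The count per line is the upper bound on how many items one wide operation can touch; whether the narrower items still give the right answer is the accuracy question raised above.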
Camille Morhardt 22:35
Anything else that we should think about in this context of parallel computing and AI? I mean, should we worry about anything?
Pradeep Dubey 22:42
I mean, technically, we need to make everything faster, better, cheaper, so that machines can decide better, right, and help us. But as always with technology, on the science side, you can use it and you can misuse it. Looking at this power of machines, or of computing, there is genuinely a concern: will these machines start doing things and making decisions which may be wrong? And then who’s responsible if something wrong happens? The machine or the humans behind it? The bar is very high.
These kinds of ethical considerations, not my expertise, but I clearly know that a lot of stuff that we talked about, there’s a link there, we need to make sure that we do keep doing responsible, ethical things so that the technology is constantly used to make us all happier, right. And life better for us.
Camille Morhardt 23:29
Do you think it’s reasonable? I mean, one of the themes that floated around South by Southwest this year was in programming AI having part of that be that AI check itself. Is that realistic in any way?
Pradeep Dubey 23:43
It’s very difficult, from what I understand. There’s always been this challenge of knowledge being guided by wisdom, right? Can something be self-checking, right? Again, it becomes at some point a philosophical discussion, right? And there’s no easy answer. The question is, self-checking in what sense? And what’s the goal, right? And what exactly are those things that we mutually, universally agree upon as good? “Okay, always make sure that this doesn’t happen.” What is not supposed to happen, ever? What’s always supposed to happen in all circumstances? There are no easy answers. So without that, what exactly is this self-checking thing? Because you need to set some ultimate criteria for good, that this is the goodness that you keep maximizing. It’s a real philosophical topic, so I don’t know. It’s beyond me. It’s something I always worry about, but there’s not much insight I can offer you on it.
Camille Morhardt 24:35
Okay, well, Pradeep Dubey, thank you so much for your time. Intel Senior Fellow and superstar in the world of artificial intelligence and the parallel computing architecture that runs it.
Pradeep Dubey 24:47
Thank you.
The views and opinions expressed are those of the guests and author and do not necessarily reflect the official policy or position of Intel Corporation.