[00:00:36] Camille Morhardt: Welcome to the In Technology podcast. We’re gonna talk about What That Means: data center demand. Today I have Allison Goodman with me. She’s Senior Principal Engineer at Intel in the Data Center Group. Welcome to the podcast.
[00:00:51] Allison Goodman: Thank you very much.
[00:00:53] Camille Morhardt: I feel like the very first place we have to start is what is a data center?
[00:00:59] Allison Goodman: That, that’s a great question. Um, what is a data center? A data center is really a collection of compute and memory and storage and network–because usually a data center implies more than one of these compute nodes that are talking to each other and then servicing a set of workloads. And for data centers, those can be all the same workloads, they can be different types of workloads. It can be bare metal, it can be virtualized. Really, all of those fit within that definition of a data center.
[00:01:32] Camille Morhardt: And people would think of these as the cloud where data is going–when it’s not staying on site at your computer or something–the data’s being shipped off. And who provides the data centers? And what are hyperscalers? Just to give us a little more lay of the land.
[00:01:48] Allison Goodman: Yeah. In that definition, you can have a data center on-premise, which means the data center sits with your company; so I can be a bank, I can be a retail shop, and have a local data center. Or I can consolidate out to a data center that we would call the cloud.
And then the hyperscalers are really like these massive data centers that we think of with the Amazons and Alibabas and Googles, Microsofts of the world. You know, there’s only a set of them that have many different data centers all over the world that we consolidate workloads into; and usually those same banks or retail shops would communicate with those data centers completely over the network. So they’re kind of logging in through a network to run workloads that get data off those hyperscalers or cloud.
[00:02:35] Camille Morhardt: So data centers have been growing kind of exponentially over the last bit of time here, and especially now more and more with the advent of AI. What are some of the main drivers for their growth?
[00:02:47] Allison Goodman: Really, with how much technology changes, it’s easy for companies to say, “rather than invest in trying to keep up with it myself, I’m just paying out to the cloud to keep up with it.” And that’s really one of the things that’s just driving and driving the growth of the cloud. And then you get a lot of cloud-native applications, and so you also just have a tremendous amount of development that’s not necessarily migrating to the cloud, but just being created on the cloud to start with.
I know it’s pretty awesome–if you go to most engineering and computer science schools right now, most of those students are just immediately learning how to get on GitHub and spin up app development or software development in the cloud, so they don’t have to spend all the time and money to acquire a lot of hardware. They just go up there.
The other great thing about doing that development on the cloud is that you get this kind of instant scalability, because you’re not constrained to just what your laptop or desktop is capable of doing. You can do development in the cloud, and then the more people that use it, the more data that’s going in there, you can just sort of buy more services: “Oh, I need some more compute. I need some more storage.” And it grows, right? And you see all kinds of companies and startups that do just that. They start off, it’s low cost to get going, and then they can sort of scale and grow the resources as they need, because they’re all shared resources up in the cloud.
[00:04:06] Camille Morhardt: So you mentioned compute, memory, and storage, and then the addition of networking as some of the main components of the infrastructure that is the data center. So how do those interrelate and is one of them more important than the others?
[00:04:22] Allison Goodman: Those four aspects of the data center really have to be balanced. If your memory or your storage can’t keep up with the compute–in other words, the compute is computing away and it needs to feed data in or store data out and it’s waiting, there are these latency events–then you have an inefficiency. There’s an imbalance, and that imbalance is inefficiency. And the inefficiency ultimately means that, you know, “I’m paying for power and time that I’m not using well.” And so the best data centers, and what we’re always striving for as data center architects, is to get a balance between those.
Now, technology doesn’t always go in a balanced fashion–meaning sometimes you get these great innovations in compute and you get way faster compute but now I have an imbalance with memory. So now my memory is not able to keep up–meaning that a computer is waiting on memory all the time. And then maybe I can also get a great innovation where my memory and compute go together and they’re fine, but my storage is really slow. And we see that a lot with sometimes network-based storage solutions where the storage just can’t keep feeding enough of the data that’s needed for these big data or AI workloads. And so you have to get way more innovative in terms of having some local storage, tiering storage, caching storage, for example, in order to get rid of those delays.
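To make the tiering and caching idea concrete, here is a minimal, purely illustrative sketch: a small local cache sitting in front of a slow "network storage" tier, which is one common way to hide the storage latency described above. The class name, block names, and sizes are all invented for the example; this is not a real storage stack.

```python
from collections import OrderedDict

# Illustrative sketch only: an LRU cache in front of a slow remote tier,
# so that a hot working set stops paying the network-storage latency.

class TieredStore:
    def __init__(self, cache_size):
        self.cache = OrderedDict()          # fast local tier (LRU order)
        self.cache_size = cache_size
        self.remote_reads = 0               # count of slow-tier accesses

    def read(self, key, remote):
        if key in self.cache:               # cache hit: no slow fetch
            self.cache.move_to_end(key)
            return self.cache[key]
        self.remote_reads += 1              # cache miss: go to slow tier
        value = remote[key]
        self.cache[key] = value
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least-recently-used
        return value

remote = {f"block{i}": i for i in range(100)}   # pretend network storage
store = TieredStore(cache_size=8)
for _ in range(3):                              # hot working set of 4 blocks
    for k in ["block0", "block1", "block2", "block3"]:
        store.read(k, remote)
print(store.remote_reads)                       # only the first pass misses: 4
```

After the first pass loads the working set, repeated reads are served locally, which is the efficiency gain local or cached storage buys for data-hungry workloads.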
[00:05:42] Camille Morhardt: Can you explain when data is in storage and when it’s shifted to memory, and then when it’s being computed? How do those things interrelate from a data perspective?
[00:05:54] Allison Goodman: So we’ll start with just the basics. Imagine that you have just one system sitting in front of you–and maybe, you know, thinking about a laptop or a handheld device is an easy way to think about it, right? So storage is really the persistent mechanism, which means that all of the data that you store or share is sitting in that storage. It means you can power it off and power it back on, and the data stays there. And then when you power on your system, essentially what’s considered hot–the storage that you’re gonna use right away–gets loaded into memory. So that memory is now faster and it’s closer to the compute and it’s ready to go.
And then the compute is where you’re interacting. On our laptops or handheld devices, you’re interacting with it because it’s coming graphically to the screen and then you’re inputting data to it. And when you’re running workloads, it’s going into some type of computation.
During runtime, then, what you would have is data that’s coming mostly from memory into that runtime compute. But the memory is a finite set. So I guess there’s two things with memory. One, it’s a finite size–usually storage is much bigger than memory in terms of gigabytes of capacity–and so at some point you have to go get more data out of storage and load it into memory to keep going. And two, memory is volatile, or ephemeral, which means when the system powers off, everything that’s sitting in memory is essentially lost if it’s not saved back to storage.
So usually in the background, you’re doing a lot of loading and storing; you’re pulling data out of memory and storing it again so that you have that data in case there’s an accidental power loss or an error. Or in the case of a lot of our devices, you know, we restart them or you have to reset them, and so you wanna make sure that all of that gets stored.
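The storage/memory relationship described above can be sketched as a toy model: "storage" survives power cycles, "memory" does not, and work happens in memory with periodic flushes back to storage. This is an illustration of the concept only, not how an operating system actually manages pages; all names are made up.

```python
# Toy model: persistent storage vs. volatile memory.
storage = {"doc.txt": "draft v1"}   # persistent tier: survives power-off
memory = {}                          # volatile tier: lost on power-off

def power_on():
    memory.update(storage)           # load hot data into faster memory

def save():
    storage.update(memory)           # background flush so edits survive

def power_off():
    memory.clear()                   # volatile contents are lost

power_on()
memory["doc.txt"] = "draft v2"       # the edit happens in memory
save()                               # flushed back before power loss
power_off()
print(storage["doc.txt"])            # → draft v2: the save persisted
```

If `save()` were skipped before `power_off()`, the edit would be gone, which is exactly the accidental-power-loss case the background storing guards against.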
[00:07:37] Camille Morhardt: What about trends in machine learning or AI? We all hear about the vast quantities of data that this requires and the vast amount of processing that this requires in centralized training models. What about the sort of distributed nature of the collection of a lot of the information and data that’s coming to us now through IoT, or just from people or machines or devices at the edge?
Where is compute happening? Is there any kind of a transition in that respect?
[00:08:06] Allison Goodman: Yes, and I think there’s sort of a fascinating transition that’s going on, being driven by these very data heavy workloads. For a lot of the past decades of compute and technology innovation, there’s been a lot of focus on compute and making more dense compute and making more cores and being more efficient with those cores. And what’s happened is that focus has also created a bit of an imbalance. And so now the compute is kind of outrunning the memory and the storage, which are the more data-centric piece of that equation.
And to amplify matters, you have this set of AI and analytics workloads that depend on more data. And so now you have more data that you wanna act on. You have more data that you want to inference on. And data has weight, has gravity to it. Because you want to save data most of the time, you have to get it to storage, keep it in storage, pull it from storage. And the way that we’ve been thinking about it is always pulling it out of storage, getting it into memory, acting on it, and then sending it back. It’s a very compute-centric way to look at things.
And I think the inflection point that we’re on right now is really, “well, what if we think of it as data first? What if we actually go to where the data is?” And I think that’s the cool innovation that we’re starting to see with the edge, with these accelerators: rather than spending the time and the energy and the money to move the data around, which is getting really expensive–what if we go to the data? What if we actually mostly leave the data where it is and move that inference all the way out to the edge? Because the most power-efficient, the most performance-efficient way to do the inference is to just go to where you’re actually collecting that data.
And I’ll give a very basic example. Say you’ve got video going and you’re trying to figure out if there’s a person that walked across your video. Maybe an old way to do it would be to take that video, stream it up to the cloud, have all of this compute in the cloud checking the video–“yes, that’s a person”–and it sends that back. That can be a really long latency path and pretty expensive, because I’m streaming all of this data up, acting on it, and sending the answer back. The alternative to that, that inference at the edge, is really to have that model now just running, hopefully on a lower-compute device or a very specific accelerator device that says, “Hey, my job is really just to look for people,” and it can much more quickly and much more efficiently take that video stream and say, “yep, that’s a person. That’s a person,” without having to actually move the stream even off the camera device.
That’s kind of like the ultimate goal of doing the compute action actually where the data sits. And then if you need to, maybe there’s just the answer, “Hey, there was a person,” and that’s what gets saved, not that whole video stream that we’re having to move across the network.
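A quick back-of-the-envelope calculation shows the scale of the difference in the video example: streaming the whole feed to the cloud versus sending only the detection events. Every number here is an invented assumption for illustration (bitrate, event size, event rate), not a measurement.

```python
# Bytes moved per hour: cloud inference vs. edge inference.
# All figures are illustrative assumptions.

video_bitrate_bytes_per_s = 4_000_000 / 8   # assume a 4 Mbps camera stream
seconds = 3600                              # one hour of footage
answer_bytes = 64                           # a tiny "person detected" event
events_per_hour = 20                        # assumed detections per hour

cloud_bytes = video_bitrate_bytes_per_s * seconds   # ship the whole stream
edge_bytes = answer_bytes * events_per_hour         # ship only the answers

print(f"cloud: {cloud_bytes / 1e6:.0f} MB/hour")    # 1800 MB/hour
print(f"edge:  {edge_bytes} bytes/hour")            # 1280 bytes/hour
```

Under these assumptions the edge path moves about a million times less data over the network, which is why keeping the inference where the data is collected is so much cheaper in power and bandwidth.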
[00:10:51] Camille Morhardt: So is the data center moving to the edge or are you just saying the processing is moving to the edge and the inferences are still traveling back to the data center–basically the updating or training of the central models?
[00:11:04] Allison Goodman: Yep. It’s really the latter. Think of it maybe as the pendulum swinging back and forth, where we started off with all these data centers on the edge, right? Everybody had a data center, you know, they were building their own data centers. And then what you saw is the swing of the pendulum where it’s like, “oh, now we’re all moving into the cloud.” So, okay, we’re all in the cloud and all these hyperscalers. And I think what we’re really gonna see in this next swing is finding that balance of: actually, we need to do the compute in the place where it makes the most sense to be sustainable from a power and a data efficiency side, because otherwise it’s just gonna get too massively expensive to do everything in the cloud.
Um, it’s really like finding the right balance, and I think that’s where the right balance of running that inference at the edge and maybe some of the training could even be in a co-lo or something in between and then just saving the nuggets that help you build better models, do better training back to where the training sits.
[00:12:01] Camille Morhardt: Is “colo” co-location?
[00:12:03] Allison Goodman: Yes. Co-location.
[00:12:04] Camille Morhardt: Okay. So you’re talking about how we’re at this inflection point right now, really, of the pendulum coming back toward the center. But what other kinds of things are being creatively thought of about data centers and processing of information? ’Cause I think AI’s just gonna get more and more–the quantity of data.
[00:12:27] Allison Goodman: Yes. So one of the things I’m really excited about: if you start with where I’ve seen storage go in the last 10-15 years, you had this very quick change from hard drives into SSDs, the ubiquity of SSDs, data centers leaning into this shared storage–lots of compute that kind of pulls from this shared storage so that at least I don’t have to move the storage. And so you’ve got innovation in compute. You’ve got innovation in storage. We have lots of innovation on the networking side. And really, memory is the next one. It’s sort of the last of those four to really be innovated on.
And what’s interesting about memory is that in data centers, memory is really DRAM, and it’s DIMMs that are directly attached to a compute node. And there’s a physical attachment there, where I have a certain number of slots and I fill up those slots and I have this memory, and that memory is kind of hoarded by the compute. So the compute gets all of that memory to itself. And the problem is that memory is expensive, so you’re spending lots of money on it, and that data actually needs to be acted on by multiple compute devices. And then there’s a utilization problem: sometimes your compute is using all of the memory, but sometimes your compute is using very little of the memory, and you can’t share it.
You know, we’ve solved this problem with storage where you can share storage now. So storage can be shared across multiple compute devices and that’s really the cool innovation in memory. It’s like, okay, now how do we share memory?
[00:13:59] Camille Morhardt: Why can’t you?
[00:14:01] Allison Goodman: Well, previously we haven’t been able to share it, really because of the technology that’s being used to access memory, which is DDR and DIMMs, and this direct attach to compute.
In order to share it, we have to come up with a new IO interface and new protocols to enable that sharing. And CXL, or Compute Express Link, is really the technology that the industry is embracing to do this shared memory. And it’s really neat, because now what we’re starting to see embraced by many of the technology companies is, “let’s put this memory out on this new IO interface–which allows multiple compute devices, processors, accelerators, to actually start accessing it and sharing it without it having to move.” And so now you can both share it and you get this better efficiency, because it gets loaded in, and maybe one compute device isn’t using it much right now, but another one is using it a lot, using the entire bandwidth available. And so now you get this great efficiency that we’ve been able to get from storage recently, and now we’ll be able to get from memory.
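The hoarding-versus-pooling point can be sketched with toy numbers (this is arithmetic about capacity, not a CXL implementation; all capacities and demands are invented): with direct-attached DIMMs, one node can run out of memory while another node's slice sits idle, whereas a shared pool serves the same total demand.

```python
# Toy capacity math: direct-attached memory vs. a shared pool.
# All GB figures are hypothetical.

total_gb = 512
demands_gb = [200, 40, 30, 50]       # per-node memory demand at one moment

# Direct-attach: 4 nodes, 128 GB each, no sharing between them.
per_node = total_gb // 4
unmet = sum(max(d - per_node, 0) for d in demands_gb)      # demand a node can't meet
stranded = sum(max(per_node - d, 0) for d in demands_gb)   # memory sitting idle

# Pooled (CXL-style sharing): every node allocates from one 512 GB pool.
pooled_unmet = max(sum(demands_gb) - total_gb, 0)

print(unmet, stranded, pooled_unmet)  # 72 GB short and 264 GB idle, vs 0 short
```

The direct-attach layout is simultaneously 72 GB short on the busy node and 264 GB idle elsewhere; pooling the same total capacity leaves no demand unmet. That stranded capacity is the expensive memory being "hoarded" in the description above.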
And it’s really, I think, one of the last pieces of making a truly composable data center, where you can actually pick and choose how much compute you need, how much memory you need, how much storage you need, how much networking you need for any given workload.
[00:15:22] Camille Morhardt: So data is encrypted when it’s in storage. Is it encrypted when it’s in memory?
[00:15:27] Allison Goodman: It is. Now, that’s not to say that we don’t have innovations that we need to do in that encryption. We’ve been lucky because by having this very direct connection between the memory and the compute, you can leverage that for a lot of the encryption and making sure that you have good security between that transfer between memory and compute.
And when you start sharing it, now you have to come up with additional security protocols to make sure that it’s shared correctly, to make sure that that data is encrypted as it gets moved in and out of memory.
And then it gets even more fascinating when you start talking about persistent memory because you know the baseline assumption when, when you say memory, is that it’s not persistent, that it disappears when the power goes off. But in the case where you actually have persistence in the memory and it stays, now you have to take a lot of learnings from storage and bring that into memory in terms of making sure that it stays safe if you’re gonna save it in memory.
[00:16:21] Camille Morhardt: Are you personally concerned with sort of the sustainability of how data centers are powered, and the processing and also the cooling? Is that something that you look at?
[00:16:32] Allison Goodman: Anytime that you’re working closely with customers and partners that are building and maintaining these data centers, that’s one of the biggest problems that we’re looking at and trying to solve for as you roll out more data centers. As much as we love to say, “oh, this is the greatest new technology that we can adopt, and it makes everything faster and more secure and better,” at the end of the day you don’t wanna consume more and more of the percentage of power available with these data centers. They have to be more efficient–both in terms of what we call scale-up, so how much you can run on a given physical footprint, and then also how much you can get out of the power. And we’re really starting to look at all of these metrics–whether it’s how much power it takes to infer something, how much power it takes to run a workload–so that that becomes really just a basic unit of measure for all of our discussions.
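As a sketch of "power per inference" as a unit of measure, here is the basic arithmetic. The wattage and throughput figures are invented for illustration, not real device measurements.

```python
# Energy per inference as a basic unit of measure.
# Both input figures below are illustrative assumptions.

device_power_w = 30.0            # assume the accelerator draws 30 W
inferences_per_s = 500           # assume 500 inferences per second

joules_per_inference = device_power_w / inferences_per_s
print(joules_per_inference)      # 0.06 J per inference

# Energy for a million inferences, converted to watt-hours:
wh_per_million = joules_per_inference * 1_000_000 / 3600
print(round(wh_per_million, 1))  # ~16.7 Wh
```

With a metric like this, two deployments (say, cloud versus edge) can be compared on joules per inference or per workload rather than raw speed alone.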
[00:17:26] Camille Morhardt: Well, conveniently, they’re aligned. Energy’s expensive, right? And so the desire to use it most efficiently is there anyway from a cost perspective.
[00:17:38] Allison Goodman: In some ways that’s the cool intersection of capitalism and business and, you know, green sustainability right now: you can be motivated by just trying to do better by the earth, or companies can ultimately be motivated by what’s the right thing to do for the business in terms of saving money–meeting exactly the resources that you need for that task and just continuing to be the most power efficient in doing it.
[00:18:03] Camille Morhardt: Can you provide any additional perspective on sustainability and the data center beyond just using renewable resources for power?
[00:18:12] Allison Goodman: I listened to a chat–it was actually at the Intel Vision event–and one of the comments that was made about data centers and sustainability that stuck with me is: if you look at sustainability efficiency and go back 20 years, it was actually pretty inefficient for everyone to have their own on-premises data center. And so cloud and hyperscalers kind of rode this wave of being inherently way more efficient than the edge, just because you were consolidating all of this. You were getting better efficiency out of the hardware, because you can grow and expand workloads as needed–the hardware’s always running, so you get the best efficiency.
But in order to get more efficiencies in the next ten years, the hyperscalers are in some ways tapped out. Yes, they can get new hardware and we can do all these innovations, but now we’re at smaller percentages of efficiency gains. The only way to get big efficiency gains is to swing that pendulum back into that middle piece again: now I need to utilize the edge and I need to acknowledge the gravity of the data and where it sits, because that’s the only way I’m going to get substantially better.
[00:19:24] Camille Morhardt: So what do you think is the biggest worry or concern that you have in this space or concern for the future? I mean, I don’t know if it would be like quantum and you think, “oh my God, everything we’ve been doing is just gonna totally go on its head for some other architecture.” Or is it, “wow, we’re just up against this wall in energy consumption or cooling;” or is it that “there’s no end to the amount of data that’s gonna be generated and eventually we can’t keep up with it?” Or what kind of things do you see as like major, major problems with no resolve at this point, like over the next decade?
[00:20:03] Allison Goodman: It’s probably the data one. There is so much data that is being produced and stored, and so much of it is being stored just because it can be, not because it necessarily needs to be. I’m sure all of us can relate to that, right? When I look at my camera roll I’m just like, “why do I have a picture of a doorknob from two years ago?” But, you know, we’ve been spoiled a little bit, in that the advances in technology and the cloud and these edge devices have just allowed us to store all those things. We compress and we compress and we compress, and we have all these tiers of storage and tape and everything that allows us to save these. And just because we can save it, the real question is: should we be saving it? Just because we can stream certain data from the edge out to other devices, should we? Is that the right thing to do?