[00:00:36] Camille Morhardt: Hi, and welcome to the Cyber Security Inside podcast. We’re doing What That Means: high-performance computing, or supercomputing, today. I’m really looking forward to this topic. I’ve got with me James Reinders, who’s a supercomputing, or high-performance computing, engineer at Intel. He is also the oneAPI Evangelist for Intel. Welcome to the show, James.
[00:00:59] James Reinders: Thank you for having me.
[00:01:00] Camille Morhardt: So as usual on What That Means, can you please define for us what is high-performance computing or HPC, and also, is it interchangeable with super computing?
[00:01:10] James Reinders: It’s been known by a lot of terms over time. Very simply put, it’s a concept of building kind of the biggest, baddest, fastest computer you can to solve very large engineering, complex engineering, scientific computational problems.
[00:01:28] Camille Morhardt: So, has the architecture of the machine changed over time, or is it just more and more and more sort of cores if you will?
[00:01:38] James Reinders: Like everything else in computing, it changes over time. If we think way back to the earliest days of computing, computing was used to solve scientific and compute problems, but we just called them computers, or mainframes and so forth. The idea of a supercomputer came about maybe in the mid to late ’70s. There started to be concepts that maybe we could build a machine that was a little more expensive, a little more complex than your average machine to do your business processing, read your punch cards and balance the books. And out of that there have been many different changes in supercomputing over time.
One of them, by the late ’90s we settled one argument. There was a concern that you couldn’t really scale by adding lots and lots of cores. So there were some people trying to build individual processors that were super, super fast and very few of them, and others saying, “Hey, I can hook together thousands of them.” By the late ’90s a standard supercomputer changed from being an exotic, custom-built machine to one that consisted of thousands of off-the-shelf processors. And it kind of ended that argument. Of course, we’ve seen two big changes since then. One is we’ve gone to multicore, and the next one has been accelerators; specifically, GPUs have been an important addition to supercomputers for a big part of the last decade.
[00:03:07] Camille Morhardt: I should have mentioned in your intro too, you have been part of the team or responsible for a number of different super computers that have been ranked in the top in the world. Some of them for many years running, is that right?
[00:03:19] James Reinders: Yes. One of them was ASCI Red, which when we assembled it in 1996 became the number one supercomputer in the world by the TOP500 ranking. And it was a multiprocessor machine with a little over 9,000 P6s in it. And that settled the argument about whether computers were going to be massively parallel or not.
[00:03:44] Camille Morhardt: Can you talk about the evolution sort of use cases or workloads that supercomputers have been used for over the last, from the time they sort of started to kind of now, and projections into the future?
[00:03:57] James Reinders: In the early days for supercomputers, one of the biggest motivations for them was what you would call military, loosely. Whether it’s computing ballistics or weapon design of the most horrendous proportions in terms of destruction capability, that was the reason people wanted to do a lot of computation. And that continued pretty well into the ’90s.
Since then supercomputers get used for a lot more than that. In fact, I think the majority of use for supercomputers or high-performance computing these days is not military. We see things like energy or exploration, trying to figure out where to drill for oil or how to design a wind turbine better, how to build a better vacuum cleaner. And that’s an interesting thing: wind tunnels. Instead of building physical wind tunnels, you’re able to do aircraft design, automotive design and really refine it. Drug simulations, too. What all of these things have in common is they do simulations. They do simulations of the real world, and the more compute power we get, the more realistic our simulations can be to tell us about what the weather will be, or to do climate forecasting to look into what may happen with climate change, or try to figure out more things about COVID so we can combat it. All of those are problems that you’ll see people using supercomputers tackling these days.
[00:05:30] Camille Morhardt: Some of this kind of modeling of very complex things like weather or disease spread or mutation, I’ve heard referenced as future use cases for quantum compute. So can you explain, is HPC or high-performance computing going to become eclipsed by quantum, or do they somehow fit together in a future world?
[00:05:53] James Reinders: For a long time we called them supercomputers because they were exactly that. They were the super-est computers we could build, if you’ll excuse the broken English. But along the line somebody decided the term high-performance computing was even cooler, I don’t know. Again, referencing high performance.
So, the concept with quantum computing is to be able to use this very spooky action at a distance, the entanglement, as the basis of computation for certain types of problems. I just think of it as another cool way to build a supercomputer. That said, quantum computing is pretty specific in the type of problems it can solve. It may not be the best way to solve every problem, time will tell. Of course, we need to figure out how to build them at scale. They hold the promise of being phenomenally amazing at modeling the real physical world. Some of the first uses will clearly be modeling of molecular dynamics, different things in chemistry, and those are incredibly important in solving problems. So yeah, I think quantum computing as it matures will become another form of supercomputing. I don’t think it’ll displace all the other architectures, it’ll just join the fold.
[00:07:12] Camille Morhardt: So it sounds, James, like you’re talking about supercomputer or high-performance computing is kind of a generic term, like you said earlier, biggest, baddest computer. Doesn’t matter the technology that’s in it if that shifts over to become quantum in the future, or if it’s some sort of a hybrid or you’ll have different ones for different kinds of approaches?
[00:07:34] James Reinders: Absolutely.
[00:07:35] Camille Morhardt: Is high-performance computing something that exists in every big company, in university? Is this something that a consumer would have access to via a cloud? How, where are these things located and how many are there?
[00:07:49] James Reinders: Well, it’s interesting. There’s a couple of ways to look at it. One is, if you’re looking for one of the top 500 machines in the world, obviously those are fairly rare and there are a lot of ways to make those accessible. So some of them end up in government labs with limited access, some end up in national labs or universities with considerable access. I work with a lot of different places around the world, like the Texas Advanced Computing Center in Austin. They’re part of a program that makes that compute power available to students all over the place who can apply for grants. And the grants are basically giving them compute time, where somebody has to pay for having bought the machine and for keeping it turned on, but then they turn around and make that time available for lots of different science projects. So, if you’re in a lot of programs around the world, getting access to supercomputers is something you may find at your universities or your institutions.
The other way I look at it is my cell phone qualifies as what a supercomputer was capable of 25 years ago in terms of compute. And if you look at the amount of compute power we can pack into a laptop or into a small desktop machine, it’s amazing. And lots of companies have small racks of systems that have very powerful processors or very powerful processors with GPUs. And these are fairly affordable. So we’re seeing things that I would have thought of as super computers, even a decade ago, being things you can go click on a website and order up from your favorite compute vendor, and it’s amazing. And so lots of small companies are using these for simulations.
We talk about crash simulations, which is a way to test the straining of metals or the straining of materials of any type, and people who build products look for their durability. And rather than building prototypes these days, people draw them up in CAD tools and apply different crash methods to them. And it’s an affordable way for a lot of companies to do what they used to do physically. So we do see a lot of that. And frankly, I consider that super computing, although it wouldn’t show up on the top 500 list anymore.
[00:10:15] Camille Morhardt: Hmm, interesting. So can you rent out time at a server farm of super computers? Is that a thing?
[00:10:24] James Reinders: Absolutely. That’s been a fairly niche field. You might go see if you could get access to a supercomputing facility, or you might find a specialty company. But now more and more we see the cloud vendors offering some HPC capabilities. And the thing that really distinguishes an HPC computer from just a normal large cluster is how high performance the connections are between the compute capabilities. So, if I have a lot of different nodes, that’s what we tend to call them in a compute, where we have several processors, maybe some processors with some GPUs, they all share memory in a way. They might be on one board, but when we connect them together, typically the connections are fairly low performance. But in a supercomputer you invest more expense in connecting them together fast. That’s what enables us to write a program that uses thousands, tens of thousands, sometimes hundreds of thousands of cores, of either CPUs, GPUs, or FPGAs.
In order to do that, they have to be able to communicate with a lot of bandwidth, low latency, a lot of performance. We’re seeing cloud vendors often offer instances like that. You name it, all the cloud vendors are venturing into that. And it’s a very affordable way to get access to high-performance computing.
[00:11:50] Camille Morhardt: Is there a difference when we think about security for a supercomputer or high-performance computing, versus another kind of computer? Or I’ll extend it to include privacy and trustworthiness. I mean, is there any different way of looking at that kind of compute?
[00:12:07] James Reinders: I don’t think fundamentally there is. In other words, when I think about running a workload and securing the data and keeping it private, a super computer fundamentally is very similar. There’s a couple of things I think of though that make it a little different. One thing is, super computers don’t tend to allow multiple applications to run on the same node. That provides a little security. Obviously you still have to secure the borders. But supercomputers, since people are trying to get the ultimate in performance, they don’t tend to want to share their nodes and multitask, that’s inefficient. So when you’re actually running a workload, you run it full bore on at least the part of the machine you’re on.
But the other thing is, supercomputing of course, by its nature, is the most computationally intensive capability we can put in one place. And yeah, obviously if you’re concerned about computation exploiting or tearing through privacy, those concerns get multiplied when you put more compute power together. So I think among HPC folks, talking about privacy and ethics and so forth is a very important topic. And one that I hope all of us as engineers are concerned about.
[00:13:28] Camille Morhardt: Can you talk a little bit about how this intersection of artificial intelligence and AI workloads and machine learning workloads are coming together with HPC?
[00:13:40] James Reinders: Yeah, it’s super exciting, but you won’t find me calling them AI workloads. I look at AI as a technique. And to explain that a little, if you look at problems that we might want to solve, take molecular dynamics, which is a simple concept of simulating the world of a lot of molecules bouncing around. Maybe some of those molecules make up a cell membrane. Some of them make up a virus. Some of them make up a drug that’s trying to interact with the virus and stop it from going through the cell wall. And you bounce those around and you have to inject some randomness. And in those simulations we tend to do things called Monte Carlo operations. And there’s been some very interesting work, some of it from CERN, looking at: can you replace the Monte Carlo operations with a neural network that was trained, that’s AI?
And basically they took a neural network, a GAN network, or I should say GAN, because the “N” stands for network, and they trained it by letting it watch Monte Carlo operations. And then they basically plugged it in and said, “Behave like what you saw.” And the results were really exciting. It was able to do simulations that seem to give us comparable answers at a fraction of the compute power. So they’re looking at using that possibly to simulate the next generation of Hadron Collider detectors so that they can keep looking for what happens when you split things into subatomic particles.
We’ve seen this in weather simulations. People work very hard at algorithms to ascertain what the weather’s going to be, but they also have done AI training on critical parts of the weather model and seen it perform the same or better at detecting things like atmospheric rivers and other weather phenomena that affect our weather. And so AI is making its way into what people would call traditional HPC workloads, solving parts of the problems that were solved other ways before. So I find it a very exciting intersection, if you will, of AI techniques with traditional HPC techniques. But I would just eventually call all of it high-performance computing, because it’s all about solving those problems.
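For readers unfamiliar with the Monte Carlo operations mentioned in this answer, here is a minimal sketch of the general idea: draw many random samples and average the outcomes. It is a generic textbook illustration (estimating pi from random points), not the CERN simulation or the trained GAN stand-in discussed above. The function name `estimate_pi` and its parameters are assumptions made for this example.

```python
import random


def estimate_pi(num_samples, seed=0):
    """Estimate pi by sampling random points in the unit square."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        # Count the points that land inside the quarter circle of radius 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square's area.
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    for n in (1_000, 100_000):
        print(n, estimate_pi(n))
```

The estimate generally tightens only as the number of samples grows, which is one reason sampling-heavy simulations consume so much compute and why a cheaper trained replacement is attractive.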
[00:15:58] Camille Morhardt: So, high-performance computing is centralized by its definition, right? You’re putting a bunch of stuff physically together, a bunch of compute physically together. How does it operate, or is it just completely orthogonal to distributed computing? Is there any way to have distributed high-performance computing?
[00:16:17] James Reinders: Well, that’s really a good question. The nature of parallelism is that the ultimate thing to do is do a lot of computations that are independent. And when they’re independent, I can say, “Hey, have this part of the computer solve part of the problem, this solve another.” Those can be as far away as you want them to be, very distributed. The problem is, when they get their solutions, to go forward with the problem we usually have to exchange a little data, do a little communication. And that’s where, at least by physics the way we know it today, they have to be physically close together. Light, which we believe is the fastest thing right now, only travels about a foot in a nanosecond. When you’re trying to exchange data, you can’t afford for your wires to be very long or your distances traveled.
So supercomputers often have somewhat of a circular layout, or they’re very tight. People are very concerned about how far apart they are, how long the wires are that connect them and so forth. And when you’re designing your problem, if you’re using tens of thousands of cores across the machine in many cabinets, you try to exchange data with nodes that are close to you. Because if you go from one end of the computer to the other, it takes longer. So yes, you can distribute the problem. It will slow it down. It will slow down that exchange of data. And that’s why supercomputers tend to be all in one place. At least the part of the supercomputer that is going to run a tightly connected computation on it.
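The "about a foot in a nanosecond" figure can be checked with a little arithmetic. The sketch below uses the defined speed of light and the exact foot-to-meter conversion. The function names and the `velocity_factor` parameter are assumptions made for this example. Real cables carry signals slower than light in a vacuum, so the default of 1.0 is only a physical lower bound on delay.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the meter
METERS_PER_FOOT = 0.3048              # exact, by definition of the foot


def light_distance_feet(seconds):
    """Distance light covers in a vacuum during `seconds`, in feet."""
    return SPEED_OF_LIGHT_M_PER_S * seconds / METERS_PER_FOOT


def one_way_delay_ns(distance_m, velocity_factor=1.0):
    """Minimum one-way signal time over `distance_m`, in nanoseconds.

    `velocity_factor` is the fraction of light speed the signal reaches:
    1.0 is the vacuum limit, and real cables are slower.
    """
    if not 0.0 < velocity_factor <= 1.0:
        raise ValueError("velocity_factor must be in (0, 1]")
    return distance_m / (SPEED_OF_LIGHT_M_PER_S * velocity_factor) * 1e9


if __name__ == "__main__":
    print("feet per nanosecond:", light_distance_feet(1e-9))
    # Hypothetical 30 m run between cabinets at opposite ends of a machine room.
    print("one-way delay over 30 m (ns):", one_way_delay_ns(30.0))
```

Even at the vacuum limit, a hypothetical 30-meter run costs roughly a hundred nanoseconds each way before any switching or software overhead, which is why data exchange is kept between nearby nodes.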
[00:17:55] Camille Morhardt: How much heat do these things generate, and how do you cool them?
[00:18:01] James Reinders: Wow. Yes, they generate a lot of heat. Several decades ago, we used to assume that the cost of air conditioning or cooling a computer was about the same as the cost of running it. Nowadays they have a ratio, and I actually don’t remember exactly what it’s called, but that would be called, I think, a 1.0, or well, maybe that’s called a 2.0. But the concept is, it takes one unit to run the computer, one unit to cool it. Nowadays people try to talk about 1.4 or less than that, trying to get the cooling costs down. One of the ways is we don’t cool computers as cold as we used to. We let them run a little warm, and people talk about warm air instead of cold air. That has its disadvantages, but it lowers the power consumption.
Some computers are still air cooled. A lot of them are water cooled. And what that exactly means can vary a lot. In some of them, the water is piped into a unit very close to the system and then turned into cold air and blown over the system. In others, the compute board will have a heat sink on it, a very flat one, and another board with water running through it. And they’ll be clamped together to do the heat exchange. I’m not aware of any now that drip water over the circuitry, but there have been computers built like that in the past, and they’re kind of a nightmare. I knew a repairman that once said he felt like a plumber more than an electrician when he worked on those computers, because he’d have to shut down. But yeah, water is a better way to evacuate the heat. So water is often involved to pull the heat away in order to keep the computers dense enough so that we don’t stretch those wires out too far and slow down the computer.
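The ratio described in this answer is commonly called power usage effectiveness (PUE): total facility power divided by the power drawn by the computing equipment itself. One unit to run plus one unit to cool gives 2.0, and a PUE of 1.4 means 0.4 units of overhead per unit of compute. The sketch below works through that arithmetic. The function names and the 1,000 kW figure are illustrative assumptions. In practice the overhead includes more than cooling, such as power distribution losses.

```python
def pue(it_power_kw, overhead_power_kw):
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_power_kw <= 0:
        raise ValueError("it_power_kw must be positive")
    return (it_power_kw + overhead_power_kw) / it_power_kw


def overhead_for_pue(it_power_kw, target_pue):
    """Overhead power (cooling and so on) implied by a given PUE."""
    if target_pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_power_kw * (target_pue - 1.0)


if __name__ == "__main__":
    it_kw = 1_000.0  # hypothetical machine drawing 1,000 kW for compute
    print("one unit to run, one to cool:", pue(it_kw, it_kw))
    print("overhead in kW at PUE 1.4:", overhead_for_pue(it_kw, 1.4))
```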
[00:19:46] Camille Morhardt: So, what other kind of major challenges like that are there when you’re dealing with high-performance compute? What are some of the things people right now are going, “Oh my God, if we could only...”? So a couple decades ago it was, “If we could only cool it faster, if we could only get them closer together.” What is the thing now that people are trying to figure out?
[00:20:06] James Reinders: The biggest cost in running a computer is moving data around. It used to be the computation, but now it’s moving the data from one processor to another, to a memory, to a disk. The moving of the data dominates the power consumption in the machine in some very realistic ways. And so, there’s a lot of things going on to try to make the memories higher performance and closer to the processor, trying to reduce the power or increase the efficiency. So a very hot topic is things like high-bandwidth memories, and how do you connect those?
And the things we used to call chips are not a single piece of silicon anymore. There are devices out there with over 100 pieces of silicon in them, all connected together. We still call them a chip because we’re so accustomed to that. And the reason there’s so many is you have processing capabilities, you have data storage, memories, caches, high-bandwidth memory. And you’re trying to put them all together in a package, and everybody’s doing some form of that. It’s a very exciting development, but the desire for that is to shorten the leads, increase the bandwidth, try to reduce the power that’s consumed at moving data around so that you can lower the cost of running the machine and increase its performance at the same time.
[00:21:25] Camille Morhardt: James Reinders, like after the Rhine River, thank you so much for joining us. You are a supercomputing, high-performance computing engineer at Intel, and also a oneAPI Evangelist.
[00:21:37] James Reinders: My pleasure.