InTechnology Podcast

Rowhammer Researcher Thomas Dullien (Halvar Flake) Discusses Cybersecurity for AI and Software Optimization (187)

In this episode of InTechnology, Camille gets into cybersecurity for AI and software optimization with Thomas Dullien, aka Halvar Flake. The conversation covers cybersecurity and software optimization applications of LLMs, the prospects of AGI and other technology leaps, and the use of data for building AI models.

To find the transcription of this podcast, scroll to the bottom of the page.

To find more episodes of InTechnology, visit our homepage. To read more about cybersecurity, sustainability, and technology topics, visit our blog.

The views and opinions expressed are those of the guests and author and do not necessarily reflect the official policy or position of Intel Corporation.

Follow our host Camille @morhardt.

Learn more about Intel Cybersecurity and the Intel Compute Life Cycle (CLA).

Cybersecurity and Software Optimization with AI and LLMs

Thomas explains a previous talk he gave about AI and machine learning and how they work great if they’re applied to distributions that stay relatively stable over time. However, they work less well with distributions that change often or with malicious intent. He gives a few examples of applications for LLMs, or large language models, in particular. A defensive cybersecurity example is using AI to create what he calls garbage data, tricking hackers into stealing plausible-looking but fake data in order to protect your real data. An offensive cybersecurity example is LLM-driven pen-testing assistance. When it comes to performance optimization, Thomas says the sky’s the limit, with many opportunities to optimize using LLMs, AlphaZero-style algorithms, and reinforcement learning algorithms.

Artificial General Intelligence and Other Technology Leaps

When asked about the potential timeline for artificial general intelligence, or AGI, Thomas believes we’re still very far away. At the same time, he notes how some people might expect it sooner after experiencing the recent and sudden jump in image and language generation. Thomas cautions that we’re possibly at a plateau with LLMs, similar to the capabilities plateau of autonomous vehicles: people expected the high rate of progress to continue after a major technological breakthrough, but then other factors slowed progress in both cases.

Data for Building Large AI Models

Thomas details the trends in access to computing: centralized computers in the 1950s and 60s, decentralized computing with the PC revolution of the 1970s-90s, and a move back toward centralization with the emergence of the internet and cloud computing. The cost of centralized cloud computing and AI training means that only hyperscalers have access to the large amounts of data and compute required to build large AI models. He hopes that computing will eventually become less centralized, more affordable, and more efficient so that more people can benefit. Thomas argues that models need to become far more sample efficient, or we will have to find ways to gather more data much faster and apply much more compute. He adds that synthetic data will be helpful in some cases, but that others, like language modeling, will be more difficult.

Thomas Dullien, Mathematician, Cybersecurity, and Software Optimization Expert


Thomas Dullien is also known by the pseudonym Halvar Flake, which dates from his early research using reverse engineering to find security vulnerabilities in software. Thomas was Co-Founder and CEO of optimyze, which was acquired by Elastic in 2021. He worked as a Staff Engineer at Google from 2011-2015 and again from 2016-2018. Thomas was also the Founder, CEO, and Head of Research of zynamics GmbH, which was acquired by Google in 2011. He has a master’s degree in Mathematics from Ruhr University Bochum, where he also began a Ph.D. program before discontinuing it to focus on zynamics. Thomas is particularly well known for his research on Rowhammer.


Camille Morhardt  00:32

Hi, I’m Camille Morhardt, host of the InTechnology podcast. I’m very excited to have with me today Thomas Dullien. He’s a mathematician and a cybersecurity and software optimization expert who started one company, sold it to Google, and stayed at Google for eight years, where he worked on Rowhammer–he was part of the first team that was able to essentially identify it as a cybersecurity threat. And then after he left Google–where he also worked on Google Project Zero–he started another company, which was then acquired. So we’re going to talk to him about a whole range of things. Welcome to the podcast, Thomas.

Thomas Dullien  01:11 

Thank you for having me.

Camille Morhardt  01:12  

I should mention that Thomas is also known as Halvar Flake.  Who is that?

Thomas Dullien  01:19

So the story behind Halvar Flake is that at the beginning of my career, I was working in reverse engineering with a security background–meaning using reverse engineering to find security vulnerabilities in other people’s software. And that sort of work, by its nature, involves breaking EULAs. So a lot of the work I was doing at the beginning of my career was, in some sense, legally risky, and it was more prudent to publish these things under a pseudonym. And so I did at the time. The trouble is that if you start your career under a pseudonym, and you do all your important early work under a pseudonym, it gets very difficult to ever get rid of that pseudonym, because nobody will recognize your real name. So even though I haven’t actually tried to keep my real name a secret in more than 10 or 15 years, by now it’s still hard to shake off the pseudonym.

Camille Morhardt  02:07

That’s pretty interesting. And then early on, when you were studying math, you had somebody ask you “for whom?” which sort of surprised you. Can you describe that encounter?

Thomas Dullien  02:20

Yeah, so this was Robert Morris Sr., the father of Robert Morris Jr., who was responsible for the Morris worm–one of the big internet worms at the time, which took down the then-internet–and who is nowadays perhaps more famous for being one of the early founders of Y Combinator. But Robert Morris Sr. used to be chief scientist at the NSA and has a bit of a legendary reputation in cybersecurity circles. So when I met him, I was very intimidated and very starstruck. And we had a very good conversation. At some point, Robert Morris asked me, “So what do you do?” And I explained to him that I studied mathematics; and he asked, “For whom?” And as somebody who was fairly–I wouldn’t call it naive, but I was young and idealistic–I didn’t quite understand what the “for whom?” was all about. But in the end, it is at the core of what led me later on in my career to reconsider working in cybersecurity and then shift to software efficiency work, because the “for whom” kind of encapsulates that all security work is always about human conflict. You’re always working for somebody and against somebody.

On the defensive side, a lot of the work ends up being non-technical, because you need to get your entire organization into shape. So there are a lot of tasks that involve organization and motivation and skilling people up and just redesigning the org, which is great work, but it’s also less of a puzzle. On the offensive side, the work is very technically challenging and very technically interesting, but then you have the trouble of, “Who do you want to work for?” And particularly on the offensive side, that gets to be very, very tricky. And I found that optimization work gives you better alignment, in the sense that for security, if you look at a piece of existing software and you find a security flaw in it, it’s very rare that somebody is happy about you finding that flaw and pointing it out. Whereas in efficiency, if you find a problem in legacy software that eats a significant number of CPU cycles, and you propose a fix for it, then everybody’s happy, because the software is now running faster and cheaper than before. It’s much less complicated to deal with the results of your work.

Camille Morhardt  04:29

How are you-  I suppose you’re using automation, when you’re developing this software optimization. Have you already incorporated, like, machine learning algorithms into the software that you built? Or were your companies sort of too soon for that?

Thomas Dullien  04:44

So the company was a little bit too soon, but not a lot too soon. Machine learning and AI is a really interesting field that’s developing very rapidly. When we started out, we were mostly concerned with gathering performance data. The entire idea of optimyze at the time was: we want to be the one place that gathers performance data from a bunch of different companies, and then does aggregation and data analysis on top of it, to then make recommendations across the board to everybody to run things faster. Because one of the things we had observed is that very quickly, the most important libraries in a software stack–across multiple companies, or even within one big company–will, in their computational cost, eclipse the most expensive service. So it turns out that the shared libraries are where all the CPU cycles are going. And the idea of optimyze at the time was: if we sit on the treasure trove of data about where everybody in the world is spending their CPU cycles, you can do really fascinating things with that data–for example, performance bug bounties, annotating a piece of GitHub and telling people, “Listen, if you can make this loop 10% faster, we’ve got $10,000 for you, because you’re going to save all the customers much more.”

So the plan was essentially: build tooling to gather this performance data and make it really comfortable and easy and convenient for companies to gather that data, then do analysis on the data, including machine learning and so forth, and then profit. And then, due to an unfortunate or fortunate set of events, depending on how you look at it, we ended up getting acquired after we completed the data collection part but before we got into the machine learning part, so we didn’t do much AI there.

That said, we did have some fairly successful experiments after the recent LLM wave arrived, having LLMs explain stack traces to developers–in the sense that a lot of performance measurements will at some point give you a stack trace into the Linux kernel, and not everybody is very deep into reading Linux kernel code. It turns out that getting a rough explanation of what a stack trace means–even if it’s not exact in what it’s stating–is usually pretty useful in giving you a vague idea of what this is all about. So to some extent, the problem of “Can you roughly tell me what this is all about?” plays to the strength of the LLM, which is vast memory, and not necessarily precision in the details of what it’s replying.

Camille Morhardt  07:16

And that’s one way they can sort of work today. How do you think, if you were to look forward, how do you think they’re going to be used, I suppose, in every sense, right, in the offensive, defensive, as well as software optimization space?

Thomas Dullien  07:32

AI does change a lot of things. I gave a keynote in 2018, in Moscow of all places, about the use of AI in computer security. And the thesis I had back then, which was reasonably unpopular, was that AI and ML work great if you’re applying them to distributions that stay stable over time. So let’s say facial recognition: human faces do not evolve at a very rapid rate. Or text comprehension: the way we use language doesn’t evolve at a very rapid rate. And they usually work less well if you have a distribution that changes often, or a distribution that changes maliciously. In 2018, most of the AI use cases in cybersecurity were aimed at detecting bad or detecting evil. And my thesis at the time was that this is actually using AI incorrectly, because you’re trying to target a fast-moving distribution or a maliciously changing distribution. And there are other things that are perhaps more useful.

And one of the useful things about AI I was almost jokingly mentioning at the talk: at the time, generative adversarial networks had just managed to generate credible human faces. And I had proposed that as a defender, AI might be better used to create fake LinkedIn profiles–so you find phishing campaigns early–than to detect somebody that’s actively trying to evade you, because the distribution of human faces and LinkedIn profiles is much more stable than other things. I joked that using AI to generate fake but plausible-looking source code–so attackers trying to steal your source code steal fake source code–may be a better use of AI on the defensive side. And I feel kind of vindicated by the developments since, because we’ve gotten much better at generating fake LinkedIn profiles or fake but realistic-looking source code. And nowadays it may even compile–you may even get source code that you can build.

So yeah, on the defensive side, I think LLMs will have a huge impact on just generating garbage data. And that sounds like a useless thing, but generating garbage data is actually quite useful, because the attacker usually cannot easily tell good data from bad data. The people that do the actual hacking are not the domain experts. Like, if I’m trying to steal the designs for a military plane, the operators on the attacker side that are moving through the network trying to steal the data do not have the technical expertise to tell fake blueprints from real blueprints. So I think the idea of using AI to generate garbage data for attackers to steal may have big value on the defensive side.

On the offensive side, LLMs famously can’t plan very well yet, but they can give you a good how-to guide for doing the basics. I’m hoping that we’ll get LLM-driven pen-testing assistance at some point, because there’s a lot of bug bounty and web hacking work that is still very labor intensive, and it looks like at least the lower tier could be automated quite well using LLMs.

On the performance optimization side, I think there are plenty of great opportunities–not only from LLMs, but also from AlphaZero-style algorithms and reinforcement learning algorithms for instruction scheduling, for memory allocation; I think the sky is the limit, really. And the hypothesis that we had at optimyze–that having all the performance data from different companies, and then being able to grind that data and produce output, helps everybody save money and energy–I think that underlying hypothesis is still valid; we just didn’t manage to execute on it like we wanted.

Camille Morhardt  11:09

Uh-huh.  And if I could also get your perspective: there’s a lot of talk right now about Q-Star and the possibility that artificial general intelligence is right at the precipice of, I suppose, a major tipping point. What is your take on that? How far along do you think humanity is toward AGI?

Thomas Dullien  11:34

So okay, this is an answer that is not terribly precise. I watched an exchange on the artist formerly known as Twitter today, where an AI researcher I follow joked that AGI is always three to five years in the future. And then Yann LeCun from Meta replied, “That’s only if you’re young and optimistic; if you’re more seasoned, you know it’s always 10 to 15 years in the future.” So I do not expect the emergence of a general AI for arbitrary tasks that exceeds the human level in the next couple of years. That said, I think the confidence intervals around everybody’s estimates have gotten very wide after the sudden jump in performance as the LLMs scaled up. Like, we’ve seen two places–image generation and language generation–where we made a big jump, and a jump that very few people anticipated. If you had told me three years ago that we would get text-to-image in the quality that we’re getting now, I would have placed that much further into the future. And similarly with the LLMs now: you stare at the way they’re being trained–just predicting the next token–but then, at some point, knowing a little bit about the world helps you predict the next token, right? And the fact that there can be a semblance of something like a world model, or a basic concept of some things about reality, that emerges just from predicting text–that is something that I certainly didn’t anticipate.

So I think what happened is that we had two developments that violated people’s normal prediction. And as such, everybody that is careful about predictions now has to put very wide confidence intervals around any prediction.  We were all a little bit wrong. And a little bit surprised. So now we have to take into account that there’s a much broader range of possibilities than we thought.

On the flip side, AI has a tendency to have these breakthroughs that make big progress in one field, and humans have a tendency to do a linear extrapolation from that. I mean, we all heard in the mid-2010s, the 2013-14 range, that fully self-driving, fully autonomous cars were just around the corner. And the reality is that nobody expects fully autonomous driving–fully autonomous outside of a prescribed geography–to be a thing in the next six, seven, eight years, right? So there was a jump in progress, everybody extrapolated that things would continue at that clip, and then it turned out they didn’t, and there was a fundamental thing missing to really make it work.

It’s very possible that transformer-based LLMs have more or less plateaued in their capability. If I understand correctly, Google’s Gemini Ultra is four times the size of GPT-4 in terms of parameters, but only beats GPT-4 by about two percentage points on most evaluations. I’m not an AI expert–I’m not even deeply involved in any of this at the moment–but there’s a case to be made that if I need to quadruple the size of my LLM to get 2% better performance, we may be hitting a similar plateau, at least on this architecture. And until we get the next breakthrough, things might not proceed at the same clip.

Camille Morhardt  15:01

So of course data now is largely being aggregated within or by hyperscalers. And I wonder if you could talk to us about your feelings as to what this means then for humanity and the accessibility of the data required to build AI moving forward.

Thomas Dullien  15:26

So I think the fact that things are centralizing–that’s a trend that’s been going on for 15 to 20 years, in some sense. We had centralized computers in a handful of places in the 50s and 60s, and if you wanted to work in computing, you had no choice but to work in one of these large institutions that could afford a computer. Then we had the strange anomaly of the PC revolution, which decentralized computing in the 70s, 80s, and 90s. And then with the emergence of the Internet, you had a trend towards re-centralization, right? Where, if you want to work on large enough data problems, you kind of have to go back into a big institution because they have a computer–and when I say a computer, I mean at the scale of the Internet giants; we’ve seen the emergence of cloud computing as a consequence of that. Well, AI certainly accelerates the problem, in the sense that training an AI model is at the moment so ridiculously costly that you need hundreds of millions to do it if you want to be at the cutting edge.

Now, in fairness, it’s not the first time in the history of humanity where building something was very capital intensive. There’s no amateur offshore oil rig–it’s not something you build in your garage. And perhaps we’re just all very used to this democratizing period where everybody could just build something in the garage. Clearly, I’m hoping that we will get to a point where computing gets cheap enough, or we get much more clever in using compute to train these models, or we build better models, so that we can have less centralization again, because in my opinion centralization is very rarely a good thing. But we shall see–it’s really hard to make predictions. We don’t even know whether LLM training will always be this expensive. Will we make some breakthrough that accelerates it by a factor of two? 10x? I think people would put reasonable betting odds on 10x. 100x? Perhaps 1,000x? In the end, we know that a human brain runs on a bowl of porridge, right? So we know that, energetically at least, you can frame things in a much more energy-efficient manner.

Camille Morhardt  17:38

What is the biggest technical breakthrough you think is required for humanity to make kind of a revolutionary change now moving forward? In technology? Is it a hardware change, a software change? What sort of breakthrough do we need right now?

Thomas Dullien  17:57

Like, on the purely theoretical level, getting sample-efficient learning. You can argue that these ML models, with all their awesome power, are not very efficient in making use of their data points. I read an article somewhere that we got to superhuman Atari game performance by having an AI play more Atari games than every human has ever played in the history of Atari games. And we got to play Go better than humanity by playing many more games than humanity has ever played in the history of Go. So I think the biggest game changer would be finding a way to be drastically more sample efficient in your use of training data. Because then all sorts of doors open: being more sample efficient is likely going to be more computationally efficient, and it’s going to be much easier to obtain the training data that you need.

There’s an argument to be made that we’re already data constrained in training now, in many ways. I think there’s a big argument where it’s alleged that OpenAI used a lot of pirated ebooks to train the model. I’ve also heard that if you exclude those pirated ebooks from your training, your LLM is much worse. So there is an argument that you’re already data constrained if you stick to the law. So either we get to be much more efficient with the sample efficiency of our models, or we have to find a way to get more data more quickly–and, with that, more data and much more compute.

Camille Morhardt  19:19

One of the ways that we’re getting around the problem of needing a lot of data is actually to create it with synthetic data, and you’re talking about a radically different approach. But I am curious about your take on synthetic data.

Thomas Dullien  19:32

I mean, people have experimented with having an LLM generate text and then training on the generated text again. But it turns out that this, over time, seems to lead to some sort of collapse in the performance of the model. So training a model on its own data doesn’t seem to work quite as well. I’m sure somebody must have tried training one LLM on the data of an ensemble of other LLMs, and I’m sure we’re going to experiment a lot with synthetic data. The issue is that I suspect small mistakes will creep into your synthetic data over time and then become ground truth for the next model, and you need some sort of corrective process for these things. That said, there are scenarios in which synthetic data can do wonders. For example, I talked to somebody who was building an image recognition system to recognize spare parts in a warehouse. The pure image retrieval algorithm didn’t work particularly well. But then what they did is they took the original CAD drawings of the spare parts, rendered them in 3D, and had an image diffusion model generate fully rendered versions of those pictures; they generated many renderings and fed those into an image similarity search, and that ended up working very well. So I think there are areas where generating synthetic data is going to be terribly useful. For something like language modeling, it may be more difficult.

Camille Morhardt  20:59

Thomas, I don’t want to let you go without understanding your involvement in Rowhammer. You are famous in the cybersecurity community for your work on it. So please take a moment and just describe what Rowhammer is for people who aren’t familiar with it. And then describe the role that you and your team played in understanding what sort of cybersecurity vulnerabilities it posed–which then affected the entire industry, and actually still does.

Thomas Dullien  33:37

Yeah, so Rowhammer, on the technical level, was an effect where DRAM manufacturers were shipping DRAM chips with a certain reliability problem: if you read the same memory location repeatedly–essentially putting current on the wire to read out that particular memory location in DRAM–a little bit of charge would leak out of neighboring rows in memory. The effect was that if you hammered the same place in memory hard enough, then on some neighboring rows, for some RAM chips, individual bits would sometimes flip randomly or non-deterministically.

And there was a paper that demonstrated the effect by hooking up a custom FPGA to drive the RAM really hard. We discussed that paper internally at Google, and people were like, “Yeah, this will never be a problem in a real system.” And then my colleague, Mark Seaborn, identified a laptop which actually exhibited the problem. And that was very, very exciting, because he managed to get the bit flips in software. Then people were wondering, is this just one laptop with bad RAM? So we went to a Tech Stop at Google, got all the laptops they had, and tried to reproduce it, and it turned out that on a good fraction of these laptops the reproduction would work; that original laptop had just been especially susceptible to the problem. And that gave us a foothold to then start experimenting with, “Hey, how do we turn this into a security issue?” People were like, “Well, it’s a random bit flip, you can’t do much.” But then we realized that even if you’re only flipping a random bit of memory, perhaps you can fill all of memory with something where a random bit flip has a decent chance of providing you with some value. And then we executed on that idea, and Mark drove it all the way to the finish line.
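For readers curious what the “hammering” Thomas describes looks like in code, here is a minimal, simplified sketch of the classic user-space access pattern. It is illustrative only, not the original Google proof of concept: it repeatedly reads two “aggressor” addresses and flushes them from the CPU cache so each read goes all the way to DRAM, then scans a victim buffer for flipped bits. The buffer offsets are arbitrary placeholders; a real test would need to choose aggressors that land in adjacent rows of the same DRAM bank (which requires knowledge of the physical address mapping), and flips only appear on susceptible DRAM without mitigations.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <emmintrin.h>  /* _mm_clflush (x86 SSE2 intrinsic) */

#define BUF_SIZE   (256 * 1024 * 1024)  /* 256 MiB victim buffer */
#define ITERATIONS (2 * 1000 * 1000)    /* reads per aggressor pair */

int main(void) {
    /* Fill a large buffer with all-ones so any flipped bit shows up as a 0. */
    uint8_t *buf = malloc(BUF_SIZE);
    if (!buf) return 1;
    memset(buf, 0xFF, BUF_SIZE);

    /* Two "aggressor" addresses. These offsets are placeholders; a real test
     * must pick addresses mapping to different rows of the same DRAM bank. */
    volatile uint64_t *agg1 = (volatile uint64_t *)(buf + 0x000000);
    volatile uint64_t *agg2 = (volatile uint64_t *)(buf + 0x800000);

    /* The hammer loop: read both addresses, then flush them from the cache
     * so the next iteration hits DRAM again instead of the CPU cache. */
    for (long i = 0; i < ITERATIONS; i++) {
        (void)*agg1;
        (void)*agg2;
        _mm_clflush((const void *)agg1);
        _mm_clflush((const void *)agg2);
    }

    /* Scan the victim buffer for bytes that are no longer 0xFF. */
    long flipped = 0;
    for (size_t i = 0; i < BUF_SIZE; i++) {
        if (buf[i] != 0xFF) flipped++;
    }
    printf("bytes containing flipped bits: %ld\n", flipped);

    free(buf);
    return 0;
}
```

The cache flush is the crucial step: without it, repeated reads would be served from the CPU cache and never stress the DRAM row, which is part of why a pure software reproduction was such a surprise.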

When we demonstrated that the problem was real, all hell broke loose, with the chip manufacturers generally not being very happy about this, and then the cloud providers not being very happy about it. You can imagine that at that scale, with those amounts of money and those economic incentives involved, all of a sudden the technical problem becomes a big organizational and political problem. And that was very interesting to watch. And the underlying problem doesn’t really go away. Like, if I’m making DRAM chips, the economics force me to ship the most unreliable chip that my customer cannot tell is unreliable–in the sense that if the customer doesn’t notice it’s unreliable, or can never actually see the unreliability, it’s not a problem for the customer, so they’re fine with it. But I can’t throw away every DRAM chip that has a hint of unreliability in it. So the problem that DRAM is still susceptible to flipping bits when it’s treated poorly isn’t going away. And that’s still an issue today.

And for me, it was very, very enlightening, in the sense that as a software engineer you live in this very fake, deterministic world. And then you realize, at some point, that the mono-causal determinism you’re used to as a software developer is an illusion created for you by generations of electrical engineers that spent their lives trying to make things mono-causal and deterministic. And in some ways it matured me as a scientist, because having been a pure mathematician beforehand, and then having done the computer stuff, I hadn’t really dealt with experimental science yet. And surprisingly, a lot of the backend engineers at Google that were involved in the response were, by education, not natural scientists either. So stuff like “we need to estimate how bad this RAM is, with reasonable confidence intervals”–well, that’s not the bread and butter of everybody’s background. Then it turns out, funnily enough, that the machine learning folks are much more used to empirical experimentation than the classical computer science people.

Rowhammer, for me, really sparked an interest in the physics of computing and what’s actually happening during the manufacturing process. I had used computers for many, many years–almost 30 years–without ever really wondering, “So what’s involved in making a chip? And how many weeks does it take to make a chip? And what are all the complications?” And once you stare into that, your mind is blown and changed forever, right? Because the amount of science and modern engineering and the very intricate process knowledge is just really amazing. And for a nerd like me, it’s the greatest show on earth, to some extent.

Camille Morhardt  25:49

Thomas Dullien, thank you very much–Halvar Flake by pseudonym, but now we all know who you are. Thank you so much for sharing with us your expertise in mathematics, your thoughts about AI and where it’s headed and what sort of technical innovation the world needs next, and your perspective on cybersecurity and working in this space–what it means and where we’re headed. I really enjoyed the conversation, and I appreciate your time.

Thomas Dullien  26:24

Thanks. Thanks a lot for having me.
