InTechnology Podcast

Runtime Optimization with Granulate CEO (184)

In this episode of InTechnology, Camille gets into runtime optimization with Asaf Ezra, CEO at Granulate. The conversation covers how runtime optimization works, how Granulate and other companies are changing the industry, and Asaf’s future predictions for hardware and programming.

Read more about Intel Gaudi2’s performance as evaluated by NVIDIA and referenced by Asaf in the episode here.

To find the transcription of this podcast, scroll to the bottom of the page.

To find more episodes of InTechnology, visit our homepage. To read more about cybersecurity, sustainability, and technology topics, visit our blog.

The views and opinions expressed are those of the guests and author and do not necessarily reflect the official policy or position of Intel Corporation.

Follow our hosts Tom Garrison @tommgarrison and Camille @morhardt.

Learn more about Intel Cybersecurity and the Intel Compute Life Cycle (CLA).

A New Definition of Runtime Optimization

Asaf explains to Camille that the way to improve application performance without constantly updating the application itself is to change the runtime to fit the application. However, not all applications need to run in the same way. This has opened a gap in the industry for dynamic runtime optimization that ensures compute resources are managed optimally to improve application runtime. He shares how, in his experience with Granulate, customers are generally happy with their SLAs, but they want to improve their compute performance rather than spend more money on increasing the amount of compute itself. An added bonus of improving application runtimes in these cases is achieving optimal performance with a lower carbon footprint, making runtime optimization a key sustainability factor.

Industry Game-Changers in Runtime Optimization

According to Asaf, one of the biggest shake-ups right now in application performance optimization is Photon, largely due to being based on C++ rather than Java. While Photon does provide incredible performance improvement, it comes at a high price point. However, Asaf notes that compute costs always go down over time. Another similar approach is the open-source project Gluten, which takes the Velox engine from Meta, which also uses C++. By 2024, Asaf predicts compute in the data analytics market decreasing by 25-50% as a result of runtime optimizations led by these game-changers.

Asaf and Camille also dive into Granulate’s role in all this and how the company operates. Asaf shares how Granulate is built off multiple layers of offerings, including the agent running on the machine itself or on the platform. The agent consists of two different modules: one is a loading mechanism that loads the module into the application, and the other is an optimization module. Since joining Intel, Asaf highlights how Granulate integrates Sapphire Rapids accelerators so that benchmarks cover not just the CPU but the whole system on chip. Their solutions also help in data analytics by trading a certain percentage of CPU for an equivalent percentage of memory. They also provide the ability to understand the performance of an application itself, and their solutions can work both in the cloud and on-premise.

Future Changes in Hardware and Programming

Camille and Asaf wrap up by looking at his predictions for the future of hardware and programming. He foresees a wave of customized hardware to improve efficiency, similar to the optimizations going on in software right now. Because of this wave, Asaf also believes that customized hardware is going to get much more affordable, pointing to Intel’s Gaudi as a good example. Asaf also sees generative AI as being very transformational when it comes to democratizing programming for end-to-end app creation. At the same time, he emphasizes how startups will need to question how they operate in this new generative AI landscape and how tech entrepreneurs will need to focus on their analytical skills above technical skills.

Asaf Ezra, Co-Founder & CEO of Granulate


Asaf Ezra is the Co-Founder and CEO of Granulate, an Intel company that offers runtime optimization. He founded the company in 2018. Prior to Granulate, he was an Entrepreneur in Residence with YL Ventures and an R&D Team Leader for KayHut. Asaf also worked 4 years in the Israeli Defense Forces as a Project Manager, R&D Team Leader, and Software Developer. He has a Bachelor’s degree in Computer Science and Physics from The Hebrew University of Jerusalem.



Camille Morhardt  00:30

Hi, I’m Camille Morhardt. Welcome to the InTechnology podcast. Today I’m going to talk with Asaf Ezra. He is co-founder and CEO of the Intel company called Granulate, which offers runtime optimization. Welcome to the show, Asaf.

Asaf Ezra  00:44

Thank you, Camille, happy to be here.

Camille Morhardt  00:46

If you had to define runtime optimization, how would you define that?

Asaf Ezra  00:51

We define it at Granulate–and why we say there’s a differentiation from anything that’s out there–is the fact that, at least when I was an engineer and I developed code and applications, I tried to utilize the environment to the best of my application. And here, we’re changing it. We’re saying, you know what? The application is going to be the same for a long time. In computer time, it’s going to be a long time. Why? Because I don’t change versions every minute, maybe every day. There are CI/CD pipelines that work every day for several applications. But a day is a huge amount. And we said, “You know what, why do we need, when we run a production-grade application, to have everything run the same way–whether it’s a healthcare electronic medical record or a SaaS platform for cybersecurity?” It makes absolutely no sense. And even if it’s two different SaaS platforms for cybersecurity, they don’t operate the same way. So it still doesn’t make any sense. And I think that’s the change: changing the runtime according to the application and not the opposite, not building the application according to the specific runtime that I chose. That is why it’s so important.

And that is why it’s very different when you speak about runtime optimization, or application-level optimization, versus, let’s say, configuration tuning, or an orchestration system that looks at everything from the outside in. The difference between the runtime optimization and the application optimization itself is that I don’t necessarily know the business logic. So I can’t change the validity of the application, and I can’t change the order of certain operations. If the order is not guaranteed–for example, when running multi-threaded, if I didn’t guarantee the order coming into, let’s say, a critical section–that’s fine. Everything else works exactly the same. So I can’t change the way the application works. But I can change the way the runtime provides you the resources, with the guarantees that it does.

Camille Morhardt  03:06

You’re automating this, though; you’re going into each application and discovering where there may be a bottleneck and then trying to free those up by working with the hardware. Is that right?

Asaf Ezra  03:15

Yeah, and this is sort of where the drawback is, because you have to do it automatically. It’s not going to be 100%. So obviously, if every application owner wrote the algorithms of, let’s say, scheduling or memory, or whatever, they would have done a much better job. How much better–whether it’s 2x, 4x–I don’t know. But by automating it, making it scalable, you’re leaving some optimizations on the table, no doubt.

Camille Morhardt  03:46

Does that automation allow a customer to set certain vectors or levels or levers or preferences? Like, the most important thing to me is performance, or the most important thing to me is to reduce the overall cost of compute–if it takes longer, that’s okay. Do people set those on their own–users of Granulate? Or are you optimizing?

Asaf Ezra  04:09

So originally, they were supposed to choose between, let’s say, lowest latency and highest throughput, because those two are a lot of the time a trade-off. Unfortunately, we didn’t think about it ahead of time. But it makes sense to think about applications today as things that already adhere to their SLAs. Granulate comes in, usually, when an application is already running in production. And when it’s running in production, you probably pay enough for it to run at the SLAs that you want it to–for example, I’m looking for five-nines latency of X milliseconds.

I can always throw more money at the problem, increase the amount of compute I’m using, to the point where I’m reaching, let’s say, one request per core or whatever. It’s going to cost me way too much, but that’s just for the sake of example. Which means that most of the time, when we come in, the organization is already happy with their SLAs and they just want to maintain those same SLAs at much lower costs.

So what we ended up seeing is that people just say, “You know what? I’m happy with my SLAs. Can you get me there with a lower infrastructure footprint?” And thankfully, like you said, it also improves sustainability–unlike, let’s say, finance tools, where they might end up changing the contract that is related to the machine itself, moving it from on-demand to a reserved instance contract. Here, you’re actually lowering the footprint. So you’re also gaining a lot of carbon emissions reduction. And we give the customer the calculation that we do.
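The “five nines” SLA Asaf mentions can be made concrete with a short sketch. The 10 ms target and the request samples below are hypothetical; the point is just that at five nines, even a couple of slow requests in 100,000 break the SLA.

```python
# Sketch: checking an "N nines" latency SLA against a latency sample.
# The target and sample data are made up for illustration.

def meets_sla(latencies_ms, target_ms, nines=5):
    """Return True if the required fraction of requests (99.999% for
    five nines) finished within target_ms."""
    required_fraction = 1 - 10 ** -nines   # 0.99999 for five nines
    within = sum(1 for latency in latencies_ms if latency <= target_ms)
    return within / len(latencies_ms) >= required_fraction

# All 100k requests fast: the SLA holds.
print(meets_sla([2.0] * 100_000, target_ms=10.0))                 # True
# Two slow requests out of 100k: 99.998% < 99.999%, SLA broken.
print(meets_sla([2.0] * 99_998 + [50.0, 60.0], target_ms=10.0))   # False
```

This is why, as Asaf says, teams often overprovision to hold an SLA: the tail, not the average, is what the guarantee is written against.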

Camille Morhardt  05:50

Asaf, you wrote recently about a major change in this industry. Can you explain to us what that change is and why it’s transformational?

Asaf Ezra  06:00

People didn’t necessarily view Java as this super inefficient language. And I don’t know if you know this, but Spark itself is based on Java 8. So it’s already missing a lot of the optimizations that were inserted into the JVM runtime. Now, to sidestep that, a lot of the things that Spark is utilizing were re-implemented and inserted into Spark itself, so they’re not relying on the implementation in the JVM. Cue in some folks around the world, some of them in Databricks, who maintain Spark, and they’re like, “You know what, if we had to do everything all over again, this could have been so much more efficient.” They chose C++–they could have chosen Rust or something like that–because they wanted to be very meticulous about memory management, which is a huge factor. And they also wanted to leverage all the vectorized operations that you can utilize when you’re actually doing the compilation.

And so they reached an incredible performance improvement. We’re talking anywhere between 3x to 8x faster than the existing engines. And this has been hugely disruptive, because if you’re a managed Spark offering like AWS EMR, or HDInsight on Azure, or Dataproc on Google, all of a sudden there’s something that is out of this world faster than what you’re offering your customers. It doesn’t necessarily mean that it’s more cost effective, because it could be priced differently, which Databricks is doing–pricing Photon depending on the usage, but anywhere between 2x to 3x, or something like that. But it is a lot faster. So if I’m running a query, I’m going to get it a lot faster; it might cost me the same, but I’m going to get time back. And time is also costly. So if you’re any of the other managed Spark offerings, you have to react, because otherwise you’re going to bleed customers over time. Because even if it’s more expensive now, what we’ve seen for the past, let’s say, 25 years is that compute costs just go down over time. So it could be more costly right now, but over time it will be less. And the barrier to making the migration is going to be less about how much it’s going to cost me–let’s say, more in terms of having to calculate DBUs, which are not exactly the dollars that you pay on Databricks, but their credit system. It’s definitely not something that anyone can have their competitors do and not react.
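The speed-versus-price trade Asaf describes reduces to simple arithmetic: if runtime shrinks faster than the hourly price grows, the query gets cheaper overall. The figures below are hypothetical, drawn only from the 3x-8x speedup and 2x-3x price ranges he quotes.

```python
# Back-of-the-envelope: a faster engine can win on total query cost
# even at a higher price per compute-hour.

def relative_query_cost(speedup, price_multiplier):
    """Cost of a query on the new engine relative to the old one:
    runtime shrinks by `speedup`, hourly price grows by `price_multiplier`."""
    return price_multiplier / speedup

print(relative_query_cost(speedup=4.0, price_multiplier=2.5))  # 0.625 -> 37.5% cheaper
print(relative_query_cost(speedup=3.0, price_multiplier=3.0))  # 1.0  -> break-even on cost
print(relative_query_cost(speedup=8.0, price_multiplier=2.0))  # 0.25 -> 75% cheaper
```

And as Asaf notes, even the break-even case still wins on wall-clock time, which has its own value.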

And so this has brought a tailwind to the Gluten project, which Intel is one of the major backers of. Gluten takes the Velox engine from Meta–it’s a C++ engine that runs the operations–and connects it to the Spark engine. And so you’ll be able to do something very similar to Photon on certain operations. How many operators it covers is obviously growing over time. And you have this pull from the market, because Alibaba EMR needs it, and AWS EMR needs it, and Dataproc needs it. And we could be looking at 2024 as the year where, all of a sudden, data analytics as a market goes down in compute by about, I don’t know, 25 to 50%, because of the optimizations done in the backend, because everybody has to react to Photon. So I think it’s super exciting. The technical details are very technical, but the implications could be pretty massive.
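For readers who want to see how Gluten plugs into Spark in practice, it ships as a Spark plugin enabled through configuration. The fragment below follows the Gluten project’s published examples; the plugin class name and off-heap settings vary by release, so treat these exact values as assumptions to check against the project’s current documentation.

```properties
# Illustrative spark-defaults.conf fragment for enabling Gluten.
# Class names and settings vary by Gluten release -- check the docs.
spark.plugins                   io.glutenproject.GlutenPlugin
spark.memory.offHeap.enabled    true
spark.memory.offHeap.size       4g
```

The key idea is that the application’s Spark SQL stays unchanged; only the runtime underneath it is swapped, which is exactly the pattern Asaf is describing.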

Camille Morhardt  09:56

So tell us what role Granulate plays in that backend.

Asaf Ezra  10:00

So Granulate, in itself, is built off of multiple layers of offerings. The first is that we have an agent running on the machine itself or on the platform. So Granulate’s agent runs alongside the application of the customer, and it will load a runtime module into the runtime. Talking about Spark, we’re talking about a JVM module–in this case, it’s in Java, with some sections in C++ and so on.

Now, the agent in itself is basically two different modules. One is the loading mechanism that actually loads the module into the application, and the other one is the optimization module. So the first thing that we did since joining Intel was to integrate the Sapphire Rapids accelerators into Granulate. So now, if you want to benchmark generation over generation, you’re not just benchmarking the CPU itself, you’re also benchmarking the system on chip. In this case that’s QAT, and very soon IAA–so crypto operations and compression operations, and later on memory-intensive operations, where you can do a trade-off between the amount of memory that you’re using and the amount of CPU that you’re using.
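The two-module split Asaf describes (a loading mechanism plus an optimization module) can be sketched conceptually. Granulate’s actual agent is a JVM module; the toy Python analogue below is illustrative only, and every name in it is made up. The point is the pattern: the loader injects the optimizer into a running application without the application’s code changing.

```python
# Toy analogue of the agent's two-module split described above.
# Not Granulate's implementation -- a conceptual sketch only.

class OptimizationModule:
    """Stands in for the runtime-level optimizer (e.g. a scheduler tweak)."""
    def wrap(self, fn):
        def optimized(*args, **kwargs):
            # A real module would alter allocation/scheduling behavior;
            # here we only tag the call to show the hook is in place.
            return fn(*args, **kwargs)
        optimized.optimized = True
        return optimized

class Loader:
    """Stands in for the loading mechanism: it attaches the module to the
    application; the application itself is untouched."""
    def attach(self, app, module):
        app.handle_request = module.wrap(app.handle_request)

class App:
    def handle_request(self, x):
        return x * 2

app = App()
Loader().attach(app, OptimizationModule())
print(app.handle_request(21))                    # behavior unchanged: 42
print(app.handle_request.optimized)              # but the hook fired: True
```

Separating loading from optimization is what lets new optimization modules (like the accelerator integrations Asaf mentions) be added without touching the injection machinery.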

Talking about data analytics, a lot of the time we’re memory bound, not necessarily CPU bound. So we might be able to give up a certain percentage of CPU for an equivalent percentage of memory. And then we’re able to lower the cost, because the CPU is not the bottleneck at all. So these things are part of, let’s say, the core offering and the ability to integrate more and more into Granulate. And then, on top of that, Granulate leverages the ability to understand the performance of the application itself. When talking about data analytics, we look at the performance of a certain query or a certain job, and then we can change everything according to that: the scaling, the instance types, let’s say the number of executors, the size of the executors, and so on.
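The CPU-for-memory trade Asaf mentions can be illustrated with compression: spend CPU cycles to shrink the in-memory footprint, which pays off when the workload is memory bound and the CPU has headroom. A minimal sketch using Python’s standard zlib (the payload is made up):

```python
# Trading CPU for memory: compress an in-memory payload.
import zlib

raw = ("some repetitive analytics payload " * 2_000).encode()
compressed = zlib.compress(raw, level=6)    # costs CPU cycles...
print(len(raw), len(compressed))            # ...but the footprint drops sharply
assert zlib.decompress(compressed) == raw   # lossless: pay CPU again to read it back
```

Hardware offload like the compression accelerators Asaf mentions pushes this further, since the CPU cost of the compress/decompress step itself shrinks.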

Camille Morhardt  12:04

Is Granulate doing that dynamically as it’s looking at apps?

Asaf Ezra  12:08

Yeah. If we continue with the job- or query-type workload, it is built from multiple stages. So you have to be dynamic, because in every stage there’s more compute-intensive, less compute-intensive, more network-intensive types of work; you might be shuffling data, and so on. So you always have to adapt to it. When it comes to compute time, it’s almost infinite, because of the number of operations that you can do per second. When it comes to the human side, it might look very quick–you know, it’s just 30 seconds or just a minute. So how much do you actually gain when you dynamically do things at per-second or per-30-second granularity? In fact, we’re talking about a lot: just on the, let’s say, Databricks side, we’re talking about an average of 30%.
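The per-stage adaptation Asaf describes can be sketched as a toy heuristic: size the executor pool from each stage’s dominant bottleneck rather than holding one configuration for the whole job. The stage profiles, bottleneck labels, and executor counts below are hypothetical.

```python
# Toy per-stage sizing heuristic; all numbers are illustrative.

def executors_for_stage(stage, max_executors=20):
    if stage["bottleneck"] == "shuffle":
        # Network-bound shuffles rarely benefit from more executors.
        return max(2, max_executors // 4)
    if stage["bottleneck"] == "cpu":
        return max_executors            # CPU-bound: scale out fully
    return max_executors // 2           # memory-bound: middle ground

job = [
    {"name": "scan",    "bottleneck": "cpu"},
    {"name": "shuffle", "bottleneck": "shuffle"},
    {"name": "join",    "bottleneck": "memory"},
]
for stage in job:
    print(stage["name"], executors_for_stage(stage))
# scan 20, shuffle 5, join 10
```

A real optimizer measures the bottleneck at runtime instead of reading a label, but the shape of the decision, reconfiguring on per-stage granularity, is the same.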

Camille Morhardt  12:59

That’s huge.

Asaf Ezra  13:01

Exactly. And the nice thing about it is, because Granulate does it dynamically and is responsible for the loading mechanism, you don’t need to change anything; it’s 30% at the click of a button. Once you click a button to save 30%? That sounds like a good trade-off.

Camille Morhardt  13:18

Are the enterprises who are looking to optimize and load-balance all the different applications that they’re running usually in the cloud?

Asaf Ezra  13:28

It doesn’t have to be cloud. Granulate, right now, operates on every, let’s say, Linux environment. And it could be on-premise systems, where sometimes you could have huge lead times, and what you need is excess capacity. All of a sudden, you can create excess capacity with software only. In the past, we had to do it with the virtualization ratio, with VMware. Now we can do it with Granulate by increasing performance, releasing a lot of resources back into the original pool, or just keeping some more headroom for growth in the future.

And when it comes to the cloud, the fun thing is, you can save tomorrow; you can activate Granulate and immediately shut down machines, so you immediately see the effect on your account. There’s a sort of good feeling about it, because you know that the customer is seeing the value. And you can show them anywhere between a normalization to the amount of throughput that they had, the amount of work that they had to do, the amount of queries that they had, and so on. And you see those metrics go up or down, depending on the metric itself. And that’s the fun thing about it.

Camille Morhardt  14:38

Asaf, do you see any trends coming in the kind of software hardware optimization space–the interplay between software and hardware?

Asaf Ezra  14:48

I think what we’re seeing right now, which is super interesting–and we might come back to it around, let’s say, the OpenAI dev day announcements, which I think were transformational–is that when it comes to the data center itself, in the past few years there’s been a huge shift from general-purpose solutions to very customized ones. If you go back to about, let’s say, 2016, I don’t think anyone thought a workload itself had such a huge market share that it was worth having a specific chip made for it. And then there was a boom of customized hardware, especially around deep learning–whether it was around machine vision and connected cars and everything. And I think we’re going to see a lot more of that. We’re seeing a lot of hardware companies that are specializing in certain computations. Obviously, generative AI and deep learning are going to be very massive in the space. But not just that–things like Graviton came from the fact that AWS is seeing huge, huge compute on their RDS, on their EMR, and the compute is very similar. It’s worth actually creating a whole new chip just for that specific workload. And I think we’re going to see a lot of improvements in the efficiency of the process itself to create more customized hardware, similar to the process that we’ve seen in software.

So Granulate, in a way, is taking it another step, where you say, you know what, I’m running on a JVM that is generic–how can I make that JVM a lot more tailored to my application? Is it feasible in hardware to reach that level of segregation? I have no hardware experience at all. But it’s definitely going to go in that direction. And it’s going to be cheaper and cheaper and faster and faster to do customized hardware. And I think it’s a must. I think, by the way, Gaudi is a really good example of that: you specialize in a specific type of compute, and you can make it super cost effective. I think Nvidia actually released a benchmark that shows that, from a cost-effectiveness perspective, it’s a lot more cost effective to use Gaudi. I think on the cost side it was like four times less, or something like that. So those types of solutions have to be there. That’s on the one side.

On the other side, you have, of course, generative AI and how transformational and revolutionary it is when it comes to not only democratizing programming, but actually doing end-to-end app creation. I’m sure you’ve seen the announcement where now you can create your own application just by talking to an agent or chatting with an agent. If writing applications on top of iOS and Android was something that created, I don’t know, hundreds of millions of applications over time, what order of magnitude are we expected to see now? And what does it mean for startups raising money? How are they expected to utilize that money when it comes to IP, if generating a lot of software IP is going to be massively cheaper and massively faster? I think this is also something that we’re going to see impact the ecosystem as a whole, not just data centers or cloud.

Camille Morhardt  18:27

What kinds of companies do you think we’re going to see explode when the new programming language is English, or whatever language you speak that Google can translate for you?

Asaf Ezra  18:39

I think the thing that we’re seeing right now is that everyone who, let’s say, tried to ride that wave very quickly might have ended up being too fast without a good enough moat. So they might end up regretting that decision or finding other ways to build a bigger moat. I think what we’ve seen in the past, where usage was the main barrier, could still stand. So if I had to guess, I assume that the first-mover’s advantage that OpenAI had with ChatGPT and the API will still be massive, even though other cloud providers came up with similar API offerings. And I think it’s going to go the same way for the foundational model providers.

When it comes to the applications themselves that are going to be built on top of it, I think it’s a hard question. We are starting to see the obvious implementations, not necessarily on the consumer side, but definitely on the B2B side. Customer success, customer support–this is finally the chatbot that we’ve all been waiting for. I think it’s only going to come down to who’s going to make the migration of policies the easiest. Because no matter how good your NLP model, or large language model, is going to be at understanding, it’s not going to be able to create the policies that you expect your organization to adhere to, right? Like, one airline company could offer you a refund for something that another company would only offer you a change for. So I think that is going to be one type of, let’s say, differentiator; the user experience is going to be a lot more important than in the past. If in the past you could have built better technology, I don’t think you have that excuse anymore. As for the actual applications I foresee? Harder for me to say, but I expect to see them very fast.

Camille Morhardt  20:36

What would you tell people to focus on if they’re interested? Like, if you look back to when you were thinking of starting a company, the skillset that you may have needed at that time could be pretty different. You might tell somebody to go out and learn some programming languages or something–which, you’re pointing out, may be irrelevant very soon. So what would you tell people to become familiar with?

Asaf Ezra  21:01

So I think the most important part is your analytical skills: being able to analyze the bigger picture, and to understand and be very honest with yourself about what your differentiation is. What does it entail? What are your risks to the business from competitors? Internal risks–like budget, for example, or not being able to reach your goal in terms of development, and so on. And at the end of the day, how important is that problem going to be in the next, let’s say, two to three years, when you finally come out with a general-availability-level product? So you have to have a good thesis on the ecosystem and where it’s going. And you have to sort of maintain it and update it.

Having something like ChatGPT come out while you’re planning the future the way that you see it is completely disruptive, and I think nobody can plan for that. But you have to now go and, let’s say, paint a few scenarios. One is, it’s actually that revolutionary thing that I believe it is. The second scenario would be, you know what, it’s going to be revolutionary around consumer interaction or user interaction–for example, maybe the level that it’s going to go to is just us interacting with products, us interacting with our environment. Now my smart home speaker is not going to just be turning the lights on and off, but it’s also going to be my kids’ personal tutor, for example. And the third scenario is, you know what? It’s going to be similar to blockchain. It’s going to have specific use cases, but it’s not going to be that massive transformation that we all hoped it would be.

Camille Morhardt  22:54

Asaf Ezra, co-founder and CEO of Granulate, an Intel company that offers runtime optimization. Thank you for joining today.

Asaf Ezra  23:03

Thank you, Camille. Really appreciate it.
