InTechnology Podcast

AI & Cybersecurity for Open Source (188)

In this episode of InTechnology, Camille gets into open source with guest Jim Zemlin, Executive Director of The Linux Foundation, and co-host Melissa Evers, Vice President of the Software and Advanced Technology Group at Intel. The conversation covers the developments in AI and cybersecurity for open-source software.

To find the transcription of this podcast, scroll to the bottom of the page.

To find more episodes of InTechnology, visit our homepage. To read more about cybersecurity, sustainability, and technology topics, visit our blog.

The views and opinions expressed are those of the guests and author and do not necessarily reflect the official policy or position of Intel Corporation.

Follow our host Camille @morhardt.

Learn more about Intel Cybersecurity and Intel Compute Lifecycle Assurance (CLA).

AI for Open Source

Jim sets the stage for the widespread use of open-source technology in the world today by highlighting how Linux runs about 90% of the world’s computers. He explains this is because the code is good and because using it is a more efficient and affordable way to drive innovation. When it comes to AI, Jim states that the majority of software components used to build machine learning and large language models are open source. Melissa adds that good governance practices have also aided the success and security of open-source software development. As for generative AI and LLMs, Jim reflects on his challenge to developers to figure out how those models can help evolve open source. This process involves exploring different AI tools; however, understanding the context of the content they produce is imperative for getting accurate results. Melissa also emphasizes the need for re-skilling software developers to properly and securely use these new AI tools.

The emergence of AI has also increased the need for accelerated workloads. Jim points out how the supply chain for accelerated computing is disjointed and bottlenecked because of this sudden demand, which is likely going to result in reduced compute costs, improved efficiency, and more small AI models trained to perform specific tasks very well. Additionally, Melissa points to the formation of the AI Alliance as a response to the recent surge in AI use, with the aim of aligning the industry on shared priorities for AI projects. Above all, Jim stresses the need for a collective voice in the industry calling for the importance of open-source AI.

Cybersecurity for Open Source

Melissa invites Jim to talk about the Open Source Security Foundation. He explains how today’s software flows across a complex supply chain that combines thousands of different open-source components along the way. That complexity leaves room for many weaknesses to be exploited, which is why the Open Source Security Foundation came together to improve things like basic software supply chain package signing. Melissa also references the recent announcement of development guiding principles at the Open Source Summit Tokyo as a collective commitment to trust and security in open-source software.
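To make the package-signing idea concrete, here is a minimal sketch of signing and verifying a release artifact, assuming Python’s cryptography package. Sigstore-style supply chain signing, which the OpenSSF promotes, layers identity, certificates, and transparency logs on top of this core primitive; the package contents below are hypothetical.

```python
# Minimal illustration of package signing: a publisher signs a release
# artifact and a consumer verifies it before use. Real supply chain
# signing (e.g., sigstore) adds identity, certificates, and transparency
# logs; this sketch shows only the core primitive.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Publisher side: sign the bytes of a release artifact.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()
package_bytes = b"contents of example-package-1.0.tar.gz"  # hypothetical artifact
signature = private_key.sign(package_bytes)

# Consumer side: verify the download against the publisher's public key.
try:
    public_key.verify(signature, package_bytes)
    print("Signature valid: package is what the publisher released.")
except InvalidSignature:
    print("Signature invalid: do not install this package.")
```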

While there has been a reduction in phishing attacks, sophisticated software supply chain attacks have increased. Jim says the long-term goal is to make it harder to find attack vectors through software vulnerabilities or weaknesses in the supply chain. He adds that the Open Source Security Foundation has a joint initiative with DARPA to encourage developers to come up with security tools based on generative AI technology. Melissa then highlights the Open Source Security Foundation’s Scorecard as a very thorough way to grade the security posture of a codebase. Jim also states that the big security challenge right now is making open-source software curation better.
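Scorecard results for many projects are published through a public REST API. A minimal sketch of querying it might look like the following; the endpoint path and the response fields ("score", "checks") reflect our reading of the Scorecard documentation, so verify them against the current docs before relying on this.

```python
# Hedged sketch: fetch an OpenSSF Scorecard result for a repository.
# Assumes the public endpoint at api.securityscorecards.dev and its
# JSON shape ("score", "checks") -- treat both as assumptions.
import json
import urllib.request

def fetch_scorecard(org: str, repo: str) -> dict:
    url = f"https://api.securityscorecards.dev/projects/github.com/{org}/{repo}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

result = fetch_scorecard("ossf", "scorecard")
print(f"Aggregate score: {result['score']} / 10")
for check in result["checks"]:
    print(f"  {check['name']}: {check['score']}")
```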

As for security standards, Jim references the recent Executive Order on Improving the Nation’s Cybersecurity from the Biden Administration, which specifically mentions software bills of materials. This has led to an industry consensus on SPDX as the software bill of materials (SBOM) standard. Melissa explains a few ways competing standards develop: in emerging technology areas, groups race to create the best standard, while differing commercial interests may create separate standards depending on their appetite for transparency. The difficult part, she says, is creating the next generation of standards for an existing technology to adapt to changing use cases or consumption models.
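For readers who have not seen one, an SPDX SBOM is just structured metadata describing the components inside a piece of software. The sketch below assembles an illustrative document by hand; the field names follow the SPDX JSON format as we understand it, and the package details are hypothetical. Real SBOMs are normally generated by build tooling rather than written manually.

```python
# Hedged sketch: assemble a minimal SPDX-style SBOM document by hand.
# Field names follow the SPDX JSON format as we understand it; the
# package and supplier details are hypothetical.
import json

sbom = {
    "spdxVersion": "SPDX-2.3",
    "dataLicense": "CC0-1.0",
    "SPDXID": "SPDXRef-DOCUMENT",
    "name": "example-app-1.0",
    "packages": [
        {
            "SPDXID": "SPDXRef-Package-libexample",  # hypothetical component
            "name": "libexample",
            "versionInfo": "2.4.1",
            "supplier": "Organization: Example Project",
            "licenseConcluded": "Apache-2.0",
        }
    ],
}

print(json.dumps(sbom, indent=2))
```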

Jim Zemlin, Executive Director of The Linux Foundation


Jim Zemlin has been the Executive Director of The Linux Foundation since 2004. His career journey has focused on mobile computing, cloud computing, and open-source software. Jim is a frequent keynote speaker at tech industry events due to his unique perspective on the industry. He is on the board of the Global Economic Symposium, Open Source for America, and the Chinese Open Source Promotion Union.

Melissa Evers, Vice President of the Software and Advanced Technology Group, General Strategy to Execution at Intel


Melissa Evers has been Vice President of the Software and Advanced Technology Group, General Strategy to Execution at Intel since 2021, although her tenure at Intel goes back two decades. She has served in other senior leadership roles, including vice president, director, and management positions. Melissa has been on The Linux Foundation’s Board of Directors since 2019, and she was the Governing Board Chair for LF Edge from 2018 to 2021. She is also a board member of the Technology Association of Oregon, a Corporate Champion Council board member at The University of Texas at Austin, and part of the Future Ready Oregon Technology Industry Consortium Board under the Higher Education Coordinating Commission. Melissa holds an MBA in Finance and Strategy from the Texas McCombs School of Business and a bachelor’s degree in Civil Engineering from The University of Texas at Austin.


Jim Zemlin  00:11

We need a collective voice in the industry that is screaming for the importance of open source AI. There’s all sorts of things that need to be open so that we can better trust it.

Camille Morhardt  00:28

I’m Camille Morhardt, host of InTechnology podcast and I am delighted today to be co-hosting with Melissa Evers. She’s Vice President of Software Engineering at Intel. She also heads the Open Alliance for Software across Intel. And she sits on the board of directors of the Linux Foundation. Welcome, Melissa.

Melissa Evers  00:49

Thank you.

Camille Morhardt  00:51

Melissa and I are hosting as our guest today the Executive Director of The Linux Foundation, Jim Zemlin. He’s been Executive Director of the Linux Foundation for almost two decades now. And we’re looking forward to getting his perspective on AI and security as it relates to open source software. Welcome, Jim.

Jim Zemlin  01:10

Thank you.

Camille Morhardt  01:11

Now I know I promised we’d get into security and AI. But I have to say, Jim, you have been Executive Director of Linux Foundation since the year that I got an external Wi-Fi card for my PC so that I could take advantage of these new things called Internet hotspots. So that’s like an incredible amount of technology transition and transformation.

Jim Zemlin  01:35

I came in exactly 20 years ago to help provide sort of the business side: providing legal defense, protecting the infrastructure that Linux is developed on, and really with the job of “go make this the de facto software platform for the world.” And I love to tell the story about how, just about 20 years ago as well, I met my now wife on a blind date. And you know, we sat down and she asked me what I did for a living. My wife’s a very successful tech executive; she has a Harvard MBA. And I said to her, “well, I work at this nonprofit, and you know, we give everything away.” The look of disappointment was just palpable–like she’s glancing at her watch. And lo and behold, 20 years later, I’m not only married to my wife, but Linux really runs about 90% of the world’s computers.

And I think the reason for the success of open source pivots on two main qualities. One is just good code. You know, Linus Torvalds is a gifted developer, the kernel developers who are around him are brilliant software minds, and he just has good taste in technology. Something people don’t always realize about Linus is he wrote both Linux and Git; that’s like two holes in one in golf. It’s literally the first and second most impactful software in the world. You have to have what I like to characterize as project market fit: there’s good technology that fits a real need. That’s the first ingredient, and Linux has had that for some time. The second ingredient, I think, was a little more tricky to get people to understand: that co-development of software, even by competitors, for this kind of infrastructure software, is a better, cheaper, faster way to produce innovation, and really is a non-zero-sum game. That’s the part I worked on more than anything else: educating the industry, proving that by working together, we can both compete and collaborate at the same time. And that hasn’t just happened with Linux, obviously. Now in retrospect, that’s happened with seminal technology. Kubernetes has changed the way that we build modern cloud applications. Node.js has completely changed how you build modern web and mobile applications. I can keep going on with critical open source projects that have completely transformed how the industry is innovating. Even in AI, almost all the software components that are used to build large language models and machine learning models are open source. We’re home to PyTorch, which is the most significant building block; everybody uses this tool to build large language models and machine learning models.

And it’s that combo of a good project market fit and a concept of working together that produces these incredible products that people use every day. And then companies like Intel reinvest in those projects as they improve their products, right? “Well, we found a bug in Linux here when we tested it against this semiconductor architecture,” and they contribute that back. Better projects, better products; they reinvest back in projects like Linux through code contributions. So that virtuous cycle just works every single day. We have over half a million developers who work in our community every single day. And it’s really been an amazing journey over 20 years.

Melissa Evers  05:23

Jim, if I could add a 2b, I think part of what enables that latter piece to work so well is governance. And it’s super not sexy (laughs). But it is absolutely essential. If you think about the messes and the fractious nature of early governance in open communities versus what has now become almost a de facto standard of “this is the role of the governing board,” “this is the role of the Technical Steering Committee,” etc.–being able to bring all of those companies together, and startups and labs and, you know, all of the various parties, to be able to assert “this is how we believe this technology should evolve”–I think, without proper governance, none of your second point would have been realized.

Jim Zemlin  06:19

Yeah, you know, it’s funny, because in artificial intelligence we saw an example of poor governance play out with Sam Altman’s exit. And then he came back in, and I don’t think the OpenAI organization, which, you know, had a board and subsidiary for-profit companies within a captive nonprofit, was functioning as well as they had hoped. On our side, as you know, Melissa, you’re a member of our board of directors; open source software development governance is a pretty well-trodden path where the main intellectual property components–the copyright of the code itself, the trademark, the name of the software, and all the IP around it–are governed under well-trodden open source licenses that allow people to seamlessly share in sort of a many-to-many fashion. The decision-making structures for those IP artifacts are well formed in governing boards and technical steering committees.

And most importantly, those IP assets–now, I’m sort of talking my own book here–are housed not in a for-profit company, but in a neutral nonprofit company, where multiple stakeholders can come together, join that entity, and co-own those assets. And that really provides the trust among competing parties to know the license isn’t going to change, the way that we’ve developed the software won’t change, my competitor’s not going to pull the rug out from under me. And that has proven to be a really critical ingredient in enabling large-scale co-investment in open source software.

Melissa Evers  08:03

Yeah.

Camille Morhardt  08:05

Jim, you’ve recently posed the question to developers–I can’t remember the context–but people are sort of debating you know, generative AI, LLM, AI more generally, and open source and kind of what is the trajectory? And maybe this should remain in closed source communities, because it’s big and scary. And you kind of came out and said, “Well, I have really a challenge for developers, which is, go figure out, you know, how generative AI and LLMs can help evolve open source.” And I was just wondering, that was a couple of months ago, if you’ve gotten any good answers back yet?

Jim Zemlin  09:22

Yeah, we’ve gone and looked at open source as it relates to this area in a few ways. One, at the highest level, we believe that AI technology should be open, including foundation models. We do not believe the vague, sort of unspecific “20 years from now something terrible might happen” arguments by incumbent companies to make sure that everything is closed. At the highest level, open source in AI will provide–and I think Percy Liang from the Stanford Human AI Lab says it well–the transparency, the ability to observe how foundation models work, which nobody really knows; the trust, the security, understanding bias; and the traceability for the data that’s in the model. What data went into this model? Who owns that? Is it being shared properly? You need those three qualities at the highest level in order for large language models themselves to be developed in a way that society can count on and can balance innovation with responsibility.

The next level down–this is the interesting thing that I was talking to developers about–was, “let’s go check out some of these tools,” right, whether it’s a GitHub Copilot style code completion tool, or generative AI tools and models that are being used to create better assisted test development for code bases, or better documentation writing; you know, “get me 90% of the way on the documentation of this code base, and I’ll finish it.” And so I’ve asked them to just go look at these tools and see what you think of them and how they perform. And I can report back anecdotally what we found: it seems that there’s a bit of a bell curve for this type of technology, for large language model technology, particularly generative AI. A new developer, let’s say, is using some of these tools, and because they lack context–remember, the tool is producing content based on the data that it was trained on–they can’t really tell if there’s a hallucination or a bias, and they can’t really correct it. And this is why I would, again, advocate for transparency so that we could figure out why that’s happening. So it’s just not a lot of use, because you’re not actually making the software better; you’re making it worse, because it looks good but it’s actually bad.

The second group, up at the top of the bell curve: I’m a pretty good software developer; I can look at what’s coming out of these tools and know “okay, that’s wrong, this answer seems funky,” and then I’ll just go in and fix it. For a pretty sophisticated, advanced developer, someone with experience, this is a huge productivity improvement tool. And I think that’s what you’ve seen people using these tools for, whether it’s writing a blog post or using it for code completion. And then fortunately, I was able to have dinner with Linus Torvalds this week and asked him a similar question. And as you can imagine, he’s on the other end of that bell curve; he’s probably–I would argue he is–the single most talented software developer in the world. And he just doesn’t get much out of it. For the kind of software he’s writing and the kind of code review he’s doing, it just doesn’t add any value. “I hope it helps” was sort of his statement. He’s like, “I’d like to see these tools provide better ways to create documentation, better ways to test.” He thinks that they will be a huge improvement. But at the same time, I don’t think it puts software developers out of business, particularly ones like Linus, anytime soon. So that’s sort of what we’ve been looking at, and we’re just going to continue to take this journey. But I think the most important step in this journey is that the foundational elements of artificial intelligence and generative AI are open.

Melissa Evers  12:45

I think the other piece that I’ve heard feedback on is that what we’re asking of our software developers today continues to change and evolve. You may have been an amazing C++ developer, but now we’re asking you to program in Rust, or we’re asking you to be able to extend into different languages such as Python, et cetera. And so as people are challenged to re-skill with the evolution of technology across various domains, and as different security principles become more and more sophisticated, bringing in AI in the context of re-skilling can also be a very powerful tool.

Jim Zemlin  13:24

Yeah, I totally agree. I think what most firms that I speak to are doing is taking a risk-based approach to this–say, to increase the productivity of creating documentation on a website. These tools are pretty effective for a knowledgeable person. The important thing to remember is what I said at the beginning, which is you have to have context. Remember, writing is not the same as thinking. And if you just don’t understand the subject matter at all, these tools can be not just unhelpful; they can be actually harmful, because they spit out stuff that sounds pretty good but is actually false. Now, you know, if you’re writing a blog post for some small topic, you’re not writing War and Peace here, so the tools can be helpful and it’s pretty low risk. But this risk of hallucinations, and better understanding of it, I think is extremely important.

Melissa Evers  14:14

And I think it points to the importance of open data. If we spend as much time engineering our data and making sure that our data is ethical, is accurate, etc., we can better prevent hallucinations, we can better prevent misappropriation, and so on.

Camille Morhardt  14:33

So I want to ask you both about an evolution in compute toward the desire for accelerators, and what the open source community is doing to help developers who maybe prefer to write once as opposed to writing for every different kind of hardware. I’m sure either one of you can comment on UXL.

Jim Zemlin  14:56

Yeah, I’ll comment on the broader problem. It’s almost a running meme in Silicon Valley. I go to AI startup meetings and investment hubs, and you see people running around with hats that say, you know, “I am GPU poor.” And, you know, I think we all understand that. Right now, the supply chain for accelerated computing is a little disjointed; it’s extremely expensive. It is bottlenecked in essentially one vendor for the most extreme workloads. And what is likely to happen in this kind of situation–which is what happens in every situation–is something’s got to give. You know, now with the emergence of AI, everyone wants to run accelerated workloads. People are not going to write to a single proprietary API that has a multi-year backorder forever. There’s gonna have to be a reduction in the cost of compute, there’s going to be a need for more efficiency, and there are going to be needs for smaller models that may be trained in a manner that would make them equally performant for a specific set of tasks. In many ways that hasn’t changed with the advent of generative AI as a concept, you know, with the advent of OpenAI’s success. And that is why the Unified Acceleration Foundation and the work that we’ve been doing with Intel and other semiconductor manufacturers, I think, is really important. We need to have a balance here, and that balance includes incumbent technology providers, right? The important thing is to build standards so that people can take advantage of accelerated computing workloads from a broader selection of vendors, from a lower-cost set of resources. And that’s where you really, really get innovation to happen. I think that’s going to happen quickly. And I’m excited to be working on it.

Melissa Evers  16:44

We are too. We’ve been working on this problem for a while to try to figure out how we can free the developers’ code. And certainly the work with regard to SYCL and Khronos and the standards organizations, and then making that manifest through the contributions that we’re making through the Unified Acceleration Foundation, UXL, is critical for kind of bootstrapping the ability to enable heterogeneous hardware for HPC, AI, and next-generation workloads and use cases.

Camille Morhardt  17:16

Melissa, I wonder if you can also comment on the AI Alliance. Am I saying that right? There’s just recently been an announcement.

Melissa Evers  17:24

Yeah. And the Linux Foundation is a participant in that as well. So this is a group of folks led by IBM and Meta that came together, and it’s called the AI Alliance; it was announced on December 5. And a group of companies, as well as governments, national labs, startups, and academic institutions, have all come together to say, you know, we believe strongly in all the attributes that Jim was just speaking about with regard to the value of open development as applied to AI–from tools to models to data sets, etc. There’s a lot of work to be done, and there’s a need for us to work together in a more accelerated way.

What we saw in the context of the advent of cloud computing, you know, between the work that was happening in the Open Infrastructure Foundation and the work that was happening in the Cloud Native Foundation, there were a lot of folks doing a lot of work in really disparate forms of architectures–with regard to the software architectures and the software use cases and the stacks that were being deployed. And that really fractious nature ended up meaning that there was a lot of maintenance and a lot of, you know, backlog of work to be able to support all different types of configurations, et cetera. And so part of what we’re trying to do with the AI Alliance is to bring everybody together to agree largely on what the biggest priorities are for the community to advance as quickly as possible, and get everybody’s developers that are part of the Alliance–to the extent that they have alignment with those goals and have passion for that type of work–to row in the same direction, right. To all agree, “hey, this is the work that needs to happen on the existing foundations and existing projects that are out there. Here are new areas that we need to go explore and innovate in together.” And I’m really excited about that kind of cohesive nature across the diversity of stakeholders in this conversation.

Camille Morhardt  19:18

And we need to get access to data and infrastructure, right? It’s sort of two sides of the same coin. And I know that licensing has something to do, you know, with making people feel comfortable. But I guess, for either one of you, what do you think are some of the expectations or strides or maybe vision laid out, when that was announced, in terms of getting access to either data or infrastructure?

Jim Zemlin  19:44

The overarching thing is we need a collective voice in the industry that is screaming for the importance of open source AI, and for having transparency, trust, and attribution in all of these things–it’s code, it’s projects that we host here at the Linux Foundation, these are the building blocks of an ML pipeline. In fact, it’s the production pipeline itself. Our Cloud Native Computing Foundation, Kubernetes–almost all of this stuff is built on that hyperscale and cloud-based infrastructure, the software infrastructure. There is the data itself, which I’ll come back to; there’s the weights; there’s all sorts of things that need to be open so that we can fundamentally improve this technology in order to better trust it. I think that we have to have a continued collective voice that openness matters. And that’s something that we’re very supportive of.

In terms of the different components of building generative AI tools, foundation models, etc., data is obviously that fundamental building block. At the foundation, we’re already seeing large-scale, many-to-many data sharing under a common set of data licenses. Years ago, we created something called the Community Data License Agreement. This was something brokered by dozens of attorneys from all the largest tech companies in the world to come up with, similar to an open source software license, a common, agreed-upon legal framework for sharing data. And so we created that. You can just go to cdla.org on our website and check that out. That’s a start of how you manage data sharing in a many-to-many way.

We then started seeing the creation of big shared data initiatives. Overture Maps is an example of that. Last year, a group of companies–Amazon, Meta, TomTom, Microsoft–invested about $30 million in a large-scale geospatial mapping data initiative. The reason for this is, one, to go find public data, and where there’s private data, share it on a common license; but also to come up with common metadata schemas, like, you know, formats. It can’t all be unstructured data. In order to run these tools–and this is one of the big things I think people don’t understand about large language models and machine learning–you do actually have to have some form of structured data to make this stuff work effectively. And that’s the area where people are behind a little bit, in my personal opinion. But that group, I think, is a great example: a group of companies come together, they invest in the data itself and in the normalization of that data, and then they use that data to train models that can then be used for better mapping services by anyone, better augmented reality services from anyone using that geospatial data. We’re gonna see a lot more of this, where people are doing with data what they do with code, which is share with intention; share what you want to share, keep private what you want to keep private. This is critical to accelerate all the goodness that comes from generative AI. I’m already starting to see more of this in our organization, and I think it’s important to keep that conversation open. “Hey, we’ve got a common set of data licenses over here. Let’s agree that those are good so that data can mix more effectively, right?”

Camille Morhardt  23:11

Is this like a fundamental shift from a race around the collection of data–being the person or company who has the most data that you could potentially monetize–to more of a race toward building the best model?

Jim Zemlin  23:27

They’re not mutually exclusive concepts, right? I mean, I think there are big companies who have incumbent data that they know how to manage incredibly well; that is a huge advantage when building this technology. But as we’ve seen over and over again, people who are not in that extraordinarily small, exclusive group find ways to work together. And so in this case: collecting data collectively, sharing data collectively, making sure that we have attribution. You know, we have a standard called C2PA, the Coalition for Content Provenance and Authenticity; this digital watermarking technology is used to make sure that the data’s provenance is clean throughout the generative AI supply chain. This has already been adopted by Leica; Sony cameras here in Japan will embed that watermark. So you know this one’s the authentic version, it’s not a deep fake. You’re gonna see that all over the place.

People are super creative about taking advantage of interesting new concepts, and the sharing is important. Having said that–and this is the reason why foundations like ours and the AI Alliance are important–leadership matters. Getting people together, having a conversation about, like, “you know what, these 10 tools we’re all kind of building separately are pretty similar; let’s try and pick one so that we can all get to where we want to go faster.” That’s leadership. And that’s one of the things I’m excited about with this AI Alliance: it will provide that level of leadership in a very fast-moving sector.

Melissa Evers  25:53

I think, similar to that, when you kicked this off, Camille, you talked about security and AI. And we’ve certainly talked a lot about AI. But I think also in the context of security and software security, attestation, SBOMs, etc., that’s another place where advocacy and leadership are really critical. Jim, would you like to talk a little bit about what’s happening with the Open Source Security Foundation?

Jim Zemlin  26:17

Yeah. One of the things I think was really cool that Intel helped us a lot with–and companies like Google and Microsoft and others–was to look at the entire software supply chain and figure out how code flows: from a repository, where developers are actually writing code; to package managers, where reusable software components are brought in to solve problems so you don’t have to rewrite things from scratch; to compilation, where all the software kind of comes together and is built before it’s distributed to the final user. Modern software flows across a complex supply chain in that way today, combining thousands of different open source components as it goes. Figuring out where in that flow of code there are security weaknesses, and working collectively to shore them up, just requires someone to lead, to coordinate, to say, “hey, let’s all get together here; we’ll all have a particular software bill of materials standard so that we can all understand where the code is flowing from and to. What we have is not what we need; we need to improve it, and we need to create tools to automate that.” That’s something that Intel, the Linux Foundation, and dozens of other companies got together a few years ago and started doing through our Open Source Security Foundation. We’ve already improved just basic software supply chain package signing–which makes sure that the package you’re downloading is the one you want–and that’s now something we’re seeing implemented as a direct result of this kind of project. So I’m excited about it. The only thing I would say about it–and Melissa understands this; her whole team works hard on this–is that for all the progress, we still have oh so far to go in improving our collective security.

Melissa Evers  27:24

I think one of the things–so the Open Source Summit Tokyo just occurred, and there was an announcement associated with something called the secure software guiding principles, SSGP, or something like that. Development Guiding Principles, that’s it. That was actually generated out of the lessons learned from Log4j. Despite the fact that the industry has come so far with regard to being able to attest that provenance, what we hadn’t really come to standards on was, “okay, once you’ve remediated your software, what then? How do you go through redistribution? How do you go back and re-release software that’s still in production that also has those issues, etc.?” And there was not consensus as an industry with regard to what we expected from one another.

And as Jim mentioned, you know, we have a lot of co-dependencies. Lots of companies are using lots of other people’s software. So it means that we all have to move together; we all have to follow the same principles in order to have a collective commitment with regard to security. And so there were quite a few people that came together as part of the Open Source Security Foundation to essentially draft those guiding principles, to say, “this is what we will expect from one another, and this is our commitment.” And every company can sign on to those standards and expectations such that we can collectively trust each other better: when the next critical set of vulnerabilities comes out, this is what we are committing to with regard to our supply chain as a whole, as these large players. And so I’m really excited about that maturation and that announcement.

Camille Morhardt  29:08

Is there consensus about how the threat landscape is evolving right now?

Jim Zemlin  29:16

That’s a super complex question, right? Bad actors are often just taking the easiest way to do bad things. And sometimes that has not a lot to do with how the software was written, but it may have everything to do with the fact that you used the word “admin” for your login credentials, right? (laughs) But the world has really started to wake up to this; you’re seeing a reduction in phishing threats, because people are pretty savvy to this stuff now.

And you know, as it becomes harder to do the simple attacks, you then get into these sophisticated software supply chain attacks as a threat vector. And this is the area where, if you get that part right, you’ve really done a significant amount to harden all of our collective resilience, right? And then you pair that with best practices at deployment. And again, I think the trick there is that it’s hard. Software is created by people; people make mistakes writing that software; those mistakes are exploited as ways to attack, to threaten folks. And so you know, we’re really focused on the long-term goal of making it harder and harder and harder to find attack vectors through software vulnerabilities or weaknesses in the supply chain.

Camille Morhardt  30:35

Do you think we’re going to start to see AI tools work for security as defensive, or proactive to find attackers or attacks?

Jim Zemlin  30:46

Yeah, and in fact, the Linux Foundation’s Open Source Security Foundation has right now a joint initiative with DARPA to encourage developers to come up with exactly those solutions. It’s an AI challenge to come up with ways to build security tools based on generative AI technology, in order to better defend against attackers. So that’s something that I’m pretty excited about. It’s a little bit scary at the same time. But it’s just another example of where openness–using this technology for good–is always the right thing to do.

Camille Morhardt  31:24

Melissa, I guess I do have a question for you. I know you embrace open source and contribute to it. What is kind of your take for when companies in general make the decision to pull from open source or keep something proprietary or when to contribute?

Melissa Evers  31:42

A couple of things come to mind. One is, you should think critically about what policies and scans you need to deploy on open source code, just like you would hopefully do on code that you generate: scanning for known vulnerabilities, the ways that you’re managing secrets, etc. That being said, one of the things that the Open Source Security Foundation has come out with is a scorecard that can be applied to code bases, and in fact there’s a process by which it is being applied to a lot of Linux Foundation projects, such that you can get a grading of the security posture of that codebase–how well it’s maintained, how active contributions are, etc. And it’s really rigorous grading, by the way; it’s really hard to get an A on the scorecard. But you can also monitor it over time, right? Is this code getting better? Is the security posture increasing? Is it getting worse? There are folks that are really conscious of trying to make what is a very complicated question a little easier.

I think also talking to your partners and customers around their comfort and what they’re using and how they view these things. I don’t think these types of decisions should be made in a vacuum. But as Jim started this call by saying, you know, 90% of the world’s infrastructure is on Linux; the world is pulling from open source for sure. So really trying to become knowledgeable in this domain I think, should be everyone’s responsibility with regard to software development today.

Jim Zemlin  33:20

The question then is, how do I curate open source software in that world? Where do I decide, here’s the open source I’ll use, here’s where I’ll focus for my customer value? And then, of that open source–and this is what Melissa was getting to with the scorecards–which open source components are known good, use secure development practices, have good test coverage, have responsible disclosure policies? How do I curate the millions of open source projects that are out there down to the very small set of thousands that actually matter to my enterprise? And then, for the most critical components in open source we rely on, how do I contribute back problems I find with that software while I’m building my particular product, in order for the community, the collective, to maintain bug fixes and improvements? I think engineers get that intuitively.

The big challenge from a security perspective in 2023 and 2024 is making that curation better: ways to even understand what all the components are, using software bill of materials manifests; scorecards that show, based on what you have here in your code base, these components are good, and you might want to use this one instead of that one, because this one’s not really maintained well and that one has a responsible group of developers working on it. And I think those are things that, when I talk to CSOs and CTOs and VPs of engineering across the world, are areas where they’re still struggling and need better ways to do it.

Camille Morhardt  34:55

I wanted to ask one more question, because it came from somebody who works in the security space, noting that there seem to be a number of standards formed in open source that are independent of existing standards that industry and government hold together. First of all, is that accurate? And second of all, if that’s a strategy, why is that a strategy?

Jim Zemlin  35:19

That’s something that the technology industry has been very good at for decades. The answer to a standard that’s not quite serving our needs is, obviously, to create a new standard so that we can have more standards. I would have to know specifically which standard that person was referring to. But I can give you an example that I think they might be referring to–I’m not sure if it is. In the area of software bills of materials, there are multiple standards; the Linux Foundation has the best standard, SPDX, obviously, and the most widely adopted–

Melissa Evers  35:55

Only slightly biased there, Jim (laughs).

Jim Zemlin  35:59

But I think in any new sector, you tend to see this where there’s an urgent need. “Why do we need software bill of materials metadata standards?” Because last year in the United States, the Biden Administration issued an executive order about cybersecurity, and it specifically mentioned software bills of materials. So there is a real requirement for this. And that work of “can we have a common set of naming?” and, you know, “how do we manage version numbers if different open source components are managed in a different way?”–there’s a complex set of tools that you need, and standards you need to build, for that. And so of course there’s an initial competition, like, you know, this one can skin the cat better than that one. I suspect that person may have been referring to that. That’s sort of calming down.

And again, I’m definitely talking my book, but it seems like the industry has really come to consensus on the SPDX software bill of materials standard; they’ve released their 3.0 spec, which is very modular and addresses all the concerns that were surfacing as this concept was in its early stages. So that might be it.

But I’m certain that there are other competing standards efforts out there. And you know what, my advice is always the same. The Linux Foundation is one of the world’s largest standards development bodies; we don’t just do open source, we actually create standards as well. And if you want your standard to succeed, I know an organization that can help.

Melissa Evers  37:29

But just to recap what Jim articulated, I think there are a couple of ways in which we see competing standards being developed. One is in new emerging technology areas, where it’s like a land grab: who’s going to create the best standard by which the most folks can align? The other area where we get into competing standards is where you’ve got different types of commercial interests, and folks that may or may not be quite as transparent with different people. And so that kind of emerges over time, too, and then people get a sniff of, “oh, this is what’s really going on.” So there are a couple of things that can result in different types of standards trying to address the same thing and competing.

But the final one he mentioned is when you have a technology that has existed and the standard has become stable, but the technology is innovating again. The challenge with something that has really hardened is that it’s really hard to soften it again (laughs). And so you end up needing to create the next generation of those standards to adapt to the changing use cases or consumption models that are driving that need. But the awesome thing about open source is that the community decides what’s the best one; they always end up self-aggregating to one particular standard, and that becomes the de facto standard.

Jim Zemlin  38:46

If the code is based on a specific standard, people are going to want that code, because you get immediate value, right? Like, “hey, I didn’t even have to write the implementation, much less understand the specification.” And so we do a lot of that work in parallel. But I totally agree with Melissa in that the art of bringing together the broadest consensus of competing companies is one of the hardest things to do in building technical standards, because people don’t trust each other. Someone might have a technology that’s the incumbent technology, and you know, they want to just keep that position forever, right, and the rest of the world kind of wants to move on. So you may not make everyone happy in that process. But what you really want to do is get the broadest coalition, and focus on the innovation outcome in combination with that.

Camille Morhardt  39:41

Not to let an opportunity go to waste. Since I have both of you software experts on the line, I want to know what each of you thinks about AGI. Where are we with artificial general intelligence? What’s coming next?

Jim Zemlin  39:53

Melissa cleverly went on mute, so I’ll have to go first.

Melissa Evers  40:01

No, so I think we’re a while away. I mean, these are algorithms, these are computer programs that are still running all of this; they need some kind of human engagement with regard to retraining, etc. Will we get there someday? I don’t know. But I would say there’s further to go than I think people are figuring.

With regard to where I’m most excited: I’m very, very excited about health care and health use cases. There are so many complications with regard to disease and cancer treatments and various types of therapies. And there are so many very, very unique diseases that people are affected by. I’ve had personal experience with someone in my family who was one of 18 people in the state of Oregon, over the history of Oregon, to have their disease, right? So there are some really significant complications for which large, large datasets can facilitate much better care, much faster, with much better efficacy. And that’s an area that I’m just incredibly excited about. Figuring out how we advance that more rapidly makes a real difference in people’s lives very, very quickly.

Jim Zemlin  41:14

I couldn’t agree with Melissa more on excitement about healthcare. Traditionally, healthcare has suffered from either really large data with kind of poor analysis–because it’s just so hard to analyze that data–or much smaller data sets with really good analysis of that small subset of data, for drug discovery and other benefits. Now you can kind of have your cake and eat it too. And that is just so incredibly exciting. For custom vaccines for cancer treatment, the amount of data you have to use to come up with that is so big, and now you can do it in a really high-quality way. It’s going to change the world. You know, I think it’s going to come up with ways to treat cancer and, in some cases, just debilitating, life-ending diseases, in a way that will change how we all live. So I totally agree with Melissa there.

Camille Morhardt  42:12

Jim Zemlin, Executive Director of Linux Foundation, thank you for joining us all the way from Japan today. Or tomorrow, I suppose, your time. And Melissa, thank you for co-hosting with me.

Melissa Evers  42:25

Thank you!

Jim Zemlin  42:26

Domo arigato gozaimashita.

Camille Morhardt  42:29

Domo arigato.
