[00:00:41] Tom Garrison: Hi, and welcome to the Cyber Security Inside podcast. I’m your host, Tom Garrison. And with me as always is my co-host Camille Morhardt. How are you doing today, Camille?
[00:00:40] Camille Morhardt: Hi, Tom. I’m doing well too.
[00:00:51] Tom Garrison: You know, we have today broken through the doldrums of winter for just a moment for an afternoon. And the sun came out. It was brilliant. Have you seen it?
[00:01:03] Camille Morhardt: Yeah, it’s really gorgeous. And unfortunately for me, I was heading, uh, right at the noon hour into a big box grocery store. And it was kind of sad cause I was like, wow, you know, it’s so beautiful out. And I’m just about to go inside right now.
[00:01:18] Tom Garrison: That’s right. I still maintain that Oregonians, maybe more so than any other population that I’m aware of, really take advantage when the sun comes out. Man, everybody goes outside. If you can, you go outside during this.
[00:01:33] Camille Morhardt: I was not doing my part. Yeah. It felt very, it felt very wrong.
[00:01:37] Tom Garrison: So today we’re going to talk about updating platforms, but specifically, you know, I think there’s a little bit of context needed here and that is, we all know that applications have been moving to the cloud forever, basically; it seems like a lifetime. And we also know that the leading edge of a security mindset is to keep updating platforms and keep them updated with the latest patches from the vendors. But when you put those two worlds together, they don’t actually mix. So when people are updating devices, they do so with their on-premise resources, the servers and whatnot that are out on premise, and have historically resisted moving things to the cloud, and that is changing. And that’s what we’re going to talk about.
[00:02:30] Camille Morhardt: Yeah. I think it’s an interesting conversation and we have, uh, one of the largest CSPs to come in and talk to us about it, how the change is happening, how you can do updates remotely, and just how you keep security front and center during this kind of a migration or transition.
[00:02:46] Tom Garrison: Yeah. And, and, you know, as with anything, it really boils down to how much flexibility do you have? How much faith do you have in the process, that it will work and that you’re not introducing undue risk? And that’s where we’re going to talk through those details today with, as you said, one of the cloud service providers.
[00:03:10] Camille Morhardt: I’m looking forward to it.
[00:03:17] Tom Garrison: Our guest today is Gabe Frost. Gabe is a Group Product Manager at Microsoft leading the Commercial Windows as a Service Engineering Team. He co-founded the Industry Alliance for Open Media, where he led as Executive Director to deliver the next generation video standard for the open web. He also co-founded three startups and has been awarded several patents, and served on the advisory board for the University of Washington Center for Entrepreneurship. So welcome to the podcast, Gabe.
[00:03:48] Gabe Frost: Hey, happy to be here.
[00:03:50] Tom Garrison: So our focus today is on keeping devices updated, right? Camille and I have spoken to several guests over the last year on the importance of keeping devices updated and the sort of challenges that are presented in doing so. In your role, I think you probably have some pretty unique perspectives in your role at Microsoft. I’d love to hear your thoughts on the challenge and what we can do moving forward.
[00:04:19] Gabe Frost: It’s a good question. I’ve been looking after this area for about four years now and, aside from the recent worldwide changes that all business and technical leaders have been going through, it’s been this interesting journey where the cloud is this phenomenon and there’s all these different options, and a lot of our customers when we talk to them are navigating through: What do I think about in terms of on-premises management? What do I think about shifting to the cloud? How do I reason over the cost profiles of those, the pros and cons, the benefits, the shifts from people working on PCs that are locked onto their desk versus the ones that they carry in their pockets and the ones that they take with them everywhere, and shifting to this hybrid way of going about work?
Everybody has to reason through these questions in different ways and think about them differently–business continuity. And thinking about a device from the chip all the way up to the browser or the applications that they use matters a ton. It’s not just about making sure that the operating system is patched. And so when we talk to customers and our partners and Chief Security Officers around the world, they’re all reasoning over the same things and developing different perspectives in terms of, what is patch compliance? How should I set my goals? How should I think about what good looks like? And how do I overcome all of these different problems and unknowns?
And so it’s been a really fascinating conversation, both in terms of sharing what we know and where we think we want to go, but more co-developing in a way these ideas and making sure that we’re approaching these problems with the right intensity that they deserve, and figuring out how to do this together in a way, and deliver solutions and tech that address the sort of broad set of problems because they are all so intertwined, you know?
[00:06:39] Tom Garrison: Yeah. You said a lot there and I think that one of the elements that sort of came to mind was this notion of the device itself and do you really know the device that you’re talking to. And obviously Microsoft and Intel for that matter we’ve spent over the decades, we’ve spent a long time making sure that the human sitting behind the PC, attached to a device is really the human we think they are; but we haven’t really spent as much time saying is that device really the device we expect it to be? Or has it been altered in some way? Do we really understand its state. So is this now something in your conversations with folks in different companies, is this now a problem that people say we need a better solution so that we know more about the device itself, not just the human attached to the device.
[00:07:40] Gabe Frost: It is. What we’ve found in all of our conversations is that first off, if you think about every combination of hardware and software on a PC versus a more narrow ecosystem, it really explodes. And IT has been given this monumental task of just somehow knowing everything, right? And figuring out how to overcome all the obstacles that are presented to them, oftentimes blindly. And so you’ve seen this emergence across the industry–Microsoft as well as lots of other partners and competitors–on analytics. And those analytics ideally provide some sort of help finding the needle in the haystack in terms of the right insights to focus on.
And patch compliance in particular to the point that you made it’s so interesting because when you’re thinking about there’s the person and the multiple devices that they use–so there’s this concept of identity for the human being as well as for the assets that they’re using–but then being able to peel the onion on those assets individually to start to say, “well, what is the combination of all the hardware and software that’s on that device?” For a couple of reasons, the first is because you need to have some sense of what compliance means to you and what software revisions are on that machine, when there’s like umpteen amount of updaters that are floating around all over the place to get all of that stuff updated to the latest patch versions. But then there’s also the challenges that the software components present when they’re interacting with each other.
So oftentimes customers are like, well, I’m trying to update these machines, but they’re just not updating for whatever reasons why. And so they want to know from Intel, from Microsoft, they want to know from their device maker why are these things not updating? We’re using tools, we’re trying to find logs or trying to debug and figure out why this intent that I have is not being carried out. And so that–especially when you’re trying to remotely manage these devices–becomes so much more challenging.
That’s a lot of the conversation, what are the analytics that you can provide? What are the combinatory effects that I need to be thinking about? How do I mitigate and overcome problems when there’s such a layered model of software across the devices. It’s just becoming more and more of a challenge. And our challenge as folks who build products and solutions is always to try to simplify that as much as possible. Because you want to give them all the information, but if you give them too much information it’s overwhelming; you don’t know how to go about it.
So being able to understand the inventories of what’s on these assets that they’re running, and who are the identities of the people that are using those assets, and what are their behaviors, like low activity, this is a big deal. When I have a phone in my pocket and I have a PC, sometimes I’m creating and I’m using my PC, other times I’m just consuming and I’m using my phone. When you’re trying to get gigabytes or hundreds of megabytes of information down onto a device, and you’re only using it irregularly, that poses additional challenges in terms of how do you keep them up to date? How do you meet patch compliance? What’s the user experience you provide? All those details.
[00:11:08] Camille Morhardt: Are there any legitimate reasons for waiting to patch?
[00:11:13] Gabe Frost: Boy, that’s tough. Legitimate reason. Well, the biggest challenge is the balance between user experience and security, and oftentimes some security issues apply holistically to everybody. Like there’s a zero day vulnerability or something that affects all devices and therefore exposes the organization to broader risk. You would want to patch those quickly. But sometimes there’s security issues that affect certain combinations that maybe you’re not running as an org, and so knowing when should I be more determined to update devices in a short period of time versus not is always a tough one. Sometimes the answer is clear cut, and other times it isn’t.
What we’ve found is that it’s nearly impossible, in that spirit of transparency, it’s nearly impossible to provide a one-size-fits-all answer to everybody. And so the easiest thing for most is just to treat how you patch and think about your organization as a service where you should just assume the content is always flowing and how do you stage that through your organization in a way that minimizes risks as you go, as opposed to treating everything like a big project that you’re going to patch once a month, everybody all at the same time.
This is the new challenge is that because there’s so many different pieces of software that you’re updating and you’re trying to balance the user experience is just making sure you have that always running production line. How do you insert things into the production line and parameterize them in such a way that you can sort of optimize the user experience most of the time, knowing that sometimes you’re going to need to expedite something into the organization because it’s a critical vulnerability. It’s super tough. It’s a hard question to answer, but it’s a good one.
[00:13:11] Tom Garrison: I would argue if you’re very, very confident that the vulnerability doesn’t apply to you, that would be one reason not to. But even understanding whether it applies to you or not is so nuanced in a lot of ways, it’s kind of dangerous to say one or the other. The other one is a device that isn’t connected, or isn’t connected regularly, so you have to wait until the next time it kind of phones home or becomes connected, then you update it. But other than that…
[00:13:42] Camille Morhardt: Can I ask kind of a follow on question to that then? Because you know, we’re kind of talking OS or chip level hardware level at this point. But if you’re setting automatic updates on all different kinds of applications that are running on top of the OS, is there any risk associated with setting auto updates on this just myriad of applications you might have?
[00:14:08] Gabe Frost: Is there a risk? Yes. The question is what are the tools that you have to manage that risk so that you’re transferring unknowns to knowns? So one of the other areas that I look after is the broad Windows rollout. So anytime we release a new version of Windows, the Rollout Services team updates a billion devices and we have to do that in a systematic way. What we had to learn when we went from Windows 7 to Windows 10–which was more of a deliver as a SaaS style of a service–was how do you successfully reason over all of those risks that you’re going to deploy something.
And so the industry sort of came up with this concept of “rings,” which is I’m going to organize groups of devices that I’m going to update, and I’m going to do that sort of systematically so that I can get a sense of what’s going to happen as I roll these updates out. The problem is that while rings are a useful tool, how do you know what devices to put in what ring? It’s interesting because we can observe through telemetry how people go about doing that. We don’t know anything about the human beings that are sitting behind those machines, but when you look at are they getting good results in terms of converting unknowns to knowns in terms of their strategy for rings, oftentimes they’re not, because what we do as humans is we go “hmm, who would be the most accommodating of risk early? It’s usually the tech folks, right? So Tom, your team gets the updates first.” Whether they like it or not, IT sends it to engineering or IT organizations, because if something happens, usually they have a propensity to be able to deal with it, as opposed to maybe the finance group or some other group. Not to diminish that in any way, that’s just how we think, how we tend to think about dealing with risk. The problem is what if every device in that group had the same graphics card and there happened to be a bug in that particular driver at that time, boom! You know, like now you’ve had a huge issue with productivity.
And so what we’ve had to learn when we roll out updates to the billion devices on Windows is “hmm, how do we determine the probability on a per device basis that this update is going to be successful on this device?” Which requires that we look at every permutation of hardware and software on every device and then look at signals that are coming back from those devices as we stage that rollout to determine do we see any outliers in terms of rollbacks? Or failures or crashes or various different signals? If you think about that over an ecosystem of more than a billion devices with every combination of hardware and software, you’re talking about trillions of records that you need to update every day. Every 24 hours you look at it again, and imagine developing a score for every device to say, what do we think is the probability that this device will successfully update?
[00:17:35] Tom Garrison: So Gabe, do you set for those sorts of rings? Do you distribute risk so if there is a problem with such-and-such driver or whatever, you don’t take down that whole ring, you only take down a piece of that ring because you made sure you didn’t have too many people with that driver in that ring, because otherwise there’d be too much risk if there was an issue?
[00:18:01] Gabe Frost: Yeah, you got it. We had to figure out how to do this in a better way to provide better outcomes for customers. So there’s the right time to take the update, and then whether sort of like a red light green light. It’s one of those things where you tip toe as you go into it, because you want to convert unknowns to knowns and you want to do that in a systematic way where you can pull up the emergency brake, if you need to fairly quickly and developing all of that capability is what people rely on Microsoft to do. Because what I just described is bananas, right? Like that requires building machine learning and AI models to do all of that.
So what we’ve been doing is taking all of our learnings and lessons about risk and how to manage risk. I talked to people about like, if you have a 401k and you go, “hey, I’m invested in the S&P 500.” Most people have no idea because it’s like how is that index constructed and how much risk do I have for big companies, small companies, all of this? It’s similar with devices, it’s an economy and they’re changing all the time, and how do you reason through that risk is super challenging. So we’ve taken all of what we’ve learned in rollout, and we’re now starting to make that available to our commercial customers through our deployment services so that you just get that and you get the signals for free and you get safeguards.
So as soon as we learn something, Tom, you get on the phone with me and say, “Hey, Gabe, don’t deploy this driver, we’ve discovered an issue with this driver.” We can effect that in the world really quickly so that any devices that match that particular signature automatically are safeguarded. We pull them out of the deployment, both in the broad consumer ecosystem, but also those signals fire for customers who are using Intune, for example and who are deploying through those different solutions.
Camille, to answer that question, there’s a lot of risks, and how you leverage the capabilities to manage those risks now becomes more important than ever, and it’s been a really difficult challenge. I think that the feedback that we’ve been getting from customers about like, “Hey, you give me these rings, I have all these devices, help me understand how to put devices in the right rings in a way that minimizes my risks” has been a lot of what we’ve been thinking about and technology we’ve been building over the last 18 – 24 months to make that a lot easier for customers.
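The safeguard idea Gabe describes above (hold back any device that matches a known-bad signature) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Device` record, the `(component, version)` signature format, and the `apply_safeguards` function are hypothetical and do not reflect how Microsoft's deployment service or Intune actually represent devices or holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device record: a name plus its (component, version) pairs."""
    name: str
    components: frozenset


def apply_safeguards(devices, blocked_signatures):
    """Split devices into (eligible, held).

    A device is held if any of its (component, version) pairs appears in
    blocked_signatures, e.g. a driver version with a known issue.
    """
    eligible, held = [], []
    for device in devices:
        if any(sig in device.components for sig in blocked_signatures):
            held.append(device)
        else:
            eligible.append(device)
    return eligible, held


# Illustrative data only; the component names and versions are made up.
fleet = [
    Device("laptop-1", frozenset({("gpu_driver", "1.0"), ("bios", "A1")})),
    Device("laptop-2", frozenset({("gpu_driver", "2.0"), ("bios", "A1")})),
]
eligible, held = apply_safeguards(fleet, {("gpu_driver", "1.0")})
```

In this sketch the hold is re-evaluated every time the function runs, so removing a signature from the blocked set releases the matching devices on the next pass.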
[00:20:35] Tom Garrison: I’m familiar with the rings and we’ve worked with Microsoft pretty extensively on it, but the element that you just raised, which maybe I misunderstood, but let’s say I was using Nike or General Motors as my two corporate examples. If you were running one of those companies’ IT infrastructure, is it their choice? I assume it’s their choice deciding who’s in ring zero, versus ring one, versus ring two, versus ring three. Or is what you’re saying that Microsoft will use all of your AI, if they give you access to all of their infrastructure or whatever, so that you can help educate them on who should be in rings 0, 1, 2, 3, etc.?
[00:21:20] Gabe Frost: Yeah, let me explain. Four days ago you only had the ability to do rings. You define your own rings, you figure out what devices you put in those rings, and you could set, like if you’re deploying a feature update, for example, you could say, “hey, no sooner than this date should these devices be allowed to go.” Right? You could set a start date when you want to start the deployment. We’ve just made available something called a gradual rollout, and a gradual rollout enables two things. It lets you say, “Hey, let the robots manage the rings,” or at least some rings. So you could pick a big group of devices and say, hey, let Microsoft AI make the decisions on this group, so think about it like a ring zero, for example. If you have telemetry turned on, it gets even better if you could say, hey, I only want X percent of devices per day to go in these rings, and a number of days in between each go. So I have some time to reason over the data to see how things are doing.
But if you turn on device telemetry and you authorize Microsoft service–our deployment service–to process that information in a compliant data boundary for you, then it will actually do, as I described, it will automatically determine of this big group of devices you handed me what is the smallest number of devices that have the highest concentration of hardware and software combinations and only pick those. So that way I can get you the broadest coverage with the least amount of devices, and you don’t have to do that on your own. The software will do that for you. And we think this is game changing in terms of managing, helping customers manage risk.
Now, we still think that we have work to do, to dial in how much information we provide and how much control we give customers over those scenarios, but bringing that machine learning and giving it to IT to be able to manage those deployments, whether you’re doing a feature update, whether you’re doing a monthly Patch Tuesday, whether you’re deploying a new driver into your ecosystem, we think is a really big innovation that’s going to help manage those risks that we’re talking about now much better.
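Picking "the smallest number of devices" that covers the most hardware and software combinations resembles the classic set cover problem. The Python sketch below uses the standard greedy approximation for set cover. It is an assumption-laden illustration, not Microsoft's method (which Gabe describes as machine learning driven): the `pick_pilot` function and the device inventory format are hypothetical, and greedy selection is not guaranteed to find the true minimum.

```python
def pick_pilot(devices):
    """Greedy set-cover sketch.

    devices: dict mapping a device name to the set of hardware/software
    items (e.g. (component, version) pairs) found on that device.
    Returns a list of device names that together cover every item
    seen anywhere in the fleet.
    """
    uncovered = set().union(*devices.values())
    pilot = []
    while uncovered:
        # Choose the device that covers the most still-uncovered items.
        # Sorting the names makes tie-breaking deterministic.
        best = max(sorted(devices), key=lambda name: len(devices[name] & uncovered))
        gained = devices[best] & uncovered
        if not gained:
            break
        pilot.append(best)
        uncovered -= gained
    return pilot


# Illustrative inventory only; names and versions are made up.
fleet = {
    "pc-a": {("gpu", "1.0"), ("wifi", "3.1"), ("app", "9")},
    "pc-b": {("app", "9"), ("vpn", "2")},
    "pc-c": {("vpn", "2")},
    "pc-d": {("gpu", "1.0")},
}
pilot = pick_pilot(fleet)
```

Here two of the four devices are enough to exercise every combination in the fleet, which is the "live lab" effect described in the conversation: the most learning from the fewest devices.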
[00:23:43] Camille Morhardt: I understand you said different software and hardware combinations, presumably, so you can get a sense what might be a problem. Are you mixing IOT and server and PC and phone all together or are you mixing it within a certain class of hardware?
[00:24:01] Gabe Frost: Currently we only support Windows Client for this. So you’d go into your directory, like your Azure Active Directory, and say, here’s a group of devices, and I want you to run this on those devices for this update. So currently it’s supported for feature updates. If you want to deploy Windows 11, this helps a lot because number one, it only offers to devices that are eligible–they meet the system requirements. You’re never going to offer to a device that doesn’t meet the system requirements and wouldn’t succeed.
The second is that it helps you reason through the set of applications that are running on those devices, as well as the hardware that’s running on those devices to get you that smallest group that you could pilot, you know, you’re, it’s almost think about it, we’re going to have all the devices that you have, we’re going to create a little live lab of the smallest set of devices, and we’re going to deploy it into that lab. So you can learn the most from the smallest amount of devices that are running.
[00:24:58] Tom Garrison: This is very interesting. I think the implications are pretty obvious in terms of making it easier to get the devices updated in a rational fashion–better probably than humans could do it, managing risk along the way. I wonder how do you see governments around the world. Do you see them starting to have a heavier hand in terms of requiring device updates?
[00:25:29] Gabe Frost: You know, I do. Most recently the latest administration has said some things about the recent supply chain attacks. So I think a lot of companies are reasoning through chain of custody in terms of the updates that are provided. You’ve seen on several occasions elements of the security administrations making statements about, “Hey, we really encourage people to patch this update. This is a big deal.” So I think more than ever, we’re actually seeing governments encourage populations to update because of the increasing frequency of attack vectors.
We’ve certainly observed it just as you have that we’re seeing more of the encouragement to go update. I don’t know what that means in terms of requirement. There’s always encouragement and there’s different levers on companies that have infrastructure that’s critical to government employees and labs and things of that nature.
We published a blog in the last couple of months where we talked about this concept of patch compliance and we looked at over the last couple of years, the number of incidents, the security incidents, and they’re just going up and up because there’s more incentive, I guess, right? There’s different incentive structures in place that would have people going after these different devices. And IOT, IOT is an interesting challenge in and of itself, when you just have J random security cameras or things like that, that are out there. How do those present an on-ramp to even PCs that are in your home network.
I think there’s just more and more on-ramps for malicious activity and it just presents that much more of a challenge for our partners, in IT, customers. It’s never been more important to be thinking about patch compliance, what it means. You know, when you think about patch compliance, what’s your goal and how do you set that goal? A lot of folks will pick a number. I hear something like 90% within X amount of time, but then when I say how much of your estate is in a drawer? How many laptops do you have that are in a drawer that your system is tracking? Or how do you think about your employees that are coming and leaving from vacation and what would the implication be?
I think that a lot of people are struggling through and reasoning, understandably through a lot of these questions to try to figure it out. I don’t believe that there’s actually an industry standard in terms of what would we broadly say how to reason over your patch compliance at any time. If someone asked you, Hey, Tom, what’s your patch compliance today? How would you reason on that? And I think we, as an industry have work to do in order to help provide more clarity in that area for customers.
It’s not only the tools and the flexibility in terms of how to update these things, but also how to reason over it, how to reason over your success, and how to reason over opportunities to actually improve and get better. And those are the things we’re thinking about a lot.
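The "laptops in a drawer" point can be made concrete with a small Python sketch that computes patch compliance two ways: over every tracked device, and over only the devices seen recently. As Gabe notes, there is no industry-standard definition, so the `patch_compliance` function, the record fields, and the 28-day activity window are all assumptions chosen for illustration.

```python
from datetime import date, timedelta


def patch_compliance(devices, today, active_window_days=28):
    """Return (overall_rate, active_rate) as fractions between 0 and 1.

    devices: list of dicts with 'patched' (bool) and 'last_seen' (date).
    A device counts as active if it was seen within active_window_days.
    """
    cutoff = today - timedelta(days=active_window_days)
    active = [d for d in devices if d["last_seen"] >= cutoff]

    def rate(group):
        # Avoid dividing by zero when a group is empty.
        return sum(d["patched"] for d in group) / len(group) if group else 0.0

    return rate(devices), rate(active)


# Illustrative data only: three active laptops and one left in a drawer.
today = date(2022, 3, 1)
estate = [
    {"patched": True, "last_seen": date(2022, 2, 28)},
    {"patched": True, "last_seen": date(2022, 2, 20)},
    {"patched": True, "last_seen": date(2022, 2, 15)},
    {"patched": False, "last_seen": date(2021, 9, 1)},
]
overall, active_only = patch_compliance(estate, today)
```

With this made-up estate, the same fleet reports 75% compliance over all tracked devices and 100% over active ones, which is why the goal needs to say which denominator it uses.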
[00:28:59] Tom Garrison: Before we let you go, we have one more segment that we like to call Fun Facts. What would you like to share with our listeners in the fun facts section?
[00:29:10] Gabe Frost: When I was in school, I spent all my time in math and physics, just in engineering, and I have a six-year-old now. My six-year-old is asking all the questions that six-year-olds ask. Regularly I’ve been talking to him about things like he sees the stars and he wants to know what’s a star? He heard about black holes, so it’s been a real fun journey. I don’t think that I have any fact or figure to share with you. It was fun the other day, I was trying to figure out how to describe how a star collapses into a black hole and cover all the details about how does that happen to a six-year-old. It was really fun to go back and relearn how to think about that and explain it in a clear way. And then you get all the follow-on questions.
[00:30:02] Tom Garrison: I remember as a kid, when it was first described to me about the lifecycle of a star. And I became very worried that the sun was going to expand and I was going to get cooked.
[00:30:16] Gabe Frost: (laughs) I think you’ve got about a billion years before that happens. It was funny why are some stars brighter than other stars and all of these different concepts about relativity and how to describe that. So it’s just been super fun for me to be like, wow, how do I describe this in the simplest way possible? And I think that’s a healthy feedback channel for me on all these kinds of conversations, these concepts that are complicated and make it so simple that a six year-old could understand. (laughs)
[00:30:50] Tom Garrison: That’s great. Okay. Camille, how about you?
[00:30:53] Camille Morhardt: My fact is I spent this last weekend at the coast and it was very placid. The waves weren’t so big. I was wondering what was the biggest wave ever recorded? So I looked up the biggest wave recorded by humans as it hit land–because apparently the biggest open water wave is 63 feet, which if you’re in a boat would be tremendous.
[00:31:20] Tom Garrison: Yeah, that’d be a bad day.
[00:31:22] Camille Morhardt: Right, a perfect storm type of a day, so what about hitting land? So in 1958, Lituya Bay in Alaska, a part of a glacier broke off and slid into the water. 3,000 feet this thing fell. It sent off a wave that started about a hundred to 300 feet, was the immediate wave that started. By the time it got to the far end of the bay, it was 1,720 feet tall! So this wave was riding in over the trees. The fun fact is there were actually three boats out in the bay and the boats all flew in at the top of the wave. And there’s this one story of one of the boats with a husband and wife, and they just rode the wave all the way in and then crash landed up in the forest and were eventually rescued. They were fine, but there were viewers that watched them and said they just sailed in all the way up in the forest, so just absolutely insane. The other boats, I think people had died during the storm, but not necessarily from riding the wave.
[00:32:38] Gabe Frost: Oh my gosh (laughs).
[00:32:40] Tom Garrison: That’s crazy. So I’m going to go back out, as Gabe had, out into space, and a little fun fact. First let me ask you two, as you’re my, my, uh, subjects here: the hottest planet in our solar system is?
[00:32:57] Gabe Frost: I’d say Mercury.
[00:32:58] Camille Morhardt: I would’ve thought Mercury.
[00:33:00] Tom Garrison: Everybody says Mercury, it’s actually not Mercury. It’s Venus and it’s 450 degrees Celsius. You would think that Mercury would be, cause it’s so much closer, but it has no atmosphere and therefore it can’t regulate its temperature, so it has huge fluctuations in temperature. Venus, though, has a slow axis rotation and so it takes 243 earth days to complete one Venus day. Interestingly also about Venus is that the orbit of Venus is 225 earth days. And so one year on Venus is actually 18 days less than a day on Venus. That’s pretty cool. Huh? A year is shorter than a day on Venus.
[00:34:06] Camille Morhardt: That is pretty interesting.
[00:34:08] Gabe Frost: Well, yeah, that makes sense. As you described it. Wow.
[00:34:11] Tom Garrison: There you go. All right. Well, Hey Gabe, thanks so much for spending the time with us today. I thought it was a really interesting and important topic on how do we keep our platforms updated and maybe some glimpse into the future where things are headed.
[00:34:24] Gabe Frost: Great conversation. Thanks to you both.