InTechnology Podcast

The Cloud vs. Onsite Data Centers: When to Repatriate Data (154)

In this episode of InTechnology, Camille and Tom get into repatriating data from the cloud back to onsite data centers with Chris Royles, Field CTO–EMEA at Cloudera. The conversation covers why and how this data repatriation is happening, along with its effects on data security.

To find the transcription of this podcast, scroll to the bottom of the page.

To find more episodes of InTechnology, visit our homepage. To read more about cybersecurity, sustainability, and technology topics, visit our blog.

The views and opinions expressed are those of the guests and author and do not necessarily reflect the official policy or position of Intel Corporation.

Follow our hosts Tom Garrison @tommgarrison and Camille @morhardt.

Learn more about Intel Cybersecurity and Intel Compute Lifecycle Assurance (CLA).

Repatriating Data: A Sign of Maturity

Cloud computing took the world by storm in recent years, but now many companies and organizations are realizing the cloud might not be the best solution for every workload. Now, there’s a growing shift away from the cloud and back to onsite data centers. Chris explains some of the many different use cases for when the cloud, an onsite data center, or a combination of the two might work best.

For example, a retail website might prefer cloud computing due to variable workloads, whereas predictable workloads like regulatory reporting are better suited to onsite data centers. An example of using both would be certain AI workloads that use federated machine learning, where model weights are shared centrally but the underlying data stays private. Chris calls knowing when to use the cloud, when to use onsite data centers, and when to use both a sign of maturity.

The Many Unknown Unknowns of Data Security

Data security is a big topic in the public sphere right now—and for good reason. While many governments around the world are developing regulations and policies around data, such as GDPR and DORA in Europe, there are still many unknown unknowns about how personal data is being moved around and shared in the cloud. This is especially true for popular social media apps like TikTok. Chris emphasizes these current unknowns are why transparency in data processes is so important today—and why education from an early age about data security and privacy is essential for young people growing up in a hi-tech world.

Dr. Christopher Royles, Field CTO–EMEA at Cloudera

Chris Royles is a global leader in technical strategy with a long career in complex systems, data and analytics, and organization and skills development. He has been with the software company Cloudera since 2015, beginning as a Systems Engineer, moving on to Principal Solutions Engineer, and finally CTO in 2021. Chris has a Ph.D. in Artificial Intelligence from the University of Liverpool.


Transcript

[00:00:11] Chris Royles: Where do we need the workload to sit, run, and what data is it going to sit against and what are the protections we need to put around that data?

[00:00:26] Tom Garrison: Hi, and welcome to the InTechnology podcast. I’m your host, Tom Garrison, and with me as always is my co-host, Camille Morhardt. Today our guest is Chris Royles. He’s Field CTO in Europe at Cloudera with experience in complex systems, in data and analytics, and extensive domain knowledge in information management, governance, security, machine learning, and cloud services.

Today we’ll be talking with Chris about artificial intelligence and why companies are repatriating their data from cloud to on-prem. So welcome to the podcast, Chris.

[00:01:03] Chris Royles: Thanks, Tom. Really appreciate the invitation.

[00:01:06] Tom Garrison: So we’re talking about companies that are kind of going against what everyone was doing 10 years ago or something, which was they were moving everything to the cloud, and we talked about repatriating data. Can you just give a little bit of background about what is it that we’re talking about and what’s driving people to want to have their data on-prem?

[00:01:27] Chris Royles: There’s quite a wide number of reasons that comes up. I’m very fortunate in my role. I cover the whole of EMEA, and as such, I speak with organizations in different industries, and each organization will have maybe different reasons for the choices they make. A few years ago, many organizations were driven top-down with an imperative to move to cloud and a lot of organizations have since then matured in the way they think. So if we generalize it, one of the foundational reasons, if you like, why organizations are repatriating workloads back from the public cloud into their own data center is they have a bit more control of what happens in their data center. And for certain workloads that are predictable and stable, those sometimes run better in the data center.

When you then explore that and start to dig into it, there’s probably two other really quite important aspects to it. One is the, if you like, protection of the data associated with that workload, so organizations have been more focused on the telemetry around the workload and how it actually operates and then thinking about where that data the workload runs against should be sitting; and that bringing it back into their data center gives them more control of that, if you like.

The other one is really cost. A lot of organizations we speak to when they move to cloud, they originally thought of it as this might save us money. That hasn’t always played out, especially for those workloads that run in a consistent and stable, reliable way already in their data center. So some organizations are making that choice. I think the more important aspect is having the flexibility, if you like, to repatriate if you need to.

[00:03:11] Camille Morhardt: Are you seeing it for certain kinds of workloads? or is this just across the board, like the pendulum swung too far one direction and now it’s coming back?

[00:03:19] Chris Royles: I think organizations, as I said, they’ve matured so they’ve had the opportunity to experience what the cloud can do for them. And for certain workloads–if you take a customer service example where you’ve got lots of customers maybe coming onto a website and placing a load on that website and asking many inquiries–different seasonal events can change that. By having a sale, say, in a retail context, you can drive more demand to a website. And so being able to sustain that website, it’s a very variable workload.

Other types of workloads, think something like regulatory reporting, for example, we’ve got a known number of accounts or we’ve got a known number of facets we need to report against. That’s a very fixed workload in many cases. And so again, running that in a reliable way in your data center is not going to change its profile, let’s say. That’s the type of workload we do see coming back into the data center.

The other one is when there’s a sensitivity around the data itself, so managing large volumes of citizen data, for example; certain organizations that manage a lot of citizen-related data–I use “citizen” instead of customer because it’s a countrywide set of information rather than cross-regional–that citizen-centric view is very interesting for some customers. I worked with an automotive customer. They did an automotive insurance, as an example, and they worked across different regions. What they actually found was they were creating copies of their environment in multiple regions–in simple terms–and because of that it was getting expensive. And so what they did was they brought some of those workloads back into data centers in each region so they could be managed at a regional level rather than managed in a shared cloud.

[00:05:13] Tom Garrison: Okay. So I’d like to explore that a little bit because we’ve heard a lot in the news about government policies and regulations about where data has to reside and certainly GDPR in Europe. But I wonder just more generally–not just in Europe but across the world–is the trend to have data not leave a particular country or a particular region and needs to stay? And then how do you manage that if you have a company that’s dealing with customers all over the world?

[00:05:47] Chris Royles: Well, to my example, some organizations are approaching it differently. So working with a European bank recently and they’ve taken the very clear distinction to build a private cloud. They want to build in their own data center and use their own assets. And when I asked them why they were doing that instead of moving to cloud, their response was quite interesting. They said, “well, we’ve got to continue running a bank” in simple terms, “and our security team are always evaluating our ability to move to a cloud environment, but for now, we’ve got to carry on building. We can’t just stop our business from operating.” And so they’d already approached it several times and found that there were reasons not to move to cloud, and so were building their own private capability in their data center.

So I think what’s more interesting is there’s always change, and their response was, “at some point our security processes might say it’s okay and we’ll be able to make that transition.” And the thing with regulation is it’s changing. GDPR comes into force, organizations then have to align with it. There’s new regulations around resilience such as DORA coming in around Europe. Asia-Pacific, there is new regulations being formed in that area as well in terms of the citizen privacy and how data is moved out of region or across regions. And I think what’s interesting with most organizations is the regulatory landscape will change, it won’t stay static, and it will change for the right reasons. It’s to protect citizens and data and really consider the needs of the citizen.

So an organization really needs to think in those terms and recognize that things could change outside of their control. I’ll give you an example. We had Schrems II, which was a recognition, if you like, that certain social media platforms might be moving data between regions. And the question really was not just about the data moving, it’s who has access to that data, and would the citizen ever be notified of that access request? So there were unknown unknowns, if you like, about what was happening with the data. And a number of organizations we were working with paused many of their initiatives and stopped to rethink. And the focus really was on flexibility as in, “well, if we want to move some workloads to public cloud, why can’t we do that? And then if we need to retain some workloads in the data center, why can’t we do that?” So really asking those challenge questions about where do we need the workload to sit, run, and what data is it going to sit against and what are the protections we need to put around that data?

[00:08:35] Camille Morhardt: I think one of the important emerging use cases in AI is this notion that collaboration can occur even across competitors for potential mutual benefit, even including, like you say, citizens, right? So if we’re going to do something like share insights to detect a health anomaly early, it’s helpful if we can have bigger sets of data. So how are you recommending people structure that kind of collaboration?

[00:09:02] Chris Royles: So good example with the federated models of machine learning, in the sense that a hospital can retain their data and a subset of the process can run against that data within the hospital and nothing needs to leave the boundaries of the hospital other than the model weights. And they can then be aggregated into a central model that can then be shared back with the hospital is a good example. And what that means is that model in many cases can outperform individual models. So in some cases, by doing federated learning, you can create models that outperform separate processes, if you like.

So in some cases federated learning is an approach you could take and can generate better results because you’re able to learn from different sets of information and create a better model. That model, of course, you would then share back with those that provided the, if you like, the processing where each federated component was deployed. So there are approaches for that and I know the US and Europe have just established a new agreement around federated learning and machine learning, so these are certainly approaches that can help.
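Editor’s note: the federated approach Chris describes, where each site trains on its own data and only model weights leave the building, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Cloudera’s or any hospital’s implementation: the tiny linear model, the function names (local_update, federated_average, train), and the learning rate are all hypothetical, and the aggregation step is plain size-weighted averaging of weights.

```python
from typing import List, Sequence, Tuple

Weights = Tuple[float, float]          # (slope, intercept) of a toy model y = w*x + b
Dataset = Sequence[Tuple[float, float]]  # private (x, y) pairs held by one site


def local_update(weights: Weights, data: Dataset, lr: float = 0.1) -> Weights:
    """Run one gradient-descent step on a site's private data.

    Only the updated weights are returned; the raw data never leaves this function.
    """
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return (w - lr * grad_w, b - lr * grad_b)


def federated_average(site_weights: List[Weights], site_sizes: List[int]) -> Weights:
    """Combine per-site weights into one central model, weighted by dataset size."""
    total = sum(site_sizes)
    slope = sum(w[0] * n for w, n in zip(site_weights, site_sizes)) / total
    intercept = sum(w[1] * n for w, n in zip(site_weights, site_sizes)) / total
    return (slope, intercept)


def train(sites: List[Dataset], rounds: int = 500) -> Weights:
    """Alternate local training and central aggregation for a fixed number of rounds."""
    global_weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each site starts from the shared model and trains locally.
        updates = [local_update(global_weights, data) for data in sites]
        # The coordinator sees weights and dataset sizes only.
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights
```

In this sketch the coordinator calling train never inspects individual records inside the aggregation step; it handles weights and record counts only, which is the property Chris highlights. Real deployments would add secure transport and the governance controls discussed next.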

The one thing we would always focus on is making sure that auditability and governance are always in place and primarily put in place early in the process. So whatever platform you are using, whenever you are coming in, strong authentication and authorization, and that authorization then thinking in terms of the data and how it’s represented. And so in many cases, that’s all around how you represent the metadata around data; and then the policies that you are going to apply to all data apply to the user or the role or the group are related to the activity that that individual is undertaking on the data and is really focused on the privacy aspects of the information being presented. So if you don’t have the right to access that data for a particular purpose, you can’t access it. How was that right determined? It was determined by seeking permission from the original provider of the data. That might have been the actual customer or citizen in the first place. So it’s been very focused on how those policies are built and carrying that metadata through so that the policy’s going to be applied throughout its entire life cycle. That’s quite a complicated set of conversations. It’s quite detailed.
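Editor’s note: the policy model described above, where metadata travels with the data and access depends on role, purpose, and the original provider’s permission, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class names (DataAsset, Policy), the tag and purpose strings, and the is_allowed function are hypothetical and do not represent any particular product’s API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class DataAsset:
    name: str
    tags: FrozenSet[str]                # metadata describing the data, e.g. {"pii"}
    consented_purposes: FrozenSet[str]  # purposes the original provider agreed to


@dataclass(frozen=True)
class Policy:
    role: str
    purpose: str
    allowed_tags: FrozenSet[str]        # tags this role may touch for this purpose


AuditEntry = Tuple[str, str, str, bool]  # (role, purpose, asset name, decision)


def is_allowed(
    asset: DataAsset,
    role: str,
    purpose: str,
    policies: List[Policy],
    audit_log: List[AuditEntry],
) -> bool:
    """Allow access only if the provider consented to the purpose AND a policy
    grants this role that purpose for every tag on the asset. Every decision,
    allowed or denied, is appended to the audit log."""
    consented = purpose in asset.consented_purposes
    permitted = any(
        p.role == role and p.purpose == purpose and asset.tags <= p.allowed_tags
        for p in policies
    )
    decision = consented and permitted
    audit_log.append((role, purpose, asset.name, decision))
    return decision
```

The design choice worth noting is that consent is stored on the asset rather than in the policy, so it follows the data through its life cycle, and a policy alone can never grant a purpose the provider did not agree to.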

[00:11:35] Tom Garrison: Super complicated. I want to try to simplify just a little bit and try to bring it back to something that is definitely in the news a lot right now and that is this whole dust-up that’s going on around TikTok and the US government’s concern about personal information that a particular application might or might not be gathering.

From your perspective, Chris, and the industry about information and keeping it local in certain regions and so forth, how should we be thinking about this as a citizen? How concerned should we be? Is this a good fight for people to have their governments trying to keep their personal information local? And then also, I think you mentioned–sorry for a complex question here–but you also mentioned the idea of not just where does the data reside, but who has access to the data. So in that context, can you just bring it home for us with TikTok as an example?

[00:12:36] Chris Royles: That’s very much getting into the geopolitical aspects, and that’s what’s quite interesting in my mind. It’s the state media and business relationships, and back to the unknown unknowns.

A platform like TikTok, or other platforms that are available globally, have to in many terms establish a level of trust with their users because at the end of the day, those platforms exist because the users are putting content and material in. The organization that manages that platform might be quite small in terms of content. It’s reliant on the users of the platform to put the material and the content in, and much of the value of the platform is derived from that particular data that’s put in. The question then is where’s that value going to? Who’s using that material for benefit? So that definition between state media and business is the question that arises around that. Do we actually know? To your point, we’re not on the inside of that. But what’s the transparency around things like the data processes and who has access, and even that notification as to who’s accessed your data and for what purpose? Did you give permission for that access?

Good examples: my mobile phone alerts me if an application’s using my geographic location; and my geographic location could be very useful to some organizations to understand where I live, where I spend my time, which coffee shops I go to, where my office is, but also the trips and transport and routes that I take every day. I have the choice to say whether my application enables my geo-location to be shared with another application. That is because the mobile phone I’ve chosen gives me those prompts. I’m aware to some degree of what information’s being collected. That’s not always going to be the case. Certain devices and certain applications won’t necessarily give you those prompts and alert you to that information collection. And so it’s down to the application provider to be very transparent in what data they’re collecting for which purpose, and I’m not sure that that transparency is there in all cases.

[00:14:55] Camille Morhardt: At what point should the onus be on the individual versus the application provider? I’m asking because we often trade sort of convenience for privacy. I definitely want my maps application to know where I am because I’m asking it where I’m going, so it has to be able to track me and I’m aware of that. But when it comes to something like a social media platform or platform where I’m posting videos about myself, I’m disclosing the information; and trying to regulate that as opposed to saying, “look, everybody, if you’re participating online all of your information, unless we’re talking about a very specific private…” I’m trying to play devil’s advocate here and understand how the other side might look at this.

[00:15:40] Chris Royles: I certainly think there’s a degree of education. I’ve got young children myself and they seem to be able to activate certain applications very easily. Social media applications, for example, are typically provided free. And so to block access to app stores for free apps is actually quite challenging as a parent. You’d be surprised. And so just education at a young age I think is quite important about how applications operate and how valuable our own data is.

And then what I notice, as well, from a personal perspective, is it’s the operating system and the device that notifies me that an application is doing something. It’s not the application.

[00:16:25] Camille Morhardt: Right.

[00:16:26] Chris Royles: Another good example, I went into a personal voice assistant recently, which I’ve got at home and the amount of skills that have been added automatically in my household, and when I explored if there was a way to disable skills being added, there isn’t a way of disabling skills being added. They get added. And so I then have to sort of add in every month a personal routine where I go around clearing out all the skills we don’t need anymore. But I don’t think that’s going to be a general activity many people are going to do on a regular basis. And so to your point, to what degree do these applications need to remind us that we have to go in and do our own due diligence and that we have to continually maintain our own choices around our data privacy, and getting that balance right is going to be difficult.

I think in the coming years we’ll see, as Facebook overstepped in specific occasions and had to step back and then regulators come in and provide particular guidance around that, I think we’ll see privacy coming and going in waves again and we’re in another wave.

Another recent example is the BBC being flagged on Twitter just very recently as a “government-funded media organization.” Some would say that’s useful information in different geographies of the world. Some would say, is that correct labeling? Again, media organizations making particular choices and social media platforms being the gatekeepers of some of those choices as well. Again, comes down to transparency, and as consumers, are we given the right information to make good decisions?

[00:18:07] Tom Garrison: I know that when we prepared for our talk today, we talked through an example too that wasn’t a social media example. It was more of a, I would say, traditional business example about just an airline. When you think about airlines, they have the standard business–they need to do reservations, they need to take information about people, maybe their passport information. Then the passengers then inherently fly, and they’re going to fly in this example outside the country. Some information needs to travel with the passengers, some information doesn’t need to. And just that complexity of doing business multi-nationally like that and what information should absolutely never leave and then there are different situations like an airline disaster or something like that where now all of a sudden more information needs to flow. So this is not just a hypothetical social media app question. It’s a day-to-day business concern and complexity that we all need to think about.

[00:19:12] Chris Royles: There’s going to be maybe three things to consider. One is the aircraft itself is a valuable asset. You are then going to have the pilots that are going to fly that aircraft and the citizens that are going to fly on that aircraft who might come from different geographies in their own right. There’s certain data that you want to be able to exchange–things like flight plans and things of that nature, as well as your personal identifiers, things like your passport information might need validation–and that might need to be shared. And so there’s certain data that you as a passenger would want to be passed across international boundaries.

And I always think in terms of there being a good path, when everything goes right, everything works, and everybody’s come to an agreement that this is a good thing. You also need to take into account what happens if things don’t go to plan, if the aircraft doesn’t arrive at its destination for whatever reason, what then happens? What happens with an aircraft is you’ll have maybe disaster investigation, you’ll have government entities getting involved in that process, and you’ll have people from all over the world that might be passengers on that aircraft and you’d have to take all of that into account. And there are policies in place on how information might get exchanged, but thinking about what happens when things go wrong is also as important as when things go right.

So I think that’s the key word there is consider all scenarios–the good scenarios, as well as what might happen under a difficult situation. It’s a bit like when a regulation changes and an organization might need to undertake stressed exits, say in a regulatory context; they might need to move a workload from one location to another. When things happen outside of your control, how is your data being protected is a very good question.

[00:21:01] Camille Morhardt: Everybody was going toward cloud, right? It seemed that “we’re going to public cloud, this is the great move. How can we get everything over there?” And now you’re saying people are taking a closer look and it’s matured and they’re trying to understand what workload is doing what for what purpose and what are the potential threats, and then deciding which workloads might migrate back, which ones might stay in the cloud, which ones might move to distributed learning techniques.

Are we now all with the pendulum in the middle just trying to figure out how to do it?

[00:21:33] Chris Royles: I think things are certainly settled. I would say there’s a maturity. We would always guide toward a data-driven approach in the choices you make. So if you want to repatriate a workload back to the data center, use workload analytics, use observability around your workloads and data to make the right choices.
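Editor’s note: one simple way to start the kind of workload analytics Chris mentions is to measure how much a workload’s demand varies over time. The sketch below is only a rough first-pass signal, assuming you already have periodic utilization samples (for example, hourly request counts). The function names and the 0.25 threshold are hypothetical choices for illustration, and a real placement decision would also weigh cost, data sensitivity, and regulation.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(samples: Sequence[float]) -> float:
    """Return standard deviation divided by mean: a unitless measure of how
    much demand swings relative to its average level."""
    avg = mean(samples)
    if avg == 0:
        raise ValueError("cannot compute variation for an all-zero workload")
    return pstdev(samples) / avg


def workload_profile(samples: Sequence[float], threshold: float = 0.25) -> str:
    """Label a workload 'steady' or 'variable' using an assumed threshold.

    'steady' loosely matches the fixed, predictable workloads (such as
    regulatory reporting) described in the conversation; 'variable' matches
    the bursty retail-website case.
    """
    if coefficient_of_variation(samples) < threshold:
        return "steady"
    return "variable"
```

For instance, samples that hover around a constant level would be labeled steady, while samples with a sale-day spike would be labeled variable.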

There are particular industries where cloud has been very… As in, it’s very consistent. A lot of organizations are moving that way. I’ll give you an example. Financial services very much are on that journey to the cloud. They have very changeable workloads in terms of those workloads can vary quite significantly depending on seasons, depending on new offers they’re putting out, depending on customer demand.

But also, there’s a question around resilience. Then, as a lot of organizations start to move to cloud, you start seeing concerns around aggregation, cloud aggregation. What if all of our financial services institutions were all in the same cloud, in the same regions? What would that mean? And so regulators are looking at cloud in that way, as well. Not just which organizations are moving to cloud, but are they all doing it at the same time? Are they all moving in the same way that could bring in unnecessary risks?

So it depends on your perspective. Different industries are responding differently to the availability of cloud. And again, it comes down to the workloads that they run and the shape of those workloads that they run.

[00:23:07] Tom Garrison: Well, Chris, this has been a good conversation. It’s a very complex topic obviously, but I thank you for coming in and talking to us about this whole data movement and data privacy and a lot of the complexities the companies need to manage. It was a good topic. Thank you.

[00:23:23] Chris Royles: Really appreciate the conversation.

