[00:00:36] Camille Morhardt: Welcome to Cyber Security Inside. This episode of What That Means we’re going to talk about natural language processing. We’re going to talk about artificial intelligence. We’re going to talk about AI in the context of cyber security and lots of other things with the Director of AI in the Office of the CTO at Google Ashwin Ram.
Welcome to the podcast, Ashwin.
[00:00:58] Ashwin Ram: Thank you, Camille. Happy to be here.
[00:00:59] Camille Morhardt: So I want to start our conversation more specific and then go broad. And I think often we go the opposite direction. But we’re having a conversation right now. We’re two human beings. And I understand that it’s a little bit difficult for AI to actually have a conversation. And I’m wondering if you can explain why that’s hard and if it’s true–that it’s hard–has there been any sort of recent evolution or progress in that space?
[00:01:27] Ashwin Ram: Yeah, so interestingly, conversational interaction is the problem I started working on for my PhD thesis decades ago. I’m still working on it. So it is a hard problem and it is not solved, but we are making huge progress in getting there. The reason it’s hard is that if you think of what it takes to have a conversation like the one we are having, there are multiple things involved in that. One of course, we have to understand speech. We have to understand the sound waves coming out of each other’s mouths and make sense of them.
And we have to account for accents and noise in the environment and all the rest of it; it turns out we’re extremely good at doing that as humans. And as of about five years ago, computers have pretty much become as good as humans, though there are, of course, caveats and different use cases; but we’re reaching the point where speech understanding can be done well enough to power conversational AI, maybe not yet in every setting. So that’s kind of one problem.
The second problem is maybe the harder problem, which is once you’ve figured out what the words are in the sound waves you heard, how do we make sense of the words? What do they mean? Natural language is inherently highly ambiguous and depends on context. So if I asked you, for example, “do I need a coat this evening?” the answer would depend on the situation. If you’re at a formal event and we have a dinner planned, I’m asking “do I need a jacket? Is it going to be formal?” Or it might be, “Hey, is it going to be raining outside when I visit you? Am I going to need a raincoat?” So even a simple sentence, like “do I need a coat this evening?” can have multiple interpretations depending on context.
And so we need computers to figure that out from context. And some of this context may or may not be in the words that were just said; the weather is not something we just talked about, but it still factors into the answer to that question. Uh, and then you have to figure out the appropriate response to that.
The response depends on what we, you and I, know already in our shared context; so I’m not gonna reply with something that you already know. So all of these things have to work together in real time. And, uh, getting all of that to work is really, really hard. Getting computers to understand our context, understand our shared understanding, understand these ambiguous word meanings, and of course, process speech in a way that would make all of this work smoothly. Getting that all to work together is a hard problem. We’ve made great progress on every step of this process, but there’s still a lot more work to be done.
[00:04:07] Camille Morhardt: So some of the things I think would come up as context in a conversation are, you know, where is somebody located? Maybe how old they are–I might answer a toddler differently if they asked whether they needed a coat in the evening; it would make sense to me in a different way that they were asking. And so I’m just wondering how computers–I’ll just say computers in a general sense, but different kinds of sensors and compute devices that we carry on or around us, or the ubiquitous or ambient computing that may be surrounding us–can actually determine these contextual situations moving into the future.
So are we going to use those new pieces of data to inform the interaction or is it going to come separately?
[00:04:57] Ashwin Ram: We will be using those. So, if you think about it, there are kind of three broad categories of what I was calling “context.” One is what was said in the conversation thus far, where you might refer back to something that was said, uh, you know, two or three turns ago and build on that. We both have heard that, and that’s shared context that we have. So that’s one kind of context. Computers are fairly good at capturing that now, thanks to some of the larger models that we’ve seen from some of the tech companies and others.
The second kind of context is context of the world around us–shared world knowledge–that people have. We both know that birds fly, except ostriches don’t. So we could use that context without having to explicitly talk about it in the conversation. So that world knowledge, if you will, is another kind of context.
And then there’s the third type, which is the one you mentioned, which is knowledge of each other, and it could be the age, the goals, the needs, and other things. The latter is the hardest, in some ways, for the computer to detect automatically, particularly with new speakers who are interacting with it. But it’s also something that could be brought in from other sources. If you have a person calling into a call center for support, for example, we may know something about that particular customer based on the account records and so forth, after we’ve authenticated them. So we could get additional knowledge of the other person in the conversation through those means.
So we really have to get to a point where we can bring all of this stuff together as easily and naturally as humans do.
[00:06:38] Camille Morhardt: And how are we doing that? How are we getting to that point?
[00:06:42] Ashwin Ram: We’re doing that in a few different ways. We are bringing in more types of data. Some data comes in chunks: for example, we can ingest books and documents, or photographs, or other kinds of data ahead of time and build a model, if you will, that provides context for the conversation. And we also have data coming in in real time, streaming data; for example, as we are talking, we are both sharing data with each other that is context for the conversation. So we have to be doing data ingestion, and we have to be doing it at very large scale, to really cover all of the kinds of things that you and I might know in common. Uh, so that’s one way. The data that comes in has to have a source. How do we get the data? Maybe it’s documents, but maybe it’s sensors out in the world; you might get weather data from weather sensors. We now have sensors in our homes as well, with our smart home devices. There’s a lot of data about, for example, which room I’m in and which lights are on, et cetera, because my smart home controller knows all of that, and we can use that. So for example, if I ask my smart home controller right now to turn off the light, it knows which light I’m referring to because it knows which room I’m in. So those are data that come in from the immediate context through different kinds of sensors.
There are sensors in our cars and in our wearable devices, so we are always producing data–with the user’s permission, of course, to be cognizant of privacy. We want to be able to use some of that data in order to create context. And as I mentioned, one of the tricky issues there is figuring out the right balance between being helpful and relevant and preserving privacy.
[00:08:30] Camille Morhardt: Is there any way to actually sort of hide, or have complete privacy, moving forward? Or has the world just changed in such a way that we’re so interconnected now that you may have privacy in the sense of your data being abstracted from your identity, but actual collection of the data is kind of a fact of life?
[00:08:51] Ashwin Ram: The answer is yes to both. There is a way to be mostly private, which is you can go off the grid, but no one really wants to do that in this day and age. Similarly, you can use cash instead of credit cards and not have anyone track what you’re buying. But there’s a more modern solution to this question, which is that just because some data is being sensed doesn’t mean that data has to be collected centrally in a form that could be misused.
So I’ll give you an example of that. If you use predictive typing on a phone–I know this works on the Google phones and, uh, on the Android phones with the Android keyboard–as you are typing, we will predict the words, and sometimes even phrases or sentences, that we think you’re gonna type. Those predictions are personalized to you. So Camille’s predictions are different from Ashwin’s predictions. And what’s interesting about that example is we do that without the data ever leaving your phone. So Google never sees your data; Google servers never see your data. It’s done completely privately.
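The on-device personalization Ashwin describes can be sketched with a toy next-word predictor whose counts live entirely in one object (the “phone”) and never go anywhere else. The class name and training phrases here are invented for illustration; a real keyboard model is a neural network, not a bigram table.

```python
from collections import Counter, defaultdict

class OnDevicePredictor:
    """Toy next-word predictor; all learned counts stay in this object."""

    def __init__(self):
        # bigram counts: previous word -> Counter of following words
        self.counts = defaultdict(Counter)

    def learn(self, text):
        """Update counts from text the user typed; nothing is uploaded."""
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, word):
        """Return the most likely next word seen so far, or None."""
        following = self.counts[word.lower()]
        return following.most_common(1)[0][0] if following else None

phone = OnDevicePredictor()
phone.learn("see you at the office")
phone.learn("meet me at the office")
print(phone.predict("the"))  # → office
```

Because each user trains their own instance, Camille’s predictions diverge from Ashwin’s even though the code is identical.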
[00:10:00] Camille Morhardt: So you have a little AI engine sitting on the phone that’s keeping my data local and learning from how I respond?
[00:10:08] Ashwin Ram: Right. It’s a combination of two things. One is that there is an AI engine embedded in your phone that can do local learning to help you without necessarily revealing any private data to anyone else. There is also a technology called federated learning–I think Google published this a couple of years ago–which can aggregate the learnings from typing across billions of phones in a completely private way. So without your data ever leaving the phone, if you’re starting to type something that you have never typed before, but thousands of other people have, we still want to be able to predict a likely next word. And we can aggregate the learnings from multiple devices without ever looking at your data. That technology, federated learning, is also useful for data analytics, and it’s very, very powerful.
So there are new technologies like this, where you do some level of edge computing or edge ML machine learning along with federated learning to provide the kind of help that people would value in a convenient privacy safe way.
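The federated idea can be shown in a minimal sketch, assuming a one-dimensional “model” and made-up per-device data: each device computes an update against its own private data, and the server only ever sees and averages those updates, never the data itself.

```python
import statistics

def local_update(weights, local_data):
    """Hypothetical one-step update computed on a device's private data."""
    # Residual of the global model against local data; only this number
    # leaves the device, never local_data itself.
    return statistics.mean(x - weights for x in local_data)

def federated_round(global_weights, all_private_data):
    """Server-side federated averaging: aggregate updates, not data."""
    updates = [local_update(global_weights, d) for d in all_private_data]
    return global_weights + statistics.mean(updates)

w = 0.0  # shared global model
private = [[1.0, 2.0], [3.0], [2.0, 2.0]]  # each list stays on its device
for _ in range(20):
    w = federated_round(w, private)
print(round(w, 2))  # → 2.17, the federated consensus value
```

Real federated learning adds secure aggregation and differential-privacy noise on top of this averaging step, so individual updates cannot be inspected either.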
One additional component to add to this: there are also technologies–for example, Data Loss Prevention from Google Cloud–which will filter input streams of data for any kind of personally identifiable information. You know, if you have a document, for example, and you’re trying to process that document–let’s say you’re an insurance company and you need to process, uh, records or invoices or other things–we can automatically find and mask, en masse, personally identifiable information (PII) such as your name, address, social security number, phone number, et cetera, so they are never shared. And those things can be done automatically. So in the event that your application does require data to be centralized, identifiable data can still be masked to preserve privacy. So there are a number of techniques that can be used to, I think, help people with AI in today’s world without violating privacy.
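A toy version of that PII filtering might look like the following. The regex patterns are illustrative only and far cruder than a real detector such as Cloud DLP, which combines checksums, context clues, and ML models.

```python
import re

# Illustrative patterns only; these catch one common format of each type.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text):
    """Replace each detected PII span with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Call 555-867-5309 or mail jo@example.com, SSN 123-45-6789."))
# → Call [PHONE] or mail [EMAIL], SSN [SSN].
```

Running the redaction before documents ever reach central storage is what makes the centralized processing Ashwin mentions privacy-safe.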
[00:12:13] Camille Morhardt: I’m going to pester you a little bit on this one thing, because you mentioned the phone learning my particular answers. And I’ve actually wondered about that, because I feel like my phone is getting better at predicting how I’m going to respond to something. And it’s like a personal challenge: I almost never type the thing that it ends up showing me, partly because I get annoyed that it’s right. That is what I was going to type, darn it. And then I purposely change it to something else.
So I’m kind of wondering, are we customizing for me, Camille, an individual? Or are we just pigeonholing? Are we ending up with “this is what you usually say, so over and over again we’re going to feed you the thing that you usually say,” so that you really never branch out, and now you’re kind of becoming a caricature of yourself? How do you avoid that problem?
[00:13:03] Ashwin Ram: So that’s an example of a larger problem that is sometimes called a “filter bubble.” It happens when you read news, for example, on a newsfeed online, regardless of what site you use; when you listen to radio; I dare say it may happen with some of the podcast recommendation algorithms behind your podcast feeds; and with typing on the phone and other things.
As these personalization models get better and better at modeling you, they also get better and better at filtering out things that you wouldn’t want to see. But in doing so they’re also restricting and, in some sense, narrowing you into a filter bubble. You’re living in a little bubble world of your own, where there’s very little peripheral vision into what else is going on.
So to avoid that, algorithms need to be designed in ways that allow a little bit of what in machine learning we call exploration, in addition to exploitation–building on what we already know about you. So in addition to following the tried and tested route, you also want to be experimenting a little bit and exploring other alternatives.
How much you explore versus exploit depends on the use case. If you’re typing, uh, and your job is to get this thing typed and move on to other things, it’s not that important, so maybe more exploitation is fine. Once in a while you might type something different, but most of the time it’s right to just move on.
If you are reading news, you sure as hell do want a broader viewpoint, because otherwise we just end up with more and more segmented viewpoints of people that never talk to each other. People have a confirmation bias; they like to read what they already believe. Exploration helps solve this challenge a little bit.
So depending on the application, we can tweak these trade-offs and give you the kind of broader worldview that you would like while still helping you expeditiously on the path that you probably are going to take.
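The explore/exploit trade-off Ashwin describes is often implemented with an epsilon-greedy rule: with probability epsilon, show something outside the user’s usual lane; otherwise show the best-known option. The story categories and click rates below are invented for illustration, and the `epsilon` knob is exactly the per-application dial he mentions.

```python
import random

def epsilon_greedy_pick(click_rates, epsilon):
    """Mostly exploit the best-known category; occasionally explore."""
    if random.random() < epsilon:
        return random.choice(list(click_rates))     # explore: any category
    return max(click_rates, key=click_rates.get)    # exploit: best so far

rates = {"politics": 0.30, "science": 0.12, "sports": 0.05}
random.seed(0)
picks = [epsilon_greedy_pick(rates, epsilon=0.1) for _ in range(1000)]
print(picks.count("politics") / 1000)  # mostly "politics", with ~10% exploration
```

A news product would set a larger epsilon than a keyboard, matching the point that how much you explore depends on the use case.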
[00:15:00] Camille Morhardt: I find this fascinating. There’s a data scientist sitting somewhere that’s basically deciding how much of the news–which could be anything anywhere on the internet–you’re going to see that you already are looking for, already expecting, already believe versus something that’s a new perspective or a perspective that you would disagree with.
Like there’s a human, who’s making some decision within an algorithm as to how much exploration will happen in any given app. Is that what you’re telling me? Just make sure I understand that.
[00:15:36] Ashwin Ram: Yeah, I’m actually going one step further, which is there’s a human that is designing an algorithm that is then making those decisions. And in the early days of personalized newsfeeds, personalized movie recommendations, personalized shopping, et cetera, the choices the algorithm made had started to become a problem. What we are reaching now is a phase where these algorithms are actually machine learning models–deep learning models–which can aggregate a lot of data across a lot of people and make these decisions fairly intelligently within a set of biases or policies that you provide. Instead of hand-tweaking an algorithm to make that decision, you come up with a set of policies that you would like the machine learning system to work with. The system learns and works with those policies; it learns across a lot of data from a lot of people in a privacy-safe way, as we just talked about. And so you end up with a much better outcome than you would with a hand-tailored algorithm, or even with a handpicked news anchor show, where someone has decided that for the audience of this particular TV channel, these are the stories to tell, or this is the right bias or spin to put on those stories.
[00:16:59] Camille Morhardt: Hm, but in this case, I’m my own personal consumer at the end and I’m not making the decision as to who’s making the decision for me. Right? I’m just opening a browser and looking at, you know, today’s news and I’m not making any inputs into how much exploration is happening. Like you said, it was a data scientist setting that up with an algorithm that’s then making that decision and modifying it over time.
[00:17:26] Ashwin Ram: That’s a good point. I think that’s something that, uh, you know, we technologists should take home into our design process. I think most of these sites will give the consumers, the readers, the listeners a way to provide some input into whether they want to see more or less of a certain story, or more or less about a certain set of interests. Uh, those are fairly easy to do, even through very simple thumbs-up, thumbs-down type controls.
But we don’t necessarily give people explicit knobs that let them adjust how broad or narrow they want to be with respect to a particular topic. Those knobs do exist, and there are ways that these algorithms get tweaked internally, but surfacing those and making them available to consumers might be a good idea.
[00:18:16] Camille Morhardt: What kind of cyber security risks are introduced into these types of models, especially around like natural language conversation and interaction and kind of this personalization or customization that’s being done?
[00:18:30] Ashwin Ram: Uh, I don’t know that it’s any different from cyber security risks with any other kinds of technology. These technologies are based on a lot of data, of course, including private data and sensitive data–in healthcare, for example, providers have HIPAA-regulated data and so forth–and so securing user data, anonymizing it when needed, and sharing it in a privacy-safe way is critical.
The places where the data is stored are of course open to cyber security vulnerabilities, and those could be vulnerabilities at the storage points. They could also be at the use points, when data is getting transmitted from point A to point B–either for processing or for gathering–and could be intercepted along the way. So one of the things that we hear a lot about–and we do this at Google as well–is encryption of data both at rest and in motion: the data is encrypted when it’s stored and encrypted when it’s transmitted. And we also now have data being encrypted during processing. So, for example, if you have data as a company, we’ll take that data and give you a way to process that encrypted data without ever having to decrypt it first, because the algorithms run directly on encrypted data.
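Computing on data without decrypting it can be illustrated with textbook RSA, which happens to be multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product. This is a classroom toy with tiny made-up primes, not the vetted homomorphic-encryption schemes production systems use.

```python
# Toy RSA keypair (tiny primes, for illustration only; never use in practice).
p, q, e = 61, 53, 17
n = p * q                           # public modulus
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

enc = lambda m: pow(m, e, n)  # anyone can encrypt with (e, n)
dec = lambda c: pow(c, d, n)  # only the key holder can decrypt

a, b = 7, 6
c = (enc(a) * enc(b)) % n  # a "server" multiplies ciphertexts only
print(dec(c))              # → 42, though the server never saw 7 or 6
```

The client keeps `d`; the server manipulates ciphertexts blindly and returns a result only the client can read, which is the shape of the encrypted-processing idea described above.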
So there are techniques being developed now to help with the cyber security issues; but in terms of language and conversational data versus other kinds of data, I think there’s a lot of user data that is equally sensitive, not just conversational data. So it’s probably the same for both.
[00:20:15] Camille Morhardt: Where would you like to see AI go and how would you like to see it being used?
[00:20:22] Ashwin Ram: So I, I’d like AI to become invisible, to become ubiquitous and a natural part of your life in a way that is helpful to you, but doesn’t necessarily have to be called out as, “Hey, that’s the AI doing it.” If you think about electric motors, for example, it’s a very esoteric technology; not a lot of people even know how it works. We have probably dozens of electric motors in our homes, in all of our devices, but we never think of that. You don’t look at a microwave and say, “oh, wait a minute, there’s an electric motor in there.” Likewise, there shouldn’t be a need to say “there’s AI in there.” My food is just getting cooked the way I want it to be cooked, and it just works.
So I think the more AI gets better and better at just working and doing the right thing for me, the less attention we will need to pay to it. This is kind of the dream of ubiquitous computing–computing and algorithms everywhere, invisible, out in the environment, but doing the right things for us to make our lives easier and more productive.
[00:21:30] Camille Morhardt: But then you want to make sure they’re doing the right things. That they’ve been set up right (laughs).
[00:21:36] Ashwin Ram: Absolutely. And so we do need people setting them up, right? Just like we do need people setting up our motors right so our cars don’t crash and fridges don’t spoil our food.
[00:21:46] Camille Morhardt: Well, really fascinating conversation. Thank you Ashwin Ram from Google for joining today. Thank you so much for your time. I really appreciate it.
[00:21:59] Ashwin Ram: Thank you, Camille.