[00:00:36] Camille Morhardt: Hi, and welcome to today’s podcast, InTechnology: What That Means. We’re going to talk about deep fakes today with Ilke Demir. She’s a senior staff researcher in Intel Labs and she studies all kinds of things. Help me out here. 3D computer vision, what else do you look at?
[00:00:52] Ilke Demir: Geometry understanding with deep learning, deep fakes, synthetic media, generative models, and other things.
[00:01:02] Camille Morhardt: Okay, I’m going to jump right into synthetic media because I was just looking at something like that. Are you talking about anchor people who are actually generated and delivering the news or is this something else?
[00:01:14] Ilke Demir: Synthetic media can be everything around deep fakes, which is like facial reanimation and facial retargeting. It can be completely new people, it can be completely new humans, it can be 3D models of buildings or cities or galaxies. So all of that is synthetic data in general.
[00:01:35] Camille Morhardt: Okay. So we’re going to spend a little bit of time on deep fake, and I know most people have probably heard of it, but can you describe what it is and how it’s used and if it’s changed in the last couple of years?
[00:01:56] Ilke Demir: Sure. So deep fakes are synthetic media: videos, images, audio, or a combination of them, where the actor or the action of the actor is not real. So you may be seeing me like that, but are you sure it’s really me, or is it a deep fake of me? I think that is the most prevalent example.
The bloom of deep fakes started with the introduction of generative adversarial networks, or GANs, which were introduced in a paper in 2014. At that point the results were very blurry faces, maybe in grayscale; you could look and see some kind of face, but nothing really photorealistic. Since then, things have changed so much.
So now there are very powerful deep learning approaches with very complex architectures, where we can actually control the face representation, the head pose, the lighting conditions, the gender, the skin tone, and we can do it across many different faces. So that is where we are right now. What we see online is getting more and more dystopian, as we should not be believing everything we see.
[00:03:10] Camille Morhardt: Okay. So what kinds of detection methods are there? I know that you’ve developed different detection methods, including real-time detection methods for deep fake video. Can you talk about the spectrum? Are they all biometric signals that you look at to see whether a person is a person at all, let alone whether it’s actually that person? But let’s start with just: is this a real human versus a computer-generated image of a human?
[00:03:42] Ilke Demir: So generation and detection are an arms race. More photorealistic images and videos are coming and then better detectors are coming. So in that race, researchers first introduced methods that are looking at artifacts of fakery in a blind way. So the idea is if we train a powerful network on enough data of fakes and reals, it will at some point learn to distinguish between fakes and reals because there are boundary artifacts, symmetry artifacts, et cetera.
That works for some cases, of course, but mostly those detectors are very open to adversarial attacks. They have a tendency to overfit to the datasets that they are trained on, and they don’t really generalize or transfer to other domains.
We twisted that question. Instead of asking what the artifacts of fakery are, or what is wrong with the video, we ask what is unique in humans. Are there any authenticity signatures in humans, as a watermark of being human? Following that thought, we have many different detectors that look at authenticity signatures.
FakeCatcher is the first one. We are looking at your heart rate, basically. When your heart pumps blood, it goes to your veins, and the veins change color based on the oxygen they are carrying. That color change is of course not visible to us humans; we don’t look at the video and say, “Oh yeah, she’s changing color.” But computationally it is visible, and those signals are called photoplethysmography, or PPG, signals. So we take those PPG signals from many places on your face, create PPG maps from their temporal, spectral, and spatial correlations, and then train a neural network on top of the PPG maps to enable deep fake detection.
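The pipeline described here can be sketched roughly as follows. This is a toy illustration of the remote-PPG idea, not Intel's FakeCatcher implementation: the face region boxes, the green-channel choice, and the synthetic stand-in "video" are all assumptions for the sake of a runnable example, and the trained classifier is replaced by a simple spectral heart-rate estimate.

```python
import numpy as np

def roi_ppg_signal(frames, roi):
    """Mean green-channel intensity of one face region over time.

    frames: (T, H, W, 3) video array; roi: (y0, y1, x0, x1) region box.
    Green is used because the blood-volume signal is strongest there,
    a common assumption in remote-PPG work.
    """
    y0, y1, x0, x1 = roi
    return frames[:, y0:y1, x0:x1, 1].mean(axis=(1, 2))

def ppg_map(frames, rois):
    """Stack per-region PPG signals into a (regions x time) map,
    normalized per region; the interview describes training a neural
    network on maps like this."""
    sigs = np.stack([roi_ppg_signal(frames, r) for r in rois])
    sigs = sigs - sigs.mean(axis=1, keepdims=True)
    std = sigs.std(axis=1, keepdims=True)
    return sigs / np.where(std == 0, 1, std)

def dominant_heart_rate(signal, fps):
    """Peak of the power spectrum inside the plausible heart-rate
    band (0.7-4 Hz, i.e. 42-240 bpm), in beats per minute."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    power = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[band][np.argmax(power[band])]

# Synthetic stand-in for face video: a 1.2 Hz (72 bpm) pulse
# modulating the green channel, plus per-pixel noise.
fps, seconds = 30, 10
t = np.arange(fps * seconds) / fps
pulse = 2.0 * np.sin(2 * np.pi * 1.2 * t)
rng = np.random.default_rng(0)
frames = np.full((len(t), 64, 64, 3), 120.0)
frames[..., 1] += pulse[:, None, None] + rng.normal(0, 0.5, (len(t), 64, 64))

rois = [(0, 32, 0, 32), (0, 32, 32, 64), (32, 64, 0, 32)]  # illustrative regions
pmap = ppg_map(frames, rois)
bpm = dominant_heart_rate(pmap[0], fps)
print(round(bpm))  # → 72
```

The point of the sketch is that averaging over a region suppresses pixel noise enough for the tiny periodic color change to dominate the spectrum, which is why the signal is computationally visible even though it is invisible to the eye.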
We also have other approaches, like eye gaze-based detection. Normally, when humans look at a point, both eyes converge on that point, but for deep fakes it’s like googly eyes. Of course it is not that visible, but the gazes are less correlated, et cetera. So we collect eye size, area, color, gaze direction, 3D gaze points, all of that information from eyes and gazes, and train a deep neural network on those gaze signatures to detect whether a video is fake or not.
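The gaze-consistency idea can be sketched as below, assuming per-eye 3D gaze direction vectors have already been extracted by some eye-tracking or landmark model (that extraction is outside this sketch). The feature set and the threshold rule are illustrative stand-ins for the trained deep network described in the interview.

```python
import numpy as np

def gaze_features(left_dirs, right_dirs):
    """Correlation-style features from per-frame 3D gaze directions.

    left_dirs, right_dirs: (T, 3) unit gaze direction vectors.
    Real eyes converge on a common point, so their directions stay
    tightly coupled over time; "googly" deep fake eyes drift apart.
    """
    # Per-frame angular disagreement between the two eyes.
    cos = np.clip(np.sum(left_dirs * right_dirs, axis=1), -1.0, 1.0)
    angle = np.degrees(np.arccos(cos))
    # Temporal correlation of the horizontal components: both eyes
    # should sweep left and right together.
    corr = np.corrcoef(left_dirs[:, 0], right_dirs[:, 0])[0, 1]
    return np.array([angle.mean(), angle.std(), corr])

def looks_real(features, max_mean_angle=10.0, min_corr=0.9):
    """Toy threshold rule standing in for the trained network."""
    mean_angle, _, corr = features
    return mean_angle < max_mean_angle and corr > min_corr

# Coherent gaze: both eyes follow the same smooth horizontal sweep.
t = np.linspace(0, 2 * np.pi, 100)
sweep = 0.2 * np.sin(t)
left = np.stack([sweep, np.zeros_like(t), np.ones_like(t)], axis=1)
right = np.stack([sweep + 0.02, np.zeros_like(t), np.ones_like(t)], axis=1)
left /= np.linalg.norm(left, axis=1, keepdims=True)
right /= np.linalg.norm(right, axis=1, keepdims=True)

# "Googly" gaze: the second eye wanders independently of the first.
rng = np.random.default_rng(1)
fake_right = np.stack([0.2 * rng.standard_normal(100),
                       np.zeros(100), np.ones(100)], axis=1)
fake_right /= np.linalg.norm(fake_right, axis=1, keepdims=True)

print(looks_real(gaze_features(left, right)))       # True
print(looks_real(gaze_features(left, fake_right)))  # False
```

The uncorrelated case fails on the correlation feature even when the average angular disagreement is small, which mirrors the "less correlated, not visibly wrong" observation above.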
[00:05:56] Camille Morhardt: So the next step with all this, because I’m sure it’s like you said, it’s the arms race, and so then the computers or AI will figure out how to fake the convergence of vision so that we can no longer find that out. They’ll fake the vertical scanning that humans do and we can no longer detect there, and then they’ll figure out how to make sure the pupil or iris aren’t changing in size as much as they’re doing now, which is the way that we catch them. Ultimately, are we going to need to be looking at actually an individual human and saying, “Okay, this is your heart rate, so we know that you are the one speaking,” or, “This is your gaze as opposed to just a generic human gaze.”
[00:06:34] Ilke Demir: Person identification is a whole broad research topic in itself; we haven’t invested our resources there yet. We are just looking at the problem as fake-or-real detection. As for heart rate, it is very unique to humans. So if you have the actual measured heart rate, that heart rate and how it changes can be a strong signal to identify a person, to identify that it is Ilke who is talking. But recovering that exact heart rate from video is not really possible, because there are so many things that are changing.
The camera parameters may add something, the illumination may add something, something passing by my window, creating a shadow on me may affect the PPG signal. So exactly finding that unique signature per person from video is very hard.
[00:07:26] Camille Morhardt: Okay. So you’re not looking into watermarking an individual human, for example, or identifying an individual human and then validating that that is the person in the video?
[00:07:38] Ilke Demir: Yeah, no, we are not doing that. We are finding that real humans collectively have PPG signals that are consistent across their faces, basically.
[00:07:47] Camille Morhardt: Okay. I have heard some people claim that … did you see the movie Maverick, the new one that came out?
[00:07:53] Ilke Demir: Not yet. Sorry.
[00:07:55] Camille Morhardt: Oh, okay. Well it’s got Tom Cruise and a bunch of folks in it, but I’ve heard it claimed that we’re going to find out later that that was all generated video of actors. It’s just a wild claim right now, but it makes me think: is that something we’re likely to see coming? Could we generate humans that we recognize for something as long as a film?
[00:08:23] Ilke Demir: I love that question, because we are not only doing deep fake detection, we are also doing responsible generation. First of all, one remark: I don’t want to see Tom Cruise anymore, because he has so many deep fakes and we have been looking at all of those deep fakes so much. So generation of deep fakes is a huge topic, and we want to do it responsibly. It is possible to create a whole movie just with deep fakes if we have enough reference images and videos of that person.
Not only for 2D movies, but actually for 3D. We had 3D productions at Intel, and one of them is an AR series. For that AR series, because of COVID, the actor couldn’t come to the huge studio for volumetric capture. So we said, okay, take a video of yourself at home with the script, so that we can actually make a 3D deep fake of you using the 2D footage that you give us and the earlier 3D footage that we reconstructed of him. And we actually did that face retargeting, which is taking the mouth, hand gestures, facial gestures, et cetera, from the 2D video and applying it to the 3D capture of him, so that it mimics him in 3D.
So if we could do this for that little AR series very quickly, then it is definitely possible to do it for a whole movie in 2D, which is a little bit easier.
[00:09:49] Camille Morhardt: So do you have any concerns about this? Just generally, when you think about deep fakes, what do you tell people to worry about? Or is it a shrug of the shoulders? And what are people looking at moving forward? I mean, we have ideas of verifying our identity for things we’re posting, and we used to think of video as a way to verify we actually said something. What directions is the world taking to verify identities?
[00:10:22] Ilke Demir: The whole detection research actually emerged from that question. Our presence is being pushed from physical presence to digital presence, and we have all those passports and IDs to verify our physical presence, but not our digital presence. There are some biometrics in use, like fingerprints and retinal scans, but they are a little more heavyweight; they are not used for video like this. So we are trying to implement deep fake detection for that, because we have seen in the news and in many real-world cases that deep fakes are used for political misinformation, for forgery, for fake court evidence, for adult content. For all of them we need verification, we need authentication, and deep fake detection is one of those tools.
It doesn’t exactly say, “Okay, you are Camille; okay, I am Ilke.” But it says that this is a real human, and most deep fake approaches are trying to impersonate someone one-to-one. In that case it is easy to say that if it is fake, then it is not that person. We are also developing other approaches for how we can create responsible deep fakes, how we can enable the creative process of creating synthetic humans, digital humans, in a way that is responsible and does not impersonate someone one-to-one.
[00:11:50] Camille Morhardt: So the main concern is that you’re treading on an actual person’s identity, versus that somebody can’t tell the difference between, say, a fake actor or a real actor in a piece of art?
[00:12:04] Ilke Demir: Yes, we want to distinguish fakes from reals before going to the identity reveal.
[00:12:13] Camille Morhardt: What’s sort of your biggest concern around deep fakes that are out there?
[00:00:00] Ilke Demir: Recently there was a video on social media platforms of Ukraine’s President Zelenskyy giving misinformation about the Russian invasion of Ukraine. Instead of that fake video not being uploaded to the platform, or being marked as fake, the platform waited for it to be reported. Everyone went, “Oh, this is not real, this is fake, this is fake,” and only then was something done.
But just put yourself in the place of the people inside the war, inside the invasion, who don’t have that “oh, it might be fake” mindset. They are thinking, “Something is coming, new information is coming, we need to believe what he’s saying,” et cetera. In that emergency situation, you don’t have the critical eye that is looking for fakes. Instead, if there had been a deep fake detector in the ingestion step of that platform, it would at least have given a confidence metric, a mark saying, “Okay, we believe with 80% confidence that this is a fake video and this didn’t happen.”
So this is just the beginning, and especially for elections, especially for defaming purposes, et cetera, deep fakes are really going there. There are certain individuals who are really affected by those consequences, and we don’t want that to happen.
[00:13:40] Camille Morhardt: Is that kind of the extent of where you think it’s headed, that there will be … we’ll just say bad actors, I guess, out there impersonating people and putting things in their likeness that aren’t true? Or is there some other kind of place that this could go?
[00:13:57] Ilke Demir: Well, that is the immediate case. We have heard that because of an audio deep fake, some CEO was tricked into giving millions of dollars away to someone; it was just a deep fake, it wasn’t real, et cetera. Those are only the immediate steps. But as those cases increase and increase, at some point we will reach a place where no one will believe anything. Even if someone goes out there and says the truth with all authenticity, people will say, “Oh, that’s probably a deep fake, I won’t believe it.” Or people who trust each other will share deep fakes unknowingly, and that will break the trust between those people.
So all of these scenarios, accumulated, are heading toward a really dystopian future, where there is a social erosion of trust. And that social erosion of trust is not only affecting the future of media and the future of digital personas; it’s affecting our future as a culture. Our trust is degrading, our faith in the things we see is degrading. All of this combines into a world where, when you want to be heard, when you want to be seen, you won’t be, because everyone will think it is fake, or everyone will lose faith in videos and digital formats.
[00:15:18] Camille Morhardt: So will there ultimately be some kind of movement toward establishing provenance when videos are made? By provenance, I mean that the origin can be proved or tested somehow as the true source.
[00:15:32] Ilke Demir: Exactly. You’re just on that point; I was about to say that. Of course there is detection as a short-term answer, but for the long term there is media provenance research going on. Media provenance is knowing how a piece of media was created, who created it, why it was created, whether it was created with consent. Then, throughout the life of the media: were there any edits? Who made the edits? Were edits allowed? The whole life of the media and what happened to it will be stored in that provenance information, and because of it we will be able to believe what we see, saying, “Okay, we know the source, we know the edit history, et cetera, so this is a legitimate piece of media,” whether it is original or synthetic. There are so many creative people, like visual artists and studios, who have been creating synthetic media and synthetic data throughout their careers.
So we want to enable that too. For that purpose there is currently a coalition, C2PA, the Coalition for Content Provenance and Authenticity. That’s a coalition for media provenance with Intel, Adobe, Microsoft, Arm, and several other companies, which brings together all those beautiful minds to create open technical standards and policies around content generation and media provenance, and around how we can enable and protect trust in media all at the same time. And hopefully our future research will also follow in that direction. Sorry, I cut-
[00:17:02] Camille Morhardt: No, no. I’m just wondering, as we move toward attesting to people’s identities, like you’re saying or provenance of media, are we going to then have to rely on some sort of central authority to kind of check that this is verified or are we going to be able to do that in a distributed way or will it be a hybrid?
[00:17:26] Ilke Demir: There are several ideas about that, but our research aims to make media live on its own. The authenticity information should inherently be embedded in the media itself, like the watermarks that have been used in-
[00:17:44] Camille Morhardt: In the file, you mean?
[00:17:46] Ilke Demir: Yeah, in the file, but in the file in a way that it is protected and cannot be changed, right? We don’t want someone higher up to stamp it or approve it. We want the media to be self-explanatory and self-authenticating. So, for example, if we have all of these adversarial networks that are creating very nice synthetic media, can we actually embed the authenticity and provenance information inside that media, so that when it is rendered, or consumed, or downloaded, the authenticity information is decoded and gives you that information?
If it is a fake piece of video, or an unauthentic version, then the decoder will say, “Well, this is not the key, this is not how it was created, so this is not it.” Our research will be more focused on that. But there are also other crypto-based or blockchain systems that do that authentication and verification of generated media.
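A highly simplified stand-in for the embed-and-decode idea: the sketch below binds a provenance record to media bytes with a keyed HMAC, so changing either the media or the record breaks verification. The research described above embeds the signal inside the content itself rather than alongside it, and a real deployment would use public-key signatures rather than the shared secret assumed here; everything in this snippet is illustrative.

```python
import hashlib
import hmac
import json

# Illustrative shared key; real provenance systems (e.g. C2PA-style
# manifests) use certificate-backed public-key signatures instead.
SECRET = b"creator-signing-key"

def attach_provenance(media: bytes, record: dict) -> dict:
    """Bind a provenance record (who created it, with what consent,
    which edits) to the media bytes with a keyed MAC."""
    payload = json.dumps(record, sort_keys=True).encode()
    tag = hmac.new(SECRET, hashlib.sha256(media).digest() + payload,
                   hashlib.sha256).hexdigest()
    return {"media": media, "record": record, "tag": tag}

def verify(package: dict) -> bool:
    """Recompute the MAC; any change to pixels or record fails."""
    payload = json.dumps(package["record"], sort_keys=True).encode()
    expected = hmac.new(SECRET,
                        hashlib.sha256(package["media"]).digest() + payload,
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, package["tag"])

video = b"\x00\x01fake-video-bytes"  # stand-in for encoded media
pkg = attach_provenance(video, {"creator": "studio-a", "consent": True,
                                "edits": ["color-grade"]})
print(verify(pkg))  # True: untouched media verifies

pkg["media"] = pkg["media"] + b"tampered"
print(verify(pkg))  # False: the decoder rejects the altered video
```

This captures the "self-authenticating" property in miniature: no central authority is consulted at decode time, only the package itself plus key material.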
[00:18:48] Camille Morhardt: Is what you were talking about just before, the crypto systems, is that hardware based or software based or both?
[00:18:55] Ilke Demir: Both. Ideally, if we had all the camera manufacturers in the world come together and decide, “Okay, we need to do this hardware-based authentication for all the photos taken in the world,” that would be a hardware-based solution, and a lasting one. But of course that is maybe too long-term a solution, because we cannot get all the camera manufacturers in the world together, right?
So we need software-based solutions. Now, where do you put those software-based solutions in the life of a piece of media, right? You can try to put them at consumption, but that’s too late; by consumption time, the media was already created and edited. At the consumption level, we can only do detection, and we can also do source detection. For synthetic data, for synthetic video, we published some work on detecting the source generative model of a video, so that we can say, “Okay, this was created by FaceSwap, this was created by Face2Face, et cetera.”
That is one little piece of provenance information we can give at consumption time. Now, if we go back one step, to editing time: at editing time, maybe some of the software editing tools can embed certificates or signatures inside the data, so that it is at least known that it was edited by that software. The creation is unknown, but the editing is known. And if we go back further, if it was synthetic media created by software, we can use those authenticator-integrated GANs that I talked about. Or if it’s hardware, then we still need something in between to anonymize all of those representations.
[00:20:40] Camille Morhardt: So should artists and politicians and other people who are in media often pay close attention now to what kinds of contracts they’re signing in terms of what digital content rights they retain? Is that kind of a hot topic right now?
[00:20:58] Ilke Demir: Absolutely. I don’t know whether you saw it, but there was a news article saying that Bruce Willis gave away his deep fake rights to some company, and it was like, “Whoa, what? They can do a whole Bruce Willis movie without Bruce Willis.” Then, a few days ago, there was another article saying, “Well, that information was fake; Bruce Willis never signed a deep fake contract.” So I think people, especially in those cases, are paying more and more attention. I still wonder about all of those Tom Cruise videos: does anyone pay for Tom Cruise’s likeness when a video goes viral and they are making money off it? Is there any revenue to Tom Cruise because his face is used? I don’t know. So the laws and policies around deep fakes are currently emerging from different parts of the world, from governments.
I think in the US, deep fakes still fall under fair use, because they are in different domains and for entertainment, so they don’t need to be … I don’t want to give wrong information about that. We actually had a nice collaboration with some legal scholars at UCLA who work on that topic, so I would definitely refer those questions to them. But basically, that’s the current landscape right now.
[00:22:18] Camille Morhardt: So, last question for you is what advice would you give all of us who don’t have access to real time deep fake detectors? What are we supposed to do right now when we see things?
[00:22:29] Ilke Demir: Of course, I won’t say send them to me; we don’t have that much capacity. Anytime we see a viral video, we are like, “Okay, let’s run FakeCatcher,” and it catches it. But yes, we are having conversations with different companies about how we can present this to users, open it to the public, or at least let them use it in their workflows whenever they have to verify third-party information, or whenever platforms encounter those fake videos. So hopefully everyone at some point will have access to those, and if you want to be one of those enablers, reach out to us.
[00:23:12] Camille Morhardt: So what should I tell my kids then when they’re watching internet videos? How should I let them know or how should I prepare them?
[00:23:22] Ilke Demir: I would say don’t believe everything you see digitally. It may be incorrect, or it may be designed on purpose to deceive you. Hopefully there will be some online tools that they can consult when they see suspicious videos. So yes, keep an open eye for things that may not be correct, may not be real.
[00:23:45] Camille Morhardt: And just finally, what is FakeCatcher, before we sign off?
[00:23:49] Ilke Demir: FakeCatcher is the deep fake detection solution created by me and my collaborator, Umur Aybars Ciftci, to catch deep fakes based on heart rates. Based on PPG signals and how our veins change color with our heart rate, we can find that deep fakes are fake and real humans are real.
[00:24:16] Camille Morhardt: Ilke Demir is Senior Staff Research Scientist at Intel Labs. Thank you very much Ilke for your time. I really appreciate it.
[00:24:24] Ilke Demir: Thank you. Thank you.