How an A.I. ‘Cat-And-Mouse Game’ Generates Believable Fake Photos (nytimes.com)
197 points by gk1 on Jan 4, 2018 | 81 comments



Good article and great tech. However, I don't know if I believe the results are as good as they claim. Many of the pictures look a bit off to me, like they all have dead eyes. Maybe celebrities generally look like that anyway, so it is being true to form. :)

In particular, I think this guy is missing a pretty significant part of his head: https://static01.nyt.com/newsgraphics/2017/12/26/ai-faces/8e...


GANs, which have been called the most interesting idea in ML of the last decade, were invented by Ian Goodfellow. I met him on Reddit a few years ago. I was supposed to get private ML tutoring from him, right around the time Andrew Ng opened the first Coursera course. I didn't get lessons because I gave up and eventually took the MOOC. But it's amazing to know we share the same forums and sometimes exchange a comment or two.

The great idea behind GANs is that they replace one of the hardest-to-understand parts of a neural net - the loss function - with another neural net, thus making the loss function learnable. This opens the door to a kind of unsupervised learning that was impossible to make work before. GANs are also very important because they are almost like reinforcement learning (actor + critic = RL, generator + discriminator = GAN), and RL is supposed to be the way to AGI.

The most famous problems with GANs are instability during training and mode collapse - which is like a student studying specifically for an exam (and not learning in general), thus optimising for the test instead of the real thing.
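
To make the learned-loss idea concrete, here is a minimal sketch in Keras (a toy of my own, not the article's method; the 64-dim vectors stand in for images): the discriminator D is itself a trainable network, and the generator G is trained through D's judgment rather than through a hand-written loss.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    latent_dim = 32

    # G: noise -> fake sample (a 64-dim vector stands in for an image)
    G = keras.Sequential([keras.Input(shape=(latent_dim,)),
                          layers.Dense(64, activation="relu"),
                          layers.Dense(64)])

    # D: sample -> P(real). This network plays the role of a learnable loss.
    D = keras.Sequential([keras.Input(shape=(64,)),
                          layers.Dense(64, activation="relu"),
                          layers.Dense(1, activation="sigmoid")])
    D.compile(optimizer="adam", loss="binary_crossentropy")

    # Stacked model: train G so that D labels its output "real" (1).
    D.trainable = False
    gan = keras.Sequential([G, D])
    gan.compile(optimizer="adam", loss="binary_crossentropy")

    real_data = np.random.randn(1000, 64) * 2.0 + 3.0  # stand-in "real" dataset

    for step in range(1000):
        z = np.random.randn(64, latent_dim)
        fake = G.predict(z, verbose=0)
        real = real_data[np.random.randint(0, 1000, size=64)]
        # 1) train the "loss network" D to separate real from fake...
        D.train_on_batch(np.concatenate([real, fake]),
                         np.concatenate([np.ones((64, 1)), np.zeros((64, 1))]))
        # 2) ...then train G against D's current judgment
        gan.train_on_batch(np.random.randn(64, latent_dim), np.ones((64, 1)))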


> The most famous problems with GANs are instability during training and mode collapse - which is like a student studying specifically for an exam (and not learning in general), thus optimising for the test instead of the real thing.

I must confess I haven't worked with GANs yet, but isn't that the whole point of GANs? The student is optimising for the test while the teacher is learning how to make tests as similar to reality as possible?

If I understand correctly, the main challenge is finding a way to allow teacher and student (well, generator and adversary) to learn at a similar rate, so that one doesn't stop learning because its competitor is too advanced. Is that correct?


> but isn't that the whole point of GANs?

not quite, but you're on the right path.

think about it this way: you (the generative model) are trying to predict a unit gaussian, which is just a fancy way to say bell curve. you get +1 if you predict a number in this distribution (eg 0.1 or -0.5, which are within one standard deviation of the mean of 0); you get -1 if you predict a number that's "far" from this distribution (something like 40, which has an infinitesimally low probability of being drawn from a unit gaussian).

mode collapse, then, is when you predict 0 all the time. yes, you are technically correct, but you've failed to learn the true distribution.

obviously i've simplified this quite a bit and anthropomorphized the model, but i hope you get the gist. otherwise, the original paper (https://arxiv.org/abs/1406.2661) is refreshingly easy to read.
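
to put rough numbers on that, a quick numpy sketch (my own toy, not from the paper):

    import numpy as np

    def unit_gaussian_pdf(x):
        # density of the standard normal at x
        return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

    print(unit_gaussian_pdf(0.1))   # ~0.397 -- a very plausible draw
    print(unit_gaussian_pdf(-0.5))  # ~0.352 -- also plausible
    print(unit_gaussian_pdf(40.0))  # ~0.0 (underflows) -- "far" from the distribution

    # mode collapse: always answering 0.0 maximizes the density (~0.399)
    # but has zero variance -- the true distribution was never learned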


Thanks!


> private ML tutoring from him

> I didn't get lessons because I gave up and eventually took the MOOC

Udacity still hasn't gotten him on board. I took the DLF ND because of the tutoring they promised, did GANs as my first project to be in the queue, then graduated later with still no mentoring sessions. So you didn't miss anything by dropping out. How were Ng's new lessons? Worth taking if I've already done DLF + fast.ai?

BTW, GANs' main use might be allowing almost fully unsupervised learning by extending small datasets with believable data.


> BTW, GANs' main use might be allowing almost fully unsupervised learning by extending small datasets with believable data.

I've wondered if dreams are basically this. Your brain uses its world-model-prediction subsystem to generate plausible inputs against which to train its action-generation-policy subsystem. Then, in real life, the action-generation-policy subsystem can react much more appropriately and quickly to real events.

Also, toddlers' stream-of-consciousness babbling when they first start talking. They narrate everything and more than once I've wondered if it's essentially them generating their own verbal training data. When they start talking to themselves their pronunciation, grammar etc. start improving much more rapidly.


> RL is supposed to be the way to AGI

Could you expand on that? Folks like LeCun & Chollet seem to disagree strongly. Just this week Yann posted about unsupervised modeling (with or without DL) being the next path forward, and described RL as essentially a roundabout way of doing supervised learning.


RL/DRL assumes the world is Markovian, i.e. the past doesn't matter given the current state, which is way too simple. It requires a huge number of tries/episodes and a properly tuned exploration-exploitation ratio. It is somewhat based on biological reinforcement learning, so there might be a basis in reality, as there is with convolutional neural networks and the visual field maps in the visual cortex (even if it's a very rough approximation). DRL is the technique that allows modeling decisions; so for prediction you have CNN/RNN/FCN, for generation GANs, and for decisions DRL; together they are the closest thing to AGI we have right now.


> RL/DRL assumes the world is Markovian, i.e. the past doesn't matter given the current state, which is way too simple.

There are plenty of RL papers using RNNs and some types of memory networks.


Likely as value function approximators for one piece of the whole algorithm (as is the case with DQN/DDQN). However, the main algorithm is likely using a variation of the Bellman equation, which assumes the Markov property and gives strong guarantees about convergence.


If you're using DQN or pretty much anything in DRL, you don't have any guarantees about convergence in the first place, and using an RNN does give you the history summary you need (at least up to the minimum error achievable with that fixed-length summary; not that that is any more likely to converge than the overall DRL algorithm is).


I meant that under the Markov assumption, the value iteration used to solve the Bellman equation is guaranteed to converge. So it makes the math people happy, even if that property holds neither in the real world nor in the problem they are trying to solve, and the "deep" in DRL is just heuristics, though surprisingly they work in many cases.
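
For concreteness, here is a toy value-iteration sketch (mine, not from any paper) showing exactly where the Markov assumption enters: the Bellman backup conditions only on the current state, never on history.

    import numpy as np

    n_states, n_actions, gamma = 3, 2, 0.9
    # P[s, a, s']: transition probabilities; R[s, a]: rewards (random toy MDP)
    P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = np.random.randn(n_states, n_actions)

    V = np.zeros(n_states)
    for _ in range(10_000):
        # Bellman backup: depends on the current state only (Markov assumption);
        # with gamma < 1 it is a contraction, hence the convergence guarantee
        Q = R + gamma * (P @ V)   # Q[s,a] = R[s,a] + gamma * sum_s' P[s,a,s'] V[s']
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-10:
            break
        V = V_new
    print(V)  # optimal state values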


that is true: popular rl techniques (eg policy gradients) are very similar to "vanilla" supervised learning techniques and architectures, but they are unsupervised in the sense that they require zero human input.

alphago zero is the canonical example of tabula rasa machine learning.


Even better: https://static01.nyt.com/newsgraphics/2017/12/26/ai-faces/8e...

That's one heck of a receding hairline - receding right out of the plane of existence.


You were primed to be looking for flaws by the nature of the article. It wouldn't be hard to come up with a context where each and every one of the 3x3 grid of pictures in that article was accepted at face value.


They might work as thumbnails, but these are terrible when blown up to full size. When given both images, I was trying to find one that might be real, thinking it could be some freaky filter or something. And I still had a "these are terrible fakes" feeling.

Even the 'best' headline image fails: the eyes are not the same size, and the rest of the face just looks off.


Did you even read my comment? They're not perfect, but you were expecting them to be fake. Someone not told there would be computer-generated images would be considerably easier to fool.

Also, probably the bigger risk is not that you'll be shown an entirely fabricated image, but rather that someone could convincingly be inserted into an existing image.


I was not thinking about fake images when looking at the article; this was pure instinctive revulsion. It's easier to avoid the uncanny valley with stills than with motion, but some of these fall deep into it, and many others don't even make it that far.


In most cases the hair gave it away for me, especially at the contours. The picture you linked is an extreme example.

However, hand them to a good Photoshop artist, as happens with most celebrity pictures, and I'm sure these issues would be fixed in no time.


The most likes I ever got on OKCupid was when I used a GAN-generated celebrity as my profile picture. Just one girl noticed something wasn't quite right.


The test of believability they give in the article is also bullshit. Both of the options are fakes, and they both even look like fakes. Her hair and forehead don't make sense; his mouth and ears don't make sense. They don't match them up against a real picture because people's performance on that task would contradict the headline.


> The test of believability

yep! that is the fundamental limitation of adversarial networks. there's no good measure or "loss", as it's highly subjective.


And the ears are not "compatible" either. Could this be the reason they prefer women's faces (with long hair that covers the ears) in the article?



Several of them seem to be reusing many features from specific celebrities. It may just be me, but there is a very strong similarity to Paul Walker, Liv Tyler, Michael Douglas and Adam Sandler in some of these. I wonder if it's a result of overfitting.


To me the fine details are incongruent: the grain of the hair sporadically changes direction, and patches of skin have different qualities. It looks a bit like Frankenstein's work.


The hair was the giveaway to me. I stared at the two "which one is real" images for a couple minutes thinking, "They both have that fake-looking wavy hair." I thought for sure neither was real.

Cheap trick NYTimes. Cheap.


look at their foreheads - there's still blurry, half-generated wavy hair texture on their skin.


For those looking for a well-commented you-don't-need-a-PhD-to-understand implementation of GANs + variants (using Keras), I recommend the examples in this repo: https://github.com/eriklindernoren/Keras-GAN


Scary to think where this will be in 10 years. Perhaps even video evidence will be hard to believe anymore. How do you convict someone once this technology is mature?


There was a post here a few days ago about amateurs (people who had never even touched Python before, let alone TensorFlow) using deep learning to generate fake celebrity porn. The results are actually pretty believable:

https://news.ycombinator.com/item?id=16040463


Is this safe to view at work?


Yes. The first link is to a 4-day-old post on HN, which links out to another SFW writeup. The title of that post (AI-assisted fake porn being used by people on Reddit for self-completion) is worth keeping off the monitor if you don't want someone to quickly scan the word 'porn' on your screen, but there are no NSFW photos or content.


There's a whole Radiolab series about this called The Future of Fake News: http://futureoffakenews.com/


And how do you prove someone's innocence if anyone can generate a believable fake crime video?

Supporting counter-evidence will become that much more important.


You might be able to use this same technology to counteract this.

If you generated content, you would have a baseline for testing an AI's ability to spot fake content. You could use videos and pictures like these to train a learning AI to spot discrepancies, then report its findings in detail.

Makes me wonder if there is a future in forensics for this type of technology.


This network was trained using an adversarial approach. What that means is that a second network that does exactly what you say was used to train the first.

They kept training until they created images that could reliably fool the discriminator. A more powerful discriminator would just be used to create better fakes.


If there were a government conspiracy to put you away, most of the time they wouldn't need fake video evidence to convict you - just a bit of perjury by officer witnesses.


There's going to need to be some kind of blockchain-style tech that allows the source and veracity of video to be determined.


A blockchain in this case would just be a timestamping service, to prove the minimum time that elapsed since the image existed. That’s only useful if the time to create a fake is significantly more than the resolution of the chain - 10 minutes for bitcoin, 15 seconds for ethereum.

But that only makes sense if you know ahead of time that the information will be valuable, and it only proves the age if there is sufficient hash power on the network, so a “private” blockchain would not be viable.
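
For what it's worth, the local half of the scheme is just hashing; a sketch (the filename is hypothetical, and actually anchoring the digest on a chain or timestamping service is the part left out):

    import hashlib

    def image_digest(path):
        # SHA-256 of the raw file bytes; this is what you'd anchor on-chain
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    digest = image_digest("photo.jpg")  # hypothetical file
    print(digest)  # publish this; anyone can later re-hash the file and compare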


Imagine how this could be misused to start wars:

https://www.youtube.com/watch?v=uIvvHwFSZHs


Heinlein's Fair Witness?


This side of the house is white.


You cannot infer that the type of structure supporting the visible surface is a house, or part of one.


By that scheme you cannot infer it is a surface either. But that was the example given in the book.


I don't understand the images associated with that article. They purport to show the progressive refinement of the output over a series of days. But the figure changes dramatically from image to image, all the way to the end of the run.

At the very least it seems the output is not stable: a human has to decide when to stop the Wheel of Fortune. For the NNs I'm used to, it looks more like a series of images taken from different training sets or parameters.

Caveat: I've done a lot of ML, but not GANs specifically. Is this common? How do you solve the 'where to stop' problem if the output is so unstable?


> They purport to show the progressive refinement of the output over a series of days. But the figure changes dramatically from image to image, all the way to the end of the run.

What they are probably doing is showing snapshots of the same noise vector (== random seed) at various epochs. Since the mapping of noise vector ~> face is totally arbitrary, the ProGAN is free to vary it as it pleases; thus, some but not perfect stability. I saw the same thing messing around with anime GANs: a fixed set of noise vectors would show the anime faces change eye or hair color etc. (there's a sketch of this trick at the end of this comment).

> At the very least it seems the output is not stable: a human has to decide when to stop the Wheel of Fortune.

Yeah, you can't do principled early stopping with GANs, really, because there's no held-out set and the loss is changing. I always ran until it diverged or I became impatient, and similarly with ProGAN: they ran as long as they could (takes like a week on big GPUs). To some extent, if you're using Wasserstein losses, the discriminator loss is supposed to be meaningful as a kind of absolute distance between the true image distribution and the generator distribution so you can do early stopping like 'stop if no improvement for 3 epochs'. (This is just in the pure generative approach; if you're using GANs for a semi-supervised application, presumably you can do early stopping as usual based on whatever you have held-out.)
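
As a concrete version of the fixed-noise snapshot trick, a minimal sketch (the checkpoint filenames and the 512-dim latent are assumptions on my part; any saved Keras-style generator would do):

    import numpy as np
    from tensorflow import keras

    # Fixed seed -> the SAME 16 noise vectors every run, so any change in the
    # output faces comes from the generator's weights, not from resampling.
    z = np.random.RandomState(0).randn(16, 512)

    for epoch in (1, 5, 10, 18):
        G = keras.models.load_model(f"generator_epoch{epoch}.h5")  # hypothetical checkpoints
        faces = G.predict(z, verbose=0)
        # save or plot `faces`; the identity drift across epochs is exactly
        # the instability being discussed in this thread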


I am not sure "unstable" is the word I would use. Sure, even after training for days the GAN produces some not-so-realistic images, but the rate at which it generates those gradually decreases over the training period, and the images get more "realistic".

> How do you solve the 'where to stop' problem if the output is so unstable?

Looking at the discriminator loss would be a good start for that.


It's not the quality I was referring to. Look at the main image sequence. The images from 0 to, say, Day 5 show the kind of progressive refinement I expected: the network is improving its image over time. Each image is a refinement of the previous.

But compare the images from Day 5 to the end. The eye colour is changing and then changing back. As is the background. And the hair colour. The position of the parting. Whether the mouth is closed or showing teeth. Day 16 is not an intermediate point between Day 9 and Day 18.

If it ran for another couple of days, would we get another version like Day 16?

That's what I mean by instability.


Ah, I understand what you are saying. The instabilities could be explained by the batches sampled during those training days and by the generator's input. Training a GAN is not very straightforward, and even minor changes in batch sampling can produce vastly different generated images.


One application I imagine for this is a future of game development similar to the movie Inception, where an "architect" designs the layout and setting and then all the details are filled in ad hoc by the computer's "imagination".

Today it's faces that feel familiar but aren't real. Tomorrow it's whole cities that feel familiar but aren't real. The cities are filled with people you swear you've seen before. Perhaps the details are tailored to you personally, based on the corpus of photos you've posted online.


This is happening. https://www.youtube.com/watch?v=1Ea57XERywM&index=6&list=PLc...

"Narrative Dungeon Design"


The paper was published in October and is titled "Progressive Growing of GANs for Improved Quality, Stability, and Variation": http://research.nvidia.com/publication/2017-10_Progressive-G...


And the one-hour movie of celebrity faces generated by the Progressive GAN (ProGAN) is here:

https://www.youtube.com/watch?v=36lE9tV9vm0

The most amazing AI video I have ever seen, actually. I spent hours staring at it; it works great as a background for many pieces of music. You can think of it as the AI version of the burning log video.


>> “QUESTION: Look at the two photos below and see if you can figure out which person is real.”

>> “ANSWER: Sorry! This was a trick question. Both images were generated by computers.”

Not really a trick question when, even if you know they're both fake, the only way to confirm you're right is to be wrong.


I’d also like to comment on the “ha, fooled you!” tactic used in this article, where the author asks the reader to choose the photo of the real person from two given photos and then reveals that, gasp, both are computer generated.

Whenever I run into this often-used tactic in papers and talks, I can’t help but feel – no, the author didn’t just convince me of their point. Instead they convinced me that they don’t value being trustworthy. Often I will just stop reading the article right then. Or if I do continue I will become unforgivingly skeptical of any claim that doesn’t provide a citation that is independently verifiable.

Use of the tactic feels particularly peculiar in an article that itself grapples with the implications of a future in which photos and videos are no longer trustworthy, a future in which personal reputation will be more meaningful.


Yeah, I thought both looked a tiny bit off. I think it has to do with the reflection in the eyes, which is slightly inconsistent, among other things.


Maybe so (they fooled me), but you were already prepped to scrutinize them. To the point others have made, we’ll soon need to be constantly prepared to assume fakery.

The technology of fakery is rising to meet the “everything is fake news” moment.


I immediately picked the right image, because I saw whisker stubble on the left, and I already knew that image-generation AIs seem to have a thing for painting whisker stubble all over anything even remotely resembling a male face.

Surprise! Guess I should have considered the possibility of a trick question.


So "cat and mouse" is the layman’s term for "adversarial network"?


No, it's the NYT's made-up term. "Arms race" is the traditional term.


Great overview. I especially appreciate that they linked straight to the paper instead of a popsci/buzzfeed regurgitation of the results.


The faces are pretty good, but the ears and craniums are awful, presumably because of a dependency on neighboring pixels and confusion from diverse background images. Why ruin their work by including the garbage parts in the presentation/claims? And why not learn foreground and background separately, and mask them together?
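
The masking idea is basically alpha compositing. A minimal sketch, assuming you already have a foreground, a background, and a soft matte as float arrays:

    import numpy as np

    def composite(fg, bg, mask):
        # classic alpha compositing: mask=1 keeps foreground, mask=0 keeps background
        return mask * fg + (1.0 - mask) * bg

    H, W = 64, 64
    fg = np.random.rand(H, W, 3)    # stand-in generated face
    bg = np.random.rand(H, W, 3)    # stand-in generated background
    mask = np.random.rand(H, W, 1)  # stand-in soft matte
    out = composite(fg, bg, mask)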


I would love this as a service for generating fake users.


See also: https://news.ycombinator.com/item?id=16040463 -- fake porn faceswap generated by AI


Excellent images and the more I think about what this could be used for the creepier it gets. One day, truly, software will be very dangerous.


And it makes you wonder: what is the current legal framework preventing this kind of tool from being misused? How does it differ across countries/unions?

How can such a thing be enforced to begin with?

Are companies/labs/universities/individuals themselves the only thing standing between fair play and massive misuse of realistically generated media?


Mr. Hwang believes the technology will evolve into a kind of A.I. arms race pitting those trying to deceive against those trying to identify the deception.

That's like a chess game. We have seen AlphaGo and other MCTS implementations take the "trying to detect the deception" into account.

By the time the image is generated, it would have already been factored in.


> the technology will evolve into a kind of A.I. arms race

Ha! Google Brain organised a competition on "Adversarial Attacks and Defenses" within the NIPS 2017 conference.

Reminds me of Harry Potter learning magic attack and defence arts at Hogwarts.


Where does AlphaGo try to detect deception? What is deception in perfect information games?


https://www.youtube.com/watch?v=XaQu7kkQBPc

Imagine an automated danger-recognition system at, say, an airport. These kinds of deception attacks could cause problems for such systems. Imagine if suddenly 10, 20, 100 airports around the globe all recognized weapons, bombs or other dangerous items. I can imagine the panic and huge news headlines badmouthing AI.

People don't trust AI. These kinds of errors could only delay proper integration, which in many ways could enhance the way we live.


I’d be really worried if I were a photo model.


We've made good looking people obsolete.


These are pretty damn good, but it seems to me like the program is sort of over-optimizing the pictures: to my eye, the pictures from the 5th to the 7th day are the most realistic.

After that, it feels to me like the "realness" slowly degrades.


Great PR, but none of the faces look quite human: the weird position of the nose, the strange curly artifacts around the hair. There is still work to do to trick the most fundamental tool of the brain, recognizing a fellow human face.


“Believe nothing you hear, and only one half that you see.” ― Edgar Allan Poe


I found the Obama video at the end very interesting. It would be a neat next step to map non-Obama audio to the generated video. For example, pull audio from an Obama impersonator.


> pull audio from an Obama impersonator

We can already impersonate voices with neural nets. Timbre and style can be cloned, and this tech is being used commercially by Baidu at the very least (keyword: Deep Voice 3).


Watching the progress of this system reminds me of that scene at the beginning of The Thing where The Thing almost, but not quite, mimics one of the humans.


Very interesting. Is this kind of tech something that mere mortals can play with?



