I'm probably being captain obvious here, but if this is what's being released for free, I wonder how much better a polished commercial version does, and when we reach the point where we can't trust anything we see anymore. It doesn't even have to be super perfect, even reaching the point where it takes experts about two weeks to determine if something's real or not might already be long enough to do great damage.
From a technical standpoint I think this is very impressive, and I'm also interested in creative/artsy use of this. Their "replace trees by houses" example is pretty dull but gives a good glimpse at what can be done.
AI can also be used to identify these fake videos, and it is probably better at it than humans are. There will be a rise of AI forensics.
The type of neural network they use (a GAN) works by having two networks battle each other: one tries to generate fakes, and the other (the discriminator) tries to identify them. It's a constant arms race. As the generator gets better, the discriminator also gets better. Which means, if the fake video is this good, there must be a discriminator network that identifies fakes just as well.
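The adversarial setup described above can be sketched with a toy 1-D example. To be clear, nothing here resembles the paper's model: the data distribution, the one-parameter "networks", and the learning rate are all made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "GAN": real data ~ N(4, 1); the generator just learns a shift of
# its input noise, and the discriminator is logistic regression on a scalar.
theta_g = 0.0            # generator parameter: G(z) = z + theta_g
w, b = 0.1, 0.0          # discriminator parameters: D(x) = sigmoid(w*x + b)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 0.05
for _ in range(2000):
    real = rng.normal(4.0, 1.0, 64)
    fake = rng.normal(0.0, 1.0, 64) + theta_g

    # Discriminator step: ascend log D(real) + log(1 - D(fake))
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    b += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator step: ascend log D(fake) (the non-saturating variant)
    d_fake = sigmoid(w * fake + b)
    theta_g += lr * np.mean((1 - d_fake) * w)

# After training, the generated mean should have drifted toward the real one (~4),
# at which point the discriminator can no longer separate the two.
print(round(theta_g, 1))
```

As the generator's distribution closes in on the real one, the discriminator's gradient signal shrinks, which is exactly the arms-race dynamic the comment describes.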
We did a similar project using GAN, generating images from a text description. You can see the progression of generator and discriminator battling each other, and both get better with time.
This is how their model was trained, but I think what you've said may not quite be the case.
Because the discriminator (D) and generator (G) usually compete in a minimax game, the equilibrium probability of D correctly classifying an image as fake tends to 1/2 (ignoring distributional factors). If the competing networks have enough capacity and can be stably trained, then in theory they will reach equilibrium as the data distribution from G converges to the actual data distribution. If this is the case, then the discriminator correctly identifies fake videos with a probability of 1/2.
They may not reach equilibrium (making D > 0.5), but it's not clear that the discriminator itself is a panacea for identifying fake videos/images.
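The 1/2 figure comes from the optimal discriminator for a fixed generator, D*(x) = p_data(x) / (p_data(x) + p_G(x)): when p_G matches p_data it is 1/2 everywhere. A quick numeric check with two assumed Gaussian densities (purely illustrative, not the model from the paper):

```python
from math import exp, pi, sqrt

# Optimal discriminator for a fixed generator:
#   D*(x) = p_data(x) / (p_data(x) + p_G(x))
# Illustrative densities only: data ~ N(0, 1), generator output ~ N(mu_g, 1).
def gaussian_pdf(x, mu, sigma):
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def d_star(x, mu_g):
    p_data = gaussian_pdf(x, 0.0, 1.0)
    p_g = gaussian_pdf(x, mu_g, 1.0)
    return p_data / (p_data + p_g)

print(d_star(0.0, 0.0))  # generator matches the data: 0.5, fakes are undetectable
print(d_star(0.0, 3.0))  # generator is way off: close to 1, easy to detect
```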
The whole GAN thing is useless for this though, since neither of the models has a concept of symmetry, or that those pixels represent actual solid objects in space. Modern neural nets can't represent abstract spatial and semantic concepts or reason, so the videos are full of glaring perspective inconsistencies and large-scale artifacts. So nope: AI can't help to "identify fake videos", and we don't need AI for that anyway.
The problem is that it's a game of superhuman cat and mouse.
We'll have systems arguing with each other and no way to tell which is correct. If, for example, someone is able to get a copy of the forensic AI model, they can train their decoder/generator to work against it until the results pass as legitimate. With no human ability to argue with the results of the forensics AI, we'll just trust it and pass it off as truth.
If you can clone the jury and conduct a quadrillion private trials, your chance of success in court is going to increase substantially.
It could also lead to crimes like blackmail going extinct. It would be hard to hold incriminating recordings over anyone if near-perfect audio and video synthesis were common.
Especially for public figures with lots of training data available.
Yeah... The problem is trying to explain deepfakes to your significant other when they are randomly sent what looks like a video of you cheating on them. Sure, a fake is possible, but it won't seem likely to them.
Initially, every Tom, Dick & Harry script kiddie will try to blackmail people while the public and the justice system are not really aware of what the technology makes possible.
When eventually enough of the public is made aware to not trust photos and videos anymore then these types of blackmail attempts would be less effective. But there will still be less informed gullible targets.
Surely lawyers will know about the tech soon enough to simply show the same video evidence with the judge in place of the defendant -- reasonable doubt right there.
The blackmail angle I expect will work for a while. People falsely accused of sexual crimes can have their lives ruined by the accusation.
The justice system will hopefully catch up quickly.
But as you say the media (real and social), political propaganda, etc will exploit the uninformed masses for all it is worth, so blackmail will still work. For a while.
Someone's gonna come up with a simple way to sign pics and videos with public-key cryptography, after which signed media will be the only stuff that is trusted. Of course people will still make sex tapes and sign them when they are drunk, because the signing software will be automatically integrated into the camera app, so blackmail is still a possibility.
But... if the camera app automatically signs pics/vids, that would mean a private key is available to the app without any passphrase (or one embedded in the app :O). So why not just extract the key and sign your fake vid?
Well, for something like this the user's signing key itself would probably be managed by the OS, so not extractable. When the app is done editing, it asks the user via the OS to approve the result and the OS performs the signature. You could even embed a downsample of the original that came signed from the camera hardware.
If the key is managed by the OS but is inaccessible to the user, the concept would seem to be incompatible with free software operating systems. Also, if the camera app can have anything it "makes" approved, the app itself could take a deepfake video (from the web or device storage) and have the OS sign it.
The only way I see it working is if the key is burnt into the camera hardware and applications cannot MitM it.
Why would you assume it would be inaccessible to the user? I'd expect a key management interface at least. I guess I did say "not extractable", but I meant more that random apps don't have access to it directly, but they call an api to do the signing.
> the app itself could take a deepfake video and have the OS sign it
Note how I said "it asks the user via the OS to approve the result". I would expect a modal OS dialog to let the user review and approve the content before being signed and passed back to the app.
Thinking about it, there's actually nothing stopping this from happening on today's hardware using just application sandboxing. Substitute "OS" above with "Signing App" that does the same thing (accepts media signature requests from other apps, and opens dialog to request approval from user with a preview).
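The approve-and-sign flow above could be sketched roughly like this. A real "Signing App" would hold an asymmetric key in OS-managed hardware; the HMAC device secret below is just a stand-in to show the sign/verify round trip, and every name here is hypothetical:

```python
import hashlib
import hmac
import json

# Hypothetical secret held only by the signing service, never exposed to apps.
# (A real design would use an OS/enclave-backed asymmetric key instead.)
DEVICE_KEY = b"secret-held-only-by-the-signing-service"

def sign_media(media: bytes, metadata: dict) -> dict:
    # Hash the media, bundle the hash with metadata, and sign the bundle.
    envelope = json.dumps(
        {"sha256": hashlib.sha256(media).hexdigest(), **metadata},
        sort_keys=True,
    )
    sig = hmac.new(DEVICE_KEY, envelope.encode(), hashlib.sha256).hexdigest()
    return {"envelope": envelope, "sig": sig}

def verify_media(media: bytes, signed: dict) -> bool:
    expected = hmac.new(DEVICE_KEY, signed["envelope"].encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signed["sig"]):
        return False  # envelope was tampered with or signed elsewhere
    claimed = json.loads(signed["envelope"])["sha256"]
    return claimed == hashlib.sha256(media).hexdigest()

video = b"...raw frames approved by the user in the OS dialog..."
token = sign_media(video, {"device": "camera-1", "ts": "2018-08-21T12:00:00Z"})
print(verify_media(video, token))                 # True
print(verify_media(video + b"deepfaked", token))  # False
```

The key point matching the thread: apps only ever see `sign_media`'s output, never `DEVICE_KEY`, so they can request signatures (gated by the approval dialog) but can't forge them offline.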
It was pretty limited though. For split seconds the videos seemed real, but the seams showed soon enough.
Fake celebrity porn has been on the internet since 1996 at the very least. It's always been crummy; but porn in general requires a thick suspension of disbelief and an intense focus on a partial object of desire (what Lacan calls the objet petit a) that blurs everything else.
Don't you think that a blockchain that works for anything other than a rather useless currency should be created before suggesting one for such a use? I see comments all the time about how we should use blockchain for this and that, and yet far simpler uses for blockchain haven't yet worked out.
A blockchain is actually a reasonable part of the solution here (just not necessarily creating a new one). What a blockchain really does is decentralized timestamping. Usually you combine it with a bit of cryptography to process flows of money, but in the same blockchain you can just write "I know hash XXXX", and if you later produce a work with that hash, you can prove it already existed at the point your message was written into the blockchain.
That's not all you would need for verification, but it is a big help.
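The commit-now, reveal-later idea is just hashing; a minimal sketch:

```python
import hashlib

# "I know hash XXXX": publish only the digest now; revealing the file later
# proves it existed when the digest was recorded (e.g. in a blockchain tx).
def commitment(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

original = b"raw footage straight from the camera"
published = commitment(original)  # this 64-char hex string is what goes on-chain

print(commitment(original) == published)             # True: footage predates the record
print(commitment(b"doctored footage") == published)  # False: doesn't match the commitment
```

Note this only proves *when* the footage existed, not that it's authentic; that's why it's "a big help" but not the whole verification story.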
That would require an entire team's worth of skill, preparation and work. This could be done with a few pieces of existing media and a single person behind a computer.
"Starting now"? How about 5 to 10 years ago? This is being released for free now, which indicates to me that this is now disposable tech and the authors have much, much better in their labs.
Probably not; the GAN technique wasn't even published until 2014, and this is based on other work that's been done even more recently. The field moves fast and is more open than you might expect (because it's still so academic).
If you compare OSS or free software to commercial software generally, I don’t think there are that many massive gaps. It’s mostly polish and small incremental improvements, but the underlying tech is mostly the same. Why would that be different in this case?
The gaps are massive in several electrical/silicon CAD verticals and simulation. Due to IP secrecy, no OSS Verilog/VHDL synthesis alternative exists for Vivado/ISE/Quartus and the Intel/Xilinx line of FPGAs. I don't think any practical alternatives exist for Mentor Graphics' line of silicon design or simulation software, nor have I seen any OS software capable of complex mixed domain simulations like COMSOL or Ansys - many of the pieces exist, but it takes a lot of work to verify that algorithmic physical models actually work together.
On average I agree they are on par. Sometimes OSS is better (ffmpeg), sometimes commercial stuff (adobe after effects).
In this specific case I think it might be beneficial if you can spend a lot of money on gathering training material and tweaking the network. But then again, someone here mentioned the quality 4chan's fake porn has reached, so maybe I'm wrong after all.
True but now all that stuff is pretty much produced in the private sector by government contracts with these big companies. DARPA doesn’t have better scientists than MIT and there are a lot of extremely intelligent scientists who have moral issues about working for the military.
There is no competitor, proprietary or open, that comes close to Excel. It's been relentlessly, extensively polished for years and years, and keeps gaining new features every year. And that's while sticking to the spreadsheet concept, which is very limiting.
---
Contrast for example Tableau -- it's a great idea and generated a lot of enthusiasm for a while, but never quite took off as an office package one needs to have. The normal awkwardness of its first versions is still there; they don't have the deep <whatever> that the Excel team has.
Tableau is great, but it has a much narrower use case: given one or more tables of data, generate graphs for presentation or for exploring the dataset.
In comparison, Excel can do that too (just worse), but it can also solve equations, do your company's bookkeeping, and pretty much every other task that relies mostly on numbers.
I would argue Open/LibreOffice Calc comes fairly close to Excel if you ignore the worse user interface (which is fair given the original assertion that it's "mostly polish and small incremental improvements").
The vast bulk of Adobe's advantage is in UX, not technical algorithms. Which makes perfect sense, because that tends to be the case with most F/OSS software: technically brilliant, but with a face only a programmer could love.
Yes, Adobe do have some remarkable algorithms that would be difficult to replicate (e.g. heal brush and content aware fill) but these are a small minority of Adobe's software advantage.
The one that irritates me the most is vector drawing programs: open source programs (and even paid competitors) just can't touch Adobe Illustrator for the sort of work I do. I'm sure at least 50 percent of it is familiarity and muscle memory, but I've desperately tried switching to a few different options like Inkscape or Affinity and left wildly disappointed.
You're right that UX is one of the biggest problems they have. One thing that is also hard to replicate is how well Adobe's software works together: embedding smart objects and Illustrator files in Photoshop documents, right-clicking a clip in Premiere and sending it to After Effects and back again without rendering an intermediate file, etc.
I would be interested in a Lightroom alternative if anyone can recommend one though.
Do you use the Astute plugins for AI, or the native pen tool? Affinity feels different, maybe less precise, but the functionality was way better compared with the native Adobe tool. You should try Figma's pen tool; I like it.
My uses are relatively trivial: logo design, SVG generation, basic layout work and PDF tinkering. My main need is fine control of beziers, with auto-guides to ensure consistency.
I meant something a little different. Any random dude with a random mid-level PC can download the software and produce amazing results without any special hardware or knowledge. I recommend you to try it out for yourself.
Was there a huge progress made since news about deepfakes broke early this year?
Around March/April this year, I actually did download the TensorFlow toolkit for face transfer that was used by /r/deepfakes people and tested it out (there were samples of photos of politicians included); the results were, at best, worse than what I could do in 2 minutes in Gimp. Maybe they could get better if I had an expensive GPU farm at my disposal, but I'm pretty doubtful, given that the news died down pretty quickly and no reasonable-quality faked pictures or videos have been reported since.
It's more about experimenting with your training data and other configuration. From what I've heard, getting great results takes time, but it's possible. Most of the community's focus is on porn, so I can imagine not many journalists are checking the newest results and reporting on them.
Probably Nvidia is not interested in this type of tool; this is more like "hey, look what you can do with a lot of Nvidia cards: buy a lot of them and you can do this and much more".
Yeah, but it was already the case with CGI. It's true that it's going to be easier and easier to do fake porn, fake speeches, fake voice recordings, fake videos...
From the paper:
"we have to use all the GPUs in DGX1 (8 V100 GPUs, each with 16GB memory) for training. We distribute the generator computation task to 4 GPUs and the discriminator computation task to the other 4 GPUs. Training takes ∼10 days for 2K resolution."
As I don't have a DGX1 here, training the 2K resolution net for 10 days on a p3.16xlarge instance (also has 8 V100 GPUs) would cost USD 5875 on AWS.
(USD24.48 per hour on-demand pricing * 24 hours/day * 10 days)
The DGX-1 costs $129,000, so AWS is cheaper unless you need to do it 22 times. And you can have multiple instances running at once and get all of your results in ten days, instead of waiting ten days again for each run.
Well, plus electricity. A DGX-1 draws 4 kilowatts or so, so that 10-day training run will take just about a megawatt-hour, or about $100 at retail. So the crossover point is more like 23 runs :)
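Putting the numbers from these comments together (prices exactly as quoted above):

```python
# Cost arithmetic from the thread, with the figures as quoted there.
aws_hourly = 24.48                  # p3.16xlarge on-demand, USD/hour
aws_run = aws_hourly * 24 * 10      # one ~10-day 2K training run
dgx_price = 129_000.0               # DGX-1 price as quoted
power_per_run = 4 * 24 * 10 * 0.10  # 4 kW * 240 h at ~$0.10/kWh = $96

print(round(aws_run, 2))            # 5875.2

# Buying the box only wins once its price is amortized over enough runs,
# and each on-prem run still pays for electricity:
break_even_runs = dgx_price / (aws_run - power_per_run)
print(round(break_even_runs, 1))    # 22.3, i.e. the "more like 23 runs" above
```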
DGX 1V has $7500 worth of CPU alone to feed the gpus. Throw in 8 TB of nvme ssd for training data and you're looking at something more like 1/5th the cost.
V100 has ~50% higher memory bandwidth than 2080Ti, so you probably wouldn't get 80% of the performance. Also, only two 2080Ti can be connected via nvlink.
Their GitHub README says 24GB; 12/16GB requires cropping, and performance isn't guaranteed. I've only seen the P100 with 16GB each, and it's the big Quadros that have 24.
Seeing the example of one facial pose video transcribed to three different looking women, I'm imagining a future where Netflix does a/b testing on its shows, using similar tech to swap out different "actors" to find which one resonates with audiences best.
They could even generate a new "cast" for each market, after only shooting the show once.
- Porn, yeah, first application you can think of, there are already some startups doing it.
- Doubling actors, and applied to sound, maybe you could translate from one language to another but kind of keeping the accent and tone.
- Propaganda and misinformation. Now you can get your enemy to say and do whatever you want, on video.
- Photo-realistic games. Create a rough 3D model of a scenario and train the AI for it. Instead of photo-realistic rendering with math, render it with the AI based on a rough render, in real-time.
> Photo-realistic games. Create a rough 3D model of a scenario and train the AI for it. Instead of photo-realistic rendering with math, render it with the AI based on a rough render, in real-time.
According to last month's Nvidia RTX presentation/launch event [1], they are going to do something similar quite soon. Games will ship with DNNs pre-trained offline on extremely high quality renderings. The game itself renders at a lower resolution (limited by the performance needed for proper raytracing) and uses the DNN to upscale it.
I wonder, since the NN cores of the GPU are used for real-time raytracing, will they be able to run custom NNs possibly not related to visual stuff in parallel to the ray-tracing stuff?
Edit: found the answer on the Internet; apparently the RT (raytracing) cores are different and separate from the Tensor (NN) cores on the RTX.
any ideas about porn startups that work on anything related to this?
i know pornhub and their network banned deepfakes and there’s also some work being done on detecting deepfakes.
I am a bit surprised how shallow the comments on this one are.
Look closely, while it does generate videos of a passing similarity, they aren't "photorealistic" in the slightest. They are good locally across time and space domains, but globally they are as far from realism as Doom 2 was.
The only explanation for the attitude I see in this submission is that most IT people trained themselves to spot CGI by looking at local artifacts, assuming that global artifacts won't happen because the stuff in the scene is reasonable. There is no "stuff in the scene" with these videos; it's just mindless vector manipulation with no underlying world model. Cars wave around, trees grow a foot from each other and behave in ways incompatible with 3D perspective.
Relax, it'll require at least another AI/ML revolution (or even several) to achieve photorealism.
What media would someone collect now to be used in the future to reproduce the likeness of loved ones? Video clips of them moving? Talking? Different poses of pictures? Reading the dictionary out loud to get vocal patterns?
Heck with impersonating the POTUS. What about a lost friend, sibling or parent?
Yeah. https://www.youtube.com/watch?v=fkE6RBlfbXA This isn't the worst one, there was a time when PKD's daughter interviewed the head and he went off on a rant about how much he disliked his family. That was really rough.
The level of realism can be gauged from the examples they provide right there on the page. Of course your results may vary depending on the bulk of realistic source images you use as initial data.
You have the code right there on Github, just install it on some PC with powerful GPUs (or rent one), tune some parameters, train the network and you can do the same things.
With edge detection. Normally edge detection means looking for local sharp changes in brightness and marking them with a white spot. The edge detection used in this case looks more sophisticated to me; I don't know how it works.
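The "local sharp changes in brightness" baseline the comment describes can be written as a few lines of gradient thresholding. This is a crude stand-in; whatever edge/label maps the paper actually feeds its network are richer than this.

```python
import numpy as np

# Minimal "mark sharp brightness changes" edge detector: first differences
# in x and y, gradient magnitude, then a threshold.
def simple_edges(img: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    gx = np.abs(np.diff(img, axis=1, prepend=img[:, :1]))  # horizontal change
    gy = np.abs(np.diff(img, axis=0, prepend=img[:1, :]))  # vertical change
    return (np.hypot(gx, gy) > threshold).astype(np.uint8)

# Dark square on a light background: only the square's border lights up.
img = np.ones((8, 8))
img[2:6, 2:6] = 0.0
edges = simple_edges(img)
print(edges.sum())  # counts only the boundary pixels
```

Real pipelines (e.g. Canny) add smoothing, non-maximum suppression and hysteresis on top of this same gradient idea.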
Yes; you also might want to look up DLSS. They use a pretrained upscaling network on the GPU to generate 4K from a native 1440p picture: an instant ~20-50% performance bump with free AA.
Of course, this being Nvidia, they didn't implement it universally; you need to sign up for API access to black-box GameWorks-like scam programs in order to implement it in your game.
I'm scared. I can't trust anything I haven't taken myself. The problem is that other people don't even know this technology exists, and if you tell them about it, you're a liar.
I don't think there's going to be a long period where this technology is being used and isn't widely known about. It'll be used extremely quickly to abuse people using their Facebook pictures, and once there's a Facebook angle the media will be able to run with it and everyone will get it.
Is there a way a person could create a QR code signed with a private key? The QR code would contain the date, time, and other metadata, verifiable with a public key to prove the QR code in the video was made by, for example, the person in the video. Would this "prove" the video was real? I guess the speech could still be manipulated, so the whole transcript would have to be signed in this way...
I assume they are talking about signing/watermarking a video in a way that survives video encoding/lossy transmission. A QR code probably wouldn’t work for various reasons but is an easy mental analogy to think of.
It would be interesting to see the applications of this technology with the use of thermal cameras. Extracting environments from thermal imaging would be nice.
One of the examples translates a full human pose to a video of a dancer. If the network were trained on the facial pose/features only, would that recreate something like the facial reenactment in http://niessnerlab.org/projects/thies2016face.html (source code for face2face is not public)?
Yes, but it just cuts out the face and pastes it on a different person/background. It does not do full reenactment where you keep the entire target video environment.
Does anybody know
a) the performance (e.g. introduced latency) and processor requirements for the client/input (e.g. is real time canny edge detection good enough and how fast would that run)? and
b) what the latency impact on the NN side to build the images (e.g. how many ms are we talking about?)
This got me thinking: in future we will probably stop streaming video and instead just send simple vectors. The data then gets rendered in real time into actual video. The customers will even be able to customize the movie: pick their favorite actors, the environment, etc.
I suspect that a combination of the two is where it's at, ie, store a lossy classical compressed version, then remove the artifacts/dream up details with deep learning
Can we expect CUDA to be the x86 of PCs and servers? Literally everything defaults to CUDA and Nvidia's libraries. I don't even see a contender trying to compete. I don't even see AMD's ROCm being used, or even mentioned anywhere.
Well, in this case the results are pretty good locally but have pretty obvious artefacts too, especially in the synthesised road videos: look at the trees, or even more at the lane change in the linked video.
You say 'pretty obvious artifacts' but that is because you have some clue about the process and know what you are looking for and are interested enough to look.
I tried pointing out to some friends some really bad artifacts in a video we watched, and they just could not grasp it. They couldn't see what I was seeing as it didn't look out of place to them. It isn't for lack of intelligence, they just didn't care enough to understand. That pretty much describes vast swathes of the population.
Show a video made with the above technique to anyone with strongly held political/ideological beliefs and an inclination to accept "alternative facts" over actual facts, and videos using these techniques will spread like wildfire and be almost impossible to refute!
I think a layman would be perfectly capable of understanding "computers can now create fake videos so real you can't spot them". Not that that claim is quite true yet.
Chances are there would be corroborating evidence to the contrary, since it's not often the President does or says anything without multiple witnesses and cameras being involved. In that case, it might be easier to impersonate them through their social media accounts.
The real danger here, if anything, is impersonating common citizens or lower ranking government officials.