Hacker News
Real time image animation in opencv using first order model (github.com/anandpawara)
255 points by abhas9 on May 26, 2020 | 31 comments



I'm a huge fan of this kind of practice, where the code for a paper is all located in a single public repository with build instructions, along with directions for how to cite it. Obviously, it's a little tough to do with some more data-intensive sources (besides GH hosting limits, no one really wants to download 100G of data if they're just trying to clone a repository), but this kind of thing sets a high standard for reproducibility of published results.


> but this kind of thing sets a high standard for reproducibility of published results.

I think making the code available is good, but I think we should be careful how we use the term "reproducibility". Pulling your repo and running it had better give the same results, but it's not the same sort of thing as building my own experimental setup according to a paper's specification. The latter gives more room for variability such that successful replication speaks more strongly to the robustness of the result, and also puts human brain power next to each step of the process in a way where weirdness might be noticed.

Replication should probably involve reimplementation, if it's to carry its traditional weight. In the event that we fail to replicate, though, having the source code for both versions is likely to be hugely informative.


I think this is a fair point, but in my experience, having a concrete replica that people can start from (and compare to the paper) can make a year's difference in speeding up progress.

Many times, I've read a paper, thought something was great, and then implemented the paper and failed to reproduce the authors' results. In the cases where I've been able to compare my implementation to a reference on GitHub, I often find the paper doesn't match the code, or a subtle data-processing step was left out. Having a replica (a commit hash and a pointer to versioned input data) can often make a huge difference in time.


Yeah, I'm certainly not saying it isn't advisable or even important. I'm just saying it's not the same thing as replication.


While I agree, I think reimplementation is a high bar for most research, especially in very niche areas.

I think extension also carries similar value. It is less grunt work, but still requires a deep understanding of the existing code. "Weirdness" should quickly become apparent.


It's incredibly satisfying to reproduce these papers. I now make Rust versions of the most interesting projects, and try to build low-latency inference pipelines for those that show potential for real-time use. Some are sketched out here: https://github.com/Simbotic/SimboticTorch

The bulk of the work to get real-time performance is moving more of the pipeline to the GPU, mostly things currently handled by numpy and some image/video transformations.
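To make that concrete, here's a minimal sketch (not code from SimboticTorch) of what moving a typical numpy preprocessing step onto the GPU can look like with PyTorch; the function name, sizes, and layout are just illustrative assumptions:

    # Illustrative sketch: resize + normalize on the GPU with PyTorch
    # instead of numpy on the CPU; names and sizes are assumptions.
    import torch
    import torch.nn.functional as F

    def preprocess_gpu(frame_bgr, size=256, device="cuda"):
        # frame_bgr: HxWx3 uint8 numpy array straight from OpenCV
        t = torch.from_numpy(frame_bgr).to(device)            # upload once
        t = t.permute(2, 0, 1).float().unsqueeze(0) / 255.0   # NCHW, [0, 1]
        t = t[:, [2, 1, 0]]                                    # BGR -> RGB
        t = F.interpolate(t, size=(size, size), mode="bilinear",
                          align_corners=False)                 # resize on the GPU
        return t  # stays on the device for the model

The point is that the frame is uploaded once and everything downstream stays on the device, instead of bouncing between numpy arrays on the CPU and tensors on the GPU.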


That's awesome! If you're interested, there's a group working on machine learning in Rust, including some working on doing GPU pipelines for it at https://github.com/rust-ml/wg . I'm not sure if any of the work being done right now is directly applicable to any of the projects that you're reproducing, but it might be worth a look!


Oh, I'm interested. Thanks for letting me know. Would love to contribute.


This truly is amazing.

I've noticed that the GPU does help a lot with inference. It would be nice if it were easier to make projects like these mobile.

Google and Apple have SDKs for running nets on phones, but it's a shame it's so hard to do things like this on the Raspberry Pi...


TensorFlow Lite runs on Raspberry Pis. It's even in the official docs: https://www.tensorflow.org/lite/guide/build_rpi
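For reference, a minimal sketch of running a converted model on a Pi with the tflite_runtime package ("model.tflite" is just a placeholder path, not a real checkpoint):

    # Minimal TFLite inference sketch using tflite_runtime on a Pi;
    # "model.tflite" is a placeholder path.
    import numpy as np
    from tflite_runtime.interpreter import Interpreter

    interpreter = Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # Dummy input with the shape/dtype the model expects
    x = np.zeros(inp["shape"], dtype=inp["dtype"])
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    y = interpreter.get_tensor(out["index"])
    print(y.shape)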


I'm working with the same model, but in a real-time pipeline developed with GStreamer, Rust, and PyTorch:

https://twitter.com/rozgo/status/1255961525187235842

Live motion transfer test with crappy webcam:

https://youtu.be/QVRpstP5Qws


Nice. I want to try something like this.


Very cool, reminds me of Avatarify, which is also based upon the First Order Model work:

https://github.com/alievk/avatarify


It looks the same, even the same images. I can only get 3 fps from Avatarify, and that's with CUDA; is this one faster?


Pretty cool. Reminds me of https://github.com/yemount/pose-animator

I would use it if there was a JavaScript port.


How can it generate teeth that look like they fit the picture?


This model is trained on short clips of people speaking. There is enough statistical information to "guess" how to fill the gap created by opened lips. I'm still amazed at how it preserves temporal coherence (how it looks from frame to frame).


Is no one else deeply afraid of this future?


I'm not at all.

Deep fakes are just like Photoshop, but instead of pictures, we can generate complex shapes in all sorts of signal domains.

If you restrict the technology, it becomes the tool of state actors. If it's wide open, it's just a toy. Society will learn to accept it just as they did with Photoshop.

I'm actually really excited by the potential it unlocks. Our brains are already capable of reading passages in other people's voices and picturing vivid scenes without them ever existing. Deep models give computers the ability to do the same thing. That's powerful. It'll unlock a higher order of creativity.


Provenance and chain of custody are everything. They've always been important, but now they're critical. Any audio or video without a solid chain of custody is now suspect. Anonymous leaks are worthless, as anything can be faked by almost anyone with a PC.

Old and busted: "pic or it didn't happen."

New hotness: "in person witness or it didn't happen."


Do I smell a blockchain application?


Cue 10 ICOs for AuthenticityCoin type things, most of which just exit scam and the rest of which don't actually work.

The real security hole for forgery is at the point of injection. Recording a forgery on a blockchain doesn't prove it's not a forgery.

One thought is a camera sensor that cryptographically signs (watermarks) photos or video frames on the sensor before they are touched by anything else. It's not perfect since a highly sophisticated adversary could get the secret key out of the chip, but it could definitely make it quite a bit harder to fake photos. Nothing is ever perfectly secure. All security amounts to increasing the work function for violating a control to some decent margin above the payoff you get from breaking the control.

I could see certified watermarking camera sensors being used by journalists, politicians, governments, police, etc.
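To illustrate the signing idea, here's a toy sketch with Ed25519 via the Python cryptography package; in a real design the private key would live inside the sensor or a secure element, and "frame.raw" is just a placeholder for the raw readout:

    # Toy sketch of signing a captured frame with Ed25519.
    # In a real camera the private key would never leave the sensor.
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()

    frame_bytes = open("frame.raw", "rb").read()  # raw sensor readout (placeholder)
    signature = private_key.sign(frame_bytes)     # shipped alongside the frame

    # Anyone with the camera's public key can check the frame is untouched
    try:
        public_key.verify(signature, frame_bytes)
        print("frame verified")
    except InvalidSignature:
        print("frame was modified")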


This is a start. It can even be done steganographically, embedded in the picture in a non-visual way, which is robust against compression and "social media laundering" (term of art for uploading then downloading from social media).

The problem is people just don't care. See "cheap fakes" like slowing down a video of Pelosi and claiming she's drunk. People actually believe that garbage. No amount of fancy math can fix that.


Me too man


Looks like the file mentioned in this step

> gdown --id 1wCzJP1XJNB04vEORZvPjNz6drkXm5AUK

is no longer accessible (too many downloads in too short a time).

Edit: For anyone else with the same problem, the file in question is "vox-cpk.pth.tar", which can be found in various places on the internet.


The Google Colab version is not really real-time, is that correct? It loads pre-recorded video. I guess that's because it isn't easy to add a real-time camera feed to a browser notebook, or what are the limitations there?


The paper and final models don't do detailed outputs justice, though, but this is still a great model for datasets with no annotations per se.


Does anyone know if using this tool to generate a music video of famous pictures singing a song would violate any copyrights? It seems like a fun exercise.


Very neat! You can crop and convert to mp4 using ffmpeg:

    ffmpeg -i test.avi -filter:v "crop=250:250:260:0" out.mp4


One of the authors is at Snap. Inquiring minds want to know: will this soon be available in Snap Camera?


Really cool, but I hoped to see C++ code for OpenCV, not Python.



