I'm a huge fan of this kind of practice, where the code for a paper is all located in a single public repository with build instructions, along with directions for how to cite it. Obviously, it's a little tough to do with some more data-intensive sources (besides GH hosting limits, no one really wants to download 100G of data if they're just trying to clone a repository), but this kind of thing sets a high standard for reproducibility of published results.
> but this kind of thing sets a high standard for reproducibility of published results.
I think making the code available is good, but I think we should be careful how we use the term "reproducibility". Pulling your repo and running it had better give the same results, but it's not the same sort of thing as building my own experimental setup according to a paper's specification. The latter gives more room for variability such that successful replication speaks more strongly to the robustness of the result, and also puts human brain power next to each step of the process in a way where weirdness might be noticed.
Replication should probably involve reimplementation, if it's to carry its traditional weight. In the event that we fail to replicate, though, having the source code for both versions is likely to be hugely informative.
I think this is a fair point, but in my experience, having a concrete replica that people can start from (and compare to the paper) can make a year's difference in speeding up progress.
Many times, I've read a paper, thought something was great, and then implemented the paper and failed to reproduce the authors' results. In the cases where I've been able to compare my implementation to a reference on github, I often find the paper doesn't match the code, or a subtle data processing step was left out. Having a replica (a commit hash and a pointer to versioned input data) can often make a huge difference in time.
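The "commit hash plus versioned input data" idea can be made concrete with a tiny manifest check: record a checksum of the exact input bytes alongside the pinned commit, and refuse to run on anything that doesn't match. This is just an illustrative sketch with a made-up commit and inline stand-in data, not anyone's actual tooling:

```python
# Minimal sketch of pinning inputs for replication: a manifest records the
# code commit and a checksum of the input data, and a check rejects any
# data that differs from what the paper used. Commit and data are hypothetical.
import hashlib

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

data = b"subject,score\nA,0.91\nB,0.87\n"  # stand-in for the versioned input file

MANIFEST = {
    "commit": "3f2a9c1",                   # hypothetical pinned commit hash
    "data_sha256": sha256_of(data),        # checksum of the exact input bytes
}

def check_inputs(candidate: bytes, manifest: dict) -> bool:
    """Replication starts from exactly the bytes the original run used."""
    return sha256_of(candidate) == manifest["data_sha256"]

assert check_inputs(data, MANIFEST)               # pristine data passes
assert not check_inputs(data + b"C,0.5\n", MANIFEST)  # any change is caught
```

Even something this small rules out the "subtle data processing step was left out" class of mismatch, because you know the divergence is in the code, not the inputs.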
While I agree, I think reimplementation is a high bar for most research, especially in very niche areas.
I think extension also carries similar value. It is less grunt work to do, but still requires a deep understanding of the existing code. "Weirdness" should quickly become apparent.
It's incredibly satisfying to reproduce these papers. I now make Rust versions of the most interesting projects. And try to make low-latency inference pipelines for those that show potential for real-time use. Some are sketched out here: https://github.com/Simbotic/SimboticTorch
The bulk of the work to get real-time working is to move more of the pipeline to the GPU. Mostly things handled by numpy and some image/video transformations.
That's awesome! If you're interested, there's a group working on machine learning in Rust, including some working on doing GPU pipelines for it at https://github.com/rust-ml/wg . I'm not sure if any of the work being done right now is directly applicable to any of the projects that you're reproducing, but it might be worth a look!
This model is trained with short clips of human speech. There is enough statistical information to "guess" how to fill the gap created by opened lips. I'm still amazed how it conserves temporal coherence (what it looks like from frame to frame).
Deep fakes are just like Photoshop, but instead of pictures, we can generate complex shapes in all sorts of signal domains.
If you restrict the technology, it becomes the tool of state actors. If it's wide open, it's just a toy. Society will learn to accept it just as they did with Photoshop.
I'm actually really excited by the potentials it unlocks. Our brains are already capable of reading passages in other people's voices and picturing vivid scenes without them ever existing. Deep models give computers the ability to do the same thing. That's powerful. It'll unlock a higher order of creativity.
Provenance and chain of custody are everything. They've always been important, but now they're critical. Any audio or video without a solid chain of custody is now suspect. Anonymous leaks are worthless, as anything can be faked by almost anyone with a PC.
Old and busted: "pic or it didn't happen."
New hotness: "in person witness or it didn't happen."
Cue 10 ICOs for AuthenticityCoin type things, most of which just exit scam and the rest of which don't actually work.
The real security hole for forgery is at the point of injection. Tracking a forgery along with a block chain doesn't prove it's not a forgery.
One thought is a camera sensor that cryptographically signs (watermarks) photos or video frames on the sensor before they are touched by anything else. It's not perfect since a highly sophisticated adversary could get the secret key out of the chip, but it could definitely make it quite a bit harder to fake photos. Nothing is ever perfectly secure. All security amounts to increasing the work function for violating a control to some decent margin above the payoff you get from breaking the control.
I could see certified watermarking camera sensors being used by journalists, politicians, governments, police, etc.
This is a start. It can even be done steganographically, embedded in the picture in a non-visual way, which is robust against compression and "social media laundering" (term of art for uploading then downloading from social media).
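To illustrate the steganographic-embedding idea, here is a toy sketch that hides a bit string in the least significant bits of pixel values. To be clear, plain LSB embedding like this does not survive recompression; the compression-robust schemes mentioned above typically embed in the transform domain (e.g. DCT coefficients) instead. This only demonstrates the "non-visual" part:

```python
# Toy LSB steganography: hide a watermark in the low bits of pixel values,
# changing each touched pixel by at most 1 (visually imperceptible).
# Real compression-robust watermarks embed in the frequency domain instead.

def embed(pixels: list[int], bits: str) -> list[int]:
    """Overwrite the LSB of the first len(bits) pixels with the message bits."""
    out = list(pixels)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | int(b)
    return out

def extract(pixels: list[int], n_bits: int) -> str:
    """Read the hidden bits back out of the LSBs."""
    return "".join(str(p & 1) for p in pixels[:n_bits])

pixels = [120, 121, 119, 200, 201, 202, 50, 51]  # toy 8-pixel "image"
mark = "10110010"                                # 8-bit watermark
stamped = embed(pixels, mark)
assert extract(stamped, 8) == mark               # watermark recoverable
assert all(abs(a - b) <= 1 for a, b in zip(pixels, stamped))  # near-identical pixels
```

"Social media laundering" is precisely what kills the LSB variant, since re-encoding scrambles the low bits, which is why the transform-domain versions matter in practice.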
The problem is people just don't care. See "cheap fakes" like slowing down a video of Pelosi and claiming she's drunk. People actually believe that garbage. No amount of fancy math can fix that.
The Google Colab version is not really real-time, is that correct? It loads pre-recorded video. I'm guessing that's because it's not easy to feed a live camera stream into a browser notebook, or are there other limitations?
Does anyone know if using this tool to generate a music video of famous pictures singing a song would violate any copyrights? It seems like a fun exercise.