This demo is doubly amazing: First, the obviously impressive (and slightly unsettling in its implications) manipulation of the target face, but second, the fact that this is all being done with a single RGB camera.
In terms of gaming applications this could be huge for virtual reality avatars. You could stay anonymous while still conveying facial expressions with a webcam!
This is insanely awesome. One use that comes to mind is dubbing movies and TV series; I'm very sensitive to mismatches between what's being said and the lip movements we see, to the point where I only watch movies in their native tongue - even if I don't know the language and need subtitles. This could be a game-changer.
I'm curious, how do you both read subtitles and check that the sound is properly synced to an actor's lips? I can't read that fast, so I spend 80% of my time "watching a movie" simply reading text at the bottom of the screen.
Very well done. The best part is how realistic they make the mouth and teeth look: they sample the mouth interior from earlier parts of the video and composite it onto the neutral still or loop. I was blown away by how real Trump's teeth looked, and then they explained this process and why it works.
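The retrieval step described above can be sketched roughly as a nearest-neighbor lookup: for each target expression, pick the previously seen source frame whose expression coefficients are closest, then reuse its mouth interior. This is a hypothetical simplification (the function name, the toy coefficient bank, and the plain L2 distance are my assumptions, not the paper's actual implementation):

```python
import numpy as np

def retrieve_mouth_frame(target_expr, source_exprs):
    """Return the index of the stored source frame whose expression
    coefficients are closest (L2 distance) to the target expression.
    A toy stand-in for mouth-interior retrieval in face reenactment."""
    dists = np.linalg.norm(source_exprs - target_expr, axis=1)
    return int(np.argmin(dists))

# Toy bank: 4 stored frames, each with 3 expression coefficients
# (e.g. jaw-open, smile, pucker) -- values are made up for illustration.
bank = np.array([[0.0, 0.1, 0.0],
                 [0.9, 0.0, 0.2],
                 [0.1, 0.8, 0.1],
                 [0.5, 0.5, 0.5]])

# The target expression is nearest to stored frame 2.
print(retrieve_mouth_frame(np.array([0.15, 0.75, 0.1]), bank))  # prints 2
```

The real system retrieves an actual image patch of the mouth for that frame and blends it into the rendered face, which is why the teeth look photographic rather than synthesized.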
This could be huge (yuuuge) in games and virtual spaces. Unreal 4 had a demo at GDC recently, and it seems we are approaching that era. [1]
One of the barriers in webcam meetings is the inability to make eye contact. It seems subtle, but I think it matters more than one might intuitively expect.
I've thought a lot about how to overcome this and came up with nothing better than cameras beneath the screen (which Apple seems to have worked on, though we have yet to see it: http://appleinsider.com/articles/09/01/08/apple_files_patent...). This technology could provide a competing solution.
The data-vacuuming companies (FB, Google, ad networks, etc.) collect the information about us required to know how to manipulate us, and technologies such as this one are the tools to actually get it done.
I don't think they want the world to see the source code in case it gets into the wrong hands, though the repercussions of this work are bound to hit us sometime. [1]
Consider the massive rig required to perform the at-the-time groundbreaking performance capture for the game L.A. Noire: http://i.kinja-img.com/gawker-media/image/upload/s--y6fmsAIU... This is how far computer vision has come in five years.