3D Face Reconstruction from a Single Image (nott.ac.uk)
214 points by rocky1138 on Sept 15, 2017 | 43 comments



Note that this is just one recent work on a well-established research problem that has been studied for decades. The interesting bit here is that they appear to achieve state-of-the-art results by doing something simpler than other approaches - instead of fitting a carefully designed generic morphable face model, they use a fairly standard deep learning model to map a 2D image to a discretized 3D model of the face (voxels). Intuitively this feels a bit off, since outputting a discretized 3D model instead of fitting a continuous model has inherent resolution limitations, but the benchmark results are pretty impressive.
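
To make that contrast concrete, here is a minimal PyTorch-style sketch of the voxel-regression idea. The layer sizes, grid resolution, and loss are illustrative assumptions rather than the authors' actual architecture, but the core structure is the same: the network's output channels are read as depth slices, so each output unit is a voxel aligned with an input pixel and trained as a binary inside/outside label.

    import torch
    import torch.nn as nn

    class VoxelRegressor(nn.Module):
        """Toy encoder-decoder: RGB image in, voxel occupancy volume out.
        Purely illustrative; the network in the paper is much deeper."""
        def __init__(self, depth=64):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),   # H -> H/2
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # H/2 -> H/4
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # back to H/2
                nn.ConvTranspose2d(64, depth, 4, stride=2, padding=1),           # back to H
            )

        def forward(self, img):
            # Output channels are read as depth slices, so each unit is a voxel
            # spatially aligned with the corresponding input pixel.
            return self.decoder(self.encoder(img))  # (B, depth, H, W)

    model = VoxelRegressor()
    img = torch.randn(1, 3, 128, 128)                        # input photo
    target = torch.randint(0, 2, (1, 64, 128, 128)).float()  # ground-truth occupancy grid
    # Every voxel is an independent inside/outside-the-face label,
    # i.e. the problem is posed as (volumetric) semantic segmentation.
    loss = nn.BCEWithLogitsLoss()(model(img), target)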


I guess this means that the method easily generalizes to any type of object (?)

> Intuitively this feels a bit off, since outputting a discretized 3D model instead of fitting a continuous model has inherent resolution limitations, but the benchmark results are pretty impressive.

Isn't this because deep learning is in fact a type of interpolation?

And another question: are there any techniques based on neural networks that combine multiple images into the most plausible model?


> Isn't this because deep learning is in fact a type of interpolation?

No. The output is still discrete.

> And another question: are there any techniques based on neural networks that combine multiple images into the most plausible model?

Could work.


Maybe it is comparatively easy to fit the morphable model to the voxels?


This new method seems to produce nicer results when the input deviates from what the generic model would expect (e.g. stronger facial expressions).

-----

Here is the same image used as input for 3d face reconstruction using both methods:

https://imgur.com/a/bZ012

Left side is morphable face model (EOS by Patrik Huber).

Right side is this new CNN voxels method.

-----

Here is a live 3D model on the authors' site:

http://www.cs.nott.ac.uk/~psxasj/3dme/view.php?name=../59b41...


Had some fun with it yesterday :)

http://alteredqualia.com/xg/examples/facial_expressions.html

http://alteredqualia.com/xg/examples/max_headroom.html

It seems to work pretty well; the results do look nicer than those of earlier methods based on morphable face models [1][2].

See also continuous face reconstruction from video on their project page [3].

------

[1] https://www.youtube.com/watch?v=nice6NYb_WA

[2] http://gravis.dmi.unibas.ch/publications/Sigg99/morphmod2.pd...

[3] http://aaronsplace.co.uk/papers/jackson2017recon/


This is incredible. It seems to perform well on faces with dysmorphic features including cleft lips, asymmetric eyes, and abnormal locations of facial landmarks. It even creates an appropriate 3D representation of a cleft. I've found that some other face detection implementations actually fail at even localizing faces with these abnormalities. I don't have the background to fully understand the paper, though. Is there anything truly novel in how they are detecting faces? Could their work be leveraged to more accurately label facial landmarks for "abnormal" faces?


I'm really happy to hear that it works on those sorts of cases. I've tried a few images during testing, such as faces with unusually long noses, and was pleased with how they came out. The novelty comes from a simple approach to a usually quite complex problem (i.e. posing it as a semantic segmentation problem, to produce a spatially aligned volume).


It definitely constructs a face, but how accurate is it?

It would be interesting to see a measurement of error against a known result. Maybe try rendering an image of a 3D model of a face, then attempt to reconstruct it from the rendered image and measure the displacement from the original model.


It's fitting faces to some kind of assumed model.

I tried it on a few Japanese faces; it basically gave them a Caucasian-looking side profile: big foreheads with over-hanging eyebrow ridges; big noses.

Completely unrecognizable after turning more than a few degrees from front view.

This is not to flame the project---nice work!---but as a potential improvement for certain applications, it would be useful to have these details as parameters that could be interactively tweaked.

Being able to adjust the overall roundness of the face, and prominence of the chin/forehead/nose would probably go a long way.


You could use an expert system to classify the face by race. Then fit it over the appropriate model.


That is the wrong tool for the wrong problem. You don't need to classify by race, and an expert system would be a horrible way to do it.

Just get more Asian people in the training data.


Well, they include an error metric in the paper - basically the average vertex distance between the predicted face and the actual face model (in interocular-distance units). They seem to show that they are much better than the state of the art with respect to that error, but if you want to understand what that error actually means, it's probably easier to interpret it with your suggestion of actually visualizing it.
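
For reference, that normalized error is straightforward to compute once the predicted and ground-truth meshes are in vertex correspondence; a rough numpy sketch (the paper's exact alignment and vertex selection may differ):

    import numpy as np

    def normalized_mean_error(pred_vertices, gt_vertices, interocular_dist):
        """Mean per-vertex Euclidean distance, normalized by interocular distance.

        pred_vertices, gt_vertices: (N, 3) arrays in vertex correspondence.
        interocular_dist: outer-eye-corner distance of the ground-truth scan.
        """
        per_vertex = np.linalg.norm(pred_vertices - gt_vertices, axis=1)
        return per_vertex.mean() / interocular_dist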


If you look at the .obj model file that you can download, you can see exactly how accurate it is. The mesh is dense but the detail is not. The detail that it does have, however, is very plausible.
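
The Wavefront .obj format is plain text, so inspecting the downloaded mesh takes only a few lines of Python (a minimal sketch; the filename is a placeholder):

    # Count vertices and faces in a Wavefront .obj file (plain-text format).
    vertices, faces = [], 0
    with open("reconstruction.obj") as f:   # placeholder filename
        for line in f:
            if line.startswith("v "):
                vertices.append([float(x) for x in line.split()[1:4]])
            elif line.startswith("f "):
                faces += 1
    print(f"{len(vertices)} vertices, {faces} faces")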


One of the authors here. You are very right in saying that there aren't many details. This limitation, we believe, is due to a lack of large, high-quality training sets. The data we trained from was very smooth, which means our method is unable to pick out features such as wrinkles and dimples.


What about the "render an image of a 3D model of a face, then attempt to reconstruct it from the rendered image and measure the displacement from the original model" approach for generating high-resolution training data?


That's actually a great idea, because then you can get a diff between the output 3D model and the actual 3D model used to generate the images, as opposed to what I assume is currently just diffing the generated profile with a profile shot of the subject.


It struggles to create a likeness in the side profile, but that's to be expected. Nose and mouth details are lost and it ends up looking like someone else. Still, it's very cool; it just can't perform miracles.

A good test is when you have both a front and a side source image, such as a police mug shot...

Take this famous David Bowie mug shot. I also tried Jimi Hendrix and Jim Morrison; the side profile never looks like the person.

https://ozimg.s3.amazonaws.com/temp/bowie_mug.jpg

An improvement to the engine might also accept a side profile source image to improve the details. Now that would be something!


You're never going to get an accurate side profile from ONE (frontal) photo; that would be pulling information out of thin air -- like "enhancing" an image by somehow zooming in 100x.


Never say never.

A smarter algorithm might analyse the light and shadow to a greater degree, and better predict side profile details.

Or, take it further and have the algorithm silently check the internet for a match of the person and then look for more images of that person. It's not cheating if the measure is "upload at least one image to start with, and the service will then do the best it can to perform a miracle".


You're missing the point; if the information content isn't there (i.e. "no clues") and you're not providing it somehow (e.g. by "cheating" and providing more than one image from the Internet), there's literally no way to reconstruct it -- this reeks of the Nyquist sampling theorem.

It's not a matter of pessimism; it's provably impossible.


That's, like, the entire opposite of the Nyquist sampling theorem. Faces are a low-dimensional space. Images are a higher-dimensional space. https://en.m.wikipedia.org/wiki/Compressed_sensing


There just isn't a learn-able (or "un-learnable" for that matter) function _even_ on the restricted domain of frontal profile images that maps surjectively onto the MUCH larger space of possible 3D face reconstructions. Intuitively, maybe the side of my jaw is deformed in a way that is not visible from the front, for example -- how can any oracle recover this information without seeing the side of my face?

Many people are mistakenly under the impression that certain signal reconstruction techniques like compressed sensing "violate" the Nyquist sampling theorem. Compressed sensing is still under the same umbrella as the Nyquist sampling theorem, as is CNN-based reconstruction (the technique used by this paper). I realize my analogy might have been poor; my claim is that there is still unrecoverable information loss.


As far as the demo user interface goes, I find the controls to be weird. Dragging the 3D model left and right will move the nose left and right. But dragging up and down will move the face model in the opposite direction.


That is a bit inconsistent!

Try the right mouse button instead; it seems to have a nicer rotation action, and the two axes are consistent. Also you can use the middle mouse button to zoom in.


Absolutely stunning! Congrats!

Thanks for releasing paper & source to play with!


Impressive, but I'd be interested to see whether this approach works well if you add extra images (e.g. a stereoscopic pair).

Notably, the shape/depth of my nose is very, very wrong.


I want to see some hair reconstruction!


Like all these machine learning generation things, it provides a plausible model, but not necessarily the correct model.


Amazing.

It's also amazing how far computer vision has come in a few short years.


I wonder whether this could be used to crack Face ID.


Probably not. Apple specifically said that they've tested Face ID against masks (even Hollywood-quality ones). Also, Face ID checks for user attention, so a static 3D-printed mask would not work. I wonder, though, whether it would work if you put a mask with eye cutouts over your face? When the devices arrive, I'm pretty sure some very clever people will try to tackle this problem; this should be interesting :)


Man, are they good at marketing. They just knew this precise conversation would come up and created the perfect brief counter-argument preemptively ("Hollywood-quality masks").


Is it good marketing or engineering (the latter implies both)? I guess we'll find out soon enough.


If it were only good engineering, no one would know about it.


> a static 3D-printed mask would not work

Right, but adding fake eyeballs should do the trick.

For inspiration, check out how these guys fool Samsung's iris scanner: https://www.youtube.com/watch?v=4VrqufsHpS4


They mentioned an infrared camera; I bet they're checking temperature too.


They're using a near-infrared camera, which can't see much beyond what your eye can see, apart from TV remote LEDs and security camera floodlights. It can't see temperature unless something is almost red hot.


You should be able to warm up a mask with a blow dryer.


You can detect a person's pulse by amplifying the right frequencies in a video, because the pressure of the heartbeat changes the color of the face almost simultaneously. You could probably fool that by giving the mask capillaries, but that would end up being expensive.
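
For the curious, that pulse-detection trick (sometimes called remote photoplethysmography) amounts roughly to band-pass filtering the average skin colour over time; a rough sketch, assuming you already have the face cropped out of each frame:

    import numpy as np
    from scipy.signal import butter, filtfilt

    def estimate_pulse_bpm(face_frames, fps):
        """face_frames: (T, H, W, 3) array of frames cropped to the face.
        Returns an estimated heart rate in beats per minute."""
        frames = np.asarray(face_frames, dtype=float)
        # Mean green-channel intensity per frame; blood volume changes show up here.
        signal = frames[..., 1].mean(axis=(1, 2))
        # Band-pass 0.8-3 Hz (roughly 48-180 bpm) to isolate the cardiac component.
        b, a = butter(3, [0.8 / (fps / 2), 3.0 / (fps / 2)], btype="band")
        filtered = filtfilt(b, a, signal - signal.mean())
        # Dominant frequency of the filtered signal -> heart rate.
        spectrum = np.abs(np.fft.rfft(filtered))
        freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
        return freqs[np.argmax(spectrum)] * 60.0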


Yes, I suspect this could be part of the solution. They'll also need a way to simulate eyeballs, in order to fool the sensor that measures whether the dummy is "looking" at the device.


Tried it on Harry Rowohlt; it didn't get along with the beard.


It also caught my friend's collar and made it part of his face. Still pretty impressive technology, regardless.



