Hacker News
DeepFaceDrawing Generates Photorealistic Portraits from Freehand Sketches (syncedreview.com)
214 points by Yuqing7 on June 4, 2020 | 75 comments



So this sort of thing seems like a big deal for indie video game devs.

One of the big problems at the moment is that photorealistic texturing basically requires finding and mapping a real life actor, which is all sorts of expensive.

This looks like a way to get past that in a super-efficient way: just mock up what "look" you're going for with a character, hit render and out pops all the data needed for a texture (I suspect we're not too many iterations away from also getting bodies, meshes and animation skeletons).

This would make it a huge field-leveller for indies catching up on HD content production.


> One of the big problems at the moment is that photorealistic texturing basically requires finding and mapping a real life actor, which is all sorts of expensive.

As a hobby game dev myself I find this to not be a problem at all. I think human skin is largely a solved problem. It largely boils down to creating a proper material/shader; the material will have many layers of textures. These days, with the proper tools, this is easy, and generally more of an artistic affair than a technical one.

In fact, I will claim that texturing for games, in general, is largely a solved problem.

Look into both the Substance suite and Quixel's offerings, and watch/read tutorials. Both Allegorithmic and Quixel revolutionized texturing and made AAA quality possible for indies, just like Unreal Engine made AAA quality available for indies.


> I think human skin is largely a solved problem.

Can you post an example? All the fake skin I’ve seen is still marching into the uncanny valley (video at least)


To clarify, largely does not mean completely solved, right? It's not perfect yet. Either way, I should have qualified that by saying I think it's a largely solved problem as far as the technical implementation goes.

What an artist does with that implementation is always another matter, and it will always be down to the artist to create a pleasing final result. Specularity is one thing I think artists get wrong all the time. At the artistic level, games often deliberately go for a more cartoony look, too.

And the uncanny valley thing is very subjective, no?

https://docs.unrealengine.com/en-US/Resources/Showcases/Digi...

I think this is more than good enough for real-time games. Of course, you will probably say, "no that sucks, uncanny valley!"

One thing to notice here is that the albedo (diffuse) map (texture) used can be very crude/simple, as it is simply one small piece of the puzzle.

If you have looked at a video, or even screenshots, like the digital human ones, where the focus is solely on the subject, often in close-up, that is very different from seeing it running in a game, where you are less likely to notice small faults.

These days, I think facial rigs and animation are the cause of the uncanny valley, to a higher degree than skin shaders.


Those do look a lot better than what I'd seen.

Skin (and other stuff) still looks pretty weird to me.


There's a huge copyright issue waiting for you when you use this kind of tech outside of research.

The way these AIs work is that they have memorized aspects of celebrity photos and then recombine them as needed. That means if your sketch looks in any way like a celebrity from the training data set, then the AI will likely reuse parts of those photos, which would make your generated texture a derived work, meaning you'd have to pay royalties to the celebrities in the dataset.


No court has yet decided on this.

Previous research has shown that if you search the entire training dataset for the image most similar to one generated by a generative model like this, the two images actually look pretty different visually. A non-expert would say the images are similar, but that neither was copied from the other.
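
As a rough illustration of that kind of check, here is a minimal Python sketch of a nearest-neighbour search over a training set. It assumes images are flattened lists of pixel values and uses plain Euclidean distance; the function names and the toy data are made up for this example, and real studies may well use a perceptual or feature-space distance instead.

```python
from math import sqrt


def l2_distance(a, b):
    """Euclidean distance between two equally sized flat pixel lists."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_training_image(generated, training_set):
    """Return (index, distance) of the training image closest to `generated`."""
    if not training_set:
        raise ValueError("training set is empty")
    distances = [l2_distance(generated, img) for img in training_set]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]


# Toy 2x2 grayscale "images" flattened to 4 pixels (made-up values).
training_set = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.5, 0.0, 0.0],
]
generated = [0.6, 0.4, 0.1, 0.0]

index, distance = nearest_training_image(generated, training_set)
print(index, round(distance, 3))
```

A small distance to the closest training image would hint at memorization; a large one suggests the output is not a copy of any single photo.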


> the AI will likely reuse parts of those photos, which would make your generated texture a derived work

There are two inputs here: a collection of photos and a model. It's not being derived solely from the original photos. The scientist had a contribution as well, which cost him money and time to develop. As long as it's not wholesale copying and there is significant creative input (selection, changes, recombinations, etc) then I think it should be ok.


That's an open question, and big companies will face it before us smaller fish. BigCos are quite incentivized to win that fight, too.

In legal terms, if your AI model is "sufficiently transformative," you don't need to worry about anything like that. But if your AI model is overfitted and is just memorizing, then yes, you're right.


What sort of data is needed for a texture?


I'm not in the triple A game design industry, more like some good grade, game-ish design industry, so what I have to say may not apply to the state of the art.

Generally, a material is made up of _several_ textures that serve different purposes. Common textures (often called 'maps') used to build materials are albedo maps, surface normal maps, metallic/roughness maps, subsurface scattering maps (this one is important for realistic looking human skin!). There are others, and sometimes aesthetics/requirements require shader authors to _make up maps_.
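
A minimal Python sketch of that structure, assuming a material is nothing more than a bundle of named texture files. The `Material` class, its field names, and the file names are all hypothetical and do not come from any particular engine.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Material:
    """A material as a bundle of named texture maps (file paths here)."""
    albedo: str                                  # base color
    normal: Optional[str] = None                 # surface normal map
    metallic_roughness: Optional[str] = None     # metallic/roughness map
    subsurface: Optional[str] = None             # subsurface scattering map
    custom: Dict[str, str] = field(default_factory=dict)  # made-up maps

    def maps(self) -> Dict[str, str]:
        """Return every map that is actually set, keyed by purpose."""
        standard = {
            "albedo": self.albedo,
            "normal": self.normal,
            "metallic_roughness": self.metallic_roughness,
            "subsurface": self.subsurface,
        }
        present = {k: v for k, v in standard.items() if v is not None}
        present.update(self.custom)
        return present


# Hypothetical skin material with one invented extra map.
skin = Material(
    albedo="face_albedo.png",
    normal="face_normal.png",
    subsurface="face_sss.png",
    custom={"wetness": "face_wetness.png"},
)
print(sorted(skin.maps()))
```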

Consider a texture map as just a precomputed data cache. You can encode pretty much anything in them. Why, in the gritty gore and carrion filled trenches, I've created systems where artists can use maps to annotate parts of models they think suck. That map was used in a camera AI system that tried to avoid looking at parts artists are ashamed of, adjust depth of field thresholds depending on the depth of bad stuff... That kind of thing. (That texture seam is too gross for the sizzle cinematic, there's NO TIME to fix it. Just smudge the camera, or have it lose focus as it sweeps through! shameMap.png to the rescue).
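
To make the idea concrete, here is a toy Python sketch of how such a map could drive a blur amount. It is not the system described above, only an illustration under assumptions: the shame map is a small grid of values in 0..1, the camera sees a rectangular region of it, and `visible_shame`, `blur_radius`, `max_blur` and `threshold` are invented names.

```python
def visible_shame(shame_map, region):
    """Average shame inside region = (row0, col0, row1, col1), end-exclusive."""
    r0, c0, r1, c1 = region
    values = [shame_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    if not values:
        raise ValueError("region is empty")
    return sum(values) / len(values)


def blur_radius(shame_map, region, max_blur=8.0, threshold=0.25):
    """Map visible shame to a blur radius; at or below threshold stay sharp."""
    shame = visible_shame(shame_map, region)
    if shame <= threshold:
        return 0.0
    # Scale linearly from 0 at the threshold up to max_blur at shame == 1.
    return max_blur * (shame - threshold) / (1.0 - threshold)


# 1.0 marks texels the artist is ashamed of (made-up data).
shame_map = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

clean_shot = blur_radius(shame_map, (0, 0, 3, 2))  # left half, nothing flagged
ugly_shot = blur_radius(shame_map, (1, 2, 3, 4))   # fully flagged corner
print(clean_shot, ugly_shot)
```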

The limit to the kind of data you can _use_ is only your imaaaagination..

For people, for most aesthetics, at minimum you need a color or diffuse or albedo map.


Love the shameMap idea!


Feel free to use the technique. I only hope that you'll also call it shameMap, because it's funny.


I have one question about this, I'm sure it's completely explainable and honest but it comes across as suspicious.

In the image labelled "Illustration of the model’s deep learning framework architecture", the input face has a strange line drawn underneath the chin. It seems like an odd thing for a human drawer to put in, and makes the person look like they have a double chin.

Yet in the output shown at the end of the pipeline, it appears as a shadow. I didn't go into the article suspicious, but this immediately made me wonder if for some of these sketches, a face to line drawing network was used for some sort of reverse process.

The image does appear in a part of the article discussing their learning methods, though, so I'm probably missing something important. But given that they "are working to release their code" it doesn't really help with confidence.


Adding a line where you want there to be a shadow in the output seems like something you could learn from trial and error when messing with a model. It somewhat weakens the accomplishment of the paper if the sketches aren’t drawn by naive users, but it’s a lot more defensible than generating the input like you suggest.


Agreed. It just looks a bit strange and doesn't help to instil confidence in the paper. My first guess would be that they've used a reverser for the learning process somehow. As it's a preprint, hopefully comments like this will help them to strengthen the paper and release their code!


Did you watch the video? It's intended to be used as an interactive tool, so you draw some lines, see the face then draw more to refine it.


That makes sense, thanks.


Wow, this is cooked.

In a few years, deep learning is going to make any sort of development of real skill feel as archaic as assembler. Learn guitar? What's the point. The little magic black box soon-to-be-smaller-than-your-smartphone can make just about any song based on minimal inputs (i.e. a beat-boxed backing track). You'll be able to generate unique and stylized paintings of your relatives and pets in seconds. Probably you'll be able to generate printable 3D objects from descriptions. Engineers will be able to sketch parts from one perspective and have the details automatically fleshed-out from best-practices learned across millions of similar parts.

You'll never get away with an illegal U-turn ever again because the city will pull footage from people's internet-of-crap dashcams and the machine learning algorithms will comb the feeds and send fines directly to your mailbox with basically no human intervention.


Those mostly sound like good things to me. I’m sure people who enjoy music or art will still do it regardless. The people who might enjoy it but don’t quite have the talent will have more powerful tools to help them express their creativity. And as for the engineering, that seems like an absolutely good thing, as long as we don’t get too complacent and overly trust the machines to do it for us without double checking.


Not saying it's a bad thing at all, it's just going to be very different from the world we know.


People still learn to draw actual objects despite photographs existing for longer than any of our lifetimes.


I'm not convinced. Yes, you will be able to create faces, drawings, 3d models and music just by telling the computer what you want. But I doubt that you can cross the uncanny valley of nearly lifelike artificial output to things made by a skilled human.

It's one thing to have endless amounts of texture, music, content etc. But you also need to combine them into a playable game, for example, where all those different parts need to match. And the game still needs to feel fresh and be fun. Will a computer alone be able to do that? Can a hobbyist do that?

On the contrary, I believe tools like this will increase the skills someone needs to bring with them to make something worthwhile. That's why tools like Game Engines, 3D Modelling Software or Music DAWs get more complex year by year.

Case in point: do you believe a hobbyist will be able to do something like that [1]? No, it will be a team of dozens and dozens of specialists who will raise the bar higher and higher. But they will profit the most from artificially generated art, which they can manually adjust and provide the final touches to, turning something generic into something great.

[1] https://youtu.be/d8B1LNrBpqc?t=87


> You'll never get away with an illegal U-turn ever again because the city will pull footage from people's internet-of-crap dashcams and the machine learning algorithms will comb the feeds and send fines directly to your mailbox with basically no human intervention.

This is not an "in a few years" thing in some places: https://youtu.be/taZJblMAuko?t=1536


Most likely, a colorful decal print on your car will make you appear to be a flying panda to all those AI algorithms, thereby effectively making it impossible for you to get a speeding ticket.


Reminds me of this one (used to be called PaintsChainer) where the input is line drawings, output is colored-in using anime palettes. User can override the color choices. Very satisfying, especially on drawings by kids.

https://petalica-paint.pixiv.dev/index_en.html


That to me looks waaay more impressive than the DeepFaceDrawing.


This reminds me of edges2cats https://affinelayer.com/pixsrv/


Goodbye, modeling industry!

This is extremely impressive. Extrapolating for possible applications, I could imagine that techniques like this could one day become invaluable tools, say, for asset creators in the game industry. The video speeds up the process, but of course in a couple of years, this will be actual real-time performance. Extended to more than just human portraits, this could be a fantastic design tool.


Could police sketch artists use this as a tool?


For extra fun hook it up to the DMV photo database so that every returned image always looks just like an actual person in the area.

Your conviction rate will go through the roof when the artist's sketches are a dead ringer for the suspect!


A) people commit crimes in places they don’t live

B) even so dmv photos are notoriously bad


> A) people commit crimes in places they don’t live

No problem! Conviction/case-closure stats don't care if the person you convicted was local or not.

> B) even so dmv photos are notoriously bad

They're often unattractive but they usually identify people pretty well when they're not too old to do so.


I had the same idea, but then reasoned that the network is adding specifics to the sketch, which might not be desirable if you're trying to give a vague idea of someone's appearance.


This is certainly interesting research, and as an artist I think I'd be hugely frustrated by the amount of non-local change which seems to happen in the video. A fair number of small pen strokes seem to affect a large part of the generated face.

For example, take the difference between 2:20 and 2:27 in the video. The upper half of the drawing hasn't changed, but the generated image has a lot more hair and different ears. While the technology looks impressive as it is, it seems to me that it would be better to leave areas the artist has barely defined as blurred rather than flickering between various high resolution features that are all roughly equally matching the sketch.


That's most likely a training data issue.

The whole thing works on statistical priors: if I have feature a at location x, there's a 90% chance I should have feature b at location y. So if the majority of people with beards in my dataset were also, say, wearing sunglasses, then naturally if I freehand draw a beard the net will probably output sunglasses even if I don't change the eyes!
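
A toy Python sketch of that effect, estimating a conditional probability from co-occurrence counts. The dataset is invented for illustration and a real network learns such correlations implicitly rather than by counting; `conditional_probability` is a hypothetical helper.

```python
def conditional_probability(samples, given, target):
    """Estimate P(target | given) from a list of attribute sets."""
    with_given = [s for s in samples if given in s]
    if not with_given:
        raise ValueError(f"no samples contain {given!r}")
    return sum(1 for s in with_given if target in s) / len(with_given)


# Made-up "training set": each face is described by a set of attributes.
samples = (
    [{"beard", "sunglasses"}] * 9   # bearded faces that wear sunglasses
    + [{"beard"}]                   # the one bearded face without them
    + [{"sunglasses"}] * 2          # clean-shaven faces with sunglasses
    + [set()] * 8                   # faces with neither attribute
)

p_given_beard = conditional_probability(samples, "beard", "sunglasses")
p_overall = sum(1 for s in samples if "sunglasses" in s) / len(samples)
print(p_given_beard, p_overall)
```

In this skewed sample, sunglasses are much more likely once a beard is present than they are overall, which is the kind of bias a more balanced dataset would reduce.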

The solution is to ensure that you sample the full data space that you wish to reproduce (not trivial). Neural nets do seem to interpolate but this is super high dimensional space so it's not always intuitive...there are many orders of magnitude more directions in which to move to get from point A to point B.


There seems to be a good use case here for video game asset generation without copyright fear.


I'd kind of like to see what it does with assorted cartoon character pictures. What does it do with, say, Charlie Brown or Calvin?


Or Garfield.



When will accent correction come to market so non native speakers (who have difficulty in pronouncing some words) are able to make quality voice overs for YouTube and podcasts?

Looking at the virtual agents, it seems they are able to understand very crappy English (with all my attempts), how far are we from correcting it?


I wouldn't be too concerned with getting accents perfect (unless you are an actor). There are so many extremely diverse native English accents, like Western American, Boston American, Received Pronunciation, Scouse, Scottish, Irish, Welsh, Australian, etc. I think different pronunciations are accepted.

And the magical accent corrector wouldn't fix bad grammar.


Yep, I can easily believe it'll fix the pitch, but not the phrasing, or the pauses, or without the odd glaring error. And the uncanny valley effect might actually be considerably worse than the traces of native languages English people are used to hearing from ESL speakers.

I'm reminded of a former flatmate whose father chose not to raise her as bilingual in the mistaken belief a second language would impair her learning. Instead, when she chose to learn Spanish as an adult anyway, she picked up the slang and pronunciation of her Colombian relatives, but never quite reached native fluency. She pointed out that the drawback to having a local sounding accent and name instead of being an obvious foreigner was that everybody who met her assumed her misunderstandings, pauses or the odd really ungrammatical phrase were because she was an unusually stupid Colombian.


You are of course correct, but I've faced discrimination in hiring practices myself and therefore tried to research more fair methods of conducting interviews.

Aside from the obvious idea of masking one's visual identity, for example through VR avatars in lieu of face-to-face interviews (to hide gender, physical appearance and able-bodiedness), one's voice would still reveal many factors that could purposefully or accidentally skew any neutral position an interviewer might have.

I know for a fact, that if I had the Indian accent to accompany my surname, I would be ranked lower or rejected altogether during interviews. I sadly have a slight rally Finnish accent, so even if my grammar were to be perfect, I'm not a good hire compared with native English speakers even if I'm just as capable for the same position.


Couldn't that be done by speech-to-text followed by text-to-speech?


Sure, if intonation, inflection and timing of the syllables and words are carried over through both processes. Currently they normalise all of the parameters above, which is why everything sounds so robotic and monotonous.


Is this open sourced anywhere? I didn’t see it in the article


From the article: "They are working on releasing the source code soon."


Article shouldn't have been accepted. Research papers relying on code without any verifiability ought to be more of an issue.


About a year ago I was explaining to a friend of mine how I thought that increases in artistic AI combined with procedural generation could change art forever.

He immediately thought "we should use it to make porn without actors, that would make money". This seems closer to possible every day.


The real question: if one is trying to draw a likeness of a real person, is it possible to get this software to generate a decent likeness because the drawing projects that likeness, or are these "drawings" really just weak pseudo-random number generators and the software presents a realistic face regardless?


Given that this was created by Chinese people, in Hong Kong, why are all their examples white/hispanic people?


It is because they most likely used the CelebA dataset: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html.


"We build this on the face image data of CelebAMask-HQ [24], which contains high-resolution facial images with semantic masks of facial attributes. For simplicity, we currently focus on front faces, without decorative accessories (e.g., glasses, face masks)."

https://arxiv.org/pdf/2006.01047.pdf


Not to mention beautiful. I'm betting they trained it on that "celebrities" dataset.


Of all the possible things you could choose to comment on about this product, why did you feel like this was the most relevant for discussion?


I just noticed it and found it interesting.

Why is it interesting? Well for one, it raises an eyebrow as to the motivations behind either the technology or that of the promotional material produced for it.

I'd find it similarly interesting if, for an example, an entirely Russian team produced deepfake tech and produced promotional material for it entirely consisting of black people.

Especially in an era where we already acknowledge the prevalence of nation state cyber psyops / propaganda / manufactured news and "facts".


It's interesting if it's due to a library used and worth noting.


This is very cool indeed. I can imagine application in the game industry, advertising etc. However.... all results are frontal. No three-quarters or profile at all. This heavily restricts its use. If the user was allowed to input both a profile and a front view, then all would be fine.


Evolution of face reconstruction from an eyewitness's memory of a face:

0. Portrait painted by trained artist in consultation with a witness;

1. "Identikit";

2. "PhotoFIT";

3. "DeepFaceDrawing" - [0] powered by ML & AI (which are trained on [1] & [2]?).


I find it interesting and yet am unsettled by how AI engineering is offsetting human creativity and artistry. And what's with this fetish for deepfakes? Are we pursuing this for a reason, or do we do it just because we can?


How do the faint lines work? The grey ones already there when they start drawing in black. Are they just a guide or being used, if so in what way?


I think they're some kind of back projection from the nearby space of the image. Without them, you'd be more likely to draw features in the wrong place because you and the computer have a different idea of the intended scale, and you'd end up with monster faces.


Aha! That makes sense. Thanks


Anyone know projects that have done this in reverse? I've seen a really good one but I've lost track of it.



IDEA/REQUEST (if anyone has this running)

Trace over some famous cartoon characters and see what it outputs.


Wow, watched the video. I'd love to use the sketching program. Looks like fun!


yeah, a lot of this "AI does the thing" stuff looks extremely interesting to play with, especially to see what happens when you poke the model with weird data (say, no eyes, no hair, two noses etc.), but most of it is completely inaccessible to non-researchers. even when the code/model is released, building the solution requires so much effort and expertise that it's impossible to play with

this is why I loved the pix2pix cat drawing demo so much https://affinelayer.com/pixsrv/ I hoped it would be a turning point for demos but alas it is still unique


Very cool!! Some of them don't quite look real once I've messed with it (bad drawing skills) but that's very impressive, especially across such a diverse set of objects (each has their own model, but still.)


Someone please draw a penis and share the results.


Is there a demo we can try?


http://geometrylearning.com/DeepFaceDrawing/ says "[Coming Soon]" for the code.


I find it a bit disturbing that there is a lack of racial diversity in the models. Especially given what is going on nationally in the US.

I can imagine it's much easier to train on one type of face, but this could lead to bias later.


I don’t know why you are downvoted. To me this is a very valid point. Does anybody care to give some arguments for downvoting?



