Decensoring Hentai with Deep Neural Networks (github.com/deeppomf)
314 points by cosentiyes on Oct 29, 2018 | 66 comments



I believe "Image Inpainting for Irregular Holes Using Partial Convolutions" has been superseded by "Free-Form Image Inpainting with Gated Convolution", especially for interactive use cases?

https://arxiv.org/pdf/1806.03589.pdf

http://jiahuiyu.com/deepfill2/

I think the most interesting part of this project would be generating the dataset. That is a lot of manual redrawing of vaginas/penises. Or the reverse, where you find already decensored stuff and add realistic censorship yourself. Either way, a lot of time will be spent tagging genitals to build a large enough dataset to make NN techniques work...


There are tens of thousands of manually decensored manga; you could download them and compare them to the originals to identify the censored regions, then use the pairs as training data, I suppose.
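
A rough sketch of that comparison step, assuming the two scans are already the same size and aligned pixel for pixel (real scans often are not, so some registration step would have to come first). The function names and the threshold of 30 are illustrative choices, not anything from the project:

```python
import numpy as np

def changed_region_mask(original, edited, threshold=30):
    """Boolean mask of pixels that differ noticeably between two aligned images."""
    # Widen to a signed type so the subtraction cannot wrap around on uint8 input.
    diff = np.abs(original.astype(np.int16) - edited.astype(np.int16))
    if diff.ndim == 3:
        # For colour images, take the largest difference across channels.
        diff = diff.max(axis=2)
    return diff > threshold

def bounding_box(mask):
    """Return (top, left, bottom, right) of the changed area, or None if empty."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1

# Synthetic stand-in for a pair of pages: one flat image, one with a patch altered.
original = np.full((8, 8), 200, dtype=np.uint8)
edited = original.copy()
edited[2:5, 3:6] = 0

mask = changed_region_mask(original, edited)
box = bounding_box(mask)
```

On real pairs, translated text and compression noise would also show up in the mask, so the threshold and some filtering of small regions would need tuning.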


Is it bad form to share one of these "datasets" here?


i'm not aware of an actual organized dataset, but you could start hitting up e-hentai.org and with the right searches (look for decensored or english, and then find the matching original japanese versions on there), you could get together hundreds or thousands of images in an evening.


Asking for a friend?


It's not clear to me that you'd actually need to manually label individual images for training. The work of detecting and marking the censored regions is left to the user, so the model just needs to be good at inpainting. There are probably enough specialized sites to get a decently-sized training set with little effort beyond writing a crawler.


I suggested that to deeppomf a while ago (I was thinking of simply using unlabeled anime images from https://gwern.net/Danbooru2017 with random area deletions to simplify the model & training process as much as possible) and his belief is that because genitals are such a small fraction of any images, and the rest of images vary so much while genitals are a fairly small narrow domain, a generic inpainting/denoising CNN will learn to inpaint pretty much anything else possible and neglect genitals specifically.

Presumably if you trained a really big inpainting CNN a lot, it would learn genitals (along with everything else), but it's understandable that he would try a much more targeted approach.
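
For reference, the "random area deletions" idea can be sketched as below. It is a minimal illustration with hypothetical names and rectangular holes only, and it is not the project's actual training code:

```python
import numpy as np

def random_hole_pair(image, rng, min_size=8, max_size=32):
    """Cut one random rectangle out of an image to make an inpainting sample.

    Returns (corrupted, mask, target): the network would see `corrupted` and
    `mask`, and be trained to reproduce `target` inside the masked area.
    Assumes the image is at least min_size pixels in each dimension.
    """
    h, w = image.shape[:2]
    hole_h = int(rng.integers(min_size, min(max_size, h) + 1))
    hole_w = int(rng.integers(min_size, min(max_size, w) + 1))
    top = int(rng.integers(0, h - hole_h + 1))
    left = int(rng.integers(0, w - hole_w + 1))

    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + hole_h, left:left + hole_w] = True

    corrupted = image.copy()
    corrupted[mask] = 0  # blank out the hole in every channel
    return corrupted, mask, image

rng = np.random.default_rng(0)
# Random noise standing in for a real image; values start at 1 so zeros mark the hole.
image = rng.integers(1, 256, size=(64, 64, 3), dtype=np.uint8)
corrupted, mask, target = random_hole_pair(image, rng)
```

Because the hole position is sampled uniformly over the whole image, a small region of interest is rarely covered, which is exactly the imbalance concern described above.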


So do you know what exactly the model was trained on? Unless I missed it, there's no training code in the repo, or any other indication of how data was prepared.


I'm not sure. I suggested Danbooru2017, as I mentioned, and I thought he was using it, but double-checking his Reddit comments he seems to imply he's using a custom private dataset only at this point. Maybe he hand-extracted a lot of censored/original pairs from various places.


A neural net that replaces jarring censorship with suspiciously conveniently placed objects? Hilarious...


You could be right, I don't know. Maybe the authors can tell us about their experience.

My intuition is that the dataset will be too imbalanced to learn anything useful. Even if you crawl only decensored images, the area you truly care about inpainting is still pretty small. If you don't focus on it somehow, it might learn how to inpaint anime-style geometry correctly (from the rest of the image) but produce "barbie doll" style anatomy.


The demo in the repo only shows inpainting of anime-style geometry. In the 4chan thread linked by nathansmith, someone tried to apply it to a mosaicked penis, but it disappeared (NSFW, probably: https://i.4cdn.org/g/1540580868203.png). When they used the mosaic decensoring mode instead, it worked slightly better (NSFW, definitely: https://imgur.com/a/PwAunbc/). Other people complained about missing genitals as well.

So I'm guessing that this model doesn't have any specific training for filling in only genitals. I think there are category-aware inpainting algorithms that could be used, but they'd need tagged data, as you say.


If there’s too much manual work, you aren’t using enough layers! ;) Seems the smart approach for a... “focused” AI would be to manually label a bunch of nsfw bits and use that to build a censor bot that auto-detects and censors the of-interest regions. From there you can generate a large corpus of censored-uncensored pairs with which to train the decensor bot.
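
The second half of that pipeline (turning a labeled region into a censored/uncensored pair) might look roughly like the following. This is a sketch under assumptions: the box is taken as given from some detector or manual label, and `mosaic_region` and the block size are hypothetical names and values:

```python
import numpy as np

def mosaic_region(image, box, block=8):
    """Return a copy of `image` with a mosaic applied inside `box`.

    `box` is (top, left, bottom, right) in pixel coordinates. Each block-sized
    tile inside the box is replaced by its mean colour.
    """
    top, left, bottom, right = box
    out = image.copy()
    for y in range(top, bottom, block):
        for x in range(left, right, block):
            # `tile` is a view into `out`, so assigning to it edits `out` in place.
            tile = out[y:min(y + block, bottom), x:min(x + block, right)]
            tile[...] = tile.mean(axis=(0, 1)).astype(image.dtype)
    return out

# A 16x16 gradient standing in for an image, with a hypothetical labeled box.
image = np.arange(256, dtype=np.uint8).reshape(16, 16)
box = (4, 4, 12, 12)
censored = mosaic_region(image, box, block=4)
training_pair = (censored, image)  # (input, target) for the decensoring model
```

Bar-style censoring could be generated the same way by filling the box with a constant instead of tile means.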


Hentai is censored only for the Japanese market, because it is a legal obligation. When it is exported, the uncensored version is normally used.

So you just need to find both versions in order to make your dataset. Add the large amount of decensored artwork and it is not that hard to build a huge dataset. It may even be automatable by parsing imageboards.


Are you sure about this? How much hentai is actually exported? I get the impression that it's very little, and very little of the Japanese releases, due to the fact that most hentai sites have English translations provided by fan groups, and they're scanned translations of what's sold in Japan.


I think some hentai doujins are licensed by Fakku, and they seem to be uncensored. I've never used Fakku though. Note that their website is NSFW.


Fakku apparently gets access to the masters: https://www.animenewsnetwork.com/answerman/2018-11-02/.13900...


It's only because censored-only hentai is so common that someone came up with this idea. I'm not a hentai expert by any means but I'd imagine a lot of hentai is never exported and so no uncensored version is ever published.


I predict that the porn industry will be the first industry to suffer mass unemployment due to AI.

Phase I: Deep learning combined with realistic physics simulation will eventually replace human actors. At some point amateurs can record their movements, then use variational autoencoders to improve their features (bigger boobs, thinner waist, smooth skin) and replace professionals. This is 'Uber phase' for porn and drives down the wages.

Phase II: Eventually AI can mimic complete scenes and adjust the 'plot' to the current state of the observer. Tight sexual biofeedback loop will form.

Phase III: The rapture of the nerds. Fully interactive VR + toolset. In the context of the Fermi paradox, Phase III is the Great Filter.

----

https://en.wikipedia.org/wiki/Great_Filter


Porn has always been one of the earliest adopters of technology. While porn actors may be individually replaced by deep agents, actual porn sales will do fine, especially since extremely targeted pornography will be in everyone's budget.

A market for verifiable human pornography will exist alongside the new porn.

Also, have we already dubbed virtual ML pornography "deep throat"?


> Also, have we already dubbed virtual ML pornography "deep throat"?

No, that term is already established in the industry and means something else entirely.


I thought it was deep fakes?


Deep facials


> especially since extremely targeted pornography will be in everyone's budget.

This is why the ML pornography excites me. All my twisted fetishes that nobody caters to can finally be fulfilled.


What? Think about it, call center employees are going to suffer first on a large scale. NLP is getting better and Google Duplex already showed us that human voice is no problem anymore.

It's way easier to replace call centers than to come up with physics simulations that replace humans. Also I think porn is incredibly cheap to produce, so everything must be automated to beat the current production cost.


I am going to call this the Great Stimulus.


"Diamond Age" has a subplot to the tune of this


Gotta give them some credit, that is a brilliant name.


Next up in deep learning for porn augmentation: TenseHerFlow.


Finally, a use case for Deep Neural Networks that we actually need as a society!


I wouldn't say need per se, since this is something that can and has been done in other ways already, but it's at least a use case with positive, rather than negative, value.


Saw this a few days ago. It could use some work for mosaic censors but bar censors looked pretty good. I heard it was just some guy on /g/ who said he would make it and then actually did it, so I'm pretty impressed.


There should be NSFW examples in the Readme. I saw some practical (NSFW) examples on 4chan and it doesn't seem to work very well. Most hentai doesn't contain simple lines and blobs covering homogeneous skin area. There is detail under those areas that the algorithm does not reproduce at all.


On the contrary, the examples I saw on /h/ looked pretty impressive for the most part.


Well, I guess we have different definitions of impressive. The de-censored areas did not reproduce genitalia properly at all.


The original illustrators of plenty of hentai do not reproduce genitalia properly at all, if we're being honest.


I fail to see how that is relevant to the discussion. I'm saying that unblur(blur(x)) !== x. Whether or not x === y is not important here.
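
A tiny numeric illustration of that point, using block averaging as a stand-in for the blur (an assumption; real mosaics and blurs vary). Two different inputs collapse to the same output, so no `unblur` can return both:

```python
import numpy as np

def pixelate(image, block):
    """Replace each block x block tile of a 2-D array with its mean value.

    Assumes both dimensions are exact multiples of `block`.
    """
    h, w = image.shape
    tiles = image.reshape(h // block, block, w // block, block)
    means = tiles.mean(axis=(1, 3))
    # Stretch the per-tile means back out to the original resolution.
    return np.repeat(np.repeat(means, block, axis=0), block, axis=1)

x = np.array([[0.0, 4.0],
              [4.0, 0.0]])
y = np.array([[4.0, 0.0],
              [0.0, 4.0]])

same_output = np.array_equal(pixelate(x, 2), pixelate(y, 2))  # both become all 2.0
different_input = not np.array_equal(x, y)
```

Any inverse has to pick one plausible input out of many, which is a guess rather than a recovery.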


Not when Microsoft is about to acquire GitHub!


Microsoft serves abundant amounts of NSFW content on Bing, so I can't see this being a valid concern. I highly doubt GitHub is going to be censored.


They don't have anything like SafeSearch in place on GitHub yet.


Are there any more examples? Removing a few lines doesn't seem that impressive and can be done with the Photoshop pattern tool without any fancy neural net.


More examples and commentary found here: https://boards.4chan.org/g/thread/68220714#p68220714

Edit: should have mentioned NSFW


There are examples posted on 4chan's /h/ board, that is to say, people using it for its intended purpose rather than only with examples, but I'm at work so I can't really browse it at the moment (and the threads are ephemeral anyway, unless you use an archive site).


some threads are archived, your sibling for example


At least for the most common forms of obfuscation, pixellation and fuzz, why do you need AI for this? It seems conceptually simple to work out, from several similar but moving frames, how a small obfuscated feature travelled across a huge pixel boundary to cause a small change in the entire oversized pixel.

You'd need I, be it AI or the normal kind, to invent a guess from a single frame. But watch frames that each differ a little over time, and you should be able to deobfuscate pixellation and fuzz with plain math.

Conceptually simple. Not saying I could knock it out over the weekend, just that there doesn't seem to be a mystery about how to proceed that puts it into "throw some ai at it and see if anything happens by ai magic" territory.
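
A 1-D toy version of the "plain math" idea, under strong assumptions that real video would not satisfy: the hidden signal is static, the mosaic grid shifts by exact whole samples between frames, the signal is zero outside its borders, and there is no noise or compression. All names here are hypothetical:

```python
import numpy as np

def pixelate_1d(signal, block, shift):
    """Block-average a zero-padded 1-D signal on a grid offset by `shift`."""
    n = len(signal)
    padded = np.concatenate([np.zeros(block - 1), signal, np.zeros(block - 1)])
    starts = np.arange(shift, n + block - 1, block)  # block starts, padded coords
    values = np.array([padded[s:s + block].mean() for s in starts])
    return starts, values

def reconstruct(frames, n, block):
    """Least-squares estimate of the signal from several shifted pixelations."""
    rows, targets = [], []
    for starts, values in frames:
        for s, v in zip(starts, values):
            # The block starting at padded index s covers original samples s-(block-1)..s.
            row = np.zeros(n)
            lo = max(0, s - (block - 1))
            hi = min(n - 1, s)
            row[lo:hi + 1] = 1.0 / block
            rows.append(row)
            targets.append(v)
    solution, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return solution

rng = np.random.default_rng(0)
original = rng.random(12)
block = 4

frames = [pixelate_1d(original, block, shift) for shift in range(block)]
recovered = reconstruct(frames, len(original), block)           # all shifts
single_frame = reconstruct(frames[:1], len(original), block)    # one frame only
```

With every shift available the linear system is fully determined and the signal comes back; with a single frame it is badly underdetermined, which is the case where a guess (learned or otherwise) is needed.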


The project name is GOLD


I wonder how this deals with the stippling seen in black and white printed pages. As someone familiar with digital drawing/editing, any slight imperfection in the pattern becomes super obvious and it's even worse with stippled gradients.

I'm sure it'll be a useful tool with refinement, but this is still a solution to an artificial problem. There's no reason for Japan's current censor laws to exist in 2018.


FAQ says they currently make no attempt to deal with that and it straight up won't work if you try.


This is exactly why AI was needed (and subsequently created) in the first place!


This is genius. Give the network a bunch of photos with a person's face censored (Donald Trump, for example). What would it replace the face with?


> What would it replace the face with?

Judging by the examples, correctly-coloured yet featureless skin.


I wonder what will happen when neural networks can generate illegal content, like child pornography.


It will vary by country. In a lot of Europe and the US the content will still be illegal. In Japan it won’t be. At some point the issue is going to come down to what you want to spend manpower and money fighting, people who actually endanger and harm children, or people who use computers to emulate it. It might not be a very clear cut thing though, because I frankly wouldn’t be shocked to find out that the overlap between people who want pornography of real and simulated children is significant.

Where it might be a little more clear cut is feeding the market that clearly exists for “teen” porn, which is currently filled by 18+ actresses. I suspect a non-trivial percentage of the non-child-abusing population would consume simulated 16 year old porn without ever considering the real thing. In that case you might see some changes.

It’s already outlawed. In the US for example pornographic drawings of children are outlawed. In the same way it’s illegal to try and sell cocaine stimulants, it’s illegal to try and sell child pornography simulations. I doubt the A.I. itself would be illegal, just its output.


> It’s already outlawed. In the US for example pornographic drawings of children are outlawed. In the same way it’s illegal to try and sell cocaine stimulants, it’s illegal to try and sell child pornography simulations.

https://en.wikipedia.org/wiki/Ashcroft_v._Free_Speech_Coalit...


But how would you outlaw it? A neural network is simply a set of coefficients, would certain coefficient combinations be illegal? It's an interesting question.

PS: I am obviously against child pornography, I'm just curious about the implications of being able to generate illegal content and how governments would deal with it.



I think this question is simply a variation of points that have been made clear already in the past - no one legislates against "a set of 1s and 0s", "a set of letters", "a set of ink dots" nor, in this case, "a set of coefficients". Instead, legislation is created/updated for specific end results, and you're done.

For law enforcement purposes, I think it would be similar to a printing press: owning one is not a problem, but inserting printing rolls that look like dollar bills most definitely is.


The famous ruling "I don't know what pornography is, but I know it when I see it"


This reminds me of posts in alt.sex.stories having disclaimers like "Note: All characters in this story live on a planet where one year = 100 Earth years".

An even cheekier way to drive the point home would be for a sex-stories website to have a slider in the footer to update all age values across the site.


How is cp illegal now? Each image is just 1s and 0s on your drive. It's really just an illegal number.


Uh huh. And a Polaroid is just an arbitrary arrangement of atoms. So is a bomb, come to think.

This is sovereign-citizen thinking. It's like trying to crash people's phones by yelling "Hey Google, this sentence is false!" at them.


That's pushing it too far. An image might be represented as 1s and 0s but what it represents is a "real world event" which actually happened. If it was stored on magnetic tape instead of on a hard drive it would still represent the same thing. It's what the event represents (child abuse) which is illegal.

However, a neural network represents a mathematical model, not a specific "real world event". Although I suppose you might argue that since the dataset required to generate it would be illegal, it is illegal?


In most Western countries, the content can be illegal independent of the real-world event being legal or not, or even if there's no real-world event at all (i.e. fictional depictions). It's not illegal for a teenager to pose in the nude, but it might be considered illegal for them to take a photo of themselves and post it on the internet.

I think linking illegal depictions to illegal events would be saner, but that's not how it currently works.


Interesting. Yeah I agree that it would make sense for the link to be clear.

Come to think of it, a couple of years ago an app called FaceApp came out that used neural networks to modify selfies in funny ways (like making you older/younger/of the opposite sex).

I wonder if anyone has ever run a pornographic image with the "younger" filter in the app. Would that be illegal?


Under some jurisdictions, I imagine so, since even purely computer-generated images are illegal, and there are other laws which criminalise purely drawn material. It may hit a gray spot in the law, but I doubt a judge would have trouble interpreting the existing law. This is in reference to English and Welsh law, by the way. The exact wording of the law concerns a depiction of someone who "conveys the predominant impression of a child".


Same thing that happened with 3D-printed and home-CNC-machined guns - there'll be a moral panic, and a demand to ban the tools.



