When you said "reversed" I thought about it being a porn-generative neural network. Just enter your favorite keywords and a unique scene, tailored to your needs, will be generated just for you!
As long as what you need is a nightmarish version of the requested scene.
What happens here with generating illegal content? If you put a public text->image gan up, and someone uses it to generate child porn, are you responsible?
I was wondering about illegal content in a totally different sense. I know from personal experience that overfitted generative models end up just memorizing the data set you trained them on. Say I overfit a model on Shakespeare, and you ask it to generate Shakespeare-like sonnets. It will do a fantastic job, because it will just spit out one of his sonnets verbatim. Claiming that my network wrote it is ludicrous. The line between overfitting and genuinely generating new content is fuzzy. For what you're talking about: if your model learns entirely from other people's movies, at what point is it not just a super weird codec for those people's IP?
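To make the memorization point concrete, here's a toy sketch (entirely made up, using a character n-gram "model" as a stand-in for a neural net): once the context window is long enough that every context in the training text is unique, "generation" is just playback of the training data.

```python
from collections import defaultdict

# Toy illustration: an n-gram "language model" with a long context
# memorizes its single training text and regurgitates it verbatim.
TEXT = ("Shall I compare thee to a summer's day? "
        "Thou art more lovely and more temperate.")

K = 8  # context length; long enough that every context in TEXT is unique

model = defaultdict(list)
for i in range(len(TEXT) - K):
    model[TEXT[i:i + K]].append(TEXT[i + K])

def generate(seed, length):
    out = seed
    while len(out) < length and model[out[-K:]]:
        out += model[out[-K:]][0]  # deterministic: each context has 1 successor
    return out

# "Generating" from the training seed just replays the training data.
print(generate(TEXT[:K], len(TEXT)) == TEXT)  # → True
```

A real overfit network does the same thing in a much fuzzier way, which is exactly why the line is hard to draw.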
I would imagine that if you're generating it, then no real people were part of its creation, therefore it would be legal. If I remember correctly, cartoons of children having sex are not illegal (in the United States, as far as I know).
Though that then raises the question: what happens when it can be generated so realistically that it looks indistinguishable from the real thing? Would it still be treated like cartoons? How could you prove it one way or the other? Lots of questions here.
Provisions against simulated child pornography were found to be unconstitutional in Ashcroft v. Free Speech Coalition [0] in 2002.
From wiki [1]:
> Referring to [New York v. Ferber, 1982: child pornography is not protected speech], the court stated that "the CPPA prohibits speech that records no crime and creates no victims by its production. Virtual child pornography is not 'intrinsically related' to the sexual abuse of children".
IANAL, but following that logic alone, the degree of realism doesn't seem to be relevant to the legal precedent insofar as photorealistic imagery would still "record no crime" nor "create victims by its production." As to whether it's dangerous for such material to exist because it would create plausible deniability for the production of actual photography while claiming it's simulated...I guess that would be a different matter.
They actually fixed this with the 2003 PROTECT Act, which makes simulated child pornography illegal only when it is obscene. Because obscenity isn't protected by the First Amendment, this has been found to be constitutional.
It's hard to say that changed anything. Obscenity is a notoriously thorny subject in American constitutional law. The legal test for determining obscenity[0] is highly subjective and in the context of the internet very difficult to apply.
* Whether "the average person, applying contemporary community standards", would find that the work, taken as a whole, appeals to the prurient interest,
* Whether the work depicts or describes, in a patently offensive way, sexual conduct or excretory functions specifically defined by applicable state law,
* Whether the work, taken as a whole, lacks serious literary, artistic, political, or scientific value.
Also:
Critics of obscenity law argue that defining what is obscene is paradoxical, arbitrary, and subjective.
---
I think it would be hard to not find generated child porn obscene by this test, unless you have a good lawyer, at which point there is plenty of wiggle room.
I think the technical feats involved in creating such a text-to-image program might allow a talented lawyer to make an argument for scientific value.
There's also the issue that "contemporary community standards" are hard to determine, because what community you're talking about is hard to determine.
While no one was harmed in the direct creation of a generated video, can it not be argued that in order to train such a generative engine, it is highly likely that harm-inducing content was produced/consumed at some stage?
Where does the harm boundary lie? Is harm inflicted if an inanimate object is the only thing consuming the content? Is a generated video borne from harmful content an interest payment on your harm-capital?
Meta, but interesting I wonder whether this train of thought would hold in a court of law...
In the United States under the 2003 PROTECT act they are actually illegal but only if they are "obscene" as obscene speech is not protected by the first Amendment.
>Prohibits drawings, sculptures, and pictures of such drawings and sculptures depicting minors in actions or situations that meet the Miller test of being obscene, OR are engaged in sex acts that are deemed to meet the same obscene condition. The law does not explicitly state that images of fictional beings who appear to be under 18 engaged in sexual acts that are not deemed to be obscene are rendered illegal in and of their own condition (illustration of sex of fictional minors).
Maximum sentence of 5 years for possession, 10 years for distribution.
Interestingly, the same act does make illegal all computer-generated child pornography which is "virtually indistinguishable from that of a minor engaging in sexually explicit conduct", with no requirement that it be obscene. I don't think that has been tested in court yet.
Maybe today it's legal (or not?), but that's only because generating realistic porn is not possible yet. Does anyone really think that in today's American society, so afraid of sex, this would remain legal? Consider the following situation:
Imagine we're 200 years into the future and nothing much has changed with respect to attitudes about sex, freedom of speech, technology, etc. All images available 200 years ago are still there on the internet to download, including illegal child porn. If you are downloading 200+ year old child porn, where everyone depicted is long dead, how could anyone be harmed by you viewing those images? Yet I cannot imagine anyone successfully using that defense if caught with those images.
It's not about who is being harmed by these images (if anyone), it's more about society hating pedophilia and going after it wherever possible, free speech be damned.
> how could anyone be harmed by you viewing those images?
I think the argument is that pedophiles view existing media and it encourages them to do things to real people.
The same reason why excessively violent movies are not considered acceptable for children- it's not out of concern for the people in the movies, it's the effect it will have on the person viewing it.
By that logic rape porn should be illegal too. And many movies and video games of realistic violence / gore.
And that may be part of the argument, but I think the official position of the justice department is that each time a child porn image is viewed, additional harm is inflicted upon the victim in the image.
Well I think many people would want those things to be illegal too. Censorship laws are not necessarily logically consistent, nor are they black and white. Changing these laws will always be contentious.
I've spoken about this many times, and I'll say it again - it's a horrible violation of freedom of speech and expression to not be allowed to possess or create certain drawings.
There is no law that sickens me more than the one that criminalises drawings. It angers me far more than even modern copyright law.
And the worst part is that nobody seems to care about it; even the libertarians I've met in the UK are overcome with repulsion at this type of speech or expression, not caring about the violation of rights.
It's a horrible law backed by no evidence of harm caused, and I think it is truly wrong that people ignore it. I am not exaggerating when I say that this principle is a motivating factor for me to leave the UK.
I didn't see that anyone had replied to my post until now. HN could really do with an inbox feature!
I just wanted to say that I agree 100%. The UK's laws on obscenity and freedom of speech in general are a terrifying mess.
You can be arrested and convicted for as little as wearing a t-shirt with an offensive slogan. [0]
How do we define offensive? Well, nobody really knows. It basically depends on the magistrate or jury you find yourself in front of.
When I discuss cases like this with people, they often say something along the lines of "but how can you defend this person... what they said was racist/homophobic/obscene/insulting to the dead etc". How many times will I have to explain, I'm not defending the person or their opinions, I'm defending the principle of freedom of speech.
Nobody is willing to stand up for a cartoonist who draws creepy pictures, or a football fan with a terrible sense of humour. But really we should all be protesting in the streets over this stuff.
How long before your t-shirt is deemed offensive? Or something you wrote? Or something you drew?
Just like with the cartoons, there is zero evidence of harm caused. Just innocent people persecuted with no justification.
I agree, and in my opinion it's a gradual slide in which people decide what is offensive and what is not, what is acceptable and what is not, and they don't require evidence for it; they just legislate against the perceived problems. And it's not rare that this is done by people with good intentions. They want society to run smoothly and nicely, but by trying to ensure that with law, we lose individual liberties.
I've noticed that people think of speech in the same way as "you should be allowed to say that" or "you shouldn't be allowed to say this" rather than in the way of "you should be allowed to speak" or "you shouldn't be allowed to speak". This kind of "particular" reasoning leads to examining the contents of speech rather than simply the right.
A magistrate who ruled in a case about the cartoons I mentioned said that "society has no need for these materials", or words to that effect. This really proves my point about placing society's use for something above the individual. Most laws generally considered unjust, we find, relate to protecting people from themselves or to inadequate consideration for personal liberty.
I don't think there are that many questions really. The answers already exist, as you mention. If it's not real and it's generated/created content, it isn't illegal. This is the same reason all the crazy Asian animated adult content is allowed.
The only question that needs answering is: what is considered real?
Consult a lawyer first; in some parts of the world constructed images of child pornography are illegal, covering both drawn child pornography and photo-manipulation where a child's head is pasted on to a young-looking but legal woman's body.
That doesn't stop law makers from trying. Like the mess over small-breasted porn being banned in Australia in 2010:
> While the ACB claims that there is no blanket ban on small breasts as such, women over the age of 18 with small breasts who might look young ARE banned.
Incidentally, that's almost exactly how generative adversarial networks work.
To generate realistic images, you make two neural networks: one of them (D) takes an image as input and decides whether it's real or whether it's the output of (G). The other (G) takes random noise as input and turns it into an image that will fool the (D) network.
Make them fight until they both get strong, and then use (G) as the final model.
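For the curious, the adversarial loop above can be sketched in a deliberately tiny form. This is my own toy, not a real GAN: the "networks" are 1-D functions with one or two parameters each, and the real data is just numbers near 4.0, but the D-step/G-step alternation is the same idea.

```python
import math, random

random.seed(0)

REAL_MEAN = 4.0  # the "real" data: samples drawn near 4.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Generator G(z) = z + theta: a single parameter that learns to shift
# noise toward the real data. Discriminator D(x) = sigmoid(a*x + b):
# a logistic real-vs-fake score. Both are 1-D stand-ins for networks.
theta, a, b = 0.0, 0.0, 0.0
lr = 0.05

for step in range(5000):
    z = random.gauss(0.0, 0.5)
    x_real, x_fake = random.gauss(REAL_MEAN, 0.5), z + theta

    # D step: ascend log D(real) + log(1 - D(fake))
    d_real, d_fake = sigmoid(a * x_real + b), sigmoid(a * x_fake + b)
    a += lr * ((1 - d_real) * x_real - d_fake * x_fake)
    b += lr * ((1 - d_real) - d_fake)

    # G step: ascend log D(fake), i.e. try to fool the discriminator
    d_fake = sigmoid(a * (z + theta) + b)
    theta += lr * (1 - d_fake) * a

print(round(theta, 2))  # drifts toward REAL_MEAN as G learns to fool D
```

After training, G's output distribution sits roughly on top of the real one, which is the "make them fight until they both get strong" part in miniature.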
> (8) “child pornography” means any visual depiction, including any photograph, film, video, picture, or computer or computer-generated image or picture, whether made or produced by electronic, mechanical, or other means, of sexually explicit conduct, where—
> (A) the production of such visual depiction involves the use of a minor engaging in sexually explicit conduct;
> (B) such visual depiction is a digital image, computer image, or computer-generated image that is, or is indistinguishable from, that of a minor engaging in sexually explicit conduct; or
> (C) such visual depiction has been created, adapted, or modified to appear that an identifiable minor is engaging in sexually explicit conduct.
> ...
> (11) the term “indistinguishable” used with respect to a depiction, means virtually indistinguishable, in that the depiction is such that an ordinary person viewing the depiction would conclude that the depiction is of an actual minor engaged in sexually explicit conduct. This definition does not apply to depictions that are drawings, cartoons, sculptures, or paintings depicting minors or adults.
In theory, you could argue that the network was trained on child porn, and as such anything produced by it involved the use of a minor engaging in sexually explicit conduct; but I can't imagine a court actually buying that argument.
Sufficiently realistic digital child porn would be prohibited under 8B. The original context of this conversation was neural net generated images, so I had not considered the case of photorealistic images. Those would be a violation of 8B (if they are of sufficient quality).
Having said that the segment I quoted is only the definition of child porn. The law prohibiting child porn [0, section c2] provides that:
"It shall be an affirmative defense to a charge of violating paragraph (1), (2), (3)(A), (4), or (5) of subsection (a) that— ...
the alleged child pornography was not produced using any actual minor or minors.
No affirmative defense under subsection (c)(2) shall be available in any prosecution that involves child pornography as described in section 2256(8)(C)"
It is worth mentioning the segments of section A that are excluded from this defence:
Section 3B prohibits the advertisement/distribution/solicitation/etc. of material that is claimed to contain (i) "an obscene visual depiction of a minor engaging in sexually explicit conduct; or (ii) a visual depiction of an actual minor engaging in sexually explicit conduct;"
The relevant part of this is (i), where you would need to parse out the definition of "obscene" and "minor". Section 2256 defines minor as "any person under the age of eighteen years", however the courts would probably read it in this context in contrast to the phrase "actual minor". I could not find the definition of "obscene" or "actual minor". Talk to a lawyer.
Section 6 prohibits providing child porn to a minor.
Section 7 requires a depiction of an identifiable minor.
I've been looking for balance in the universe after VidAngel requires filtering of obscenity. I want to run videos through a neural network to add obscenity and nudity.
My first thought as well. Integrating with this some web crawlers that test images as it goes so it keeps going down the path of maximum NSFW websites.
Though at the same time I fear the type of pornography it may come across.
I was a little nervous at first because it asked me to confirm my age and everything, but it looks like you are right. Just a bunch of math geeks, haha.
I don't understand how Google's algorithm can be misled into finding sexiness in those. I imagine it has something to do with skin tones or flesh colors, but then what about the high-contrast patchwork of green and brown fields Google finds "likely to contain adult content"? That's totally puzzling.
The confusion with medical images is way more understandable. If you squint, you can almost imagine those are pics of skin cancer or lesions.
Oddly enough, I even see violence in the "violent" picture. In an abstract, Rorschach Test sort of way. Well done, Google!
> I don't understand how Google's algorithm can be misled into finding sexiness in those.
I'm reminded of a paper for which the authors generated different pictures of static that fooled neural network image classifiers into confidently identifying them as different objects: https://arxiv.org/abs/1412.1897
> Computer vision and human vision are nothing alike. And yet, since it increasingly relies on neural networks that teach themselves to see, we’re not sure precisely how computer vision differs from our own. As Jeff Clune, one of the researchers who conducted the study, puts it, when it comes to AI, “we can get the results without knowing how we’re getting those results.”
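The paper linked above evolved images against the classifier, but a related effect is easy to sketch by hand: with a linear scorer, a tiny nudge to every feature in the right direction adds up and flips a confident prediction. Everything below (the weights, the input, the "cat" label) is made up purely for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A fixed linear "classifier" over 100 input features (weights invented
# for illustration): score = sigmoid(w . x), "cat" if score > 0.5.
DIM = 100
w = [1.0 if i % 2 == 0 else -1.0 for i in range(DIM)]

# An input the model classifies confidently: a large component orthogonal
# to w (the constant 0.8 cancels out) plus a small aligned component.
x = [0.8 + 0.03 * w[i] for i in range(DIM)]
score = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
print(round(score, 3))   # → 0.953: confidently "cat"

# Nudge every feature by only 0.05 against the gradient (the sign of w):
# a tiny per-feature change, but it adds up across all 100 features.
eps = 0.05
x_adv = [xi - eps * (1 if wi > 0 else -1) for wi, xi in zip(w, x)]
adv_score = sigmoid(sum(wi * xi for wi, xi in zip(w, x_adv)))
print(round(adv_score, 3))  # → 0.119: now confidently "not cat"
```

A 6% change to each feature flips a 95%-confident prediction, which is one intuition for why high-dimensional classifiers can be fooled by perturbations humans barely notice.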
Not OP, but looks like it's built on Jekyll. If you're not familiar, it's a Static Site Generator. It generates static html files from your content (generally markdown files).
There are other SSGs; I use Hugo myself (http://gohugo.io, sample of my blog: http://arianv.com/). I like Hugo because it's probably the fastest SSG (every time you create a post or change content, you're rebuilding your entire site from scratch; if you have lots of posts, this adds up!), but Jekyll is the most popular and has great tooling.
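For anyone who hasn't seen one, the core idea of an SSG fits in a few lines. This toy has nothing to do with Jekyll's or Hugo's actual internals; it just wraps made-up "posts" in a shared template and emits one static page per post:

```python
import html

# A toy static site generator: take raw posts, wrap each in a shared
# HTML template, and emit one standalone page per post.
TEMPLATE = ("<html><head><title>{title}</title></head>"
            "<body><h1>{title}</h1><p>{body}</p></body></html>")

posts = {
    "hello": ("Hello, world", "My first post."),
    "nsfw-nets": ("Classifying images", "Notes on open_nsfw."),
}

site = {}
for slug, (title, body) in posts.items():
    site[slug + ".html"] = TEMPLATE.format(
        title=html.escape(title), body=html.escape(body))

print(sorted(site))  # → ['hello.html', 'nsfw-nets.html']
```

It also shows why a rebuild touches every page: change the template and every output file has to be re-rendered, which is where build speed starts to matter on large sites.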
Forgive my ignorance of ML, but the last bit, "you'll need your own porn to train on", confused me. Does this mean that they're just exposing the rough topology of their neural net (e.g. depth) and not the actual weights between nodes? I'm curious to learn from an ML expert how much this actually offers.
Interesting thought - doesn't every single porn producer now have a valid copyright claim on the trained network? I don't see how you can argue this isn't a derivative work based on the movies they produced.
I was debating with a friend about just this: whether a text-to-speech model is a derivative work of audio recordings by a given speaker, such that they'd then have claim on ownership of it. (You could almost certainly create an [overfit] model that could re-generate the original performance of a text from said text.)
Moot if it was a work-for-hire, of course; but if I, say, created a Samuel L. Jackson speech model by training on samples from his movies, and sold it as one of those car-navigation voices, could I be sued? By Mr. Jackson? By the copyright-holders of the movies?
And if I could, what does that imply about impersonators, who do the same thing, but with their brains?
I don't think it's a derivative work just because one of the inputs is copyrighted. I think it's more descriptive than derivative. Content producers don't generally own copyright in critics' descriptions of their movies or of plot summaries, even though their copyrighted material is a necessary input to the description's creation.
Only if there was any reasonable basis to believe that infringement had taken place. Most countries have some sort of pre-trial hearing before a civil suit to determine if there's merit to the accusation. You're generally not allowed to go on fishing expeditions without a reasonable basis for your claim(s).
It would not be considered derivative work, because what is produced is nothing like the original. There is nothing recognizable in the work produced for a court to rule on.
This is like saying the hash of the text of a book is derivative. If it were ruled that this is the case (that a hash is a derivative work) then suddenly every single number in existence is a derivative of every single other number (since there will always exist some function that will transform X into Y.)
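The hash analogy is easy to demonstrate (using SHA-256 here purely as an example): flip a single character and the digest changes completely, and nothing of the original work is recognizable in the output.

```python
import hashlib

# Two texts differing by one character produce unrelated digests:
# nothing of the "original work" is recognizable in the output.
a = hashlib.sha256(b"Shall I compare thee to a summer's day?").hexdigest()
b = hashlib.sha256(b"Shall I compare thee to a summer's day!").hexdigest()
print(a == b)           # → False
print(a[:16], b[:16])   # completely unrelated prefixes
```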
If you want to train, yes. AFAIK, they are releasing a pre-trained model which you can just use right away. There's not much to the "topology" of this Yahoo-specific model, as it is a standard CaffeNet.
You initialize the nets using their weights, and then provide your own data (in this case, a list of images with (porn|no-porn) labels) to "fine-tune" the nets toward your use case.
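In case it helps, here's what fine-tuning amounts to, sketched in plain Python rather than Caffe. The "frozen" feature extractor, the data, and the labels are all made-up stand-ins; the point is that the pretrained layers stay fixed while only a small new head is trained on your own labeled examples.

```python
import math, random

random.seed(1)

# Stand-in for a frozen pretrained network: maps an "image" (list of
# pixel values) to a feature vector. In real fine-tuning this would be
# the released model's convolutional layers.
def frozen_features(image):
    return [sum(image) / len(image), max(image), min(image)]

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))

# Tiny made-up labeled set: label 1 images are brighter overall
# (an artificial separable property, standing in for porn/no-porn).
data = [([random.random() for _ in range(8)], 0) for _ in range(50)]
data += [([0.5 + random.random() for _ in range(8)], 1) for _ in range(50)]

# Train only the new head (w, b); the feature extractor stays fixed.
w, b = [0.0, 0.0, 0.0], 0.0
lr = 0.5
for _ in range(200):
    for image, label in data:
        f = frozen_features(image)
        p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
        g = label - p  # gradient of the log-likelihood w.r.t. the logit
        w = [wi + lr * g * fi for wi, fi in zip(w, f)]
        b += lr * g

correct = sum(
    ((sigmoid(sum(wi * fi for wi, fi in zip(w, frozen_features(img))) + b) > 0.5)
     == bool(lbl))
    for img, lbl in data
)
print(correct)  # most of the 100 training examples classified correctly
```

Because the expensive part (the feature extractor) is reused as-is, this needs far less data and compute than training from scratch, which is why releasing pretrained weights is so useful.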
Has anyone tried taking the features that are learned at the various layers of a neural net and feeding them into something like this: https://news.ycombinator.com/item?id=12612246?
I imagine we would get some really interesting images back...
We are not releasing the training images or other details due to the nature of the data, but instead we open source the output model which can be used for classification by a developer.
I'm guessing the one who had to input the data/images had a fun time at work :p
I used to work in a company which had a division doing manual image classification next door. Not a fun time at all, the people who worked there regularly burned out on relentlessly seeing terrible things.
I've often thought that it would be even more helpful to automatically filter violent images. Particularly to spare humans from having to be the filters (and "relentlessly see terrible things").
However, I imagine that's far more difficult to accomplish. How do you detect graphic violence? Looking for blood isn't going to cut it. Also, I can't imagine how you'd separate the fictional from the real - I can watch horror movies with realistic special effects all day, but real violence/mutilation/death bothers me deeply.
They acknowledge that NSFW (or pornographic) is hard to define, a la "I know it when I see it".
But looking at the meager 3 sample images I'm confused about the scoring already. Why is the one in the middle scoring the highest?
The question is an honest one. The two rightmost images seem to be interchangeable to me and are ~boring~: people at the beach. Is this network therefore already trained to include the biases of its creators?
All ML networks are inherently biased toward their creators. My colleague recently described this issue to me as the "Old, white, male" problem. This is why most voice recognition services fail drastically when they encounter foreign accents.
> This is why most voice recognition services drastically fail when they are shown foreign accents
As someone with a broad Norwegian accent: This has gotten massively better over the last few years.
Not that long ago, my local cinema chain started using voice recognition to discriminate between a list of city names, and it would consistently think I said "Birmingham" when I said "London" (!).
These days, both my Amazon Fire and the Youtube app will correctly recognise most things I throw at it, including e.g. names of random Youtube channels that bear no relation to real English words.
It's by no means perfect, but it's getting there. In relation to the "old, white, male" problem (well, I do somewhat fit that), presumably because these systems are now finally trained on huge and varied data sets.
Good to see they've automated this (beyond the initial classification of training data). In the early days of the web, such filters were typically based on manually maintained lists of sites. I actually met someone at a party once whose full-time job was to surf for porn, to maintain the filter for a provider of IT services to schools (he worked for a company now called RM Education). He said it was his ideal job for the first few days, but soon grew tiresome (note that back in those days there wasn't really any extremely objectionable material on the web).
There's no definition of "ironic" that I can think of applying to that. It's like saying that "beautiful painting" is subjective and contextual, but "still life painting" isn't. It just happens that pornography is almost[1] always considered NSFW, but again, I can't see how that is ironic.
[1] Almost, because porn is SFW when your work involves porn.
I didn't see where they said that pornographic images are not subjective and contextual. In fact it seemed like they were using "NSFW" and "pornographic" almost interchangeably, while acknowledging that the current implementation doesn't deal with violent images etc.
For instance, if you read it like this, it still makes sense (I replaced NSFW with pornographic):
Disclaimer: The definition of pornographic is subjective and contextual. This model is a general purpose reference model, which can be used for the preliminary filtering of pornographic images. We do not provide guarantees of accuracy of output, rather we make this available for developers to explore and enhance as an open source project.
I'm not a deep learning person whatsoever, but I do have an interesting use case that I won't disclose publicly: Is there a way to build this, and output detections based on the, ugh, object it has detected?
Yes. You would have to have a large training set with these labels but it would be pretty straightforward to train. You would probably want a tagging model not a classifier because there could be multiple objects of interest in the same image. If you get me the training data I could train a model for you pretty quickly.
And it would be an... interesting job to tag the training set. Although for higher level content, I suppose lots of porn videos have very specific category tags that could be an interesting data set to play with. Uh, to analyze.
What's the current ML pet method for multi-label image classification? It seems like you could string together a bunch of individual classifiers e.g. "Scene contains dog", "Scene contains cat", but is there an efficient (and effective) way of doing it in one go? Does it significantly increase the complexity of the network? I would imagine a cat detector would be far simpler than a cat and/or dog detector.
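There is: the usual trick is to share one network and replace the final softmax with an independent sigmoid per label (trained with binary cross-entropy), so labels don't compete. A toy forward pass (features and weights made up for illustration) shows the difference:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

# Shared feature vector for one image containing BOTH a cat and a dog
# (the numbers and weights are invented for illustration).
features = [1.0, 0.5, -0.2]
heads = {  # one small linear head per label on top of the shared features
    "cat": [2.0, 1.0, 0.0],
    "dog": [1.5, 1.2, -0.5],
}

logits = {label: sum(w * f for w, f in zip(ws, features))
          for label, ws in heads.items()}

# Independent sigmoids: each label is scored on its own, both can be "on".
multi = {label: sigmoid(z) for label, z in logits.items()}
# Softmax over the same logits: probabilities compete and must sum to 1,
# so it can't say "both cat and dog".
exclusive = dict(zip(logits, softmax(list(logits.values()))))

print(multi)      # both scores near 1
print(exclusive)  # forced to split probability between the two labels
```

Since all the heads share one feature extractor, adding a label is cheap; you're not paying for a whole extra network per class the way separate classifiers would.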
Aren't there more important problems to work on than worrying about someone looking at naked people? This is just what we need: more effort spent on censoring and controlling people.
Well, considering the other shit they look at and participate in, like video games with people killing each other and blood spattering everywhere, I'd rather they were viewing naked, non-violent people.
It'd be interesting to see a direct comparison of the two. Off the cuff, I'd expect the deep neural network to be more accurate and better at generalizing, but much more expensive to train.
I wonder what would happen if we stopped firing people for watching NSFW images. I mean bosses look at NSFW images all the time and it sounds like a shallow reason to fire someone.
which contains some technical details. (And furthermore, I guess the HN crowd has enough Internet experience to come up with stupid jokes of their own design.)
So this is what Yahoo was up to for the last 10 years, instead of building any sort of security, keeping Yahoo Messenger working properly, or anything else of value? Heckuva job, Yahoo.