It sounds like there are some valid methodological criticisms to be made of this criminal-prediction paper. But I don't like the rush to condemn it as "pseudoscience" just because it has implications the OP is uncomfortable with. The OP's main point, which could have been made far more succinctly, is that "criminality" is a human judgment that encodes bias, and therefore the paper's machine learning methods were merely trained to reproduce those biases. I don't think anyone claims there's a perfect criminal justice system anywhere. But to dismiss these results out of hand on the basis that criminal justice systems are imperfect is extremely hasty. The OP is too blinkered by politically correct ideology to even consider that one might actually be able to predict criminality accurately with such a methodology.
There are many behaviors we can predict about a person with a high degree of confidence just by looking at an image of him. Why is predicting criminality not one of these behaviors? If the premise is nonsense then it won't stand up to further scrutiny. OP should do a larger study that addresses the methodological deficiencies in the original paper. Definitively shoot down this paper with hardcore research and gain widespread critical acclaim.
The article's point is broader than that: the entire idea of physiognomy, "the practice of using people’s outer appearance to infer inner character", is not scientifically defensible. There's no mechanism that would cause someone's inner character to be reflected in their appearance in any consistent way. This remains true even if it's a computer algorithm, not a person, making judgements based on people's appearances.
"There's no mechanism that would cause someone's inner character to be reflected in their appearance in any consistent way." That's a pretty big leap. Down Syndrome has consistent physical characteristics that correlate with a particular set of cognitive/behavioral characteristics. While it is important to carefully critique scientific findings that may be motivated by political biases, it is also important to give science as a process the chance to find truths even if we might not like their political implications.
Sure, that's true, and I bet a Down-Syndrome-recognizing neural net could be made quite accurate. I guess what I mean to say is that there's no general mechanism that would cause someone's inner character to be reflected in their appearance in any consistent way. If you want to predict behavior from appearance, you have to actually do the science and prove that there's an underlying mechanism before you can trust that your neural net is recognizing something meaningful.
Since Kepler, science has been about prediction, not mechanism. There is often a rather fetishistic disdain for mechanism, which gets relegated to philosophical, coffee-table curiosity. See, for example, the various proposed mechanisms underlying Newtonian and relativistic mechanics. Have you heard of them? Probably not; nobody cares, because the theories let you make all the predictions you need, and you do not need anything else.
The fact that there is no known mechanism to explain a phenomenon you can predict is not a problem. It may be a shortcoming of our understanding, but it is not in any sense "scientifically indefensible", as you claim.
> To put into perspective just how extraordinary a 90% accuracy claim is, consider that a well-controlled 2015 paper by computer vision researchers Gil Levi and Tal Hassner find that a convolutional neural net with the same architecture (AlexNet) is only able to guess the gender [5] of a face in a snapshot with 86.8% accuracy. [6]
This seems like a rather misleading comparison. I would be really surprised if CNNs, which can hit <5% top-5 error on ImageNet across 1000 classes, and whose GANs can generate nearly photorealistic, clearly gendered faces, couldn't distinguish gender at least that well. And guess what happens when you click through to fact-check this claim that CNNs do worse on a binary gender-prediction problem than on guessing among hundreds of categories? You see that the facial images used by Levi & Hassner are not remotely similar to a clean, uniform dataset of government ID photographs: they are often extremely low quality, blurry, and taken at many angles and under varied lighting, and I can't even confidently guess the gender in several of the samples at the beginning and end, because as they say:
> These show that many of the mistakes made by our system are due to extremely challenging viewing conditions of some of the Adience benchmark images. Most notable are mistakes caused by blur or low resolution and occlusions (particularly from heavy makeup). Gender estimation mistakes also frequently occur for images of babies or very young children where obvious gender attributes are not yet visible.
It wouldn't surprise me, going off the samples, if human-level performance was closer to 86% than 100%, simply due to the noise in the dataset.
(It's also bizarre to make this claim shortly after presenting ChronoNet! So which is it: are CNNs such powerful learning algorithms that they can detect the subtlest biases and details in images, to the point of easily dating photographs of random scenes to within a few years, and so none of their results are ever trustworthy; or are they so weak and dumb that they cannot even distinguish male from female faces, and so none of their results are ever trustworthy? You can't have it both ways.)
> It wouldn't surprise me, going off the samples, if human-level performance was closer to 86% than 100%, simply due to the noise in the dataset.
I think that's kind of the point! The idea that any agent -- human or machine -- can know someone's gender, criminal convictions, or anything else about their background just from a photograph is fundamentally flawed.
No, my point is that you can't use a hard dataset to say what is possible on an easy dataset. 'Here is a dataset of images processed into static: a CNN gets 50% on gender; QED, detecting criminality, personality, gender, or anything else is impossible'. This is obviously fallacious, yet it is what OP is doing.
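To make that concrete, here's a minimal sketch of the fallacy, using scikit-learn's digits dataset and a plain logistic-regression classifier as stand-ins (nothing to do with faces or the paper's actual models): the same trained model reports very different accuracies on a clean test set and on an artificially degraded copy of that same test set, so a number from the hard version tells you nothing about what's achievable on the easy one.

```python
# Illustrative sketch only: accuracy is a property of (model + dataset),
# so a score from a hard benchmark doesn't bound what's possible on an
# easier one. Digits + logistic regression are stand-ins for faces + CNNs.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("clean test accuracy:", clf.score(X_test, y_test))

# Degrade the same test set with heavy pixel noise (a crude stand-in for
# blur, occlusion, odd angles, etc.). Same model, same task, far lower score.
rng = np.random.default_rng(0)
X_test_noisy = X_test + rng.normal(0, 8.0, X_test.shape)
print("noisy test accuracy:", clf.score(X_test_noisy, y_test))
```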
> The idea that any agent -- human or machine -- can know someone's gender, criminal convictions, or anything else about their background just from a photograph is fundamentally flawed.
This is quite absurd. You think you can't know something about someone's gender from a photograph? Wow.
Personally, I find it entirely possible that criminality could be predicted at above-chance levels from photographs. Humans are not Cartesian machines; we are biological beings. Violent and antisocial behavior is heritable, detectable in GWAS, and has been linked to many biological traits such as gender, age, and testosterone. Hey, you know what else testosterone affects? A lot of things, including facial appearance. Hm...
Of course, maybe it can't be. But it's going to take more than some canned history about phrenology, and misleadingly cited ML research, to convince me that it can't and the original paper was wrong.
> This is quite absurd. You think you can't know something about someone's gender from a photograph? Wow.
No, I'm saying that neither humans nor machines can determine gender solely by looking at a picture, no matter how well they're trained. There will always be examples they get wrong. The problem is not that the machines aren't as good as humans. The problem is that they're both trying to do something that's impossible.
And predicting at "above-chance levels" isn't enough. The article goes into great detail about how this kind of inaccurate prediction can cause real human suffering.
> No, I'm saying that neither humans nor machines can determine gender solely by looking at a picture, no matter how well they're trained. There will always be examples they get wrong.
This is irrelevant and dishonest. Don't go around making factual claims like something can't be done when it manifestly can usually be done.
We can't know for sure, everyone agrees. But so what? It's still very interesting and potentially useful (or dangerous, depending on your point of view) to learn about correlations.
I remember a story a professor once told me about something similar to this. He knew of a research team that was trying to develop an algorithm to tell the difference between pictures of American and Russian tanks. They achieved a very high success rate very quickly. Excited but skeptical, they kept testing the algorithm on lower- and lower-resolution photos. Shockingly, they were still getting close to 100% identification on images around 10 by 10 pixels.
Turns out, all the pictures of Russian tanks had been taken in winter, while all the American ones were taken in summer. All they had done was train a model to classify how bright the picture was.
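To see how that kind of confound plays out, here's a toy sketch with entirely made-up data (no actual tank photos involved): the training labels are perfectly correlated with overall image brightness, so the classifier learns the lighting, and on a test set where the lighting is flipped it falls apart.

```python
# Toy illustration of the confound in the (possibly apocryphal) tank story.
# The synthetic "photos" contain nothing but overall exposure, so brightness
# is the only thing a classifier can learn here.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_images(n, bright):
    base = 0.7 if bright else 0.3  # "summer" vs "winter" exposure
    return rng.normal(base, 0.05, size=(n, 10 * 10))

# Training set: one class is always dark, the other always bright.
X_train = np.vstack([make_images(500, bright=False), make_images(500, bright=True)])
y_train = np.array([0] * 500 + [1] * 500)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))  # ~1.0

# Test set with the lighting flipped relative to the labels: accuracy
# collapses, because the model learned the lighting, not the "tanks".
X_test = np.vstack([make_images(500, bright=True), make_images(500, bright=False)])
y_test = np.array([0] * 500 + [1] * 500)
print("test accuracy:", clf.score(X_test, y_test))  # ~0.0
```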
Yeah, this urban legend always gets trotted out to criticize neural networks, but after years of looking, I've never been able to confirm it, and even when Minsky tells it, he can't name any names or concrete details about when or where - for example, in Minsky's version, it was how the photographs were developed, but in yours, it's winter/summer, and in other versions, it's night vs day, or it was forest vs grass: http://lesswrong.com/lw/td/magical_categories/4v4a http://lesswrong.com/lw/lvh/examples_of_ais_behaving_badly/c...
And just last week I heard that story, except in that version the NATO tanks were in perfect focus while the Soviet tanks were out of focus in the training images. Clearly this story is all over the place. I wonder if even the likely source is correct. https://www.webofstories.com/play/marvin.minsky/122
All of the photos in the paper, both "criminals" and "non-criminals", are from government IDs. Though, as the article mentions, in the pictured example all the "non-criminals" are wearing collared shirts.
This reeks real hard of overfitting. 2000 images is tiny for training a CNN. The paper should at least have included a learning curve; a rough sketch of what that could look like is below.
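Something like this, for instance (the dataset and classifier here are just illustrative stand-ins for the paper's ID photos and CNN): train on growing subsets and watch whether the gap between training and validation accuracy closes. If it hasn't closed by ~2000 examples, the model is likely memorizing rather than generalizing.

```python
# Hedged sketch of a learning-curve check, not the paper's actual setup:
# scikit-learn's digits dataset and logistic regression stand in for the
# ID-photo dataset and CNN used in the paper.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)

train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=5000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8), cv=5,
)

# A persistent gap between train and validation accuracy as the training
# set grows is the classic signature of overfitting to a too-small dataset.
for n, tr, va in zip(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:4d}  train={tr:.3f}  val={va:.3f}  gap={tr - va:.3f}")
```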
In-depth and interesting article, with thoughtful exposition. It's a longer read, and I enjoyed it. (I almost said "but", which makes me a bit ashamed, because the two characteristics are not at all contradictory, we've just been trained to seek quick reads.)
tldr: Really long article about the history of stupid theories linking superficial traits like your face to criminal behavior and intelligence. I'm not sure why those old theories needed such debunking.
>I'm not sure why those old theories needed such debunking.
Well, in case you actually "dr", it's because those old theories are not only still held by many, but are also resurfacing in the form of superficial deep learning applications, as the article mentions.
To add to that, I guess the intent of the writers is explained by the following:
> We expect that more research will appear in the coming years that has similar biases, oversights, and false claims to scientific objectivity in order to “launder” human prejudice and discrimination.
Historical theories are brought up (1) to show that they are resurfacing and (2) because the sciences (and I'm inclined to say DL in particular) are susceptible to making similar mistakes. The hard part is that these biases are often not obvious at all: they rest on general preconceptions and stereotypes, so the resulting theories simply confirm something we already think we know (confirmation bias).
For example, I have been examining emotion recognition software [1] in which, just as in physiognomy, the face is taken as a proxy for a person's mental state. Just as the OP examines the terms "criminal" and "justice", one could inquire into the concepts underlying the digitization of emotions, such as "anger" and "joy": terms that seem very clear at first glance but, on closer examination, turn out to be heavily influenced by culture, among other things. Though not as obviously grave as incriminating an innocent person, one should still wonder what it means to feel 34% angry.
Now, this is a single example, but I guess OP's use of historical theories allows for a critical look at more DL applications out there. And maybe it helps convince laymen (the policy makers who buy and deploy such technology) that DL is not an easy answer to complex social and political problems, such as OP's example of Faception's classifiers for terrorists, paedophiles, and white-collar offenders.
> Rapid developments in artificial intelligence and machine learning have enabled scientific racism to enter a new era, in which machine-learned models embed biases present in the human behavior used for model development. Whether intentional or not, this “laundering” of human prejudice through computer algorithms can make those biases appear to be justified objectively.
Let's be honest here. Realistically, could those biases ever be justified objectively to the satisfaction of the author? Probably not, because any evidence in their favor will likely be swiftly dismissed as itself racist, and thus, somehow automatically invalid.
On the issue of race, the modern leftist increasingly resembles a sixteenth-century geocentrist desperately adding epicycle after epicycle in a futile struggle to preserve their cherished ideal. I suggest listening to Sam Harris's recent interview with Charles Murray. It's a rare, refreshing example of a prominent leftist at long last conceding the existence of a reality that they had for so long denied.