
> Usually people are concerned about maximising prediction accuracy, and never stop to think about what correlations the model is finding down below, and the human biases present in the data annotations.

Because maximizing prediction accuracy is inherently unbiased. Bias is when the predictions made are inaccurate to the detriment of a group of people. If you had a prediction algorithm that used time travel to tell you with 100.0% accuracy who would pay back their loans, there would be a racial disparity in the result, but the algorithm would not be the cause of it.

And you can't fix it there because that's not where the problem is.

Suppose you have a group of 800 white middle managers, 100 white sales clerks and 100 black sales clerks. Clearly the algorithm is going to have a racial disparity in outcome if the middle managers are at 500% of the cutoff and the sales clerks are right on the line, because it will accept all of the middle managers and half of the sales clerks, which means it will accept >94% of the white applicants and 50% of the black applicants.
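
A quick sketch of that arithmetic (the pool sizes and scores are the hypothetical numbers above; treating the borderline applicants as an even split is an assumption):

    CUTOFF = 1.0  # scores below are expressed as multiples of this cutoff

    def accepted(count, score):
        # Applicants right on the line are a coin flip; model that as
        # exactly half of the borderline pool being accepted.
        return count if score > CUTOFF else count // 2

    # 800 white middle managers at 500% of the cutoff,
    # 100 white and 100 black sales clerks right on the line.
    white = accepted(800, 5.0) + accepted(100, 1.0)  # 800 + 50 = 850
    black = accepted(100, 1.0)                       # 50

    print(f"white: {white / 900:.1%}")  # 94.4%
    print(f"black: {black / 100:.1%}")  # 50.0%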

But the source of the disparity is that black people are underrepresented as middle managers and overrepresented as sales clerks. The algorithm is just telling you that. It can't change it.

And inserting bias into the algorithm to "balance" the outcome doesn't actually do that; all you're doing is creating bias against white sales clerks, who had previously been on the same footing as black sales clerks. The white middle managers will be unaffected, because they're sufficiently far above the cutoff that the change doesn't reach them, even though they're the source of the imbalance.


You entirely miss the point! The point is that in supervised learning, for example, if you optimize prediction accuracy with respect to your human-generated examples, you will get a model that exactly reproduces the racist judgment of the humans who generated your training set.


> The point is that in supervised learning, for example, if you optimize prediction accuracy with respect to your human-generated examples, you will get a model that exactly reproduces the racist judgment of the humans who generated your training set.

In which case you aren't optimizing prediction accuracy. Prediction accuracy is measured by whether the predictions are true. If there is bias in the predictions that doesn't exist in the actual outcomes, then there is money to be made by eliminating it.

It seems like the strangest place to raise an objection, since here the profit motive is directly aligned with the desired behavior.


You need to think about how we measure truth and even what truth is.

In machine learning we tend to assume the annotations and labels are "true" and build a system towards that version of the "truth".

> Prediction accuracy is measured by whether the predictions are true.

The more I think about this sentence, the less sense it makes. Prediction accuracy can only be measured against records of something, and that record will be a distortion and simplification of reality.


> Prediction accuracy can only be measured against records of something, and that record will be a distortion and simplification of reality.

Prediction accuracy can be measured against what actually happens. If the algorithm says that 5% of people like Bob will default and you give loans to people like Bob and 7% of them default then the algorithm is off by 2%.
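
A minimal sketch of that comparison, using the hypothetical 5%/7% figures above:

    # The model's predicted default rate for a cohort of similar borrowers,
    # checked against what actually happened after the loans were made.
    predicted_rate = 0.05          # model: 5% of people like Bob default
    outcomes = [0] * 93 + [1] * 7  # realized: 7 defaults out of 100 loans

    observed_rate = sum(outcomes) / len(outcomes)
    print(f"off by {observed_rate - predicted_rate:+.1%}")  # +2.0%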


You are still assuming that everything recorded as being "like Bob" is the truth and captures reality clearly.

Moreover, you would need to give loans to everybody in order to check the accuracy of the algorithm. You can't just check a non-random subset and expect to get unbiased results.


> You are still assuming that everything recorded as being "like Bob" is the truth and captures reality clearly.

Nope, just finding correlations between "records say Bob has bit 24 set" and "Bob paid his loans." The data could say that Bob is a pink space alien from Andromeda and the algorithm can still do something useful. If that field is completely random, the algorithm will determine that it's independent of whether Bob will pay his loans and ignore it; but if it correlates with paying back loans, then it has predictive power. The fact that you're really measuring something other than what you thought you were doesn't change that.
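
A toy illustration of that point (the 80% repayment rate and 70% agreement rate are made up for the example): a field that is pure noise carries no signal, while a mislabeled-but-correlated field still predicts.

    import random
    random.seed(0)

    n = 100_000
    repaid = [random.random() < 0.8 for _ in range(n)]  # assume 80% repay

    # "Bit 24": pure noise, independent of repayment.
    noise = [random.random() < 0.5 for _ in range(n)]
    # A nonsense label ("pink space alien") that nonetheless agrees
    # with repayment 70% of the time.
    alien = [r if random.random() < 0.7 else not r for r in repaid]

    def repay_rate_given(flag):
        hits = [r for r, f in zip(repaid, flag) if f]
        return sum(hits) / len(hits)

    print(f"P(repaid | noise set): {repay_rate_given(noise):.2f}")  # ~0.80: no lift
    print(f"P(repaid | alien set): {repay_rate_given(alien):.2f}")  # ~0.90: predictive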

> Moreover, you would need to give loans to everybody in order to check the accuracy of the algorithm. You can't just check a non-random subset and expect to get unbiased results.

What you can do is give loans to a random subset of the people you otherwise wouldn't, to see what happens.
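
A minimal sketch of that idea, assuming a score cutoff and an exploration rate picked purely for illustration:

    import random

    CUTOFF = 600         # hypothetical credit-score threshold
    EXPLORE_RATE = 0.02  # fraction of would-be rejections approved anyway

    def decide(score):
        if score >= CUTOFF:
            return "approve"
        # Approve a small random slice of rejections; their realized
        # outcomes give unbiased feedback on applicants below the cutoff.
        if random.random() < EXPLORE_RATE:
            return "approve (exploration)"
        return "reject"

    print(decide(650))  # approve
    print(decide(550))  # usually "reject", occasionally exploration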

But even that isn't usually necessary, because in reality there isn't a huge cliff right at the point where you decide whether to give the loan, and different variables will land on opposite sides of the decision. There will be people you decide to give the loan to even though their income was on the low side, because their repayment history was very good. If more of those people than expected repay their loans, then you know that repayment history is a stronger predictor than expected and income is a weaker one; if fewer, the opposite.
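
A crude sketch of that feedback loop; the weights, rates, and learning rate here are all invented for illustration, and only the direction of the update matters:

    # Among marginal approvals (low income, strong repayment history),
    # compare predicted vs. realized repayment and nudge the weights.
    w_history, w_income = 0.6, 0.4  # hypothetical feature weights
    expected_repay = 0.90           # predicted for the marginal approvals
    realized_repay = 0.95           # observed for that same slice

    LEARNING_RATE = 0.5
    shift = LEARNING_RATE * (realized_repay - expected_repay)
    w_history += shift  # history predicted better than expected: weight up
    w_income -= shift   # income correspondingly weaker: weight down
    print(f"history={w_history:.3f}, income={w_income:.3f}")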


You have described the reason why many of us don't consider maximizing profit to be a desired behavior in all circumstances.


I think you misunderstand. In this case, for once, the profit-maximizing thing and the thing we want them to do are one and the same.

You can make a legitimate objection if the algorithm predicts that 20% of black people will default on their loans when in reality only 10% do. But if the algorithm is doing that, then it's losing the bank money: it's giving loans to other, less creditworthy people instead of those more creditworthy ones, or not making profit-generating loans at all even though it has money to lend. A purely profit-motivated investor is not interested in that happening.

But if it happens that disproportionately many of some group of people are in actual fact uncreditworthy, giving the uncreditworthy people credit anyway is crazy. It's the thing that caused the housing crisis. An excessive number of them will default, losing the lender's money and ruining their own credit even further. It hurts everybody.


That's silly though. You should use the ground truth of actual outcomes, not predict what useless humans would do. But even then, I bet the algorithm would be less racist than the humans, if you don't give it race as a feature.


Yes, and you can get similar effects with unsupervised learning if the data set is biased.


Then the problem is in the racist human(s), not the algorithm.


But the algorithm enshrines and possibly amplifies it. Or as the old saying goes:

> To err is human, to really foul things up requires a computer.


Humans have much more potential for racism in making predictions about the future than in judging things that have already happened. So while I agree that racism through biased input data is a problem, I think that even with that problem, machines should be substantially less racist in their judgement than the humans they're replacing, even if they're not perfect.


> maximizing prediction accuracy is inherently unbiased

This assumes that you're actually maximizing prediction accuracy, rather than taking the easiest route toward sufficiently high predictive power.

Not-so-hypothetical: you can invest $N and create a profitable model that (unfairly and inaccurately) discriminates directly based upon race, or you can invest $N*M and create a profitable model that does not discriminate on race (regardless of whether its results are racially equitable). Given the choice, a lot of people will choose the $N approach.


> regardless of whether its results are racially equitable

Some would call this discrimination, if not outright racism.

As others on this thread have pointed out, at some point the algorithm follows the underlying data, which may also be discriminatory (zip code, etc.).



