Disclaimer - I work in the financial services industry.
This is nothing new: in most financial services, machine learning algorithms already have to meet these requirements. This is why supervised analytics are more popular, and one of the reasons why algorithms such as credit scores are typically generated and structured as scorecards with explicit reason codes for each independent characteristic, variable, or vector.
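As a toy illustration of what "scorecard with reason codes" means in practice (the characteristics, point values, and cutoff below are invented, not from any real scorecard):

    # Toy points-based scorecard with reason codes. The characteristics,
    # point values, and cutoff are invented for illustration only; real
    # scorecards are calibrated from data.
    SCORECARD = {
        "utilization_below_30pct": 40,   # revolving utilization under 30%
        "no_late_payments_24mo":   55,   # clean payment history
        "file_age_over_5yrs":      30,   # length of credit history
        "inquiries_under_3":       20,   # few recent hard inquiries
    }
    CUTOFF = 120

    REASON_CODES = {
        "utilization_below_30pct": "Proportion of balances to credit limits is too high",
        "no_late_payments_24mo":   "Delinquency on accounts",
        "file_age_over_5yrs":      "Length of time accounts have been established",
        "inquiries_under_3":       "Too many recent inquiries",
    }

    def score(applicant):
        """Return (total points, decision, top reason codes for points lost)."""
        total = sum(pts for attr, pts in SCORECARD.items() if applicant.get(attr))
        missed = sorted((a for a in SCORECARD if not applicant.get(a)),
                        key=lambda a: SCORECARD[a], reverse=True)
        reasons = [REASON_CODES[a] for a in missed[:2]]   # top two adverse-action reasons
        return total, ("approve" if total >= CUTOFF else "decline"), reasons

    print(score({"utilization_below_30pct": True, "inquiries_under_3": True}))
    # (60, 'decline', ['Delinquency on accounts',
    #                  'Length of time accounts have been established'])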
This is really needed for a number of different reasons. The biggest is that algorithms are increasingly running our world. Even outside of mortgages, have you ever had to fork over the right to pull your credit score, either for rent or (in some limited cases) to apply for a job? Not being able to understand the following is a real problem:
a) Why you were rejected.
b) Why you are paying a higher interest rate.
c) What you can do to fix it.
Housing and employment are fundamentally questions of human rights, not just banking profitability or academic algorithm improvements.
Part of the reason credit scoring (which is intrinsically algorithmic decision-making) took hold is that it replaced the "old boys" network that used to dominate. In that world you went to your banker, and the decision to extend credit, and hence your ability to purchase housing, was dominated by your personal relationship with a banker who might not be the same ethnicity, religion, nationality, etc. as you. The credit score democratized financial access.
It can still be used to discriminate. A number of studies have shown that simply being from a white community tends to increase the number of other people you can lean on in an emergency, and hence makes you less likely to default on a loan. From a pure predictability point of view, a bank is at a lower risk of default lending to someone from that community, but that in turn denies financial access to people not in that community, continuing the trend.
A big problem in this kind of financial model is that it's relatively easy to accidentally or deliberately find other variables that mirror "protected" vectors. A simplistic example is zip code, where the zip code reflects an area that is predominantly ethnic minorities.
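One rough way to spot this kind of leakage (a sketch only; the file and column names are invented) is to check how well the supposedly neutral variables predict the protected attribute itself:

    # Rough check: can the "neutral" features reconstruct the protected attribute?
    # If yes, they can act as proxies for it. File and column names are invented.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("applications.csv")                      # hypothetical data
    X = pd.get_dummies(df[["zip_code", "income", "employment_length"]],
                       columns=["zip_code"])
    y = (df["group"] == "protected").astype(int)              # hypothetical label

    # AUC well above 0.5 means the feature set leaks the protected attribute.
    auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                          cv=5, scoring="roc_auc").mean()
    print(f"Protected attribute predictable from 'neutral' features: AUC = {auc:.2f}")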
So it's not cut and dried. It's my PoV that it's not predictability versus red tape, and the people trying to do unaccountable analytics in this space are (perhaps inadvertently) perpetuating racism.
I think there are two problems with requiring explainable models:
Explainable models will be more easily gamed, and they are likely to be less accurate.
The features in the models themselves will become less useful at their task. They will be gamed. This is roughly along the lines of Campbell's law[1], though I've seen other, better explanations that I can't find. What happens when someone is turned down for reasons A and B? They go and fix A and B, but in the meantime, so have many other credit seekers, diminishing those reasons' predictive value. By then the modelers have created a new, different explainable model that no longer uses A and B but somewhat related predictors C and D, which haven't yet been used, and so haven't yet been gamed, and which the original seeker does not meet.
Explainable models, being a subset of all models, are likely not to contain the most accurate models. I don't know anything about the domain of credit scoring (maybe state-of-the-art models are actually small and understandable?), but in speech recognition, for example, models are constantly growing in complexity, their individual decisions are way beyond explainable to anyone in a reasonable amount of time, and they only get more powerful as they get larger and more complex. In speech, models are already many gigabytes. In credit scoring, less accurate models mean higher rates, so there is an aggregate loss.
A fair point, but as a society we have decided that racial discrimination is not a valid mechanism for banks to profit by. That does result in everyone paying a bit more in interest, as the risk pool is larger, but it's an acceptable tradeoff.
In terms of gaming, verification is just as important as scoring. If the data going into the system is rigged, and income is not being properly validated, bad things will happen.
As a society we have directed banks to make bad loans to blacks and charge non-blacks extra to make up the difference? I'd be surprised if even 10% of people know this decision was made.
Also, what makes it acceptable to engage in this form of surreptitious wealth redistribution on racial lines?
Not being a racist makes it acceptable to not take race, or a surrogate for race, into consideration for a loan ;-)
(Please don't take that the wrong way. I am not accusing anyone of racism. Simply stating that at some point our ideals are more valuable than an additional point of profit for the bank.)
Disparate impact and its use in credit scoring is mostly governed by the Equal Credit Opportunity Act (ECOA), but most of the banks I am aware of go several steps further in ensuring that disparate impact does not occur.
This is only important if you believe that racial stereotypes are true but should be ignored; that is, that if you control for education, income, region, etc., differences between races are still significant.
Disparate impact goes well beyond removing race as a feature. You sometimes can't use features that correlate with race, even if they are highly predictive. E.g. education.
It also has nothing to do with the profits of banks. Better prediction algorithms for loans mean lower interest rates for people who are good borrowers, and less crippling debt for those who aren't. It has huge benefits for the economy and society.
I think the case of zip-code-based discrimination even has a name. It's called redlining, and the term is used in the machine learning world to describe indirect discrimination based on certain attributes (e.g. discriminating against people based on a zip code that has a mostly black population).
But it's also very important to understand why machine learning systems do this. If you take race neutral data, and run virtually any machine learning system on it, you'll get a race neutral output. I.e., if a $40k/year black person in 10001 is equally likely to default as a $40k/year white person in 10002, then the algorithm can be expected to give them equal credit scores.
In the event that an algorithm chooses black zip codes for extra penalties, it's because the algorithm has deduced that the stereotypes are correct: the bank will make more money not lending to black applicants at given rates, because something directly correlated with race and not captured by fields like income/past defaults/etc. is predictive.
Having consulted for folks doing lending, I'll mention an insanely difficult $100B startup idea here: build an accurate algorithm that exhibits no disparate impact. The amount of money you'll save banks is staggering.
True, and then the question becomes: are the stereotypes correct, or is the credit score propagating a system that enforces this outcome?
The stereotype (that predominantly black households in a poor zip code are poorer than predominantly white households in a rich zip code) can be proven correct by simple demographics, and by the fact that household wealth is correlated with credit score.
The problem is that there's a positive feedback loop at play, here.
The bank already knows their assets and income. The question is whether an equally poor and educated white person is just as likely to repay a loan as an equally poor and educated black person. I imagine most of the difference would go away. Unless you really believe black people are inherently less likely to pay back loans, all else equal.
If you can come up with an algorithm that reproduces this conclusion - with accuracy even remotely close to the "racist" ones - banks will cross heaven and earth to pay you $billions.
I do in fact have an algorithm to remove racism from models, but I doubt it's worth "$billions". The whole point of my argument is that it shouldn't be necessary. Surely you don't really believe racist stereotypes are true?
I for one of course do not believe that people from other regions/continents are somehow inherently worse or better.
But I do believe that certain elements of different cultures and different ways of social upbringing can have a lasting positive or negative effect on a person. (although who am I to judge what is positive or negative? I try not to do this, I just see differences)
If these things didn't affect how we later as adults view the world, interact with others and respond to different types of challenges in our life then you could expect that basically everyone around the world would have the same moral value system and beliefs about almost everything.
I do, in fact. There is extensive research supporting the fact that many (though not all) are accurate. The typical racist stereotype is about twice as likely to replicate as the typical sociology paper.
I didn't say that simply removing "racism" (by which I assume you mean disparate impact) is worth billions. I said doing so with the same accuracy as models which are "racist" is worth billions. Obviously you can do anything you want if you give up accuracy.
Why do you believe they are false? Simply because it's socially unacceptable to think otherwise?
I'm not sure whether or not the stereotype is true. The point is you are forced to acknowledge a pretty unpopular belief to defend this idea. And that the kind of people super concerned about disparate impact are usually not the kind of people that believe that. And I think that if you do acknowledge there are differences, the argument that it's unfair and wrong is much less strong.
In any case the link you posted earlier suggested it could be explained by different rates of numeracy and literacy. Which sound like obvious proxies for IQ. If you gave people an IQ test, or at least literacy or numeracy test, this would probably eliminate any racist bias.
> I said doing so with the same accuracy as models which are "racist" is worth billions.
You necessarily lose some accuracy if you really think race is predictive, but you need not lose all of it. The current methods for removing bias just fuzz features that correlate with race. But there is a way to make sure a model doesn't learn to use features merely as a proxy for race, and instead learns how much they matter independent of race.
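As a rough sketch of one approach in this family (just one possible way to do it, and the file and column names are invented): train with the protected attribute present, so correlated features don't get recruited as stand-ins for it, then hold it constant for everyone at scoring time.

    # Sketch of one approach (column names invented): train with the protected
    # attribute included, then fix it to the same value for everyone at scoring
    # time, so neither it nor its proxies can move individual decisions.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    df = pd.read_csv("loans.csv")                             # hypothetical data
    X = df[["income", "education_years", "protected"]]        # protected included on purpose
    y = df["defaulted"]

    model = LogisticRegression(max_iter=1000).fit(X, y)

    def neutral_score(applicants):
        """Score applicants with the protected attribute held constant."""
        neutral = applicants.copy()
        neutral["protected"] = 0                              # same value for everyone
        return model.predict_proba(neutral[X.columns])[:, 1]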
> Simply because it's socially unacceptable to think otherwise?
Please stop insinuating this at people in HN comments. You've done it frequently, and it's rude. (That goes for "simply because of your mood affiliation?", etc., too.)
Dang, I'm really confused here. As Houshalter points out, he was deliberately making an argument based on social unacceptability of my beliefs. Searching for the phrase on hn.algolia.com, I've used the term 5 times on HN ever, never in the manner you imply.
I'm also confused about my use of the term "mood affiliation". Searching my use of the term, most of the time I use it to refer to an external source with a data table/graph that supports a claim I make, but a tone which contradicts mine. For example, I might cite Piketty's book which claims the best way to grow the economy is to funnel money to the rich (this follows directly from his claims that rich people have higher r than non-rich people). What's the dang-approved way to point this out?
In the rare occasion I use it to refer to a comment on HN, I'm usually asking whether I or another party are disagreeing merely at DH2 Tone (as per Paul Graham's levels of disagreement) or some higher level. What's the dang-approved way to ask this?
> usually asking whether I or another party are disagreeing merely at DH2 Tone (as per Paul Graham's levels of disagreement) or some higher level. What's the dang-approved way to ask this?
That seems like a good way to ask it right there.
Possibly I misread you in this case (possibly even in every case!) but my sense is frequently that you are not asking someone a question so much as implying that they have no serious concern, only emotional baggage that they're irrationally unwilling to let go of. That's a form of derision, and it doesn't lead to better arguments.
But if I've gotten you completely wrong, I'd be happy to see it and apologize.
As the person he was replying to, that statement wasn't rude in context. The belief being discussed really is socially unacceptable. My argument in fact, relied on that.
I personally draw the line of acceptable and unacceptable discrimination at "if the disadvantaged person can change that aspect".
Skin color cannot be changed, but zipcode certainly can. There is no rule I'm aware of that restricts visible minorities from living in rich neighbourhoods.
Except, of course, for the fact that visible minorities cannot in fact afford to live there, or get a loan to live there, because they currently live in a poor neighborhood.
> It's my PoV that it's not predictability versus red tape, and the people trying to do unaccountable analytics in this space are (perhaps inadvertently) perpetuating racism.
I was with you until that last line. It is not racism to provide better credit conditions to groups that demonstrably have a lower risk of defaulting on a credit.
It would be racism if you offer worse conditions to a certain group without any rational business related explanation.
If some group has a 2x chance of defaulting on a credit, then it isn't about the color of their skin; they get to pay more interest because it is more risky to offer this group credit in the first place.
With that logic you could just as well say that offering better credit conditions to rich people is racist against poor people.
You said:
I was with you until that last line. It is not racism to provide better credit conditions to groups that demonstrably have a lower risk of defaulting on a credit.
It absolutely is, under ECOA, which explicitly maintains that you may not discriminate by race for loans.
And btw, as someone very familiar with credit scoring once mentioned to me, a small part of the reason credit scores do not take into account what you currently make is that ability to pay is often a very poor predictor of willingness to pay. This burned a lot of people in 2006/2007, when it was assumed a good credit score justified a big loan, without any proper verification of income, assets, and liabilities.
I didn't say that it is okay to discriminate by race for anything, I'm against that.
What I'm saying is that people should be treated fairly instead of using some kind of affirmative action.
If you did a proper verification of income, assets, liabilities and education, you'd likely find that some groups would have to receive worse credit conditions than others.
This doesn't mean that the system is or would be racist, it would strictly evaluate on socio-economic background, not based on race.
What I reject is to handicap one group to make up for the losses that another group generates for a bank by artificially lowering standards for certain groups.
Although this is done with the best intentions, it is inherently racist (it presumes that some races are inferior to others, hence the introduction of lower standards), whereas evaluating the way I described would not be racist.
Btw: I'm myself a second-generation minority. If I'd have had the option to navigate through life with some social security autopilot and affirmative action provided by society I would have never made it to the upper strata of society.
You have to understand that expectations for life from where I and many others started were much lower than for most others, so people at this level are much more willing to accept a lower living standard provided for free than to work hard for the chance at a higher standard.
From the perspective of upper-middle-class people, these kinds of affirmative action and social benefits might seem like genuinely helping people, whereas from my perspective they seem like poison that could have inhibited my will to work on myself and fight to move upwards in society.
I was trying to think of how to specify this position more precisely:
1. Certain factors are thought to be causally correlated with higher default rates by individuals.
2. Certain populations have higher incidences of individuals matching such factors.
3. These populations have higher than average default rates.
4. Other factors that correlate with membership in these populations will thus also be correlated with higher default rates.
5. For an algorithm to be fair, it must be restricted to using only vectors for which causality can be proved.
One of the questions is, even if a vector is "proved" to be causal, would it be disallowed if it also strongly predicted membership in a protected population?
This is the law of unintended consequences. Data points are correlated to other data points. Unless your hiring algorithm is a random number generator, you could find some sort of "bias" you didn't intend.
At least an algorithm is measurable, repeatable, and consistent. Yes, you have to monitor the output of any process for unintended consequences, then you make tweaks as appropriate to eliminate unintended outcomes.
I ran a couple of experiments on discrimination-free machine learning models with naive Bayes, and it changed my perspective on data science.
Usually people are concerned with maximising prediction accuracy, and never stop to think about what correlations the model is finding underneath, or about the human biases present in the data annotations.
Removing sensitive variables (gender, race, etc.) doesn't always help, and especially if the models have a high impact on people's lives (applying for loans, university, scholarships, insurance), we cannot afford to blindly use existing annotations in black-box models.
All of this excludes the fact that companies will probably maximise for profit, and will use "the algorithm" as an excuse to turn people down while disregarding ethics.
> Usually people are concerned with maximising prediction accuracy, and never stop to think about what correlations the model is finding underneath, or about the human biases present in the data annotations.
Because maximizing prediction accuracy is inherently unbiased. Bias is when the predictions made are inaccurate to the detriment of a group of people. If you had a prediction algorithm that functioned using time travel to tell you with 100.0% accuracy who would pay back their loans, there would be a racial disparity in the result, but the cause of it is not the fault of the algorithm.
And you can't fix it there because that's not where the problem is.
Suppose you have a group of 800 white middle managers, 100 white sales clerks and 100 black sales clerks. Clearly the algorithm is going to have a racial disparity in outcome if the middle managers are at 500% of the cutoff and the sales clerks are right on the line, because it will accept all of the middle managers and half of the sales clerks which means it will accept >94% of the white applicants and 50% of the black applicants.
But the source of the disparity is that black people are underrepresented as middle managers and overrepresented as sales clerks. The algorithm is just telling you that. It can't change it.
And inserting bias into the algorithm to "balance" the outcome doesn't actually do that; all you're doing is creating bias against white sales clerks who had previously been on the same footing as black sales clerks. The white middle managers will be unaffected because they're sufficiently far above the cutoff that the change doesn't affect them, even though they're the source of the imbalance.
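To make the arithmetic concrete (same made-up numbers as above):

    # Reproducing the arithmetic above: race never enters the decision,
    # yet acceptance rates differ because the job mix differs by group.
    groups = {
        ("white", "manager"): 800,    # far above the cutoff -> all accepted
        ("white", "clerk"):   100,    # right on the line    -> half accepted
        ("black", "clerk"):   100,    # right on the line    -> half accepted
    }
    accept_rate_by_job = {"manager": 1.0, "clerk": 0.5}

    accepted = {"white": 0.0, "black": 0.0}
    total = {"white": 0, "black": 0}
    for (race, job), n in groups.items():
        total[race] += n
        accepted[race] += n * accept_rate_by_job[job]

    for race in ("white", "black"):
        print(race, f"{accepted[race] / total[race]:.1%}")
    # white 94.4%
    # black 50.0%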
You entirely miss the point! The point is that in supervised learning, for example, if you optimize prediction accuracy with respect to your human-generated examples, you will get a model that exactly reproduces the racist judgment of the human who generated your training set.
> The point is that in supervised learning, for example, if you optimize prediction accuracy with respect to your human-generated examples, you will get a model that exactly reproduces the racist judgment of the human who generated your training set.
In which case you aren't optimizing prediction accuracy. Prediction accuracy is measured by whether the predictions are true. If you have bias in the predictions which doesn't exist in the actual outcomes then there is money to be made by eliminating it.
It seems like the strangest place to put an objection where the profit motive is directly aligned with the desired behavior.
You need to think about how we measure truth and even what truth is.
In machine learning we tend to assume the annotations and labels are "true" and build a system towards that version of the "truth".
> Prediction accuracy is measured by whether the predictions are true.
The more I think about this sentence, the less sense it makes. Prediction accuracy can only be measured against records of something, and that record will be a distortion and simplification of reality.
> Prediction accuracy can only be measured against records of something, and that record will be a distortion and simplification of reality.
Prediction accuracy can be measured against what actually happens. If the algorithm says that 5% of people like Bob will default and you give loans to people like Bob and 7% of them default then the algorithm is off by 2%.
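That check is straightforward to sketch (column names here are invented):

    # Compare predicted default rates with realized default rates per score
    # bucket, on loans that were actually made. Column names are invented.
    import numpy as np
    import pandas as pd

    loans = pd.read_csv("booked_loans.csv")                   # hypothetical data
    loans["bucket"] = pd.cut(loans["predicted_default_prob"],
                             bins=np.linspace(0, 1, 11))

    report = loans.groupby("bucket", observed=True).agg(
        predicted=("predicted_default_prob", "mean"),
        actual=("defaulted", "mean"),
        n=("defaulted", "size"),
    )
    report["gap"] = report["actual"] - report["predicted"]    # e.g. 0.07 - 0.05 = +0.02
    print(report)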
You are still assuming everything that is recorded to be "like Bob" is the truth and captures reality clearly.
Moreover, you would need to give loans to everybody in order to check the accuracy of the algorithm. You can't just check a non-random subset and expect to get unbiased results.
> You are still assuming everything that is recorded to be "like Bob" is the truth and captures reality clearly.
Nope, just finding correlations between "records say Bob has bit 24 set" and "Bob paid his loans." The data could say that Bob is a pink space alien from Andromeda and the algorithm can still do something useful. Because if the data is completely random then it will determine that that field is independent from whether Bob will pay his loans and ignore it, but if it correlates with paying back loans then it has predictive power. The fact that you're really measuring something other than what you thought you were doesn't change that.
> Moreover, you would need to give loans to everybody in order to check the accuracy of the algorithm. You can't just check a non-random subset and expect to get unbiased results.
What you can do is give loans to a random subset of the people you otherwise wouldn't, to see what happens.
But even that isn't usually necessary, because in reality there isn't a huge cliff right at the point where you decide whether to give the loan or not, and different variables will place on opposite sides of the decision. There will be people you decide to give the loan to even though their income was on the low side because their repayment history was very good. If more of those people than expected repay their loans then you know that repayment history is a stronger predictor than expected and income is a weaker one, and if fewer then the opposite.
I think you misunderstand. In this case for once the profit-maximizing thing is the thing we want them to do and vice versa.
You can make a legitimate objection if the algorithm predicts that 20% of black people will default on their loans and in reality only 10% of black people default on their loans. But if the algorithm is doing that then it's losing the bank money. They're giving loans to some other less creditworthy people instead of those more creditworthy people, or not giving profit-generating loans at all even though they have money to lend. A purely profit-motivated investor is not interested in that happening.
But if it happens that disproportionately many of some group of people are in actual fact uncreditworthy, giving the uncreditworthy people credit anyway is crazy. It's the thing that caused the housing crisis. An excessive number of them will default, lose the lender's money and ruin their own credit even further. It only hurts everybody.
That's silly though. You should use the ground truth of actual outcomes, not predict what useless humans would do. But even then I bet the algorithm would be less racist than the humans, if you don't give it race as a feature.
Humans have much more potential for racism in making predictions about the future than in judging things that have already happened. So while I agree that racism through biased input data is a problem, I think that even with that problem machines should be substantially less racist in their judgement than the humans they're replacing even if they're not perfect.
> maximizing prediction accuracy is inherently unbiased
This assumes that you're actually maximizing prediction accuracy, rather than taking the easiest route toward sufficiently high predictive power.
Not-so-hypothetical: you can invest $N and create a profitable model that (unfairly and inaccurately) discriminates directly based upon race, or you can invest $N*M and create a profitable model that does not discriminate on race (regardless of whether its results are racially equitable). Given the choice, a lot of people will choose the $N approach.
> All of this excludes the fact that companies will probably maximise for profit, and will use "the algorithm" as an excuse to turn people down while disregarding ethics.
If companies are maximising profit, then they will treat a person _exactly_ according to everything they can observe about that person. How is that unethical?
The problem that people have is that they don't want objectivity: they want to be able to pretend that something false is true, because it is comfortable.
You wrote a program that rigs the input data in favor of one class in such a way that the classifier results are more or less uncorrelated with membership in that class. Am I understanding the code correctly?
I don't see how you eliminated "human biases present in the data annotations" here. It seems like your program merely rigs the input in order to get the kind of output you like to see.
Obviously you can tamper with the data to get any kind of result you want. You can force equal outcomes for "Never-worked" and "Private" but that doesn't mean you're removing some inherent bias in the original data.
Also, this sentence:
> All of this excludes the fact that companies will probably maximise for profit, and will use "the algorithm" as an excuse to turn people down while disregarding ethics.
is kind of weird in the context of your comment and program. If you are correct in your fears of biased data, then accounting for the bias in the data maximizes profit for the companies. Ethics shouldn't be necessary if it's just about bias in the input data.
Yes, in the last example you can see it as rigging the input data. I replicated the results of the experiment in the paper, and I made some remarks of this sort in my own report and discussion.
The idea is that a sensitive parameter should not contribute to a decision, so, for example, the probability of group A or group B having access to a loan should be the same:
P(Loan|A) should be the same as P(Loan|B)
This is dangerous, as the discrimination metric is sensitive to biases present in the data. It can, effectively, make it easier for the discriminated-against group A to get a loan than for a person in group B in the same situation. This happens if the bias is not in the annotations, but in the demographics of the dataset.
This is a really interesting problem, and I don't have answers for it.
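For reference, the metric itself is simple to compute; here's a minimal sketch with invented data:

    # Demographic parity gap: P(loan | group A) - P(loan | group B).
    # The data below is invented.
    import pandas as pd

    def demographic_parity_gap(df, group_col, decision_col):
        rates = df.groupby(group_col)[decision_col].mean()
        return rates["A"] - rates["B"]

    decisions = pd.DataFrame({
        "group":   ["A"] * 100 + ["B"] * 100,
        "granted": [1] * 70 + [0] * 30 + [1] * 50 + [0] * 50,
    })
    print(demographic_parity_gap(decisions, "group", "granted"))   # 0.70 - 0.50 = 0.20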
Capitalism leads to some unfavorable outcomes.
Algorithms do capitalism better, leading to more unfavorable outcomes.
Therefore, algorithms are the problem.
I don't know what I missed, but it looks like algorithms merely provide the reductio for something you already don't like.
Is my reading of this law correct? It sounds like it outlaws all use of algorithms to evaluate people. The right to explanation is just for the rare exceptions where a "member state" authorizes it. But the legalese is difficult to parse, and no one else seems to get this impression.
I think this is really bad. That's the majority of uses of machine learning. It also has a lot of economic value to predict things like how likely someone is to pay back a loan, or even who is a spammer.
Most importantly, most of these applications will go back to human judgement. And humans are far worse. Human predictions about candidates are really bad. We are incredibly biased by things like race, gender, and especially attractiveness. Unattractive people get twice the sentences of attractive people. Not to mention random stuff, like people being judged more harshly when the judge is hungry before lunch. Humans also rarely give explanations for their decisions, if they are even aware of the true reasons for them.
Going back to humans is a huge step backwards and will hurt a lot more people than it helps. I think the same regulations that apply to algorithms, should apply to humans. That would show the absurdity of these laws. But humans are algorithms after all. And particularly bad ones (for this purpose anyway.)
> It sounds like it outlaws all use of algorithms to evaluate people. The right to explanation is just for the rare exceptions where a "member state" authorizes it.
Who says that this is rare? That's a pretty standard way EU regulations are written to mean "Action A can only be allowed if conditions B are met".
If the regulation is approved, member states take a look and change their laws accordingly. They could outlaw algorithmic decision making if they wanted to (but they wouldn't need that regulation to do that). Much more likely is that they impose those conditions on the companies that use such algorithms just like they already impose a lot of conditions (such as "ethnicity is not allowed to be a direct input variable" currently is).
> If the regulation is approved, member states take a look and change their laws accordingly.
Small nitpick if I may: you seem to be confusing Regulations with Directives, which must get transposed into national law within a certain timeframe. Regulations apply directly; no transposition needed.
Also: either way, EU law has primacy over national law.
Thanks for pointing this out, my post was not entirely clear on that matter. This proposed regulation does in fact contain some directive-like clauses with regards to the issue at hand. In this specific case they require the member states to disallow certain practices in their national law.
It's also not uncommon for member states to enact laws dealing with the effects of a (directly applicable) regulation.
> Most importantly, most of these applications will go back to human judgement. And humans are far worse.
I think you have to be careful there. Leaving decisions to machines embeds an assumption that all information related to a decision can be (and has been) encoded in a machine-interpretable manner.
How often are circumstances and context taken into account during sentencing hearings? How often do judges actually try to consider things like whether the offender feels remorse?
Sure, these kinds of things can be encoded into numbers, but should they be? What does it mean to encode remorse into e.g. a 7-valued point system? How much does that depend on the person entering the number, who must judge this, or should it be detected automatically using sensors? Should we be modifying sentences according to an offender's level of galvanic skin response?
This is just an example of course, but my point in general is that you make a huge assumption that factors related to decision making can be encoded properly and in an unbiased manner.
I'd even say that many of the factors that would need to be entered into a computer would have to be judged by a human, and therefore leaving the actual decision-making to a computer is a bit of a symbolic gesture in that sense.
I would guess that, given the same data, a computer could/would be programmed to make the same judgement as a human. It is the data extraction process itself that incurs bias, not the algorithm.
Furthermore, if you really leave things up to only externally and easily measurable data, you might not get the results you want or expect. A computer might see that black people are much more likely to commit crime, because hey, more black people are in jail, and therefore judge that the black person was more likely to be the criminal, given a choice. But is this the correct way to think? Indeed, a human would have to think really hard to correct this kind of bias. In other words, saying that an algorithm is less biased than a human judge is somewhat of a non-statement, since the judge actually follows fairly strict rules regarding what the facts imply -- it is more how he judges what those facts are that changes the sentencing. Meanwhile, on a computer, the programmer has a lot more power than you might assume in controlling how facts are weighted in order to come up with what is seen as a "fair" decision. And that fairness is a human judgement, fundamentally speaking, since it is about the relationship between a community and an individual.
In short, I don't think you can replace humans with machines when it comes to making decisions about how to deal with humans. The machine can only provide inferences from facts, but the values that those facts imply are a fundamentally human concept. It is a social thing, not an algorithm thing.
Another way to put it: if laws could be perfectly encoded by logic, we would have no need for lawyers and judges. These roles exist because as a society, we are constantly re-evaluating the logic of our society. The rules are not something that can be written statically. They must change as new ideas, problems, and circumstances change.
> Leaving decisions to machines embeds an assumption that all information related to a decision can be (and has been) encoded in a machine-interpretable manner.
I'm not assuming that at all. Even with less information, algorithms generally outperform humans. E.g. a hiring algorithm can accurately predict job performance from just the info on an application, while someone who actually interviews the candidate is barely better than chance. I'm on mobile right now, but I have links to a lot of research like this, where just one or two simple features are enough to beat human experts.
> Sure, these kinds of things can be encoded into numbers, but should they be? What does it mean to encode remorse into e.g. a 7-valued point system? How much does that depend on the person entering the number, who must judge this, or should it be detected automatically using sensors? Should we be modifying sentences according to an offender's level of galvanic skin response?
First of all, the optimal way to do it would be to have the human give a recommendation, then have the algorithm use that as a feature. So it can take human input into account, but also improve on it.
Second, encoding subjective things as real values just lays bare the silliness of it. A human factoring these things in his head is not magically more objective or less biased. Trying to rate subjective features can sometimes improve performance for humans too, since it makes them actually think about it consciously.
> A computer might see that black people are much more likely to commit crime, because hey, more black people are in jail, and therefore judge that the black person was more likely to be the criminal, given a choice. But is this the correct way to think?
But I'm not talking about predicting whether someone is a criminal. I'm talking about things like loan defaults and health risk. Here the company has pretty objective ground truth. If women really are less likely to get into car accidents, then it's just a fact. There's no hidden bias here.
> A human factoring these things in his head is not magically more objective or less biased.
I get your point, and I think you make some good arguments, especially regarding the small number of data points needed. In fact, in this regard I think that machine learning (or statistics anyway) can be a very useful tool for informing decisions. I think it should be regarded similarly to how experts are used in court.
But I think you're missing mine. It is not that humans are better or worse at having bias when considering factors. It is that some factors are inherently biased. And it is the choice of what we do with those biases that we call "values".
Now, do you think "values" are things that can be encoded in a machine-precise fashion?
Well, it's what we call "law". However, law is obviously _not_ something that can be applied "by algorithm," because otherwise we would have nothing to discuss when it comes to hearings.
> But I'm not talking about predicting whether someone is a criminal. I'm talking about things like loan defaults and health risk. Here the company has pretty objective ground truth. If women really are less likely to get into car accidents, then it's just a fact. There's no hidden bias here.
That is true. I think that categorizing people is something that will never go away. Whether it feels "right" or not is certainly debatable. People with pre-existing conditions aren't exactly happy with the current state of insurance. Do you think black people like it when they can't get a loan because they are black? Do we want society to work that way? You say that the machine can be used to make a better decision -- simple, remove the "skin colour" category. But perhaps that leads to a decrease in the confidence interval. Insurance/banks don't do that, because they consider it important information -- hence, (maybe) the need for regulation. I think this EU regulation is going to be a catalyser for some very interesting discussions around these issues.
Basically what I'm saying is, talking about algorithms vs. humans is a side issue, when the real discussion is: on what kind of data do we want decisions to be made?
The United States does not have anything as sweeping as this, but in the limited area of credit the Equal Credit Opportunity Act (ECOA) requires that lenders that turn down your credit application give you an explanation. From the Federal Trade Commission's site:
The creditor must tell you the specific
reason for the rejection or that you are
entitled to learn the reason if you ask
within 60 days. An acceptable reason might
be: “your income was too low” or “you haven’t
been employed long enough.” An unacceptable
reason might be “you didn’t meet our minimum
standards.” That information isn’t specific
enough.
(The ECOA is at 15 U.S. Code § 1691, for those who want to go find the gory details)
In one of the lectures from Caltech's "Learning from Data" MOOC, the professor mentioned that this problem came up at a major lender he had consulted for. He did not say how they solved it.
I wondered if you could generate a satisfactory rejection reason, at least as far as the law is concerned, by taking rejected applications and running them through the system again but with some of the inputs tweaked until they get an acceptance. Then report that the rejection was due to the un-tweaked parameter value.
For instance, if you picked income to tweak, you'd raise the income by, say, $5000 and try again. If that is rejected, raise it another $5000. If it then passes, you can tell the applicant they were rejected because their income was too low, and that an additional $10000 income would be enough to qualify them for a loan.
You'd have to put a bit of sophistication into this to make the explanations reasonable. For instance, if a rejected application would be approved with a very large tweak to income or with a small tweak to employment length it would probably be better to give employment length as the reason for rejection.
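A toy sketch of that search (the model interface here is assumed, not any particular library, and the step sizes are made up):

    # Search for the smallest single-feature tweak that flips a rejection into
    # an approval, then report that feature as the reason. `model.predict` is
    # an assumed interface returning 1 for approve and 0 for reject.
    def counterfactual_reason(model, applicant, steps, max_tries=20):
        """steps maps a feature to the increment to try, e.g. {"income": 5000}."""
        for feature, step in steps.items():
            tweaked = dict(applicant)
            for i in range(1, max_tries + 1):
                tweaked[feature] = applicant[feature] + i * step
                if model.predict(tweaked) == 1:
                    return (f"Rejected because {feature} was too low; "
                            f"an increase of about {i * step} would have qualified you.")
        return "No single-feature change in the searched range flips the decision."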
A potential problem with this is that if you "tell the applicant they were rejected because their income was too low" and this is disputed, they will likely be able to point to many other applicants with even lower income who were accepted, because their total combination of factors was better.
A simple explanation "factor X is too low" implies that there exists a particular cutoff that is required and sufficient; but an explanation that accurately describes why you were rejected would likely be too complex to be understandable.
But yes, probably the intended result could/should be simply identifying the top 1-2 factors that dragged your score down compared to the average accepted candidate (which can be rather easily done automatically even for the most black-box ML methods) and naming them appropriately.
> an explanation that accurately describes why you were rejected would likely be too complex to be understandable.
This is the core of the problem. Have you ever tried to explain a statistical segmentation to someone? It requires them to let go of how they think about if-x-then-y criteria and attributes. Very intelligent people who understand statistics, weighting, clustering, etc. struggle with it--think about an "average" person.
You can back into a reason: "Here's the attribute(s) on which you deviate the most from the median of the segment of which you want to be a part." But it's not a linear, single-variable "fix".
You can look at the gradient of the inputs. It's not just that your income is too low, but also that your credit score isn't high enough, your history is too short, etc. This basically gives you a linear model that is locally accurate at predicting what the more complicated model will do. Linear models are pretty interpretable, I think.
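Something like this rough finite-difference sketch (where `score_fn` is an assumed black-box scoring callable over numeric features):

    # Perturb each numeric input slightly and see how the score moves; the
    # result is a local linear approximation of the black-box model around one
    # applicant. `score_fn` is an assumed callable: feature dict -> probability.
    def local_sensitivities(score_fn, applicant, eps=1e-3):
        base = score_fn(applicant)
        grads = {}
        for feature, value in applicant.items():              # assumes numeric features
            bumped = dict(applicant, **{feature: value + eps})
            grads[feature] = (score_fn(bumped) - base) / eps
        # Largest-magnitude sensitivities first: the locally dominant factors.
        return dict(sorted(grads.items(), key=lambda kv: abs(kv[1]), reverse=True))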
There was this idea (I think I stumbled upon it on Bruce Schneier's blog) where the algorithm used to judge your credit score was actually public, and there was a way to dispute its outcome if you felt it was unfair in your case.
Interesting from an academic perspective, but absolutely inaccessible to 99% of the population. If I don't get the outcome I want, I will dispute it (and people already do) because I think it's unfair.
And maybe it is in your case. Are all algorithms designed to be "fair"? Should they be? Can they be, if everyone has an exception or a loophole? Or at some point do "disputes" go back to a human with the same biases that algorithms were intended to solve?
> The United States does not have anything as sweeping as this
The EU takes discrimination much more seriously. For example auto insurance in the EU can't charge men and women different amounts, meanwhile auto insurers in the US shamelessly advertise that they do so.
But men are much riskier drivers. That has nothing to do with sexism or discrimination against men; it's just an empirical fact. Why should women have to subsidize the cost of men?
Why is it acceptable to you to have sex explicitly in the model, but not race?
If black men are empirically riskier drivers than Asian men (something insurance companies are currently legally forbidden to discriminate on), would you be fine with charging black men more so Asian men don't "subsidize" them?
Only because race has a history of irrational prejudice and discrimination. And because race shouldn't really be predictive of anything, or at least that's the politically accepted belief. So it's been made illegal to take that into account. Which is fine, if it was just limited to that. But it's led to hundreds of other categories also being "protected", and now they are outlawing categorizing people at all. It's ridiculous.
I'm perfectly fine with Asian drivers getting cheaper insurance, if they really are somehow safer drivers. Why should they pay more, if they aren't more risky?
What about wooden buildings getting higher insurance rates? Wooden buildings are more likely to be damaged in natural disasters, burn down, suffer water damage, etc. But that's not the fault of the owner. Any specific owner could take really good care of their building, have sprinklers and alarms, maintain everything to perfect condition, etc. Yet they still pay the higher rate, so it's not a fair system. But as a group, wooden buildings are just less safe.
An older person is probably going to pay much more in life insurance than a young person. Or an overweight person, or a smoker. It may not be fair on an individual level. After all not every smoker gets cancer, not every overweight person gets diabetes, etc. But as a group, some groups have more risk than others.
The point of insurance is to predict risk as accurately as possible, using whatever information is available. If we had time machines, we could go into the future, see which people would get into car accidents, and take away their licenses. Or tear down buildings that are going to fall down and hurt people. But because we don't have that information, insurance acts as a way to hedge risk. There is zero benefit to society in making these predictions less accurate, and perhaps serious economic cost.
Good luck arguing the 'why' when you have a trained neural network and all you have is the network and weights.
On a more serious note, I love transparency but again, this is an overeager regulation (not surprising from the EU). You almost never get the true reasons for being rejected, be it in an interview or for a credit, etc. You have to figure that out yourself from the often very vague rejection letter. Our minds are basically running algorithms to which we have even less insight than our computer algorithms. Therefore, this law only hampers the flourishing of the economy while providing no value whatsoever.
Another law that 'solves' a non-issue, brought into being by overpaid career politicians.
If all you have is a network and weights, you have a black box. While it might be magic and make the right decision most of the time, we need to look in and see why those decisions are being made, somehow.
This basically means seeing how the net was trained, and what its initial conditions were. If you threw away your training methodology and data set, why should anyone trust that your algorithm can make suitable decisions? Are we to assume that nobody who writes AIs that make influential decisions has vested interests?
If you look, for example, at common law: we have a system where decisions by judges are binding, and future equivalent cases are required to uphold the previous decisions. The only way this system can work, of course, is if someone is keeping a record of the previous decisions. The alternative is a system of judgement based on hearsay, in which we can have about as much confidence as in asking a random person on the street to make the decision.
The technical challenge is really about storage and retrieval. If we know that discarding training data makes our neural networks "unaccountable for their decisions", then our technical requirement must be that we store the training data in its entirety, so that we can look back and maybe glimpse why a neural network behaves as it does.
For example, I might create an AI which is used to decide whether to give someone a mortgage. I could have millions of samples as training data, but I might choose to sort the training data by race, and begin training the network so that rejected mortgage applications from non-white people initially overtrain it to correlate race with rejection, then use a limited sample of mostly accepted mortgage requests from white men to train it the other way.
Of course this is extreme, and an expert who looked at such a data set would quickly notice that the AI is unfit for purpose, and blatantly racist. But he needs the training set to even have a chance of concluding that. Without it, he effectively has random numbers which tell him next to nothing.
These laws aren't meant to stifle innovation or economic benefits, but only to ensure that fair treatment is practiced in their development. As far as I see it, if you have a neural network, a sound justification of its design and the methodology you used, combined with a complete data set and training set which can be analysed for biases, then there's no reason these regulations should get in your way.
I suppose we can always weasel around with the definition of "why".
Why was my claim refused?
Because you have a large number of consonants in your surname which, when combined with the fact that your phone number has a prime number of "4" digits, leads to an increased risk of fraud.
Well, that's not a problem - when you say "my training data indicates that they do" what you really mean is "our previous experience shows that applicants with these factors have a significantly elevated chance of not paying back our money", which by itself is a valid reason as it is objective and based on real world facts.
A bigger problem would be when your features actually turn out to be proxies for 'undesired' features. For example, if you do put names in as features, then any credit risk system will learn that e.g. "Tyrone Washington" is a higher risk customer (assuming everything else is equal) than "Eric Shmidt", since those names are highly predictive of most socioeconomic factors; however, you'll risk a judge saying that this implements racial discrimination even if you exclude race directly, since those names are even more predictive of skin color than income.
Kind of, mostly. If the training data is correlating consonants in surname or prime number of '4' digits in a telephone number with increased credit risk, there's a really solid case there to ask where the training data sample came from and check it for biased sourcing.
If the system is correlating consonants in surname or prime number of '4' digits in a telephone number with increased risk, well, then in practice that simply means that you have a bad system with a serious overfitting problem - i.e., not a problem with your data but a problem with your learning process that's obtained "superstition" by treating random noise as important signal.
However, if you do some reason analysis, then this is useful and gives a lot of interesting information.
Names will be correlated with socioeconomic status and ethnicity, and also with education level [of your parents], income, etc.
Addresses obviously are strong indicators especially if you combine them with data about that location or even particular building if that's available.
Even phone numbers can carry a lot of true information e.g. location, or for countries that handed out new ranges for mobile phones, it would be correlated with how long you've had your phone number/how often you change them, which is a proxy for stability.
I'd bet that you could even build a somewhat predictive model of credit risk based on character-by-character analysis of email addresses - even excluding domain names, whether someone chose fluffybunny420 or jonathan.cumbersnatch carries some signal; not very much, but if you don't have better data then it will serve as a vague proxy for age, lifestyle and frivolousness (and in the case of domain names, possibly employer), which are all important influencers of risk.
This explanation is solid and speaks loudly to the benefits of right to explanation. Because if the explanation is "We've found signal in the data that correlates strongly with social biases (example: surname analysis---of course there's increased risk, if the lender is operating in a country where people don't trust anyone whose name starts with "Mc", for example), and we can use that signal to basically ethnically profile without having to admit that we're ethnically profiling..." that's bad, and the law may address it.
"Redlining" practices aren't bad because they don't economically work. They do economically work; if one demographic is societally disadvantaged, you don't do your firm any short-run favors by catering to them, because they aren't where the money is. They're bad because they work by unfairly shifting the pie away from people who don't actually deserve to have the pie shifted away from them---it's a variant on "sins of the fathers" reasoning to bias on categories like surname, or race, or demographics that correlate strongly with those demographics (like geography, in cities in the United States). There are moral reasons to not allow those practices to get codified into algorithms so the people using them can excuse bad behavior as "just following the computer's orders."
If others in this thread are correct and the explanation will be a sort of sensitivity analysis (meaning that the factors relevant for the decision have to be communicated) that will amount to discrimination. It doesn't matter if some employee judges customers by their skin colour or if someone build a convenient algorithm to do it for them.
I think name-based decisions are pretty obviously racist. I'm wondering about the less correlated variables such as addresses.
I'm under the impression that the USA is still segregated enough that, for many places, an address (or even district/town) will be a very strong indicator of race.
Also it should be possible to train the NN model to explain its decision, even in plain language.
I think the "right to explanation" law is pretty sensible, people should have a right to question the algorithmic judgment that impacts their life. It will be somewhat inconvenient for big companies applying ML to make these decisions, but it won't ruin their business.
The first thing that came to my mind was... what about advertising on the web? This is an area where machine learning and algorithmic decision-making use discrimination to decide whether or not to show an ad to a particular user. There are millions of decisions made every day. Under that regulation, could you ask for the reason an ad was shown to you?
Nope, because advertisement doesn't "significantly affect" you. At least that's how it's meant and how it will be understood. This is about credit, employment, insurance, medical care etc.
Medical care?! Oh God. Well, I suppose the AMA will keep the US from embracing machine learning so it's not like this is actually a chance for the US to reverse its lag in mortality statistics.
Although just like the things you list, I can think of a plethora of ways with which advertising can literally, indirectly change my life by convincing me to purchase a product or service.
I think it will be very hard to implement/enforce this regulation. I attended the London AI summit last week, and they had a speaker from a German lender called Kreditech.
There was a question about black-box credit scoring, and the speaker made a fair point - their models have 20,000 vectors in determining credit worthiness. How would you begin to break that down to something explainable? You can list the sources of the data, or offer manual review of the application if rejected (which is something they do), but it would be very hard to show exact causal reasoning patterns in the automatic scoring.
> I think it will be very hard to implement/enforce this regulation.
The regulation is quite simple and easy to enforce. If they can't explain the decision, then it is illegal.
What your example implies is that there are businesses that don't take discrimination laws into account when designing their software, even for sensitive decisions. And enforcing discrimination-free decisions is exactly the goal of this law.
I worked for the gambling industry. "Changing our software to meet the law is too complicated" was never a valid argument, even when complex and extremely hard-to-fine-tune algorithms were at stake.
I never thought I'd say it, but relatively speaking, the gambling industry is grounded in reality.
Most of the commenters here are coming from Silicon Valley where regulation is virtually non-existent and typically derided as impeding the glorious all-consuming will of the free market.
>> their models have 20,000 vectors in determining credit worthiness. How would you begin to break that down to something explainable?
Well, somehow they decided that their 20k-parameter model is accurate. They should at least be able to explain why they took that decision, even if the model itself is too complex.
OK, so we're assuming supervised learning. In that case, how were the features of the training set chosen? And how was the training set labelled?
Most of the time, outside of semi-supervised learning, those things are not in the data, someone has to (painstakingly) decide them. That can very well be explained I believe.
More generally, what I mean to say is that you can learn any number of models from the same data, so to a great extent you choose what your model learns by manipulating your algorithm's (hyper)parameters, the training set's features, choosing what goes into the training set itself etc etc. That process leaves room for a lot of regulatory oversight.
- in principle. In practice I expect since we're talking about people lending money or selling insurance we're only ever going to get those details out of their cold dead hands, and only with considerable effort at that.
How do you find out that it doesn't work? Some loans default, and the company loses money on those. The model gets updated continuously from real user data, so it learns from its mistakes.
This is exactly why the legislation is being created: to stop people's lives from being destroyed or negatively affected by computer programs that not even their creators know how they work.
Maybe if you can't explain your model you should be forced to open source your method and a large enough data sample for people to independently check your results.
Do you know how many actually matter for a single person? I understand that 20k vectors could be created to cover the entire population, but realistically how many are non-zero?
I expect there's some threshold below which everything is either completely ignored, or doesn't make a practical difference. An average person's model will contain:
- not divorced
- not home owner
- not prosecuted
- not on probation
- not dying
- not (19k of situations)
assuming the vectors are one variable each, which the word 'vector' implies they are not. So it's more like {particular value of home ownership, age range, medical cost history in a certain bracket, no pet supply purchase history}. How do you explain that? If it's been created via data mining and not, for instance, fuzzy logic rule sets, then no human ever knew why each vector was chosen. It's just a pattern discovered in the population.
It would likely be very hard to explain most of the vectors.
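To put a rough number on the sparsity intuition above, here is a toy sketch; the 20,000 figure comes from the Kreditech example earlier in the thread, and the 0.1% density is a pure assumption:

```python
# Even if the model has 20,000 features, any single applicant may only
# "activate" a handful of them. Synthetic data, assumed density.
from scipy.sparse import random as sparse_random

n_features = 20_000
# One applicant's feature vector, ~0.1% of entries non-zero (assumption).
applicant = sparse_random(1, n_features, density=0.001, format="csr", random_state=0)

print(applicant.nnz)  # number of features that are non-zero for this person, e.g. 20
```

Of course, knowing that only a few dozen entries are non-zero for a given person doesn't tell you why those particular features were chosen or how they interact, which is the harder part to explain.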
>> Finally I get to know why my mortgage was declined :)
My first attempt at one was declined due to "problems with my credit report" or some such. I happened to be working at a place where they did credit checks on customers, so I asked someone to pull my credit report. They had all my stuff, but had mixed in a bunch of information from someone else with the same name - different age, there were loans on there from when I was 3. I called the reporting company to complain and they kept getting caught up in "how did you get your report?" as if I had done something wrong in just having access to their incorrect information. Perhaps having someone pull the info instead of the "consumer" getting it through proper channels was against some rule - this was 1995. I think the same is often true today, companies want to collect data on you but don't really want you to know what they've got or how they use it.
This is exactly why you are legally entitled to receive a copy of your credit report every year. You can challenge anything on it, and have incorrect information removed.
If you are one of the people just finding out about this today, and you Google "free credit report", you come up with a loooot of bad, scammy links. This is how you get your legally-mandated, really free, annual credit report, starting from an ftc.gov address so you know it's really the right one.
To be fair, the credit score (VantageScore 3) you provide is not the credit score (FICO) most people are interested in, and that is used for most credit decisions.
I wonder if machine learning could just handle this the way that humans do. Human decision processes are of course extremely opaque, even if we're good at coming up with post-hoc justifications for our reasoning as needed.
"You're income is too low"
"But his income is also low and he got a loan!"
"Well he has a better credit score."
etc
It would obviously be impossible to give a complete and accurate explanation of a machine learning decision process, the same way it would be impossible for a human decision process, but I wonder whether it would actually be difficult to come up with particularly salient distinguishing features, in general or between certain cases, the same way that people do.
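One way to approximate that human style of explanation is a contrastive comparison between two cases: which features most account for the gap between the rejected applicant and the approved one. A toy sketch, with made-up coefficients and standardized (z-scored) feature values:

```python
# Contrastive "but his income is also low" style explanation.
# Coefficients and applicant values are hypothetical, already standardized.
import numpy as np

feature_names = ["income", "credit_score", "utilization", "late_payments"]
weights = np.array([0.8, 1.2, -0.9, -1.5])   # stand-in for learned coefficients

def contrastive_explanation(rejected, approved, top_k=1):
    # How much each feature contributes to the score gap between the two cases.
    gap = weights * (approved - rejected)
    salient = np.argsort(gap)[::-1][:top_k]
    return [feature_names[i] for i in salient]

rejected = np.array([-1.2, -0.8, 1.5, 2.0])
approved = np.array([-1.0, 0.6, -0.4, -0.3])
print(contrastive_explanation(rejected, approved))  # ['late_payments'] for these values
```

This is post-hoc in exactly the sense described above: it picks out the most salient difference rather than fully reconstructing the decision.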
Can't you just say the decision was based on a random forest trained on the following parameters with historical data? You could get more specific about how the model was trained, or even show the model if you have to. I'm sure you could create a small-print tree diagram.
In terms of human review, does this require more than a spot check? I can imagine looking over the p-values, possibly spot-checking the input data, and then approving it.
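A rough sketch of what that disclosure might look like with off-the-shelf tooling; the data here is synthetic and the training parameters are arbitrary:

```python
# Train a small random forest and dump one of its trees in reviewable form.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
feature_names = [f"f{i}" for i in range(5)]

forest = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0).fit(X, y)

# The "small-print tree diagram": a text rendering of the first tree in the ensemble.
print(export_text(forest.estimators_[0], feature_names=feature_names))

# Aggregate feature importances as a coarse, spot-checkable summary.
print(dict(zip(feature_names, forest.feature_importances_.round(3))))
```

One caveat: tree ensembles don't produce p-values directly, so the nearest spot-checkable summary is something like the aggregate feature importances above, plus whatever holdout accuracy figures the lender is willing to disclose.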
Hi, I'm one of the co-authors. This feedback is awesome. I have a longer version in the works. If you're interested, please feel free to email me: bryce dot goodman at stx dot ox dot ac dot uk.
So what effect will this have on risk assessment as used by insurance companies?
"Insurance companies use a methodology called risk assessment to calculate premium rates for policyholders. Using software that computes a predetermined algorithm, insurance underwriters gauge the risk that you may file a claim against your policy. These algorithms are based on key indicators about you and then measured against a data set to weigh risk. Insurance underwriters carefully balance the insurance company’s profitability with your potential need to use the policy."
> We argue that while this law will pose large challenges for industry, it highlights opportunities for machine learning researchers to take the lead in designing algorithms and evaluation frameworks which avoid discrimination.
This sounds like a very slippery slope.
Let me get straight what the EU wants to do here. If I run a business that decides whether to, say, make loans to people, and I have software that analyzes how they filled out an application or took a written/logical test or something, I may have to:
1) Disclose the findings and methodology of my proprietary algorithm
2) Potentially undo decisions that are deemed discriminatory
3) Thereby potentially be forced to choose who I do and do not do business with
Some regulations are good, they help us avoid things like moral hazards. Some regulations, however, act more like power grabs that take away freedom from individuals and businesses.
I'm sorry to say but this sort of thing sounds like the latter.
There are some uncomfortable facts about people and society that algorithms will uncover, and that people will try using to avoid making mistakes. Here's a poignant example, albeit with much less sophistication: http://www.nytimes.com/2015/10/31/nyregion/hudson-city-bank-...
> Instead, some officials say, some banks have quietly institutionalized bias in their operations, deliberately placing branches, brokers and mortgage services outside minority communities, even as other banks find and serve borrowers in those neighbourhoods.
Alright, sure. I think we all know it's common sense not to open a jewellery store in inner city Chicago. The local clientele would not be able to afford such products and services, and that's enough of an argument; it's probably business school 101. Just like how you're not going to find a good market for a food delivery startup in Middletown, USA. The economics probably don't work. EDIT - and if you think these businesses are wrong, and they should be in these communities, let their competitors eat their lunch!
But it's viewed as discriminatory. Back to the paper, it's discussing essentially the same thing: it's the natural evolution of "don't start a jewellery store in inner city Chicago" turning into "don't start a jewellery store in a neighbourhood where access to the urban center is less than X, available nearby 3-lane highways are fewer than Y, the percentage of women aged 18-29 is less than Z, the proportion of people from culture W who are in the bottom quartile of jewellery buyers is more than V, ...".
The solution here isn't to regulate and force people to open jewellery stores in inner city Chicago. The solution, if we really want to bring diamonds to Englewood, is to drastically reduce the poverty and crime rates so that businesses will want to sell there (https://en.wikipedia.org/wiki/Englewood,_Chicago#Socioeconom...).
And if you think having access to a local business is a "right", then you must concede that such a public good should be provided by a central government of some type. And then you have to ask yourself, do you want the government managing your bank loans, or would you rather have private companies compete on prices and service? If you want the latter, then you need to accept that certain things are just going to arise (or not arise) naturally.