There is really no way to know, or to say with logical justification, that a data set, or the deep learning model that results from it, is free from bias. That's the issue at hand. The social dimension of the application must be discussed broadly to make any sense of it. It's a matter of opinion, and properly so. I worship the guy and I think he was attacked for things he didn't say, yet if Mr LeCun had acknowledged the limits of DNN black-box models, it would have been helpful for everyone following along. Timnit Gebru should also have made the same point.
Sort of. I didn't read the whole exchange, but to me the thing to say would have been, a bit more explicitly, that the tool will always reflect the biases of its owners. Hopefully phrased to sound less Marxist, but something along those lines, instead of leaving the impression that the only issue was an inadequate data set.
Right, it's probably better to look at the model and call it a "fair model" as long as the model's error on a given class of inputs is inversely proportional to that class's share of the training data (e.g., more training data from the same class means less error, and less training data means more error). I can't recall a single ML model that was not fair by this standard, though I mostly deal with texts, not images.
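To make that criterion concrete, here is a minimal Python sketch. It reads "inversely proportional" loosely, the way the parenthetical does: a class with strictly more training data should never have strictly higher test error. The function names and the toy cat/dog labels are hypothetical, not taken from any library.

```python
from collections import Counter


def per_class_error(y_true, y_pred):
    """Fraction of test examples of each true class that were misclassified."""
    totals = Counter(y_true)
    wrong = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return {c: wrong[c] / totals[c] for c in totals}


def training_share(y_train):
    """Fraction of the training set belonging to each class."""
    counts = Counter(y_train)
    return {c: k / len(y_train) for c, k in counts.items()}


def is_fair_by_share(y_train, y_true, y_pred):
    """Loose reading of the criterion: no class with strictly more training
    data has strictly higher test error than a class with less."""
    share = training_share(y_train)
    err = per_class_error(y_true, y_pred)
    for a in err:
        for b in err:
            # Classes that never appear in training get a share of 0.
            if share.get(a, 0.0) > share.get(b, 0.0) and err[a] > err[b]:
                return False
    return True


# Hypothetical toy data: 80% "cat" and 20% "dog" in training.
y_train = ["cat"] * 80 + ["dog"] * 20
y_true = ["cat"] * 10 + ["dog"] * 10
y_pred = ["cat"] * 9 + ["dog"] + ["dog"] * 7 + ["cat"] * 3
print(per_class_error(y_true, y_pred))  # {'cat': 0.1, 'dog': 0.3}
print(is_fair_by_share(y_train, y_true, y_pred))  # True
```

This only checks the ordering of errors against training shares; it says nothing about whether those shares were a sensible choice in the first place.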
Yes, the problem in all these discussions is that unless you engage with them on broader epistemological grounds, you’re just occluding the a priori biases.
Engineering, as a discipline, has an unfortunate history of not wanting to engage with the social and political conditions under which its work occurs, but that becomes entirely untenable for how a lot of ML is put to use (if you’re trying to be honest anyway).
Balanced how? It could be balanced to represent a population, but which population? The US? The university where it was created? The world?
It could also be balanced so that the evaluation metrics were similar for each subgroup (possibly ending up with a sample that's very different from the population). But what are the subgroups? For any commonly used definition of race, there is a lot of intra-group variety.
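The second kind of balance (similar evaluation metrics for each subgroup) can at least be measured once someone has picked the subgroups. A minimal Python sketch, assuming accuracy as the metric and a hypothetical per-example subgroup label; choosing those labels is exactly the open question here, and the code does not answer it.

```python
from collections import defaultdict


def accuracy_by_group(groups, y_true, y_pred):
    """Accuracy computed separately for each subgroup label."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        totals[g] += 1
        hits[g] += int(t == p)
    return {g: hits[g] / totals[g] for g in totals}


def max_gap(metric_by_group):
    """Largest difference in the metric between any two subgroups."""
    values = list(metric_by_group.values())
    return max(values) - min(values)


# Hypothetical evaluation set with two subgroups, "A" and "B".
groups = ["A"] * 4 + ["B"] * 4
y_true = [1] * 8
y_pred = [1, 1, 1, 1, 1, 1, 0, 0]
acc = accuracy_by_group(groups, y_true, y_pred)
print(acc)           # {'A': 1.0, 'B': 0.5}
print(max_gap(acc))  # 0.5
```

A small gap here only means the model behaves similarly across the groups you chose to define; intra-group variety stays invisible.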
Maybe balancing the training data is enough. But figuring out what balance even means is a huge question.
I don't think there's a ready CS/stats solution to this problem, so it will require interdisciplinary engagement; listening to the people who have been on the wrong end of facial-recognition bias is likely a place to start.
Because ML models don’t exist in a vacuum. The intentions and biases of the people who build and use them affect which models exist, and how they’re used. Creating a perception that models are unbiased mathematical oracles because the dataset is unbiased can be used to support harmful uses.
ML models don't exist in a vacuum, but they do exist in an empirical reality. And in an empirical reality, there is always the fundamental unbiased measure of success: predicting whatever it is the model is built to predict.
And this, I think, is the knife that separates the different schools of thought on the issue.
People who are judging whether an ML model is "good" or "bad" based on this criterion necessarily see the accusation of "bias" as a claim that their model is not successfully predicting things. They rightfully retort that they would do a better job with an unbiased dataset. To argue that their models are always wrong on their own terms is to argue that there is a Ken Thompson-like hack in their mathematics. [1]
On the other hand, people who judge ML models by criteria like how they might be used or interpreted by laypeople are fundamentally talking about something other than ML models qua mathematical models. To the modelers, you might as well be arguing that the theory of nuclear fission is biased against the Japanese. But you are not actually talking about the empirical quality of their model, and so on your own terms you are correct. The models can be used improperly, and researchers should be careful about how their findings are perceived.
Thanks for exactly this example. Nobody has stated it yet, but the most charitable, best-faith interpretation I can see of the Gebru angle here really is that there's a Ken Thompson hack at play, or at least a high risk of vulnerability to such a hack.
I just don't know how one would prove it, and, as others have noted, I don't understand what the mitigating alternative in the short term should be other than just stopping the research.
Fundamentally, the choice of training data set and the biases that went into its collection.
Also, in the case of statistical models, the crafting of the trained features themselves.
Actually, this is also relevant for neural networks, despite the fact that they learn their own features, because some amount of "framing" of the raw data often takes place in order to focus the network on the portion of the input the trainer sees as relevant. This removes noise, but it also removes context.
You asked about the biases of the people building the model, which is what I answered.
You didn't ask about the biases that occur during the requirements specification stage, or the biases that occur during operational implementation of the trained model.
Those are just as important as, and arguably more important than, the choice of training data and the technical implementation.
The responsibility for the ethics of using ML neither begins nor ends with the ML engineer who builds the machine, and there are serious questions arising from the application of ML in certain domains that cannot simply be addressed by "better training data".
It’s not as though the designers of the system set out to train it on a biased dataset; we can assume they were trying to be balanced from the start.
And that's the deeper problem here: "it's just a biased dataset" is a misdiagnosis. It's a whole system of biases that leads people to think they are training with balanced data when they manifestly are not.
You're never really going to achieve this mythical "balanced training data" until you untangle all of the other implicit personal and organizational biases. There are a whole host of ethical discussions that need to happen just to begin fleshing out what "balanced" might even mean for, say, facial recognition software intended for law enforcement, but the same biases that lead people to skip right past those discussions and begin training are often the very ones that produce the biased data to begin with.
What actual task are you performing that requires you to generate a high-resolution photo of a face given a low-resolution photo of a face? If it’s just for fun, then sure, bias may not matter by definition. But if it’s used to make decisions that have real consequences for people’s lives, then it sounds like a really bad idea no matter what the training set is.