Yann was "technically correct" in that the nature of the results is highly driven by the nature of the dataset you use.
However, that framing of the discussion can easily be interpreted as absolving the ML community of the ethical discussions around the nature of the datasets.
Is it acceptable for the community to hand-wave these issues because they are "dataset problems"?
For what it's worth, this is a much larger debate happening in many fields. For example, many decisions around car safety were based on crash tests whose dummies were designed around male body forms, and thus don't really test how crashes affect women. To quote the author of Invisible Women:
> As a result, if they are involved in a car crash, women are more likely to be injured – 47 per cent more likely to be seriously injured and 17 per cent more likely to die.
This general questioning of the implications of dataset bias is happening in many fields.