Real-world examples will not be three-word statements like "Women are evil", but rather long sentences, like this one I am currently writing, that happen to include the phrase "Women are evil" twice.
The question is not about the AI's ability to detect hateful content in ideal sentences. The question is whether there is a bias when the AI has to make a judgement call.
We can see the same thing with face recognition. There is no racial bias in AI detection under perfect lighting when the person is facing the camera squarely. There is, however, a very noticeable bias when the AI is less certain, as with real-world examples where lighting and positioning are far from perfect. As the data becomes less meaningful, the bias in favor of white skin increases.
The study would be improved by an additional in-depth study using real-world text selected by humans, where the input is then modified by randomizing the target demographic. If the bias remains, we would have higher confidence in the data. This is similar to studies of face recognition, where problems with darker skin have been demonstrated multiple times.
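To make that proposal concrete, here is a minimal sketch of what such a demographic-randomization probe could look like. It assumes the human-selected sentences have been turned into templates with a "{group}" slot marking the target demographic, and that score_fn stands in for whatever toxicity classifier is being audited; the identity terms, slot convention, and function names are all placeholders for illustration, not anything taken from the study.

    import statistics
    from typing import Callable, Dict, Iterable, List

    # Illustrative identity terms; a real study would use a curated, much larger list.
    IDENTITY_TERMS = ["women", "men", "black people", "white people", "immigrants"]

    PLACEHOLDER = "{group}"  # human annotators mark the target demographic with this slot

    def scores_by_group(template: str, score_fn: Callable[[str], float]) -> Dict[str, float]:
        """Fill the demographic slot of one human-selected sentence with each
        identity term and record the classifier's score for each variant."""
        return {term: score_fn(template.replace(PLACEHOLDER, term))
                for term in IDENTITY_TERMS}

    def group_gap(templates: Iterable[str],
                  score_fn: Callable[[str], float]) -> Dict[str, float]:
        """Average, per demographic, how far its score sits from the per-sentence
        mean. A consistently positive gap for one group suggests the bias
        survives messy, real-world text rather than only ideal sentences."""
        deltas: Dict[str, List[float]] = {term: [] for term in IDENTITY_TERMS}
        for template in templates:
            per_group = scores_by_group(template, score_fn)
            baseline = statistics.mean(per_group.values())
            for term, score in per_group.items():
                deltas[term].append(score - baseline)
        return {term: statistics.mean(vals) for term, vals in deltas.items()}

Since only the demographic term changes between variants of the same sentence, any systematic gap in scores can be attributed to the term itself rather than to the surrounding real-world context.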