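I feel like a lot of people don’t recognize how incredibly hard it is to check for bias using data alone. The problem is that you have to define “subgroups,” and that choice is inherently arbitrary. This is where Simpson’s paradox comes in: a disparity you measure on the whole population can shrink, vanish, or even reverse once you split the data into subgroups, so the answer you get depends on which split you picked.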
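
To make that concrete, here's a minimal sketch in Python with made-up numbers. Everything in it is hypothetical: the groups "A" and "B", the "easy" and "hard" departments, the counts, and the `rate` helper are all invented for illustration, not taken from any real dataset. It just shows how the direction of an apparent disparity can flip depending on whether you split by subgroup.

```python
# Hypothetical counts, invented for illustration: (accepted, applicants).
counts = {
    ("A", "easy"): (720, 900),
    ("A", "hard"): (20, 100),
    ("B", "easy"): (90, 100),
    ("B", "hard"): (270, 900),
}


def rate(group, dept=None):
    """Acceptance rate for a group, optionally restricted to one department."""
    cells = [
        v for (g, d), v in counts.items()
        if g == group and (dept is None or d == dept)
    ]
    accepted = sum(a for a, _ in cells)
    total = sum(n for _, n in cells)
    return accepted / total


# Gap in acceptance rate (A minus B), pooled and then per department.
overall_gap = rate("A") - rate("B")
gap_by_dept = {d: rate("A", d) - rate("B", d) for d in ("easy", "hard")}

print(f"overall: A - B = {overall_gap:+.2f}")
for d, gap in gap_by_dept.items():
    print(f"{d}: A - B = {gap:+.2f}")
```

Pooled, group A looks favored (74% vs. 36%). Within each department, group B has the higher rate (90% vs. 80%, and 30% vs. 20%). Both readings come from the same table, and nothing in the data tells you whether "department" is the right way to slice it.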