I understand how tempting it is in our age of big data and all that stuff to perceive this as some curious new phenomenon, but it really is not. This is precisely why we came up with criteria for "science" quite a while ago. And in fact, this whole experiment is pretty meaningless.
So, for starters: 29 students get the same question on a math/physics/chemistry exam and give 29 different answers. Breaking news? Obviously not. Either the question was outrageously badly worded (not such a rare thing, sadly), or the students didn't do very well and we've got at most one correct answer.
Basically, we've got the very same situation here. Except our "students" were doing statistics, which is not really math and not really a natural science. Which is why it is somehow "acceptable" to end up with results like that.
If we are doing math, whatever result we get must be backed up with a formally correct proof. Which doesn't mean, of course, that two good students cannot get contradicting results, but then at least one of their proofs is faulty, and that can be shown. And this is how we decide what's "correct".
If we are doing science (e.g. physics), our question must be formulated in such a way that it can be verified by setting up an experiment. If the experiment didn't give us what we expected, our theory is wrong. If it did, it might be correct.
Here, our original question was "whether players with dark skin tone are more likely than light-skin-toned players to receive red cards from referees", which is shit, not a scientific hypothesis. We can define "more likely" however we want. What we really want to know is whether, during the next N matches happening in what we can consider "the same environment", black athletes are going to get more red cards than white athletes. Which is quite obviously a bad idea for a study, because the number of trials we need is too big for such a loosely defined setting: not even one game will actually happen in an isolated environment, players will be different, referees will be different, and each game will change the "state" of our world. Somebody might even say that the whole culture has changed since we started the experiment, so whatever the first dataset was, it's no longer relevant.
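To see how arbitrary the definition is, here is a minimal sketch of just one possible operationalization, a naive two-proportion z-test on made-up counts (the function and every number below are purely illustrative; the actual teams used far richer models and covariates):

    # One arbitrary way to make "more likely to receive a red card" concrete.
    # All counts below are made up for illustration only.
    from math import sqrt, erfc

    def two_proportion_ztest(r1, n1, r2, n2):
        """Naive z-test comparing red-card rates between two groups of players."""
        p1, p2 = r1 / n1, r2 / n2
        pooled = (r1 + r2) / (n1 + n2)
        se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        z = (p1 - p2) / se
        p_value = erfc(abs(z) / sqrt(2))  # two-sided tail probability
        return p1, p2, z, p_value

    # Hypothetical counts: red cards / player-match appearances for each group.
    p1, p2, z, p = two_proportion_ztest(r1=120, n1=40_000, r2=90, n2=40_000)
    print(f"rate A = {p1:.4f}, rate B = {p2:.4f}, z = {z:.2f}, p = {p:.3f}")

Pick a different model, a different set of covariates, or a different unit of analysis, and the same data can give a different answer, which is exactly how you end up with 29 defensible but contradictory results.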
Statistics is only a tool, not a "science", as some people might (incorrectly) assume. It is not the fault of the methods we apply that we get something like this, but rather of the discipline we apply them to. And "results" like these are why physics is accepted as a science, and sociology never really was.
In physics they can do experiments to get highly confident results. In medicine, economics, and any other science dealing with people, data is much harder to collect and the ethical constraints are much harder. We could learn so much if we simply performed controlled experiments on the global economy, politics be damned! Or if we could just make professionals play more soccer matches in a controlled setting (but just like a pro match in every other regard!). Or if we were more aggressive with human trials of drugs. But we can't. Scientists in some fields are stuck with data sets where they'll never get 5 sigma confidence. Does that mean they should stop using statistics? Hell no. There are still very useful things to be learned. It's just much harder to get right.
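For a sense of scale, here is the arithmetic behind that bar, a quick sketch using the usual convention of quoting one-sided Gaussian tail probabilities:

    # What "5 sigma" means as a tail probability, next to the p < 0.05
    # threshold that fields dealing with people usually have to live with.
    from math import sqrt, erfc

    def one_sided_p(sigma):
        """One-sided Gaussian tail probability for a given number of sigmas."""
        return 0.5 * erfc(sigma / sqrt(2))

    print(f"5 sigma    -> p ~ {one_sided_p(5):.2e}")     # roughly 2.9e-07
    print(f"1.64 sigma -> p ~ {one_sided_p(1.64):.3f}")  # roughly 0.05

That is about five orders of magnitude difference in how much evidence the two kinds of fields can realistically demand.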
Your rant makes no sense. I flip a coin 100 times and it comes up tails 99 times. You are basically saying that asking "Is the coin more likely to come up tails?" isn't a real scientific question. That's just silly.
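To put a number on it, a back-of-the-envelope check under a fair-coin null:

    # Chance of seeing 99 or more tails in 100 flips of a fair coin.
    from math import comb

    p = sum(comb(100, k) for k in (99, 100)) / 2**100
    print(f"P(>= 99 tails | fair coin) = {p:.1e}")  # about 8e-29

At that point "is the coin more likely to come up tails?" is about as answerable as questions get.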
Physics uses statistics all the time, e.g. detecting the Higgs boson at CERN. Do you have a formal proof that each time they fired the accelerator it was going to be i.i.d.?
I don't understand what it is that you want to say, even if we were to accept all of your premises. That other than math and physics we actually know nothing? That knowledge (even if partial) is meaningless unless it is as rigorous as what we can attain in physics, the simplest of sciences? Obviously we can be less confident of results in the complex/intractable/inexact sciences than we can in the simple/tractable/exact ones. You want to call only the latter group "science" and the former something else? Fine. Does that mean we should completely ignore all results in disciplines which aren't "science"?