You're right-ish, but... the problems of the educational sciences are many, most importantly the near-impossibility of running proper experiments and the almost complete lack of fundamental, reliable knowledge about human cognition and learning. The bottom line is that they have no real idea how learning works, nor how teaching works. Their findings should not be treated as hard evidence; only the largest effects could be cautiously used as guidelines for gradual and reversible change.
There are 200 countries to examine, so there should be plenty of ways to get data to analyse. For some reason we seem to get very little comparison, and whatever comparisons we do get end up being a kind of contest, like PISA.
So every difference gets attributed to national differences. Or to language. Or to some other part of the environment.
If you could get, say, 50 countries to run a double-blind experiment comparing two teaching methods, you might make use of the scale. But even then you can only conclude something if all the results show (more or less) the same effect size, and what do you gain? Little. You can't even generalize across countries, since they weren't a representative sample. And that's ignoring all the aforementioned factors, plus the near certainty of a less-than-flawless execution that will skew the results.
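To make the effect-size point concrete, here's a toy simulation (all numbers are invented, Python purely for illustration): give each of 50 countries its own baseline score and its own "true" effect of method B, randomize students within each country, and look at how the estimated effect sizes scatter.

    # Toy sketch, hypothetical numbers: country-level variation swamps a small
    # teaching-method effect even with clean within-country randomization.
    import random
    import statistics

    random.seed(0)

    def run_country(n_students=500):
        baseline = random.gauss(500, 40)    # made-up country-level baseline
        true_effect = random.gauss(5, 10)   # made-up country-specific effect of method B
        noise = 80                          # student-level spread
        a = [random.gauss(baseline, noise) for _ in range(n_students)]
        b = [random.gauss(baseline + true_effect, noise) for _ in range(n_students)]
        pooled_sd = statistics.pstdev(a + b)
        return (statistics.mean(b) - statistics.mean(a)) / pooled_sd  # rough Cohen's d

    effects = [run_country() for _ in range(50)]
    print(f"mean effect size across countries: {statistics.mean(effects):+.2f}")
    print(f"spread of effect sizes:            {statistics.pstdev(effects):.2f}")
    print(f"countries where B looks worse:     {sum(e < 0 for e in effects)}")

Even in this idealized setup, with the same average "true" effect everywhere, the per-country estimates disagree and some point the wrong way; in reality you'd add all the confounders above plus imperfect execution on top.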
So I don't think "200 countries" will work. Such approaches have been tried with other, easier manipulations, like dieting, and it frequently turns out that interpreting the data isn't straightforward. It isn't physics.
I wouldn't say the bar is lower, though. There are so many more variables than in physics, and you have so much less control over them, yet we don't really want conclusions that are any less reliable.
If you want a physics metaphor, it's like dark matter: no idea what it is or where to begin, and contradictory theories all around. Except this one is right under your nose. You've taken part in it, and if you work at a university, you still do.