
I'm unsure how either of my comments could be read as disagreeing with your point: both mention physics/micro-models being stronger than pure stats. Maybe, in spite of your "The problem is" phrasing, you only meant to amplify, or only read quickly. That said, I can perhaps respond with something that amplifies & clarifies our shared skepticism of the pure-statistics approach relative to something "more detailed" that might someday somehow help someone.

There are probably a half dozen micro-model effects even non-experts could rattle off that have "trended" over the decades, from shoes & surfaces to various aspects of population diet, early-in-life identification / more-optimized maturation conditions, and on & on. Statisticians call this a "non-stationary sampling process", meaning the independent & identically distributed (IID) assumption is at best a weak approximation and at worst totally misleading.

Ways do exist to measure how much evidence there is that IID / other distributional assumptions are failing { such as some of the ones here: https://github.com/c-blake/fitl/blob/main/fitl/gof.nim and referenced at the bottom of https://github.com/c-blake/bu/blob/main/doc/edplot.md (at the stage where one plots / pools together multiple data points into some kind of "sample" with a "distribution") }. Sadly, few people test such assumptions (which are rarely truly comprehensive anyway), and even small departures from modeling assumptions may lead to relatively large errors in estimates. E.g., the linked-to Einmahl 2009/2010 research states IID as an assumption needed to apply the ideas, but then shows no test of that assumption on the data used.
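
To make "test the assumption" a bit more concrete, here is a minimal sketch of one such check. It is not what gof.nim or the Einmahl work does; it is just a generic permutation test I am swapping in for illustration. The idea: if a time-ordered series really were IID, its order would be exchangeable, so the distance between the empirical distributions of the early half and the late half should look no bigger than it does after random shuffles. The function names (`ks_distance`, `split_half_drift_pvalue`, `synthetic_series`) and the synthetic drifting data are all hypothetical, stdlib-only Python.

```python
import random
from bisect import bisect_right


def ks_distance(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


def split_half_drift_pvalue(series, n_perm=500, seed=0):
    """Permutation test of exchangeability for a time-ordered series.

    Statistic: KS distance between the early half and the late half.
    Returns (observed distance, permutation p-value). A small p-value is
    evidence against IID; a large one is NOT proof that IID holds.
    """
    rng = random.Random(seed)
    half = len(series) // 2
    observed = ks_distance(series[:half], series[half:])
    pool = list(series)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pool)  # destroys any time ordering
        if ks_distance(pool[:half], pool[half:]) >= observed:
            hits += 1
    # +1 in numerator and denominator keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


def synthetic_series(n=200, drift_per_step=0.0, seed=1):
    """Hypothetical data: unit Gaussian noise plus an optional linear trend."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) + drift_per_step * t for t in range(n)]


if __name__ == "__main__":
    for label, drift in (("no drift", 0.0), ("drifting", 0.02)):
        d, p = split_half_drift_pvalue(synthetic_series(drift_per_step=drift))
        print(f"{label}: KS distance = {d:.3f}, permutation p = {p:.4f}")
```

This only detects one kind of failure (a distribution shift between halves); serial dependence with a stable marginal distribution would sail right through, which is part of why such tests are rarely comprehensive.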