
Actually, most major experiments in particle physics these days (including the OPERA experiment) avoid this sort of confirmation bias by being run "blind." The scientists write all of their data reduction pipelines before taking any actual data and test their pipelines on simulated data. When they are confident that their pipeline is running as expected they run the experiment, put the data through their pipeline and publish the result, no matter how unexpected it is.

As the OPERA result showed, blind analysis has a downside: if you don't understand everything in your experiment perfectly (which is difficult in a very large, complicated experiment), you run the risk of embarrassing yourself by making an obvious-in-retrospect mistake and publishing an obviously absurd result. But in the long run that's not a bad price to pay to avoid the sort of confirmation bias Feynman was talking about.
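
A minimal sketch of that blind workflow in Python (the pipeline, the cut values, and the file name below are all invented for illustration; the point is only that the analysis code is frozen and validated on simulation before anyone looks at real data):

    import numpy as np

    def reduction_pipeline(events):
        # Toy "analysis": apply quality cuts and estimate the mean of the
        # quantity of interest, with a bootstrap uncertainty. Every cut
        # value here is fixed before any real data are taken.
        good = events[(events > 0.0) & (events < 10.0)]
        boot = np.random.default_rng(0).choice(good, (1000, good.size)).mean(axis=1)
        return good.mean(), boot.std()

    # 1. Validate the frozen pipeline on simulated data with a known answer.
    sim = np.random.default_rng(1).normal(loc=5.0, scale=1.0, size=10_000)
    estimate, error = reduction_pipeline(sim)
    assert abs(estimate - 5.0) < 5 * error   # recovers the truth it was fed

    # 2. Only then run it, unchanged, on the real data and report the output,
    #    whatever it turns out to be.
    real = np.loadtxt("real_data.txt")       # hypothetical data file
    print(reduction_pipeline(real))

The discipline is entirely in the second step: once the box is opened, nothing upstream of the printed number gets changed.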




Physics is far ahead of other disciplines in this regard. Choosing your statistical test after you gather the data, selectively removing "outliers" after the fact, non-blind interpretation of images by people who have a stake in the outcome, and publishing only statistically significant results are all par for the course in, e.g., neuroscience.
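
For anyone who wants to see how much those freedoms buy, here is a small Python simulation (the sample sizes and cut-offs are made up). Both groups are drawn from the same distribution, so there is no real effect, yet trying a few post-hoc choices (drop "outliers" or not, t-test or Mann-Whitney) and reporting whichever one "works" pushes the false-positive rate noticeably above the nominal 5%:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_experiments = 2000
    false_positives = 0

    for _ in range(n_experiments):
        # Two groups from the *same* distribution: the null hypothesis is true.
        a, b = rng.normal(size=(2, 30))
        pvals = []
        for trim in (False, True):                 # "outlier" removal chosen post hoc
            x = a[np.abs(a) < 2.0] if trim else a
            y = b[np.abs(b) < 2.0] if trim else b
            pvals.append(stats.ttest_ind(x, y).pvalue)     # test chosen post hoc
            pvals.append(stats.mannwhitneyu(x, y).pvalue)
        if min(pvals) < 0.05:                      # report whichever analysis "worked"
            false_positives += 1

    print(false_positives / n_experiments)   # noticeably above the nominal 0.05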


This paper [1] points out that commonly-used measures of statistical significance are downright meaningless when additional degrees of freedom are hidden in the way you describe.

[1] http://people.psych.cornell.edu/~jec7/pcd%20pubs/simmonsetal...


It's even worse than that paper describes, and it's something every statistics 101 class worth its salt points out: if you are allowed to choose a statistical test after you've gathered your data, you can prove any conclusion you want with arbitrarily high confidence. Note that the paper does not list choosing the test before gathering the data among its requirements. The only way to do meaningful statistics is the way splat describes: specify exactly how you're going to analyze the data before you gather it, send the paper to a journal that decides whether to publish it before the data exist, and then complete the paper by actually running the experiment and adding the data.
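
To make the "arbitrarily high confidence" point concrete, here is a small Python sketch (everything in it is invented for illustration). Every "outcome variable" below is pure noise, yet if you are free to pick which analysis to report after seeing the results, the best p-value shrinks roughly like 1/(number of candidate analyses), so any significance threshold can be beaten by trying enough of them:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_subjects = 50

    for n_candidates in (10, 100, 1000, 10000):
        # n_candidates "outcome variables", every single one of them pure noise.
        data = rng.normal(size=(n_candidates, n_subjects))
        pvals = stats.ttest_1samp(data, popmean=0.0, axis=1).pvalue
        # Choose the analysis after seeing the results: report the best p-value.
        print(n_candidates, "candidate analyses -> best p =", pvals.min())

Pre-registration in the sense described above closes exactly this loophole: the one test that will be reported is fixed before the data exist.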


>The scientists write all of their data reduction pipelines before taking any actual data and test their pipelines on simulated data. When they are confident that their pipeline is running as expected they run the experiment, put the data through their pipeline and publish the result, no matter how unexpected it is.

How does that help when the prediction matches the measurement but the experiment is flawed?


It doesn't. Sometimes two wrongs (the prediction and the measurement) make a right.



