
Is it? That doesn't seem at all obvious to me. In fact it seems decidedly impossible.

Almost any result could in principle be attributable to noise; where are you planning to source all of the funding to run large enough studies to minimise that? And no matter how large your experiments or how many you run, you're still going to end up with some published results attributable to noise since, as GP says, that's the nature of statistics. By its nature, you cannot tell whether a result is noise. You only have odds.
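
To make the "you only have odds" point concrete, here's a throwaway simulation (my own sketch, not anything from the article; the 5% threshold is just the conventional alpha): run a pile of experiments where the true effect is zero and count how many still clear p < 0.05.

    # Minimal sketch: when the true effect is zero, roughly alpha of all
    # experiments still look "significant", no matter how large each study is.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha = 0.05
    n_experiments = 10_000
    n_per_group = 200  # a reasonably large study; try 20 or 2000, the rate barely moves

    false_positives = 0
    for _ in range(n_experiments):
        # Both groups are drawn from the same distribution, so any "effect" is noise.
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(0.0, 1.0, n_per_group)
        _, p = stats.ttest_ind(a, b)
        false_positives += p < alpha

    print(f"'Significant' results from pure noise: {false_positives / n_experiments:.1%}")

Bigger studies shrink the error bars on real effects, but the false-positive rate stays pinned at alpha, which is the point: you can spend as much funding as you like and still only move the odds.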

I'm not saying there aren't problems with reproducibility in many fields, but to suggest that you can eliminate noise-driven results entirely is naive.

No, not naive - wrong.




> Almost any result could in principle be attributable to noise; where are you planning to source all of the funding to run large enough studies to minimise that? By its nature, you cannot tell whether a result is noise. You only have odds.

Well, with a single paper the odds indeed are that it's noise. That's why we need reproduction. Now of course a paper needs to be published for it to be replicated later. But the paper (and/or supplemental material) should contain everything the research team can think of that's relevant to reproducing it - otherwise it's setting itself up to be unverifiable in practice. Papers that are unverifiable in practice should not be publishable at all, because a) they won't be reproduced and thus will remain forever indistinguishable from noise, and b) there's no way to determine whether it's real research or cleverly crafted bullshit.
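
As a back-of-the-envelope on why reproduction changes the odds (assuming independent studies and an honest 5% threshold; publication bias or a shared flawed method breaks this arithmetic):

    # The chance that k independent studies *all* come up "significant" at
    # p < 0.05 when the effect is pure noise shrinks geometrically with k.
    alpha = 0.05
    for k in range(1, 5):
        print(f"{k} independent positive result(s) from noise alone: {alpha**k:.3g}")
    # 0.05, 0.0025, 0.000125, 6.25e-06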


I don't disagree with any of that, although I'd stick a big citation needed on the implicit suggestion that there's a large group of scientists who aren't making a good-faith effort to ensure that their successors will have the information they need to reproduce (that is, after all, what a paper is).

My issue is the flippant and silly claim that "[i]t's possible to imagine a version of academia where results that can be attributed to noise don't get published".


I think this is actually something that can be experimentally examined.

Take a large sample of papers, give each some sort of rating based on whether it provides enough information to reproduce the work, how clear its experimental and analytical methodology is, whether its primary data and scripts are available, etc., and then look at that rating versus its citation count.

Hopefully, better papers get more attention and more citations.
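
If someone wanted to run this, the analysis end is the easy part; the hard part is the scoring rubric. A sketch with purely synthetic stand-in data (the 0-10 score, the rubric, and Spearman as the statistic are all my assumptions, not a worked-out protocol):

    # Sketch: correlate a per-paper "reproducibility score" with citation counts.
    # Everything here is synthetic stand-in data, not real papers.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_papers = 200

    # Pretend rubric: 0-10 score for methods clarity, data/code availability, etc.
    scores = rng.integers(0, 11, n_papers)
    # Pretend citations: heavy-tailed counts loosely tied to the score.
    citations = rng.poisson(5 + 3 * scores)

    # Spearman (rank) correlation is robust to the skewed citation distribution.
    rho, p_value = stats.spearmanr(scores, citations)
    print(f"Spearman rho = {rho:.2f}, p = {p_value:.1e}")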

(And yeah, "peer review" as it is done before a paper is published is not supposed to establish a paper as correct; it is supposed to validate it as interesting. Poor peer review ultimately makes a journal uninteresting, which means it might as well not exist.)


That sounds like a very interesting idea. At the least, it would be interesting to see the major classes of reproducibility problems. And there may well be a lot of low-hanging fruit, as the comments on this page suggest about data corpuses in computational fields.



