Why Most Published Research Findings Are False

llimllib · on Oct 24, 2010

The author's research is discussed in an excellent Atlantic article that many may find more accessible: http://www.theatlantic.com/magazine/archive/2010/11/lies-dam...

michael_nielsen · on Oct 24, 2010

There's also a nice short summary in this blog post:

http://www.marginalrevolution.com/marginalrevolution/2005/09...

vitaminj · on Oct 24, 2010

In statistics, you're supposed to come up with a statistical model first before running regressions on the data. But quite a few papers I've read (especially in finance) seem to go the other way around, i.e.

They run regressions on a data set, adding and subtracting independent variables until the t values and standard errors start looking good.

Then they construct the linear model, assume the Gauss-Markov assumptions and sometimes (though not always) try to explain the causal relationship between the variables.

This is obviously very wrong and nobody has any clue what the distribution of the least squares estimators to these models are. But I've seen plenty of examples of this, which is enough to void the results of the paper (even if the model they come up with is somewhat plausible).

_delirium · on Oct 24, 2010

In practice that's fairly common in all areas of science. You look for patterns in data and infer a relationship/equation/etc. Of course, you are supposed to confirm that it actually holds in new data / subsequent experiments.

Widespread use of data-mining software does make it much easier to do dodgy things on a wide scale.

dododo · on Oct 24, 2010

there's nothing wrong with looking at some of the data first per se, provided you do not use the same data to draw conclusions. i.e., have a training and a test data set.

jonhendry · on Oct 24, 2010

Well, that's why the whole "replication" thing is important. One published result is interesting, but rarely definitive, and possibly wrong. (Or at least unusual for possibly difficult-to-determine reasons.)

This is another good reason to ignore the media hype for every new paper that comes out. (Besides the fact that journalists perform lossy compression on data.)

But it seems like it's how science is supposed to work: publish your results, see if others confirm your findings, because you might be wrong even if you seem to have done everything correctly and honestly to the best of your ability.

St-Clock · on Oct 24, 2010

On the whole, I agree with you, but replication is often expensive and in some fields (e.g., software engineering research), it's not enough to warrant a publication in a high profile journal or conference (especially if you confirm the original findings). Heck, you may not even get funding to do that.

Regarding "if you seem to have done everything correctly", I don't think that any honest scientist can claim that his/her study had no limitation or flaw. I regularly review papers for big conferences and there is no such thing as a perfect paper/research project/study. It's more like a threshold: despite the issues, were the findings novel, relevant, and found through a rigorous process? Would the community learn anything valuable by reading this?

Articles like the one cited by OP are useful if they make scientists and normal "folks" realize the limitations of alpha values, they are useful if they make scientists reconsider some of their methods (and way of presenting findings), but they can be harmful if the readers conclude that most scientific findings are "false" and thus, that science is bogus because it cannot find the "truth". Science is rarely, if ever, about true and false, religion is.

P.S. I realize this answer was more about the article, and less about your reply, it's just that your reply prompted me to write something :-) Again, I agree with you!

jonhendry · on Oct 30, 2010

"Regarding "if you seem to have done everything correctly", I don't think that any honest scientist can claim that his/her study had no limitation or flaw."

Right, that's what I was getting at. The scientist might believe they've done everything right, after checking and re-checking their work, but be missing some flaw or limitation in their work, their model, whatever.

I recall hearing once of an experiment that couldn't be reproduced, and it turned out to be due to some chemical property of the entirely normal laboratory glassware that one lab had used. Switching to another manufacturer removed the problem. (I'm probably messing up the details, like the consequences of the chemical properties of the glass. But the gist is correct. Different manufacturer of glassware cleared up a problem that was unexpected.)

tokenadult · on Oct 24, 2010

Regarding "if you seem to have done everything correctly", I don't think that any honest scientist can claim that his/her study had no limitation or flaw.

Hear. Hear. This, of course, was the thrust of Richard Feynman's famous Caltech commencement speech on "Cargo Cult Science."

http://www.lhup.edu/~DSIMANEK/cargocul.htm (adapted HTML text)

http://calteches.library.caltech.edu/51/2/CargoCult.pdf (PDF version as published by Caltech)

As Feynman said, "The first principle is that you must not fool yourself--and you are the easiest person to fool. So you have to be very careful about that."

ivank · on Oct 24, 2010

Related: "Frequentist Statistics are Frequently Subjective" http://lesswrong.com/lw/1gc/frequentist_statistics_are_frequ...

ramanujan · on Oct 24, 2010

This paper has gotten way too much press for an oversimplified model of science. Here's the thing: if results hold up to scrutiny, the authors are eager to share code and plasmids/samples. If not, they are a lot more squirrelly. Outside replication is what keeps the machine moving forward, is fairly readily proxied by citation rates, and yet is not captured by Ioannidis' simple model.

samd · on Oct 24, 2010

That's not the case, even the most widely cited research is dubious. From the Atlantic article:

"He zoomed in on 49 of the most highly regarded research findings in medicine over the previous 13 years, as judged by the science community’s two standard measures: the papers had appeared in the journals most widely cited in research articles, and the 49 articles themselves were the most widely cited articles in these journals. These were articles that helped lead to the widespread popularity of treatments such as the use of hormone-replacement therapy for menopausal women, vitamin E to reduce the risk of heart disease, coronary stents to ward off heart attacks, and daily low-dose aspirin to control blood pressure and prevent heart attacks and strokes. Ioannidis was putting his contentions to the test not against run-of-the-mill research, or even merely well-accepted research, but against the absolute tip of the research pyramid. Of the 49 articles, 45 claimed to have uncovered effective interventions. Thirty-four of these claims had been retested, and 14 of these, or 41 percent, had been convincingly shown to be wrong or significantly exaggerated. If between a third and a half of the most acclaimed research in medicine was proving untrustworthy, the scope and impact of the problem were undeniable. That article was published in the Journal of the American Medical Association."

izendejas · on Oct 24, 2010

Were some of those citations made to prove those findings wrong? Dumb question, but would be a shame to not be thourough when attacking bad literature.

_delirium · on Oct 24, 2010

Yeah, that's a commonly mentioned problem with citation-counting. Much like linkbaiting on the internet, a poor-quality paper taking an inflammatory position can get a lot of citations from people debunking it. Another problem is throwaway citations: some paper gets cited as a generic example, rather than because it provides anything valuable that the paper citing it actually draws on.

Unfortunately, it's much harder to come up with better measures. Given a smallish corpus of a few hundred papers, humans could read through them and annotate each citation with things like, "cited to debunk", "cited to distinguish related work", "cited for general background", "cited in passing", "cited for result", etc. But computers are not yet very good at doing that automatically, so the large-scale citation analysis just does dumb citation-counting.

_delirium · on Oct 24, 2010

This is an interesting followup as well: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1808082/

zbanks · on Oct 24, 2010

Self reference much? This is published research...

Clearly these findings are false... or maybe not? Dammit. http://en.wikipedia.org/wiki/Liar_paradox

dholowiski · on Oct 24, 2010

I was thinking the same thing. We're sure it's not April 1st?

tokenadult · on Oct 24, 2010

The submitted article is a review article about methodology, for the most part, and isn't announcing brand-new primary experimental research findings. So the submitted article is distinguishable from the kind of articles it analyzes. See

http://en.wikipedia.org/wiki/Wikipedia:MEDRS

for more on distinctions among differing kinds of publications on research.

elbenshira · on Oct 24, 2010

First of all, the author of this piece works in Department of Hygiene and Epidemiology. Research is done differently across different disciplines, so it's dangerous to try to expand this to other disciplines. For example, some fields find alpha < 0.05 acceptable and other fields do not.

But research is very weird indeed. The more conference/journal articles you read, the less you trust them. I mean, say a field accepts results alpha < 0.05. This means that 5% of everything shown is wrong.

Feel free to correct me if you have a better grasp of statistics find what I say to be wrong.

cdavid · on Oct 24, 2010

Actually, having an alpha of value x does NOT mean that 100 * x % are false. It only gives you an indication of the coverage of your experiment, which is useful when compared to other, independent studies.

I think most scientists don't understand the meaning of the p-value. There was an interesting discussion last year in the statistical blog community on that question, with leading statisticians involved in it: http://radfordneal.wordpress.com/2009/03/07/does-coverage-ma...

greendestiny · on Oct 24, 2010

If you only publish results p < 0.05 then you can't say what percentage are due to chance. It could be all of them. All it tells you is how many experiments would get that significance level through chance. To know the number of results that are simply due to that effect you'd have to know the prevalence of actual positive results (ie not due to chance). If a actual positives are common then it could be much lower than 5% reported results due to chance, if actual positives are impossible then it could 100% due to chance.

alan-crowe · on Oct 24, 2010

The traditional notion of a 5% confidence limit comes from devising rules of thumb for agricultural research stations in the 1930's. The basic frame work is that each experiment takes many months, a large plot of land, and plenty of money. You test crop varieties that you are already confident will give a better yield in order to check that they really do so.

Suppose your initial guessing is 50:50 and over some years you run 200 tests. 100 times the crop really does yield better and most of those show up fine. 100 times the crop doesn't actually yield better and 5% of those result in false positives. You end up with around 100 true positives and 5 false positives. A positive result really means something.

Fast forward 80 years and research has changed. You have high throughput screening machines and can test 100,000 different molecules in your hunt for a new antibiotic. Suppose you have got lucky and there really is a new antibiotic in your combinatorial explosion of side chains. A p-value of 5% gives you 5000 false positives. With any luck you don't get a false negative and your new antibiotic also makes it through the initial screen. Now you have 5001 +/- 70 positives. The probability that a positive result is true is only 0.0002 or 0.02%. A positive results still means something important. You are searching for a needle in a haystack and you have discarded 95% of the hay, but there is still plenty of hay left and the 99.98% of the results are wrong.

alokm · on Oct 24, 2010

You will have to use binomial distribution (http://en.wikipedia.org/wiki/Binomial_distribution) for finding the probability that 5% of them are wrong (Assuming researches are independent). I guess this probability will come to be very small. But the point of this article is not alpha(confidence level) ,It is the bias of the researcher.

jforman · on Oct 24, 2010

p-values are calculated in many different ways, not always using the binomial distribution (of the p-values I've claimed, very few have used the binomial distribution)

alokm · on Oct 24, 2010

i was referring to the chances of 5% of the papers giving wrong results. using pvalue as the chance of failure in each research paper , binomial can be used to find the probability that 5% of the papers are in correct

zeteo · on Oct 24, 2010

"I mean, say a field accepts results alpha < 0.05."

And then there are fields like climate science, where "very high confidence" means 10% probability of being wrong:

http://ipccinfo.com/

stretchwithme · on Oct 24, 2010

Its not surprising that misrepresentation has become so common in what is now a largely political industry.

gaius · on Oct 24, 2010

Scientists gotta eat like everyone else, they will do what they need to do to get funding. It's why so much "climate science" is dodgy, the sky is falling! type research gets more headlines. Witness the recent report on the Himalayas melting for a perfect example of this phenomenon at work.

marze · on Oct 24, 2010

So what are the implications of this information to the average individual? He is basically saying that the conventional wisdom on medical questions is most often incorrect.

smakz · on Oct 24, 2010

For some reason I was reminded of this blog post:

http://jsomers.net/blog/it-turns-out

ck2 · on Oct 24, 2010

It's because researchers slack just like everyone else at their jobs and need to pay the bills in the meanwhile. Now imagine your doctor or law enforcement and the mess they cause when they slack and cut corners just to produce "product" and justify their jobs.

drewse · on Oct 24, 2010

This title yelled "paradox!" at me. It's funny to see it coming from a ".gov" website.

For those who need clarification, if this published research and its title are true, than it is saying that research like itself are usually false. This contradicts the original assumption that it is true.

If this published research and its title are false, than research like itself is usually true since what it's saying must be wrong. This contradicts the original assumption that it is false.

usaar333 · on Oct 24, 2010

Amusing at first glance, yes.

Of course he's talking about studies that use significance tests, which his own paper isn't directly using to prove his point.

mkramlich · on Oct 24, 2010

The OA has most likely reached a false conclusion.

_3u10 · on Oct 24, 2010

Do the paper's findings apply to the paper itself?

fauigerzigerk · on Oct 24, 2010

So if this research is not false (unlikely according to the author) then mankind would be moving backwards, unless non scientific reasoning compensates for the failure of science. Medical treatment would get constantly worse, people would be misdiagnosed and mistreated more than ever, death rates after cancer and cardiac events would rise.

Experimentalist · on Oct 24, 2010

That's a lot of assumptions for 2 sentences.

fauigerzigerk · on Oct 24, 2010

Those are implications of what that study claims to have found, not to be taken entirely serious ;-)

runcible_spork · on Oct 24, 2010

And grant writers and heads of research departments everywhere disagree.

jakerocheleau · on Oct 24, 2010

This is also a published article and the same principles should be applied here, no?