Psychologists Strike a Blow for Reproducibility (2013) (nature.com)
84 points by jcr on Dec 28, 2014 | 27 comments



This article is a bit old (26 November 2013); for more recent news on reproducibility, check out the Reproducibility Initiative.

It's a movement backed by several companies and groups.

Nature recently joined it, so if you want to publish a paper with them next year, you have to at least complete a checklist meant to make reproducibility easier: http://www.nature.com/nnano/journal/v9/n12/full/nnano.2014.2...

They have also started to reproduce publications from volunteers; here's a very recent paper detailing their replication efforts: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjourna...

Article summarizing the paper: http://www.nature.com/news/parasite-test-shows-where-validat...


Yeah - this is a sample of the 'gold standard' of psychology papers, and only 10/13 could be reproduced.

The reasons for shoddy reproducibility are p-value hacking, intense pressure to publish at all costs, and a premium on 'gladwellesque' results where a simple theory seemingly explains a lot.

Gelman and Uri Simonsohn have both written a lot about this.


> Yeah - this is a sample of the 'gold standard' of psychology papers, and only 10/13 could be reproduced.

Having watched scientists at work, I'd say reproducing 10 of 13 high-profile studies sounds pretty reasonable. Given p≤0.05, you'd expect 19/20 to be reproducible. Then you add in other factors:

- Surprising positive results get published more readily than negative ones.

- Some results may be sensitive to tiny changes in methodology.

- And more: "Why Most Published Research Findings Are False" http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182327/

Some of these effects can be reduced with pre-registration of studies, and with other methodological improvements. But in general, even when everybody plays by the rules, a fair bit of garbage is going to slip through, because it's hard to eliminate all sources of error and bias.

So when a scientific field can say, "Hey, 3/4ths of our really interesting results are real!", that's about what I'd expect when the process is working.


> Given p≤0.05, you'd expect 19/20 to be reproducible.

No, that's not the case; follow your link to "Why Most Published Research Findings Are False" to see why. The "positive predictive value" of a discovery, even without any bias, is very different from what the p value would imply.


Yes, this is an excellent point. To take an extreme example, if there are no true, interesting results to be discovered in a field of research, then any study claiming such a result is by definition false. Analogously, if none of your employees do drugs, all positive drug tests are false positives.
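
For a rough sense of the numbers (the alpha, power, and prior below are made-up illustrative values, not figures from the paper), the positive predictive value works out to something like this:

    # Back-of-the-envelope PPV: P(effect is real | result is significant).
    # All three inputs are assumptions chosen for illustration.
    alpha = 0.05   # significance threshold (false-positive rate when there is no effect)
    power = 0.80   # chance of detecting an effect that really exists
    prior = 0.10   # assumed fraction of tested hypotheses that are actually true

    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    ppv = true_positives / (true_positives + false_positives)
    print(round(ppv, 2))  # ~0.64, nowhere near the 0.95 a naive reading of p<=0.05 suggests

Change the assumed prior and the answer swings wildly, which is exactly the point of the Ioannidis paper.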


And another big one: the low statistical power of much psychological research, meaning that even if an effect is present, the original study or its attempted replication might not find it due to inadequate sample size.
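
To put a number on that (the effect size and sample size below are invented for illustration), a quick simulation of a two-group comparison:

    # Crude power simulation: a real but modest effect, tested with a small sample.
    # Effect size (0.3 SD) and n=20 per group are assumptions, not figures from any study.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    effect, n, runs = 0.3, 20, 2000

    significant = 0
    for _ in range(runs):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(effect, 1.0, n)
        _, p = stats.ttest_ind(treated, control)
        significant += p < 0.05
    print(significant / runs)  # roughly 0.15: the effect is real, yet ~85% of studies this size miss it

So a failed replication at that kind of sample size tells you very little on its own.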


Considering that the replication used a larger and more diverse sample across 12 countries, it may be that the original studies which couldn't be supported were valid but not widely applicable beyond a specific population, or that the results reflect a change in how people behave since the time of the original studies (one of them was an effect based on seeing an American flag, and I can see how that might have changed).


The original paper considers this and finds it to be of limited, if not negligible, concern. The experimental designs were extremely meticulous and even involved the original authors whenever possible.


Great point!


> Yeah - this is a sample of the 'gold standard' of psychology papers, and only 10/13 could be reproduced.

It also raises the question of whether the reproducibility study is itself reproducible.


While I'm very happy to see more attempts at replication, I am quite shocked to hear the grandstanding that "reproducibility is not as much of a problem".

Seen as a meta-experiment, it is incredibly weak, with a tiny, biased sample: high-profile results with procedures simple enough that they can be combined.

As such, it can't come even close to supporting what they (whoever "they" actually are; it may not be the actual researchers) are trying to claim. What about less high-profile stuff? What about more complex setups? If 20% of your most rock-solid results are not reproducible, I wouldn't be so quick to celebrate. Imagine if 20% of basic physics or maths results weren't actually true...


Why did they include studies that have already been replicated innumerable times? There is no value in doing so apart from biasing the results toward an outcome more favourable to psychology.


It's important to read the study's goals... But I agree. If we really want to understand how well studies are being conducted, there should be a random sampling of studies covering different behavioral effects.


This could be equivalently titled "20% of the most well known psychology results are impossible to reproduce".

Also, among the ones that did replicate are the Kahneman ones, and _he_ was the one to point out that most experiments are never reproduced, so there was a higher chance that his results would be reproducible.


Your equivalent title would be equally misleading. The fact that an experiment does not, in one particular attempt at replication, lead to identical conclusions does not mean the finding is "impossible to reproduce" or bogus.


I believe they had each of the 13 studies independently reproduced by 36 different labs. If you can't reproduce it across such a setting, I think it's safe to discount the original work.


I understood the point was that this was not "one particular attempt" but rather a larger set of attempted reproductions; have I misread?


I guess it's debatable. You could consider a single experimental design, administered by many different labs across many different countries, to be "many replications", or you could argue that by using the exact same questionnaire and keeping other conditions as comparable as possible, it's just one big geographically dispersed replication.

(If you look at http://www.talyarkoni.org/blog/wp-content/uploads/2013/12/ma... you see that the results for the 3 "failed" experiments did see some effect at some labs. It's only when tallying up all the results that they had to conclude that replication had failed for these three.)
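
(For what it's worth, that "tallying up" step is conceptually similar to a fixed-effect meta-analysis. A hypothetical sketch with invented per-lab numbers, not the study's actual analysis:

    # Hypothetical fixed-effect pooling across labs (all numbers invented for illustration).
    # Each lab reports an effect estimate and a standard error; weight by inverse variance.
    import numpy as np

    effects = np.array([0.25, -0.05, 0.10, 0.02, 0.30])  # per-lab effect estimates (made up)
    ses     = np.array([0.15,  0.12, 0.20, 0.10, 0.25])  # per-lab standard errors (made up)

    w = 1.0 / ses**2
    pooled = np.sum(w * effects) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    print(pooled, pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    # pooled ~0.07 with a 95% CI of roughly [-0.06, 0.19]: several labs see a positive
    # point estimate, yet the combined interval still includes zero.

That's how individual labs can "see some effect" while the overall replication is still judged a failure.)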

This isn't just semantics, though. For example, if one question on the questionnaire influences how survey takers answer subsequent questions, then we might well fail to replicate the results of an older study. You might then conclude that the older study isn't as generalizable as you thought it would be and that you need more research to figure out when exactly the effect occurs. You might even say that it's likely the original study was just a fluke. But it doesn't mean the original hypothesis has been definitively and unequivocally refuted.


As to the idea of replicating the experiment: replicating it as closely as possible is exactly what the statistical methods assume. The other concept you're speaking to seems more like establishing causation, or just generally understanding when the observed effects do or do not appear. That seems better suited to a separate study altogether.


You're right. I was referring specifically to the fact that in this replication study, they've actually bunched together many different studies into a single questionnaire that tests them all. So it's not a straight-up replication, but then, in the social sciences replications almost never are.


The submission is from 26 November 2013. (I read the article when it was first published and have read related articles about the Many Labs Replication Project before.) The article kindly submitted here is by experienced science writer Ed Yong and links to some helpful background reading, including his report about Daniel Kahneman's open letter from 2012.

The Journal of Open Psychology Data published findings of the replication study mentioned here,[1] and the PsycNET site of the American Psychological Association provides a citation to the published version of the study findings in a psychology journal.[2] Improving replicability is an ongoing effort not just in psychology but in most branches of science, and it is critically important in medical studies.

[1] http://openpsychologydata.metajnl.com/article/view/jopd.ad/1...

[2] http://psycnet.apa.org/journals/zsp/45/3/142/


All of their data and methods are online here: https://osf.io/ebmf8/

This is a very well done study, and almost every criticism I've read in the comments is addressed in the write-up.


The reproducibility initiative and its supporters have been called "replication bullies."

http://www.sciencemagazinedigital.org/sciencemagazine/23_may...

I kind of see their point, but in the end if your study can't be replicated, you have to take your lumps.


<rant> And I think that psychology began to die when it became obsessed with statistics. Researching the mind has become an endless, bottom-up process with very few ideas and zero grand ideas. I'm not even talking about practice... Being a psychologist myself, I'm bored out of my skull with this illusion of objectivity (the general linear model is only a theory) and the pathetic little results it methodically cranks out. The Rorschach is a test with very low statistical properties. I love it; it is useful. And I can prove it one case at a time. That's why I am learning computing and leaving the tedious gravy train of meta-analyses, rotated factor analyses and manualized, empirically validated methods behind me.</rant>


Great idea, let's go on our gut feelings rather than science. Let's apply this to other fields too. Driving a car will become much more exciting when it can explode any second. Homeopathy is medicine with very low statistical properties, but it's so much fun! I love it, it's so useful! And the car engineers and pharmaceutical scientists will no longer have to do that boring math and statistics. Let's leave empirically validated methods behind us!

Hint: if the statistics show that the results are pathetic, then the results are pathetic. Ignoring statistics merely sweeps that under the rug, it doesn't fix it.


An even better idea: let's live according to probabilities. That will be fun too. And there are tons of positive results that are pathetic. Study shows that impact of hammer on foot causes sensation of pain compared to control group (p<.001). Obsession with statistics is the culprit, not statistics itself. It's just a tool like any other. Not a TOE.


Like many "soft sciences", psychology has had a serious case of physics envy since at least the days of Freud, and it has only gotten worse since. There were some brief respites now and again, with the likes of Jungian and Humanistic psychology. But these have largely been stamped out by scientistic, mechanistic, reductionist approaches.



