Every academic cries about this, but no one wants to accept an obvious solution: add reproducibility criteria to peer review. Right now even Tier 1 conferences do not have such criteria in their peer review systems. A reviewer simply can't reject a paper because the authors didn't provide reproducibility information!
This is truly a plague in research right now. I've come across quite a few instances where authors told me that their experiments weren't reproducible even for them! They don't note this in the paper, because ultimately everyone needs to show some result for the funding they received (i.e. a "published paper"). They obviously never want to share any code, data files, hardware, etc. In one instance, an author wrote back that they couldn't share the code with me because their hard drive had crashed and all the code was lost! Reproducibility is a fundamental tenet of scientific work, and it is actively and completely ignored in the current peer review system.
I think conference chairs need to take a stand on this. Tier 1 conferences now get 4x to 8x the papers they used to. Reproducibility could be a great filter when area chairs are scrambling to find reasons to reject papers. Sure, there will be papers where very specialized hardware or internal infrastructure of 10,000 computers was used. But those papers would be great for Tier 2 conferences.
If a researcher told me they couldn't provide primary data because their hard drive crashed, I'd contact their dean and the editor of the journal that published the article, and let them know. Researchers have a duty (unfortunately not well codified) to retain primary source data, and "hard drive crash" isn't an excuse (it means they didn't take even basic backup and archival measures).
I think this is a great idea at least for fields where reproducibility is comparatively easy to achieve, e.g. computer science: Include your data, code and run environment (using a suitable technology) with the paper to give reviewers a chance to "play" with your experiments (if they actually have the time for this).
In other fields such as experimental physics or biology it might be more difficult to achieve the same effect though, as experiments are quite hard to repeat in general: For example, in my former field (experimental quantum computing), building the setup and fabricating the sample required for a given experiment could take years of effort, making it almost impossible to "just" reproduce someone's work for the sake of verification.
That said, in experimental quantum physics the exciting results tended to get replicated within a couple of years by different teams anyway, not because these teams wanted to verify the results but rather because they wanted to build their own experiments on top of them (I imagine this is similar in other fields). Another natural way of exchanging knowledge and improving reproducibility was via the exchange of postdocs and PhD students: if you do good work in one group, another group will usually be very eager to give you a position so that you can help them set up the experiments there as well. I'd even argue that this is one of the main modes of knowledge dissemination in experimental science today, as most research papers are just extremely hard to reproduce without the specific (and often undocumented) knowledge of the individuals who ran the experiments.
I'm not sure, though, whether it is practical to document everything in a research paper in such a way that anyone can reproduce a given experiment: many of the techniques are very specialized, and a lot of the equipment in the labs (at least in physics), like sample holders, electronics, chip design templates and fabrication recipes, is custom-built, so documenting down to the last detail would take years of effort.
That's why (IMHO) written PhD theses are so important, because that's kind of the only place where you can write 200-400 pages about your work, and where you can include minute details such as your chip fabrication recipe, a description of your custom-built measurement software and a detailed summary of the experimental techniques used in your work. In that sense, PhD theses are probably more important to reproducibility than short papers.
Your description of experimental quantum physics exactly matches the work done in state of the art biology. It basically takes 10-20 years of training and excellent brains to reproduce your competitor's paper in a way that lets you build on top of it. Many people who complain that papers aren't reproducible just don't have the skill to move the state of the art, because it's become so esoteric and challenging to run the critical experiments.
I used to be an "everything must be replicable by even the simplest of people" person, but I changed to "for progress to be made there must be <X> competent people who can reproduce challenging experiments and run new ones".
It's completely on and off the walls bollocks with sriracha sauce on top and a good kick to the aforementioned balls that we don't have enough space (and time) for writing down the important bits of experiments.
While I agree with you in general, I'd like to point out a maybe-not-so-small detail:
> Sure, there will be papers where very specialized hardware or internal infrastructure of 10,000 computers was used. But those papers would be great for Tier 2 conferences.
This is an accurate description of a _lot_ of Google papers when it comes to Computational Linguistics. And I don't know whether a Tier 1 conference that rejects the biggest players could remain a Tier 1 conference for long.
There are many papers in Comp. Ling. right now that claim state of the art by running neural networks for a couple of weeks on very expensive hardware. If those papers are not allowed in Tier 1 conferences, then we should call what remains "state of the cheap art"; but accepting them undermines the whole point, since only a select few can afford to reproduce them.
I'm not saying it can't be done. But I do think it's not trivial.
Passing peer review is used as a substitute for reproducibility, just like testing a null hypothesis is a substitute for testing the research hypothesis, and citation count is used as a substitute for making precise and accurate predictions.
The procedure (the scientific method) that has led to all the great stuff we have around us today, I'd even call it a pillar of civilization, has been undergoing a piecemeal replacement since approximately WWII.
Scientific method (basically me just paraphrasing Imre Lakatos):
1) Explore and describe aspects of the world in detail; figure out what situations naturally arise, or can be devised, that produce consistent, stable phenomena.
2) Abduce (guess) an explanation for these consistent, stable phenomena.
3) Explore the logical consequences of assuming your guess is correct. Figure out a few otherwise surprising (i.e., inconsistent with other explanations people may have) predictions that can be deduced from it.
4) Collect data and compare it to the predictions generated in #3
5) Discard the guess or modify it to make it more consistent with the data produced in #4
6a) If the guess is modified, Go to 3.
6b) If the guess is discarded, Go to 2.
A lot of cancer research seems to be failing at step 1, ie they can't even get into the "testing otherwise surprising predictions" loop since there is no consistent, stable phenomenon to trust.
The Wikipedia one is OK, but really most of that is stuffed into my #1, and it treats the "forming and testing a hypothesis" part too superficially. What did I write that makes you think the two are in conflict?
I wrote:
"figure out what situations naturally arise or that can be devised that produce consistent, stable phenomenon."
This is the same as figuring out the "reproducible manner".
I don't think you understood what I took issue with in your post.
You said:
> Passing peer review is something used as a substitute for reproducibility
And what I'm saying is "No, it's not". Reproducibility of the results is the most important aspect of the scientific method, otherwise, as I said above: "what's the point really? We would resort to trust or belief in one's sayings."
Oh. I never said that was a good thing. Institutionalized peer review as it's done today is a relatively new thing, introduced post-WWII. I.e., that statement was descriptive, not normative.
I think you mean "proxy variable", not "substitute". That is, it's a variable which is correlated with the one you really want (quality, truth) but can't actually measure due to difficulty.
Sure, but I think non-preprint cultures would instantly become only-preprint cultures because I believe (and this might be folly) that most researchers rely on work they are confident about based on experience with the authors and details from the paper. I don’t know if this is that bad of a change since I admittedly do this already.
I have been working on reproducing computational biology papers from the cancer field lately. I am very frustrated. When the inputs and outputs are machine-readable data, there's no excuse for not making your work reproducible, in my opinion. Often the problem is plain laziness and disorganization.
One major problem is that there's not much real incentive to make your work reproducible. Money-granting organizations favor researchers breaking new and exciting ground, not those rehashing an already published method. Publishers don't require reproducible methods, and reviewers don't have the time, desire, or expertise to do an in-depth methods review.
Wet lab experiments are 1-2 orders of magnitude more expensive and difficult to reproduce, that's true, but we're not even getting the basics right!
Yeah there is no excuse for comp bio work really, but I don't think you should cast aspersions of laziness. By far the least lazy people I know are scientists. Publishing a paper is an incredible amount of work often under less than ideal conditions. The authors are usually exhausted. That isn't an excuse, but it is a factor nonetheless.
There are some incentives to make work reproducible. If you publish a method implemented in software, then making that software easy to use and well documented, providing example data, and responding to questions and issues greatly enhances the chance that other people will use it and cite it. Things have been getting much better in the last 5 years.
Preprints also help - I saw a preprint the other day where people complained on Twitter that there was no methods section and no code. The authors responded. They realise that potential reviewers may be the ones making these comments, or at least seeing them. This speaks to your point about treating a paper not as a single point in time but as an ongoing process.
I think things are improving. I am now seeing papers publish Jupyter notebooks in python or R scripts to reproduce all the diagrams and analyses, along with curated data that just plugs in. In general this actually works (although sometimes crucial bits of data are missing, and unlike an error in the paper, these are not necessarily fixed).
I'd probably encounter the bad kind of comp bio if I actually worked in that field, but a tool made by computational biologists for reproducible research has also been very useful in my work: the build tool Snakemake [1].
Snakemake is a parallel build tool designed with data, rather than software, as its main use case.
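If you haven't used it, the core idea (leaving aside Snakemake's actual Snakefile syntax) is simply: rerun a step when its inputs are newer than its outputs. A minimal Python sketch of that idea, with made-up file names:

    # Minimal sketch of the "data build" idea: rerun a step only when its
    # inputs are newer than its output. Snakemake layers rules, wildcards,
    # parallelism and cluster execution on top of this. File names are invented.
    import os
    import subprocess

    def stale(output, inputs):
        """True if the output is missing or older than any of its inputs."""
        if not os.path.exists(output):
            return True
        out_mtime = os.path.getmtime(output)
        return any(os.path.getmtime(i) > out_mtime for i in inputs)

    inputs = ["data/raw_counts.csv"]    # hypothetical input file
    output = "results/figure1.png"      # hypothetical output file

    if stale(output, inputs):
        subprocess.run(["python", "scripts/make_figure1.py"], check=True)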
> I don't think you should cast aspersions of laziness
You're right. I use laziness out of frustration, but in reality I know it's hard for me to make my own work fully documented and reproducible.
A large part of the situation is cultural. Software engineers are born into a world of version control, unit tests, documentation, managing complexity, reproducibility, debuggability, etc. (and we still struggle with it). That sort of culture and the associated tools are missing from data science.
That would be awesome and I think everyone would like to do this but funding for that barely exists. If you’re in the United States, complain to NIH, NSF, and especially your rep.
And another thing! Why should a project be done when it gets published?! What kind of software project would make one release and call it good forever? Nonsense.
In an ideal world you would be right, but that's not how the funding structures for academic research are set up.
Think of a research lab as a company that gets paid per prototype and then has to market the concept for the next prototype in an infinite loop. If you can't package up what you're doing into a sequence of small prototypes then you're not getting paid.
Research labs operate on a very slow tech upgrade cycle anyway; since code is handed down from assistant to post-doc to grad student, complete rewrites would take up a significant fraction of a person's time at any given lab, and so codebases are often as long-lived as the labs in which they live. We're talking decades-old FORTRAN here. Running twenty-year-old software is a barrier for some labs, but not all.
My experience is the opposite, that a lot of labs are running R code that won't work 6 months later, and no one actually recorded the package version numbers that were used.
TBF, the experience I have is with physics and mechanical/civil engineering labs, where there historically hasn't been much of a reliance on R. And in any case said experience is several years out of date.
Speaking of, I haven't played with R - what are its standard methods for handling dependencies? I'm particularly enamored of the pip and npm way of doing it, where you create a version-controlled artifact (requirements.txt and package.json, respectively) that defines your dependencies. Does R not have a similar system, or do people just not use it?
R isn't fantastic for handling dependencies. If your code is bundled up as a package then you can specify version numbers for your dependencies, but I don't know of any equivalent to `pip freeze` to actually list these. Installing anything other than the latest version of a package is a bit of a pain, and setting up environments for separate projects is pretty much unheard of.
I'm a bit bitter about the whole "writing reproducible code in R", as I'm currently wasting a lot of time trying to get R code I wrote at the start of my PhD to run again now that I'm writing up.
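For anyone who hasn't seen the pip workflow mentioned upthread: here's a minimal sketch of what that version snapshot amounts to on the Python side (in practice you'd just run `pip freeze > requirements.txt`; this is only an illustration):

    # Sketch: record the exact version of every installed package so the
    # environment can be recreated later with `pip install -r requirements.txt`.
    from importlib.metadata import distributions

    def freeze(path="requirements.txt"):
        lines = sorted(f"{d.metadata['Name']}=={d.version}" for d in distributions())
        with open(path, "w") as fh:
            fh.write("\n".join(lines) + "\n")

    if __name__ == "__main__":
        freeze()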
One where you don't get sustained funding to maintain it. In comp bio, even major resources, known to everyone in the domain, only have funding from one two-year grant to the next.
Yup, it is very frustrating - and discouraging. There are some tools designed for replication - Snakemake, for one, in python, which seems to work pretty well. But I think it adds unnecessary dependencies and complexity. I support using basic UNIX tools for reproducibility when possible: git to make all code publicly accessible, and make to reproduce all experiments and figures. I've implemented this in 2 comp bio projects (one which relies on publicly accessible data, and one which doesn't). In both cases it massively improved my workflow. It has the added benefit of being language-independent: when I have to blend R and python, it works seamlessly.
I first started thinking about this problem after I heard this talk by Arfon Smith [1]. It contextualizes reproducibility tangentially in discussions about viewing modern research as a heterogeneous (nodes can represent papers, experiments, software, authors, etc.) dependency network.
Strongly agree with this. We're aiming to make Kaggle Kernels/Datasets as reproducible as possible while being easy to use, with versioned data, versioned code, and versioned compute environments (the latter through Docker containers). https://www.kaggle.com/kernels
Requiring reproducibility would radically change the academic landscape for scientists. I'm sure in the end it would be for the greater good but I think most career researchers in the US would hate this kind of change.
It would require either all researchers to be 25% software engineers, or adding 30-50% headcount in SEs. It would be a radical change in methods and funding. There is massive institutional pushback against moving in this direction, as people already feel too much is done in/by computers. To make whether research is publishable depend on software engineering technicalities...
OTOH, it would be awesome for me, as I've been doing consulting on exactly this for 15+ years :)
shrug What does 'the correct answer' even mean for any real world problem? There is so much we don't know. I'm not saying we should just make up papers and call it good, but there is a gradient between 'making stuff up' and 'doing everything according to a hypothetical ideal methodology all the time'. It's a matter of opinion how much not having a 100% repeatable experiment for each paper moves us towards the left of that gradient. It would be an improvement, yes, but I would put it more in the 'moves us from 75 to 80' ballpark than in the 'moves us from 20 to 100' that most people in this thread seem to believe.
"How do you do science if you can't repeat experiments?"
Well, the way we've been doing it for 300 years? It's not like the flawed processes of the past didn't yield anything. To put it in operations research terms: science doesn't work like hill climbing, it's more like a stochastic optimization process without a (well-defined) halting condition. Again, I'm not saying we can't and shouldn't strive to improve, but the first-year-grad-student-level huffing and puffing in this thread is just completely detached from reality. 'Science' as it was explained to you in high school is only a high-level abstraction of the concept; it's not how things actually are or get done in the real world.
Personally, I find the worriers overstate the problem of reproducibility. Lots of results are reproduced all the time, just in the course of testing other hypotheses.
If a result depends on exact replication of methods, it isn't that robust an effect, and might just be a trivial one.
I have worked on experimental high-energy physics with computers and had the same idea - physicists should just hire software engineers. Large amounts of physicist time are spent cobbling together, and suffering from, shoddy software. At least we had public repositories with version numbers, and datasets were shared and uniquely named.
Can you point to some computational biology papers from the cancer field that you're reproducing? I'd love to give it a go myself. If you don't mind, can you also contact me (email is in my profile)? Maybe we can collaborate.
The term reproducible research refers to the idea that the ultimate product of academic research is the paper along with the laboratory notebooks [12] and full computational environment used to produce the results in the paper such as the code, data, etc. that can be used to reproduce the results and create new work based on the research.[13][14][15][16][17] Typical examples of reproducible research comprise compendia of data, code and text files, often organised around an R Markdown source document[18] or a Jupyter notebook.[19]
> abstract numerical disciplines
Could you give an example of a non-numerical science?
Yes, of course I know what the Wikipedia definition is. Have you ever tried to 'replicate' a paper? Let's say someone did a survey of something. How are you going to 'replicate' that? Or they used a particular piece of equipment, or did some measurements in the field? Sure, it's easy to restrict yourself to the data files that go into some software and say 'oh, I used the same data set and software version, my research is replicable.' My point is that there are a lot of aspects to most research that aren't easy (or even possible) to replicate, even if one put in all the effort required to document every minute circumstance. There is more to 'science' than CS algos and lab bench biomed.
"Could you give an example of a non-numerical science?"
This is the point where you're probably going to argue about what is or is not 'science', so let me use 'scholarship' instead, from the fields I work in: law, environmental science and geography. There is a lot of research that is not purely numerical in nature. Note that this doesn't mean it doesn't use numbers; you misread what I wrote - I didn't say 'numerical disciplines', I said 'abstract numerical disciplines', by which I meant disciplines that require at least some judgement of qualitative properties or inexact measurements of things along the way, somewhere.
No, but I spent a whole year at university trying to replicate experiments in the physics lab. I was studying CS, but they had too many physics teachers so we had many of the same courses :)
The equipment was abused by decades of students, you had 45 minutes to do the experiment, and the results usually weren't even the correct order of magnitude :) It was stuff like measuring speed of sound, atmospheric pressure, gravity constant, etc.
Still - at least I knew the problem was with the equipment or with me, not with some obscure details that weren't mentioned and that I couldn't replicate.
> My point is that there are a lot of aspects to most research that aren't easy (or even at all) to replicate
And my point is that it makes publishing everything that can be adjusted that much more important.
Another question - when there's economic and professional incentive to fudge the numbers, and no way to check whether the numbers were fudged - how do you trust the results?
I think you took his question too literally: I think he meant "what does it even mean pragmatically, and why do you expect funding to go to less novel but more easily reproducible results instead of more novel and harder-to-reproduce papers?"
Well, reproduced research is inherently more valuable than novel but unsupported research. Otherwise why fund research at all? Just hire alchemists or poets, at least you’ll be entertained.
The current state of research seems to reflect that there is a balance: almost none of the most cited papers go really far out of their way to be trivially reproducible, even in computing, where doing so is much easier than in other fields.
If a paper is very novel and impactful but difficult to reproduce, it doesn't seem to matter much for citation counts as long as people just believe it to be true.
And yet there's no revolution among funding sources to send money away from those and instead towards researchers who spend more time on it.
Funding should go to more reproducible studies because those are the ones that are useful. What use is a study saying that X will cure cancer Y if no one can reproduce and verify it?
I think it's the same problem as trying to document how to install or use new software. Unless you redo the steps yourself, or somebody does that for you, you won't know which assumptions you forgot to document. Since no one retries the study right after you finish it, there are bound to be gaps in the documentation.
By the way, when inputs and outputs are clearly available, it's amazing how easy it is to eliminate papers from consideration.
I found a paper that deleted (one at a time) each gene in yeast, to see which ones were "absolutely required". They published the list. For each "novel discovery of a required gene", I was able to show the gene overlapped (yes, genes overlap) with an already known required gene, and so, much of the paper's "novel discovery" section came into doubt.
It's nice because genes have precise coordinates and range intersection is cheap to compute.
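As an illustration of how cheap that check is (a sketch with invented gene names and coordinates, not the actual yeast annotations):

    # Sketch: flag a "novel essential gene" claim when its coordinates overlap
    # an already-known essential gene on the same chromosome. All entries invented.
    def overlaps(a_start, a_end, b_start, b_end):
        # Two half-open ranges [start, end) intersect iff each starts before the other ends.
        return a_start < b_end and b_start < a_end

    known_essential = {"KNOWN_GENE_1": ("chrI", 147000, 151000)}  # hypothetical
    novel_claims    = {"ORF_X":        ("chrI", 150500, 152300)}  # hypothetical

    for name, (chrom, start, end) in novel_claims.items():
        for known, (kchrom, kstart, kend) in known_essential.items():
            if chrom == kchrom and overlaps(start, end, kstart, kend):
                print(f"{name} overlaps known essential gene {known}")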
Many of the popular journals in cancer biology have strict data retention policies. If that was the case in your example, you should probably contact the editor but also appreciate that mistakes do happen and most people are not cheating the system.
It's quite difficult even for trained programmers to make their pipelines reproducible. Just look at how much effort goes into build pipelines and the like. There isn't any good tooling for doing it, so people build their own ad-hoc systems, which they never document - that is, if they use a system at all.
Big governments need to fund it and solve this problem once and for all. At the moment most grant procedures require something more concrete than just "solve the reproducibility problem", though.
> So papers have become a good business, not the way to disseminate outstanding research results.
That's awfully cynical and over-broad, but I agree to a point. Greedy and unscrupulous publishers are part of the problem, but so are lax or unprincipled scientists eager for prestige and a career-making publication in a top tier journal. It's an unfortunate chicken-and-egg cycle now with no easy way to cut it. Perhaps more emphasis on replication post-publication? Perhaps a reputation system for unethical publishers or scientists?
"Greedy and unscrupulous publishers are part of the problem, but so are lax or unprincipled scientists eager for prestige and a career-making publication in a top tier journal."
That's just incredibly unfair. There are some fields and methodologies where p-hacking and cherry-picking have been a problem, but the primary reason that papers aren't reproducible is just noise and basic statistics.
As a scientist, you control for what you can think of, but there are often way too many variables to control completely, and it's probable that you miss some. Those variables come to light when someone else tries to work with your method and can't reproduce it locally. However, real scientists don't stop and accuse the original authors of being "unprincipled" -- nine-point-nine times out of ten, they work with the original authors to discover the discrepancies.
It isn't surprising at all to actual, working scientists that most papers are impossible to reproduce from a clean room, using only the original paper. It's the expected state of affairs when you're working with imperfect, noisy techniques and trying to tease out subtle phenomena.
> It's the expected state of affairs when you're working with imperfect, noisy techniques and trying to tease out subtle phenomena.
Sounds like a ridiculously low standard. If your paper is in principle unreplicable, then I only have your word for evidence of what you're claiming. This is not science. Even journalists are held to a higher standard.
> There are some fields and methodologies where p-hacking and cherry-picking have been a problem, but the primary reason that papers aren't reproducible is just noise and basic statistics.
It's possible to imagine a version of academia where results that can be attributed to noise don't get published.
Is it? That doesn't seem at all obvious to me. In fact it seems decidedly impossible.
Almost any result could in principle be attributable to noise; where are you planning to source all of the funding to run large enough studies to minimise that? And no matter how large your experiments or how many you run, you're still going to end up with some published results attributable to noise since, as GP says, that's the nature of statistics. By its nature, you cannot tell whether a result is noise. You only have odds.
I'm not saying there aren't problems with reproducibility in many fields, but to suggest that you can eliminate it entirely is naive.
> Almost any result could in principle be attributable to noise; where are you planning to source all of the funding to run large enough studies to minimise that? By its nature, you cannot tell whether a result is noise. You only have odds.
Well, with a single paper the odds indeed are that it's noise. That's why we need reproduction. Now of course a paper needs to be published for it to be replicated later. But the paper (and/or supplemental material) should contain everything the research team can think of that is relevant to reproducing it - otherwise it's setting itself up to be unverifiable in practice. Papers that are unverifiable in practice should not be publishable at all, because a) they won't be reproduced, and thus will be forever indistinguishable from noise, and b) there's no way to determine whether it's real research or cleverly crafted bullshit.
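As a toy illustration of that point, assuming only that p-values are uniform when the null is true (a sketch, not a model of any particular field):

    # Sketch: when an effect is pure noise, p-values are uniform on [0, 1].
    # Publishing at p < 0.05 lets ~5% of null results through; demanding one
    # independent replication at p < 0.05 cuts that to ~0.25%.
    import random

    random.seed(0)
    N = 100_000                         # experiments with no real effect
    published = replicated = 0

    for _ in range(N):
        p_original = random.random()          # uniform p-value under the null
        if p_original < 0.05:
            published += 1
            p_replication = random.random()   # independent replication attempt
            if p_replication < 0.05:
                replicated += 1

    print(f"false positives published:      {published / N:.4f}")
    print(f"published AND then replicated:  {replicated / N:.4f}")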
I don't disagree with any of that, although I'd stick a big citation needed on the implicit suggestion that there's a large group of scientists who aren't making a good-faith effort to ensure that their successors will have the information they need to reproduce (that is, after all, what a paper is).
My issue is the flippant and silly claim that "[i]t's possible to imagine a version of academia where results that can be attributed to noise don't get published".
I think this is actually something that can be experimentally examined.
Take a sampling of a large number of papers, give them some sort of rating based on whether they provide enough information to reproduce, how clear their experimental and analytical methodology was, whether their primary data and scripts are available, etc, and then look at that rating versus their citations.
Hopefully, better papers get more attention and more citations.
(And yeah, "peer review" as it is done before a paper is published is not supposed to establish a paper as correct, it is supposed to validate it as interesting. Poor peer review ultimately makes a journal uninteresting, which means it might as well not exist.)
That sounds like a very interesting idea. At the least, it would be interesting to see the major classes of reproducibility problems. And there may well be a lot of low-hanging fruit, as the comments on this page suggest about data corpora in computational fields.
> However, real scientists don't stop and accuse the original authors of being "unprincipled" -- nine-point-nine times out of ten, they work with the original authors to discover the discrepancies.
I'm not a real scientist or even a pretend one, and I'd like to believe your 9.9/10 figure, but don't delude yourself that there aren't people out there publishing papers for the sake of nothing more than retaining their position at a university. Or bumping their citation count, or pushing an agenda, or whatever.
We're in this 'reproducibility crisis' precisely because this game of science being played doesn't reward reproducibility and scientists are just as much participants as publishers are.
The statistics and probabilistic methods of science are there specifically for controlling and quantifying the effect of noise and uncertainty and making sure experiments are reproducible a high percentage of the time.
If you don't get a quantifiable amount of reproducibility, there is no point to using statistics at all and what you are doing is not science.
I wrote my concerns about replicating these studies a couple of years ago [1]. In short, most papers don't have enough details to allow for a third party to replicate them.
Problem areas include detailed protocols and the reagents used. If you don't know what someone did exactly and what they used, then replicating it is going to be very difficult.
That is exactly the problem. All this stuff is supposed to be in the paper; the paper is not a press release.
And everyone understands that it's hard to know exactly what info someone else will be missing, so it won't be perfect at first. That is why there is supposed to be a back and forth the first time a new method is used.
Figuring out what needs to go into the methods is going to be an iterative process. As it is right now, though, it's a disaster. Like they say in this article, no one even knows the most basic info, like cell density. The entire system is not set up to deal with replications at all, because they haven't been doing them.
I don't agree. I think this stuff should be in supplemental notes (we call it the 'artefact' in my field, not sure if that's universal). A paper should be reasonably readable, not a huge list of processes in detail.
I'm used to biomed, where the "supplements" do exist and contain more and more of the actual information, while the "paper" is becoming more of an "executive summary". Since I find all that info in the same place, I consider it part of "the paper".
Sayings like "welcome to the real world" and "there's a difference between theory and practice" are thought-killers, just excuses for being lazy. Science is hard, really hard.
People who think doing a good job is too hard should get out of science instead of polluting the literature with their statistically significant results or whatever. Unfortunately it's been the opposite happening, and the people who can't handle it have been training the new people in how to get away with half-assing it. This has been going on for a long time now, at least since the 1940s in some areas...
There's a huge difference between "bad science", and "it doesn't live up to the average HN reader's belief of how science works". Maybe the parent's comment is harsh, but the sentiment is, IMO, correct.
Science is a system, not a series of published recipes. Even the best, most-cited papers in scientific history, when taken in isolation, are often riddled with problems. No paper is perfect, but the process of science, over time, validates the papers that are worth validating. Experiments are rarely reproduced exactly, but rather, reproduced under a thousand different conditions, thus proving that the result is robust over time.
I've often felt that these efforts to mass-reproduce scientific literature are misguided. Anyone who has worked in a lab knows that most papers are difficult to reproduce for any number of reasons, and that even if you have a perfect description of materials and methods, plenty of correct and valid results are difficult to reproduce, just because lab work is hard and messy. It's not fair to fling accusations of impropriety simply because an experiment doesn't work in your hands.
> Science is a system, not a series of published recipes
Peer review and reproduction are essential components of a scientific system. If a paper does not provide, in its main text or supplements, the necessary detail to enable reproduction, it is not a scientific paper but a press release.
> plenty of correct and valid results are difficult to reproduce
There is lots of good science in figuring out why certain results reproduce in certain circumstances and not in others. With insufficient methodological detail, this exploration is rendered impossible.
"If a paper does not provide—it its main text or supplements—the necessary detail to enable reproduction, it is not a scientific paper but a press release."
No. That's ridiculous. When a paper doesn't have all of the necessary details to reproduce, then scientists contact the authors, and try to get those details. In nearly all cases, the details are resolved without incident. If the omissions are sufficiently egregious, errata are issued. If they're really, really bad, papers are withdrawn.
If a paper is so old that the original authors are long gone, you look for a citation history, and see who else reproduced the work over the years. Papers that were never reproduced are unlikely to be fruitful.
Mistakes happen and details get omitted, but thankfully, we have lots of ways of working around those mistakes. Nobody -- except for armchair scientists -- assumes that scientific knowledge is communicated exclusively by the methods sections of publications.
"This article rebuts your assertion. In most of the cases, the details were not resolved despite great effort and expense."
No, you're just misinterpreting it. They stopped pursuing a number of papers after they discovered they couldn't possibly reproduce them all with their tiny budget.
Nobody said it was cheap to reproduce science, just that it's possible.
"I’m not a scientist, but I advise (without compensation) a grant-writing board. But okay."
It sounds like you're unqualified to be doing that job, then.
Indeed; journals are repos, authors are committers, reviews are pull requests. But the unit tests are insufficient, and literature references don't quite make it as a form of continuous integration.
Many Big Pharma companies literally have their PR department run their clinical trial studies. If someone is surprised that a paper sounds like a press release, they have a lot to learn about how "science" works today.
>"Many Big Pharma companies literally have their PR department run their clinical trial studies. If someone is surprised that a paper sounds like a press release, they have a lot to learn about how "science" works today."
I call it biomedical research rather than honor it with the name "science".
Repeatable experiments. Done by machines, with properly accounted-for ingredients and so on. Physics does it. It should be perfectly doable with cell cultures too. Mice are of course a lot harder. But imagine if we didn't wait decades to replicate, but could simply get in touch with the originators and make establishing the parameters a collaborative effort.
This isn't helpful. People questioning this is a good thing. We can't just throw our hands in the air and go "welp, shit's fucked, no point in trying to fix it".
This is nothing recent, though. Back in the early 2000s I was trying to reproduce published reactions regarding catalysts leading to certain yields, and virtually all the published stuff was absolute garbage and could not be reproduced no matter how many times you tried. That makes you wonder what journals actually do: accept articles on good faith only? That is unforgivable in a scientific society.
Back in the day I had a physics professor who sometimes complained about his inability to reproduce certain published physics experiments - the kind you could do in a small lab and on a limited budget. (We're not talking "Big Physics" here.) Well, the guy wasn't particularly well-liked for a variety of reasons, so we all tended to just chalk this up to probable incompetence on his part. Knowing what I know now, though, I might be more inclined to give him the benefit of the doubt here - that in fact these experiments were just not reproducible, at least not as described in the literature.
The peer review process is halfway between "accept on good faith" and "reproduce the results". And it's not a flawless compromise. As a peer reviewer, I check that an incoming paper is sane, says something new, and doesn't have obvious holes. But I don't reproduce the original experiment unless it's very quick. Peer review is entirely unpaid and uncredited, so I can't justify taking time from my own research.
Ideally, every research team would have a remote collaborator that checks their results and gets credit for doing so (even if the answer is no!). But right now there is nothing to incentivize that kind of rigor.
> If you don't know what someone did exactly and what they used, then replicating it is going to be very difficult.
Why do journals accept these papers? If you can just make shit up, it kind of makes the whole value of a paid journal moot. What value do they add if not reviewing the damn paper?
I don't write research papers, so I don't really know what I'm talking about here, but I assume it's not as easy to just "make shit up" as you are implying here. There's a peer review process, there are lots of people working in the lab that make it difficult to keep a large conspiracy secret, there are serious penalties if you are caught fabricating results, etc.
Peer review is pretty ineffective at catching errors. In one study of peer review, reviewers caught ~ 30% of deliberately inserted errors that were specifically chosen to be detectable in review [0]. Fraud is noticed by the broader research community (assuming it's noticed at all) when it continues for years over multiple manuscripts.
You make a good point about colleagues in the same lab noticing unethical behavior. A shared research culture that values integrity and transparency is a good defense against malpractice and fraud. Building up and maintaining that culture—and the methodological skills to back it up—is a continual challenge.
What do the reviewers do? They don't try to reproduce it. So yeah, make shit up; as long as it is plausible, no one will notice.
For example, Yoshihiro Sato was recently (in the 2016-2017 time frame) caught fabricating results; it turns out he had been doing it for 30 years. He had more than 20 papers retracted. Also note he wasn't caught by a fellow scientist in the field; he was caught by a statistician bulk-processing papers looking for anomalies. So yeah, big penalties if you are caught, but the odds of getting caught are so low you have to be stunningly stupid to get caught.
Sadly, once the statistical methods being used here and the types of anomalies being found become well-known, it's probably not too hard for fakers to adjust their "results" so that such anomalies are far more difficult (if not impossible) to detect. For example, if an analysis would show that your experimental data is maybe a bit too clean, in a way that wouldn't normally turn up in the real world and would therefore cast doubt on its validity, then you just need to make sure that you dirty it up a little before publication.
There is a difference between wholesale making things up and conveniently leaving out variables that alter the analysis of your data. I do agree overall that this is not trivial and there are tremendous disincentives; it is evidently not a good recipe for reproducible experiments in the general case.
What is repeatability? 10% of the time? 50% of the time? 99% of the time? 99.9999% of the time?
Even medications made in high volume processes have rejected units (and sometimes, batches). Process drifted, machine failed, raw ingredients had some issue.
A scientific paper at the very early stage is meant to show that something is possible if the conditions are just right. It may be actually outside the capability of the lab to document (or measure, or even know) every variable that makes up these conditions. One of the reasons there is such a chasm between lab and practice is that as people dive in for deeper review and productization, they find the process is too finicky, the applicability too narrow, etc. But the scientific paper is the first step in this process of discovery.
That said, it's critical that authors document whatever they do know or have control over. Electronic publishing, tools like Github, etc, all make the barrier for disclosure much lower.
I think the key here is not if the paper was reproduced, but whether it includes enough information to make it possible.
> Even medications made in high volume processes have rejected units (and sometimes, batches). Process drifted, machine failed, raw ingredients had some issue.
Sure, but here the process is known in detail, so the manufacturer can just rerun it. A scientific paper (with supplemental material) should be exactly such a detailed process description, so that researchers could rerun it if they want.
Now, if the paper does not contain enough information to enable its reproduction, then there's really no way to tell whether it described real results, tampered results, or whether it was just made from whole cloth.
There's no shame in being wrong. But there is shame in making sure no one can check if you're wrong.
It isn't, but it kind of is. It is a 95% chance that you should reject the null hypothesis. For example, if we flip a coin 5 times and get heads every time, that gives you a p-value of about 0.03 for rejecting the null hypothesis that the coin is fair. If I run the five-coin-flip experiment 100 times and never once reproduce a significant result, is it because gee, science is hard, or is it because you hit one of those 3% of cases? The same goes for real science: if you can't reproduce it more often than not, then you might be close, but you haven't figured it out yet.
Ok, so we can agree that if you got a positive result with p<0.05 and you repeat the experiment 100 times, you expect to reproduce a positive result (i.e. get a new p-value below 0.05):
a) if the null hypothesis is true, about 5 times
b) if the null hypothesis is not true, anywhere between 0 and 100 times (depending on what the true alternative is)
This is quite different from your previous assertion that the expected number of successful replications would be 95 out of 100.
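To make the coin example concrete (a quick sketch using the same numbers as above):

    # The coin example: 5 heads in a row from a fair coin has probability
    # 0.5**5 ~= 0.031, hence the quoted p-value of about 0.03. How often the
    # "5 heads" result replicates depends on the coin's true bias, not on
    # the original p-value.
    for p_heads in (0.5, 0.7, 0.9):     # 0.5 means the null hypothesis is true
        p_replicate = p_heads ** 5      # chance a rerun also gives 5/5 heads
        print(f"true P(heads) = {p_heads}: expected replications "
              f"out of 100 runs ~ {100 * p_replicate:.0f}")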
I once read about a scientist who, when others were having trouble replicating his results, claimed with a straight face that his work was so bleeding-edge that it could only be reproduced by his own hands, in his own lab, using his own materials and equipment. Now, while there might have been an element of truth to this, it sure sounded like total B.S. to me! I'm not sure that anyone ever really challenged him on it, either.
Because the journals in question are not open access / open data where data submission is required. To answer your next question, yes, that entirely goes against the whole point of science, which is that it's supposed to be reproducible.
It's not terrible for in vitro studies, but that would be a stretch for in vivo studies at a CRO, especially for oncology, where lots of studies require tumor grafts. It's not my area, but I think those mice can run $1000 and up per animal. The cheap, more common strains are ~$25 each. I haven't looked at the papers they're trying to replicate, but in my lab, it's not unusual to have 60 mice in a study, so you can see how just getting animals into your facility can be expensive.
It is; that's my point. As mattkrause notes below, that number is for mice with human tumor xenografts. You have to harvest a human patient's tumor, blend it up, and inject it into immunocompromised mice. If you buy them from the vendor who has to buy their tumors, pay their labor, and make a profit, they're not cheap. If you have ready access to clinical tumors and cheap labor, the price drops precipitously to ~$70.
By the way, you can poke around some of the animal vendors' sites. They mostly have their pricing on their websites, along with a bunch of other information. Four really big ones are Taconic, Jackson Labs, Envigo, and Charles River.
While labs have huge economies of scale (I worked somewhere with a quarter million mice), the mice aren't just random field mice. Some of the genetically engineered ones are hundreds of dollars a pair (and possibly more if custom, raised under unusual conditions, etc). Xenografts need skilled labor and very clean conditions for the immunocompromised mice. $1000 seems a little high, but not much, especially if the work isn't being done by an underpaid grad student.
For the price of a 10% subsidy increase, you could sponsor reproducing ~10% of discoveries (each selected with probability proportional to the subsidy it took), via some simple, public, provably random commitment protocol.
While you are doing the experiment, everyone knows there's a 10% probability of eventually being selected, so there's healthy pressure to make sure everything is properly documented, and everyone looks out for fraud by collaborators.
I don't see the problem, unless it's the 10% price hike... and if 10% is too much, just do 90% of the usual number of projects. I'd prefer 10% fewer projects if it ensures much higher reproducibility rates...
I don’t understand your math. Typically, for one paper, if you’re trying to reproduce the experiment (rather than just the analysis) the cost is closer to 110% because, for many researchers, their papers build off of their previous papers so they already have expertise with a particular assay, protocol, technology, etc.
The math stays the same even if the numbers change: with your figure of 110% of the original project price for reproduction by others, we get a price hike of 110% * 1/10 = 11%.
Big deal: 11% more expensive science, but 1 out of 10 results gets reproduced, stochastically, so you don't know whether yours will be reproduced until after publication...
Also: regarding your comment about reproducibility details being scattered over the previous work of the original authors - as I said, in a world that uses my system, you are incentivized to put all the details needed for reproduction in the paper itself, since you wouldn't want a reproduction attempt by others to fail simply because they didn't read your previous papers...
There have been several news items on Hacker News recently about academic publishing, reproducibility in general and pre-print servers and I would love it if eLife got some more attention, for better or worse, for the excellent work they are doing and helping to fund.
To the biology researchers in the comments: please submit your work! Even failed results! Even if it's software-heavy. Editors are listening closely to your feedback.
There is an unwritten rule in biology that if you publish a paper that refers to uses of certain reagents that are not commercially available, then you are obligated to provide those reagents to other investigators who read the paper and request them. There can also be an expected obligation that the other researchers will share any data they generate using the reagents with the original authors.
Outside of biology, I have seen many "academic" papers published on computer-related topics that refer to software programs developed by the papers' authors that are crucial to the research but not publicly available. Is there any similar unwritten rule to that in biology where another researcher reading these papers can request a copy of these programs from the authors?
Obviously, in many cases other researchers cannot replicate and verify findings without access to the same research tools used in the published papers.
I've frequently asked researchers for their code. Most are excited that someone is interested in their work, if a little nervous, or embarrassed by their coding, umm, style (neuroscience).
> organizers realized they needed more information and materials from the original authors;
Then the study already isn't replicable by definition, so why waste time asking the original authors? Just mark it down as 'not replicable' and move on.
I'm not so sure this is the right way to go about it. They give the example of cell densities, but there are also tons of things that just are never reported in in vivo papers, including the light cycle in animal rooms (mice and rats are nocturnal, so some labs make their rooms dark during the day), the preparation of treatment compounds, information about animal ages, handling procedures, etc. In addition, many papers have length limits, so you can only include so much.
It's important that it become the norm to include it, though. So rather than just moving on, they're trying to gather that information so that if they CAN reproduce it, they now know how to make sure the next experiment can be more reproducible.
Nobody ever reproduces papers exactly, because you can’t. There are too many variables, and even though you try to control as many as possible, you can still be blindsided by the random variate that you didn’t anticipate.
Scientific results that are robust to random variation are the important ones. The ones that can only be reproduced exactly as specified are most likely to be “meaningless”.
I'm going to guess that psychology research is far less expensive. To give you a sense of perspective here, $25k will get you something like 4 months of a full time employee at the "fresh out of college" experience level, including benefits and research consumables. That also assumes that you have all of the equipment in place.
If all they're doing is running a few western blots, because that's all they can do on this budget, well then OK. But I'll venture to guess that that's a far cry from reproducing the most important results in these papers.
As far as I know, the problem wasn't primarily one of underestimating cost or effort - it was one of funding. The non-profit that did this has downsized quite a bit since the RP:CB began and has been seeking other funders.
It's good to have reasonable goals and a budget, even if some of them aren't met. The funders absolutely know what they are doing, and handicapping a replication effort because it went a few thousand over budget would be outrageous. There are still unknown quantities at play here: a large-scale replication effort hasn't been attempted before, and measuring those unknowns - the extent of the effort required to replicate and the impact of failure - is itself among the goals. Failure results, of which there have been several already, can severely damage the reputation of prestigious journals and teams of scientists. They're not doing this because replication is popular, or cheap, or because they thought it would be a fun afternoon lark.
They don't try to reproduce all the results in a paper, only a few "important results", more like a spot check. And it is supposed to be much cheaper since the methods have already been figured out...
Most of it is a set of nearly fanatical beliefs about reality only vaguely related to facts agreed on through consensus by authority. Look at how strongly doctors fought against germ theory. They saw no reason to wash their hands before delivering a newborn after having spent their morning cutting open cadavers. Their response was anything but scientific, even in the face of easily tested claims and results.
If research can’t be reproduced it’s just a story with interesting data. I doubt most scientists using statistics would be able to provide the alpha / p value / confidence interval etc for their hypothesis.
When the bar is set so low should we be surprised by low quality?
Hand washing may still be a problem. I saw a blurb recently which stated that when doctors know that they're being monitored for using proper hand washing protocols, compliance was near the top end of the scale. But when they thought they were no longer being watched, compliance fell by something like two-thirds. So in other words, doctors (and no doubt others) probably still aren't anywhere close to doing it right.
That would be because “doing it right” is completely over the top. You are supposed to wash your hands before and after each patient interaction, which includes touching their notes, or coming into a patient environment without touching anything.
This is impractical, as it would slow everything down so much and would make carrying anything impossible, not to mention the negative impact on doctor-patient interactions. It also doesn't match how patients then move around, touching objects around the hospital and nullifying the whole process.
Doctors and nurses tend to apply a common-sense check of when a hand wash or alcohol gel is needed, which I think is a good thing. I think some go too far the other way, but there will always be variation.
> That would be because “doing it right” is completely over the top.
I understand the over-the-top and common-sense angles here, but the "unmonitored" drop was so dramatic as to imply that this was less a matter of reverting to a more sensible approach and more one of dropping down to outright carelessness. That may not actually be the case, though.
One big issue with making papers reproducible is that the incentives to do that simply don't exist in many cases. It is often not rewarded if you put additional effort there, and usually also not punished even if you don't do the minimum the journals require in their official rules.
The official rules are slowly changing, and the funding agencies tend to require that scientists make raw data available, and put effort into making their experiments reproducible. But the reality is changing much more slowly than the rules.
Actually the incentives encourage non-reproducibility. Any effort put into reproducibility increases the chance that the results won't be publishable. The worst outcome is not publishing crap, it is not publishing at all. This needs to change.
Please don't ignore this very important quote from the article:
"In fact, many of the initial 50 papers have been confirmed by other groups, as some of the RP:CB’s critics have pointed out."
This article is staying assiduously neutral, but one perfectly valid interpretation is that the original initiative was flawed in a way that was predicted by the initiative's critics. Science is routinely reproduced -- just not by labs working in isolation, using publications as clean-room instruction books. This is the sort of thing that programmers believe about science, not something that scientists believe themselves.
There are a great many serious, legitimate scientists who believe that this "reproducibility crisis" is verging on irrational, and it's important to consider their arguments. It's particularly scary to me that so many comments here are dovetailing with the sort of nonsense you encounter on anti-vax and global warming denial forums. We're literally gaslighting the process that has done more to advance society than any other in human history:
"The discovery that an experiment does not replicate is not a lack of success but an opportunity. Many of the current concerns about reproducibility overlook the dynamic, iterative nature of the process of discovery where discordant results are essential to producing more integrated accounts and (eventually) translation. A failure to reproduce is only the first step in scientific inquiry. In many ways, how science responds to these failures is what determines whether it succeeds."
And the attitude being described above seems to me to border on religious "just trust us" dogma. I mean, one of the big things that's supposed to distinguish real science from pseudoscience and such is the ability to fairly easily and routinely reproduce its results. And if that's not happening, and in fact it's being actively discouraged, then that's a real problem!
"In fact, many of the initial 50 papers have been confirmed by other groups, as some of the RP:CB’s critics have pointed out."
I don't know that I would be willing to take a statement like that at face value. I once read of a scientist who claimed that there was absolutely no need to try and independently reproduce his work, since it was being routinely reproduced in college labs everywhere as part of other experiments - or something like that. To me that meant that it should then be trivial to reproduce these results when attempted by trained professionals who were specifically trying to do that very thing. So why the reluctance to allow that to happen?
"And the attitude being described above seems to me to border on religious "just trust us" dogma. I mean, one of the big things that's supposed to distinguish real science from pseudoscience and such is the ability to fairly easily and routinely reproduce its results."
Nowhere is it written that you should be able to "easily and routinely" reproduce a scientific result. It's damned hard for skilled scientists to reproduce most scientific results. And yes, it's utterly impossible for amateurs. I'm sorry if that makes you uncomfortable, but it's the truth.
There are many things in life that you routinely accept on authority: When you fly in a plane, that it isn't going to crash into the earth. When you turn on the tap, that clean, safe water comes out. When you go to your doctor, that she is prescribing you medication that will help you, not hurt you. When you vaccinate your children, that you are protecting them from horrible diseases. I'm sorry that it bothers you, but "science" is just one more thing in the world that you're going to have to accept, because nobody -- scientist or otherwise -- can independently verify all of human knowledge.
If it makes you feel better, you can look around, realize that we're not starving from famine, or dying from minor cuts, or poisoning ourselves with lead, or dying of Polio or Smallpox, and you can try to convince yourself that in the long run, the method works. Because that's the argument you're missing. You don't have to take my word for it, or anyone else's. It's a system. The system works. You just don't fully understand why it works, and you're not willing to read when a group of scientists write a long article that tries to explain it to you. Because you'd rather believe that explanation is "dogma".
"I don't know that I would be willing to take a statement like that at face value."
Nobody is stopping you from digging into it. If you're really so bothered by it, I encourage you to follow up.
"So why the reluctance to allow that to happen?"
There is no reluctance, whatsoever. Nobody is trying to stop these people. They're just beginning to realize that their approach is a lot damned harder than they originally expected, and those of us who knew it would be are pointing it out.
I understand where you're coming from here, so I don't want to belabor the point, but what you're actually describing is more of a faith-based system than one based on science! And I for one am simply no longer willing to take most of what (for example) shows up in the peer-reviewed literature these days primarily on faith, and I haven't been for several decades now. The fact that you and so many others seem so readily willing to do so makes me seriously question your critical thinking skills.
And I don't blindly trust that "the system works" to anything like the extent that is so often claimed. An awful lot of what gets published these days just doesn't hold up to close scrutiny, and often what appears to be groundbreaking research simply disappears without further trace soon enough, without ever directly or even indirectly leading to a practical new product or a new drug or whatever. I understand that this kind of thing is going to happen, and it may happen quite a bit if you're doing really groundbreaking work, but in general it should be more the exception than the rule.
But yes, I do understand how the method works, and how the system that's supposedly based on that method works (when it actually does work), and also when it doesn't work and why, and what the motivations might be to try and pretend that it works even in situations when it actually doesn't. And rather than blindly defending it, maybe you should try a bit harder to understand how it actually "works", too.
>"If it makes you feel better, you can look around, realize that we're no starving from famine, or dying from minor cuts, or poisoning ourselves with lead, or dying of Polio or Smallpox, and you can try to convince yourself that in the long run, the method works."
Don't you have any more recent examples? Perhaps people used to do reproducible science but this got lost somewhere along the way, which would explain why all your examples are decades old.
>"In fact, many of the initial 50 papers have been confirmed by other groups, as some of the RP:CB’s critics have pointed out."
Something is off about this. How did these "other groups" get the protocols while this reproducibility project couldn't? Are they being stonewalled? Did the other groups not actually get the real protocol, but instead fiddle around until they got the same result? Unfortunately that claim doesn't have any citations.
Why, apart from expedience, do the designers of experiments share an institutional affiliation with the experimenters themselves? I wonder what would happen if the two groups were far apart, and communication between them were restricted to the transmission of an experimental protocol and the reception of raw observations.
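To make that concrete, here is a rough and entirely hypothetical sketch of the exchange in code. The names and structure are invented for illustration, and the hash commitment is an extra ingredient added for the sketch (so neither side can quietly amend the protocol after seeing data); the essential point is just that the two roles exchange nothing beyond a protocol and raw observations.

    # Hypothetical sketch of the "designers vs. experimenters" split described above.
    # All names are invented for illustration.
    import hashlib
    import json
    from dataclasses import dataclass, asdict

    @dataclass(frozen=True)
    class Protocol:
        """Everything the executing lab is allowed to receive."""
        title: str
        steps: tuple         # ordered, human-readable instructions
        analysis_plan: str   # analysis fixed before any data exist

    def commit(protocol: Protocol) -> str:
        """Hash the protocol so neither side can quietly amend it later."""
        blob = json.dumps(asdict(protocol), sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def run_experiment(protocol: Protocol) -> list:
        """Executing lab: follow the steps, return raw observations only.
        (Stubbed out here -- in reality this is the bench work.)"""
        return [0.97, 1.02, 1.05, 0.99]   # placeholder raw measurements

    def analyse(raw: list, protocol: Protocol, commitment: str) -> dict:
        """Designing lab: verify the commitment, then run the pre-committed analysis."""
        assert commit(protocol) == commitment, "protocol was altered after commitment"
        return {"n": len(raw), "mean": sum(raw) / len(raw)}

    if __name__ == "__main__":
        p = Protocol(
            title="hypothetical replication protocol",
            steps=("prepare cells", "apply treatment", "measure signal"),
            analysis_plan="report n and the mean of the raw signal",
        )
        c = commit(p)              # designers publish this hash first
        raw = run_experiment(p)    # experimenters see only p, return only raw data
        print(analyse(raw, p, c))  # designers analyse against the committed plan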
I'd honestly say that if a study isn't worth replicating, then don't fund it to begin with. It's like funding a lab enough to get some animals or cells in the door, but then not enough for gloves, pipettes, and computers.
That sounds great in the abstract, but every study is subject to that logic, including studies studying studies. Which means that every study should be matched by an infinite regression of studies, which is untenable to put it mildly and mathematically fucking ridiculous to put it conservatively.
How do you know that those two groups aren't collaborating toward the same result? Or biased in the same direction?
You don't. You either add a 3rd party (4th... 5th... 6th... Xth) or accept the conclusion. And if you add a third party, how do you know they are trustworthy? Infinite regression.
It's not about perfection, it's about taking the simple, common-sense, time-tested precautions that gave us the technological marvels we enjoy today.
But of course, the more independent, even adversarial, the two groups are, the better. Like in this story, where the "critics" of this project claim other labs (their friends?) have already replicated these studies, apparently using the super-secret protocols that couldn't be explained to this group.
I'd compare it to using SMS 2FA. Is it perfect? No. But it is far, far better than no 2FA at all. And there's nothing stopping you from putting your banking app on the same phone you use for 2FA.
An upvote for you. The so-called 'soft sciences' are even softer when it comes to the rigour of their publications, which is a shame because it tarnishes their whole field. The gaming of academic prestige by publishing, publishing, publishing is exacerbating this, but so are the new predatory publishers who will accept and publish anything. Lots of interesting churn in academic publishing at the moment.
I don't get the downvotes; there is some truth here. I even skimmed a doctoral thesis, supposedly in the field of linguistics, in which the author declared her work was also done as an activist. Too many papers by too many "academics" are simply made-up ideological essays with no grounding in reality or in science.
Nope, one came from me, and I'm a systems neuro/CS staff scientist.
Social science is incredibly important. Psychology departments have contributed so much to our understanding of perception and attention, which in turn has paved the way for safe and effective user interfaces, better compression algorithms, and prosthetics. Sociology can tell you how to optimize anything involving groups of people, whether it’s effectively provisioning government services or running an aircraft crew. Linguistics has obvious applications to speech and text processing, education, and even medicine. All this and more, plus you get to learn about what it means to be human! And it’s cheap to boot! Saving one airplane funds a hell of a lot of cockpit management studies.
Sure, there is some crappy social science, but there are also crappy programmers. The blatant dismissal of whole fields is the worst kind of pseudointellectual snobbery. By all means, call out work that's subpar, but if you can't see how social science is both interesting and useful, it's only because you're not trying.
Replicating cancer papers: this is one of those ideas which, although pointless and impractical, is seemingly impossible to criticize.
Now the leaders of this project are basically saying - we can't do difficult experiments, and can't replicate studies which have already been replicated by others.
What have we learned exactly, except that experimental biology is difficult?
These papers underlie a lot of modern research and treatments. If they aren't reproducible, the papers' results, and other papers that use their results come into question. This is a serious, documented problem [0].
> What have we learned exactly, except that experimental biology is difficult?
The problem wasn't that the project couldn't do difficult experiments, it was that not enough information was provided by the original papers. That's what we learned here.
I disagree with your characterisation of the problem and the proposed solution.
Cancer research is not an edifice constructed on foundational results. It is not like physics in this regard, where, for example, accurate measurement of the gravitational constant is vitally important, and over time repeat measurements are generally more precise but not wildly different, and do not call the constant's importance into question.
Take one of the papers they reproduced, "Transcriptional amplification in tumor cells with elevated c-Myc". This is an interesting result obtained under highly contrived experimental conditions. Nobody is going to base their career, a drug development program, the treatment of a patient, or even a PhD on this result alone. I say this as a translational cancer scientist and medical oncologist.

The contribution of this paper is to our knowledge of the biology of Myc, which has a multitude of actions that are context dependent. Myc is studied in a variety of different ways using many different methods. If this paper were being repeated today, the technologies and techniques would be quite different. The result of this paper is not plugged into the central dogma of cancer biology, setting us down an erroneous path for the next thousand years.

So to return to my original question: what have we learned in trying to replicate this study that we didn't already know? The money would have been better spent on orthogonal validation/extension of the result using modern techniques, another name for which is 'science'. The replication crisis narrative suggests that this routine extension/validation process is somehow less important than going back and repeating the original experiment, which I think is a complete misunderstanding. You also seem to be saying that we should ask researchers to document in excruciating detail all experimental conditions, such that a pastry chef or meteorologist could walk into a lab and successfully reproduce the experiment. That is an impossibly high bar to set for scientists who are already working under very difficult conditions, and it is not the solution.
>"what have we learned in trying to replicate this study, that we didn't already know?"
It sounds like you think it doesn't matter if that result was published vs "Transcriptional amplification in tumor cells with depressed c-Myc".
If I misread that title, replace it with whatever the opposite result would be in this case. Anyway, like I said elsewhere, if it isn't worth trying to replicate, then the original study should never have been funded. How many of these studies exist just as a "jobs program"?
It sounds like you think it is most of them, in which case great. We can then easily cut out the 90%-plus of funding currently going toward jobs-program stuff and devote it to the <10% that is worthwhile...
The effects of Myc have been tested in many different ways since that study. Just type Myc and transcription into pubmed and you can see for yourself.
I'm not sure what you are trying to achieve by making outlandish comments about cutting funding or jobs programs (the idea that cancer research is a jobs program is utterly hilarious!!), but it doesn't really seem like you read my comment.
If no one cares whether the authors got it all wrong and "Transcriptional amplification in tumor cells with elevated c-Myc under conditions xyz" should actually be "Transcriptional amplification in tumor cells with depressed c-Myc under conditions xyz", then why was this funded?
If someone does care then it should be replicated.
> If someone does care then it should be replicated.
It is tested in other forms, but isn't generally replicated in the sense you seem to think is paramount. Nor should it be. I'll give you a silly example: would you support a project to go back and replicate electromagnetism experiments performed at the start of the century? Say we do repeat Millikan's oil drop experiment and get a different result (which is actually what happened) - does this mean there is a reproducibility crisis in physics? If we don't repeat the exact experiment, does that mean that Millikan shouldn't have received funding? Why is replicating the result the way Millikan did it more useful than doing other related experiments with more sophisticated or different apparatus? The latter is actually MORE useful.
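For anyone unfamiliar with the oil drop experiment, here is a quick sketch of the standard textbook relations it rests on (my own summary): the charge follows from a force balance on a drop held stationary between the plates, and the drop's mass from Stokes' drag during free fall. As I understand it, the historical discrepancy traces mainly to the value used for the viscosity of air, which enters the mass estimate and so biases the extracted charge.

    % Drop held stationary between plates separated by d at voltage V:
    % the electric force balances gravity.
    \[ qE = mg, \qquad E = \frac{V}{d} \;\Longrightarrow\; q = \frac{m g d}{V} \]
    % Mass from Stokes' drag on the freely falling drop
    % (terminal velocity v_t, oil density \rho, air viscosity \eta; buoyancy neglected):
    \[ r = \sqrt{\frac{9 \eta v_t}{2 \rho g}}, \qquad m = \frac{4}{3}\pi r^{3}\rho \]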
>"would you support a project to go back and replicate electromagnetism experiments performed at the start of the century?"
Yes, of course! That is a great idea. Everyone should be doing this experiment in high school or undergrad science class by now. In fact that seems to be a thing:
If no one cares whether the authors got it all wrong and "Transcriptional amplification in tumor cells with elevated c-Myc under conditions xyz" should actually be "Transcriptional suppression in tumor cells with elevated c-Myc under conditions xyz", then why was this funded?
I disagree. Consider computational biology papers – you should be able to show me code that takes the raw data and turns it into your results. Peer review is a sort of global, public code review in this case. There is a lot of value (education, validation, propagation, sharing, etc), and it's completely practical. Sadly, we (as a field) are nowhere near that, IMO.
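To make that concrete, here's a minimal sketch of what a single-entry-point analysis could look like. The file name, the column name, and the bootstrap step are all hypothetical, chosen only to illustrate the shape: pinned raw input, a fixed seed, and printed numbers a reviewer can diff against the paper.

    # reproduce.py -- hypothetical single-entry-point analysis script.
    # The input path, column name, and statistics are illustrative only.
    import csv
    import random
    import statistics

    RAW_DATA = "data/raw_counts.csv"   # raw data shipped alongside the paper
    SEED = 20240101                    # fixed seed so resampling is deterministic

    def load_counts(path):
        """Read one numeric column of raw measurements from a CSV file."""
        with open(path, newline="") as fh:
            return [float(row["count"]) for row in csv.DictReader(fh)]

    def bootstrap_mean_ci(values, n_boot=10_000, alpha=0.05):
        """Plain bootstrap confidence interval for the mean -- no hidden steps."""
        rng = random.Random(SEED)
        means = sorted(
            statistics.fmean(rng.choices(values, k=len(values)))
            for _ in range(n_boot)
        )
        return (
            statistics.fmean(values),
            means[int(alpha / 2 * n_boot)],
            means[int((1 - alpha / 2) * n_boot) - 1],
        )

    if __name__ == "__main__":
        counts = load_counts(RAW_DATA)
        mean, lo, hi = bootstrap_mean_ci(counts)
        # These printed numbers are the "results" a reviewer can diff against the paper.
        print(f"mean = {mean:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")

Pinning the run environment (a requirements file or a container image) alongside a script like this is the other half of the picture.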
I have not said that reproducing a result is pointless or impractical. If you are a computational biologist, you will surely know that there are endless benchmarking papers comparing published methods for sequence alignment, variant calling, differential expression, genome reconstruction, phylogeny reconstruction, cell segmentation, etc. You will also know that if the methodology behind a result is obscure, it will not be repeated or cited much, and an accessible re-implementation of the method will easily overtake it. Published data sets are also endlessly re-analysed. If someone publishes a code-heavy paper without any code, they get chewed out on Twitter or at conferences (rightly so). Top-tier journals have also started requiring publication of custom code.
My argument is that a 'reproducibility project' of the kind described above is pointless and impractical. I do not see evidence that this project has taught us anything.