Chart of the Decade: Why You Shouldn’t Trust Every Scientific Study You See

epistasis · on Nov 8, 2018

I 100% agree with the headline; especially for research papers even more than clinical trials.

However, I'm a bit puzzled by the weird direction the journalist ran with this, which is straight to his preconceived notions that are not that supported by the data he's looking at.

But there's a bit more to this than just that one chart. In addition to self-correction (e.g. beginning to require pre-registration of trials), science is somewhat additive. Is it not possible that the low-hanging fruit had been found earlier, in the 1970s-1990s, and the problem got harder? With the advances in treatments for cardiovascular disease couldn't the problem just be harder?

And look at that "harm" study. Turns out it's one of many from the Women's Health Initiative [1], and it's the test of whether hormone therapy for post-menopausal women causes heart problems: that is, it's not a test of a drug to prevent heart conditions, but a test of side effects.

How many other of these studies are like that: studies about effects on the heart, not trials for new drugs to treat the heart? How did the ratio change before and after 2000?

In any case, don't trust every popular news article you read about science, particularly if it's written by Kevin Drum and posted on Mother Jones.

[1] https://www.whi.org/SitePages/WHI%20Home.aspx

saalweachter · on Nov 8, 2018

I'll go one stronger and say "You Shouldn't Trust Any One Scientific Study You See".

Individual studies can be really interesting. They're important for researchers to know about to inform their future work. But any one study - even ones that are done honestly, with good methodology and sound foundations - can be just totally wrong. There could be confounding factors you couldn't have known about that completely invalidate the result. Your test subjects could be unusual in some way, your animal models could be a poor analogue for humans in this particular case, you could have just had really aberrant statistical flukes in your statistic sampling.

It's the body of scientific research, the dozens, hundreds, thousands of studies stacked on top of each other that bring certainty.

GiuseppaAcciaio · on Nov 9, 2018

^ this. Especially on highly sensitive topics like medicine, I tend to be cautious when reading about the findings of any single study; whenever someone tells me "there's a study that says [flavour of the day]" my immediate reaction is to go see if there's a Cochrane meta-analysis :)

KineticLensman · on Nov 9, 2018

Yes. Science can be an adversarial process as people converge on the truth. Like a legal trial, you shouldn't take out of context any single statement from a defence or proceution lawyer as representative of the whole truth

saalweachter · on Nov 9, 2018

Or it’s like seeing one great play in the middle of the game. It may be a beautiful play, it might have scored the team you’re rooting for a point, but until you know the whole game you don’t know who won.

gitgud · on Nov 9, 2018

That is a great analogy, just like scientific studies a defence or prosecution can argue to win, but in the end it's usually the evidence that swings the case...

jacksnipe · on Nov 8, 2018

I think you meant “shouldn’t”.

prakashk · on Nov 8, 2018

> Is it not possible that the low-hanging fruit had been found earlier, in the 1970s-1990s, and the problem got harder?

The problem of "problems getting harder" is a continuous phenomenon. Why would that be the case suddenly after 2000, and not before?

> However, I'm a bit puzzled by the weird direction the journalist ran with this, which is straight to his preconceived notions that are not that supported by the data he's looking at.

Is it possible that your own preconceived notions about the author and the publication may have caused you to judge this way?

In any case, don't trust every comment posted on Hacker News (including my own) :)

Natsu · on Nov 8, 2018

> The problem of "problems getting harder" is a continuous phenomenon. Why would that be the case suddenly after 2000, and not before?

Well, there was that whole dotcom boom and a lot of things changed for computers & the internet which led to researchers being able to share more information, use more powerful computer techniques, etc.

cinquemb · on Nov 9, 2018

>Well, there was that whole dotcom boom and a lot of things changed for computers & the internet which led to researchers being able to share more information, use more powerful computer techniques, etc.

In my experience working with/contracting for neuro labs, a lot of researchers don't really know how to fully leverage the technology that's available, and often rely upon proprietary tools they have limited knowledge of, which doesn't bode well for being able explore for themselves.

The few that I have met that can push the limits of current technology are working in labs ran by the above…

I'm not sure how it is in other fields, but in convos from some other commentators on HN over the past years, makes me think this is not just in neuroscience.

Maybe the problem is that the skills needed to explore the solution space and communicate it effectively have gone up because the complexity it has added to the process without research labs/academia addressing the gap sufficiently? I don't think this is a problem with just labs or academia though, not many people in general have the skills to be able to leverage technology to it's fullest for even the most banal tasks.

Natsu · on Nov 9, 2018

I think it's one of those things where a few point shift in the average of the distribution really changes some numbers near the edges of it, though.

epistasis · on Nov 9, 2018

I don't see that chart supporting a sudden change. The time point could be moved quite a bit and tell the same story.

Also I had not heard of Kevin Drum before this, and had a positive view of Mother Jones. I'm left with a poor impression of Kevin Drum and a hit to Mother Jones' reputation after reading this.

dbt00 · on Nov 9, 2018

You could go read the original paper and examine the actual data and scientific conclusions drawn from the underlying data as represented on that chart.

You could look for confirmation or disconfirmation of the hypothesis that preregistration leads to an increase in null results in other data sets.

Both of those seem like more useful ways to dispute this finding than squinting at a graph and finding fault with a blog post headline.

epistasis · on Nov 9, 2018

I'm not sure if you meant to reply to me, because every aspect of your post is wrong.

The "squinting at the graph" was assuming that there's a sudden change. I read the paper, and looked up several studies and came to the conclusion that the paper was being misrepresented by the blog text. And I agreed with the headline, but not with the blog text.

stilley2 · on Nov 8, 2018

I would say that not all research needs to be held to the pre-registering standard of clinical trials. Some research is more exploratory, and should be interpreted that way.

AlexCoventry · on Nov 8, 2018

Exploratory work is extremely valuable, but less trustworthy ipso facto.

ianbooker · on Nov 8, 2018

That is right, And wrong.

There are explicit exploratory methods you can use to carry out research and that is very fine, but most research today is done using an inference model, where you formulate a hypothesis and use data to corroborate it.

That is one of the premises of big data that most people do not get: It enables you to do research explicitly exploratory again, while still being rigorous and somehow falsifiable..

stilley2 · on Nov 8, 2018

What specific exploratory methods are you referring to?

shoguning · on Nov 8, 2018

Agreed, further:

>every significant clinical study of drugs and dietary supplements for the treatment or prevention of cardiovascular disease between 1974 and 2012

There is room for a lot of bias when selecting which studies are 'significant' and on topic. Not to mention deciding which metric to report from the study.

capnrefsmmat · on Nov 9, 2018

Their procedure was carefully documented, including the inclusion criteria and how exactly they categorized the studies. They even had two teams independently search the literature to find studies meeting the criteria: https://journals.plos.org/plosone/article?id=10.1371/journal...

Literature reviews in meta-analyses are usually conducted like this, with specified lists of search keywords and flow charts of inclusion criteria.

TangoTrotFox · on Nov 9, 2018

Problems getting harder is not really a valid explanation for the current state of science, or any particular field. At any given glimpse of society, their current state of the art is the best they've managed with all they know. Further progress is always harder, even when we now think of it as something that should have been easy. For instance the basic concept that things happen for a reason is something everybody understands now, even only a completely intuitive level - yet it was one of the breakthroughs of the Greeks. And Newtonian mechanics is also something people tend to understand quite intuitively, yet following the Greeks it would take the better part of two thousand years for people to consider that e.g. objects in motion stay in motion unless acted on by something else. Artistotlean mechanics held that continued motion required continued force. Artistotle seemingly did not consider friction, air resistance, and so on that cause objects to slow. Or just consider the tens to hundreds of thousands of years to develop writing, or to enter into the iron age, and so on.

There's some argument to be said that in times past there were fewer "scientists" working on these topics, explaining the delays. But I don't think that's fair. The reason I put "scientist" in quotes is because many of these things have minimal prerequisites and the average home would have all that's needed to experimentally test and discover these concepts. And their immense value, far beyond the academic, meant their discovery likely would have been able to rapidly spread regardless of the origin - so that, at least to some degree, precludes authority as a necessity.

Many things that now constantly elude our ability to understand, perhaps dark matter is a great example, will likely one day be child's play.

bad_user · on Nov 8, 2018

> With the advances in treatments for cardiovascular disease couldn't the problem just be harder?

I'm pretty sure we have no significant advances in the treatment of cardiovascular disease.

And given the alarmingly climbing rates of chronic diseases, e.g. cardiovascular disease, diabetes, obesity, cancer, I'm also pretty sure that the health care industry has done more harm than good when it comes to chronic diseases.

For everything else, for problems they can measure and treat with a pill, sure, I'm no anti-vaxxer, but they failed hard at chronic diseases, promoting cures and guidelines that did more harm than good.

epistasis · on Nov 8, 2018

Definitely not accusing you of being an anti-vaxxer, but I would love to hear your sources for some of your statements, because they contradict the data that I've seen. Searching now again, everything I'm finding is showing really big improvements in mortality from cardiovascular disease.

Recent US outcomes: https://jamanetwork.com/journals/jamacardiology/fullarticle/...

US & New York State outcomes 1980-2007, Figure 2: https://www.health.ny.gov/diseases/cardiovascular/heart_dise...

I haven't hit the major mortality sources, but these bits match everything else I've heard.

I'd also like to know the basis for asserting that the health care industry doing more harm than good.

JauntyHatAngle · on Nov 8, 2018

>And given the alarmingly climbing rates of chronic diseases, e.g. cardiovascular disease, diabetes, obesity, cancer, I'm also pretty sure that the health care industry has done more harm than good when it comes to chronic diseases.

I mean, can't this be attributed to other things too? (The way people live has changed in the past 20 years...)

I'd rather see some studies to show if/why those things have spiked rather than disavowing attempts to help it.

What else are we supposed to do?

icelancer · on Nov 9, 2018

>> In any case, don't trust every popular news article you read about science, particularly if it's written by Kevin Drum and posted on Mother Jones.

Very good advice. Over 95% of the crap posted on Mother Jones should be ignored.

icelancer · on Nov 9, 2018

"Before 2000, researchers cheated outrageously. They tortured their data relentlessly until they found something—anything—that could be spun as a positive result, even if it had nothing to do with what they were looking for in the first place. After that behavior was banned, they stopped finding positive results. Once they had to explain beforehand what primary outcome they were looking for, practically every study came up null. The drugs turned out to be useless."

This is a ridiculous "plain English" description of what is happening here, and I say that as someone who is regularly very critical of academia, drug trials, and research science (I've lived it; check my bio).

Clinical trials and mandatory registration are great things. It does not mean that researchers were massively cheating in the past, however - it means that they were finding secondary and tertiary findings and reporting them instead of the main investigative thrust of the research. Yes, some blatant cheating happened, as did p-hacking (though this problem still exists), but to act like the clinical registration database completely stopped a massive ring of fraud is ridiculous and the data does not support that, merely a conspiratorial narrative around it.

caymanjim · on Nov 8, 2018

I'm cynical and have no trouble believing that results were twisted immensely prior to the rule change in 2000. The incentives are huge: profit, prestige, or simply job security.

I do wonder if this chart misrepresents something, though: there are studies that produce incidental--but genuinely valuable--discoveries. It's unclear to me if that accounts for the pre-2000 results or not. With the new rules, would there have to be another study stating the new objective?

I'm not questioning that bad science occurs, but I am questioning what this graph really tells us.

scotty79 · on Nov 8, 2018

> With the new rules, would there have to be another study stating the new objective?

I think it's only fair to force you to replicate at least once the positive result you think you see in the data you collected for another purpose before you can claim you got something.

Balgair · on Nov 8, 2018

Yes, the linked article is saying that researchers used to employ techniques like 'p-hacking' (among others) in order to report results that were favorable/novel. That the scientists and clinicians in charge teased the data too much.

rdiddly · on Nov 8, 2018

You're right, the new rule doesn't filter out dishonest discoveries, only incidental ones (which may be dishonest). If you wanted to show benefit from an incidental discovery, you would have to run another study with that as its declared objective.

The author is kind of doing the same thing here that they accuse others of doing...

smrtinsert · on Nov 8, 2018

Clinical trials != scientific studies in general. Motherjones is doing a great disservice implying equivalence.

function_seven · on Nov 8, 2018

Aren't they equivalent in their potential to allow post-hoc data mining?

Whether it's a clinical trial, or a psychological study with n participants, or a look at how changes in x soil conditions affects the y tree. Anything where data samples are taken and statistically analyzed is prone to p-hacking and after-the-fact hypothesis changes.

This example happens to be for a narrow set of studies: clinical trials for cardiovascular drugs/supplements. The take away from that graph applies broadly.

L_Rahman · on Nov 9, 2018

And in fact the replication crisis in psychology suggests that the practice goes beyond just clinical trials.

gaius · on Nov 9, 2018

The replication crisis in every field of science. Two-thirds of reported results cannot be reproduced.

dwd · on Nov 9, 2018

If you haven't read it already, this article on the placebo effect a few days ago was interesting in how we on one hand use it as a baseline for measuring the effectiveness of drug treatments but actually can vary wildly between individuals. It's not surprising that results are so inconsistent.

https://news.ycombinator.com/item?id=18398342

geggam · on Nov 8, 2018

Explain that to the layman in a way the entire Internet will understand.

Until you can ... they are for all intents and purposes.

radus · on Nov 8, 2018

I'm a graduate student in the biomedical sciences. I perform scientific studies and occasionally publish in peer reviewed journals. I am not a clinician, and I do not run clinical trials.

Clinical trials are not basic research, though they rely on a great body of basic research, and are themselves experiments. Requiring that clinical studies have clearly defined end points serves the purpose of ensuring that the results are robust and well understood. This is important because the end goal of a clinical trial is developing a treatment of some sort that will go into many humans, and mistakes can be very harmful.

Choosing an appropriate endpoint for a clinical trial can also be very challenging. Say for example you're trying to advance a drug candidate for treating cardiovascular disease. You have many choices for how to measure that - chest pain, resting heartrate, cholesterol etc. It's very much possible that your drug does improve cardiovascular health, but your trial can fail because you chose the wrong endpoint.

When I perform an experiment, I do so with a hypothesis which is sometimes proven right, and sometimes proven wrong. Either way, the results can be interesting and form pieces of the large puzzle that is a scientific study. Further, questions that are addressed in a scientific study are generally going to be more broad than those addressed in a clinical trial - we don't really do exploratory clinical trials, but exploratory research is a very important part of science. For these sorts of studies, it is often difficult or impossible to have a well defined endpoint such as is required in a clinical trial. One example for you: I hypothesize that enzyme X has function Y. During my studies, I discover that enzyme X actually has function W! If the evidence I present for enzyme X having function W is solid, my results can still be great and useful, even if my initial hypothesis was wrong.

jmts · on Nov 8, 2018

"Clinical trials are a type of scientific study. The studies shown in the article are of a particular type, and don't represent all studies, much like cows don't represent all farm animals. There may be similar cases across the remainder of science, but this case cannot be used to show that this is true, in the same way cows producing milk cannot be used to show that chickens produce milk.

"A reasonable takeaway from this article is that any one scientific study may contain flaws, or show bias, and therefore there is a possibility for it to be confirmed, improved, or disproven in the future. As a result, it is better to take early results with a grain of salt until additional supporting evidence is found, than to take them as gospel.

"It should be noted that these results are due to flaws in the scientific establishment, flaws in the human application of science, and flaws in the humans themselves, but not the scientific method itself. The scientific method itself is sound and can be trusted. You use it every day to see whether the water in your shower is too hot or cold, or whether it is raining outside."

profalseidol · on Nov 9, 2018

Maybe we should have some good metric showing how small the sample size is compared to the whole. And maybe make it a rule to add 'Approximation' in the title if it uses that.

pimmen · on Nov 9, 2018

Possible sources of skew is more interesting. Larger, skewed samples can be far worse than smaller, randomized ones.

Fomite · on Nov 9, 2018

This is the source of a classic exam question given by Ken Rothman, an epidemiologist:

As sample size goes up, the probability that an estimate's confidence interval contains the true value of an effect goes..

A) Up B) Down

pimmen · on Nov 9, 2018

Thanks for the reference! Very interesting Twitter discussion that sparked. :)

gmfawcett · on Nov 8, 2018

Hardly equivalent "for all intents and purposes." The purposes of the layman are nothing like those of the research community.

gaius · on Nov 9, 2018

The purposes of the layman are nothing like those of the research community.

The "laymen" are the ones funding all of this through their taxes. I'm sure scientists would love it if the plebs just shut up and kept on giving them money, but it doesn't work like that. In fact that is why it is becoming harder and harder for scientists to communicate with the public to affect policy.

geggam · on Nov 8, 2018

If you put it in the public domain it is your responsibility to ensure it is digestible by the audience you publish it to.

So yes... public posted information should be layman ready

radus · on Nov 8, 2018

Do you apply this reasoning to articles or comments discussing kernel development? Some topics have higher irreducible complexity than others.

More importantly, I can make whatever I want public, and I can do so with whatever audience I have in mind.

geggam · on Nov 8, 2018

Interesting.

So inciting panic online is fine but not in a theater ?

metaphor · on Nov 9, 2018

Your remark finds difficulty in satisfying clear and present danger.

geggam · on Nov 9, 2018

Vaccine studies

metaphor · on Nov 9, 2018

...are of a certain irreducible complexity which undergo a process of peer review prior to publication in scientific journals, and rightfully presume a minimum prerequisite background for proper interpretation? I'm sure the topic and its meta are interesting to the professional audience in which they were intended for.

Intermernet · on Nov 9, 2018

Emphatically, no, it's not.

Putting something in the public domain allows other people with knowledge in the field to use the information. It doesn't mean that the lay-person should be able to digest it.

The public domain isn't some lowest common denominator clearing house of information, it's just public.

Udik · on Nov 8, 2018

If anything, clinical trials should be more reliable than the average scientific study: after all they are experiments, with well defined numbers (sample size, effects measurement, controlled conditions). Compare with all non-experimental science- including for example most environmental and climate science, where, if experiments are made at all, the results are wildly extrapolated and generalized.

dwighttk · on Nov 8, 2018

If humans are the equivalent of atoms, or molecules, or rockets, or planets or whatever other scientific thing you are studying.

superjan · on Nov 8, 2018

One could argue that this chart too is an example of post-hoc data mining. I still find the effect plausible, though.

ineedasername · on Nov 9, 2018

Post-Hoc or not would depend on what hypothesis the researcher had before looking at the data. We don't really know, or at least I couldn't tell from the article.

certmd · on Nov 8, 2018

While many on this forum may understand why what this analysis shows is not good science, the article does a poor job of explaining why finding positive results one wasn't originally looking for may not be reliable.

I can imagine a non-statistically minded person thinking "So what it's not what they were looking for originally? We're missing positive findings now. This is a terrible regulation." When in reality these "positive" findings were p-hacked to meet minimum criteria for statistical significance and likely arose by chance since luck would have it that in any study there will be some set of data that by chance is a statistical aberration.

rossdavidh · on Nov 8, 2018

While some have pointed out that this very article could be considered post-hoc data mining, I think it is not. This exact kind of affect was what was intended by the requirement to pre-register. Looking afterwards to see if the intended affect (change in how many studies report benefits) appears to have happened in reality, makes perfect sense. They didn't institute the requirement at random, and then later consider that maybe it could have an impact of reducing (perhaps spurious) findings of significant benefit; that was more or less the only reason for doing so.

LanceH · on Nov 8, 2018

The article claims the drugs are worthless. The studies only show they aren't successful in a single pre-registered result.

DrNuke · on Nov 8, 2018

From a science perspective, positive results are overrated; the most important aspect is to investigate a “good” question. Researchers understand “good” has not the same meaning everywhere or to every stakeholder. More at large, blurred journalism is attacking science to undermine the scientific method, aka meaningful questions + proper scrutiny + full transparence. This is a sign of the times, though.

brownbat · on Nov 9, 2018

In our world, it's considered malpractice if you have the option to include double blinding in your study design, but opt not to for convenience.

In some alternate universe, it is considered malpractice for those who design the study to be the same group that runs the study.

I don't think we can get there from here, but if we had a core track of theorists who designed studies, and a second equally prestigious track of practitioners, who independently tested and ran studies, experimental science would be much more rigorous.

Your prestige should be tied to your ability to identify novel experiments to try, or in rigorous testing procedures, never tied to your ability to shape data to make your claims appear grand.

qyz721 · on Nov 9, 2018

I often wonder about the kind of research done in computer science--how much is it influenced by the kinds of things that easily get you a Ph.D., versus the kinds of things that are useful but less apparently flashy.

A lot of the PL students at my school are extremely wary of doing any follow-up work on ideas that have been published before, even if the implementations of those things are obviously shoddy and don't really demonstrate that the idea works. There's a lot of novelty chasing, which is part of what's pushing people to include deep learning in their work, since it often allows them to claim novelty, even if their results aren't very good.

SomewhatLikely · on Nov 10, 2018

Another big influence is what problems can attract funding.

qwerty456127 · on Nov 8, 2018

> Then, in 2000, the rules changed. Researchers were required before the study started to say what they were looking for. They couldn’t just mine the data afterward looking for anything that happened to be positive. They had to report the results they said they were going to report. And guess what? Out of 21 studies, only two showed significant benefits.

Why is this considered good? Isn't this just a counterproductive limitation? Significant benefits found without knowing what are they going to be in advance are still significant benefits and if they are observed scientifically and proven reproducible I'm glad we've found them.

> Once they had to explain beforehand what primary outcome they were looking for, practically every study came up null. The drugs turned out to be useless.

Aren't newly discovered drugs meant to undergo strict and targeted clinical trials? How can they even be considered being drugs before this? And how can they turn out to be useless after passing this stage?

Also in some cases when nobody wants to fund clinical trials despite very interesting life-enhancing effects supposed or when it's clear the time to general availability through the fully white research and approval chain is going to take longer than people want to wait some non-approved substances happen to be sold on eBay (or, for more questionable substances, on black market), hundreds or thousands of people buy it and report their experience on reddit and this data can be a source for further clues for research.

abrichr · on Nov 8, 2018

The problem is that given enough data, you can always find patterns. Without a causal relationship, however, those patterns are likely spurious.

For a more in-depth analysis, see "The Deluge of Spurious Correlations in Big Data": https://www.di.ens.fr/users/longo/files/BigData-Calude-Longo...

Here's the abstract:

Very large databases are a major opportunity for science and data analytics is a remarkable new field of investigation in computer science. The effectiveness of these tools is used to support a ‘‘philosophy’’ against the scientific method as developed throughout history. According to this view, computer-discovered correlations should replace understanding and guide prediction and action. Consequently, there will be no need to give scientific meaning to phenomena, by proposing, say, causal relations, since regularities in very large databases are enough: ‘‘with enough data, the numbers speak for themselves’’. The ‘‘end of science’’ is proclaimed. Using classical results from ergodic theory, Ramsey theory and algorithmic information theory, we show that this ‘‘philosophy’’ is wrong. For example, we prove that very large databases have to contain arbitrary correlations. These correlations appear only due to the size, not the nature, of data. They can be found in ‘‘randomly’’ generated, large enough databases, which—as we will prove—implies that most correlations are spurious. Too much information tends to behave like very little information. The scientific method can be enriched by computer mining in immense databases, but not replaced by it.

monktastic1 · on Nov 8, 2018

Are you familiar with the multiple comparisons problem[1]? The problem is that in any sufficiently rich dataset, you can find something unusual-looking, and if you're not straightforward about how much digging you had to do to find it, it will look much more special than it actually is. So if a benefit is found by chance, that's fine, but it should only motivate further research which specifically look for that effect.

[1] https://en.wikipedia.org/wiki/Multiple_comparisons_problem

qwerty456127 · on Nov 8, 2018

> but it should only motivate further research which specifically look for that effect

Indeed. But this doesn't disqualify looking for "something" as a valid and useful method of research to be a stage of the whole research chain. That ought to be allowed although research papers produced this way should make this clear.

lolc · on Nov 9, 2018

It's called exploratory research and everybody is fine with it when it's labeled as such.

bgaragan · on Nov 8, 2018

I'm curious about this as well. Assuming the effects they found were actually legitimate (that is, the alleged "torture" of the data was statistically sound), how is this at all a bad thing? Shouldn't it be safe to assume that they could simply repeat the trial with a different stated objective and successfully yield the positive result? It seems counterproductive indeed to waste time and money like that.

I'm having a hard time seeing the problem with taking an exploratory approach and just testing placebo vs. some treatment and reporting whatever you find.

Please correct me if I'm misguided on any of this.

nkurz · on Nov 9, 2018

I presume you are being genuine, but you should know that your comment is almost indistinguishable from satire. More specifically (and hopefully more helpfully) it sounds like the comments made by Brian Wansink before his recent fall. Google for his story if you are unfamiliar.

Assuming the effects they found were actually legitimate (that is, the alleged "torture" of the data was statistically sound), how is this at all a bad thing?

Assuming that the effects are legitimate is exactly the problem. You are right, if we somehow know that the effects found are real and reproducible, then all is good. The problem is that we almost never know this. Presumably what you mean is "If the results are reproducible, what's the harm?". I'd agree with this, but the problem is how to know ahead of time that the results are going to be reproducible.

Shouldn't it be safe to assume that they could simply repeat the trial with a different stated objective and successfully yield the positive result?

If they did the statistics correctly (accounting for the multiple inferences, all assumptions about iid data met, no biased dropouts, everything else aboveboard) and got a really solid result, then yes, it's theoretically likely that the results would be reproducible. The problem is that they almost certainly didn't do the statistics correctly, and intentionally or not they probably violated a lot assumptions. In too many cases, even the main line conclusions can't be replicated. It's rarely "safe" to assume that an effect is real until it's actually been replicated, and almost never safe to make this assumption for result obtained by sifting the data after the fact looking for correlations.

I'm having a hard time seeing the problem with taking an exploratory approach and just testing placebo vs. some treatment and reporting whatever you find.

There are ways to do it this way, but it usually takes larger sample sizes than are available. It also requires "bespoke" statistics that are easy to get wrong. In practice, it's usually better to use the incidental "results" as idea generators for future experiments, rather than assuming that the findings are real and don't require further testing.

bluGill · on Nov 8, 2018

Because when you analyse a lot of data after the fact you will find something. It is really easy/cheap to measure things not part of the study. If you look at 100 things and find 5 that are interesting with only a 5% chance of being a false positive they are all false positives because you expect in a sample of 100 to find 5 false positives.

Analysis after the fact isn't useless. However the bar of significance needs to be much higher. You need a much larger sample size, or better yet design a new study (controlling for things the original didn't) to draw conclusions.

Note, technically statistics do not expect to find 5 false positives in 100. It is close enough for discussion and makes intuitive sense. If you want the real truth be prepared for a lot of math.

xyzzyz · on Nov 8, 2018

"Significant" here doesn't mean what you think it means, but rather it has a technical meaning. Basically, it means that if the intervention has absolutely no effect, then there's less than 5% probability that measured "benefit" is just a random fluke -- thus, since you actually measured benefit, you are now left wondering whether the intervention actually has an effect, or you just happen to be in 5% of possible universes where the randomness just happened to align this way. Of course, the first one is usually more likely, but 5% probability of getting result if no effect actually exists is still pretty high.

The simplest, most straightforward explanation for the mechanism, and why it's bad, is this:

https://xkcd.com/882/

In the old scheme of things, the researchers would have found and reported "significant" bad outcomes connected to green jelly beans. With required preregistration, they must preregister 20 studies, so if 19 similar studies show no effect, and 1 of them shows "significant" effect, then it's likely that you're just dealing with random fluke.

qwerty456127 · on Nov 8, 2018

> since you actually measured benefit, you are now left wondering whether the intervention actually has an effect, or you just happen to be in 5% of possible universes where the randomness just happened to align this way.

That still seems a good first step. It may also be curious to find out what were the things that have actually lead to the effect if not the one we were checking. E.g. it may happen that it the substance researched only affects subjects with particular features or that something else they were doing has produced the result on itself and knowing any of these still seems useful (some really important discoveries were made this way AFAIK).

xyzzyz · on Nov 8, 2018

> It may also be curious to find out what were the things that have actually lead to the effect if not the one we were checking.

This sounds like a good idea in theory, but in practice, it is usually even harder to answer this question than the original question that yielded the data. This type of question requires extremely careful study designs, large sample sizes, accurate power estimates, etc.

ineedasername · on Nov 8, 2018

It's not a counterproductive limitation. In fact it should lead to more research: In this type of clinical experiment, if your data happens to show a significant result that wasn't in the pre-planned outcomes, the proper process to follow is to formulate a second course of research designed around validating that new hypothesis.

BurningFrog · on Nov 8, 2018

That's a nice theory. I worry that in reality a lot of those second research projects will never happen.

ineedasername · on Nov 10, 2018

You're right, it may not have had that result. But I think if the pharmaceutical companies had potentially significant results there would be a strong financial incentive to follow up with a another targeted study. It would be interesting to test, look and see whether the average number of trials per unit of time, probably year, increase after this change was made. I'm curious, but not enough to actually do the work :) Always more interesting question out there than time to follow up on them.

amirmasoudabdol · on Nov 8, 2018

Although I agree with the argument but their chart and their data actually do not represent what they are claiming. I’ve checked their paper while back [1] and contacted them about it but haven’t heard anything back. Their argument is sound but only because!

[1]: https://amirmasoudabdol.name/likelihood-of-finding-null-effe...

fizzigy · on Nov 8, 2018

The article seems to suggest that scientific studies aren't reliable, but fails to point out the key takeaway that now (post registration requirement), studies are much more likely to be valid/useful than before.

So the headline really should read something like "Why you should trust scientific studies a lot more now than you did before"

nemo44x · on Nov 8, 2018

Good science has its methods published, is reproducible and is peer reviewed.

Another, perhaps better, headline would be: "Why You Shouldn't Trust Every 'Study' You See".

jacksnipe · on Nov 8, 2018

No True Scotsman. In fact, I’m pretty sure this is just another form of post-hoc reasoning.

cup · on Nov 8, 2018

The downside of this though is that unexpected positive results sometimes get buried. For instance, clinical trials have to report all adverse effects but not positive effects. We've had drugs tested for symptom X fail to have any effect, while causing bald people to have their hair grow back. Yet the company we were testing it for wasn't interested and those results never made it into the public domain.

rickycook · on Nov 8, 2018

i would assume that the positive results still get reported somewhere in the paper? so other people could still pick up on it in the future.

if they’re positive enough then the drug company will fund another study with that as the primary outcome. that seems prudent too: it should be the primary thing you’re examining so that you can design the study correctly, rather than a simple “oh and by the way” side note

jonathankoren · on Nov 8, 2018

That wouldn't happen. You would simply declare, that you want to check if the drug does cause hair growth, and then run a test explicitly looking for that.

Unless you're looking for the effect from the outset, you can't be sure that what you saw wasn't actually random. There's a term for what you're describing, it's p-hacking, and it's explicitly the very thing declaring what you're looking for before you run the test is designed to prevent.

Like many science related topics, there's a XKCD about p-hacking that describes a similar scenario to your example: https://xkcd.com/882/

corysama · on Nov 8, 2018

This would seem to indicate that the results of drug research are unpredictable. Therefore, if you are required to predict your results, you are set up to fail.

My question is: Was the free-range research actually effective? Or, was it "technically-correct effective just so you can't call me out on a failure, moving on..."

whack · on Nov 8, 2018

Is it possible that prior to 2000, the studies that found a null result, were simply not published? Hence why they are underrepresented in that chart? My understanding is that it's very hard to get null results published in scientific journals, though I'm not sure if that applies to clinical studies as well.

Fomite · on Nov 9, 2018

One interesting effect that pre-registration has is that there are now several journals that allow you to publish your protocol, and agree to publish the followup paper about the results regardless of the outcome.

thedirt0115 · on Nov 8, 2018

This is very similar to A/B Experimentation -- Gotta force the user to select a primary metric BEFORE the experiment starts to keep 'em honest.

sudofail · on Nov 8, 2018

I have a general rule to ignore any statistical interpretation in articles like these. Data interpretation is hard. There are a lot of variables that need to be accounted for, and under the best circumstances of academic peer review, we still get it wrong.

This article also feels like it has an agenda. Maybe I'm not familiar with MotherJones, but the tone of the article strikes me as unprofessional. And the headline, that's obviously correct and means nothing. It's like saying, "Why You Shouldn't Trust Every Stranger You Meet".

leesec · on Nov 8, 2018

Reminds me of this old classic, one of my favorites:

http://slatestarcodex.com/2014/12/12/beware-the-man-of-one-s...

ineedasername · on Nov 8, 2018

I'd say a good rule of thumb is that the smaller the number of observations, the less you should trust any research that hasn't been reproduced. And when it comes to publication of false positive results, always remember your XKCD: https://xkcd.com/882/

taruz · on Nov 8, 2018

I want a chart like that but for journalism.

njstraub608 · on Nov 8, 2018

I'd have to agree with this; every news outlet tries to treat a lack of proper controls as a breaking story / massive conspiracy theory to get a buzz going around the topic. I think it's important to inform people but unfortunately, most people would rather buy into the idea that there was malicious intent rather than also evaluating the possibility that, like most evolving business areas, it takes time to build in the proper procedures to attain measurably objective results. This happens in every industry and isn't unique to science, however, in this case, the physicians recommending these drugs to their patients should have been able to read between the lines or at the very least have a managed treatment plan to evaluate the efficacy on those they're treating.

clircle · on Nov 8, 2018

I don't follow this comment. What would you plot on the vertical axis?

nostrademons · on Nov 8, 2018

Clickthroughs, presumably.

I assume that taruz is saying that journalists should be required to say what they're investigating before publishing an investigative report. Right now, they start investigating, and if there's something outrageous (even if it wasn't what they were initially looking for), they publish. Sometimes they even skip the "start investigating" part and just put up a tip-line for anyone who has a beef to get a story out there.

This is great for manufacturing outrage and hence clicks, but it gives the public a hugely skewed perspective on how the world is. Imagine that 0.01% (1 in 10,000) of all peoples' actions are outrageous and will piss off a large portion of planet. Most people, by those numbers, would say that the majority of folks are decent, law-abiding citizens. Now imagine that a news outlet is allowed to freely go over someone's life, and they end up evaluating 1000 actions. There's a 10% chance that they'll find something outrageous. Now imagine that 10 such reporters do this to 10 people, and if any one of them finds something publishable (= outrageous), they go to press. Suddenly there's a 64% chance that one of them will find something, and you've likely got your news cycle for the day.

With the millions of people looking for something bad that a tip-line can generate, outrage is virtually assured. And that's where journalism is today. The world isn't actually a worse place than it was in 1980; in fact, by most metrics it's significantly better. But we've increased the amount of unpleasantness that people can be exposed to by 3-4 orders of magnitude and then implemented a selection filter that ensures that only the worst stories go viral. Of course we get only bad news; that's all that's economically viable, and we have such a large sample size that we can surely find it.

jernfrost · on Nov 9, 2018

That is a natural outcome of the hyper capitalist world we currently live in. I have a mother who has been a journalist for over 40 years. I've heard for years about the pressure sell. Article titles often get changed against the will of journalists to be more click-bait like to sell more.

As people pay less for good journalism, and ad revenue is shrinking, the media is getting desperate to stay afloat.

Journalists today have significantly less independence than they had 15 years ago. Everything is far more controlled and geared towards sales.

I would like to see more alternative finance models for quality journalism.

Given how social media is actually starting to destroy democracy, it may be worth considering government grants to independent media organizations as a part of national defense. A democracy cannot function if the media is utterly broken. That means citizens are no longer capable of making informed decisions.

teddyh · on Nov 9, 2018

There’s also this to consider:

“If you give me six lines written by the hand of the most honest of men, I will find something in them which will hang him.”

– Commonly attributed to Cardinal Richelieu

I.e. any fact or action can be willfully misinterpreted to fit almost any narrative.

rossdavidh · on Nov 8, 2018

Good idea, but you'd have to have an objective way of deciding how accurate the article was.

monochromatic · on Nov 9, 2018

>Post-hoc data mining is very bad, except when you’re gunning up evidence of anthropogenic climate change, which is very good

noname120 · on Nov 9, 2018

For the record, I believe that you didn't get downvoted for suggesting that there is post-hoc data mining happening around anthropogenic climate change. Instead, it's because you're providing no evidence for this and it's in addition offtopic.

jernfrost · on Nov 9, 2018

I down voted because this is (1) whataboutism, (2) I can't see the analogy.

That increased CO2 emissions should cause a temperature increase has been proposed over 100 years ago. They did not look at climate change recently and suddenly decide "lets blame CO2".

(3) the man made climate change theory, does not rest of a single study, it rests on countless ones.

(4) There is actually a scientific model explaining why we see the measurements we get. They are not digging around at random.

The case for man made climate change is pretty strong. Look at these graphs and tell me which graph correlates most strongly with global temperature increases: https://www.bloomberg.com/graphics/2015-whats-warming-the-wo...