
I agree with the other commenter, and Gelman's blog is a good place to start. There are very clear, concrete statistical arguments, but they are difficult to summarize in the comments section of HN.

The gist is this. Imagine that someone has people play 20 different slot machines. Then they go into a private room and look at the results for each machine. Afterwards, they come out with the results for 5 of the machines and say, "look, our slot machines pay out at a higher than chance level!".

Do you believe them? I hope not. If only 4 machines had done well, maybe they would have shown only 4, or 3, and so on. They've effectively stacked the deck.

On the other hand, suppose someone said they were running an honest experiment with 20 slot machines. How many machines would you expect them to report on?
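
If you want to see the analogy in numbers, here is a rough simulation; the win probability, number of plays, and threshold are all invented purely to illustrate the point:

    # Rough illustration of the multiple-comparisons point: 20 fair slot
    # machines (true win probability 0.5), each played 100 times; we then
    # "report" any machine whose one-sided p-value beats 0.05.
    # Every number here is made up for the sake of the analogy.
    import math
    import random

    def p_value(wins, n, p=0.5):
        # Exact one-sided binomial tail: P(X >= wins) for a fair machine.
        return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(wins, n + 1))

    random.seed(0)
    n_machines, n_plays = 20, 100
    lucky = 0
    for _ in range(n_machines):
        wins = sum(random.random() < 0.5 for _ in range(n_plays))
        if p_value(wins, n_plays) < 0.05:
            lucky += 1
    print(f"{lucky} of {n_machines} fair machines look 'better than chance'")
    # Typically about 1 in 20 fair machines clears the 0.05 bar by luck
    # alone, so reporting only the winners misrepresents the evidence.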

I can follow your analogy, but it doesn't quite add up.

The researchers took a group of people and subjected each of them to an "extensive neuropsychologist test battery".

That's not just testing 20 different slot machines, that's testing 20 different designs of slot machines.

(Also, each person does each test on their own, which is the equivalent of having them not only play twenty different models, but a unique production unit of each model per person.)

In that light, the claim that all slot machines have a high pay-out chance is obviously suspicious, but would coming back and saying "these five designs have a higher than chance level of winning!" be an incorrect conclusion?

If each slot machine type is unique, no. But if one slot machine design is known to have a flaw, and if it shares this flaw with another design, and if that other design does not show the same increased performance, then things get really suspicious.

So the question becomes: do we know how strongly correlated the results of these tests typically are? If that is a lot (which I would expect to be true with at least some of these tests), the absence of the other tests is suspicious. If it is low, it might be less of an issue.


You hit on a lot of important points. The key here is that they are making a claim about memory, a psychological construct, that is being represented by some of their measures (batteries). But their batteries also purport to measure other psychological constructs.

To connect this with the slot machine analogy, it might be like if groups of slot machines had different colors, and they chose the color that yielded the best results.

> but would coming back and saying "these five designs have a higher than chance level of winning!" be an incorrect conclusion?

It would be, if you didn't take into account that you analyzed 20 machines.


The number of hypothesis tests is problematic (alpha inflation / familywise error rate) and the authors simply mention, "Another limitation was that we did not correct for multiple tests in the analyses as this was a pilot trial."
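
To put rough numbers on the alpha inflation, treating the comparisons as roughly 20 independent tests at alpha = 0.05 (only an approximation, since the study's measures are correlated):

    # Back-of-the-envelope familywise error rate for k independent tests.
    # The study's exact number of tests and their correlations differ, so
    # treat these figures as illustrative only.
    alpha, k = 0.05, 20
    fwer = 1 - (1 - alpha) ** k
    print(f"P(at least one false positive across {k} tests) ~= {fwer:.2f}")  # ~0.64
    # A simple Bonferroni correction would instead require each individual
    # test to clear a much stricter per-test threshold:
    print(f"Bonferroni per-test threshold: {alpha / k:.4f}")  # 0.0025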

However, identifying their study as a pilot study does not excuse the lack of correction for multiple tests. These procedures are well known, so failing to use them maximizes the chance of ending up with false positives. The study thus presents the best-case scenario for detecting effects, with the highest likelihood of false positives, and seems to highlight the authors' desire to avoid failures to detect effects.

There is always a statistical trade-off between reducing the number of missed effects (false negatives) and reducing the number of false positives, so readers are left to their own devices to decide whether the authors' approach and interpretations were justified.
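
Here is a quick sketch of that trade-off; the effect size, sample size, and thresholds are all invented for illustration and none of these numbers come from the study:

    # Sketch of the trade-off: a stricter (corrected) threshold produces
    # fewer false positives but also misses more real effects. All the
    # parameters below are invented for illustration.
    import random
    import statistics

    random.seed(1)

    def detection_rate(z_crit, effect=0.4, n=20, sims=2000):
        # Fraction of simulated samples (n normal observations whose true
        # mean is `effect`) whose z statistic clears the threshold z_crit.
        hits = 0
        for _ in range(sims):
            xs = [random.gauss(effect, 1.0) for _ in range(n)]
            z = statistics.mean(xs) / (statistics.stdev(xs) / n ** 0.5)
            hits += abs(z) > z_crit
        return hits / sims

    print("power, uncorrected (z > 1.96, alpha ~ 0.05):   ", detection_rate(1.96))
    print("power, Bonferroni  (z > 3.02, alpha ~ 0.0025): ", detection_rate(3.02))
    # The corrected threshold misses the simulated real effect much more
    # often; that is the price paid for fewer false positives.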

The study's strength of argument is reduced by:
* the authors' self-description of the study as a pilot study with a small sample size,
* no corrections for multiple testing,
* use of a non-representative sample ("Only approximately 15% of the screened volunteers were included in the study, and our recruitment method yielded a sample of motivated, educated, physically healthy subjects concerned about age-related memory problems. The sample, therefore, was not representative of the general population."),
* PET brain imaging results self-described as "exploratory" (I take the word exploratory to mean "ourselves and others need not trust these results or interpretations", or at least "different results would not be unexpected").

What I find least appealing is the lack of a specific conflict of interest statement, even though the authors' financial interests in the substance being studied and in the company selling it are described. As noted in the article "Industry Sponsorship and Financial Conflict of Interest in the Reporting of Clinical Trials in Psychiatry" from the American Journal of Psychiatry (https://ajp.psychiatryonline.org/doi/abs/10.1176/appi.ajp.16...), "Author conflict of interest appears to be prevalent among psychiatric clinical trials and to be associated with a greater likelihood of reporting a drug to be superior to placebo." (Perlis et al., 2005), and this explains the primary conclusion of the current study: better than placebo.

We can't fault the inventors of the substance, the holders of the patent, and those with financial interests in the company that sells the product for wanting to test their product and promote positive findings, but how objective are the investigators, and how rigorously are they trying to apply the notion of falsification to their own ideas?

They have everything to lose from falsification: negative results would undermine the parent company's claim that "Theracurmin® product is one of the most advanced and studied, highly bioavailable forms of curcumin in the marketplace." (http://theravalues.com/english/). Looking at the research page, there are only a small number of studies shown at http://theravalues.com/english/research-clinical-trials/ . All with positive outcomes, of course.

All of the current supporting studies are listed at http://theravalues.com/english/literature-published-articles... so does anyone want to open up each of those studies and look at the sample sizes in each one?

In one of the supporting studies for Theracurmin® (https://www.nature.com/articles/srep39551), the sample size was six rats. In a second study (https://www.ncbi.nlm.nih.gov/pubmed/21603867), the sample size was six people.

What sample sizes do we see in other studies? How many other studies of this substance are equally weak in terms of sample size? Six rats here, six people there, 40 people here... not convincing. If anything, the strong marketing hype based on such studies makes me more wary and less trusting of the marketing and scientific claims.

Given that the Theravalues clinical trials website promotes the product for "Progressing malignancies, Mild cognitive impairment (the study highlighted in the parent post), Heart failure / diastolic dysfunction, Cachectic condition, Osteoarthritis, Crohn’s disease, Prostate-Specific Antigen after surgery", I'm left with distrust.

Kudos for the effort to begin product testing, and this sort of research is time-consuming and expensive, but studies with sample sizes of 40 people do not support the company's marketing hype. If this study is positive enough for the authors to obtain more grant funding and run a much larger and better clinical study, then I would be interested to see what is found, but I would be happier still if authors with no financial or personal conflicts of interest ran the study.
