
In section 13, I see they are still teaching the Fisher/Neyman-Pearson hybrid (i.e. the null ritual). For a brief overview see [1]. To start you off: Fisher said the idea of power was nonsense[2], and Neyman-Pearson said a hypothesis is either rejected or not (there is no gradient of evidence for/against).[3]

[1] Gigerenzer, G (November 2004). "Mindless statistics". The Journal of Socio-Economics. 33 (5): 587–606. doi:10.1016/j.socec.2004.09.033

[2] 'The phrase "errors of the second kind", although apparently only a harmless piece of technical jargon, is useful as indicating the type of mental confusion in which it was coined.' -Ronald Fisher. "Statistical Methods and Scientific Induction." Journal of the Royal Statistical Society. Series B (Methodological) Vol. 17, No. 1 (1955), pp. 69-78 https://www.jstor.org/stable/2983785

[3] 'no test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of that hypothesis.' -Neyman, J; Pearson, E. S. (January 1, 1933). "On the Problem of the most Efficient Tests of Statistical Hypotheses". Phil. Trans. R. Soc. Lond. A. 231 (694–706): 289–337. doi:10.1098/rsta.1933.0009.




Indeed, when I took my introductory statistics courses at UC Berkeley (STAT 200A and STAT 200B), we discussed the pitfalls of blindly applying these concepts, albeit at a very high level.

Initially, I wrote this cookbook/cheatsheet in order to structure and retain the material in these courses, not to challenge them. Most of the content comes from the cited references, all of which have a very terse and mathematical presentation. It would be great to augment the current document with pointers to the literature that offer a critical discussion. As a non-statistician, I lack the historical perspective, but I always appreciate contributions from experts in the field. (The document is open-source: https://github.com/mavam/stat-cookbook)


I don't have time to contribute, but the wikipedia page on NHST[1] used to be pretty good about refs. A lot of the stuff pointing at controversy/history has been slowly removed over the last few years... it still isn't too bad though. Anyone interested can also try looking through old versions. There have been thousands of papers published on that topic.

[1] https://en.wikipedia.org/wiki/Statistical_hypothesis_testing...


How would you answer the question of how big a sample has to be to measure an effect of a certain size?


I think I know what you are trying to ask, but this sounds muddled to me. Can you clarify what you mean by "measure an effect of a certain size"?


In the placebo group, value A is x %. You want to know if in the treatment group the value is at least x + 10 %. How many people do you have to test? (Without getting into details about study design. :-)

Anyway, you don't think power analysis à la Cohen is useful?


If that is what you care about for some reason, you would set x+10% as the null hypothesis, right? Not sure what that scenario is supposed to have to do with power.

Also, this isn't really about what I think, rather I would hope people check the Fisher 1955 ref and go from there.

What I think though is this whole idea of testing vague/vagrant hypotheses (eg the example we used here) is wrong in the worst way possible. The null hypothesis should be deduced from some theory, or at least correspond to what you care about. I have shared this paper on the site many times, I think it should be standard reading in high school: http://www.fisme.science.uu.nl/staff/christianb/downloads/me...


You cited Gigerenzer, Neyman, Pearson as if they opposed the concept of power. Fisher might be in a different boat but he also claimed that smoking doesn't cause lung cancer, so he probably wasn't always right. :-)

Sample size, effect size & power are related concepts in the context of power analysis -- see also Cohen's "A primer on power", which is available on the Internet. The concept of power has nothing to do with "degrees of evidence" or vague hypotheses.
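For the placebo/treatment example above, the relationship can be sketched with the usual normal-approximation formula for comparing two proportions. This is only a rough illustration: the 30 % baseline, the two-sided alpha of 0.05 and the 80 % power are assumed values (the thread never specifies x), "+10 %" is read as ten percentage points, and `n_per_group` is a made-up helper name, not a library function.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    p_bar = (p_control + p_treatment) / 2
    # Standard deviation terms under the null (pooled) and under the alternative.
    spread_null = sqrt(2 * p_bar * (1 - p_bar))
    spread_alt = sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    delta = abs(p_treatment - p_control)
    return ceil(((z_alpha * spread_null + z_power * spread_alt) / delta) ** 2)


if __name__ == "__main__":
    # Assumed baseline of 30 %; try a few hypothetical effect sizes.
    for lift in (0.05, 0.10, 0.20):
        print(f"lift={lift:.2f}  n per group={n_per_group(0.30, 0.30 + lift)}")
```

Smaller assumed effects or higher requested power push the required sample size up, which is the sense in which sample size, effect size and power are tied together.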


>"You cited Gigerenzer, Neyman, Pearson as if they opposed the concept of power."

Sorry for the miscommunication. The point is that power is a Neyman/Pearson concept, Fisher said it didn't make sense. On the other hand a gradient of evidence is a Fisherian concept, Neyman/Pearson said that didn't make sense.

What people have been teaching as stats is a mishmash of the two that makes sense to no one who thinks these types of things through. Gigerenzer reviews this strange phenomenon and offers some entertaining commentary; it is a decent starting point.


>"The concept of power has nothing to do with "degrees of evidence" or vague hypotheses."

Yes it does. To properly assess the probability of incorrectly failing to reject a hypothesis you need to know how likely the data would be under various rival hypotheses. This depends on the precision of the various hypotheses. This is explained by Fisher in my original ref.
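As a rough illustration of that dependence (a toy sketch with assumed numbers, not taken from the Fisher paper): for a one-sided z-test of "effect = 0" with known standard deviation, the probability of rejecting changes with whichever rival effect you plug in, so there is no single power figure until the alternative is pinned down. The function name `z_test_power` and the values of n, sigma and alpha are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(true_effect, n, sigma=1.0, alpha=0.05):
    """Power of a one-sided z-test of H0: effect = 0 when the real effect is true_effect."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)           # rejection threshold on the z scale
    shift = true_effect * sqrt(n) / sigma     # mean of the test statistic under the rival hypothesis
    return 1 - z.cdf(critical - shift)


if __name__ == "__main__":
    # Same data-collection plan (n = 50), different rival hypotheses.
    for effect in (0.0, 0.1, 0.25, 0.5):
        print(f"rival effect={effect:.2f}  power={z_test_power(effect, n=50):.3f}")
```

With a rival effect of exactly zero the "power" collapses to alpha, and a vague alternative such as "some effect greater than zero" gives no single number at all.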


You may also want to check out figure 2 of this paper which further illustrates the relationship between a statistical hypothesis and the research hypothesis: http://rhowell.ba.ttu.edu/Meehl1.pdf


> Ten years later, I wrote at greater length along similar lines (Meehl, 1978); but, despite my having received more than 1,000 reprint requests for that article in the first year after its appearance, I cannot discern that it had more impact on research habits in soft psychology than did Morrison and Henkel.

is the author using a null value to inform this perception?


Doesn't sound like it. It sounds like Meehl is giving an order of magnitude estimate. He is saying that it is his impression that both Morrison & Henkel's paper and his own seemed to have little effect on practice.

Clearly he doesn't think it had exactly zero effect, since it affected him!


assuming 1000 reprints somehow implies there should be a discernible 'impact on research habits' seems like an example of what your referenced Figure 2(o) calls 'Estimating parameters from sample'

(o) http://rhowell.ba.ttu.edu/Meehl1.pdf


You have it reversed.

"Estimating parameters from sample" (on the right) would be his observation that there was little discernible effect. Thinking that 1000 reprints of the paper would have a larger effect on practice would more correspond to "theory" (on the left), although that is a pretty vague one.


but where does the 1000 figure come from? it reads arbitrary


Just because the ancients said something is no reason to take it as gospel.


Sure, but what is going on is that statistics textbooks have been teaching nonsense since the 1940s, and that nonsense has been taken as gospel.

It is no accident that statistics is usually presented as arising anonymously and forming a monolithic paradigm, when in fact the exact opposite is true. Stats must be one of the most controversial areas of intellectual activity around. Those refs are to start off the curious, some of whom will manage to break free of the brainwashing and think for themselves.



