Ways to fix statistics (nature.com)
134 points by aqsalose on Dec 1, 2017 | 52 comments



I consider myself analytically-minded, and statistics still gives me a headache. Most people understand basic relationships: "These numbers are different, but not meaningfully so" or, "These two elements are more strongly related than these other two."

The problem is threefold:

1. Learning statistics is a tautology. Meaning: the best way to learn statistics is to already know statistics. Go google how to do basic significance calculations in, say, Excel. The first paragraph of the resulting articles will dive into t-tests and two-tailed tests and P-values and on down the line. Ask someone how to do a simple comparison, and they'll digress into null hypotheses and "what test did you decide to run". You'll probably see some nasty formulas and opining about which PhD-level approach is better. It's just opaque.

2. Tools are professional grade. Software needs to bridge the gap between users' intent and their ability to do the analysis. Statistics requires R (you have to know both programming and statistics), SAS ($$$), or another package with a steep barrier to entry. Most people know how to ask the basic question, but have no idea how to get the answer or interpret the results.

3. People who know statistics can't interpret results to laymen. I see this a lot in business: really bright folks build segmentation models and predictive models, they start talking about cohorts and medians... but that's where it ends. Your c-suite doesn't care about that: it cares about the so-what.


The root problem is even simpler.

The question that everyone wants statistics to answer is, "Is this true?" The question that data properly answers is, "Here is how to update your beliefs." Therefore every incentive exists to pose questions that can produce answers which are MISUNDERSTOOD as answers to the question we actually wanted answered but couldn't.

And once a particular way of answering it has become popular, every incentive exists to push it to draw conclusions because conclusions are better than the alternative.

That's why p-values are so popular and misused. That's why you can't keep people from p-hacking their way to something publishable.

My preference is to not fight it, but accept it. Make null results publishable. Have businesses focus on decision procedures, not statistics. You can argue all day with things like https://www.evanmiller.org/how-not-to-run-an-ab-test.html that people shouldn't misuse statistics. Or you can decide the maximum number of conversions that you'll spend, call that N, stop if one version gets sqrt(N) ahead or if you get to N without a clear answer. (In the latter case, go with the one ahead as the best available answer.)

The second procedure has a rigorous statistical interpretation as "the best answer available with the effort we were willing to put in". But it isn't a definitive answer, and doesn't pretend to be. Complex conversations are avoided. Get an answer, go with it.
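
Concretely, a rough sketch of that second procedure (the conversion rates and N=1000 below are made up for illustration, not part of the procedure itself):

  import math, random

  def ab_test(p_a, p_b, max_conversions):
      # p_a, p_b drive the simulation; in a real test you don't know them
      threshold = math.sqrt(max_conversions)
      a = b = 0
      while a + b < max_conversions:
          # one visitor shown each variant per iteration
          if random.random() < p_a:
              a += 1
          if random.random() < p_b:
              b += 1
          if abs(a - b) >= threshold:
              break  # one version pulled sqrt(N) conversions ahead
      return ("A" if a >= b else "B"), a, b

  winner, a, b = ab_test(0.05, 0.06, max_conversions=1000)
  print(winner, a, b)

Either the early stop fires, or you hit N conversions and go with whichever variant is ahead as the best available answer.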


Or you can decide the maximum number of conversions that you'll spend

What are "conversions" in this context?


Whatever you want them to be.

People signing up for your service. People buying. Whatever you're trying to optimize for in the A/B test.


I'm sorry, but this is entirely unfair, especially points 1 and 3.

First, addressing #3:

>People who know statistics can't interpret results to laymen.

This doesn't match my experience. Stats experts can sometimes be terrible at communicating their work, but no more so than the security expert at explaining why we have to jump through these hoops, or the web dev explaining why the site went down. Some technical experts are good at communicating to lay people, some aren't. Stats is no outlier.

But for learning and using statistics:

> Go google how to do basic significance calculations in, say, Excel. The first paragraph of the resulting articles will dive into t-tests and two-tailed tests and P-values and on down the line.

This really seems to be saying "using statistics requires you to understand statistics," which seems sensible.

The big issue is that uncertainty is a different kind of thinking. To put it at its simplest: if you want a yes or no answer and I keep saying "Well, probably yes, but maybe no," you're going to be really frustrated that I keep using the language of uncertainty. "I just want Excel to give me a yes or no answer, but the articles keep trying to teach me about 'maybe' in the first paragraph!" Uncertainty involves new kinds of statements. No way around it. The software and the concepts require it, and there's no (correct) way to reduce it to "I just want Excel to tell me whether this effect is real." When you say most people know how to ask the basic question, I disagree. Most people want to know "Is this effect real, yes or no?" But no matter what tools you use, that's not enough of a question to have a single correct answer.

"You'll probably see some nasty formulas and opining about which PhD-level approach is better." That's just untrue. It sounds anti-intellectual and a lot like the defeatist "I'm just not a math person." The formulas you'll see in those 'Intro to NHST' tutorials require high school math and are covered in the first stats class you'd take in undergrad.

Edit: Sorry for the rant. I'm going to leave it up, but I'm just really turned off by "I consider myself analytically-minded, and statistics still gives me a headache," followed by 'here are the problems with statistics.' It's not hard, but you have to learn the foundation to use it. Do that before criticizing how we use it.


In his defense (and admitting your points are all completely right):

I consider myself analytically minded (I hope I am to some degree, since it's how I make my living).

Statistics during university also gave me a headache. Still does.

10 years later though: I've now worked for the national statistics body. I put my head down and said "Screw this, every time I don't get something, I'm going to try to implement it from first principles until I do. I'm going to code it up. Every time I don't understand the paradigm/language, I'm going to try to do everything in that mindset until I do."

I worked for several years in the stats methodology division until I moved on.

I am now of the opinion (admittedly not uncontroversial, but ever increasing in popularity) that the reasons I didn't understand stats in university stem from the following:

1) It is taught badly.

2) Often what is taught IS wrong.

3) In practice it is used wrong: significance testing is the most overused technique I've seen in any field I've had experience with, except perhaps linear models (another stats baby). In my experience, once one has internalised the mindset of variance/sampling/etc. rather than "correct/not-correct", statistical significance is almost always tangential to the actual analytical question, yet it is often treated as the goal. I think its popularity is partly due to the fact that it provides a framework for yes/no decision making, negating the very mindset change needed to properly understand statistics. Frequentist techniques were the focus in university-level stats classes, and that framework and those models are often forced onto very non-frequentist situations where a Bayesian/subjective interpretation is, in my subjective opinion, more rational and justified. My stats classes had almost no material on computation, logic, falsifiability, experiment and data design, etc., which I subjectively view as much more important to real statistical work than rote learning what a regression, least squares, or R^2 measure is.

A great deal of the time, analysts have trouble explaining the methods they use because it's obvious (to me) that they don't really understand the methods they're using (if they did, they often wouldn't be using them).


The more I study (halfway through a tough PhD now), the more I realise there's an obsession with making hard things harder, sometimes just because you were taught that way, other times because, hey, "it's post-graduate education! It's supposed to be hard".

And the many times my research group and I have sat down, read things from the basics upwards, and made an effort to explain everything as simply as possible while not hiding complexity, it took a lot of work, but the results were highly praised and fed back into the courses we teach. And students are happier (or at least not miserable, as they used to be when teaching hard things was done "the classical way").

So I agree with (1), and all I can say is that, when in a position of teaching, it's best to take the time, work the material from the ground up, and teach it in the simplest possible way.


> ... taught badly [and educators are] often [incorrect]

Yes, but this may also be the case with many other topics. I had the dismaying experience of TAing a class at a highly respected university where a tenured professor was using lecture notes that were factually incorrect and in my mind glaringly so. Not just a bit of errata, but consistently wrong.


  It's not hard, but you have to learn the foundation to 
  use it. Do that before criticizing how we use it.
Here's the crux of the problem: we disagree on your first phrase here! I think statistics is hard to learn. I (clearly) don't have the foundation to use it. I'm not criticizing how you use it, I'm criticizing my inability to do so.

Sounds like a personal problem, dunnit?

What's frustrating is that I do "consider myself analytically-minded". I have a Bachelor's in CS; I took two (?) stat classes in college. And it's still like pounding my head against a wall. I find opportunities to use statistics to solve business problems, but finding resources to help me solve those problems with statistics is challenging without, in effect, going "back to school". (Incidentally, this is how I learn: concepts through examples.)

I find that the answers to what I think should (should!) be simple questions involve academic digressions into, in effect, the "different kind of thinking" you mention. I find them academic because they don't help me solve my immediate problem. That's not anti-intellectual: but by virtue of statistics being intellectual, it's exclusionary.

I don't think I'm alone in that. From observation, I'm ahead of the curve in even trying. Maybe what I'm asking for is impossible. I hope not, because the alternative is continued non-use of statistics. I'm bullish because I see opportunity for software to bridge the gap. Software may not be mature enough to do all the heavy lifting required of the user today, but it'll get there.


What’s a good way of learning statistics for someone who is more into pure mathematics?

I always avoided stats as much as possible because I found the classes extremely un-rigorous. I couldn’t get to grips with what the notation really meant, and there seemed to be a lot of hand waving on the way to results.

I’m not doubting that stats is rigorous, just my exposure to it so far hasn’t been.


My recommendation depends on what you'd like to do with it/what problems you are trying to solve. Huge field and all. Whatcha looking for?

One resource I've started recommending to people is this little book: http://bayes.cs.ucla.edu/PRIMER/ It's not going to teach you a lot of the practical tools, but there's a weird feature of statistics that the answers to most questions aren't just functions of the data. They're functions of how the data was gathered and what assumptions are fair to make, too.

So I've started to point people to this first, and only after that to what's usually stats 1. Once you finish that little book, any intro to mathematical statistics should suffice, but is probably overkill on the practical side. Hence my original question.


>Learning statistics is a tautology. Meaning: the best way to learn statistics is to already know statistics. Go google how to do basic significance calculations in, say, Excel. The first paragraph of the resulting articles will dive into t-tests and two-tailed tests and P-values and on down the line.

I'm not sure what you mean by this. Or rather, your statement is true for all disciplines.

Your question: How do I do a significance calculation?

This question already presupposes some basic statistical knowledge. How else would you know what a significance test is? In my introductory statistics course, we were taught p-values and t-tests. Why is it so problematic that people who want to use a statistical test should know what is considered introductory material?

>Software needs to bridge the gap between users' intent and their ability to do the analysis. Statistics requires R (you have to know both programming and statistics), SAS ($$$), or another package with a steep barrier to entry. Most people know how to ask the basic question, but have no idea how to get the answer or interpret the results.

Again: Is that not true for many disciplines? It is always easy to ask basic questions: When I type in a URL in my browser, how does my browser get the information it is showing me? The answer is not something the layperson will understand.

The issue with statistics is that answering simple questions can be complex. What is considered significant actually depends on the problem domain. You need to know what level of significance is good enough for the problem. And if you do not know how the significance test actually calculates things, you will not know if the level of significance you specify is accurate for your problem. It would be scary if my statistics tool gave me simple binary answers ("Yes, these two variables are related") - that would often be wrong and prone to misuse.
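
For concreteness, here's roughly what that calculation looks like once you do specify the question, sketched with SciPy (the numbers are made up, and whether the resulting p-value is "small enough" is exactly the domain-dependent judgement I mean):

  from scipy import stats

  # two made-up samples, e.g. response times under two configurations
  a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
  b = [12.6, 12.4, 12.9, 12.5, 12.7, 12.2]

  t, p = stats.ttest_ind(a, b)   # two-sample, two-sided t-test
  print(t, p)

  # the tool only reports a p-value; whether p < 0.05 (or 0.01, or ...)
  # is good enough is a judgement about the problem, not the software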

I think the problem with statistics is that we don't do enough of it. My undergrad had one class in it, and we never used it outside that class. Yet when I look at skills needed in the real world, statistics is high on the list - higher than many engineering courses I took, and definitely higher than most mathematics courses I took. Most engineers never use calculus outside of school - yet they can still do basic differentiation and integration. But most of them cannot do basic statistics - and many of them have a need to do statistics on the job.

I think typical engineering courses should try hard to build statistical elements into their homework assignments.


> 1. (...) Go google how to do basic significance calculations in, say, Excel. The first paragraph of the resulting articles will dive into t-tests and two-tailed tests and P-values and on down the line.

Understanding significance calculations and performing them in Excel are completely different things, and it's perfectly reasonable for a learning resource to cover one of them but not the other. Software developers, the primary target audience of this website, should know this as “separation of concerns”.

> 2. (...) Software needs to bridge the gap between users' intent and their ability to do the analysis.

All statistical software can do for you is perform calculations. The meaning of the results or even the calculations themselves is completely up to you. You can even decide it's meaningless!

> 3. (...) Your c-suite doesn't care about that: it cares about the so-what.

This is exactly the problem. People want to make the right decisions, but they don't want to take the trouble to understand how reality's complexity affects the outcomes of these decisions.


Learning statistics is like learning anything else. Kids who want to make a video game are often dismayed when they realise how much they'd have to learn. If they ask how you put a walking character controlled by the mouse and keyboard on the screen, you'd have to explain 3D graphics, the game loop, and input devices. For 3D graphics you need vectors, transformation matrices, etc. Eventually you get down to basic programming constructs like loops. As with statistics, it can seem like an infinite, fractally expanding amount of knowledge. A simple question, such as how you draw a moving character or how you calculate whether a difference is statistically significant, has a complicated answer. The knowledge required isn't infinite or tautological or circular. With statistics it eventually bottoms out, but to really understand a t-test you need at least probability theory and calculus. College students spend a year studying that.


Probability theory requires a solid grounding in real analysis. Undergraduate real analysis alone is one year worth of material, and this is assuming that you already know how to write basic proofs. If all you know is some calculus tricks, as taught to engineering students, then you can probably compute probabilities, but you aren't ready to understand probability theory.

Fortunately, most users of statistical tests don't really need to know probability theory anyway. This even includes some statisticians.


With today's computers you can get a lot done without knowing the t-distribution and p-values, just using the bootstrap and Monte Carlo. See 93 conversions in test and 87 in control and want to know if this is just noise? Pull 10,000 bootstrap samples and look at the distribution. You can even do it in Excel if you want, but writing a loop in any programming language is not that difficult either.
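
Roughly like this in Python (the 1,000-visitors-per-arm figure is made up purely for illustration; the comment above only gives the conversion counts):

  import random

  def bootstrap_diffs(conv_a, n_a, conv_b, n_b, reps=10_000):
      # rebuild each arm as 0/1 outcomes, then resample with replacement
      arm_a = [1] * conv_a + [0] * (n_a - conv_a)
      arm_b = [1] * conv_b + [0] * (n_b - conv_b)
      diffs = []
      for _ in range(reps):
          a = sum(random.choices(arm_a, k=n_a))
          b = sum(random.choices(arm_b, k=n_b))
          diffs.append(a - b)
      return diffs

  # 93 vs 87 conversions, assuming 1,000 visitors in each arm
  diffs = bootstrap_diffs(93, 1000, 87, 1000)
  # how often does resampling alone push the difference to zero or below?
  print(sum(d <= 0 for d in diffs) / len(diffs))

If that fraction is large, the 93-vs-87 gap is comfortably inside the noise.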


Yea, but I wouldn't trust someone's bootstrap estimator if they don't know how to interpret a t-test.


At least a partial antidote for #1: https://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick...

For #2: In my experience, it takes a long time to go from "the basic question" to a well-founded statistical question for which analysis is appropriate.

For #3: The best managers and analysts will find ways to ask questions and meet in the middle between technical complexity and the business/science problem at hand.


Re point 2.. I am a bit of an Evan Miller fanboy, but would like to make a shoutout to Wizard anyway. https://www.wizardmac.com/ is a fantastic piece of software.


"I consider myself analytically-minded, and statistics still gives me a headache"

"... blah blah etc."

I suspect you do yourself a disservice. I think you instinctively know how to interpret and analyse discussions that involve statistics. That will include seriously complex and probably abstruse (to the layman) discourse. You are probably not a bleeding edge exponent and probably not in the habit of dropping papers on the world.

However, I think you might be able to mess around with equations involving mu 'n' sigma (inter alia) without breaking a sweat.


While there are problems with the way statistics are used by scientists and others, the elephant in the room is the incentive system. p-hacking and the like result from the need to get "exciting and publishable" results on a consistent basis or effectively be fired (the simple path is: no "sexy" results -> no "high quality" papers -> no grant funding -> no job).

A possible solution that I have not seen proposed (this is most likely my ignorance) is for journals to only accept/reject research proposals, not finished papers. The journal would publish the resulting paper no matter the outcome of the research.


The problem is that journals have incentives too. And these unfortunately often align: sexy results -> more income.


This is true, but it is surprisingly weak as the income of a journal is not too closely related to its impact factor.

My proposal would still mean that Nature would get the most sexy papers because it would get the most sexy proposals. A bigger problem would be journals choosing proposals from authors with a history of getting sexy results rather than on the quality of the proposal. This reliance on "track record" is why the grant system is so broken.


Isn't it? I mean, I know there's no hard data on it since most journal subscriptions are bought in bulk, but the journals are using their impact factor to sell those subscriptions (case in point: [1]). Likewise, if they charge publication fees, authors will be most willing to pay large sums of (their funders') money for journals with high impact factors.

And do you really think Nature would still get sexy results if there are sexy proposals? Or would your proposal (which I do agree with, for the same reason the traditional publishers won't do it:) lead to better research but fewer "sexy" results? After all, journals with a high impact factor are also known to have more retractions.

(The traditional) publishers are going to keep doing what is bringing in the most money - or at least, what they believe will. At this point, they don't believe that your system would lead to higher income.

What could change that is if authors are no longer incentivised to chase high-impact-factor journals. But that means there will have to be better ways to evaluate them that will actually get adopted (i.e. not require evaluators to comb through every applicant's research [2]).

This is a difficult problem. And yes, I realise I'm not coming up with a solution. The only ideas I have on that front have a really small chance of success, but it's a bit too much to elaborate on that here. (Although in time I will write about them at [3].)

[1] https://mobile.twitter.com/simoxenham/status/235698457124425...

[2] https://theconversation.com/why-i-disagree-with-nobel-laurea...

[3] https://medium.com/flockademic


See also further discussion on Andrew Gelman's blog: http://andrewgelman.com/2017/11/28/five-ways-fix-statistics/


Step 1: Become Bayesian.

Seriously though, I am inclined towards approaches, such as pre-registration, which limit the number of researcher degrees of freedom in analysis. It's not necessarily that statistics are broken. It's that the system incentivizes researchers to break the assumptions underlying these statistical tests.


What informs your priors?

That is the core issue with practical applications of Bayesian statistics. Especially because it is a totally new parameter that people can complain about. At best this leads to rampant bikeshedding, at worst we get prior-hacking as opposed to p-hacking.


What I wonder is, given a system with perverse incentives, won't people find a way to abuse Bayesian statistics?


I'm sure they would! But there's a pretty strong argument it's a step in the right direction. Actually, this link makes a more modest proposal: that papers should report likelihoods rather than p-values. This avoids reported results depending on priors (which perhaps we don't trust authors to choose well), though a reader can easily impose their own if they want to.

https://arbital.com/p/likelihoods_not_pvalues/

You still have the option to muck with things by choosing your hypothesis class in a bad way-- nothing can really replace publishing data!
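
For a toy picture of "report likelihoods, let readers bring their own priors" (the data - 8 successes in 10 trials - and the two point hypotheses are invented here):

  from math import comb

  def binomial_likelihood(k, n, p):
      # P(data | p): probability of k successes in n trials at rate p
      return comb(n, k) * p**k * (1 - p)**(n - k)

  k, n = 8, 10                      # made-up data
  for p in (0.5, 0.7):              # two candidate hypotheses
      print(p, binomial_likelihood(k, n, p))

  # the likelihood ratio is prior-free; readers multiply it by their
  # own prior odds to get posterior odds
  print(binomial_likelihood(k, n, 0.7) / binomial_likelihood(k, n, 0.5))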


Probably... but with Bayesian methods your model/data updates its priors, rather than you effectively embedding your prior beliefs into your models by selectively choosing tests that support them.
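
As a toy illustration of the data updating the prior (the flat prior and the counts are arbitrary, just to show the mechanics of a conjugate Beta-Binomial update):

  # Beta(a, b) prior on a conversion rate; with binomial data the
  # posterior is again a Beta: just add successes and failures
  prior_a, prior_b = 1, 1            # flat prior, an arbitrary starting point
  successes, failures = 93, 907      # made-up observations

  post_a = prior_a + successes
  post_b = prior_b + failures
  print(post_a / (post_a + post_b))  # posterior mean of the rate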


It's very easy to let that slip into post hoc justification of your priors.


In my view, "prior" may be a misnomer. There is nothing that I'm aware of in Bayes' theorem to suggest that you have to formulate your priors before gathering or analyzing your data. I would describe priors as constraints that are included in an analysis, to narrow the results based on additional information that you're aware of. Bayes' theorem mainly provides a framework for computing what happens when you do that.


Maybe the prior doesn’t need to be formulated _before_ getting the data, but it needs to be _independent_ of the data.


Why become Bayesian? The rational-actor theorems that drove that in the '60s turn out not to apply if you use a class of priors instead of a single prior, and single priors are too specific to capture belief in most cases.

Better to go all the way down to decision theory.


Do you have some fuller description of, or some links to, the work you are referring to?

Also, would decision theory replace Bayesian inference? Would it not rather _use_ it?


> Worse, NHST is often taken to mean that any data can be used to decide between two inverse claims: either ‘an effect’ that posits a relationship between, say, a treatment and an outcome (typically the favoured hypothesis) or ‘no effect’ (defined as the null hypothesis).

This is the whole problem: the rejection of the null hypothesis is not just a ritual to follow before you accept the alternative. Fisher's core idea is that it only takes one piece of evidence to reject something, while it should take many to confirm something.

So the only conclusion you should draw is about what you've rejected. You can't then take that and infer the acceptance of something else. It's a subtle but important distinction that is the whole reason for this process.


Likelihood functions, p-values, and the replication crisis: https://arbital.com/p/likelihoods_not_pvalues/?l=4xx.


Is there a calculus-based approach to statistics? Like understanding linear regression simply as a minimization problem, but ALL THE WAY, i.e. extending to other statistical techniques?


The stats course that I took in college was certainly calc-based. There were two stats options:

"Stats for scientists" was a one semester course, mostly involving plugging numbers into formulas. There was lots of hypothesis testing.

"Math stats" was for math majors. It was two semesters, and focused mainly on proofs. I took math stats. I also ran a tutoring session for the scientists.

A problem is that the basic stats course is taken by a lot of students who wouldn't have gotten through calculus. So, building stats on top of calculus would have created two forbidding layers of abstraction instead of one.

On the other hand, I graduated from college in 1986, and we did all of our calculus by hand. I wonder if a potential compromise today would be to teach stats by exploration using random numbers. You're still doing integration, albeit numerically, but maybe it wouldn't seem so forbidding. And by playing with random numbers, you can learn the hard way what erroneous conclusions you can draw from them.
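
For example, a sketch of the kind of random-number exercise I mean (the sample size and threshold are arbitrary): simulate lots of experiments in which both groups come from the same distribution, and count how often chance alone produces a gap that looks convincing.

  import random

  def fake_experiment(n=100, rate=0.1):
      # two groups drawn from the SAME conversion rate: any gap is noise
      a = sum(random.random() < rate for _ in range(n))
      b = sum(random.random() < rate for _ in range(n))
      return a - b

  diffs = [fake_experiment() for _ in range(10_000)]
  # how often does pure noise produce a gap of 5 or more?
  print(sum(abs(d) >= 5 for d in diffs) / len(diffs))

That tail frequency is an integral, estimated numerically, and seeing how often it fires is a cheap way to learn how easily noise masquerades as an effect.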


To build on fny's answer, there is one school of statistics (Bayesian statistics) where there basically is a "right" way to do analysis for any problem, provided you make the necessary assumptions (likelihood and prior) correctly. However, the most common statistical concepts (e.g. p-values or confidence intervals) are not in the Bayesian school.


Credible intervals or highest posterior density intervals are arguably much closer to what most people think a confidence interval is.

I'm not sure there is a single right way to solve any given problem in a Bayesian way, but it does force you to think more about the problem at hand and make your assumptions explicit.


It's called probability theory, but that's not going to resolve these issues.

If you're interested, there's a great 100-level course online from Joe Blitzstein at Harvard: https://projects.iq.harvard.edu/stat110/home


It's called “probability theory”, but is it actually probability theory? How can you have a course in “probability theory” that doesn't mention the Lebesgue integral even once?


I took a course in "probability theory" that never mentioned Lebesgue integrals. Those came in measure and integration later.

The course was about distributions, how product distributions interact, the central limit theorem, etc. That is, it was about turning stochastic models into predictions. Later we had a course, "Statistics", which was about matching observed results to stochastic models.


> The course was about distributions,

How can you study these without knowing the Lebesgue integral?

> how product distributions interact

How can you do this without the Fubini-Tonelli theorem?

etc. etc. etc.


By talking about the operations without giving the rigorous backgrounds. Heck, you can do quite a bit with Riemann integrals, and for e.g. binomial distributions you just need sums.


This is exactly what leaves many people feeling like mathematics is a bunch of unmotivated tricks and rabbits pulled out of a magician's hat.


This was during a BSc in mathematics. Like I said, we later got the theoretical grounding.

Starting with measure theory would be what leaves people feeling that mathematics is useless formal bickering.


Statistics/probability theory is really deeply tied to calculus/real analysis. Basically any quantity you can think about in statistics or probability is really an integral. The trick is that they aren't Riemann integrals, but a more advanced type called Lebesgue integrals, which have to do with measure theory.

The main difference between the two is that a Riemann integral weights every region of space the same ([0,1] counts just as much towards your integral as [1,2]), while a Lebesgue integral is taken with respect to what is called a measure, a function that maps sets to non-negative numbers. You can use that function to weight different parts of your integral differently. Now what you can do is pick a measure that maps each set to how likely it is to happen (which is exactly a probability measure), and the usual statistical quantities fall out as integrals against it.
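
In symbols (standard textbook material, not anything specific to this thread): the probability measure P assigns each event a number in [0,1], and the familiar quantities are Lebesgue integrals against it, e.g.

  % (\Omega, \mathcal{F}, P): sample space, events, probability measure
  \mathbb{E}[X] = \int_\Omega X \, dP,
  \qquad
  \operatorname{Var}(X) = \int_\Omega \bigl(X - \mathbb{E}[X]\bigr)^2 \, dP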


I think this is an interesting collection because most proposals acknowledge that statistics cannot be fixed by means of statistics alone but rather by taking into account the human factor. Bayesian statistics won't keep humans, who want to achieve something that's not primarily about statistics, from doing stupid things.


One can imagine workflows that combine some of these solutions. Imagine pre-registering experiments and analyses... and then fellow scientists privately weigh in with their predictions for the results. These predictions can then be used to form a prior for a false-positive risk calculation.


>We need more observational studies and randomized trials — more epidemiology on how people collect, manipulate, analyse, communicate and consume data.

I don't understand what is meant by "epidemiology" in this sentence.



