Simpson's Paradox (vudlab.com)
210 points by mmaia on Sept 18, 2013 | 50 comments



The Omitted Variable Problem is part of my mental framework that causes me to not believe most epidemiological studies, especially ones that confirm a popular belief.

The almost universally omitted variable is health-consciousness. Some people are health-conscious and some aren't. People who are health-conscious do a whole bunch of things, some of which help (like exercise, sleep well, eat moderately). They also do things that are widely believed to be good for you, like eating broccoli.

So if you do a study, you'll find that people who eat lots of broccoli are healthier. You'll be able to confirm pretty much any widely believed health folk wisdom, unless it's something quite harmful, as long as you omit health-consciousness as a variable.
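A quick simulation makes this concrete (all the numbers are made up for illustration). In this toy world a single latent trait, health-consciousness, drives both broccoli-eating and health, and broccoli itself does nothing; the naive comparison still says broccoli works:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Latent trait: health-consciousness (the omitted variable).
    hc = rng.random(n) < 0.5
    # It drives both broccoli-eating and health; broccoli has zero effect.
    broccoli = rng.random(n) < np.where(hc, 0.8, 0.2)
    healthy = rng.random(n) < np.where(hc, 0.7, 0.4)

    # Naive comparison: broccoli "works" (~0.64 vs ~0.46).
    print(healthy[broccoli].mean(), healthy[~broccoli].mean())

    # Stratify on the omitted variable and the effect vanishes.
    for group in (hc, ~hc):
        print(healthy[group & broccoli].mean(),
              healthy[group & ~broccoli].mean())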


Don't throw the baby out with the bathwater. There are ways of calculating whether a hidden variable is more likely to cause the observational data than the independent variables of your model.

A simple example would be a wet lawn. We know rain causes a wet lawn, and our observation shows indeed that rain and wet lawns are strongly associated. However, observing a case where a given lawn is wet, and yet there's no associated rain is a clear signal that a latent cause hasn't been accounted for (namely sprinklers). This principle still applies in noisy observational data or probabilistic rather than deterministic causal relations, though you do need a bigger sample to reach the same confidence.

We also can look at measures of model fitness. If a variation of the model that hypothesizes a latent causal variable is more likely to generate the observed data than the model without it, we know we've missed something. The general case of this is that we learn the model itself from the observed data.
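Here's a sketch of both ideas on the wet-lawn example (probabilities invented for illustration). The naive "rain is the only cause" model gives wet-but-no-rain cases essentially zero probability, so even a handful of them is a strong signal, and a model with a free parameter standing in for the latent cause fits the data far better:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 10_000

    # Hypothetical ground truth: rain OR a hidden sprinkler wets the lawn.
    rain = rng.random(n) < 0.3
    sprinkler = rng.random(n) < 0.2        # latent; the naive model omits it
    wet = rain | sprinkler

    print((wet & ~rain).mean())            # ~0.14: wet-but-no-rain is common

    # Compare model fitness on the no-rain cases: the naive model says the
    # lawn is (almost) never wet without rain; the augmented model fits
    # P(wet | no rain) as a free parameter.
    k = (wet & ~rain).sum()                # wet without rain
    m = (~rain).sum()                      # all no-rain cases

    def log_lik(p):
        return k * np.log(p) + (m - k) * np.log(1 - p)

    print(log_lik(1e-6), log_lik(k / m))   # the latent-cause model wins decisively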

This page from Kevin Murphy is a reasonable survey of the methods: http://www.cs.ubc.ca/~murphyk/Bayes/bayes.html I also recommend his textbook.


Surely some studies account for this? If you look for people who do X and people who don't and just analyze their lives, yes, this problem is likely to exist. But if you take two randomized samples of the population and say to group A, "do X," and to group B, "don't do X," you have an effective control group. At least I think so. Don't some dietary studies even provide the participants with custom food regimens to try to eliminate extra dietary differences between study participants?


Epidemiologist here. Many studies account for it; we are acutely aware of this problem and have some sophisticated methods for dealing with unmeasured propensities. And randomized trials are "epidemiologic" studies too. Not everything in epidemiology is observational. Simpson's paradox is taught in 1st-year intro-to-epi courses. There is also a lesser-known "reverse Simpson," which comes from overgeneralizing specific results.


You're describing a randomized controlled trial and not an epidemiological study.


You are mistakenly conflating "epidemiological" with "observational".

Epidemiology studies can be randomized or observational.


Interesting...what would be an example of a randomized epidemiological study?


That's one way to do it (the best way). You can still draw "causal" conclusions from observational studies (where you don't get to explicitly assign people to groups A and B, but where those assignments are given to you), but it is much harder and you have to be very careful with what you are doing. If you are interested in that topic I suggest you google "Rubin Causal Framework", or check out some of the papers by Donald Rubin (arguably the pioneer of the field of Causal Inference).
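For a taste of what that framework buys you, here is a minimal inverse-propensity-weighting sketch on simulated data - one of several Rubin-style estimators, with made-up numbers:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    n = 50_000

    # A confounder (think health-consciousness) drives both treatment and outcome.
    conf = rng.normal(size=n)
    treat = (rng.random(n) < 1 / (1 + np.exp(-conf))).astype(int)
    y = 1.0 * treat + 2.0 * conf + rng.normal(size=n)   # true effect is 1.0

    print(y[treat == 1].mean() - y[treat == 0].mean())  # naive diff, far above 1.0

    # Estimate each unit's propensity to be treated, then reweight by its
    # inverse so the two groups become comparable on the confounder.
    ps = LogisticRegression().fit(conf[:, None], treat).predict_proba(conf[:, None])[:, 1]
    w = np.where(treat == 1, 1 / ps, 1 / (1 - ps))
    ate = (np.average(y[treat == 1], weights=w[treat == 1])
           - np.average(y[treat == 0], weights=w[treat == 0]))
    print(ate)                                          # close to the true 1.0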


What you're describing is the "as long as you omit health-consciousness as a variable" part of his comment.


If it was a huge deal, wouldn't epidemiologists be slobbering over the statistical power they would get from using it?

That is, if they can control for it, they can make assertions about smaller effects.


> If it was a huge deal, wouldn't epidemiologists be slobbering over the statistical power they would get from using it?

Why would you want to do that? You can build a nice career on publishing (spurious) associations. As long as relevant variables are omitted, no epidemiologist need ever be unemployed.


That may get you written about in popular-science magazines, but your work will have a low impact factor - this is largely measured in how many of your fellow researchers cite your work, and it's crucial to an academic researcher's career.

If your colleagues can easily see that your statistical analysis is flawed, they won't waste their own time doing work that builds off of yours.


It is not necessarily true that just because they are called scientists, real people with real motives, funded by organizations of real people with real motives, operating in a political regime controlled by real people with real motives, will act that way. (By "that way" I mean "your work will have a low impact factor- this is largely measured in how many of your fellow researchers cite your work" and "If your colleagues can easily see that your statistical analysis is flawed, they won't waste their own time doing work that builds off of yours.")

See e.g. Chan and Boliver "The Grandparents Effect in Social Mobility", http://asr.sagepub.com/content/78/4/662 (HT http://www.arnoldkling.com/blog/a-grandparent-effect/) and many of the works that it cites. There seems to be a sizable mutually-citing "scientific" literature devoted to studying correlation in biological descent while dogmatically not controlling for ordinary DNA-based inheritance of psychological traits from parent to child. (Indeed, not even controlling for DNA-based inheritance of physical traits. Health and height and attractiveness are correlated with income, and are significantly physically heritable.) By not controlling for the properties which are transmitted by sperm meeting egg, you can find all sorts of fascinating possibilities to investigate to explain the correlation. (E.g. Chan and Boliver carefully note that "well-connected grandparents could also use their social contacts to help grandchildren with job searches." Good, good, carefully investigate all the possibilities.)


> but your work will have a low impact factor - this is largely measured in how many of your fellow researchers cite your work, and it's crucial to an academic researcher's career.

Nope! This is how we might like it to work, but cool associations will get written up forever, long after they have been refuted by randomized experiments. (Heck, papers can be outright retracted and still get cites.)

Hahaha - did I say refuted by randomized experiments? Like that ever happens to more than a tiny fraction of epidemiology correlations... No, one's career is quite safe. No one ever lost tenure because the correlations they built their career on turned out to be lame.

> If your colleagues can easily see that your statistical analysis is flawed, they won't waste their own time doing work that builds off of yours.

No, the point here is that your work can be immaculate and still never reflect causality. How are you going to collect data on every lurking variable? You can't, of course.


I'm optimistic enough to believe in a few wild cards trying to do good science. A whole bunch of the rest have to put a little bit of effort into competing with them.

It occurred to me a while after my other comment that a belief in the ability of people who test as health-conscious to actually do things that improve their health might not be especially justified. I mean, lots of people are quite happy with whatever quackery, so maybe it really does all come out in the wash.


In his article "Science pseudoscience nutritional epidemiology and meat" http://garytaubes.com/2012/03/science-pseudoscience-nutritio... Gary Taubes discusses how epidemiological studies can fail to account for something called the "compliance effect" - the missing variable.

It was an eye opener for me.


He quotes a textbook chapter in his conclusion. So the idea that you have to at least be careful with your statistics is orthodoxy, not some missing component of the field.

(I realize that doesn't make any guarantees about common practices, but it is maybe a little capricious to repudiate the entire field)


Controlling for hidden variables (the general case of this paradox) is a big deal in all areas of science.


> What Simpson's paradox is not

> Every omitted variable problem

Not quite the same thing as the OP (not sure if you were implying it was). Regardless, OVP is still a great thing to have in your mental framework.


Simpson's Paradox is scary if you've not seen it before, but you should not stop here: proceed immediately to omitted variable bias and the conversation here (http://normaldeviate.wordpress.com/2013/06/20/simpsons-parad...).

In short, Simpson's paradox occurs because probability distributions and causal claims are distinct things which behave differently. It's nothing more than a particular, insidious example of correlation not implying causation. It's perfectly possible for a probability distribution to have a "contradictory" shape, but perfectly impossible for logical statements about the world to be contradictory.

The resolution is that you shouldn't let your probability distributions turn into logical statements without analyzing your causal assumptions. This will tell you whether excluding a variable amounts to omitted variable bias (and whether including an improper one leads to included variable bias, which is rarely recognized).
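A quick numeric sketch of that (the counts are invented, but the per-department rates match the article's 62%/80% and 26%/27% figures):

    import pandas as pd

    df = pd.DataFrame({
        "dept":     ["easy", "easy", "hard", "hard"],
        "gender":   ["men", "women", "men", "women"],
        "applied":  [800, 100, 200, 900],
        "admitted": [496, 80, 52, 243],
    })

    # Per department, women lead: 80% vs 62% (easy), 27% vs 26% (hard).
    print(df.assign(rate=df.admitted / df.applied))

    # Pooled, it flips: men ~0.55, women ~0.32, because women mostly applied
    # to the hard department. Same distribution, opposite "conclusion".
    pooled = df.groupby("gender")[["applied", "admitted"]].sum()
    print(pooled.admitted / pooled.applied)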


The point of an interactive illustration like this is that it should make the core concept more intuitive and easier to understand, not more confusing.

What is the meaning of the green and purple lines on the graph, why do they have different gradients and why can't I adjust them? Why does the Simpson's Paradox apply sometimes and not others? Why are there so many bars and donut charts? What does the gray circle around the donuts mean? Information overload.


I feel as though both graphs with purple and green lines are accurately explained. Were you trying to skim through the article, or were you still left confused after reading it? The top one is explained in the accompanying text and the bottom one is explained with labeled x and y axes and relies on the information provided above.


If the article describes the subject adequately, what is the point of the graphic? I found the interactive graphic more confusing than the description.

Compare and contrast to the clarity of their central limit theorem demonstration [1] vs its Wikipedia page [2].

[1] http://blog.vctr.me/posts/central-limit-theorem.html

[2] http://en.wikipedia.org/wiki/Central_limit_theorem


By that, I meant that the article described what the graph was representing, not that the article described the information presented adequately. Also, I absolutely disagree with the assumption you made that graphs shouldn't be present if the information is presented in the article. It's good to provide visual demonstrations when applicable in order to help clarify subjects.


It took a little playing around to figure out some of the graphics.

The donuts indicate which group (men or women) was admitted more in each kind of department.

Play around with the sliders and you can see it toggle in the combined row, while simultaneously a yes/no answer updates after the question "Simpson's paradox?" It would really help if yes/no was highlighted. Basically, you can make the paradox appear by dropping the % of women applying to easy departments. You can't make the donut switch in easy or hard departments individually... that's the whole point: women have a slight advantage in each.

Honestly, I found the "Proper Pooling" section to have the most intuitive explanation. Combined, there is a bias. By department, there is no bias. The reason is clear: very few women applying to easier departments.
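One way to see why proper pooling works: standardize both groups to the same department mix before combining, so who-applied-where can't drive the aggregate. A sketch using the article's fixed rates (the 50/50 mix is an arbitrary choice; any shared mix would do):

    rates = {"men":   {"easy": 0.62, "hard": 0.26},
             "women": {"easy": 0.80, "hard": 0.27}}
    common_mix = {"easy": 0.5, "hard": 0.5}

    for who, dept_rates in rates.items():
        std = sum(dept_rates[d] * common_mix[d] for d in common_mix)
        print(who, round(std, 3))   # men 0.44, women 0.535: no flip possible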


Agreed. The text is straightforward enough to understand, but the graphics are irritating and don't contribute much.


It's a bit confusing, but overall, I really love the presentation of this article. I wish more serious research was presented like this, with lots of knobs for fiddling and experimentation. (See also: Bret Victor.)


It doesn't help that things aren't labelled very well. I had to fiddle with things to work it all out. But at least I could, which gives it a leg up over an equivalently labelled static graph.

In the doughnut charts, blue is the acceptance rate, red the rejection rate (as labelled earlier in the article).

The gray circle indicates which has a higher percentage of acceptance between male and female.

The y-axis on the line graph is the %age admitted combined over both departments. As you drag the women slider, you'll see the %age accepted for women combined over both departments matches the y-value of the women's line in the line chart.

The illustration is to show how the distribution of applicants between the easy and hard departments (the lurking explanatory variable) affects the combined acceptance percentages (the explained variable). This is the focal point, so all other variables need to be fixed.

The fixed values are: in the easy department, 62% of men are accepted and 80% of women are accepted. In the hard department, 26% of men are accepted and 27% of women are accepted. In both departments women are slightly favoured over men, so both always have a gray circle around the women's charts.

There are 1,835 women applicants and 1,362 men applicants, making a total of 3,197.

The combined acceptance rate is derived from this static data and the user-controlled variable of the distribution of applicants between the departments. You can pick some arbitrary inputs and trace the maths through to calculate how many apply to each department, and how many of those get accepted, then add that up to find the combined value (be warned: the actual percentages aren't the nice rounded ones they're labelled as).
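If you want to check it without the sliders: the combined rate is just a weighted average of the fixed per-department rates, weighted by where each group applies. A sketch (the 0.80/0.10 splits are example slider positions that produce the flip):

    def combined_rate(frac_easy, rate_easy, rate_hard):
        # weighted average over the two departments
        return frac_easy * rate_easy + (1 - frac_easy) * rate_hard

    men = combined_rate(0.80, 0.62, 0.26)     # most men pick the easy dept
    women = combined_rate(0.10, 0.80, 0.27)   # few women do
    print(men, women)                         # ~0.55 vs ~0.32: the paradox appears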

While the point on the purple line is below the point on the green line on the y-axis, the combined acceptance percentage for women is larger than the men's. There is no Simpson's paradox here, because taking the lurking explanation into account - that the distribution of applicants between departments has an effect - does not change the outcome. Women are favoured in all three measurements; the gray circle is around all the women's charts.

When the point on the purple line is above the point on the green line on the y-axis, the combined acceptance percentage for women flips to being smaller than the men's. Here we have Simpson's paradox - taking the lurking explanation into account changes the outcome. Women are favoured in the breakdown by department, but men appear favoured in the combined statistic.

I found the interactive chart helped me figure out this explanation, but it wasn't helped by the poor labelling and the fact that they seem to have used real (read: messy) statistics for the underlying calculations rather than the nice rounded "62%" etc. they label.


Reminds me of Anscombe's Quartet: four datasets that have the same mean, variance, correlation, and linear regression, but are really very different. http://en.wikipedia.org/wiki/Anscombe%27s_quartet
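If you want to poke at it, seaborn ships a copy of the quartet (assuming you have seaborn installed):

    import seaborn as sns

    df = sns.load_dataset("anscombe")
    for name, g in df.groupby("dataset"):
        print(name, round(g.x.mean(), 2), round(g.y.mean(), 2),
              round(g.x.var(), 2), round(g.y.var(), 2),
              round(g.x.corr(g.y), 3))
    # Each dataset prints roughly 9.0, 7.5, 11.0, 4.13, 0.816 - then
    # sns.lmplot(data=df, x="x", y="y", col="dataset") shows how different they are.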


maybe we'll do that one next!


Judea Pearl has formalized a resolution to Simpson's Paradox:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.34....

An oversimplification of his idea would be to say that given assumptions about the causal independence of variables, it becomes clear which way you should group the data. Although it's not always possible to make these independence assumptions, much of the time they are obvious and uncontroversial.

Taking the Berkeley gender bias case as an example: We know it's possible that biological gender influences which department graduates apply to, but that it's impossible that the department a graduate applies to influences their biological gender. This fact alone tells us that we need to look at the data by department rather than in aggregate, resolving the paradox.


Well, it certainly makes more sense to look at the by department data rather than the aggregate data. But I would say that the fact that we see such large disparities in which departments are applied to is already enough to refute the idea that male and female applicants are pretty much the same, and that therefore it would be pretty reckless to conclude any sex discrimination based on the difference in acceptance rates.


And Pearl's point is that the "real world" logic you just applied is both important and not statistical—you need to augment your analysis with these causal assumptions in order to translate probability into meaningful causal statements.


That's a great reference. I'm going to use it from now on too.


I won't claim to have fully internalized all of it, but Judea Pearl's book Causality is incredible.


I've read a lot of his papers and I own Causality, but unfortunately it ended up at my parents and I always forget to grab it when I'm there. I really would like to get further into the book.

But this is a great, fast example of why his work is important.


Reading up on Simpson's Paradox again made me realize something: women obtaining custody more often than men appears to me to be a perfect example of Simpson's Paradox. Overall, women get custody more often than men, this is true, but if you consider only the cases where men actually asked for or sought custody, this is no longer true[0].

[0]: http://www.villainouscompany.com/vcblog/archives/2012/04/chi...


How to interpret that, though? Seems likely to me that who asks for custody would be influenced by their chances of receiving it. How many of those cases are "contested custody"?

Then that article places the claim "Additional evidence, however, indicates that women may be less able to afford the lawyers and experts needed in contested custody cases (see “Family Law Overview”) and that, in contested cases, different and stricter standards are applied to mothers." without providing any data. So it ends up being a propaganda piece, which makes it not very trustworthy.


FWIW, Simpson's paradox is a "veridical" paradox, not a "vertical" paradox as noted in the article. Apparently veridical isn't yet accepted by spelling checkers. (As I'm writing, Chrome offers the single suggestion "vertical" for my "veridical".)


For me this was far better explained in the following text (also posted on HN not so long ago): http://www.michaelnielsen.org/ddi/if-correlation-doesnt-impl...


It's nice visually, but the text-based information could be more thorough - which would result in fewer links at the bottom. The more info contained on the page the better!


Anyone interested in this stuff should definitely watch the lectures from the Stats 110 class from Harvard that's up on iTunes U (and perhaps other places).

Lecture 6 talks about this paradox and if I recall correctly he might even talk about this exact case:

https://itunes.apple.com/us/course/statistics-110-probabilit...


That is some sexy javascript.


My first thought as well


thank you! :D


I had a hard time understanding the last graphic, because I was reading on an iPad, and the sliders don't work. I thought it was all static, and couldn't match the numbers in the graph with the previously mentioned numbers, as the defaults of the sliders do not give a Simpson's paradox.


Really interesting text, thanks.

But I think this last statement

"or Texas schools to waste money copying Wisconsin."

was meant to be the other way around


I may have read it wrong, but I think the statement is correct. Texas minorities outperformed Wisconsin's, but the stats made it look like Wisconsin's minorities did better.


To elaborate: Black students in Texas outperform black students in Wisconsin, hispanic students in Texas outperform hispanic students in Wisconsin, and white students in Texas outperform white students in Wisconsin.

Overall test scores are higher in Wisconsin only because white students in both states are the highest scoring group, and Wisconsin schools have a higher proportion of white students than Texas schools.

To the extent that any of this can be copied, it would probably be better for Wisconsin to improve the test scores of its ethnic groups to Texan levels, rather than for Texas to emulate Wisconsin's lily-whiteness.


Those interactive charts and graphs!! (My first encounter with vudlab and am loving it!)



