
Shouldn't they control for that? Otherwise it's simply correlation rather than causation.



I don't think that controlling for variables can produce a conclusion of causation. (I'm posting half because I think this is correct, and half because I'd love to hear someone more knowledgeable about statistics confirm or deny this :) )

I think that controlling for variables means that you've tried to reduce the impact of other variables on the two that you're interested in. From Wikipedia: "In statistics, controlling for a variable is the attempt to reduce the effect of confounding variables on an observational study. It means that when looking at the effect of one variable, all other variable predictors are held constant." [1]

If you could control for all other variables, then you'd know how strong the connection is between the two variables you're looking at.

I think this doesn't guarantee causation, though - you'd need to do experiments where you adjust the independent variable and then verify that the dependent variable changes the way you're proposing it should.
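To make the contrast between observing and experimenting concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: a single binary confounder `z`, a true treatment effect of 1.0, and the specific probabilities. In the observational setting `z` influences who gets treated; in the experimental setting treatment is assigned by coin flip.

```python
import random
from statistics import mean

TRUE_EFFECT = 1.0  # assumed effect of the treatment on the outcome (illustrative)

def difference_in_means(randomized, n=20000, seed=1):
    """Simulate n units and return mean(outcome | treated) - mean(outcome | untreated)."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0  # hypothetical confounder
        if randomized:
            t = rng.random() < 0.5  # experiment: treatment ignores z
        else:
            t = rng.random() < (0.8 if z else 0.2)  # observation: z drives treatment
        y = TRUE_EFFECT * t + 2.0 * z + rng.gauss(0, 1)  # z also drives the outcome
        (treated if t else control).append(y)
    return mean(treated) - mean(control)

observational = difference_in_means(randomized=False)
experimental = difference_in_means(randomized=True)
print(f"observational: {observational:.2f}, experimental: {experimental:.2f}")
```

Under these assumptions the observational difference is inflated (about 2.2 in expectation) because treated units are more likely to have `z = 1`, while the randomized difference lands close to the true effect of 1.0.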

At least, I think that's how it works. Anyone else want to chime in?

[1] https://en.wikipedia.org/wiki/Controlling_for_a_variable


Yes, you're right. To guarantee causation, you need to either directly manipulate the variable, or control for every other variable in the universe (impossible). The more extraneous variables you control for, though, the more evidence you have for causation.


Not true, you do not have to control for everything. You have to control for everything that may have an effect on both the treatment and the outcome. That implies you have to assume that everything you don't control for does not affect both the treatment and the outcome (though it can affect one of them). This assumption is not testable but sometimes reasonable.
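A rough sketch of what "controlling for" a shared cause looks like in practice, using stratification. The data-generating process is entirely assumed for illustration: one binary variable `z` that affects both treatment and outcome, and a true treatment effect of zero. The names `simulate`, `naive_diff`, and `adjusted_diff` are hypothetical helpers, not from any library.

```python
import random
from statistics import mean

def simulate(n=20000, seed=0):
    """Return rows of (z, t, y) where z affects both t and y, and t has no effect on y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0       # shared cause (confounder)
        t = rng.random() < (0.8 if z else 0.2)   # treatment depends on z
        y = 2.0 * z + rng.gauss(0, 1)            # outcome depends on z only
        rows.append((z, t, y))
    return rows

def naive_diff(rows):
    """Difference in mean outcome, ignoring z."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)

def adjusted_diff(rows):
    """Difference in mean outcome within each level of z, weighted by stratum size."""
    total = 0.0
    for level in (0, 1):
        stratum = [(t, y) for z, t, y in rows if z == level]
        treated = [y for t, y in stratum if t]
        control = [y for t, y in stratum if not t]
        total += (mean(treated) - mean(control)) * len(stratum) / len(rows)
    return total

rows = simulate()
print(f"naive: {naive_diff(rows):.2f}, adjusted for z: {adjusted_diff(rows):.2f}")
```

In this setup the naive comparison shows a sizeable "effect" that comes entirely from `z`, and comparing within levels of `z` brings the estimate back near zero. If there were a second shared cause that was not stratified on, the adjusted number would still be biased, which is the untestable assumption described above.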

You also have to assume that every person/unit of study has a non-zero probability of receiving either treatment at every level of the variables you are controlling for; otherwise the effect you're estimating is not defined. That assumption is more likely to be violated the more things you have to control for.
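One way to get a feel for the non-zero-probability requirement is to check, in a sample, whether every combination of controlled variables contains both treated and untreated units. The sketch below assumes a made-up data layout (a tuple of covariate values plus a treated flag), and `positivity_violations` is a hypothetical helper. An empty stratum arm in a sample only hints at a problem; the assumption itself is about underlying probabilities.

```python
from collections import defaultdict

def positivity_violations(units):
    """Return the covariate strata in which only one treatment arm is observed.

    units: iterable of (covariates, treated), where covariates is a tuple.
    """
    arms_seen = defaultdict(set)
    for covariates, treated in units:
        arms_seen[covariates].add(bool(treated))
    return sorted(c for c, arms in arms_seen.items() if len(arms) < 2)

# Illustrative data: two binary covariates, so four possible strata.
sample = [
    ((0, 0), True), ((0, 0), False),
    ((0, 1), True), ((0, 1), False),
    ((1, 0), True),                   # no untreated unit in this stratum
    ((1, 1), False), ((1, 1), True),
]
print(positivity_violations(sample))
```

Each extra variable you control for multiplies the number of strata, so with a fixed sample size more of them end up with one arm missing, and no within-stratum comparison is possible there.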


> I don't think that controlling for variables can produce a conclusion of causation.

Controlling for variables can rule out alternative causal relationships (shared causes of A and B, rather than A causing B) as explanations for a correlation, but it can't rule out coincidence. It strengthens the case for the plausibility of a causal explanation.


In my view, controlling for a variable X doesn't directly strengthen a causal claim, but it allows us to "rule out" another plausible explanation. In a way, it shows that the observed effect remains even after we take X into account, which means that it could be that the father's age influences filial geekiness. It could still be the case that another variable (one that was not controlled for) accounts for the effect.



