I don’t understand why very large neural networks can’t model causality in principle.
I also don’t understand the argument that even if NNs can model causality in principle, they are unlikely to do so in practice (things I’ve heard: spurious correlations are easier to learn, the learning space is too large to expect causality to be learned from data, etc).
I also don’t understand why people aren’t convinced that LLMs can demonstrate causal understanding in settings where they have been used for control, as with decision transformers… like what else is expected here?
He argued that, because machine learning is just based on correlational statistics, it would never be able to produce reasoning about causation.
Which is, at least in retrospect (GPT turned out to be able to do causal reasoning), a fallacy: It's like assuming humans can't think about gold because they do not themselves consist of gold. Or: That humans can't manually evaluate a computer program, because they are not themselves computers.
>That humans can't manually evaluate a computer program, because they are not themselves computers.
Well, yes, that’s why they were designed in the first place: to carry out, at scale, the dull repetitive tasks that add up to more than human ability and patience can handle.
No, our submarines don’t have what it takes to swim, but really there is no drama here: they are still useful, amazing pieces of engineering.
I think one of the major difficulties is dealing with unobserved confounders. The world is complex, and it is unlikely that all relevant variables are observed and available in the training data.
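To make the confounder point concrete, here is a toy simulation (a minimal sketch, not anyone's actual method; the variable names, coefficients, and the `simulate` helper are all made up for illustration). A hidden variable `z` drives both `x` and `y`, while `x` has no effect on `y` at all. A model that only sees `(x, y)` pairs learns a strong slope anyway; the slope only vanishes if you intervene on `x` or get to observe `z`.

```python
import numpy as np

def simulate(n=100_000, seed=0):
    """Toy example: z confounds x and y; x has NO causal effect on y."""
    rng = np.random.default_rng(seed)

    z = rng.normal(size=n)                    # hidden confounder
    x = z + 0.5 * rng.normal(size=n)          # x is caused by z
    y = 2.0 * z + 0.5 * rng.normal(size=n)    # y is caused by z only

    # 1) Observational fit of y on x alone: picks up the spurious association.
    obs_slope = np.polyfit(x, y, 1)[0]

    # 2) Intervention: set x independently of z (a randomized experiment).
    x_do = rng.normal(size=n)
    y_do = 2.0 * z + 0.5 * rng.normal(size=n)  # y still ignores x
    do_slope = np.polyfit(x_do, y_do, 1)[0]

    # 3) Adjustment: if z were observed, regressing on [x, z] recovers ~0 for x.
    design = np.column_stack([x, z, np.ones(n)])
    adj_slope = np.linalg.lstsq(design, y, rcond=None)[0][0]

    return obs_slope, do_slope, adj_slope

if __name__ == "__main__":
    obs, do, adj = simulate()
    print(f"observational slope: {obs:.3f}")
    print(f"interventional slope: {do:.3f}")
    print(f"slope after adjusting for z: {adj:.3f}")
```

In this setup the observational slope comes out well away from zero while the other two sit near zero. The catch in the real world is that you usually can't run step 2 and don't have the column needed for step 3, which is the difficulty described above.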
Please enlighten me