2. It was far from obvious a priori that this could work at all.
The sentiment ‘lol you trained a big model on X so obviously it produces a good probability distribution on X’ only exists because big models proved to be extraordinarily more effective than anyone expected.
> Besides, if your corpus is asymptotically everything, why wouldn’t fitting it perfectly be the goal?
That's called Google, and it serves a different purpose. Perfectly learning the distributional properties of text, by contrast, is not overfitting; that's just fitting.
>The sentiment ‘lol you trained a big model on X so obviously it produces a good probability distribution on X’ only exists because big models proved to be extraordinarily more effective than anyone expected.
No, the "popularity" of that sentiment exists because of their "effectiveness". That sentiment was existing and being voiced 10 years ago.
Otherwise brilliant and rational people just go all mystical when we start talking about meaningless words like “intelligence” or “consciousness”.
An animal, a human, or a big matrix performs task Y with measurable performance X. That’s quantitative and objective.
All this “is it smart” bullshit is thinly-veiled “how does the world still revolve around my subjective experience given this thing writes better Tolkien fan fiction than I do”.
Performance on tasks. Everyone else shuffle over to the Philosophy department.