2. It was far from obvious a priori that this could work at all.
The sentiment ‘lol you trained a big model on X so obviously it produces a good probability distribution on X’ only exists because big models proved to be extraordinarily more effective than anyone expected.
> Besides, if your corpus is asymptotically everything, why wouldn’t fitting it perfectly be the goal?
That's called Google, and it serves a different purpose. Perfectly learning the distributional properties of text, by contrast, is not overfitting; that's just fitting.
>The sentiment ‘lol you trained a big model on X so obviously it produces a good probability distribution on X’ only exists because big models proved to be extraordinarily more effective than anyone expected.
No, the "popularity" of that sentiment exists because of their "effectiveness". That sentiment was existing and being voiced 10 years ago.
Otherwise brilliant and rational people just go all mystical when we start talking about meaningless words like “intelligence” or “consciousness”.
An animal, a human, or a big matrix performs task Y with measurable performance X. That’s quantitative and objective.
All this “is it smart” bullshit is thinly-veiled “how does the world still revolve around my subjective experience given this thing writes better Tolkien fan fiction than I do”.
Performance on tasks. Everyone else shuffle over to the Philosophy department.