
> But increasing the model size alone doesn't seem to make sense to anyone for some reason.

It's not economically viable or efficient to just scale model size.
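
Rough numbers, using the standard ~6 * N * D FLOPs estimate for dense transformer training compute (the sizes here are purely illustrative, not a reference to any particular model):

    # Back-of-the-envelope: training compute for a dense transformer is
    # roughly 6 * params * tokens FLOPs. All sizes below are illustrative,
    # not a reference to any particular model.
    def train_flops(n_params: float, n_tokens: float) -> float:
        return 6 * n_params * n_tokens

    tokens = 400e9  # a fixed 400b-token budget
    for params in (8e9, 80e9, 800e9):
        print(f"{params/1e9:.0f}b params x 400b tokens ~ "
              f"{train_flops(params, tokens):.1e} FLOPs")

Training cost grows linearly with parameter count at a fixed token budget, and inference cost grows with it too, so scaling N alone gets expensive fast.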

> This has nothing to do with your claim that “We already know that the higher the parameter count, the lower the training data required”. To back such a claim we'd need a 540b model trained on 10b tokens beating or rivaling an 8b-parameter model trained on 400b. I'm not aware of anything like this existing today.

This is literally what I said:

> a 50 billion parameter model will far outperform a 5 billion one TRAINED ON THE SAME DATA.

A 400b-token dataset is not the same training data as a 10b-token dataset.




> Literally this is what I said

You also literally said that:

> We already know that the higher the parameter count, the lower the training data required

And if you scroll up a bit, you'll see that this was the assertion that I've been questioning since the beginning.

Also, even this other assertion

> a 50 billion parameter model will far outperform a 5 billion one TRAINED ON THE SAME DATA.

is unsupported in the general case: would it still hold if both were trained on 10b tokens? They'd both be fairly under-trained, but I suspect the performance of the bigger model would suffer more than the smaller one's.
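
For a rough reference point, the Chinchilla paper (Hoffmann et al. 2022, arXiv:2203.15556) fits loss as L(N, D) = E + A/N^a + B/D^b. Here's a sketch of what that fit says about the two sizes we've been discussing; the constants are the published fit, the parameter and token counts are just the hypothetical numbers from this thread, and extrapolating the fit this far outside its data is exactly what's in question:

    # Sketch only: Chinchilla parametric loss fit (Hoffmann et al. 2022).
    # Constants are the published fit; the parameter/token counts below are
    # hypothetical, taken from this discussion rather than from the paper.
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

    def predicted_loss(n_params: float, n_tokens: float) -> float:
        # L(N, D) = E + A / N^alpha + B / D^beta
        return E + A / n_params**alpha + B / n_tokens**beta

    tokens = 10e9                  # both models trained on the same 10b tokens
    for params in (5e9, 50e9):     # 5b vs 50b parameters
        print(f"{params/1e9:.0f}b params on 10b tokens -> "
              f"predicted loss {predicted_loss(params, tokens):.2f}")

Under that fit the A/N^a term only shrinks as N grows, so the parametric form by itself never predicts the bigger model doing worse; but the fit wasn't made anywhere near that under-trained regime, which is rather the point.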

AFAIK, there's no reason to believe that the current LLM architecture scaled to 100 trillion parameters could be trained efficiently on just a few million tokens like humans, and the paper you quoted sure isn't backing this original argument of yours.


> You also literally said that:

> We already know that the higher the parameter count, the lower the training data required

> And if you scroll up a bit, you'll see that this was the assertion that I've been questioning since the beginning.

They follow from each other: if you have a performance target in mind, it's the same claim in different words.
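
To spell that out with the same parametric fit as in the sketch above (published Chinchilla constants, arbitrary illustrative target, no claim about any specific model): fix a target loss L* and solve L* = E + A/N^a + B/D^b for D. Since A/N^a shrinks as N grows, the D needed to hit the target shrinks too.

    # Sketch: same Chinchilla fit, solved for the tokens D needed to reach a
    # fixed target loss. The target and the parameter counts are arbitrary,
    # chosen only to illustrate the trade-off.
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    target_loss = 2.1  # must sit above the floor E + A/N^alpha for both sizes

    def tokens_needed(n_params: float) -> float:
        # Rearranged: D = (B / (L* - E - A/N^alpha))^(1/beta)
        data_term = target_loss - E - A / n_params**alpha
        return (B / data_term) ** (1 / beta)

    for params in (5e9, 50e9):
        print(f"{params/1e9:.0f}b params -> ~{tokens_needed(params)/1e9:.0f}b "
              f"tokens to reach loss {target_loss}")

With that made-up target the 5b model needs on the order of 600b tokens and the 50b model more like 130b, i.e. for a fixed target, more parameters means less data, at least within the range where the fit means anything.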

> AFAIK, there's no reason to believe that the current LLM architecture scaled to 100 trillion parameters could be trained efficiently on just a few million tokens like humans

I didn't say it was a given, and in my original comment I said as much.

Also, object recognition leads to abstraction, motion perception to causality, and proprioception is a big part of human reasoning. We're not trained on only millions of tokens, and our objective function(s) are different.

Humans would not, in fact, outperform language models at what the models are actually trained to do: https://arxiv.org/abs/2212.11281



