> We already know that the higher the parameter count, the lower the training data required
>And if you scroll up a bit, you'll see that this was the assertion that I've been questioning since the beginning.
They follow from each other. If you have a fixed performance target in mind, "more parameters need less data" and "less data suffices with more parameters" are the same claim in different words.
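To make that concrete, here's a rough sketch (mine, not from the thread) that inverts a Chinchilla-style loss law L(N, D) = E + A/N^α + B/D^β to get the tokens needed to hit a fixed target loss as the parameter count grows. The constants are roughly the fits reported by Hoffmann et al. (2022) and are used purely for illustration; the target loss is arbitrary.

```python
# Sketch only: a Chinchilla-style scaling law, with illustrative constants
# (approximate fits from Hoffmann et al. 2022), showing that at a fixed
# target loss, more parameters <-> fewer training tokens.

E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def tokens_needed(n_params: float, target_loss: float) -> float:
    """Tokens D required to reach target_loss with n_params, per the law above."""
    residual = target_loss - E - A / n_params**alpha
    if residual <= 0:
        return float("inf")  # target unreachable at this parameter count
    return (B / residual) ** (1 / beta)

target = 2.1  # arbitrary illustrative loss target
for n in (1e9, 1e10, 1e11, 1e12):
    print(f"N = {n:.0e} params -> D ~ {tokens_needed(n, target):.2e} tokens")
```

The required token count falls monotonically as N grows, which is all the quoted claim amounts to once a target is fixed.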
>AFAIK, there's no reason to believe that the current architecture of LLM scaled to 100 trillions of parameters would be able to be trained efficiently on just a few millions of token like humans
I didn't say it was a given. And in my original comment, I say as much.
Also, object recognition leads to abstraction, motion perception to causality, and proprioception is a big part of human reasoning. We're not trained on only millions of tokens, and our objective function(s) are different.
Humans would not, in fact, outperform language models on what those models are actually trained to do: https://arxiv.org/abs/2212.11281