
> But increasing the model size alone doesn't seem to make sense to anyone for some reason.

It's not economically viable or efficient to just scale model size.
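
Rough numbers, using the standard ~6 * N * D FLOPs estimate for dense transformer training compute (the sizes here are purely illustrative, not a reference to any particular model):

    # Back-of-the-envelope: training compute for a dense transformer is
    # roughly 6 * params * tokens FLOPs. All sizes below are illustrative,
    # not a reference to any particular model.
    def train_flops(n_params: float, n_tokens: float) -> float:
        return 6 * n_params * n_tokens

    tokens = 400e9  # a fixed 400b-token budget
    for params in (8e9, 80e9, 800e9):
        print(f"{params/1e9:.0f}b params x 400b tokens ~ "
              f"{train_flops(params, tokens):.1e} FLOPs")

Training cost grows linearly with parameter count at a fixed token budget, and inference cost grows with it too, so scaling N alone gets expensive fast.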

> This has nothing to do with your claim that “We already know that the higher the parameter count, the lower the training data required”. To back such a claim we'd need a 540b model trained on 10b tokens beating or rivaling an 8b-parameter model trained on 400b. I'm not aware of anything like this existing today.

This is literally what I said:

> a 50 billion parameter model will far outperform a 5 billion one TRAINED ON THE SAME DATA.

A 400b-token dataset is not the same training data as a 10b-token dataset.




> Literally this is what I said

You also literally said that:

> We already know that the higher the parameter count, the lower the training data required

And if you scroll up a bit, you'll see that this was the assertion that I've been questioning since the beginning.

Also, even this other assertion

> a 50 billion parameter model will far outperform a 5 billion one TRAINED ON THE SAME DATA.

is unsupported in the general case: would it still hold if both were trained on 10b tokens? They'd both be fairly under-trained, but I suspect the performance of the bigger model would suffer more than the smaller one's.
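
For a rough reference point, the Chinchilla paper (Hoffmann et al. 2022, arXiv:2203.15556) fits loss as L(N, D) = E + A/N^a + B/D^b. Here's a sketch of what that fit says about the two sizes we've been discussing; the constants are the published fit, the parameter and token counts are just the hypothetical numbers from this thread, and extrapolating the fit this far outside its data is exactly what's in question:

    # Sketch only: Chinchilla parametric loss fit (Hoffmann et al. 2022).
    # Constants are the published fit; the parameter/token counts below are
    # hypothetical, taken from this discussion rather than from the paper.
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

    def predicted_loss(n_params: float, n_tokens: float) -> float:
        # L(N, D) = E + A / N^alpha + B / D^beta
        return E + A / n_params**alpha + B / n_tokens**beta

    tokens = 10e9                  # both models trained on the same 10b tokens
    for params in (5e9, 50e9):     # 5b vs 50b parameters
        print(f"{params/1e9:.0f}b params on 10b tokens -> "
              f"predicted loss {predicted_loss(params, tokens):.2f}")

Under that fit the A/N^a term only shrinks as N grows, so the parametric form by itself never predicts the bigger model doing worse; but the fit wasn't made anywhere near that under-trained regime, which is rather the point.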

AFAIK, there's no reason to believe that the current LLM architecture scaled to 100 trillion parameters could be trained efficiently on just a few million tokens like humans, and the paper you quoted sure isn't backing this original argument of yours.


> You also literally said that:

> We already know that the higher the parameter count, the lower the training data required

> And if you scroll up a bit, you'll see that this was the assertion that I've been questioning since the beginning.

They follow from each other: if you have a performance target in mind, it's the same claim in different words.
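
To spell that out with the same parametric fit as in the sketch above (published Chinchilla constants, arbitrary illustrative target, no claim about any specific model): fix a target loss L* and solve L* = E + A/N^a + B/D^b for D. Since A/N^a shrinks as N grows, the D needed to hit the target shrinks too.

    # Sketch: same Chinchilla fit, solved for the tokens D needed to reach a
    # fixed target loss. The target and the parameter counts are arbitrary,
    # chosen only to illustrate the trade-off.
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    target_loss = 2.1  # must sit above the floor E + A/N^alpha for both sizes

    def tokens_needed(n_params: float) -> float:
        # Rearranged: D = (B / (L* - E - A/N^alpha))^(1/beta)
        data_term = target_loss - E - A / n_params**alpha
        return (B / data_term) ** (1 / beta)

    for params in (5e9, 50e9):
        print(f"{params/1e9:.0f}b params -> ~{tokens_needed(params)/1e9:.0f}b "
              f"tokens to reach loss {target_loss}")

With that made-up target the 5b model needs on the order of 600b tokens and the 50b model more like 130b, i.e. for a fixed target, more parameters means less data, at least within the range where the fit means anything.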

> AFAIK, there's no reason to believe that the current LLM architecture scaled to 100 trillion parameters could be trained efficiently on just a few million tokens like humans

I didn't say it was a given, and in my original comment I said as much.

Also, object recognition leads to abstraction, motion perception to causality, and proprioception is a big part of human reasoning. We're not trained on only millions of tokens, and our objective function(s) are different.

Humans would not, in fact, outperform language models at what the models are actually trained to do: https://arxiv.org/abs/2212.11281



