>a 50 billion parameter model will far outperform a 5 billion one trained on the same data. and a 500b one would far outperform that 50 billion one.
I'm not so sure. I'm pretty sure there are diminishing returns at play after some point.
Plus haven't we already seen models with far fewer parameters perform the same as, or very close to, ChatGPT, which had a much higher count (Llama and its siblings)?
>a 50 billion parameter model will far outperform a 5 billion one trained on the same data. and a 500b one would far outperform that 50 billion one.
>I'm not so sure. I'm pretty sure there are diminishing returns at play after some point.
We can speculate about just how far this scaling can go, or how far it even needs to go, but everything I've said there is true. We have models trained and evaluated at all those sizes.
>Plus haven't we already seen models with far fewer parameters perform the same as, or very close to, ChatGPT, which had a much higher count (Llama and its siblings)?
Only by training on far more data. Llama 13B had to be trained on over 3x as much data just to match the original GPT-3 model from 2020 (not 3.5).
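(Going from memory on the numbers: Llama 13B was trained on roughly 1T tokens versus about 300B for the original GPT-3, which is where the 3x+ figure comes from.)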
>We can speculate about just how far this scaling can go, or how far it even needs to go, but everything I've said there is true. We have models trained and evaluated at all those sizes.
The part about "far outperforming", which is the main claim, is wrong though. We've seen much smaller models that fare quite well against, and are even competitive with, the larger ones.
You already said "only by training on far more data", which is different from "more parameters" being the only option.
>You already said "only by training on far more data", which is different from "more parameters" being the only option.
I never said more parameters was the only way to increase performance. I said the training data required to reach any given performance level x decreases as parameter count goes up (rough form sketched below).
It's literally right there in what I wrote.
>a 50 billion parameter model will far outperform a 5 billion one TRAINED ON THE SAME DATA.
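For reference, the rough shape of this is the Chinchilla paper's parametric fit, going from memory: loss L(N, D) ≈ E + A/N^a + B/D^b, where N is parameter count and D is training tokens. Hold the target loss fixed and the trade-off falls out: a bigger N shrinks the A/N^a term, so you can hit the same loss with a smaller D. Hold D fixed instead and a bigger N just gives you a lower loss, which is the original claim.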