
You run into the typical neural net problem with this logic. OpenAI (or at least Sam Altman) has already publicly acknowledged that the diminishing returns they're seeing in terms of model size are sufficient to effectively declare that 'the age of giant models is already over.' [1] It seems many people were unaware of his comments on this topic.

Neural networks in literally every other field repeat the exact same pattern. You can get from 0-80 without breaking a sweat. 80-90 is dramatically harder, but you finally get there. So everybody imagines that getting from 90-100 will be little more than a matter of a bit more compute and a bit more massaging of the model. But it turns out that each fraction of a percent of progress becomes exponentially more difficult, and you eventually run into an asymptote that's nowhere near what you were aiming for.

A prediction based on the typical history of neural nets would be that OpenAI will be able to continue to make progress on extremely specific metrics, like scoring well on some test or another, largely by hardcoding case-specific workarounds and tweaks. But in terms of general model usage, we're unlikely to see any real revolutionary leaps in the foreseeable future.

If we see model accuracy increase, I'd expect it to come not from model improvement but from something like adding a second layer where the software cross-references the generated output against a 'fact database' and regenerates its answer when some correlation factor is insufficiently high. Of course, that would completely cripple the model's ability to ever move 'beyond' its training. It'd be as if mankind were forced to double-check that any statement we made about astronomy confirmed that the Earth is indeed the center of the universe, with no ability to ever change that ourselves.
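A minimal sketch of what that second "cross-reference and regenerate" layer could look like (every name here, the functions, the threshold, the toy fact list, is a hypothetical placeholder, not any real API):

    # Toy 'fact database' and an assumed minimum agreement score; all names
    # here are hypothetical placeholders, not a real API.
    FACT_DB = ["the Earth orbits the Sun", "water boils at 100 C at sea level"]
    THRESHOLD = 0.8   # assumed minimum correlation with the fact database
    MAX_RETRIES = 3

    def answer(prompt, generate, agreement):
        # generate(prompt) is the base model; agreement(text, facts) returns
        # a 0..1 score, e.g. cosine similarity of embeddings against FACT_DB.
        draft = generate(prompt)
        for _ in range(MAX_RETRIES):
            if agreement(draft, FACT_DB) >= THRESHOLD:
                return draft          # consistent enough with known facts
            draft = generate(prompt)  # otherwise regenerate and check again
        return draft                  # give up and return the last attempt

The catch is exactly the one described above: FACT_DB is fixed, so anything the database gets wrong the system can never correct on its own.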

[1] - https://www.wired.com/story/openai-ceo-sam-altman-the-age-of...




There is an argument that Altman's statement is just an attempt to distract competitors from outspending OpenAI. Prior to GPT-4 there were no indications of diminishing returns (at least on a log scale).

The tremendous progress over the last year makes me wary of your claim that progress will stop coming from increases in model size.


>There is an argument that Altman's statement is just an attempt to distract competitors from outspending OpenAI

As if competitors, say Google, will take a competitor at his word and say "damn, let's scrap the expansion plans, then"?

That argument sounds highly implausible.

>The tremendous progress over the last year makes me wary of your claim that progress will stop coming from increases in model size.

Isn't "tremendous progress" before the dead-end always the case with diminishing returns and low hanging fruits?


I don't think it's implausible. If engineers come to management at Google asking for 4bn to do a moonshot six-month AI training run, such a smokescreen statement can be highly effective. Even if they only delay their plans by four weeks to evaluate the scaling first, that's another four weeks of head start for OpenAI.

Also, not everyone can bring 500m or more to the table to train a big model in the first place.

> tremendous progress

There are things which just seem to scale and others which don't. So far, the returns from adding more data and more compute don't seem to flatten out that much.
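To make the "doesn't flatten on a log scale" point concrete: the usual picture is a power law, loss ≈ a * compute^(-alpha), which is a straight line on a log-log plot rather than a curve that levels off. A toy illustration with made-up constants (not fitted to any real model):

    # Hypothetical power-law scaling: loss falls by a constant factor for
    # every 10x of compute, so it looks linear on a log-log plot.
    a, alpha = 20.0, 0.05
    for compute in [1e21, 1e22, 1e23, 1e24]:
        loss = a * compute ** (-alpha)
        print(f"compute={compute:.0e}  loss={loss:.2f}")

Each extra 10x of compute buys the same multiplicative reduction in loss, which is why "no flattening on a log scale" and "diminishing returns in absolute terms" can both be true at once.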

At least we should give it another year to see where it leads us.


>Sam Altman) has already publicly acknowledged that the diminishing returns they're seeing in terms of model size are sufficient to effectively declare that 'the age of giant models is already over.'

He never said anything about technical diminishing returns. He's saying we're hitting a wall economically.

The Chief Scientist at OpenAI thinks there's plenty left to squeeze out.


You can see his comments, in context, here: https://youtu.be/T5cPoNwO7II?t=356

Economics was not hinted at or implied in any way. Diminishing returns on model size don't mean there's nothing left to squeeze out; they just mean that whatever gains are made will come from model refinement, rather than from the Nvidia vision of a quadrillion-weight system with the expectation of large, or even linear, gains from that jump in model size.



