
Firstly, no: the gap between 3 and 4 is nowhere near as large as the gap between 2 and 3.

Secondly, nothing you said here changed as of this announcement. Nothing here makes it any more or less likely that LLMs will put software engineering jobs at risk.

Thirdly, you can take what Sam Altman says with as many grains of salt as you like, but if there really were no innovation at all, as you claim, then a limit would be hit in computing capability and cost.




>the gap between 3 and 4 is nowhere near as large as the gap between 2 and 3.

We'll just have to agree to disagree. 3 was a signal of things to come, but it was ultimately a bit of a toy, a research curiosity. Utility-wise, 3 and 4 are worlds apart.

>if there really were no innovation at all, as you claim, then a limit would be hit in computing capability and cost.

Computing capability and cost are just about the one thing you can bank on improving. Training GPT-4 today would already cost a fraction of what it did when OpenAI did it, and that was just over a year ago.

Today's GPUs take ML into account to some degree, but they are nowhere near as optimized for it as they could be. That work has only just begun.

Of all the possible barriers, compute is exactly the kind you want. It will fall.


Do you realize I'm not disagreeing with you about the difference between 3 and 4? Reread what I wrote. I contrasted 3 and 4 with 2 and 3, which you seem to be entirely ignoring. 3 and 4 could be worlds apart, but that wouldn't matter if 2 and 3 were two worlds apart, for example.

And it is not true that computing power will continue to get cheaper; Moore's Law has been dead for some time now, and if incremental improvement in LLMs requires exponential growth in computing power, the marginal gains won't matter. You would need a matching exponential growth in processing capability, which is most certainly not occurring. So compute costs will not fall at the rate you would need for LLMs to compete in any meaningful way with human software engineers.

We are not guaranteed to continue to progress in anything just because we have in the past.


>And it is not true that computing power will continue to get cheaper; Moore's Law has been dead for some time now, and if incremental improvement in LLMs requires exponential growth in computing power, the marginal gains won't matter.

This is a lot of unfounded assumptions.

You don't need Moore's Law. GPUs are not really made with ML training in mind. You don't need exponential growth for anything. The money OpenAI spent on GPT-4 a year ago could train a model twice as large today, and that amount is a drop in the bucket for the R&D budgets of large corporations. Microsoft gave OpenAI $10B; Amazon gave Anthropic $4B.

>So compute costs will not fall at the rate you would need for LLMs to compete in any meaningful way with human software engineers.

I don't think the compute required is anywhere near as much as you think it is.

https://arxiv.org/abs/2309.12499

>We are not guaranteed to continue to progress in anything just because we have in the past.

Nothing is guaranteed. But the scaling plots show no indication of a slowdown, so it's up to you to provide a concrete reason this object in motion is going to stop immediately and conveniently right now. If all you have is "well, it just can't keep getting better, right?" then visit the 2 and 3 threads to see how meaningless such unfounded assertions are.


I think you fundamentally don't understand the nature of exponential growth and the power of diminishing returns. Even if you double GPU capacity over the next year, you won't come anywhere close to producing a step-level growth in capability such as what we experienced between 2 and 3, or even 3 and 4. The LLM concept can only take you so far, and we're approaching the limits of what an LLM is capable of. You generally can't push an innovation infinitely; it will have a drop-off point somewhere.

the "Large" part of LLMs is probably done. We've gotten as far as we can with those style of models, and the next innovation will be in smaller, more targeted models.

> As costs have skyrocketed while benefits have leveled off, the economics of scale have turned against ever-larger models. Progress will instead come from improving model architectures, enhancing data efficiency, and advancing algorithmic techniques beyond copy-paste scale. The era of unlimited data, computing and model size that remade AI over the past decade is finally drawing to a close. [0]

> Altman, who was interviewed over Zoom at the Imagination in Action event at MIT yesterday, believes we are approaching the limits of LLM size for size’s sake. “I think we’re at the end of the era where it’s gonna be these giant models, and we’ll make them better in other ways,” Altman said. [1]

[0] https://venturebeat.com/ai/openai-chief-says-age-of-giant-ai...

[1] https://techcrunch.com/2023/04/14/sam-altman-size-of-llms-wo...


>The LLM concept can only take you so far, and we're approaching the limits of what an LLM is capable of.

You don't know that. This is literally just an assertion. An unfounded one at that.

If you couldn't have predicted in 2017 how far the LLM concept would take us today, then you definitely have no idea how far it could actually go.

>believes we are approaching the limits of LLM size for size’s sake

That has nothing to do with thinking they wouldn't improve from scale.

https://web.archive.org/web/20230531203946/https://humanloop...

An interview in which Altman later clarified:

"6. The scaling laws still hold Recently many articles have claimed that “the age of giant AI Models is already over”. This wasn’t an accurate representation of what was meant.

OpenAI’s internal data suggests the scaling laws for model performance continue to hold and making models larger will continue to yield performance. The rate of scaling can’t be maintained because OpenAI had made models millions of times bigger in just a few years and doing that going forward won’t be sustainable. That doesn’t mean that OpenAI won't continue to try to make the models bigger, it just means they will likely double or triple in size each year rather than increasing by many orders of magnitude"

Yes, there are economic compute walls. But that's the kind of barrier you want, not one that requires new "innovation".


Er, that's not how arguments work. What we can't know is that those trends will continue, so it's on you to demonstrate that they will, despite evidence suggesting they won't.

As for what you linked, Altman is saying the same thing I'm saying:

> That doesn’t mean that OpenAI won't continue to try to make the models bigger, it just means they will likely double or triple in size each year rather than increasing by many orders of magnitude.

This is exactly my point: doubling or tripling the size will be possible, but it won't result in a doubling of performance. We won't see a GPT-5 that's twice as good as GPT-4, for example. The jump from 2 to 3 was exponential. The jump from 3 to 4 was also exponential, though not as much. The jump from 4 to 5 will follow that curve, according to Altman, which means exactly what he said in my quote: the value will continue to decrease. For a 2-to-3-type jump, GPU technology would have to completely transform in capability, and there is no indication that we've found that innovation.


My argument was that improvement from scale would continue. There is absolutely evidence suggesting this.

GPT-4 can perform nearly all tasks you throw at it with well above average human performance. There literally isn't any testable definition of intelligence it fails that a big chunk of humans wouldn't also fail. You seem to keep missing the fact that we do not need an exponential improvement from 4.


> GPT-4 can perform nearly all tasks you throw at it with well above average human performance.

It can't even generate flashcards from a textbook chapter, because it can't load the entire chapter into memory. Heck, it doesn't even know what textbook I'm talking about; I have to provide the content!

It fails constantly at real-world coding problems, and often does so silently. If you tried to replace a software developer with GPT-4, you would be left with a gaping productivity hole where that developer once existed. The improvement GPT-5 would have to provide is multiple orders of magnitude for this to be a realistic proposition.

I use it daily and know better than to trust its output.


>It can't even generate flashcards from a textbook chapter, because it can't load the entire chapter into memory. Heck, it doesn't even know what textbook I'm talking about; I have to provide the content!

Okay...? That's a context window problem, and you could manage it by sending the textbook in chunks.
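
Something like this is all I mean by "chunks". A minimal sketch, assuming a hypothetical ask_llm helper that wraps whatever chat API you're using; the chunk size and prompt wording are made-up placeholders, not recommendations:

    # Split a chapter into pieces that fit the context window,
    # ask for flashcards per piece, then collect the results.
    # ask_llm() is a hypothetical stand-in for whatever chat API you use.

    def chunk_text(text, max_chars=8000):
        """Split text into roughly max_chars-sized pieces on paragraph breaks."""
        chunks, current = [], ""
        for para in text.split("\n\n"):
            # Close off the current chunk once adding this paragraph would overflow it.
            if current and len(current) + len(para) > max_chars:
                chunks.append(current)
                current = ""
            current += para + "\n\n"
        if current:
            chunks.append(current)
        return chunks

    def flashcards_for_chapter(chapter_text, ask_llm):
        cards = []
        for chunk in chunk_text(chapter_text):
            prompt = (
                "Write question/answer flashcards covering the key ideas "
                "in the following excerpt:\n\n" + chunk
            )
            cards.append(ask_llm(prompt))
        return "\n".join(cards)

The point is just that "it can't fit the whole chapter" is a workflow problem, not a capability wall.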

>The improvement GPT-5 would have to provide is multiple orders of magnitude for this to be a realistic proposition.

No, it wouldn't.

https://arxiv.org/abs/2309.12499


So by your own words, in order to use the LLM usefully, I need to manually manage it? Do you know what I don’t have to manually manage? A person.

I can feed a person a broad, complex, or even half-formed idea, and they can actively troubleshoot until the problem is resolved, then keep monitoring and tweaking their solution so the problem remains resolved. LLMs can't even come close to doing that.

You’re proving my point for me; it’s a tool, not a developer. Zero jobs are at risk.

Also, not for nothing, but no, sending the textbook in chunks doesn't work, as the LLM can't then synthesize complex ideas that span the entire chapter. You have to compose a set of notes first, then feed it the notes, and even then the resulting flashcards are meaningfully worse than what I could come up with myself.


Moreover, you keep saying we can't scale infinitely. Sure... but nobody is saying we have to. 4 is not as scaled up from 3 as 3 was from 2. Doesn't matter; it's still a massive gap.


As I said already, the gap from 3 to 4 was substantially smaller than the gap from 2 to 3, and all indications are that the gap from 4 to 5 will be smaller still.


> And it is not true that computing power will continue to get cheaper; Moore's Law has been dead for some time now, and if incremental improvement in LLMs requires exponential growth in computing power, the marginal gains won't matter.

I think the stronger argument here isn't necessarily Moore's Law related but a change in architecture: things like Apple's Neural Engine, Google's TPUs, or Geohot's Tinybox. In Intel's Tick-Tock model, this is the Tock to the previous Tick of larger datasets, so to speak.

(Note: I don't necessarily agree; I'm just trying to make a stronger argument than invoking Moore's Law.)




