That is a terrible headline. It implies that the _abilities_ are a mirage, but it's actually the "emergent" part that might be a mirage -- which is to say it's not an unpredictable "phase transition" but gradual and predictable improvement in ability as the models scale.
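To see why that distinction matters, here's a toy sketch (my own illustration, not from the study) of how a perfectly smooth underlying ability can look like a sudden phase transition purely because of how it's measured: if per-token accuracy improves gradually with scale, an all-or-nothing exact-match metric over a multi-token answer sits near zero and then appears to jump.

```python
# Toy illustration (mine, not from the study): a smoothly improving
# per-token accuracy looks like sudden "emergence" under an
# all-or-nothing exact-match metric over a 10-token answer.
for scale in range(1, 11):
    per_token = scale / 10          # assume accuracy grows linearly with scale
    exact_match = per_token ** 10   # all 10 tokens must be correct
    bar = "#" * round(exact_match * 40)
    print(f"scale={scale:2d}  per-token={per_token:.1f}  "
          f"exact-match={exact_match:.4f}  {bar}")
```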
Not only that, but what's the point in evaluating a language model on its ability to do arithmetic? It's like criticizing a talking dog because the C++ code it wrote was full of buffer-overflow bugs.
In reality, if you ask a good LLM to solve a math-related problem, it will write and run a Python program to return the answer. Sometimes this even works. Sometimes it returns garbage. And sometimes it realizes that its own answer isn't realistic and tries a different approach. Sometimes people claim this isn't a valid manifestation of intelligence.
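For a sense of what that looks like in practice, here's the kind of throwaway script an LLM typically emits for a math question (the prompt and numbers are hypothetical, chosen only for illustration):

```python
# Hypothetical example of LLM-generated code for the prompt
# "what is 127 * 493, and is it prime?"
n = 127 * 493
# trial division up to sqrt(n) as a primality check
is_prime = n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))
print(n, "prime" if is_prime else "not prime")  # 62611 not prime
```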
Completely pointless study, unworthy of Wired or HN.
This is a study of "emergent" properties of LLMs — whether something unexpected shows up (e.g., imagine that talking dog suddenly becoming great at pointer arithmetic and never using memory after freeing it).
It has been noticed that LLMs can do some arithmetic, but we are still uncertain how much they can actually do and exactly how they do it.
Arithmetic is treated as a proxy for general reasoning abilities.
Which is stupid, because it's not. A pocket calculator can perform arithmetic, as can a Python program, or for that matter a Japanese cormorant, which counts the fish it helps you catch. None of those are considered capable of "reasoning" on their own.
Meanwhile, GPT-4 will cheerfully write a program to carry out the required arithmetic operations (and then some). A study that doesn't acknowledge that is worthless at best.
> Meanwhile, GPT-4 will cheerfully write a program to carry out the required arithmetic operations (and then some). A study that doesn't acknowledge that is worthless at best.
We don't want to see whether the LLM can be used as a tool to do arithmetic, but whether it can learn complex data relationships like arithmetic. Arithmetic is a stepping stone, not a goal, so the model solving it by invoking a calculator isn't relevant. The problems we actually want it to solve don't come with calculator-like tools, so that doesn't get us any closer.
Exactly, because as Gödel showed, arithmetic can in principle model many other formal systems, so if you can see arithmetic examples, generalize to the full set of rules, and learn to apply them consistently, then you have a powerful reasoning tool.
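To make "arithmetic can model other formal systems" concrete, here's a toy Gödel-style encoding (an arbitrary coding, for illustration only): any finite sequence of symbol codes can be packed into a single natural number via prime exponents, so claims about formulas become claims about numbers.

```python
# Toy Gödel numbering: encode a sequence of symbol codes (each >= 1)
# as one integer using prime-power exponents, then recover it by factoring.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

def encode(codes):
    n = 1
    for p, c in zip(PRIMES, codes):
        n *= p ** c
    return n

def decode(n):
    codes = []
    for p in PRIMES:
        c = 0
        while n % p == 0:
            n //= p
            c += 1
        if c == 0:      # a zero exponent marks the end of the sequence
            break
        codes.append(c)
    return codes

g = encode([3, 1, 4])    # 2**3 * 3**1 * 5**4 = 15000
print(g, decode(g))      # 15000 [3, 1, 4]
```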
These models are clearly capable of doing this. There is no theoretical reason why you should expect them to fail at this. One day they will be able to do this perfectly, and nobody will get the silly idea of generating a program to do it anymore. There is no need for another bitter lesson where "clever" AI researchers and engineers waste their careers adding a hundred different workarounds to these minor problems.
I don't know... the ability to write code to solve an otherwise ill-suited problem seems pretty general to me. It seems like a big step in a concrete direction, as opposed to a lot of Goedelian navel-gazing about arithmetic and Peano axioms and whatnot.
Agreed that generalized architectures will ultimately win out over hand-tweaked ones. But the patent wars that will eventually be fought over this stuff are where the real bitter lessons will come into play. At some point, we'll be forced back into the hand-optimization business because someone like OpenAI (or another Microsoft proxy) will have locked down a lot of powerful basic techniques.
And “emergent” in this context usually means “behaviour resulting from the interaction of a large number of individually simple units”, not “suddenly appearing.”