It's a Tokyo-based company; "sakana" means fish in Japanese, and the company's icon is a bunch of fish. What's up with the "Danger AI Labs" Hebrew translation in the tweet and article?
Funny paper; I still don't know what its goal was. It is evident to anyone that LLMs can't perform any meaningful reasoning, so why even bother building such an infrastructure to test whether one can become a "scientist"?
They do a phenomenal job of guessing the next word, and our language is redundant enough that that alone, carried out recursively, can produce quite interesting results. But reasoning? I'm certain everybody has gotten into this pattern, because it happens on pretty much anything where the LLM doesn't answer right on the first shot:
---
LLM: The answer is A.
Me: That's wrong. Try again.
LLM: Oh I'm sorry, you're completely right. The answer is B.
Me: That's wrong. Try again.
LLM: Oh I'm sorry, you're completely right. The answer is A.
Me: Time to short NVDA.
LLM: As an AI language learning model without real-time market data or the ability to predict future stock movements, I can't advise on whether it's an appropriate time to short NVIDIA or any other stock.
Yeah, if an LLM were truly capable of reasoning, then whenever it makes a mistake, e.g. due to randomness or a lack of knowledge, pointing out the mistake and giving steps to correct it should result in basically a 100% success rate, since the person assisting it has effectively unlimited capacity to accommodate the LLM's weaknesses.
When you look at things like https://arxiv.org/abs/2408.06195 you notice that the number of tokens needed to solve trivial tasks is somewhat ridiculous: on the order of 300k tokens for a simple grade-school problem. That is roughly three hours at a rate of 30 tokens/s. You could fill 400 pages of a book with that many tokens.
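Back-of-envelope, assuming ~30 tokens/s of decode speed and roughly 750 tokens per printed book page (my round numbers, not figures from the paper):

    # rough check on the "three hours" and "400 pages" claims above
    tokens = 300_000
    tokens_per_second = 30
    tokens_per_page = 750          # assumed density of a printed page

    hours = tokens / tokens_per_second / 3600
    pages = tokens / tokens_per_page
    print(f"~{hours:.1f} hours of generation, ~{pages:.0f} book pages")
    # -> ~2.8 hours of generation, ~400 book pages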
I think it depends on your standards. LLMs are by far the best general-purpose artificial reasoning systems we've made yet, but they also aren't really very good at it, especially on more complex steps and things that require rigor. Chain-of-thought prompting and such helps, but still: they have super-human knowledge but the reasoning skills of maybe a young child.
> super-human knowledge but the reasoning skills of maybe a young child
Super-human knowledge is certainly true (all of Wikipedia, in multiple languages, at all times, quickly).
Consider, however, an important distinction: a young child is exactly the wrong way to think of these machines and their outputs. The implicit suggestion is that there is some human-like progression toward greater capability, and that is not so.
Also note that "chain of reasoning", around 2019 or so, was exactly the emergent behavior that convinced many scientists there was more going on than just a "stochastic response" machine. Some leading LLMs do have the ability to solve multi-step puzzles, against the expectations of many.
My gut feeling is that human intelligence is multi-layered and not understood: very flexible, and connected in unexpected ways to others and to the living world. These machines are not human brains at all. Artificial General Intelligence is not defined, and many have reasons to spin the topic in the public media. Let's use good science skills while forming public opinion on these powerful and highly hyped machines.
Whenever you poke people about LLMs solving decidable/computable problems, they get defensive and claim that LLMs are not good for that. You are supposed to have them generate code that solves the decidable problem instead, which heavily implies that retrieval, approximation, and translation are the only true capabilities of LLMs.
Empirically, while you wait for yet another preprint on arXiv: I use the Sonnet 3.5 API every day to reason through problems, iterate on ideas, and write highly optimised Python and things like CUDA kernels. There is some degree of branching, guiding, and correction, but oh boy, there is certainly higher-order synthesis and causal reasoning going on, and it's certainly not all coming from me.
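For concreteness, the loop I'm describing looks roughly like the sketch below, written against the anthropic Python SDK. The prompt, the retry budget, and compile_and_benchmark are placeholders standing in for whatever harness you actually run (nvcc, pytest, a timing script), not my exact setup.

    import anthropic

    def compile_and_benchmark(code: str) -> tuple[bool, str]:
        # Placeholder: compile the kernel, run it, return (passed, feedback).
        return False, "kernel produced wrong output on row 3"

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    history = [{"role": "user", "content": "Write a CUDA kernel for a row-wise softmax."}]

    for attempt in range(5):
        reply = client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=2048,
            messages=history,
        )
        code = reply.content[0].text
        ok, feedback = compile_and_benchmark(code)
        if ok:
            break
        # the "branching, guiding and correction" part: feed the failure back
        history += [
            {"role": "assistant", "content": code},
            {"role": "user", "content": f"That attempt failed:\n{feedback}\nFix it and try again."},
        ]

Most of the guiding and correction lives in how specific that feedback message is.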