Seriously. I actually feel as impressed by the chain of thought as I was when ChatGPT first came out.
This isn't "just" autocompletion anymore, this is actual step-by-step reasoning full of ideas and dead ends and refinement, just like humans do when solving problems. Even if it is still ultimately being powered by "autocompletion".
But then it makes me wonder about human reasoning, and what if it's similar? Just following basic patterns of "thinking steps" that ultimately aren't any different from "English language grammar steps"?
This is truly making me wonder if LLMs are actually far more powerful than we thought at first, and if it's just a matter of figuring out how to plug them together in the right configurations, like "making them think".
When an AI makes a silly math mistake we say it is bad at math and laugh at how dumb it is. Some people extrapolate this to "they'll never get any better and will always be a dumb toy that gets things wrong". When I forget to carry a 1 when doing a math problem we call it "human error" even if I make that mistake an embarrassing number of times throughout my lifetime.
Do I think LLMs are alive/close to ASI? No. Will they get there? If it's even at all possible - almost certainly one day. Do I think people severely underestimate AI's ability to solve problems while significantly overestimating their own? Absolutely 10,000%.
If there is one thing I've learned from watching the AI discussion over the past 10-20 years, it's that people have overinflated egos and a crazy amount of hubris.
"Today is the worst that it will ever be." applies to an awfully large number of things that people work on creating and improving.
You are just catching up to this idea, probably after hearing 2^n explanations about why we humans are superior to <<fill in here our latest creation>>.
I'm not the kind of scientist who can say how good an LLM is at human-style reasoning, but I know that we humans are very incentivized and pretty good at scaling, composing, and perfecting things. If there is money to pay for human effort, we will play God no problem, and maybe outdo the divine. Which makes me wonder: isn't there any other problem on our bucket list to dump ginormous amounts of effort into... maybe something more worthwhile than engineering the thing that will replace Homo sapiens?
Reasoning would imply that it can figure out stuff without being trained on it.
The chain of thought is basically just a more accurate way to map input to output. But it's still a map, i.e. forward-only.
If an LLM could reason, you should be able to ask it how to make a bicycle frame from scratch with a small home CNC with a limited work area, and it should be able to iterate on an analysis of the best way to put it together, using the internet to look up available parts and making decisions on optimization.
No LLM can do that or even come close, because there are no real feedback loops, because nobody knows how to train a network like that.
It’s like every single sentence you just wrote is incorrect?
1. You’re making up some weird goalposts here of what it means to reason. It’s not reasoning unless it can access the internet to search for parts? No. That has nothing to do with reasoning. You just think it would be cool if it could do that.
2. “Can figure out stuff without being trained on it”
That’s exactly what it’s doing in the cypher example. It wasn’t trained to know that that input meant the corresponding output through the cypher. Emergent reasoning through autocomplete, sure, but that’s still reasoning.
3. “Forward only”. If that were the case, then back-and-forth conversations with the LLM would be pointless. It wouldn’t be able to improve upon previous answers it gave you when you give it new details. But that’s not how it works. If you tell it one thing, then separately tell it another thing, it can change its original conclusion based on your new input.
4. Even despite your convoluted test for reasoning, ChatGPT CAN do what you asked… even using the internet to look up parts is something it can either do out of the box or could do if given a plug-in to allow it.
I'll give you a more formal definition. A model can be said to be reasoning when it can use existing information to figure out new data that was not in its training set.
Here is a better example: let's say your input is 6 pictures of some object, one from each of the cardinal viewpoints, and you tell the model these are the views and ask it how much the object weighs. The model should basically figure out how to create a 3D shape and compute a camera view, iterate until the camera views match the pictures, then figure out that the shape could be hollow or solid, that computing the weight requires the density, and that it should prompt the user for those values if it cannot determine them from the pictures and its training data.
And it should do all of this without any specific training that this is the right way to go about it, because it should be able to figure out the approach by breaking the problem down into abstract representations of sub-problems and then solving those through basic logic, a.k.a. reasoning.
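Roughly, the behaviour I'd expect looks something like this (a made-up sketch in Python, with every helper stubbed out so it runs; nothing like this exists inside current LLMs):

    # Made-up sketch of the proposed test; every helper is hypothetical and
    # stubbed out so the file runs. Nothing like this exists inside an LLM.

    TOLERANCE = 0.01

    def view_error(shape, photos):      # stub: mismatch between rendered views and photos
        return 0.0

    def refine(shape, photos):          # stub: adjust the 3D shape to fit the photos better
        return shape

    def fit_shape(photos):
        shape = {"volume_m3": 0.001}    # initial guess at the 3D shape
        while view_error(shape, photos) > TOLERANCE:
            shape = refine(shape, photos)
        return shape

    def estimate_weight(photos, density_kg_m3):
        # density has to come from the user or the material; it is not in the photos
        return fit_shape(photos)["volume_m3"] * density_kg_m3

    print(estimate_weight(["front", "back", "left", "right", "top", "bottom"], 950.0))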
What that would look like inside an actual model, I don't know. If I did, I would certainly have my own AI company. But I can tell you for certain we are not even close to figuring it out yet, because everyone is still stuck on transformers, as if multiplying matrices together were some groundbreaking thing.
In the cypher example, all it's doing is basically using a separate model to break a particular problem into a chain of thought, and prompting that. And there is plenty in the training set of GPT about decrypting cyphers.
>Forward only
What I mean is that when it generates a response, the computation happens on a snapshot from input to output, trying to map a set of tokens into a set of tokens. The model doesn't operate on a context larger than the window. Humans don't do this. We operate on a large context, with lots of previous information compressed, and furthermore, we don't just compute words, we compute complex abstract ideas that we can then translate into words.
>even using the internet to look up parts it can either do out of the box or could do if given a plug-in to allow that.
So apparently the way to AI is to manually code all the capability into LLMs? Give me a break.
Just like with GPT-4, when people were screaming about how it was the birth of true AI: give this model a year, it will find some niche use cases (depending on cost), and then nobody is going to give a fuck about it, just like nobody is really doing anything groundbreaking with GPT-4.
Your conclusion is absurd. If you agree this model is overall an improvement on the prior one, i.e. it performs better on the same tasks and can do tasks the previous one couldn’t, it’s basically a given that it will get more use than GPT-4.
Better in niche areas doesn't mean it's going to get more use.
Everyone was super hyped about all the "cool" stuff that GPT-4 could solve, but in the end, you still can't do things like give it a bunch of requirements for a website, let it run, and get a full codebase back, even though that is well within its capabilities. You have to spend time prompting it to get it to give you what you want, and in a lot of cases you are better off just typing the code yourself (because you can visualize the entire project in your head and make the right decisions about how to structure things), and using it for small code generations.
This model is not going to radically change that. It will be able to automatically give you some answers that you previously had to prompt for manually, but there is no advanced reasoning going on.
What is “advanced reasoning” and why isn’t this doing it? If you made a Chinese room that output coherent chains of reasoning, it would be functionally just as useful as an actual reasoner, with or without the capacity for sentience or whatever.
Basically, if you had a model that could reason, it should be able to figure out new information. I.e. let's say you map some bytes of the output to an API for creating a TCP socket and communicating over it. The model should be able to figure out how to go out on the internet and search for information, all by itself, without any explicit training on how to do that.
So without prior information, it should essentially start out with random sequences in those bytes, see what the output is, and eventually identify and remember the patterns that come out. That means there has to be some internal reward function that differentiates good results from bad results, some memory the model uses to remember what good results are, and eventually a map of how to get the information it needs (the model would probably stumble across Google or ChatGPT at some point after figuring out the HTTP protocol, and remember it as a very good way to get info).
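Concretely, I'm imagining a loop like this, where propose_bytes and score stand in for behaviour the model would have to discover entirely on its own (these are made-up stubs, not anything that exists today):

    import socket

    # Made-up sketch of the open-ended loop described above. propose_bytes and
    # score are stubs standing in for behaviour the model would have to discover
    # by itself; nothing in current LLM training works this way.

    memory = []  # remembered (payload, reward) pairs

    def propose_bytes(memory):
        # stub: in the actual idea, the model emits these bytes itself
        return b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n"

    def score(response):
        # stub: some internal reward separating useful responses from noise
        return len(response)

    def explore_once(host="example.com", port=80):
        payload = propose_bytes(memory)
        with socket.create_connection((host, port), timeout=5) as s:
            s.sendall(payload)
            response = s.recv(4096)
        memory.append((payload, score(response)))  # keep track of what worked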
Philosophically, I don't even know if this is solvable. It could be that we just throw enough compute at all iterations of architectures in some form of genetic algorithm, and one of the results ends up being good.
In artificial intelligence, reasoning is the cognitive process of drawing conclusions, making inferences, and solving problems based on available information. It involves:
Logical Deduction: Applying rules and logic to derive new information from known facts.
Problem-Solving: Breaking down complex problems into smaller, manageable parts.
Generalization: Applying learned knowledge to new, unseen situations.
Abstract Thinking: Understanding concepts that are not tied to specific instances.
AI researchers often distinguish between two types of reasoning:
System 1 Reasoning (Intuitive): Fast, automatic, and subconscious thinking, often based on pattern recognition.
System 2 Reasoning (Analytical): Slow, deliberate, and logical thinking that involves conscious problem-solving steps.
Testing for Reasoning in Models:
To determine if a model exhibits reasoning, AI scientists look for the following:
Novel Problem-Solving: Can the model solve problems it hasn't explicitly been trained on?
Step-by-Step Logical Progression: Does the model follow logical steps to reach a conclusion?
Adaptability: Can the model apply known concepts to new contexts?
Explanation of Thought Process: Does the model provide coherent reasoning for its answers?
Analysis of the Cipher Example:
In the cipher example, the model is presented with an encoded message and an example of how a similar message is decoded. The model's task is to decode the new message using logical reasoning.
Steps Demonstrated by the Model:
Understanding the Task:
The model identifies that it needs to decode a cipher using the example provided.
Analyzing the Example:
It breaks down the given example, noting the lengths of words and potential patterns.
Observes that ciphertext words are twice as long as plaintext words, suggesting a pairing mechanism.
Formulating Hypotheses:
Considers taking every other letter, mapping letters to numbers, and other possible decoding strategies.
Tests different methods to see which one aligns with the example.
Testing and Refining:
Discovers that averaging the numerical values of letter pairs corresponds to the plaintext letters.
Verifies this method with the example to confirm its validity.
Applying the Solution:
Uses the discovered method to decode the new message step by step.
Translates each pair into letters, forming coherent words and sentences.
Drawing Conclusions:
Successfully decodes the message: "THERE ARE THREE R'S IN STRAWBERRY."
Reflects on the correctness and coherence of the decoded message.
Does the Model Exhibit Reasoning?
Based on the definition of reasoning in AI:
Novel Problem-Solving: The model applies a decoding method to a cipher it hasn't seen before.
Logical Progression: It follows a step-by-step process, testing hypotheses and refining its approach.
Adaptability: Transfers the decoding strategy from the example to the new cipher.
Explanation: Provides a detailed chain of thought, explaining each step and decision.
Conclusion:
The model demonstrates reasoning by logically deducing the method to decode the cipher, testing various hypotheses, and applying the successful strategy to solve the problem. It goes beyond mere pattern recognition or retrieval of memorized data; it engages in analytical thinking akin to human problem-solving.
Addressing the Debate:
Against Reasoning (ActorNightly's Perspective):
Argues that reasoning requires figuring out new information without prior training.
Believes that LLMs lack feedback loops and can't perform tasks like optimizing a bicycle frame design without explicit instructions.
For Reasoning (Counterargument):
The model wasn't explicitly trained on this specific cipher but used logical deduction to solve it.
Reasoning doesn't necessitate physical interaction or creating entirely new knowledge domains but involves applying existing knowledge to new problems.
Artificial Intelligence Perspective:
AI researchers recognize that while LLMs are fundamentally statistical models trained on large datasets, they can exhibit emergent reasoning behaviors. When models like GPT-4 use chain-of-thought prompting to solve problems step by step, they display characteristics of System 2 reasoning.
Final Thoughts:
The model's approach in the cipher example aligns with the AI definition of reasoning. It showcases the ability to:
Analyze and understand new problems.
Employ logical methods to reach conclusions.
Adapt learned concepts to novel situations.
Therefore, in the context of the cipher example and according to AI principles, the model is indeed exhibiting reasoning.
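As a concrete footnote to the walkthrough above: the pair-averaging rule the chain of thought lands on amounts to only a few lines of code. This is a minimal sketch assuming the usual a=1 ... z=26 mapping; the encode helper is not part of the original puzzle, it only makes the example self-contained and checkable:

    # Minimal sketch of the pair-averaging rule, assuming a=1 ... z=26.
    # The encoder exists only so the example can be verified end to end.

    def encode(plaintext):
        words = []
        for word in plaintext.lower().split():
            pairs = []
            for ch in word:
                p = ord(ch) - 96
                d = 1 if 1 < p < 26 else 0          # keep both halves inside a..z
                pairs.append(chr(p - d + 96) + chr(p + d + 96))
            words.append("".join(pairs))
        return " ".join(words)

    def decode(ciphertext):
        words = []
        for word in ciphertext.lower().split():
            letters = []
            for i in range(0, len(word), 2):
                avg = (ord(word[i]) - 96 + ord(word[i + 1]) - 96) // 2
                letters.append(chr(avg + 96))
            words.append("".join(letters))
        return " ".join(words)

    print(decode(encode("there are three rs in strawberry")))
    # -> there are three rs in strawberry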
>What I mean is that when it generates a response, the computation happens on a snapshot from input to output, trying to map a set of tokens into a set of tokens. The model doesn't operate on a context larger than the window
The weights in the model hold the larger context; the context-window-sized data is just the input, which then gets multiplied by those weights to get the output.
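To make that concrete, here is a toy forward pass (no attention, made-up dimensions, purely illustrative): only the window-sized slice of tokens enters the computation, while everything else the model "knows" has to already live in the fixed weights.

    import numpy as np

    # Toy, non-attention forward pass with made-up dimensions. The fixed weights
    # E and W carry everything the model "knows"; only the last `window` tokens
    # of the conversation are actually part of the computation.

    vocab, d_model, window = 50, 8, 4
    rng = np.random.default_rng(0)
    E = rng.normal(size=(vocab, d_model))   # token embeddings (learned, then frozen)
    W = rng.normal(size=(d_model, vocab))   # output projection (learned, then frozen)

    conversation = [3, 17, 42, 8, 29, 5]    # token ids so far
    x = E[conversation[-window:]]           # only the window-sized slice is the input
    logits = x.mean(axis=0) @ W             # multiplied by the fixed weights
    next_token = int(np.argmax(logits))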
For your “better example”, it can literally already do this. I just tested this with 4o and it worked great (and I’ll say more accurately than a human would estimate most likely). I used 4o because it appears that the chain of thought models don’t accept image input yet.
I don’t want to post identifiable information so I will avoid linking to the convo or posting screenshots but you can try it yourself. I took 5 pictures of a child’s magnetic tile sitting on the floor and here is the output:
Me: (5 pictures attached)
Me: Estimate how much this weighs.
ChatGPT 4o:
From the images, it appears that this is a small, plastic, transparent, square object, possibly a piece from a magnetic tile building set (often used in educational toys). Based on the size and material, I estimate this piece to weigh approximately 10 to 20 grams (0.35 to 0.7 ounces). If it's part of a toy set like Magna-Tiles, the weight would be on the lower end of that range.
But for some reason I have a feeling this isn’t going to be good enough for you and the goalposts are about to be pushed back even farther.
“In the cypher example, all it’s doing is basically using a separate model to break a particular problem into a chain of thought, and prompting that. And there is plenty in the training set of GPT about decrypting cyphers.”
I’m sorry, but are you suggesting that applying a previously learned thought process to new variables isn’t reasoning? Does your definition of reasoning now mean that it’s only reasoning if you are designing a new-to-you chain of thought? As in, for deciphering coded messages, you’re saying that it’s only “reasoning” if it’s creating net new decoding methodologies? That’s such an absurd goalpost.
You wouldn’t have the same goalposts for humans. All of your examples I bet the average human would fail at btw. Though that may just be because the average human is bad at reasoning haha.
I didn't ask for an estimate, I asked for the exact weight. A human can do this given the process I described.
If the chain of thought were accurate, then it would be able to give you an intermediate output of the shape in some 3D format spec. But nowhere in the model does that data exist, because it's not doing any reasoning; it's still all just statistically best answers.
I mean sure, you could train a model on how to create 3D shapes out of pictures, but again, that's not reasoning.
I don't get why people are so attached to these things being intelligent. We all agree that they are useful. It shouldn't matter to you or anyone else if they're not intelligent.
I think you need to re-calibrate your expectations... I'm not saying this is a solved problem by any means, but I just tried this out with Claude Sonnet 3.5, and these instructions seem quite reasonable and detailed to me (about what I might expect if I spoke to a human expert and they tried to explain the steps to me over the telephone, for example). Does this mean this LLM is able to "reason"? I don't know that I would make THAT bold of a claim, but I think your example is not sufficient to demonstrate something that LLMs are fundamentally incapable of... in other words, the distance between "normal LLM statistical tricks" vs "reasoning" keeps getting smaller and smaller.
---
My base prompt:
> Here is a hypothetical scenario, that I would like your help with: imagine you are trying to help a person create a bicycle frame, using their home workshop which includes a CNC machine, commonly available tools, a reasonable supply of raw metal and hardware, etc. Please provide a written set of instructions, that you would give to this person so that they can complete this task.
And all I'm saying is that, you probably need a different example of what "reasoning" is, because the one you gave is something that Claude is seemingly able to do.
It even does long-term learning to some extent. Admittedly I’m not very familiar with what it’s doing, but it does create “memories”, which appear to be personal details that it deems might be relevant in the future. Then I assume it uses some type of RAG to apply previously learned memories to future conversations.
This makes me wonder if there is or could be some type of RAG for chains of thought…
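Something like this toy sketch is what I have in mind; the "similarity" here is just word overlap standing in for a real embedding model, and the stored chains are made up:

    # Toy sketch of "RAG over chains of thought": keep past reasoning traces,
    # retrieve the closest one for a new problem, and prepend it to the prompt.
    # Word overlap stands in for a real embedding-based similarity.

    stored_chains = {
        "decode a letter pair cipher": "pair the letters, map them to numbers, average each pair",
        "estimate an object's weight from photos": "guess the shape, estimate the volume, multiply by density",
    }

    def retrieve_chain(problem):
        def overlap(a, b):
            return len(set(a.lower().split()) & set(b.lower().split()))
        best_key = max(stored_chains, key=lambda k: overlap(k, problem))
        return stored_chains[best_key]

    problem = "estimate the weight of this magnetic tile from photos"
    prompt = problem + "\n\nA chain of thought that worked on a similar problem:\n" + retrieve_chain(problem)
    print(prompt)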
The mechanism is that there is an additional model that basically outputs a chain of thought for a particular problem, then runs that chain of thought through the core LLM. This is no different from a complex forward map lookup.
I mean, it's incredibly useful, but it's still just information search.
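Sketched out, the pipeline I'm describing is roughly this (call_model is a placeholder stub, and this is my guess at the mechanism, not a confirmed description of how the actual model is implemented):

    # Rough sketch of the two-stage setup described above. call_model is a
    # placeholder stub; this is a guess at the mechanism, not a confirmed
    # description of the real system.

    def call_model(prompt):
        return "<model output for: " + prompt[:40] + "...>"   # stub

    def answer_with_chain_of_thought(question):
        chain = call_model("Break this problem into steps: " + question)
        return call_model("Question: " + question + "\nSteps: " + chain + "\nFinal answer:")

    print(answer_with_chain_of_thought("How many r's are in 'strawberry'?"))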
I think it's similar, although I think it would be more similar if the LLM did the steps in lower layers (not in English), and instead of the end being fed to the start, there would be a big mess of cycles throughout the neural net.
That could be more efficient since the cycles are much smaller, but harder to train.
It doesn't do the 'thinking' in English (inference is just math), but it does now verbalize intermediate thoughts in English (or whatever the input language is, presumably), just like humans tend to do.
that's my assessment too. there's even a phenomenon I've observed both in others and myself: when thrust into a new field and given a task to complete, we do it to the best of our ability, which is often sod all. so we ape the things we've heard others say, roughly following the right chain of reasoning by luck, and then suddenly say something that in hindsight, with proper training, we realise was incredibly stupid. we autocomplete and then update with rlhf.
we also have a ton of heuristics that trigger a closer look and the loading of specific formal reasoning, but by and large, most of our thought process is just autocomplete.
Yeah, humans are very similar. We have intuitive immediate-next-step suggestions, and then we apply these intuitive next steps until we find that they led to a dead end, and then we backtrack.
I always say the way we've used LLMs (so far) is basically like having a human write text purely on gut reaction, and without a backspace key.
An exception I came up with was from a documentary on Einstein that described how he did his thought experiments. He would, of course, imagine novel scenarios in his head, which led him to the insights he could then rephrase into language. I worry language models will still lack that capacity for insights driven by imagination.
This isn't "just" autocompletion anymore, this is actual step-by-step reasoning full of ideas and dead ends and refinement, just like humans do when solving problems. Even if it is still ultimately being powered by "autocompletion".
But then it makes me wonder about human reasoning, and what if it's similar? Just following basic patterns of "thinking steps" that ultimately aren't any different from "English language grammar steps"?
This is truly making me wonder if LLM's are actually far more powerful than we thought at first, and if it's just a matter of figuring out how to plug them together in the right configurations, like "making them think".