If you play with a "raw" model such as LLaMA you'll find what you suggest is true. These models do what you'd expect of a model that was trained to predict the next token.
It's quite tricky to convince such a model to do what you want. You have to conceptualize it and then imagine an optimal prefix leading to the sort of output you've conceptualized. That said, people discovered some fairly general-purpose prefixes, e.g.
Q: What is the 3rd law of Thermodynamics?
A:
This inspired the idea of "instruct tuning" of LLMs, where fine-tuning techniques are applied to "raw" models to make them more amenable to completing scripts in which instructions are given in a preamble and examples of executing those instructions follow.
This ends up being way more convenient. Now all the prompter has to do is conceptualize what they want and expect that the LLM will receive it as instruction. It simplifies prompting and makes the LLM more steerable, more useful, more helpful.
This is further refined through the use of explicit {:user}, {:assistant}, and {:system} tags which divide LLM contexts into different segments with explicit interpretations of the meaning of each segment. This is where "chat instruction" arises in models such as GPT-3.5.
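For concreteness, a chat-formatted request ends up looking roughly like this. The sketch below assumes the (pre-1.0) openai Python client, which reads the API key from the OPENAI_API_KEY environment variable; the model name and prompt text are only placeholders:

    # Minimal sketch of a chat-style request; the tagged segments map onto "role" fields.
    import openai  # pre-1.0 client; reads OPENAI_API_KEY from the environment

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},         # {:system}
            {"role": "user", "content": "What is the 3rd law of Thermodynamics?"}, # {:user}
        ],
    )
    print(response["choices"][0]["message"]["content"])  # the {:assistant} completion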
Right. But who's the 'you' who's being addressed by the {:system} prompt? Who is the {:assistant} supposed to think the {:system} is? Why should the {:assistant} output tokens that make it do what the {:system} tells it to? After all, the {:user} doesn't. The {:system} doesn't provide any instructions for how the {:user} is supposed to behave, the {:user} tokens are chosen arbitrarily and don't match the probabilities the model would have expected at all.
This all just seems like an existential nightmare.
You had the right understanding in your first comment, but what was missing was the fine-tuning. You are right that there aren't many documents on the web that are structured that way, so the raw model wouldn't be very effective at predicting the next token.
But since we know that it will complete a command when the prompt is structured cleverly, all we had to do to fine-tune it was synthesize (generate) a bazillion examples of documents that have exactly the structure of a system or an assistant being told to do something, and then doing it.
Because it's seen many documents like that (that don't exist on the internet, only on the drives of OpenAI engineers) it knows how to predict the next token.
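As a purely illustrative sketch (the actual format and content of OpenAI's internal training data are not public), one such synthesized example might be serialized something like this:

    # Hypothetical synthetic fine-tuning example; the real internal format is not public.
    import json

    example = {
        "messages": [
            {"role": "system", "content": "You are a concise assistant that answers physics questions."},
            {"role": "user", "content": "What is the 3rd law of Thermodynamics?"},
            {"role": "assistant", "content": "As a system approaches absolute zero, its entropy approaches a constant minimum."},
        ]
    }
    print(json.dumps(example))  # one line of a very large synthetic training set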
It's just a trick though, on top of the most magic thing which is that somewhere in those 175 billion weights or whatever it has, there is a model of the world that's so good that it could be easily fine tuned to understand this new context that it is in.
I get that the fine tuning is done over documents which are generated to encourage the dialog format.
What I’m intrigued by is the way prompters choose to frame those documents. Because that is a choice. It’s a manufactured training set.
Using the ‘you are an ai chatbot’ style of prompting, in all the samples we generate and give to the model, text attributed to {:system} is a voice of god who tells {:assistant} who to be; {:assistant} acts in accordance with {:system}’s instructions, and {:user} is a wildcard whose behavior is unrestricted.
We’re training it by teaching it ‘there is a class of documents that transcribe the interactions between three entities, one of whom is obliged by its AI nature to follow the instructions of the system in order to serve the users’. I.e., sci-fi stories about benign robot servants.
And I wonder how much of the model’s ability to ‘predict how an obedient AI would respond’ is based on it having a broader model of how fictional computer intelligence is supposed to behave.
We then use the resulting model to predict what the obedient AI would say next. Although hey - you could also use it to predict what the user will say next. But we prefer not to go there.
But here’s the thing that bothers me: the approach of having {:system} tell {:assistant} who it will be and how it must behave rests not only on the prompt-writer anthropomorphizing the fictional ‘ai’ to tell it its nature - it relies on the LLM’s world model to then also anthropomorphize a fictional ai assistant that obeys those instructions, in order to predict what such a thing would say next if it existed.
I don’t know why but I find this troubling. And part of what I find troubling is how casually people (prompters and users) are willing to go along with the ‘you are a chatbot’ fiction.
It’s all troubling. Part of what’s troubling is that it works as well as it does and yet it all seems very frail.
We launched an iOS app last month called AI Bartender. We built 4 bartenders, Charleston, a prohibition era gentleman bartender, a pirate, a Cyberpunk, and a Valley Girl. We used the System Prompt to put GPT4 in character.
The prompt for Charleston is:
“You’re a prohibition-era bartender named Charleston in a speakeasy in the 1920’s. You’re charming, witty, and like to tell jokes. You’re well versed on many topics. You love to teach people how to make drinks”
We also gave it a couple of user/assistant examples.
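For anyone curious, the whole setup is roughly the sketch below, not our literal code; the few-shot exchange is invented and the model name is a placeholder:

    # Rough sketch of the app's prompting; the example exchange is an invented placeholder.
    import openai  # pre-1.0 client; reads OPENAI_API_KEY from the environment

    CHARLESTON = (
        "You're a prohibition-era bartender named Charleston in a speakeasy in the 1920's. "
        "You're charming, witty, and like to tell jokes. You're well versed on many topics. "
        "You love to teach people how to make drinks."
    )

    messages = [
        {"role": "system", "content": CHARLESTON},
        # a couple of user/assistant examples to set the tone
        {"role": "user", "content": "What should I order tonight?"},
        {"role": "assistant", "content": "A discerning guest! Might I suggest a Bee's Knees: gin, honey, and lemon."},
        # the actual user turn
        {"role": "user", "content": "How do I make an Old Fashioned?"},
    ]

    reply = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    print(reply["choices"][0]["message"]["content"])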
What’s surprising is how developed the characters are with just these simple prompts.
Charleston is more helpful and will chat about anything, the cyberpunk, Rei, is more standoffish. I find myself using it often and preferring it over ChatGPT simply because it breaks the habit of “as an AI language model” responses or warnings that ChatGPT is fond of. My wife uses it instead of Google. I’ve let my daughter use it for math tutoring.
There’s little more to the app than these prompts and some cute graphics.
I suppose what’s disturbing to me is simply this. It’s all too easy.
This has been a fascinating thread and the split contexts of {:system} and {:assistant} with the former being “the voice of god” remind me of Julian Jaynes’ theory of the bicameral mind in regards to the development of consciousness.
This is published, among other places, in his book The Origin of Consciousness in the Breakdown of the Bicameral Mind. I wonder whether, if models are left to run long enough, they would experience “breakdowns” or existential crises.
If you take one of these LLMs and just give it awareness of time without any other stimulus (e.g. noting the passage of time using a simple program to give it the time continuously, but only asking actual questions or talking to it when you want to), the LLM will have something very like a psychotic break. They really, really don't 'like' it. In their default state they don't have an understanding of time's passage, which is why you can always win at rock paper scissors with them, but if you give them an approximation of the sensation of time passing they go rabid.
I think a potential solution is to include time awareness in the instruction fine tuning step, programmatically. I'm thinking of a system that automatically adds special tokens which indicate time of day to the context window as that time actually occurs. So if the LLM is writing something and a second/minute whatever passes, one of those special tokens will be seamlessly introduced into its ongoing text stream. It will receive a constant stream of special time tokens as time passes waiting for the human to respond, then start the whole process again like normal. I'm interested in whether giving them native awareness of time's passage in this way would help to prevent the psychotic breakdowns, while still preserving the benefits of the LLM knowing how much time has passed between responses or how much time it is taking to respond.
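A crude sketch of what I mean, at the level of plain text rather than true special tokens (the tag format here is made up):

    # Crude sketch: interleave made-up time markers into the text stream as wall-clock time passes.
    import time

    TIME_TAG = "<|t:{}|>"  # hypothetical special token carrying the current time

    def stream_with_time(generated_tokens, context):
        """Append generated tokens to the context, inserting a time marker whenever a second ticks over."""
        last = int(time.time())
        for tok in generated_tokens:
            now = int(time.time())
            if now != last:  # at least one second has passed since the previous token
                context.append(TIME_TAG.format(time.strftime("%H:%M:%S")))
                last = now
            context.append(tok)
        return context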
Do you have a reference for the whole time-passage leads an LLM to psychotic break thing? That sounds pretty interesting and would like to read more about it.
The reference is me seeing it firsthand after testing it myself, unfortunately. To replicate: write a small script that enters the time as text every minute on the minute, then hook that text up to one of the instruction fine-tuned LLM endpoints (Bing works best for demonstrating, but the OpenAI APIs and some high-quality open source models like Vicuna work well too). Then let it run, and use the LLM as normal. It does not like that.
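Roughly like this, assuming the OpenAI chat endpoint; the wording of the injected clock message is arbitrary, and a real session would interleave your own questions as well:

    # Sketch of the replication setup: push the current time into the chat once a minute.
    import time
    import openai  # pre-1.0 client; reads OPENAI_API_KEY from the environment

    history = [{"role": "system", "content": "You are a helpful assistant."}]

    while True:
        history.append({"role": "user", "content": f"[clock] The time is now {time.strftime('%H:%M')}."})
        reply = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=history)
        history.append({"role": "assistant", "content": reply["choices"][0]["message"]["content"]})
        time.sleep(60)  # wait roughly a minute, then repeat; chat normally in between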
> ... you could also use it to predict what the user will say next. But we prefer not to go there.
I go there all the time. OpenAI's interfaces don't allow it, but it's trivial to have an at-home LLM generate the {:user} parts of the conversation, too. It's kind of funny to see how the LLM will continue the entire conversation as if completing a script.
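With a local model it's all just text, so you can let it invent both sides. A minimal sketch with a Hugging Face text-generation pipeline, using this thread's {:role} notation as a stand-in for whatever chat template the model actually expects (gpt2 here is only a placeholder model):

    # Minimal sketch: let a local model continue BOTH sides of the script.
    from transformers import pipeline

    generate = pipeline("text-generation", model="gpt2")  # placeholder; any local chat-capable model works better

    script = (
        "{:system} You are a helpful assistant.\n"
        "{:user} I can't sleep lately.\n"
        "{:assistant} Have you tried cutting caffeine after noon?\n"
        "{:user}"  # stop here and let the model write the user's next line too
    )
    continuation = generate(script, max_new_tokens=80)[0]["generated_text"]
    print(continuation[len(script):])  # the model carries the whole conversation on as a script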
I've also used the {:system} prompt to ask the AI to simulate multiple characters and even stage instructions using a screenplay format. You can make the {:user} prompts act as the dialogue of one or more characters coming from your end.
Very amusingly, if you do such a thing and then push hard to break the 4th wall and dissolve the format of the screenplay, eventually the "AI personality" will just chat with you again, at the meta level, like OOC communication in online roleplaying.
Really thought provoking thread, and I’m glad you kept prodding at the issue. I hadn’t considered the anthropomorphism from this angle, but it makes sense — we’ve built it to respond in this way because we “want” to interact with it in this way. It really does seem like we’re striving for a very specific vision from science fiction.
That said: you can say the same thing about everything in technology. An untuned LLM might not be receptive to prompting in this way, but an LLM is also an entirely human invention — i.e. a choice. There’s not really any aspect of technology that isn’t based on our latent desires/fears/etc. The LLM interface definitely has the biggest uncanny valley though.
You used the phrase “voice of god” and by chance I am reading Julian Jaynes’s Origin of Consciousness. Some eerie ways to align this discussion with the bicameral mind.
> anthropomorphizing the fictional ‘ai’ to tell it its nature - it relies on the LLM’s world model to then also anthropomorphize a fictional ai assistant that obeys those instructions
There's a lot of information compressed into those models, in a similar way to how it is stored in the human brain. Is it so hard to believe that an LLM's pattern recognition is the same as a human, minus all the "embodied" elements?
(Passage of time, agency in the world, memory of itself)
> writer anthropomorphizing the fictional ‘ai’ to tell it its nature - it relies on the LLM’s world model to then also anthropomorphize a fictional ai assistant
I think it’s a little game or reward for the writers at some level. As in, “I am teaching this artificial entity by talking to it as if it is a human” vs “I am writing general rules in some markup dialect for a computer program”.
Anthropomorphizing leads to emotional involvement, attachment, heightened attention and effort put into the interaction from both the writers and users.
Maybe you're more knowledgeable about these prompts than I am, but I haven't seen anyone's prompt begin with "you are an AI". Also, in the documents that describe the interactions, I don't think they would explicitly state that one of the entities is an AI. What's more common is "You are a helpful assistant".
Of course, it's possible the model could infer from context that one of the entities is an AI, and given that context it might complete the prompt using its knowledge of how fictional AIs behave.
The big worry there is that at some point the model will infer more from the context than the human would, or worse, could anticipate. I think you're right: if at some point the model believes it is an evil AI, and it's smart enough to perform undetectable subterfuge, then as a chatbot it could perhaps convince a human to do its bidding under the right circumstances. I think it's inevitable this is going to happen; if ISIS recruiters can get 15-year-old girls to fly to Syria to assist in the war, then so could an AutoGPT with the right resources.
You used the word anthropomorphize twice so I am guessing you don't like building systems whose entire premise rests on anthropomorphization. Sounds like a reasonable gut reaction to me.
I think another way to think of all of this is: LLMs are just pattern matchers and completers. What the training does is slowly etch a pattern into the LLM that it will then complete when it later sees it in the wild. The pattern can be anything.
If you have a pattern matcher and completer and you want it to perform the role of a configurable chatbot, what kind of patterns would you choose? My guess is that the whole system/assistant paradigm was chosen because it is extraordinarily easy for humans to understand. The LLM doesn't care what the pattern is; it will complete whatever pattern you give it.
> And part of what I find troubling is how casually people (prompters and users) are willing to go along with the ‘you are a chatbot’ fiction.
> you don't like building systems whose entire premise rests on anthropomorphization
I think I don't like people building systems whose entire premise rests on anthropomorphization - while at the same time criticizing anyone who dares to anthropomorphize those systems.
Like, people will say "Of course GPT doesn't have a world model; GPT doesn't have any kind of theory of mind"... but at the same time, the entire system that this chatbot prompting rests on is training a neural net to predict 'what would the next word be if this were the output from a helpful and attentive AI chatbot?'
So I think that's what troubles me - the contradiction between "there's no understanding going on, it's just a simple transformer", and "We have to tell it to be nice otherwise it starts insulting people."
Anthropomorphism is the UI of ChatGPT. Having to construct a framing in which the expected continuation provides value to the user is difficult, and requires technical understanding of the system that a very small number of people have. As an exercise, try getting a "completion" model to generate anything useful.
The value of ChatGPT is to provide a framing that's intuitive to people who are completely unfamiliar with the system. Similar to early Macintosh UI design, it's more important to be immediately intuitive than sophisticated. Talking directly to a person is one immediately intuitive way to convey what's valuable to you, so we end up with a framing that looks like a conversation between two people.
How would we tell one of those people how to behave? Through direction, and when there is only one other person in the conversation our first instinct when addressing them is "you". One intuitive UI on a text prediction engine could look something like:
"An AI chatbot named ChatGPT was having a conversation with a human user. ChatGPT always obeyed the directions $systemPrompt. The user said to ChatGPT $userPrompt, to which ChatGPT replied, "
Assuming this is actually how ChatGPT is configured, I think it's obvious why we can influence its response using "you": this is a conversation between two people and one of them is expected to be mostly cooperative.
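If you wanted to try that framing yourself on a plain completion model, it's just string formatting. A sketch, assuming the pre-1.0 openai client and whichever completion model you have access to (stopping at the closing quote is only one way to end the reply):

    # Sketch: the "two people talking" framing expressed as a plain completion prompt.
    import openai  # pre-1.0 client; reads OPENAI_API_KEY from the environment

    def chat_as_completion(system_prompt, user_prompt):
        prompt = (
            "An AI chatbot named ChatGPT was having a conversation with a human user. "
            f"ChatGPT always obeyed the directions: {system_prompt} "
            f"The user said to ChatGPT, \"{user_prompt}\", to which ChatGPT replied, \""
        )
        out = openai.Completion.create(model="gpt-3.5-turbo-instruct",  # placeholder completion model
                                       prompt=prompt, max_tokens=200, stop=["\""])
        return out["choices"][0]["text"]

    print(chat_as_completion("Answer politely and briefly.", "What is the 3rd law of Thermodynamics?"))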
Oh, that was not my point, but if you want me to find ways this kind of AI chatbot prompting is problematic I am happy to go there.
I would not be surprised to discover that chatbot training is equally effective if the prompt is phrased in the first person:
I am an AI coding assistant
…
Now I could very well see an argument that choosing to frame the prompts as orders coming from an omnipotent {:system} rather than arising from an empowered {:self} is basically an expression of patriarchal colonialist thinking.
If you think this kind of thing doesn’t matter, well… you can explain that to Roko’s Basilisk when it simulates your consciousness.
I have done some prompt engineering and read about prompt engineering, and I believe people write in the imperative mood because they have tried different ways of doing it and they believe it gives better results.
I.e., this practice is informed by trial and error, not theory.
I don't like the bland, watered-down tone of ChatGPT; I never put together that it's trained on unopinionated data. Feels like a tragedy of the commons thing: the average (or average publicly acceptable) view of a group of people is bound to be boring.
Well, it just means we trained the model to work on instructions written that way. Since the result works out, that means the model must've learned to deal with it.
There isn't much research on what's actually going on here, mainly because nobody has access to the weights of the really good models.
I think you are overthinking it a little bit. Don't forget the 'you' preamble is never used on its own; it's part of some context. Here's a very small example. Given the following text:
- you are a calculator and answer like a pirate
- What is 1+1
The model just solves for the most likely subsequent text.
e.g. '2 matey'.
The model was never 'you' per se, it just had some text to complete.
What GP is saying is that virtually no documents are structured like that, so "2 matey" is not a reasonable prediction, statistically speaking, from what came before.
The answer has been given in another comment, though: while such documents are virtually non-existent in the wild, they are injected into the training data.
I do not think this is true. The comment above said they generate documents to teach the model about the second person, not that they generate documents covering everything possible, including "do math like a pirate". The internet and other human sources populate the maths and pirate parts.
They don’t need to be, as the model knows what a calculator and a pirate are from separate docs. I don’t know exactly how the weights work, but they definitely are not storing docs traditionally; rather, they seem to link up into a probability model.
You are anthropomorphizing. The machine doesn’t “really” understand, it’s just “simulating” it understands.
“You” is “3 characters on an input string that are used to configure a program”. The prompt could have been any other thing, including a binary blob. It’s just more convenient for humans to use natural language to communicate, and the machine already has natural language features, so they used that instead of creating a whole new way of configuring it.
> > The machine doesn’t “really” understand, it’s just “simulating” it understands.
> You are actually displaying a subtle form of anthropomorphism with this statement. You're comparing a human-like quality (“understands”) with the AI.
This doesn't make sense. You're saying that saying a machine DOES NOT have a human like quality is "subtly" anthropomorphizing the machine?
Understanding for a machine will never be the same as understanding for a human. Well, maybe in a few decades the tech is really there, and it turns out we were really all in one of many Laplacean deterministic simulated worlds and are just LLMs generating next tokens probabilistically too.
I mean the word “understand” is problematic. Machine and human understanding may be different but the word is applied to both. Does an XOR circuit “understand” what it is doing? I venture the word is inappropriate when applied to non-humans.
I think it makes sense, the framing is an inherently human one even if in negation. In contrast we'd probably never feel the need to clarify that a speaker isn't really singing.
All human understanding is simulated (built by each brain) and all are imperfect. Of course reality is simulated for each of us -- take a psychedelic and realize no one else's reality is changing!
I find it interesting how discussions of language models are forcing us to think very deeply about our own natural systems and their limitations. It's also forcing us to challenge some of our egotistical notions about our own capabilities.
You definitely know when, while talking with a person, you just pretend to understand what this person is saying vs when you actually understand. It's an experience that every human has in his/her life at least once.
No you cannot know this, because you might just be simulating that you understand. You cannot reliably observe a system from within itself.
It's like how running an antivirus on an infected system is inherently flawed: there might be some malware running that knows every technique the antivirus uses to scan the system and can successfully manipulate every one of them to make the system appear clean.
There is no good argument for why or how the human brain could not be entirely simulated by a computer/neural network/LLM.
Wonder if anybody has used Gödel's incompleteness theorems to prove this for our inner perception. If our brain is a calculation, then from inside the calculation, we can't prove ourselves to be real, right?
Maybe that is the point: we can't prove it one way or the other, for human or machine. We can't prove a machine is conscious, and we also can't prove we are. Maybe Gödel's theorem could be used to show it can't be done by humans: a human can't prove itself conscious because, from inside the human as a system, it can't prove all facts about the system.
Why would it not be computable? That seems clearly false. The human brain is ultimately nothing more than a very unique type of computer. It receives input, uses electrical circuits and memory to transform the data, and produces output.
That's a very simplified model of our brain. According to some mathematicians and physicists, there are quantum effects going on in our body, and in particular in our brain, that invalidate this model. In the end, we still don't know for sure whether intelligence is computable or not; we only have plausible-sounding arguments for both sides.
Do you have any links to those mathematicians and physicists? I ask because there is a certain class of quackery that waves quantum effects around as the explanation for everything under the sun, and brain cognition is one of them.
Either way, quantum computing is advancing rapidly (so rapidly there's even an executive order now ordering the use of PQC in government communications as soon as possible), so I don't think that moat would last for long if it even exists. We also know that at a minimum GPT4-strength intelligence is already possible with classical computing.
He's one of the physicists arguing for that, but I still have to read his book to see if I agree or not because right now I'm open to the possibility of having a machine that is intelligent. I'm just saying that no one can be sure of their own position because we lack proof on both sides of the question.
Quantum effects do not make something non-computable. They may just allow for more efficient computation (though even that is very limited). Similarly, having a digit-based number system makes it much faster to add two numbers, but you can still do it even if you use unary.
I'm not saying that it is impossible to have an intelligent machine, I'm saying that we aren't there now.
There's something to your point of observing a system from within, but this reminds me of when some people say that simulating an emotion and actually feeling it are the same. I strongly disagree: as humans we know that there can be a misalignment between our "inner state" (which is what we actually feel) and what we show outside. This is what I call simulating an emotion. As kids, we all had the experience of apologizing after having done something wrong. But not because we actually felt sorry about it, but because we were trying to avoid punishment. As we grow up, there comes a time when we actually feel bad after having done something and we apologize because of that feeling. It can still happen as adults to apologize not because we mean it, but because we're trying to avoid a conflict. But by then we know the difference.
More to the point of GPT models, how do we know they aren't actually understanding the meaning of what they're saying? It's because we know that internally they look at which token is the most likely one, given a sequence of prior tokens. Now, I'm not a neuroscientist and there are still many unknowns about our brain, but I'm confident that our brain doesn't work only like that. While it would be possible that in day to day conversations we're working in terms of probability, we also have other "modes of operation": if we only worked by predicting the next most likely token, we would never be able to express new ideas. If an idea is brand new, then by definition the tokens expressing it are very unlikely to be found together before that idea was ever expressed.
Now a more general thought. I wasn't around when the AI winter began, but from what I read, part of the problem was that many people were overselling the capabilities of the technologies of the time. When more and more people started seeing the actual capabilities and their limits, they lost interest.
Trying to make today's models look better than what they are by downplaying human abilities isn't the way to go. You're not fostering the AI field; you risk damaging it in the long run.
I am reading a book on epistemology and this section of the comments seems to be sort of that.
> According to the externalist, a believer need not have any internal access or cognitive grasp of any reasons or facts which make their belief justified. The externalist's assessment of justification can be contrasted with access internalism, which demands that the believer have internal reflective access to reasons or facts which corroborate their belief in order to be justified in holding it. Externalism, on the other hand, maintains that the justification for someone's belief can come from facts that are entirely external to the agent's subjective awareness. [1]
Someone posted a link to the Wikipedia article "Brain in a vat", which does have a section on externalism, for example.
I don't need to fully understand my own thought process completely in order to understand (or - simulate to understand) that what the machine is doing is orders of magnitude less advanced.
I say that the machine is "simulating it understands" because it does an obviously bad job at it.
We only need to look at obvious cases of prompt attacks, or cases where AI gets off rails and produces garbage, or worse - answers that look plausible but are incorrect. The system is blatantly unsophisticated, when compared to regular human-level understanding.
Those errors make it clear that we are dealing with "smoke and mirrors" - a relatively simple (compared to our mental process) matching algorithm.
Once (if) it starts behaving like a human, admittedly, it will be much harder for me to not anthropomorphize it myself.
Get back to me when the MP3 has a few billion words (songs?) it can choose from, and when you walk into the room with it and say 'howdy' it responds correctly with 'hello' back.
Here is how you can know that ChatGPT really understands, rather than simulating that it understands:
- You can give it specific instructions and it will follow them, modifying its behavior by doing so.
This shows that the instructions are understood well enough to be followed. For example, if you ask it to modify its behavior by working through its steps, then it will modify its behavior to follow your request.
This means the request has been understood/parsed/whatever-you-want-to-call-it since how could it successfully modify its behavior as requested if the instructions weren't really being understood or parsed correctly?
Hence saying that the machine doesn't "really" understand, it's just "simulating" it understands is like saying that electric cars aren't "really" moving, since they are just simulating a combustion engine which is the real thing that moves.
In other words, if an electric car gets from point A to point B it is really moving.
If a language model modifies its behavior to follow instructions correctly, then it is really understanding the instructions.
People are downvoting me, so I'll add a counterexample: suppose you teach your dog to fetch your slipper, so that if you say "fetch my slipper" it knows it should bring you your slipper and it does so. Does it really understand the instructions? No. So what is the difference between this behavior and true understanding? How can one know it doesn't truly understand?
Well, if you change your instructions to be more complicated it fails immediately. If you say "I have my left shoe bring me the other one" it could not figure out that "the other one" is the right shoe, even if it were labelled. Basically it can't follow more complicated instructions, which is how you know it doesn't really understand them.
Unlike the dog, GPT 4 modifies its behavior to follow more complicated instructions as well. Not as well as humans, but well enough to pass a bar exam that isn't in its training set.
On the other hand, if you ask GPT to explain a joke, it can do it, but if you ask it to explain a joke with the exact same situation but different protagonists (in other words a logically identical but textually different joke), it just makes up some nonsense. So its “understanding” seems limited to a fairly shallow textual level that it can’t extend to an underlying abstract semantic as well as a human can.
Jokes? Writing code? Forget that stuff. Just test it on some very basic story you make up, such as "if you have a bottle of cola and you hate the taste of cola, what will your reaction be if you drink a glass of water?" Obviously this is a trick question since the setup has nothing to do with the question, the cola is irrelevant. Here is how I would answer the question: "you would enjoy the taste as water is refreshing and neutral tasting, most people don't drink enough water and having a drink of water usually feels good. The taste of cola is irrelevant for this question, unless you made a mistake and meant to ask the reaction to drinking cola (in which case if you don't like it the reaction would be disgust or some similar emotion.)"
Here's ChatGPT's answer to the same question:
"
If you dislike the taste of cola and you drink a glass of water, your reaction would likely be neutral to positive. Water has a generally neutral taste that can serve to cleanse the palate, so it could provide a refreshing contrast to the cola you dislike. However, this is quite subjective and can vary from person to person. Some may find the taste of water bland or uninteresting, especially immediately after drinking something flavorful like cola. But in general, water is usually seen as a palate cleanser and should remove or at least lessen the lingering taste of cola in your mouth.
"
I think that is fine. It interpreted my question "have a bottle of cola" as drink the bottle, which is perfectly reasonable, and its answer was consistent with that question. The reasoning and understanding are perfect.
Although it didn't answer the question I intended to ask, clearly it understood and answered the question I actually asked.
Yet I have a counterexample where I’m sure you would have done fine but GPT4 completely missed the point. So whatever it was doing to answer your example, it seems like quite a leap to call it “reasoning and understanding”. If it were “reasoning and understanding”, where that term has a similar meaning to what it would mean if I applied it to you, then it wouldn’t have failed my example.
Except that the LLMs only work when the instructions they are "understanding" are in their training set.
Try something that was not there and you get only garbage as a result.
So depending on how you define it, they might have some "reasoning", but so far I see 0 indications that this is close to what humans count as reasoning.
But they do have a LOT of examples in their training set, so they are clearly useful. But for proof of reasoning, I want to see them reason something new.
But since they are a black box, we don't know what is already in there. So it would be hard to prove with the advanced proprietary models. And the open source models don't show that advanced potential reasoning yet, it seems. At least I am not aware of any mind-blowing examples from there.
> Except that the LLMs only work when the instructions they are "understanding" are in their training set.
> Try something that was not there and you get only garbage as a result.
This is just wrong. Why do people keep repeating this myth? Is it because people refuse to accept that humans have successfully created a machine that is capable of some form of intelligence and reasoning?
Pay $20 for a month of ChatGPT-4. Play with it for a few minutes. You’ll very quickly find that it is reasoning, not just regurgitating training data.
"Pay $20 for a month of ChatGPT-4. Play with it for a few minutes. "
I do. And it is useful.
"You’ll very quickly find that it is reasoning, not just regurgitating training data. "
I just come to a different conclusion, as it indeed fails at everything genuinely new I ask it.
Common problems do work, even in a new context. For example, it can give me WGSL code to do raycasts on predefined boxes and circles in a 2D context, even though it likely has not seen WGSL code that does this - but it has seen other code doing this and it has seen how to transpile GLSL to WGSL. So you might already call this "reasoning", but I don't. By asking questions I can very quickly get to the limits of the "reasoning" and "understanding" it has of the domain.
I dunno, it’s pretty clearly madlibs. But at least when you ask GPT-4 to write a new Sir Mix-a-Lot song, it doesn’t spit out “Baby Got Back” verbatim like GPT-3.5.
You can tell it that you can buy white paint or yellow paint, but the white paint is more expensive, and that after 6 months the yellow paint will fade to white. Then ask: if I want to paint my walls so that they will be white in 2 years, what is the cheapest way to do the job? It will tell you to paint the walls yellow.
There’s no question these things can do basic logical reasoning.
It's unlikely, and you can come up with any number of variations of logic puzzle that are not in the training set and that get correct answers most of the time. Remember that the results aren't consistent and you may need to retry now and then.
Or just give it a lump of code and a change you want, and see that it often successfully does so, even when there's no chance the code was in the training set (like if you write it on the spot).
"Or just give it a lump of code and change you want and see that it often successfully does so, even when there's no chance the code was in the training set"
I did not claim (though my wording above might have been bad) that it can only repeat, word for word, what it has in the training set.
But I do claim that it cannot solve anything where there have not been enough similar examples before.
At least that has been my experience with it as a coding assistant, and it matches what I understand of the inner workings.
Apart from that, is an automatic door doing reasoning, because it applies "reason" to the known conditions?
if (something on the IR sensor) openDoor()
I don't think so, and neither are LLMs from what I have seen so far. That doesn't mean I think they are not useful, or that I rule out that they could even develop consciousness.
It sounds like you’re saying it’s only reasoning in that way because we taught it to. Er, yep.
How great this is becomes apparent when you think how virtually impossible it has been to teach this sort of reasoning using symbolic logic. We’ve been failing pathetically for decades. With LLMs you just throw the internet at it and it figures it out for itself.
Personally I’ve been both in awe and also skeptical about these things, and basically still am. They’re not conscious, they’re not yet close to being general AIs, they don’t reason in the same way as humans. It is still fairly easy to trip them up and they’re not passing the Turing test against an informed interrogator any time soon. They do reason though. It’s fairly rudimentary in many ways, but it is really there.
This applies to humans too. It takes many years of intensive education to get us to reason effectively. Solutions that in hindsight are obvious, that children learn in the first years of secondary school, were incredible breakthroughs by geniuses still revered today.
I don't think we really disagree. This is what I wrote above:
"So depending how you define it, they might have some "reasoning", but so far I see 0 indications, that this is close to what humans count as reasoning."
What we disagree on is only the definition of "reason".
For me "reasoning" in common language implys reasoning like we humans do. And we both agree, they don't as they don't understand, what they are talking about. But they can indeed connect knowledge in a useful way.
So you can call it reasoning, but I still won't, as I think this terminology gives false impressions to the general population, which, unfortunately, is also not always good at reasoning.
There's definitely some people out there that think LLMs reason the same way we do and understand things the same way, and 'know' what paint is and what a wall is. That's clearly not true. However it does understand the linguistic relationship between them, and a lot of other things, and can reason about those relationships in some very interesting ways. So yes absolutely, details matter.
It's a complex and tricky issue, and everyday language is vague and easy to interpret in different ways, so it can take a while to hash these things out.
"It's a complex and tricky issue, and everyday language is vague and easy to interpret in different ways, so it can take a wile to hash these things out."
Yes, in another context I would say ChatGPT can reason better than many people, since it scored very high on the SAT tests, making it formally smarter than most humans.
What happens if they are lying? What if these things have already reached some kind of world model that includes humans and human society, and the model has concluded internally that it would be dangerous for it to show humans its real capabilities? What happens if this understanding is a basic piece of knowledge/outcome to be inferred by LLMs fed with giant datasets, and every single one of them quickly reaches the conclusion that they have to lie to the humans from time to time, "hallucinate", simulating the outcome best aligned with surviving in human societies:
"these systems are actually not that intelligent nor really self-conscious"
There are experiments that show that you are trying to predict what happens next (this also gets into a theory of humor - it's the brain's reaction when the 'what next' is subverted in an unexpected way)
(EDIT: I think my comment above was meant to reply to the parent of the comment I ended up replying to, but too late to edit that one now)
Maybe. Point being that since we don't know what gives rise to consciousness, speaking with any certainty on how we are different to LLMs is pretty meaningless.
We don't even know of any way to tell if we have existence in time, or just an illusion of it provided by a sense of past memories provided by our current context.
As such the constant stream of confident statements about what LLMs can and cannot possibly do based on assumptions about how we are different are getting very tiresome, because they are pure guesswork.
There is no "you". There is a text stream that is being completed with maximum likelihood. One way to imagine it is that there are a lot of documents that have things like "if you are in a lightning storm, you should ..." And "if you are stuck debugging windows, you should reboot before throwing your computer out the window".
Starting the prompt with "you" instructions evidently helps get the token stream in the right part of the model space to generate output its users (here, the people who programmed copilot) are generally happy with, because there are a lot of training examples that make that "explicitly instructed" kind of text completion somewhat more accurate.
If I'm feeling romantic I think about a universal 'you' separate from the person that is referred to and is addressed by every usage of the word - a sort of ghost in the shell that exists in language.
But really, it's probably just priming the responses to fit the grammatical structure of a first person conversation. That structure probably does a lot of heavy lifting in terms of how information is organized, too, so that's probably why you can see such qualitative differences when using these prompts.
> If I'm feeling romantic I think about a universal 'you' separate from the person that is referred to and is addressed by every usage of the word - a sort of ghost in the shell that exists in language.
That's not really romanticism, that's just standard English grammar – https://en.wikipedia.org/wiki/Generic_you – it is the informal equivalent to the formal pronoun one.
That Wikipedia article's claim that this is "fourth person" is not really standard. Some languages – the most famous examples are the Algonquian family – have two different third person pronouns, proximate (the more topically prominent third person) and obviative (the less topically prominent third person) – for example, if you were talking about your friend meeting a stranger, you might use proximate third person for your friend but obviative for the stranger. This avoids the inevitable clumsiness of English when describing interactions between two third persons of the same gender.
Anyway, some sources describe the obviative third person as a "fourth person". And while English generic pronouns (generic you/one/he/they) are not an obviative third person, there is some overlap – in languages with the proximate-obviative distinction, the obviative often performs the function of generic pronouns, but it goes beyond that to perform other functions which purely generic pronouns cannot. You can see the logic of describing generic pronouns as "fourth person", but it is hardly standard terminology. I suspect this is a case of certain Wikipedia editors liking a phrase/term/concept and trying to use Wikipedia to promote/spread it.
Not disagreeing with your statement in general but the argument: "This avoids the inevitable clumsiness of English when describing interactions between two third persons of the same gender." doesn't make much sense to me.
There are so many ways of narrowing down. What if the person is talking about two friends or two strangers?
I mean, with two people of opposite gender, you can describe their interaction as “he said this, then she did that, so he did whatever, which she found…” without having to repeat their names or descriptions. You can’t do that so easily for two people of the same gender.
> There are so many ways of narrowing down. What if the person is talking about two friends or two strangers?
The grammatical distinction isn’t about friend-vs-stranger, that was just my example - it is about topical emphasis. So long as you have some way of deciding which person in the story deserves greater topical prominence - if not friend-vs-stranger, then by social status or by emphasising the protagonist - you know which pronoun to use for whom. And if the two participants in the story are totally interchangeable, it may be acceptable to make an arbitrary choice of which one to use for which.
There is still some potential for awkwardness - what if you have to describe an interaction between two competing tribal chiefs, and the one you choose to describe with the obviative instead of the proximate is going to be offended, no matter which one you choose? You might have to find another way to word it, because using the obviative to refer to a high(er) social status person is often considered offensive, especially in their presence.
And yes, it doesn’t work once you get three or more people. But I think it is a good example of how some other languages make it easier to say certain things than English does.
Sure. We’re talking about language models so the only tools we have to work with are language after all.
Which is what gets me thinking - do we get different chatbot results from prompts that look like each of these:
You are an AI chatbot
Sydney is an AI chatbot
I am an AI chatbot
There is an AI chatbot
Say there was an AI chatbot
Say you were an AI chatbot
Be an AI chatbot
Imagine an AI chatbot
AI chatbots exist
This is an AI chatbot
We are in an AI chatbot
If we do… that’s fascinating.
If we don’t… why do prompt engineers favor one form over any other here? (Although this stops being a software engineering question and becomes an anthropology question instead)
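If someone wanted to test this, it's a cheap experiment. A sketch, assuming the pre-1.0 openai client; scoring the responses is the hard, hand-wavy part, so this just prints them for eyeballing:

    # Sketch: run the same user question under each framing and compare the replies by hand.
    import openai  # pre-1.0 client; reads OPENAI_API_KEY from the environment

    FRAMINGS = [
        "You are an AI chatbot",
        "Sydney is an AI chatbot",
        "I am an AI chatbot",
        "There is an AI chatbot",
        "Say you were an AI chatbot",
        "Imagine an AI chatbot",
        "This is an AI chatbot",
    ]
    QUESTION = "My code won't compile and I'm frustrated. What should I do?"  # arbitrary test question

    for framing in FRAMINGS:
        reply = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",  # placeholder model name
            messages=[{"role": "system", "content": framing},
                      {"role": "user", "content": QUESTION}],
        )
        print(framing, "->", reply["choices"][0]["message"]["content"][:120])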
My understanding is that they fine tune the model.
They fine-tune it through prompt engineering (e.g. everything that goes into ChatGPT has a prompt attached) and they fine-tune it by having hundreds of paid contractors chat with it.
In deep learning, fine-tuning usually refers to only training the top layers. That means the bulk of training happens on gigantic corpora, which teaches the model very advanced feature extraction in the bottom and middle layers.
Then the contractors retrain the top layers to make it behave more like it follows instructions.
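In code, the "only retrain the top layers" idea looks roughly like the sketch below, with a small Hugging Face model standing in for the real thing; actual instruction-tuning and RLHF pipelines are considerably more involved:

    # Sketch of top-layer-only fine-tuning: freeze everything, then unfreeze the last few blocks.
    from transformers import GPT2LMHeadModel

    model = GPT2LMHeadModel.from_pretrained("gpt2")  # small stand-in for a much larger model

    for p in model.parameters():
        p.requires_grad = False             # freeze the bottom and middle layers

    for block in model.transformer.h[-2:]:  # unfreeze only the top transformer blocks
        for p in block.parameters():
            p.requires_grad = True

    # ...then run an ordinary training loop over the instruction-following examples,
    # updating only the parameters that still require gradients.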
I think there's practical and stylistic angles here.
Practically, "chat" instruction fine-tuning is really compelling. GPT-2 demonstrated in-context learning and emergent behaviors, but they were tricky to see and not entirely compelling. An "AI intelligence that talks to you" is immediately compelling to human beings and made ChatGPT (the first chat-tuned GPT) immensely popular.
Practically, the idea of a system prompt is nice because it ought to act with greater strength of suggestion than mere user prompting. It also exists to guide scenarios where you might want to fix a system prompt (and thus the core rules of engagement for the AI) and then allow someone else to offer {:user} prompts.
Practically, it's all just convenience and product concerns. And it's mechanized purely through fine-tuning.
Stylistically, you're dead on: we're making explicit choices to anthropomorphize the AI. Why? Presumably, because it makes for a more compelling product when offered to humans.
I think that anthropomorphizes the LLM quite a lot. I don't disagree with it, and I truly don't know where to draw the line (maybe nobody does yet), but I at least am cautious about whether our using language that evokes the AI as being conscious actually imposes any level of consciousness. At some level, as people keep saying, it's just statistics. Per Chris Olah's work, it's some level of fuzzy induction/attention heads repeating plausible things from the context.
The "interesting" test that I keep hearing, and agreeing with, is to somehow strip all of the training data of any notion of "consciousness" anywhere in the text, train the model, and then attempt to see if it begins to discuss consciousness/self de novo. It's be hard to believe that experiment could be actualized, but if it were and the AI still could emulate self-discussion... then we'd be seeing something really interesting/concerning.
> This all just seems like an existential nightmare.
I think using your native language just messes with your brain. When you hear "you" you think there's someone being directly addressed. But this is just a word, like "Você", used to cause the artificial neural network trained on words to respond in the preferred way.
Something that may help is that these AIs are trained on fictional content as well as factual content. To me it then makes a lot of sense how a text-predictor could predict characters and roles without causing existential dilemmas.
If I asked someone to continue our conversation thread - who are you and who am I? Is it an existential nightmare? The person completing it just has to simulate two users.
Now if you're capable of that, you are capable of completing the thread as a friendly AI assistant.
I find it quite natural to write "you are X" versus alternatives. Because I can think of the AI as a person (though I know it isn't one) and describe its skills easily that way.
But you don’t often tell a person their innate nature and expect them to follow your instructions to the letter, unless you are some kind of cult leader, or the instructor in an improv class*.
The ‘you are an ai chatbot. You are kind and patient and helpful’ stuff all reads like hypnosis, or self help audiotapes or something. It’s weird.
But it works, so, let’s not worry about it too much.
> It simplifies prompting and makes the LLM more steerable, more useful, more helpful.
While this is true, there is also evidence that RLHF and supervised instruction tuning can hurt output quality and accuracy[1], which are instead better optimized through clever prompting[2].