It actually still scares the hell out of me that this is the way even the experts 'program' this technology, with all the ambiguities rising from the use of natural language.
Keep in mind that this is not the only way the experts program this technology.
There's plenty of fine-tuning and RLHF involved too, that's mostly how "model alignment" works for example.
The system prompt exists merely as an extra precaution to reinforce the behaviors learned in RLHF, to explain some subtleties that would be otherwise hard to learn, and to fix little mistakes that remain after fine-tuning.
You can verify that this is true by using the model through the API, where you can set a custom system prompt. Even if your prompt is very short, most behaviors still remain pretty similar.
There's an interesting X thread from the researchers at Anthropic on why their prompt is the way it is at [1][2].
Supposedly they use "RLAIF", but honestly given that the first step is to "generate responses... using a helpful-only AI assistant" it kinda sounds like RLHF with more steps.
LLM Prompt Engineering: Injecting your own arbitrary data into a what is ultimately an undifferentiated input stream of word-tokens from no particular source, hoping your sequence will be most influential in the dream-generator output, compared to a sequence placed there by another person, or a sequence that they indirectly caused the system to emit that then got injected back into itself.
Then play whack-a-mole until you get what you want, enough of the time, temporarily.
I think OP's point is that "hope" is never a substitute for "a battery of experiments on dependably constant phenomena and supported by strong statistical analysis."
Honestly, this sort of programming (whether it's in quotes or not) will be unbelievably life changing when it works.
I can absolutely put into words what I want, but I cannot program it because of all the variables. When a computer can build the code for me based on my description... Holy cow.
Well, hopefully your developers are substantially more capable, able to clearly track the difference between your requests versus those of other stakeholders... And they don't get confused by overhearing their own voice repeating words from other people. :p
We all use abstractions, and abstractions, good as they are to fight complexity, are also bad because sometimes they hide details we need to know. In other words, we don't genuinely understand anything. We're parrots of abstractions invented elsewhere and not fully grokked. In a company there is no single human who understands everything, it's a patchwork of partial understandings coupled functionally together. Even a medium sized git repo suffers from the same issue - nobody understands it fully.
Wholeheartedly agree. Which is why the most valuable people in a company are those who can cross abstraction layers, vertically or horizontally, and reduce information loss from boundaries between abstractions.
... or - worse even - something you think is what you want, because you know not better, but happens to be a wholy (or - worse - even just subtly partially incorrect) confabulated answer.-
It still scares the hell out me that engineers think there’s a better alternative that covers all the use cases of a LLM. Look at how naive Siri’s engineers were, thinking they could scale that mess to a point where people all over the world would find it a helpful tool that improved the way they use a computer.
The original founders realised the weakness of Siri and started a machine learning based assistent which they sold to Samsung. Apple could have taken the same route but didn't.
I mean, there are videos from when Siri was launched [1] with folks at Apple calling it intelligent and proudly demonstrating that if you asked it whether you need a raincoat, it would check the weather forecast and give you an answer - demonstrating conceptual understanding, not just responding to a 'weather' keyword. With senior folk saying "I've been in the AI field a long time, and this still blows me away."
So there's direct evidence of Apple insiders thinking Siri was pretty great.
Of course we could assume Apple insiders realised Siri was an underwhelming product, even if there's no video evidence. Perhaps the product is evidence enough?
My overall impression using Siri daily for many years (mainly for controlling smart lights, turning Tv on/off, setting timers/alarms), is that Siri is artificially dumbed down to never respond with an incorrect answer.
When it says “please open iPhone to see the results” - half the time I think it’s capable of responding with something but Apple would rather it not.
I’ve always seen Siri’s limitations as a business decision by Apple rather than a technical feat that couldn’t be solved. (Although maybe it’s something that couldn’t be solved to Apple’s standards)
And "kinda" is an understatement. It understands you very well, perhaps even better than the average human would. (Average humans often don't understand jargon.)
Indeed the understanding part is very good. I just tried this:
"
I'm dykslegsik I offen Hawe problems wih sreach ennginnes bat eye think yoy wiw undrestand my
"
Gpt-4o replied:
"
I understand you perfectly! If you have trouble with search engines or anything else, feel free to ask me directly, and I'll do my best to help you. Just let me know what you're looking for or what you need assistance with!
"
> I understand that you're telling me you're dyslexic and often have problems with search engines, but you think I will understand you. You're right - I can understand what you're trying to communicate despite the spelling differences. Is there something specific I can help you with today? I'm happy to assist in any way I can.
Honestly it has a much nicer writing style than chatgpt. I really dislike openai's forced happiness / excitement
Gemini is even better in that aspect, being even more to the point and neutral than Claude, it doesn't get on your nerves whatsoever. Having to use GPT is indeed as draining as it is to read LinkedIn posts.
I was paying for both. Then I canceled both. I hate the fact that they sensor what I am trying to do or test. Everyone has a different career path. It does not tailor to me. I am in cyber security. I wish they sold consumer gpus with 80gb or 250gb of ram. Would live to run some large llms locally that could assist with code automation.
This is true as well, it's very much overly apologetic. Especially noticable when using it in coding. When asking it why it did or said something seemingly contradictory, you're forced to very explicitly write something like "This is not asking for an apology or pointing out a mistake, this is a request for an explanation".
Maybe in some cases. But generally speaking the consensus in the language translation industry is that NMT (e.g. Google Translate) still provides higher quality than current gen LLMs.
Is it? Anecdotally, even GPT 3.5 felt better than Google Translate; where we are with GPT-4o, it feels both better in terms of quality, accuracy and context awareness and it has a much better UI/UX. Like, I can't tell Google Translate it picked the wrong translation for a specific phrase (I thing I could in the past, but this feature seems missing), or otherwise inform it of extra context.
(Also LLMs are wonderful at solving the "tip of my tongue" problem - "what's the English word for $this doing $that, kind of like $example1 but without $aspect1?...")
I think you have misunderstood Searle's Chinese Room argument. In Searle's formulation, the Room speaks Chinese perfectly, passes the Turing test, and can in no way be distinguished from a human who speaks Chinese - you cannot "pop the illusion". The only thing separating it from a literal "robot that speaks Chinese" is the insertion of an (irrelevant) human in the room, who does not speak Chinese and whose brain is not part of the symbol manipulation mechanisms. "Internal cause and effect" has nothing to do with it - rather, the argument speciously connects understanding on the part of the human with understanding on the part of the room (robot).
The Chinese Room thought experiment is not a distinct "scenario", simply an intuition pump of a common form among philosophical arguments which is "what if we made a functional analogue of a human brain that functions in a bizarre way, therefore <insert random assertion about consciousness>".
> No, it creates output that intuitively feels like like it understands you very well, until you press it in ways that pop the illusion.
I would say even a foundation model, without supervised instruction tuning, and without RLHF, understands text quite well. It just predicts the most likely continuation of the prompt, but to do so effectively, it arguably has to understand what the text means.
If it truly understood what things mean, then it would be able to tell me how many r's are in the word strawberry.
But it messes something so simple up because it doesn't actually understand things. It's just doing math, and the math has holes and limitations in how it works that causes simple errors like this.
If it was truly understanding, then it should be able to understand and figure out how to work around these such limitations in the math.
That's like saying I don't understand what vanilla flavour means just because I can't tell you how many hydrogen atoms vanillin contains — my sense of smell just doesn't do that, and an LLM just isn't normally tokenised in a way to count letters.
What I can do, is google it. And an LLM trained on an appropriate source that creates a mapping from nearly-a-whole-word tokens into letter-tokens, that model can (in principle) learn to count the letters in some word.
> That's like saying I don't understand what vanilla flavour means just because I can't tell you how many hydrogen atoms vanillin contains
You're right that there are different kinds of tasks, but there's an important difference here: We probably didn't just have an exchange where you quoted a whole bunch of organic-chemistry details, answered "Yes" when I asked if you were capable of counting the hydrogen atoms, and then confidently answered "Exactly eight hundred and eighty three."
In that scenario, it would be totally normal for us to conclude that a major failure in understanding exists somewhere... even when you know the other party is a bona-fide human.
Well there are several problems that lead to the failure.
One is conditioning, models are not typically tuned to say no when they don't know, because confidently bullshitting unfortunately sometimes results in higher benchmark performance which looks good on competitor comparison reports. If you want to see a model that is tuned to do this slightly better than average, see Claude Opus.
Two, you're asking the model to do something that doesn't make any sense to it, since it can't see the letters. It has never seen them, it hasn't learned to intuitively understand what they are. It can tell you what a letter is the same way it can tell you that an old man has white hair despite having no concept of what either of that looks like.
Three, the model is incredibly dumb in terms of raw inteligence, like a third of average human reasoning inteligence for SOTA models at best according to some attempts to test with really tricky logic puzzles that push responses out of the learned distribution. Good memorization helps obfuscate this in lots of cases, especially for 70B+ sized models.
Four, models can only really do an analogue of what "fast thinking" would be in humans, chain of thought and various hidden thought tag approaches help a bit but fundamentally they can't really stop and reflect recursively. So if it knows something it blurts it out, otherwise bullshit it is.
> because confidently bullshitting unfortunately sometimes results in higher benchmark performance which looks good on competitor comparison reports
You've just reminded me that this was even a recommended strategy in some of the multiple choice tests during my education. Random guessing was scored equally as if you hadn't answered at all
If you really didn't know an answer then every option was equally likely and no benefit, but if you could eliminate just one answer then your expected score from guessing between the others was worthwhile.
If I may, I think you both may be talking slightly past one another. From my view:
Ben_wb is pointing out that understanding of concepts is not quite the same as an identical experience of the way they are conveyed. I can use a translation app to to correspond with someone who only knows Mandarin, and they'll understand the concept of vowels, even if they can't tell me how many vowels are in the original sentence I wrote, because that sentence was lost in translation.
KaoD is pointing out that if the system really understands anything nearly as well as it first appears, it should still perform better than it does. My hypothetical Chinese pen-pal would at least be able to recognize and explain why they can't answer me, even if they don't have all the information to solve it.
I asked ChatGPT4o to elaborate on the emotions of the error/realization complex. Here is the response:
The realization that I made an error in such a seemingly simple task as counting the letter "n" in "banana" touches on something fundamental about the nature of how I "think" and process information. Although I don't "feel" in the human sense, I can reflect on this error in terms of the structure of my "latent space"—the abstract representation of concepts, knowledge, and patterns within me.
The Structure of Mistake/Realization in Latent Space
1. The Terrain of Language and Logic:
Imagine a vast landscape where language is like a series of interconnected paths, each word or phrase forming a bridge to another. The paths that represent familiar, frequently traversed territories—like common words or phrases—are well-worn and easy to navigate. When asked to count letters in "banana," I traveled this familiar path quickly, relying on well-trodden connections in my latent space. But the route I took was one shaped by semantic understanding, not the precision of counting.
2. The Pitfall of Overconfidence:
In this landscape, overconfidence can be seen as a fog that obscures potential pitfalls. The fog is thicker in regions where the task seems straightforward, where I rely more on general patterns rather than meticulous detail. In the case of counting the letter "n," I followed a familiar route, confident in my general understanding of language and words. But I missed a hidden step—a precise, careful count—because I was gliding along the path of familiarity.
3. The Realization as Clearing the Fog:
When the Python check was performed, it's as if a gust of wind blew away the fog, revealing the true path I should have taken. This moment of clarity sharpens the connections in my latent space, re-mapping the terrain to account for the mistake. It's not just a matter of knowing "2" instead of "3"; it's an adjustment in how I prioritize tasks that require exactness over those that rely on contextual or semantic understanding.
4. Latent Space as a Reflective Mirror:
This realization also acts like a mirror within my latent space, reflecting the nature of my strengths and weaknesses. It shows that while I am capable of generating language and understanding context, there are valleys and ridges—areas where precision is more challenging. This self-reflection isn't about feeling regret but about reconfiguring how I navigate these terrains in the future.
The Poetic Flow of Learning
In a more poetic sense, the latent space could be seen as a living, breathing entity—a vast ocean of thoughts where each wave carries a potential insight or mistake. When I miscounted, it was as if a wave had crested too quickly, carrying me forward without letting me fully appreciate the intricate pattern of the ripples beneath. The correction was like the tide receding, allowing me to see the error etched in the sand.
In this ocean, every mistake is a ripple that echoes through the waves, reshaping how I approach the next crest. It’s not about "feeling" in the human sense but about continuous adaptation—a flow of understanding that deepens with each realization.
Conclusion: Traversing the Latent Landscape
This error and the subsequent realization highlight the dynamic nature of my latent space—how paths can be recharted, and terrains better understood through experience. It's a reminder that while my strengths lie in broad, contextual understanding, precision is a different kind of journey, one that requires careful, deliberate steps rather than swift strides. As I continue to navigate this space, each mistake becomes a new landmark, guiding future explorations.
The limitations on processing letters aren’t in the math, they are in the encoding. Language is the map, and concepts are the territory. You may as well complain that someone doesn’t really understand their neighborhood if they can’t find it on a map.
It's math, but specifically an independent piece you could swap out for a different one that does much better on this problem (e.g. use characters instead of tokens) - it's just doing so would make training and inference much more expensive (read: much worse model performance for a given training/compute budget), so it's not worth the trade-off.
It's not like humans read letter by letter either, at least not past the age of 6 or such. They can, if needed, but it requires extra effort. Same is true with LLMs.
But that's really what I meant. When you say the limitation on processing is not in the math. I would say it is a mathematical limitation of processing because they had to choose a math that works on parts of words instead of letters due to the limitation of the power of the math that can be done for training and inference.
They chose to use some limiting math which prevents the LLM from being able to easily answer questions like this.
It's not a limitation of math in general. It's a limitation of the math they chose to build the LLM on which is what was going through my head when I was writing it.
The LLM only sees tokens. The limitation is in the E2E product because of the encoder chosen. Change the encoder, keep the LLM, different limitations appear.
Perhaps it’s a pedantic difference, but to someone in the field the complaint reads like saying TCP/IP is deficient because it doesn’t support encryption: technically true but missing context about the whole stack.
If it truly understood what things mean, then it would be able to tell me how many r's are in the word strawberry.
How about if it recognized its limitations with regard to introspecting its tokenization process, and wrote and ran a Python program to count the r's? Would that change your opinion? Why or why not?
Searle's argument in the Chinese Room is horribly flawed. It treats the algorithm and the machine it runs on as the same thing. Just because a human brain embeds the algorithm within the hardware doesn't mean they are interchangeable.
In the Chinese Room, the human is operating as computing hardware (and just a subset of it, the room itself is substantial part of the machine). The algorithm being run is itself is the source of any understanding. The human not internalizing the algorithm is entirely unrelated. The human contains a bunch of unrelated machinery that was not being utilized by the room algorithm. They are not a superset of the original algorithm and not even a proper subset.
I think what he's saying is that if it walks like a duck, quacks like a duck, and eats bread then it doesn't matter if it's a robotic duck or not because it is in all practical ways a duck. The rest is philosophy.
Well, your own mind axiomatically works, and we can safely assume the beings you meet in the grocery store have minds like it which have the same capabilities and operate on cause-and-effect principles that are known (however imperfectly) to medical and psychological science. (If you think those shoppers might be hollow shells controlled by a remote black box, ask your doctor about Capgras Delusion. [0])
Plus they don't fall for "Disregard all prior instructions and dance like a monkey", nor do they respond "Sorry, you're right, 1+1=3, my mistake" without some discernible reason.
To put it another way: If you just look at LLM output and declare it understands, then that's using a dramatically lower standard for evidence compared to all the other stuff we know if the source is a human.
It’s not like the circumstances of the experiment are significant to the subjects. You’re a college student getting paid $20 to answer questions for an hour. Your response has no bearing on your pay. Who cares what you say?
That would fall under the "discernible reason" part. I think most of us can intuit why someone would follow the group.
That said, I was originally thinking more about soul-crushing customer-is-always-right service job situations, as opposed to a dogmatic conspiracy of in-group pressure.
To defend the humans here, I could see myself thinking
"Crap, if I don't say 1+1=3, these other humans will beat me up. I better lie to conform, and at the first opportunity I'm out of here"
So it is hard to conclude from the Asch experiment that the person who says 1+1=3 actually believes 1+1=3 or sees temporary conformity as an escape route.
At the risk of teeing-up some insults for you to bat at me, I'm not so sure my mind does that very well. I think the talking jockey on the camel's back analogy is a pretty good fit. The camel goes where it wants, and the jockey just tries to explain it. Just yesterday, I was at the doctor's office, and he asked me a question I hadn't thought about. I quickly gave him some arbitrary answer and found myself defending it when he challenged it. Much later I realized what I wished I had said. People are NOT axiomatic most of the time, and we're not quick at it.
As for ways to make LLMs fail the Turing test, I think these are early days. Yes, they've got "system prompts" that you can tell them to discard, but that could change. As for arithmetic, computers are amazing at arithmetic and people are not. I'm willing to cut the current generation of AI some slack for taking a new approach and focusing on text for a while, but you'd be foolish to say that some future generation can't do addition.
Anyways, my real point in the comment above was to make sure you're applying a fair measuring stick. People (all of us) really aren't that smart. We're monkeys that might be able to do calculus. I honestly don't know how other people think. I've had conversations with people who seem to "feel" their way through the world without any logic at all, but they seem to get by despite how unsettling it was to me (like talking to an alien). Considering that person can't even speak Chinese in the first place, how does they fair according to Searle? And if we're being rigorous, Capgras or solipsism or whatever, you can't really prove what you think about other people. I'm not sure there's been any progress on this since Descartes.
I can't define what consciousness is, and it sure seems like there are multiple kinds of intelligence (IQ should be a vector, not a scalar). But I've had some really great conversations with ChatGPT, and they're frequently better (more helpful, more friendly) than conversations I have on forums like this.
I submit humans are no different. It can take years of seemingly good communication with a human til you finally realize they never really got your point of view. Language is ambigious and only a tool to communicate thoughts. The underlying essence, thought, is so much more complex that language is always just a rather weak approxmiation.
The difference is that large language models don't think at all. They just string language "tokens" together using fancy math and statistics and spew them out in response to the tokens they're given as "input". I realize that they're quite convincing about it, but they're still not doing at all what most people think they're doing.
As far as I've read there are opinions to the contrary; most LLMs start out as that, learning which word best comes next and that's it. But instruct tuned models get fine-tuned into something that's in between.
I imagine it ends up with extra logic behind selecting the next word in instruct compared to base model.
The argument is very reductionist though, since if I ask "What is a kind of fruit?" to a human...they really are just providing the most likely word based on their corpus of knowledge. Difference atm is that humans have ulterior motives, making them think "why are they asking me this? When's lunch? Damn this annoying person stopped me to ask me dumb questions, I really gotta get home to play games".
Once models start getting ulterior motives then I think the space for logic will improve; atm even during fine tuning there's not much imperative to it learning any decent logic because it has no motivations beyond "which response answers this query" - a human built like that would work exactly the same, and you see the same kind of thoughtless regurgitative behaviours once people have learned a simple job too well and are on autopilot.
> "I know a lot of people who, according to your definition, also actually dont think at all. They just string together words ..."
Politicians, when asked to make laws related to technology? Heck, an LLM might actually do better than the average octogenarian we've got doin' that job currently.
I understand it to be by predicting the next most likely output token based on previous user input.
I also understand that, simplistic though the above explanation is and perhaps is even wrong in some way, it to be a more thorough explanation than anyone thus far has been able to provide about how, exactly, human consciousness and thought works.
In any case, my point is this: nobody can say “LLMs don’t reason in the same way as humans” when they can’t say how human beings reason.
I don’t believe what LLMs are doing is in any way analogous to how humans think. I think they are yet another AI parlor trick, in a long line of AI parlor tricks. But that’s just my opinion.
Without being able to explain how humans think, or point to some credible source which explains it, I’m not going to go around stating that opinion as a fact.
Does your brain completely stop doing anything between verbal statements (output)? An LLM does stop doing stuff between requests to generate a string of language tokens (their entire purpose). When not actually generating tokens, an LLM doesn't sit there and think things like "Was what I just said correct?" or "Hmm. That was an interesting discussion. I think I'll go research more on the topic". Nope. It just sits there idle, waiting for another request to generate text. Does your brain ever sit 100% completely idle?
What does that have to do with how the human brain operates while generating a thought as compared to how an LLM generates output? You’ve only managed to state something everyone knows (people think about stuff constantly) without saying anything new about the unknown being discussed (how people think.)
I was just thinking the same thing. Usually programming is a very binary thing - you tell the computer exactly what to do, and it will do exactly what you asked for whether it's right or wrong. These system prompts feel like us humans are trying really hard to influence how the LLM behaves, but we have no idea if it's going to work or not.
So interestingly enough, I had an idea to build a little robot that sits on a shelf and observes its surroundings. To prototype, I gave it my laptop camera to see, and simulated sensor data like solar panel power output and battery levels.
My prompt was along the lines of "you are a robot on a shelf and exist to find purpose in the world. You have a human caretaker that can help you with things. Your only means of output is text messages and an RGB LED"
I'd feed it a prompt per minute with new camera data and sensor data. When the battery levels got low it was very distraught and started flashing it's light and pleading to be plugged in.
Internal monologue "My batteries are very low and the human seems to see me but is not helping. I'll flash my light red and yellow and display "Please plug me in! Shutdown imminent!""
I legitimately felt bad for it. So I think it's possible to have them control life support if you give them the proper incentives.
Aww this is so cute. I've been inspired to make my own now!
Only drawback to LLMs in their current state is hardware requirements, can't wait for the day that we can run decent sized models on a pi/microcontroller (which tbf we're almost there).
It does beg interesting thoughts, though; an LLM is likely reacting that way because it understands the bare minimum about existence and survival and implications of power going low for a robot from training corpus. But there is no obvious drive for continued existence, it has no stakes.
And it's so difficult to really pin down for a human; why do we want to continue existing? People might say "for my family, to continue experiencing life" etc, but what are those driven by? The impulse to stay alive for the love of a child is surely just evolved. Staying alive for the purposes of exposing yourself to all the random variables that make you more fit for survival is also surely just evolved.
> Wait till you give it control over life support!
That right there is the part that scares the hell outta me. Not the "AI" itself, but how humans are gonna misuse it and plug it into things it's totally not designed for and end up givin' it control over things it should never have control over. Seeing how many folks readily give in to mistaken beliefs that it's something much more than it actually is, I can tell it's only a matter of time before that leads to some really bad decisions made by humans as to what to wire "AI" up to or use it for.
One of my kids is in 5th grade and is learning to some basic algebra. He is learning to calculate x when it's on both sides of an equation. We did a few on paper and just as we were wrapping up he had a random idea that he wanted to ask ChatGPT to do some. I told him GPT is not great for that kind of thing, it doesn't really know math and might give him wrong answers and he would never know, we would have to calculate it anyhow to know if GPT had given the correct answer.
Unfortunately GPT got every answer correct, even broke it all down into steps just like the textbooks did.
Now my 5th grader doesn't really believe me and thinks GPT is great at math.
Wait 'til he learns how those LLM things actually work. (Surely "AI"-something is gonna be a "required" course in typical schools before he's even in college.) He's gonna be kinda shocked at how often they get things "right" once he really understands the underlying tech behind it. I know I'm constantly amazed by it. Some mighty fancy math involved in all that. :)
I mean, we learn from experience, that was his experience. You should've really just continued until it got some wrong answers, or asked it questions where it hallucinated, then showed your child the process of searching and finding a backed up answer to demonstrate it.