LLMs are incapable of telling the truth. There's almost no way to develop one that only responds correctly like that; it'd have to be a fundamentally different technology.
Yep, the idea of truth or falsity is not part of the design, and if it were part of the design, it would be a different and vastly (like, many orders of magnitude) more complicated thing.
If, based on the training data, the most statistically likely series of words for a given prompt is the correct answer, it will give correct answers. Otherwise it will give incorrect answers. What it can never do is know the difference between the two.
> If, based on the training data, the most statistically likely series of words for a given prompt is the correct answer, it will give correct answers.
ChatGPT does not work this way. It wasn't trained to produce "statistically likely" output; it was trained to produce output that humans rate highly.
Not exactly. ChatGPT was absolutely trained to produce statistically likely output; it just had an extra training step added for human ratings. If they had relied entirely on human ratings, there would not have been sufficient data to train the model.
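To make that two-step picture concrete, here is a minimal sketch (toy scores, no real model) of what "statistically likely output" means at generation time, with the human-rating step described in a comment:

    import math, random

    # Toy next-token scores for the prompt "The capital of France is"
    # (numbers are made up purely for illustration).
    logits = {"Paris": 5.0, "Lyon": 1.0, "Berlin": 0.5}

    def sample_next_token(logits, temperature=1.0):
        # "Statistically likely output" at generation time just means sampling
        # tokens in proportion to exp(score / temperature). Nothing here checks truth.
        weights = {tok: math.exp(score / temperature) for tok, score in logits.items()}
        r = random.uniform(0, sum(weights.values()))
        for tok, w in weights.items():
            r -= w
            if r <= 0:
                return tok
        return tok  # fallback for floating-point edge cases

    print(sample_next_token(logits))  # usually "Paris", occasionally not

    # The human-rating (RLHF) step doesn't replace this mechanism: it adjusts the
    # scores so that completions raters preferred become more likely. Truth enters
    # only to the extent that raters happened to reward it.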
LLMs are not incapable of telling the truth. They tell the truth all the time. They're incapable of knowing if what they're telling is the truth.
That said, traditional search engines suffer from the same problem, although with different symptoms. Search engines point to web pages generated by people (or some other way) which may or may not contain accurate information. They select those pages based on things like strength of authority, keyword similarity, etc., but truth never directly enters the picture (only indirectly, by tweaking the "trusted" set of pages).
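A toy illustration of that point (not how any real engine scores things, just the shape of the problem): ranking rewards keyword overlap and authority, so a confidently wrong page can win on both.

    # Toy keyword-overlap ranker with a fixed "authority" weight per page.
    # Note that nothing in the scoring models whether a page is true.
    pages = {
        "page_a": {"text": "the moon is made of cheese", "authority": 0.9},
        "page_b": {"text": "the moon is made of rock and dust", "authority": 0.4},
    }

    def score(query, page):
        q_terms = set(query.lower().split())
        p_terms = set(page["text"].split())
        return len(q_terms & p_terms) * page["authority"]

    query = "what is the moon made of"
    ranked = sorted(pages, key=lambda p: score(query, pages[p]), reverse=True)
    print(ranked)  # the high-authority cheese page outranks the accurate one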
> LLMs are not incapable of telling the truth. They tell the truth all the time. They're incapable of knowing if what they're telling is the truth.
We're well into philosophy at this point, but I'd argue that telling the truth requires knowledge of the truth. Individual statements could be true, false, or neither. But I'd say neither an LLM nor a monkeys-with-typewriters generator could be said to tell the truth, because what they're doing is both orthogonal to truth and lacking in agency. If we stretch to thinking of it as having agency, then I think the correct term would be "bullshitting", aka producing speech "without regard for truth". [1]
>They're incapable of knowing if what they're telling is the truth.
Right, because all they're doing is regurgitating the data they were trained on. It's like the kid in your CS class who copies someone else's coding assignment and just changes all the function names, variable names, and comments but not the actual executable code, and never even bothers to compile and run it to see if it produces the desired output.
I feel like people don't appreciate how hard this is. But maybe because I was raised on Zork, I really appreciate this sort of stuff. Here's an example prompt I just gave to ChatGPT:
"If Tim drops a coconut on the tail of a dog, what are two things the dog might do?"
Response:
"If Tim drops a coconut on the tail of a dog, two possible things the dog might do are:
1. Yelp or bark loudly in pain, as the impact of the falling coconut could be quite painful.
2. Try to escape or run away, as the dog may associate the sudden impact with danger and feel the need to get away from the source of the pain. Additionally, the dog may feel scared or anxious, which can cause it to run away."
Then I said, "But what if it was a soft coconut that was as light as a feather?"
Response:
"If Tim dropped a soft coconut that was as light as a feather, it's possible that the dog wouldn't feel much or any discomfort from the impact. In this scenario, two things the dog might do are:
1. Look around to see what just hit it and possibly sniff at the coconut to investigate.
2. Continue with its previous activity, ignoring the soft impact as if it never happened. The dog may not perceive the coconut as a threat or something worth reacting to, especially if it was light enough to not cause any pain."
I just can't read these responses and think, "Ehh... just a mindless regurgitation as expected from any LLM". These simple prompt responses impress me and I kind of know the technology -- although my experience in RNNs/LSTM is very dated.
Honestly, I'd love to see Zork rewritten with ChatGPT as a parser. No more trying to figure out how to write the prompt for how to use the key in the door!! :-)
> Honestly, I'd love to see Zork rewritten with ChatGPT as a parser. No more trying to figure out how to write the prompt for how to use the key in the door!! :-)
That was done as AI Dungeon, but there was some consternation due to the combo of charging for it and GPT's predilection for generating wild and possibly illegal sex scenes even when you don't ask it to.
> Right, because all they're doing is regurgitating the data they were trained on.
That is not true, it's clearly able to generalize. (If it can do anagrams, it's silly to say it's just regurgitating the instructions for doing anagrams it read about.)
But it doesn't try to verify that what it says might be true before saying it.
It can't do anagrams though (every now and then it might get a common one right, but in general it's bad at letter-based manipulation, including even word lengths, reversals, etc.).
It doesn't know what letters are because it sees BPE tokens, but if you forgive that, it does something like anagrams (see the tokenizer sketch below).
example prompt: Imagine I took all the letters in "Wikipedia" and threw them in the air so they fell on the ground randomly. What are some possible arrangements of them?
Similarly, it can almost do arithmetic but apparently forgets to carry digits. That's wrong but it's still generalization!
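For anyone curious what "it sees BPE tokens" means in practice, here's a small sketch. It assumes the tiktoken package is installed (the encoding data is fetched on first use); the exact split doesn't matter, only that the model receives integer IDs rather than individual letters.

    # Requires: pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    for text in ["Wikipedia", "indebted sniper"]:
        ids = enc.encode(text)
        pieces = [enc.decode_single_token_bytes(t) for t in ids]
        print(text, "->", ids, pieces)

    # The model is fed the integer IDs, not the characters, so letter-level
    # tasks (anagrams, reversals, counting letters) require it to have learned
    # each token's spelling indirectly from the training data.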
Interestingly enough, it immediately got "Hated for ill" (presumably because there are source texts that discuss that very anagram). But it took about 10 goes to finally produce a correct anagram of "Indebted sniper"; the best it could do was "pride-bent snide". I then asked it which world leader's title this might also be an anagram of, and it somehow decided "Prime Minister" was a valid anagram of the same letters.
But regular search engines only regurgitate what they've indexed, yet don't invent outright nonsense when they don't know. (If you ask Google who won the Super Bowl in 2024, the nature of the results makes it clear it simply doesn't have that information. Though if you change it to "world cup", one of the top answers says "Portugal was the defending champion, defeating Argentina" -- and that result is titled "2024 futsal world cup"!)
I don't think it is concealing the origin, but likely doesn't actually know the origin. That said, I agree that if they can provide sources (even probabilistically), that would be a good step forward.
The model is capable of generating many different responses to the same prompt. An ensemble of fact checking models can be used to reject paths that contain "facts" that are not present in the reference data (i.e. a fixed knowledge graph plus the context).
My guess is that the fact checking is actually easier, and the checking models can be smaller since they don't need to store the facts themselves.
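A very rough sketch of that pipeline, with hypothetical stand-ins for the generator and the claim extractor (a real system would sample an actual LLM and use a learned extractor against a real knowledge graph):

    # Sample several candidate answers, then keep only those whose extracted
    # claims all appear in a fixed reference set of (subject, relation, object) facts.
    reference_facts = {
        ("Paris", "capital_of", "France"),
        ("Canberra", "capital_of", "Australia"),
    }

    def extract_claims(answer):
        # Stand-in for a real claim-extraction model; hard-coded for the demo.
        if "Sydney" in answer:
            return {("Sydney", "capital_of", "Australia")}
        return {("Canberra", "capital_of", "Australia")}

    def generate_candidates(prompt, n=3):
        # Stand-in for sampling the LLM n times with temperature > 0.
        return ["The capital of Australia is Sydney.",
                "The capital of Australia is Canberra.",
                "Australia's capital city is Canberra."][:n]

    def fact_checked_answer(prompt):
        for candidate in generate_candidates(prompt):
            if extract_claims(candidate) <= reference_facts:  # all claims present
                return candidate
        return "I don't know."  # every candidate was rejected

    print(fact_checked_answer("What is the capital of Australia?"))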
Exactly. Given a source of truth, it can't be that hard to train a separate analytic model to evaluate answers from the existing synthetic model. (Neglecting for the moment the whole Gödel thing.)
The problem isn't going to be developing the model, it's going to be how to arrive at an uncontroversial source of ground truth for it to draw from.
Meanwhile, people are complaining that the talking dog they got for Christmas is no good because the C++ code it wrote for them has bugs. Give it time.
Not true or false; just present or absent in the reference data. Note that false negatives will not result in erroneous output, so the model can safely err on the side of caution.
Also, 100% accuracy is probably not the real threshold for being useful. There is a lot of low-hanging fruit today that could be handled by absolutely tiny error-correcting models (e.g. arithmetic and rhyming).
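For instance, a toy version of the arithmetic case: a post-processing pass that recomputes simple sums found in the model's output rather than trusting the generated digits. The regex "extractor" here is just a stand-in for a small learned corrector, not a proposal.

    import re

    def fix_arithmetic(text):
        # Find patterns like "a + b = c" and replace c with the real sum,
        # trusting the calculator rather than the generated digits.
        def repair(match):
            a, b, claimed = (int(g) for g in match.groups())
            return f"{a} + {b} = {a + b}"
        return re.sub(r"(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)", repair, text)

    print(fix_arithmetic("So 487 + 396 = 773, which means..."))
    # -> "So 487 + 396 = 883, which means..."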
The missing piece seems to be that for certain questions it doesn't make sense to extrapolate, and that if it's a question about what will happen in the future, it should answer in a different manner (and from my own interactions with ChatGPT it does exactly that, frequently referring to the cut-off time of its training data).