LLMs are incapable of telling the truth. There's almost no way to develop one that only responds correctly like that; it'd have to be a fundamentally different technology.
Yep, the idea of truth or falsity is not part of the design, and if it were part of the design, it would be a different and vastly (like, many orders of magnitude) more complicated thing.
If, based on the training data, the most statistically likely series of words for a given prompt is the correct answer, it will give correct answers. Otherwise it will give incorrect answers. What it can never do is know the difference between the two.
> If, based on the training data, the most statistically likely series of words for a given prompt is the correct answer, it will give correct answers.
ChatGPT does not work this way. It wasn't trained to produce "statistically likely" output; it was trained to produce output that humans rate highly.
Not exactly. ChatGPT was absolutely trained to produce statistically likely output; it just had an extra training step added for human ratings. If they had relied entirely on human ratings, there would not have been sufficient data to train the model.
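To make that two-step picture concrete, here is a minimal sketch (toy scores, no real model) of what "statistically likely output" means at generation time, with the human-rating step described in a comment:

    import math, random

    # Toy next-token scores for the prompt "The capital of France is"
    # (numbers are made up purely for illustration).
    logits = {"Paris": 5.0, "Lyon": 1.0, "Berlin": 0.5}

    def sample_next_token(logits, temperature=1.0):
        # "Statistically likely output" at generation time just means sampling
        # tokens in proportion to exp(score / temperature). Nothing here checks truth.
        weights = {tok: math.exp(score / temperature) for tok, score in logits.items()}
        r = random.uniform(0, sum(weights.values()))
        for tok, w in weights.items():
            r -= w
            if r <= 0:
                return tok
        return tok  # fallback for floating-point edge cases

    print(sample_next_token(logits))  # usually "Paris", occasionally not

    # The human-rating (RLHF) step doesn't replace this mechanism: it adjusts the
    # scores so that completions raters preferred become more likely. Truth enters
    # only to the extent that raters happened to reward it.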
LLMs are not incapable of telling the truth. They tell the truth all the time. They're incapable of knowing if what they're telling is the truth.
That said, traditional search engines suffer from the same problem, although with different symptoms. Search engines point to web pages generated by people (or some other way) which may or may not contain accurate information. They select those pages based on things like strength of authority, keyword similarity, etc., but truth never directly enters the picture (only indirectly, by tweaking the "trusted" set of pages).
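A toy illustration of that point (not how any real engine scores things, just the shape of the problem): ranking rewards keyword overlap and authority, so a confidently wrong page can win on both.

    # Toy keyword-overlap ranker with a fixed "authority" weight per page.
    # Note that nothing in the scoring models whether a page is true.
    pages = {
        "page_a": {"text": "the moon is made of cheese", "authority": 0.9},
        "page_b": {"text": "the moon is made of rock and dust", "authority": 0.4},
    }

    def score(query, page):
        q_terms = set(query.lower().split())
        p_terms = set(page["text"].split())
        return len(q_terms & p_terms) * page["authority"]

    query = "what is the moon made of"
    ranked = sorted(pages, key=lambda p: score(query, pages[p]), reverse=True)
    print(ranked)  # the high-authority cheese page outranks the accurate one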
> LLMs are not incapable of telling the truth. They tell the truth all the time. They're incapable of knowing if what they're telling is the truth.
We're well into philosophy at this point, but I'd argue that telling the truth requires knowledge of the truth. Individual statements could be true, false, or neither. But I'd say neither an LLM nor a monkeys-with-typewriters generator could be said to tell the truth, because what they're doing is both orthogonal to truth and lacking in agency. If we stretch to thinking of it as having agency, then I think the correct term would be "bullshitting", aka producing speech "without regard for truth". [1]
>They're incapable of knowing if what they're telling is the truth.
Right, because all they're doing is regurgitating the data they were trained on. It's like the kid in your CS class who copies someone else's coding assignment and just changes all the function names, variable names, and comments but not the actual executable code, and never even bothers to compile and run it to see if it produces the desired output.
I feel like people don't appreciate how hard this is. But maybe because I was raised on Zork, I really appreciate this sort of stuff. Here's an example prompt I just gave to ChatGPT:
"If Tim drops a coconut on the tail of a dog, what are two things the dog might do?"
Response:
"If Tim drops a coconut on the tail of a dog, two possible things the dog might do are:
1. Yelp or bark loudly in pain, as the impact of the falling coconut could be quite painful.
2. Try to escape or run away, as the dog may associate the sudden impact with danger and feel the need to get away from the source of the pain. Additionally, the dog may feel scared or anxious, which can cause it to run away."
Then I said, "But what if it was a soft coconut that was as light as a feather?"
Response:
"If Tim dropped a soft coconut that was as light as a feather, it's possible that the dog wouldn't feel much or any discomfort from the impact. In this scenario, two things the dog might do are:
1. Look around to see what just hit it and possibly sniff at the coconut to investigate.
2. Continue with its previous activity, ignoring the soft impact as if it never happened. The dog may not perceive the coconut as a threat or something worth reacting to, especially if it was light enough to not cause any pain."
I just can't read these responses and think, "Ehh... just a mindless regurgitation as expected from any LLM". These simple prompt responses impress me and I kind of know the technology -- although my experience in RNNs/LSTM is very dated.
Honestly, I'd love to see Zork rewritten with ChatGPT as a parser. No more trying to figure out how to write the prompt for how to use the key in the door!! :-)
> Honestly, I'd love to see Zork rewritten with ChatGPT as a parser. No more trying to figure out how to write the prompt for how to use the key in the door!! :-)
That was done as AI Dungeon, but there was some consternation due to the combo of charging for it and GPT's predilection for generating wild and possibly illegal sex scenes even when you don't ask it to.
> Right, because all they're doing is regurgitating the data they were trained on.
That is not true, it's clearly able to generalize. (If it can do anagrams, it's silly to say it's just regurgitating the instructions for doing anagrams it read about.)
But it doesn't try to verify that what it says might be true before saying it.
It can't do anagrams though (every now and then it might get a common one right, but in general it's bad at letter-based manipulation, including even word lengths, reversals, etc.).
It doesn't know what letters are because it sees BPE tokens, but if you forgive that, it does something like anagrams (see the tokenizer sketch below).
example prompt: Imagine I took all the letters in "Wikipedia" and threw them in the air so they fell on the ground randomly. What are some possible arrangements of them?
Similarly, it can almost do arithmetic but apparently forgets to carry digits. That's wrong but it's still generalization!
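For anyone curious what "it sees BPE tokens" means in practice, here's a small sketch. It assumes the tiktoken package is installed (the encoding data is fetched on first use); the exact split doesn't matter, only that the model receives integer IDs rather than individual letters.

    # Requires: pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    for text in ["Wikipedia", "indebted sniper"]:
        ids = enc.encode(text)
        pieces = [enc.decode_single_token_bytes(t) for t in ids]
        print(text, "->", ids, pieces)

    # The model is fed the integer IDs, not the characters, so letter-level
    # tasks (anagrams, reversals, counting letters) require it to have learned
    # each token's spelling indirectly from the training data.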
Interestingly enough, it immediately got "Hated for ill" (presumably because there are source texts that discuss that very anagram). But it took about 10 goes to finally produce a correct anagram of "Indebted sniper"; the best it could do was "pride-bent snide". I then asked it which world leader's title this might also be an anagram of, and it somehow decided "Prime Minister" was a valid anagram of the same letters.
But regular search engines only regurgitate what they've indexed, yet don't invent outright nonsense when they don't know. (If you ask Google who won the Super Bowl in 2024, the nature of the results makes it clear it simply doesn't have that information. Though if you change it to "world cup", one of the top answers says "Portugal was the defending champion, defeating Argentina" -- and that result is titled "2024 futsal world cup"!)
I don't think it is concealing the origin, but likely doesn't actually know the origin. That said, I agree that if they can provide sources (even probabilistically), that would be a good step forward.
The model is capable of generating many different responses to the same prompt. An ensemble of fact checking models can be used to reject paths that contain "facts" that are not present in the reference data (i.e. a fixed knowledge graph plus the context).
My guess is that the fact checking is actually easier, and the checking models can be smaller since they don't need to store the facts themselves.
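A very rough sketch of that pipeline, with hypothetical stand-ins for the generator and the claim extractor (a real system would sample an actual LLM and use a learned extractor against a real knowledge graph):

    # Sample several candidate answers, then keep only those whose extracted
    # claims all appear in a fixed reference set of (subject, relation, object) facts.
    reference_facts = {
        ("Paris", "capital_of", "France"),
        ("Canberra", "capital_of", "Australia"),
    }

    def extract_claims(answer):
        # Stand-in for a real claim-extraction model; hard-coded for the demo.
        if "Sydney" in answer:
            return {("Sydney", "capital_of", "Australia")}
        return {("Canberra", "capital_of", "Australia")}

    def generate_candidates(prompt, n=3):
        # Stand-in for sampling the LLM n times with temperature > 0.
        return ["The capital of Australia is Sydney.",
                "The capital of Australia is Canberra.",
                "Australia's capital city is Canberra."][:n]

    def fact_checked_answer(prompt):
        for candidate in generate_candidates(prompt):
            if extract_claims(candidate) <= reference_facts:  # all claims present
                return candidate
        return "I don't know."  # every candidate was rejected

    print(fact_checked_answer("What is the capital of Australia?"))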
Exactly. Given a source of truth, it can't be that hard to train a separate analytic model to evaluate answers from the existing synthetic model. (Neglecting for the moment the whole Gödel thing.)
The problem isn't going to be developing the model, it's going to be how to arrive at an uncontroversial source of ground truth for it to draw from.
Meanwhile, people are complaining that the talking dog they got for Christmas is no good because the C++ code it wrote for them has bugs. Give it time.
Not true or false; just present or absent in the reference data. Note that false negatives will not result in erroneous output, so the model can safely err on the side of caution.
Also, 100% accuracy is probably not the real threshold for being useful. There is a lot of low-hanging fruit today that could be handled by absolutely tiny error-correcting models (e.g. arithmetic and rhyming).
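For instance, a toy version of the arithmetic case: a post-processing pass that recomputes simple sums found in the model's output rather than trusting the generated digits. The regex "extractor" here is just a stand-in for a small learned corrector, not a proposal.

    import re

    def fix_arithmetic(text):
        # Find patterns like "a + b = c" and replace c with the real sum,
        # trusting the calculator rather than the generated digits.
        def repair(match):
            a, b, claimed = (int(g) for g in match.groups())
            return f"{a} + {b} = {a + b}"
        return re.sub(r"(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)", repair, text)

    print(fix_arithmetic("So 487 + 396 = 773, which means..."))
    # -> "So 487 + 396 = 883, which means..."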
The missing piece seems to be that for certain questions it doesn't make sense to extrapolate, and that if it's a question about what will happen in the future, it should answer in a different manner (and from my own interactions with ChatGPT it does exactly that, frequently referring to the cut-off time of its training data).