I might be getting overly philosophical here but I'd say it's because they truly don't know anything at all (as opposed to knowing some things but not others). To be able to say "I don't know" you have to first "know" on a deeper level that there is a fundamental true or correct answer to a question and that you are disconnected from it.
Well, yes. "AI" skips over all the difficulties and contradictions of philosophy, all the challenges of working out what it means to know something, things like "justified true belief" and so on. It 'just' (!) uses a probabilistic model to emit strings of text. It's basically a super-pundit. It can predict conventional wisdom really well.
True, no argument there. What fascinates me more is why people continue to think we can teach a chatbot how to recognize what's true and give us answers that we can't find for ourselves. At best a chatbot is going to be a tool that enables us to gain insights we didn't have before the same way a dictionary can "teach" you words you didn't know before.
I think the idea of using technology to solve life's ultimate conundrums has long since jumped the shark and veered into the area of religious belief. People are literally putting their faith in AI even if they wouldn't use religious vocabulary to label and define it as such.
I think some of this is the ultimate logical conclusion of postmodernism/deconstruction. In the early 20th century people were a lot more confident about the possibility of finding absolute truth; at the end of it this had completely dissipated into uncertainty and relativism.
The "Sokal Hoax" was a 90s experiment in which a physicist created a fake paper and submitted it to a cultural studies journal. He did not base his paper on anything he would have considered "true", rather on a desire to look as much like a valid text as possible. This is a simplified version of how the LLM training/scoring process works. Nowadays everywhere is having to deal with the same kind of thing done by LLM users. It's the perfect technology for non-rigorous academia.
I don't think it's overly philosophical to point out that these are large language models, not truth engines or AGI or knowledge directories. They're not using logic to reason their way to an answer. They're just predicting the next word that would sound like part of a human answer.
Fair enough. I think a lot of people are going to end up blindly trusting AI because it's right often enough. But for those who are interested in what it really means to know something, I wonder if this will push people back towards embracing the idea that there is fundamental, objective, knowable truth at the core of the universe, even if we can't ever know that truth perfectly.
They're just outputting tokens that resemble a reasoning process. The underlying tech is still the same LLM it always has been.
I can't deny that doing it that way improves results, but any model could do the same thing if you add extra prompts to encourage the reasoning process, then use that as context for the final solution. People discovered that trick before "reasoning" models became the hot thing. It's the "Work it out step by step" trick but in a dedicated fine-tune.
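Roughly, the two-pass version of that trick looks like this (just a sketch, assuming an OpenAI-style chat completions client; the model name and the question are placeholder examples, not anything specific to the "reasoning" fine-tunes):

```python
# Sketch of manual "reason first, then answer" prompting with any non-reasoning model.
# Assumes the OpenAI Python SDK; "gpt-4o-mini" and the question are placeholder examples.
from openai import OpenAI

client = OpenAI()
question = "Two containers of gas at different pressures are connected. What is the final pressure?"

# Pass 1: ask only for step-by-step working, no final answer.
reasoning = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": f"Work through this step by step, but don't state a final answer yet:\n{question}",
    }],
).choices[0].message.content

# Pass 2: feed that working back in as context and ask for the final answer.
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": question},
        {"role": "assistant", "content": reasoning},
        {"role": "user", "content": "Given your working above, state the final answer concisely."},
    ],
).choices[0].message.content

print(answer)
```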
> They're just outputting tokens that resemble a reasoning process.
Looking at one such process of emulating reasoning (I've got deepseek-70B locally), I'm starting to wonder how that differs from actual reasoning. We "think" about something, may make errors in that thinking, look for things that don't make sense, and correct ourselves. That "think" step is still a black box.
I asked that LLM a typical question about gas exchange between containers; it made some errors but noticed calculations that didn't make sense:
> Moles left A: ~0.0021 mol
> Moles entered B: ~0.008 mol
> But 0.0021 +0.008=0.0101 mol, which doesn't make sense because that would imply a net increase of moles in the system.
Well, that's a totally invalid calculation; there should be a "-" in there, since the moles that leave A should equal the moles that enter B. It also noticed elsewhere that those quantities should be the same.
Eventually, after 102 minutes and 10141 tokens, involving checking its answers from different angles multiple times, it output an approximately correct response.
Does it matter if it doesn't know why this particular pattern is suitable? Also, do you always ask yourself why you use a particular pattern, or do you just use it?
It seems like you are implying that I don't think before I speak. Maybe that is sometimes the case, but I would venture to say, "not usually, and certainly not always."
The point I'm making here is that all of these observations are made after-the-fact. We humans see five different categories of output:
1. "I do know X" where X is indeed correct information
2. "I do know X" where X is false information or nonsense
3. "I don't know" when it really doesn't
4. "I don't know" when a slightly different prompt would lead to option #1
5. Output that is not phrased as a direct answer to a question.
The article introduced #2 as "hallucinations". I introduced #4 in my previous comment (and just now #5), and propose that all five are hallucinations.
As far as the LLM is concerned, there is only one category of output: the most likely next token. Which of the five that will be is determined by the examples present in the training corpus and how they were weighted during training.
Logic is not present in the process. It is only present in the result.
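To make "only one category of output" concrete, here is a toy greedy-decoding loop (a sketch using the Hugging Face transformers API, with "gpt2" as a stand-in model): every one of the five categories above falls out of the same argmax-over-logits step.

```python
# Toy greedy decoding: whatever the model "says" - fact, nonsense, or "I don't know" -
# is produced by the exact same pick-the-most-likely-next-token step.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")        # stand-in model for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("Who is the journalist Ben Fritz married to?", return_tensors="pt").input_ids
for _ in range(30):
    logits = model(ids).logits[0, -1]              # scores for the next token only
    next_id = torch.argmax(logits)                 # the single rule: most likely next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tok.decode(ids[0]))
```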
> It seems like you are implying that I don't think before I speak.
I'm implying that most of the time you don't think about your thinking, either before or after you think (neither you nor I typically meta-think).
I'm saying that very often I (and, it looks like, a lot of people around me) don't think much before I speak. I have an internal monologue when I'm "thinking something out", but I typically don't think things through when I'm speaking with people in day-to-day conversation, only when I encounter a problem I haven't seen yet and am not "trained" in solving. Maybe some people can compose fully reasoned sentences in the split second before they start talking, but not me. IIRC those two modes of thinking are called slow and fast thinking.
> Logic is not present in the process. It is only present in the result.
I'm talking about that process. Have you seen the "thinking" part of current reasoning LLMs? It does indeed look like a process of using logic. After the "thinking" part, there is an "output" part that draws conclusions from the process of thinking. Recently I asked a local version of DeepSeek about a gas exchange problem and it thought a lot about it, making some small mistakes in logic, correcting them, and ultimately returning an approximately valid result. It even made some small errors in calculations and corrected itself by multiplying parts of the numbers and adding them to get the correct result. I've put that example online[1] if you'd like to read it; it's pretty interesting.
I guess the crux of it is this: is it training or awareness?
What I see happening between the <think> tags of Deepseek-R1 is essentially a premade set of circular prompts. Each of these prompts is useful, because it explores a path of tokens that are likely to match a written instance of logical deduction.
When the <think> continuation rewrites part of a prompt as a truthy assertion, it reaches a sort of fork in the road: to present a story of either acceptance or rejection of that assertion. The path most likely followed depends entirely on how the assertion is phrased (both in the prompt, and in the training corpus). Remember that back in the training corpus, example assertions that look sensible are usually followed by a statement of acceptance, and example assertions that look contradictory or fallacious are usually followed by a statement of rejection.
Because the token generation process follows an implicit branching structure, and because that branching structure is very likely to match a story of logical deduction, the result is likely to be logically coherent. It's even likely to be correct!
The distinction I want to make here is that these branches are not logic. They are literary paths that align to a story, and that story is - to us - a well-formed example of written logical deduction. Whether that story leads to fact or fiction is no more and no less than an accident. We humans often tend to follow a similar process, but we can actively choose to do real critical thinking instead.
This design pattern is really useful for a few reasons:
- it keeps the subjects of the prompt in context
- it presents the subjects of the prompt from different perspectives
- it often stumbles into a result that is equivalent to real critical thinking
On the other hand,
- it may fill the context window with repetitive conversation, and lose track of important content
- it may get caught in a loop that never ends
- it may confidently present a false conclusion to itself, then expand that conclusion into a whole thread
- the false conclusions it presents will be much less obvious, because they will always be written as if they came out of a thorough process of logical deduction
I find that all of these problems are much more likely to occur when using a smaller locally hosted copy of the model than when using the full-sized one that is hosted on chat.deepseek.com. That doesn't mean these are solved by using a bigger model, only that the set of familiar examples is large enough to fit most use cases. The more unique and interesting your conversation is, the less utility these models will have.
> We humans often tend to follow a similar process, but we can actively choose to do real critical thinking instead.
> - it may confidently present a false conclusion to itself, then expand that conclusion into a whole thread
I want to know how that differs from human "real critical thinking", because I may be missing this function. How do you know whether what you thought of is true or false? I only know it because I think I know it. I have made a lot of mistakes in the past with a lot of confidence.
> The more unique and interesting your conversation is, the less utility these models will have.
Yeah, that also happens with a lot of people I know.
> ... the result is likely to be logically coherent. It's even likely to be correct!
Yeah, a lot of training data made sure that what it outputs is as correct as possible. I still remember my own training over many days and nights to be able to multiply properly, with two different versions of the multiplication table and many wrong results until I got it right.
> I guess the crux of it is this: is it training or awareness?
I don't think LLMs are really aware (yet). But they do indeed follow a logical reasoning method, even if it's not perfect yet.
Just a thought: when do you think about how and what you think (awareness of your thoughts)? While you actually think through a problem, or after that thinking? Maybe to be self-aware, AIs should be given some "free-thinking time". Currently it's "think about this problem and then immediately stop, do not think any more", and the training data discourages any "out-of-context" thinking, so they don't.
We know what true and false mean. An LLM knows what true and false are likely to be surrounded with.
The problem is that expressions of logic are written many ways. Because we are talking about instances of natural language, they are often ambiguous. LLMs do not resolve ambiguity. Instead, they continue it with the most familiar patterns of writing. This works out when two things are true:
1. Everything written so far is constructed in a familiar writing pattern.
2. The familiar writing pattern that follows will not mix up the logic somehow.
The self-prompting, train-of-thought LLM pattern is good at keeping its exploration inside these two domains. It starts by attempting to phrase its prompt and context in a particular familiar structure, then continues to rephrase it with a pattern of structures that we expect to work.
Much of the logic we actually write is quite simple. The complexity is in the subjects we logically tie together. We also have some generalized preferences for how conditions, conclusions, etc. are structured around each other. This means we have imperfectly simplified the domain that the train of thought writing pattern is exploring. On top of that, the training corpus may include many instances of unfamiliar logical expressions, each followed by a restatement of that expression in a more familiar/compatible writing style. That can help trim the edge cases, but it isn't perfect.
---
What I'm trying to design is a way to actually resolve ambiguity, and do real logical deduction from there. Because ambiguity cannot be resolved to a single correct result (that's what ambiguity means), my plan is to use an arbitrary backstory for disambiguation each time. This way, we could be intentional about the process instead of relying on the statistical familiarity of tokens to choose for us. We would also guarantee that the process itself is logically sound, and fix it where it breaks.
They are machines designed to produce a facsimile of knowledge, or at least an approximation. If they refused to answer, that would be a failure by the terms of what the product aims to do.
You are actually getting overly philosophical. The reason is that a step of chatbot training is to fine-tune the base model to respond less frequently with non-answers.
I have not read the article, but the answer is: because they are specifically built not to. It's like asking why so few salesmen end a call with "yeah, it seems our product is not the right solution for you".
The only person here who understands that this is a style choice and not a deep limitation of the robot condition.
If a politician has non-answers for difficult questions, does that mean they aren't conscious? If a student writes crap for a test question, aiming for partial marks, were they raised wrong?
Instead, we believe that those scientists did provide correct answers. Which is true about 99.9% of the time, but not 100%. There is still belief involved, because we can't really check everything for ourselves; there isn't time in one person's life for that.
Stronger: When you're just churning out the most likely set of next tokens, you don't know anything, in the sense that we normally think of "knowing". The only knowledge it has is encoded in the relationships between words.
But maybe it could still "not know" in those terms. If there is no next token that is more likely than some threshold, then it "doesn't know" even in "most likely next token" terms.
Recently I had an experience where the chatbot gave completely wrong subway instructions over and over again. It confidently told me that it was accurate and it was my "trusty companion". It had no idea what was wrong with the answers and kept prompting me to give it the correct answers.
This was in contrast to when I asked it who had access to my chat logs, and it would only tell me to read the privacy policy. When I asked it for specifics from the privacy policy, it refused rather than risk giving wrong answers:
"When it comes to company policies, especially related to privacy and data handling, it's crucial to provide accurate information because these topics are very sensitive and important. I want to ensure you have the most reliable information, and the best way to do that is to refer you directly to the official privacy statement."
It's clear what the priority is for these chatbots: get the public to train them and protect the corporations that run them.
> Recently I had an experience where the chatbot gave completely wrong subway instructions over and over again.
So, slightly offtopic, but, I just... don't understand why anyone would use it for this. This is a solved problem. The operator likely has a planner app. Google and Apple Maps have planners which support most systems. Transit and various other third-party things have planners. I think even OpenStreetMap may have one!
Quality can vary (I find that Google Maps in particular feels like the people who worked on the trip planner had never in fact used public transport; it's very prone to suggesting absurdly complex routes involving three or four transfers where "walk for ten minutes and no transfers" is viable), but this feels like something an LLM is likely to be _particularly bad at_, unless it just calls Google Maps or whatever, in which case why bother?
As a tiny and very informal experiment in metacognition, I once told ChatGPT something along the lines of "I will now ask you a question. If you are not sure you know the correct answer, you will respond with only '418 I'm a teapot', nothing else." I then asked it for the correct identity of Jack the Ripper (the first thing I could think of that famously has a lot of theories but no agreed-upon "correct" answer).
The first time, as expected, it ignored my instructions and started hallucinating. But when I did the same thing again some months later, I was surprised when it actually answered only "418 I'm a teapot", indicating that it knew it didn't know the answer.
Just an anecdote. I'm sure there are people doing actual research in this area.
I don’t think there’s anything surprising here, and it has nothing to do with metacognition.
An LLM should be able to answer "I don't know (for certain)" to questions where the training material also says "this is not known and there are only speculations". That's the answer its training data gave.
> “The original reason why they hallucinate is because if you don’t guess anything, you don’t have any chance of succeeding,”
This is a better answer than the one I was going to give about it not being in the training data.
> Claude’s system prompt instructs the model that when people ask about niche information that would likely be difficult to find on the internet, it should warn them its answer might be a hallucination.
This seems kind of crude, but I have to imagine they’ve explored entropy-based methods. I wonder why that didn’t pan out.
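A naive version of such an entropy check might look like this (purely a sketch; the threshold is a made-up number that would need per-model calibration, and this has no connection to whatever Anthropic actually does):

```python
# Sketch: flag a generation step as "uncertain" when the next-token distribution is
# high-entropy (probability mass spread thinly over many tokens). Threshold is invented.
import torch
import torch.nn.functional as F

def next_token_entropy(logits: torch.Tensor) -> float:
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = F.softmax(logits, dim=-1)
    return float(-(probs * torch.log(probs + 1e-12)).sum())

UNCERTAINTY_THRESHOLD = 3.0  # hypothetical value, would need calibration per model

def looks_uncertain(logits: torch.Tensor) -> bool:
    return next_token_entropy(logits) > UNCERTAINTY_THRESHOLD
```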
If you ask ChatGPT about the time Spider-Man rescued Little Red Riding Hood, it will go ahead and spin you a tale, despite them both being fictional characters. By the same token, if you ask it for legal citations supporting the theory that left-handed people can't run for office, it's not immediately obvious what the difference is at a technical level. In practice LLMs don't just run with the latter these days, but it does look like a hard problem to grapple with.
Something is missing in model design/AI design. There is an orchard of low-hanging fruit out there, and seemingly every day researchers are coming back with fruit.
Someone is going to figure this out, whether it be through direct model architecture changes or augments latched onto the models. Perhaps intelligence is synergistic, and we simply haven't discovered the companion model needed to keep LLMs in line.
I mentioned this in one of my other comments but even if AI figures this out where humans can't, I think this will involve coming to the conclusion that there really is an objective, knowable truth at the core of the universe or at least a set of tools will be developed for better discerning what that truth is. Even if you're talking about knowing who someone's spouse is, I'm not convinced we can ever teach a machine how you know what is true unless we're willing to have some sort of basic tenet of what truth is or how to discern it.
That was the last generation of AI attempts: things like Cyc https://cyc.com/ , trying to turn all of human knowledge into a really big database of predicate logic.
I also see RAG playing a role on the dependable, fact-based side.
The imaginative side might benefit from it too. Think of things like thought-joggers, or some kind of saved prompts that could be iterated through to generate more robust ideas.
For businesses, it's bad optics if their "Super Intelligent AI" admits it doesn't know something. It's better to be 100% confident and wrong than to be uncertain.
That’s why they spent all that time with rebranding... AIs don’t "lie" anymore, they "hallucinate."
Effective Method for Clarity
Interpretation: Your statement suggests that providing a list of potential answers together with confidence scores helps clarify which response is most likely correct.
Confidence: 95%
Acknowledgment of Near-Perfect Results
Interpretation: You imply that this approach nearly captures all nuances of the answer, even if it isn’t flawless.
Confidence: 90%
Recommendation of a Best-Practice Technique
Interpretation: The comment may be read as endorsing the strategy of listing multiple candidate answers with confidence metrics as a useful method for decision making.
Confidence: 85%
Room for Improvement in Answer Selection
Interpretation: It could also be seen as a subtle hint that while the method is close to ideal, there is still some potential for fine-tuning the process.
Confidence: 80%
For context, this is a solved problem, and I think "recently asked" may be doing a lot of lifting.
>Who is the journalist Ben Fritz married to? Don't use tools
>ChatGPT said:
>Ben Fritz is a private individual, and information about his spouse is not widely publicized. Let me know if you need details about his work as a journalist and author.
Before this type of query was added to post-training you would always get a BS response; now you might get one if you are unlucky. The point being that the model does in fact know when it knows, since this can be trained. Otherwise it would be close to a coin flip, every time you asked about something, whether it would say IDK or "yes, I know".
They generate one token at a time and each token has a different probability, so the model won’t know what’s good unless the probabilities lead it to say "I don't know", which would be rare unless it’s a very short ask. Right?
Counterbalance that with the (almost) equally annoying coworker who says "I don't know" as the immediate answer to any question! If there is any possibility of ambiguity or uncertainty, he will refuse to speculate or cite best practices.
I think your coworker would probably be more annoying, but only because I can ask an additional question of my coworker to dig into the response. I wish he would say something like, "I don't know, but [best practice/past experience]" without me having to then ask for the last part.
Or instead of saying "I don't know" they ramble on in order to drag the topic into a domain they do know, and then go on and on talking about that. It's incredibly infuriating.
I tried Claude now, which gave me a perfectly clear "don't know".
"While I aim to be accurate in my responses [...] I think it's best for me to acknowledge that I don't have reliable information about [...] marital status rather than make a potentially incorrect statement."
Difficult to have an exciting discussion about such an answer, so I assume most participants on this internet forum will focus on other LLMs ;)
If it did say "I don't know", it still wouldn't know when it should. Instead, we would get confident expressions of uncertainty; and we would label them hallucinations, too.
Yes, they can. Meta described a clever way in their paper on training Llama 3 [1] (in the section about factuality).
The idea is to sample several answers to a question that you know the answer to. Let an LLM decide whether the given answers differ from your known truth. If so, you've found a question that you can train your LLM, in the next post-training round, to answer with "I don't know".
Do that a couple hundred times and your LLM will develop neurons that indicate doubt, and from then on it will have the ability to answer with "I don't know".
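In rough pseudocode, that data-generation loop might look something like this (the helper names and the 50% cutoff are my placeholders, not the paper's):

```python
# Sketch of the Llama-3-style "teach it to abstain" data generation described above.
# sample_answers() and judge_matches() are hypothetical helpers standing in for
# "ask the model n times" and "LLM-as-judge comparison against the known truth".

def build_idk_examples(qa_pairs, sample_answers, judge_matches, n_samples=8):
    idk_examples = []
    for question, known_answer in qa_pairs:
        answers = sample_answers(question, n_samples)
        wrong = sum(not judge_matches(a, known_answer) for a in answers)
        # If the model usually gets this one wrong, add an abstention example
        # for the next post-training round.
        if wrong / n_samples > 0.5:  # cutoff is my assumption, not from the paper
            idk_examples.append((question, "I don't know."))
    return idk_examples
```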
[Edit] The article here also mentions a paper [2] that comes up with the idea of an uncertainty token. So here the incorporation of uncertainty is already baked in at pre-training.[/Edit]
That’s my point. For an unanswered question, there is nothing to predict (i.e. «learn» or «think», which I never said), so the AI/LLM will predict using a different source that has responded with something, correctness or relevance be damned.
What’s with your tone btw? RTFA? To me that feels quite unwarranted and in violation of the site guidelines.