> now quickly taking on the role of a general reasoning engine
And this right here is why it's important to emphasize the "stochastic parrot" fact. Because people think this is true and are making decisions based on this misunderstanding.
> The ersatz fluency and coherence of LMs raises several risks, precisely because humans are prepared to interpret strings belonging to languages they speak as meaningful and corresponding to the communicative intent of some individual or group of individuals who have accountability for what is said.
Is there any researcher who maintains that LLMs contain reasoning and intent?
Those who are working on these models are not confused; they know what they are. The public is confused.
How is that different from "This evidence for X raises the risk that people falsely believe that X"? That's an argument for X, not against. And nothing in that paper, even if I discard the dross (i.e. everything except one section on page 7), seems to actually make an argument against X of any strength beyond "it is wrong because it is wrong".
My point is this: I disagree with you. This is not because I have "misunderstood" something; it is because I understand the stochastic-parrot argument and think it is erroneous. And the more you talk about "the risk that people will come to falsely believe" rather than actual arguments, the less convincing you sound. This paternalistic tendency is a curse on science and debate in general.
> it is because I understand the stochastic-parrot argument and think it is erroneous.
Okay then, what exactly about it is erroneous? Because stochastically sorting the set M of known tokens by the likelihood of each being the next one is literally what LLMs do.
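To put numbers on it, here's a minimal sketch of that sorting-and-sampling step. The vocabulary and the logits are invented purely for illustration; a real model scores tens of thousands of tokens per step.

import numpy as np

# Toy illustration only: pretend the model has already scored every token
# in its (tiny, made-up) vocabulary for how well it continues the prompt.
vocab = ["Mike", "Jenny", "elevator", "Moon", "garglsnarp"]
logits = np.array([4.2, 2.9, 1.0, -1.5, -6.0])  # hypothetical scores

# Softmax turns the scores into a probability distribution over next tokens.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Sampling from that distribution is the "stochastic" part.
next_token = np.random.choice(vocab, p=probs)
print(dict(zip(vocab, probs.round(3))), "->", next_token)

Temperature, top-k and so on just reshape that distribution before the sample is drawn; the basic loop stays the same.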
There's a class of statements that can be either interpreted precisely, at which point the claim they make is clearly true but trivial, or interpreted expansively, at which point the claim is significant but no longer clearly true.
This is one of those: yes, technically LLMs are token predictors, but technically any nondeterministic Turing machine is a token predictor. The human brain could be viewed as a token predictor [1]. The interesting question is how it comes up with its predictions, and on this the phrase offers no insight at all.
> The human brain could be viewed as a token predictor
No, it really couldn't, because "generating and updating a 'mental model' of the environment" is as different from predicting the next token in a sequence as a bee's dance is from a structured human language.
The mental model we build and update is not just based on a linear stream, but many parallel and even contradictory sensory inputs that we make sense of not as abstract data points, but as experiences in a world of which we are a part. We also have a pre-existing model summarizing our experiences in the world, including their degradation, our agency in that world, and our intentionality in that world.
The simple fact that we don't just complete streams, but do so with goals, both immediate and long term, and fit our actions into these goals, in itself already shows how far a human's mental modeling is from the linear action of a language model.
But the human mental model is purely internal. For that matter, there is strong evidence that LLMs generate mental models internally. [1] Our interface to motor actions is not dissimilar to a token predictor.
> The mental model we build and update is not just based on a linear stream, but many parallel and even contradictory sensory inputs
So just like multimodal language models, for instance GPT-4?
> as experiences in a world of which we are a part.
> The simple fact that we don't just complete streams, but do so with goals, both immediate and long term, and fit our actions into these goals
Unfalsifiable! GPT-4 can talk about its experiences all day long. What's more, GPT-4 can act agentically if prompted correctly. [2] How do you qualify a "real goal"?
> For that matter, there is strong evidence that LLMs generate mental models internally.
Limited models, such as one representing the state of a game it was trained on: yes. This is how we hope deep learning systems work in general.
But I am not talking about limited models. I am talking about ad-hoc models, built from ingesting the context and semantic meaning of a string of tokens, that can simulate reality and allow drawing logical conclusions from it.
In regard to my example given elsewhere in this HN thread: I know that Mike exits the elevator first because I build a mental model of what the tokens in the question represent. I can draw conclusions from that model, including new conclusions whose token representation would be unlikely in the LLM's model, which doesn't explain anything about reality, but explains how tokens are usually ordered in the training set.
The relevant keyword you want is "zero-shot learning". (EDIT: Correction; "in-context learning". Sorry for that.) LLMs can pick up patterns from the context window purely at evaluation time using dynamic reinforcement learning. (This is one of those capabilities models seem to just pick up naturally at sufficient scale.) Those patterns are ephemeral and not persisted to memory, which I agree makes LLMs less general than humans, but that seems a weak objection to hang a fundamental difference in kind on.
edit: Correction: I can't find a source for my claim that the model specifically picks up reinforcement learning across its context as the algo that it uses to do ICL. I could have sworn I read that somewhere. Will edit a source in if I find it.
edit: Though I did find this very cool paper https://arxiv.org/abs/2210.05675 that shows that it's specifically training on language that makes LLMs try to work out abstract rules for in-context learning.
edit: https://arxiv.org/abs/2303.07971 isn't the paper I meant, since it only came out recently, but it has a good index of related literature and does a very clear analysis of ICL, demonstrating that models don't just learn rules at runtime but learn "extract structure from context and complete the pattern" as a composable meta-rule.
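edit: For anyone unfamiliar with ICL, here's a toy sketch of what "picking up a pattern purely at evaluation time" means, using the HuggingFace transformers pipeline. The pattern is made up, gpt2 is just a placeholder, and a model that small may well flub it; the point is only that whatever the model does with the pattern, it does with frozen weights.

from transformers import pipeline

# A rule ("reverse the word") the model has to infer from the context
# window alone -- no gradient updates, no fine-tuning.
prompt = (
    "stone -> enots\n"
    "cloud -> duolc\n"
    "river -> revir\n"
    "table -> "
)

generator = pipeline("text-generation", model="gpt2")  # placeholder model
out = generator(prompt, max_new_tokens=5, do_sample=False)
print(out[0]["generated_text"])

If a sufficiently large model completes this with "elbat", that rule was extracted from the prompt at evaluation time, which is the capability the papers above study.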
> In regard to my example given elsewhere in this HN thread: I know that Mike exits the elevator first because I build a mental model of what the tokens in the question represent. I can draw conclusions from that model, including new conclusions whose token representation would be unlikely in the LLM's model, which doesn't explain anything about reality, but explains how tokens are usually ordered in the training set.
I mean. Nobody has unmediated access to reality. The LLM doesn't, but neither do you.
In the hypothetical, the token in your brain that represents "Mike" is ultimately built from photons hitting your retina, which is not a fundamentally different thing from text tokens. Text tokens are "more abstracted", sure, but every model a general intelligence builds is abstraction based on circumstantial evidence. Doesn't matter if it's human or LLM, we spend our lives in Plato's cave all the same.
> In the hypothetical, the token in your brain that represents "Mike"
Mike isn't represented by a token. "Mike" is a word I interpret into an abstract meaning in an ad-hoc created, and later updated or discarded model of a situation in which exist only the elevator, some abstract structure around it, and the laws of physics as I know them from knowledge and experience.
> built from photons hitting your retina, which is not a fundamentally different thing from text tokens.
The difference is not in how sensory input is gathered. The difference is in what that input represents. For the LLM the token represents...the token. That's it. There is nothing else. The token exists for its own sake, and has no information other than itself. It isn't something from which an abstract concept is built, it IS the concept.
As a consequence, a language model doesn't understand whether statements are false or nonsensical. It can say that a sequence is statistically less likely than another one, but that's it.
"Jenny leaves first" is less likely than "Mike leaves first".
But "Jenny leaves first" is probably more likely than "Mario stands on the Moon", which is more likely than "catfood dog parachute chimney cloud" which is more likely than "blob garglsnarp foobar tchoo tchoo", which in turn is probably more likely than "fdsba254hj m562534%($&)5623%$ 6zn 5)&/(6z3m z6%3w zhbu2563n z56".
To someone who concludes that Mike left the elevator first by drawing that conclusion from an abstract representation of the world, all these statements are equally wrong. To a language model, they are just points along a statistical gradient. So in a language model's world a wrong statement can still somehow be "less wrong" than another wrong statement.
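(If you want to see that gradient directly, here is a rough sketch of scoring such candidate sentences with an open causal LM through the HuggingFace transformers library. gpt2 is only a stand-in, and the exact ordering a given model assigns may differ from my guesses above.)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

candidates = [
    "Mike leaves first",
    "Jenny leaves first",
    "Mario stands on the Moon",
    "catfood dog parachute chimney cloud",
    "blob garglsnarp foobar tchoo tchoo",
]

def avg_logprob(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids the model returns the mean cross-entropy,
        # i.e. the negative average log-probability per token.
        loss = model(ids, labels=ids).loss
    return -loss.item()

scores = {c: avg_logprob(c) for c in candidates}
for sentence, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:8.3f}  {sentence}")

Higher (less negative) scores only mean the model finds a sentence less surprising; nothing in the number encodes true versus false.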
---
Bear in mind when I say all this, I don't mean to say (and I think I made that clear elsewhere in the thread) that this mimicry of reasoning isn't useful. It is, tremendously so. But I think it's valuable to research and understand the difference between mimicking reason by learning how tokens form reasonable sequences, and actually reasoning by abstracting the world into models that we can draw conclusions from.
Not least because I believe that this will be a key element in developing things closer to AGIs than the tools we have now.
> an ad-hoc created, and later updated or discarded model of a situation in which exist only the elevator, some abstract structure around it, and the laws of physics as I know them from knowledge and experience.
LLMs can do all of this. In fact, multimodality specifically can be shown to improve their physical intuition.
> The difference is not in how sensory input is gathered. The difference is in what that input represents. For the LLM the token represents...the token. That's it. There is nothing else. The token exists for its own sake, and has no information other than itself. It isn't something from which an abstract concept is built, it IS the concept.
The token has structure. The photons have structure. We conjecture that the photons represent real objects. The LLM conjectures (via reinforcement learning) that the tokens represent real objects. It's the exact same concept.
> As a consequence, a language model doesn't understand whether statements are false or nonsensical.
Neither do humans, we just error out at higher complexities. No human has access to the platonic truth of statements.
> So in a language model's world a wrong statement can still somehow be "less wrong" than another wrong statement.
Of course, but the same goes for humans. I have no idea what you're trying to say here. As with humans, in an LLM token improbability can derive from lots of different reasons, including world-model violation, in-context rule violation, prior improbability and grammatical nonsense. In fact, their probability calibration is famously near-perfect, until RLHF ruins it. :)
> Bear in mind when I say all this, I don't mean to say (and I think I made that clear elsewhere in the thread) that this mimicry of reasoning isn't useful.
I fundamentally do not believe there is such a thing as "mimicry of reason". There is only reason, done more or less well. To me, it's like saying that a pocket calculator merely "mimics math" or, as the quote goes, whether a submarine "mimics swimming". Reason is a system of rules. Rules cannot be "applied fake"; they can only be computed. If the computation is correct, the medium or mechanism is irrelevant.
To quote gwern, if you'll allow me the snark:
> We should pause to note that a Clippy² still doesn’t really think or plan. It’s not really conscious. It is just an unfathomably vast pile of numbers produced by mindless optimization starting from a small seed program that could be written on a few pages. It has no qualia, no intentionality, no true self-awareness, no grounding in a rich multimodal real-world process of cognitive development yielding detailed representations and powerful causal models of reality; it cannot ‘want’ anything beyond maximizing a mechanical reward score, which does not come close to capturing the rich flexibility of human desires, or historical Eurocentric contingency of such conceptualizations, which are, at root, problematically Cartesian. When it ‘plans’, it would be more accurate to say it fake-plans; when it ‘learns’, it fake-learns; when it ‘thinks’, it is just interpolating between memorized data points in a high-dimensional space, and any interpretation of such fake-thoughts as real thoughts is highly misleading; when it takes ‘actions’, they are fake-actions optimizing a fake-learned fake-world, and are not real actions, any more than the people in a simulated rainstorm really get wet, rather than fake-wet. (The deaths, however, are real.)
> I fundamentally do not believe there is such a thing as "mimicry of reason". There is only reason, done more or less well.
# A hypothetical transaction check: reject anything over a configured limit.
if transaction.amount > MAX_TRANSACTION_VOLUME:
    transaction.reject()
else:
    transaction.allow()
Is this code reasoning? It does, after all, take input and make a decision that depends on some context, the transaction's amount. It even has a model of the world, albeit a very primitive one.
No, of course it isn't. But it mimics the ability to do the very simple reasoning about whether or not to allow a transaction, to the point where it could be useful in real applications.
So yes, there is mimicry of reasoning, and it comes in all scales and levels of competence: from simple decision-making algorithms and purely mechanical contraptions such as overpressure valves, all the way up to highly sophisticated systems that use stochastic analysis of sequence probabilities to show the astonishing skills we see in LLMs.
I feel this is mostly going to come down to how we define the word. I suspect we agree that there's no point in differentiating "reasoning" from "mimicked reasoning" if the performed actions are identical in every situation.
So let's ask differently: what concrete problem do you think LLMs cannot solve?
> what concrete problem do you think LLMs cannot solve?
From the top of my head:
Drawing novel solutions from existing scientific data, for one. Extracting information from incomplete data that is only apparent through reasoning (such as my code-bug example given elsewhere in this thread), i.e. assuming hidden factors. Complex math is still beyond them, and predictive analysis requiring inference is an issue.
They also still face the problem of, as has been anthropomorphized so well, "fantasizing", especially during longer conversations; which is cute when they pretend that footballs fit in coffee cups, but not so cute when things like this happen:
These certainly don't matter for the things I am using them for, of course, and so far, they turn out to be tremendously useful tools.
The trouble, however, is not with the problems I know they cannot, or cannot reliably, solve. The problem is with as-yet-unknown problems that humans, me included, might assume they can solve, until it suddenly turns out they can't. What these problems are, time will tell. So far we have barely scratched the surface of introducing LLMs into our tech products. So I think it's valuable to keep in mind that there is, in fact, a difference between actually reasoning and mimicking it, even if the mimicry is to a high standard. If for nothing else, then only to remind us to be careful in how, and for what, we use them.
I mean, do you think an LLM cannot draw a novel solution from existing data, fundamentally, because its reasoning is "of the wrong kind"? That seems potentially disprovable. Or do you just think current products can't do it? I'd agree with that.
What's the easiest novel scientific solution that AI couldn't find if it wasn't in its training set?
No, because it doesn't reason, period. Stochastic analysis of sequence probabilities != reasoning. I explained my thoughts on the matter in this thread to quite some extent.
> That seems potentially disprovable.
You're welcome to try and disprove it. As for prior research on the matter:
And afaik, Galactica wasn't even intended to do novel research; it was only intended for the time-consuming but comparably easier tasks of helping to summarize existing scientific data, answer questions about it in natural language, and write "scientific code".
Alright, I'll keep an eye open for instances of networks doing scientific reasoning.
(My own belief is that reasoning is 95% habit and 5% randomness, and that networks don't do it because it hasn't been reflected in their training sets, and they can't acquire the skills because they can't acquire any skills not in the training set.)
>Is there any researcher who maintains that LLMs contain reasoning and intent?
That's the funny thing about the (in)famous OpenAI letter; the first sentence kind of does this:
>AI systems with human-competitive intelligence can pose profound risks to society and humanity, as shown by extensive research[1]
'human-competitive intelligence' sounds like reasoning to me. What's even funnier is that [1] is the stochastic parrot paper, which argues exactly the opposite!
> 'human-competitive intelligence' sounds like reasoning to me.
Yes, and when AIs reach that level of intelligence, we can revisit the question.
However, as long as LLMs confidently try to explain to me why several footballs fit in an average coffee mug, I'd say we are still quite some way away from "human-competitive intelligence".
Yes, I fully agree with you! There's no reasoning or intelligence in modern LLMs, but the OpenAI open letter and recent comms strongly imply there is.
Since ChatGPT I've become much more aware of my own thoughts and written text. I'm now often wondering whether I'm just regurgitating the most frequently used next word or phrase, or whether it could actually be described as original. Especially for things like reacting with short answers to chat messages, I am confident that these are only reflexive answers without alternatives, which could have come from a ChatGPT trained on my chat log. I feel like knowing and seeing how ChatGPT works can elevate our own thinking process. Or maybe it is only similar to awareness meditation.
In this vein, ChatGPT is a nice way to start thinking about any topic; ask it about its opinion on anything and it will give you the most run-of-the-mill, middle-of-the-road text that is possible, standing for nothing. If you find yourself agreeing with ChatGPT it may be time to reconsider your own thinking!
LLMs may be an approximation of our knowledge, but understand that there’s more to reasoning than a language model. GPT understands how tokens relate to each other, but humans understand how objects, places, and abstract concepts relate to each other as well. We have a little further to go for AGI.
I do believe that GPT4 is a really good approximation of our language though, and feel similarly to you when I respond off the cuff.
> I think we're way past that now, with LLMs now quickly taking on the role of a general reasoning engine.
No we're not, and no they are not.
An LLM doesn't reason, period. It mimics reasoning ability by stochastically choosing a sequence of tokens. A lot of the time these make sense. At other times, they don't make any sense. I recently asked an LLM:
"Mike leaves the elevator at the 2nd floor. Jenny leaves at the 9th floor. Who left the elevator first?"
It answered correctly that Mike leaves first. Then I asked:
"If the elevator started at the 10th floor, who would have left first?"
And the answer was that Mike still leaves first, because he leaves at the 2nd floor, and that's the first floor the elevator reaches. Another time I asked an LLM how many footballs fit in a coffee mug, and the conversation reached a point where the AI tried to convince me that coffee mugs are only slightly smaller than the trunk of a car.
Yes, they can also produce the correct answers to both these questions, but the fact that they can also spew such complete illogical nonsense shows that they are not "reasoning" about things. They complete sequences, that's it, period, that's literally the only thing a language model can do.
Their apparent emergent abilities look like reasoning, in the same way that Jen from "The IT Crowd" can sound like she's speaking Italian, when in fact she has no idea what she is even saying.
> but the fact that they can also spew such complete illogical nonsense shows that they are not "reasoning" about things
Have you ever seen the proof that 2=1 ? It looks convincing, but it's illogical because it has a subtle flaw. Are the people who can't spot the flaw just "looking like they are reasoning", but really they just lack the ability to reason? Are witnesses who unintentionally make up memories in court cases lacking reasoning? Are children lacking reasoning when you ask them why they drew all over the walls and they make up BS?
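(For reference, the usual version of that "proof" runs as follows; the flaw is the step that divides both sides by a - b, which is zero.)

\begin{aligned}
a &= b \\
a^2 &= ab \\
a^2 - b^2 &= ab - b^2 \\
(a+b)(a-b) &= b(a-b) \\
a+b &= b \quad \text{(invalid: both sides divided by } a-b = 0\text{)} \\
2b &= b \\
2 &= 1
\end{aligned}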
You can't just spout that an LLM lacks reasoning without first strictly defining what it means to reason. Everybody keeps going on and on about how an LLM can't possibly be intelligent/reasoning/thinking/sentient etc. All of these are extremely vague and fuzzy words that have no unambiguous definition. Until we can come up with hard metrics that define these terms, nobody is correct when they spout their own nonsense that somehow proves the LLM doesn't fit into their specific definition of fill in the blank.
> Are the people who can't spot the flaw just "looking like they are reasoning", but really they just lack the ability to reason?
Lacking relevant information or insight into a topic, isn't the same as lacking the ability to reason.
> You can't just spout that an LLM lacks reasoning without first strictly defining what it means to reason.
Perfectly worded definition available on Wikipedia:
Reason is the capacity of consciously applying logic by drawing conclusions from new or existing information, with the aim of seeking the truth.
"Consciously", "logic", and "seeking the truth" are the operative terms here. A sequence predictor does none of that. Looking at my above example: The sequence "Mike leaves the elevator first" isn't based on logical thought, or a conscious abstraction of the world built from ingesting the question. It's based on the fact that this sequence has statistically a higher chance to appear after the sequence representing the question.
How does our reasoning work? How do humans answer such a question? By building an abstract representation of the world based on the meaning of the words in the question. We can imagine Mike and Jenny in that elevator, we can imagine the elevator moving, floor numbers have meaning in the environment, and we understand what "something is higher up" means. From all this we build a model and draw conclusions.
How does the "reasoning" in the LLM work? It checks which tokens are likely to appear after another sequence of tokens. It does so by having learned how we like to build sequences of tokens in our language. That's it. There is no modeling of the situation going on, just stochastic analysis of a sequence.
Consequently, an LLM cannot "seek truth" either. If a sequence has a high chance of appearing in a position, it doesn't matter whether it is factually true, or even logically sound. The model isn't trained on "true or false". It will, likely more often than not, say things that are true, not because it understands truth, but because the training data contain a lot of token sequences that, when interpreted by a human mind, state true things.
Lastly, imagine trying to apply a language model to an area that depends completely on the above definition of reasoning as a consequence of modeling the world based on observations and drawing new conclusions from that modeling.
> Until we can come up with hard metrics that define these terms, nobody is correct when they spout their own nonsense that somehow proves the LLM doesn't fit into their specific definition of fill in the blank.
"Consciously", "logic", and "seeking the truth" are not objectively verifiable metrics of any kind.
I'll repeat what I said: Until we come up with hard metrics that define these terms, nobody can be correct. I'll take Investopedia's definition of what a metric means, as that embodies the idea I was getting at most succinctly:
> Metrics are measures of quantitative assessment commonly used for assessing, comparing, and tracking performance or production.[0]
So, until we can quantitatively assess how an LLM performs compared to a human in "consciousness", "logic", and "seeking the truth", whatever ambiguous definition you throw out there will not confirm or deny whether an LLM embodies these traits as opposed to a human embodying these traits.
The sequence "Mike leaves the elevator first" has a high statistical probability. The sequence "Jenny leaves the elevator first" has a lower probability that that. But it probably has still a much higher probability than "Michael is standing on the Moon", which in turn may be more likely than "Car dogfood sunshine Javascript", which is still probably more likely than "snglub dugzuvutz gummmbr ha tcha ding dong".
Note that none of these sequences are wrong in the world of a language model. They are just increasingly unlikely to occur in that position. To us, with our ability to reason by logically drawing conclusions from an abstract internal model of the world, all these other sequences either represent false statements, or nonsensical word salad.
GPT-4 reasons about a lot of gotcha logic puzzles correctly; any pre-GPT-4 opinions should be reconsidered. Those are effectively two different epochs in the history of AI effectiveness and reasoning.
> Me: Mike leaves the elevator at the 2nd floor. Jenny leaves at the 9th floor. Who left the elevator first?
> GPT-4: Mike left the elevator first, as he got off at the 2nd floor, while Jenny left at the 9th floor.
> Me: If the elevator started at the 10th floor, who would have left first?
> GPT-4: If the elevator started at the 10th floor and went downward, then Jenny would have left first, as she got off at the 9th floor, while Mike left at the 2nd floor.
> Me: How many footballs fit in a coffe-mug?
> GPT-4: A standard football (soccer ball) has a diameter of around 22 centimeters (8.65 inches), while a coffee mug is typically much smaller, with a diameter of around 8-10 centimeters (3-4 inches). Therefore, it is not possible to fit a standard football inside a coffee mug. If you were to use a mini football or a much larger mug, the number of footballs that could fit would depend on the specific sizes of the footballs and the mug.
It easily answered all of your questions and produced explanations I would expect most reasonable people to make.
That changes exactly nothing about the validity of my statement.
Yes, GPT-4 is better at this mimicry than GPT-3 or GPT-3.5. And GPT-3 was better at it than GPT-2. And all of them were better than my just-for-fun home-built language model projects that I trained on small <10GiB datasets, which in turn were better at it than my PoC models trained on just a few thousand words.
But being better at mimicking reason is still not reasoning. The model doesn't know what a coffee mug is, and it doesn't know what a football is. It also has no idea how elevators work. It can form sequences that make it look to us as if it does know all these things, but in reality, it only knows that "then Jenny would have left first" is a more likely sequence of tokens at that point, given that the sequence before included "started at the 10th floor".
Bear in mind, this doesn't mean that this mimicry isn't useful. It is, tremendously so. I don't care how I get correct answers, I only care that I do.
Simple: I know that humans have intentionality and agency. They want things, they have goals both immediate and long term. Their replies are based not just on the context of their experiences and the conversation but their emotional and physical state, and the applicability of their reply to their goals.
And they are capable of coming up with reasoning about topics for which they have no prior information, by applying reasonable similarities. Example: Even if someone has never heard the phrase "walking a mile in someone else's shoes", most humans (provided they speak English) have no difficulty figuring out what it means. They also have no trouble figuring out that this is a figure of speech, and not a literal action.
>Simple: I know that humans have intentionality and agency. They want things, they have goals both immediate and long term. Their replies are based not just on the context of their experiences and the conversation but their emotional and physical state, and the applicability of their reply to their goals.
This all seems orthogonal to reasoning, but also who is to say that somewhere in those billions of parameters there isn't something like a model of goals and emotional state? I mean, I seriously doubt it, but I also don't think I could evidence that.
Correct, but the problem is that the way you prove that for humans is by looking at the output and inferring it. You can apply the same criteria to ML models. If you don't, you need some other criterion to rule out that assumption for ML models.
For humans I can simply refer to my own internal state and look at how I arrive at conclusions.
I am of course aware that this is essentially a form of ipse dixit, but I will do it anyway in this case, because I am saying it as a human, about humans, and to other humans, and so the audience can just try it for themselves.
> You assume that. You can only maybe know that about yourself.
I can also only say with certainty that planetary gravity is an attracting force on the very spot I am standing on. I haven't visited every spot on every planet in the universe after all.
That doesn't make it any more likely that my extrapolation of how gravity works here is wrong somewhere else. Russell's Teapot works both ways.
> How do you know that the ML model doesn't?
For the same reason why I know that a hammer or an operating system don't. I know how they work. Not in the most minute detail, and of course the actual model is essentially a black box, but its architecture and MO are not.
It completes sequences. That is all it does. It has no semantic understanding of the things these sequences represent. It has no understanding of true or false. It doesn't know math, it doesn't know who person xyz is, it doesn't know that 1993 already happened and 2221 did not. It cannot have abstract concepts of the things represented by the sequences, because the sequences are the things in its world.
It knows that a sequence is more or less likely to follow another sequence. That's it.
From that limited knowledge, however, it can very successfully mimic things like math, logic, and even reasoning to an extent. And it can mimic them well enough to be useful in a lot of areas.
But that mimicry, however useful, is still mimicry. It's still the Chinese Room thought experiment.
I don't really accept the idea that this isn't reasoning, though I'm not entirely sold either way.
I'd say if it mimics something well enough then eventually it's just doing the thing, which is the same side of the argument I fall on with Searle's Chinese Room Argument. If you can't discern a difference, is there a difference?
So far GPT-4 can produce better work than maybe 50% of humans, and better responses to brain-teaser questions than most of them too. But I'm in a bubble, so I don't run into people that stupid very often, which makes it easier for me to still see the gaps.
> I'd say if it mimics something well enough then eventually it's just doing the thing
Right up to the point where it actually needs to reason, and the mimicry doesn't suffice.
My above example about the football and the coffee mug is an easy one; the objects are well represented in its training data. What if I need a reason why the service ping spikes every 60 seconds? Here is the code; please, LLM, look into it. I am sure I will get a great and well-written answer.
I am also sure it won't be the correct one, which is that some dumb script I wrote, which has nothing to do with the code shown, blocks the server for about 700ms every minute.
Figuring out that something cannot be explained with the data represented, and thus may come from a source unseen, is one example of actual reasoning. And this "giving up on the data shown" is something I have yet to see any AI do.
I could say the same about most second-rate software engineers. That's why I'm not moved by your arguments. There's plenty of people just as stupid who will give you confidently wrong answers.
I think we're way past that now, with LLMs now quickly taking on the role of a general reasoning engine.