> but at the end of the day they could just be madly improvising for all you know
I think it's important to draw a line between the kind of madness and improvisation a human performs in these situations vs. an LLM.
While it's still very possible that the LLM spits out something entirely wrong, the reasons for that wrongness are quite different from the reasons a human performing improv will improvise something - anything - to keep the scene going.
The human is still fundamentally limited by the corpus of knowledge available to them, and by the limits of human cognition in processing all available information and generating a result.
While I think the general point of thinking of an LLM as improvising vs. Googling is interesting to consider, I don't think it gets us closer to a useful mental model. A good LLM is still going to be right about most of the things it "improvises", even though it doesn't "know" anything. The person doing improv is not concerned about saying things that are factually true, and is focused on dynamically generating a story or joke or narrative based only on the immediate context.
So while I do recognize some similarities, I don't think the similarities help us think about LLMs in a more useful way.
> The person doing improv is not concerned about saying things that are factually true, and is focused on dynamically generating a story or joke or narrative based only on the immediate context.
I disagree, as this is exactly what LLMs do! It's literally the function of a text-completion transformer model.
OpenAI has just "trained" their improv actor to speak like an assistant and educated it well. They tell the model to roleplay an assistant in the script, and they inject all sorts of facts and formatting into that script that the user doesn't see. I believe they constrain the grammar and post-process it too. But ultimately the text is written by a heavily constrained improv actor that will absolutely improvise anything to keep the text flowing until the stop token arrives, even if truth from memory/context is usually the most likely improvisation.
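To make that concrete, here's a toy sketch of what I mean by "the script the user doesn't see." The role markers and system prompt are invented for illustration, not OpenAI's actual chat format; the point is just that everything gets flattened into one long script that a completion model continues:

```python
# Toy illustration only: the role markers and system prompt below are made up,
# not OpenAI's real format. Everything is flattened into one long "script"
# that a completion model simply continues.
system_prompt = (
    "You are a helpful assistant. Answer factually, use Markdown formatting, "
    "and refuse harmful requests."
)

def build_script(user_message: str) -> str:
    """Assemble the hidden script the model is asked to complete."""
    return (
        f"<|system|>\n{system_prompt}\n<|end|>\n"   # injected by the provider, never shown to the user
        f"<|user|>\n{user_message}\n<|end|>\n"
        f"<|assistant|>\n"                          # the model improvises from here until a stop token
    )

print(build_script("What year did Apollo 11 land on the Moon?"))
```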
Except this is not exactly what LLMs do. This is quickly crossing into an anthropomorphism fallacy.
The thing that matters most is how accurate (or not) the LLM is, regardless of how it arrived at its output.
A human doing improv bears some surface level resemblance to an LLM processing text, but neither the context nor the mechanics are actually similar, nor are the theoretical limits of computing power.
The broader point is that this mental model doesn’t help us better understand LLMs, and as LLMs continue to improve, any resemblance to “improv” will be less and less relevant (not to mention inaccurate).
I don't mean to anthropomorphize it, but I am trying to drive home the point that current LLMs are text completion models. The current architectures continue a script, one token at a time... that's what they do, even if it's a trillion trillion parameters heavily aligned on all accurate human communication ever.
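Concretely, the basic loop looks something like this toy sketch, using the open GPT-2 model from the Hugging Face transformers library as a stand-in for any causal LLM, with greedy decoding instead of the sampling real systems use:

```python
# Minimal sketch of the "one token at a time" loop. GPT-2 stands in for any
# causal LLM; production systems sample rather than taking the argmax.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer.encode("The capital of France is", return_tensors="pt")

with torch.no_grad():
    for _ in range(20):                        # keep completing until a stop token (or a cap)
        logits = model(input_ids).logits       # scores for every possible next token
        next_id = logits[0, -1].argmax()       # greedy pick of the single most likely token
        if next_id.item() == tokenizer.eos_token_id:
            break                              # the "stop token arrives"
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```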
And that's similar enough to the basic mental loop of a "trained" improv actor to serve as a metaphor, I think.
> And that's similar enough to the basic mental loop of a "trained" improv actor
My point is that this is only superficially similar, and to claim deeper similarity is the anthropomorphic fallacy.
To form a useful mental model, the mechanics need to be similar enough to inform the person evaluating the system as the underlying system evolves.
As LLMs evolve, the improv analogy becomes less and less useful because the LLM gets more and more accurate while the person doing improv is still just doing improv. The major difference is that people doing improv aren’t trying to be oracles of information. Perfecting the craft of improv may have nothing to do with accuracy, only believability or humor.
More generally, thinking of LLMs as analogous to intelligent humans introduces other misconceptions, leads people to over-estimate accuracy, etc.
The oddity at the bottom of all of this is that eventually, LLMs may be considered accurate enough to be used reliably as sources of information despite this “improvisation”, at which point it’s not really fair to call it improv, which would massively undersell the utility of the tool. But a perfectly accurate LLM may still not match any of the primary characteristics of an intelligent human, and it won’t matter as long as it’s useful.
> A good LLM is still going to be right about most of the things it "improvises", even though it doesn't "know" anything.
When I check on the factual details of what ChatGPT tells me, it seems to be wrong a lot of the time. Maybe not most, but I've stopped relying on it for factual information.
I think this is a matter of framing. When compared to a system that does nothing but assemble grammatically correct sentences, the correctness of LLM output is impressively high.
But ultimately I’m not saying that the accuracy of LLMs is sufficient, just that thinking about LLMs as improv machines isn’t very helpful.
It has an encyclopedia around and a bit of human-scale time to consult it before coming up with the next line, but it's still more of an improv actor playing an expert than an actual expert.
A sufficiently clever improv actor with a sufficiently complete encyclopedia will at some point become indistinguishable from an actual expert, but LLMs are still far from that point in general. They can already be great at fooling non-experts though, like an actor that needs to be just factually correct enough to be believable in their role.
It sounds like you agree, then, and it's only a matter of the corpus at hand. An improv actor would be called a comedian if they tried to interpolate based on spotty context.
It’s a matter of the corpus at hand and the predictability of a human vs. a computer.
With more training and improved algorithms, the computer can reach a high rate of accurate “improvisation”, at which point it doesn’t matter if it’s improv or not.
My issue is with using improv to augment our mental models of LLMs, and I don’t agree that this is the end result. Taking this analogy too far misleads us about LLMs.