
Exactly this. I too find this to be the best intuition for LLMs right now: they're not comparable to an entire combined human mind - they're comparable to the subconscious, or the inner voice (as in, the part of your subconscious that interfaces with your conscious mind using language - a.k.a. "the voice in your head", if you have one).

So, as you say, if we had as much training as those LLMs, we'd be similarly good at coding by gut feel, with barely a conscious thought - and that's across pretty much any domain and technology that exists today. Compare with generic LLMs: a typical adult will be quite adept at saying somewhat coherent things on autopilot when prompted (!), which is reasonable given nearly two decades of constant exposure to natural language, written and spoken - but that same adult will be nowhere near as good at this as GPT-4, and definitely not across so many different domains.




"Exactly this. I too find this to be the best intuition for LLMs right now: they're not comparable to an entire combined human mind - they're comparable to subconscious, or inner voice"

Strongly disagree.

An LLM traps you inside an intellectual bell curve.


What is your take then? And please don't say "stochastic parrot" or "hype train".


I view it as long-form autocomplete.


> I view it as long-form autocomplete.

My wife sometimes views me as long-form autocomplete, and sometimes as a spelling and grammar checker. Hell, my reply to your comment here is indistinguishable from a "long-form autocomplete".

Point being: that autocomplete has to work in some way. Our LLM autocompletes have been getting better and better at zero-shot completion of arbitrary long-form text, including arbitrary simulated conversations with a simulated human, without a commensurate increase in complexity or resource utilization. This means they're getting better and better at compressing their training data - but in the limit, what is the difference between compression and understanding? I can't prove it formally, but I rather strongly believe they are, fundamentally, the same thing.

Also: if it walks like a duck, quacks like a duck, swims like a duck, ducks like a duck, and is indistinguishable from a duck on any possible test you can think of or apply to it, then maybe your artificial faux-duck effectively turned into a real duck?


> what is the difference between compression and understanding? I can't prove it formally, but I rather strongly believe they are, fundamentally, the same thing.

I'm not sure this is true in general. I feel as if I understand something when I grasp it in its entirety, not when I've been able to summarize it concisely. And conceptually I can compress something without understanding it by manually implementing compression algorithms and following their instructions by rote.

I think understanding and compression are plausibly related; one test of whether I understand something is whether I can explain it to a layperson. But I don't see how they're equivalent even asymptotically.

> then maybe your artificial faux-duck effectively turned into a real duck?

I can't really get behind this sentiment. If a language model behaves like a duck in every readily observable particular then we can substitute language models for ducks, sure. But that does not imply that a language model is a duck, and whether it even could be a duck remains an interesting and important question. I'm sympathetic to the argument that it doesn't really matter in day-to-day practice, but that shouldn't stop us from raising the question.


> But I don't see how they're equivalent even asymptotically.

You wrote:

> I feel as if I understand something when I grasp it in its entirety, not when I've been able to summarize it concisely.

But what does it mean to "grasp it in its entirety"? To me, it means you've learned the patterns that predict the thing and its behavior. That understanding lets you say "it is ${so-and-so}, because ${reason}", and also "it will do ${specific thing} when ${specific condition} happens, because ${reason}", and have such predictions reliably turn out to be true.

To me, replacing a lot of memorized observations with more general principles - more general understanding - is compression.

A simplified model: you observe pairs of numbers in some specific context. You see (1, 2) and (3, 6), then (9, 18), then (27, 54), and then some more pairs, which you quickly notice all follow a pattern:

  Pair_n = (x, y), where:
  - y = 2*x
  - x = 3^n

A thousand such pairs pass you by before they finally stop. Do you need to remember them all? It's not a big deal once you've figured out the pattern - you don't need to remember all the number pairs, you only need to remember the formula above, and that n started at 0 and ended at 999.

This is what I mean by understanding being fundamentally equivalent to compression: each pattern or concept you learn lets you replace memorizing some facts with a smaller formula (program) you can use to re-derive those facts. It's exactly how compression algorithms work.
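
To make that concrete, here's a toy sketch (Python, purely illustrative) of that same pair stream - first memorized, then compressed into the formula:

  # "Memorization": all thousand observed pairs, stored explicitly.
  # (Built with the formula here only because typing them out would be silly.)
  memorized = [(3**n, 2 * 3**n) for n in range(1000)]

  # "Understanding": a tiny program that re-derives any pair on demand.
  def pair(n):
      return (3**n, 2 * 3**n)   # x = 3^n, y = 2*x

  # Same facts either way, but one is a thousand stored observations and
  # the other is a few bytes of formula plus the range of n.
  assert all(pair(n) == memorized[n] for n in range(1000))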

And yes, in this sense, we are lossy compressors.


The devil’s in the details, or, in this case, the joint distribution between what a person would produce and what the model produces. If you came up with a way to train monkeys to write Hamlet on a typewriter, it’s still Hamlet. We’re not there yet - to the point where they consistently expand human potential for thought - but we could be, someday.


I have been thinking along the same lines. The chains of thought that arise during meditation remind me a lot of language generators.


I've had that thought for over a decade now. I've felt that my inner voice is a bit of a Markov chain generator at the border between my conscious and unconscious mind, randomly stringing thoughts together in the form of sentences (often mixed-language, to boot), and that conscious-level thinking involves evaluating those thought streams - cutting some off completely, letting others continue, or mixing them and "feeding them back" to the generator so it iterates more on those.
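
(For anyone who hasn't played with one: a word-level Markov chain generator is only a few lines of Python. This is a toy sketch of the kind of "generator" I have in mind, nothing like the real machinery.)

  import random
  from collections import defaultdict

  def train(text):
      # Remember, for each word, which words were seen to follow it.
      follows = defaultdict(list)
      words = text.split()
      for a, b in zip(words, words[1:]):
          follows[a].append(b)
      return follows

  def babble(follows, start, length=20):
      # String words together by repeatedly sampling a plausible next word.
      out = [start]
      for _ in range(length):
          candidates = follows.get(out[-1])
          if not candidates:
              break
          out.append(random.choice(candidates))
      return " ".join(out)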

Markov chains (and a lot of caching) were a good high-level working model, but quite inadequate in power when inspected in detail. Deep language models I initially ignored, as they felt more like doubling down on caching alone and building convoluted lookup tables. But, to my surprise, LLMs turned out to be not just a better high-level analogy - the way they work in practice feels so close to my experience with my own "inner voice" that I can't believe it's just a coincidence.

What I mean here is, in short: whenever I read articles and comments about strengths and weaknesses of current LLMs (especially GPT-4), I find that they might just as well be talking about my own "inner voice" / gut-level, intuition-driven thinking - it has the same strengths and the same failure modes.



