It's the other way around. The model is impeccable at "understanding text." It's...

nottorp · 2025-02-24T13:17:18 1740403038

I thought it quantifies the probability that a certain word (their output) follows a given word sequence (their training corpus and the prompt)?

ben_w · 2025-02-24T13:25:12 1740403512

Only if you wildly oversimply to the level of being misleading.

The precise mechanism LLMs use for reaching their probability distributions is why they are able to pass most undergraduate level exams, whereas the Markov chain projects I made 15-20 years ago were not.

Even as an intermediary, word2vec had to build a space in which the concept of "gender" exists such that "man" -> "woman" ~= "king" -> "queen".

nottorp · 2025-02-24T13:28:02 1740403682

> Only if you wildly oversimply to the level of being misleading.

Maybe I'm asking for an explanation :)

Since you seem to understand the mechanism, can you do a 3 line summary please?

ben_w · 2025-02-24T13:43:46 1740404626

3 lines? That's still going to be oversimplifed to the point of being wrong, but OK.

Make a bunch of neural nets to recognise every concept, the same way you would make them to recognise numbers or letters in handwiting recognition. Glue them together with more neural nets. Put another on the end to turn concepts back into words.

For a less wrong but still introductory summary that still glosses over stuff, about 1.5 hours of 3blue1brown videos, #4-#8 in this playlist: https://youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_...

nottorp · 2025-02-24T13:56:05 1740405365

Thanks!

... Oh interesting. And those concepts are hand picked or generated automatically somehow?

> For a less wrong but still introductory summary that still glosses over stuff, about 1.5 hours of 3blue1brown videos

Sorry, my religion forbids me from watching talking heads. I'll have to live with your summary for now. Until I run into someone who condensed those 1.5 hours in text that takes at most 30 min to read...

ben_w · 2025-02-24T14:01:41 1740405701

> Oh interesting. And those concepts are hand picked or generated automatically somehow?

Fully automated.

> Sorry, my religion forbids me from watching talking heads.

What about professional maths communicators who created their own open sourced python library for creating video content and doesn't even show their face on most videos?

nottorp · 2025-02-24T16:26:47 1740414407

My problem is with the time wasted compared to written info, not with talking heads per se.

ben_w · 2025-02-24T17:33:38 1740418418

He doesn't waste time. No fluff.

You're unlikely to get a better time-quality trade-off on any maths topic than a 3blue1brown video.

He's the kind of presenter that others try to mimic because he's so good at what he does — you may recognise the visuals from elsewhere because of the library he created[0] in order to visualise the topics he was discussing.

[0] https://docs.manim.community/en/stable/faq/installation.html

There's also a playback speed slider in YouTube. I use it a lot.

gs17 · 2025-02-24T18:43:04 1740422584

Simplifying to that point is more of what a Markov chain is. LLMs are able to generalize a lot more than that, and it's sufficient to "understand text" on a decent level. Even a relatively small model can take, e.g. even this poorly prompted request:

  "The user has requested 'remind me to pay my bills 8 PM tomorrow'. The current date is 2025-02-24. Your available commands are 'set_reminder' (time, description), 'set_alarm' (time), 'send_email' (to, subject, content). Respond with the command and its inputs."

And the most likely response will be what the user wanted.

A Markov chain (only using the probabilities of word orders from sentences in its training set) could never output a command that wasn't stitched together from existing ones (i.e. it would always output a valid command name, but if no one had requested a reminder for a date in 2026 before it was trained, it would never output that year). No amount of documents saying "2026 is the year after 2025" would make a Markov chain understand that fact, but LLMs are able to "understand" that.