Hacker News
LLMs are not even good wordcels (ferrei.ro)
3 points by PaulHoule 27 days ago | 3 comments



Current LLMs are not aware of the letters in text. It's computationally too expensive to assign a token to each letter. They operate at the level of tokens, which tend to be sub-word units: each word is brok-en down in-to seg-ment-s and each seg-ment is assign-ed an in-te-ger.

The result is that LLMs are largely oblivious to spelling and the like. There are some exceptions that work at the letter level, but mostly, yeah. This applies to ChatGPT etc.
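The segmentation described above can be sketched with a toy greedy longest-match segmenter. To be clear, the vocabulary and ids below are made up for illustration; real LLM tokenizers (e.g. BPE) learn their vocabularies from data and work differently in detail:

```python
# Toy sub-word tokenizer: greedy longest-prefix match against a
# hypothetical vocabulary. Not any real model's tokenizer.

VOCAB = {
    "straw": 101,
    "berry": 102,
    "un": 103,
    "believ": 104,
    "able": 105,
}
CHAR_BASE = 1000  # fallback: unknown characters get made-up per-char ids

def tokenize(word):
    """Split `word` into token ids, always taking the longest known prefix."""
    ids = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest prefix first
            if word[i:j] in VOCAB:
                ids.append(VOCAB[word[i:j]])
                i = j
                break
        else:
            ids.append(CHAR_BASE + ord(word[i]))  # no match: fall back to char
            i += 1
    return ids

print(tokenize("strawberry"))    # [101, 102]
print(tokenize("unbelievable"))  # [103, 104, 105]
```

The point of the sketch: once the model sees `[101, 102]`, the individual letters of "strawberry" are simply not present in its input, so questions like "how many r's" have to be answered from memorized associations rather than by looking.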


Most LLMs can't even describe the tokenization they use, which I think indicates they don't have the right knowledge to do wordplay at the letter level.


> One thing that is clear to me is that we’re vastly overestimating the “intelligence” of LLMs, if it even makes sense to use that word in this context. We’re being fooled by their apparent capacity to produce human-sounding text.

Who is this "we"?



