Sounds like you want the LLM to get the answer right in all the simple, easy cases before you will say it understands words. I hate to break it to you but people do not meet that standard either and misunderstand each other plenty. For three million paying customers, ChatGPT understands their questions well enough, and they are happy to pay more for it than for any other widespread Internet service for the chance to ask it questions in natural language, even though there is a free tier with plenty of free usage available.

It is as though you said a dog couldn't really play chess if it plays legal moves all day, every day, from any position, for millions of people, but sometimes fails to see an obvious mate in one in novel positions that never occur in the real world.

You're entitled to your own standard of what it means to understand words but for millions of people it's doing great at it.




> I hate to break it to you but people do not meet that standard either and misunderstand each other plenty

Sure, and there are ways to tell when people don't understand the words they use.

One way to check how well people understand a word or concept is to ask them a question they haven't already seen the answer to.

It is the difference in performance on novel tasks that allows us to separate understanding from memorization in both people and computer models.

The confusing thing here is that these LLMs are capable of memorization at a scale that makes the lack of understanding less immediately apparent.

> You're entitled to your own standard of what it means to understand words but for millions of people it's doing great at it.

It's not mine; the distinction I am drawing is widespread and common knowledge. You see it throughout education and pedagogy.

> It is as though you said a dog couldn't really play chess if it plays legal moves all day, every day, from any position, for millions of people, but sometimes fails to see an obvious mate in one in novel positions that never occur in the real world.

While I would say chess engines can play chess, I would not say they understand chess. Conflating utility with understanding simply erases an important distinction.

I would say that LLMs can talk and listen, and perhaps even that they understand how people use language. Indeed, as you say, millions of people show this every day. I would not, however, say that LLMs understand what they are saying or hearing. The words themselves are meaningless to the LLM beyond their use in matching memorized patterns.

Edit: Let me qualify my claims a little further. There may indeed be some words that some LLMs understand, but it seems pretty clear there are important ones they don't. Given the scale of the memorized material, demonstrating understanding is hard, and assuming it is not safe.



