
Partly because they don't read letters; they read "tokens", which usually span multiple letters.
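A quick sketch of what that looks like in practice (this uses OpenAI's tiktoken library as one concrete example; the word choice is arbitrary and the exact split varies by tokenizer):

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by recent OpenAI chat models

    word = "strawberry"
    token_ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in token_ids]  # the text each token id stands for

    print(pieces)      # a few multi-character chunks, not ten separate letters
    print(list(word))  # ['s', 't', 'r', 'a', ...] -- the view the model never gets directly

The model only ever receives the token ids, so anything about individual letters has to be inferred rather than read off.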



This is a common myth, but in practice no one (as far as I know) has shown that byte-level predictions result in superior overall performance.

(The word “overall” is important: the papers that have claimed this usually show better performance in specialized situations that few people care about, whereas everyone cares about reversing strings.)

If you were to fine-tune ChatGPT on reversing strings as a task, it would very quickly overfit and get 100% accuracy.
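(Generating that kind of fine-tuning data is trivial, which is part of the point. A minimal sketch, assuming OpenAI's chat-style JSONL fine-tuning format; the sample count, string lengths, and file name are made up for illustration:)

    import json
    import random
    import string

    # Toy dataset of (string -> reversed string) pairs in chat-format JSONL.
    with open("reverse_strings.jsonl", "w") as f:
        for _ in range(1000):
            s = "".join(random.choices(string.ascii_lowercase, k=random.randint(3, 12)))
            example = {
                "messages": [
                    {"role": "user", "content": f"Reverse this string: {s}"},
                    {"role": "assistant", "content": s[::-1]},
                ]
            }
            f.write(json.dumps(example) + "\n")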

It can’t reverse strings perfectly for the same reason it can’t play chess very well: it hasn’t been explicitly trained to. But that’s true of almost every aspect of what it’s doing.


I'm not claiming that character-level predictions result in superior overall performance. Not at all. My claim is merely that it's specifically more difficult for models to reverse character strings when their direct input is not individual characters. Not impossible, and sure, you could fine-tune it for perfect results. But the whole reason large language models are interesting is that they don't require fine-tuning to perform an incredible range of tasks.
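To make the distinction concrete, here's a toy example (the two-piece split of "strawberry" is hypothetical, chosen just to show the mismatch; a real tokenizer picks its own boundaries):

    # Hypothetical token split, purely for illustration.
    tokens = ["straw", "berry"]

    token_level = "".join(reversed(tokens))           # 'berrystraw'
    char_level  = "".join(reversed("".join(tokens)))  # 'yrrebwarts'

    print(token_level)  # reversing the units the model actually operates on
    print(char_level)   # reversing the characters a person has in mind

Reversing the sequence the model actually sees gives a different answer from the character-level reversal we have in mind, which is roughly why the task is harder for it than it looks from the outside.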


Even if you gave it enough training data to accurately reverse every string you hand it, that wouldn't help it reverse the order of a dinner guest list. But once you teach a person how to "reverse" one of those things, they can reverse the other.



