
> You can ask them to reverse numbers or re-arrange words and they'll faceplant in the same way as soon as the input gets beyond a small threshold. Here surely there wouldn't be an issue with tokenization.

My guess is that the training data contains many short pairs of forward and backward sequences, but few or none beyond a certain length (the number of possible sequences grows very quickly with length). That would imply there's no actual reversing going on: the LLM is effectively using the training data as a lookup table.
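One rough way to probe this (a sketch of my own, not anything from the comment above; it assumes the OpenAI Python client and an example model name, but any chat model API would work): ask for reversals of random digit strings at increasing lengths and watch where exact-match accuracy falls off. A sharp cliff past some length, rather than a gradual decline, would be consistent with the lookup-table hypothesis.

    import random
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set; swap in any model API you like

    def ask_llm(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # example model name, purely illustrative
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def reversal_accuracy(length: int, trials: int = 20) -> float:
        correct = 0
        for _ in range(trials):
            digits = [str(random.randint(0, 9)) for _ in range(length)]
            prompt = ("Reverse this sequence of digits. "
                      "Reply with only the reversed digits, space-separated: "
                      + " ".join(digits))
            answer = ask_llm(prompt).strip().split()
            if answer == digits[::-1]:
                correct += 1
        return correct / trials

    # If the model is pattern-matching short memorized pairs rather than
    # actually reversing, accuracy should stay high for short inputs and
    # collapse past some threshold length instead of degrading smoothly.
    for n in (4, 8, 16, 32, 64):
        print(n, reversal_accuracy(n))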



