I just wonder: if numbers were written right to left, LLMs would probably be much better at arithmetic. You can 'predict' the least significant digit by reusing the digits already written in the computation, but to generate the most significant ones you generally need to do the entire computation in one go.
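A minimal sketch of why least-significant-first is autoregression-friendly: in grade-school addition, each output digit depends only on the operand digits at that position plus one carry from the previous step, so it can be emitted one digit at a time (`add_lsd_first` below is my own illustrative name):

```python
def add_lsd_first(a: str, b: str) -> str:
    """Add two decimal strings, emitting digits least significant first."""
    a, b = a[::-1], b[::-1]          # index 0 = least significant digit
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        da = int(a[i]) if i < len(a) else 0
        db = int(b[i]) if i < len(b) else 0
        s = da + db + carry
        out.append(str(s % 10))      # this digit is final the moment it's written
        carry = s // 10              # the only state carried to the next step
    if carry:
        out.append(str(carry))
    return "".join(out)

print(add_lsd_first("128", "367"))   # "594", i.e. 495 written LSD-first
```

By contrast, the leading digit of the usual most-significant-first notation can depend on a carry chain running through every lower position, so nothing can be committed until the whole sum is known.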
Yes. This has already been demonstrated by "Teaching Arithmetic to Small Transformers" https://arxiv.org/abs/2307.03381 . I'm not sure what OP adds beyond showing that you can do it via the embedding itself rather than the tokenization.
> We first demonstrate that conventional training data is not the most effective for arithmetic learning, and simple formatting changes can significantly improve accuracy. This leads to sharp phase transitions as a function of training data scale, which, in some cases, can be explained through connections to low-rank matrix completion. Building on prior work, we then train on chain-of-thought style data that includes intermediate step results. Even in the complete absence of pretraining, this approach significantly and simultaneously improves accuracy, sample complexity, and convergence speed. We also study the interplay between arithmetic and text data during training and examine the effects of few-shot prompting, pretraining, and model scale. Additionally, we discuss length generalization challenges.
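For concreteness, a rough sketch of the kind of formatting change studied there: keep the prompt in normal order but write the target sum reversed, so every target digit is computable from context already generated. The exact templates are the paper's; `make_example` and the `a+b=` format here are just my own illustration.

```python
import random

def make_example(max_digits: int = 3) -> str:
    """Emit one training line with the answer written least significant digit first."""
    a = random.randrange(10 ** max_digits)
    b = random.randrange(10 ** max_digits)
    reversed_sum = str(a + b)[::-1]   # e.g. 128+367 -> 495 -> "594"
    return f"{a}+{b}={reversed_sum}"

random.seed(0)
for _ in range(3):
    print(make_example())   # lines of the form a+b=reversed(a+b)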
This is an interesting idea but probably hard to verify.
As a tangent: positional systems were originally invented with the least significant digit first, I believe.
The Babylonian sexagesimal system was like that, as was the Arabic one (where the first digit is on the right). The most-significant-digit-first convention arose when right-to-left numbers were adopted into left-to-right scripts without being reversed in writing. To this day we read the more common smaller numbers least significant digit first, to varying degrees:
16 = six-teen (English), sechzehn (German: "six-ten")
98 = achtundneunzig (German: "eight-and-ninety"), achtennegentig (Dutch: "eight-and-ninety"), ثمانية وتسعون (Arabic: "eight and ninety")