I just wonder: if numbers were written right to left, LLMs would probably be much better at arithmetic. You can 'predict' the least significant digit by reusing the digits already written in the computation, but to generate the most significant ones you generally need to do the entire computation in one go.
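A minimal sketch of why least-significant-first is autoregression-friendly: in grade-school addition, each output digit depends only on the operand digits at that position plus one carry from the previous step, so it can be emitted one digit at a time (`add_lsd_first` below is my own illustrative name):

```python
def add_lsd_first(a: str, b: str) -> str:
    """Add two decimal strings, emitting digits least significant first."""
    a, b = a[::-1], b[::-1]          # index 0 = least significant digit
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        da = int(a[i]) if i < len(a) else 0
        db = int(b[i]) if i < len(b) else 0
        s = da + db + carry
        out.append(str(s % 10))      # this digit is final the moment it's written
        carry = s // 10              # the only state carried to the next step
    if carry:
        out.append(str(carry))
    return "".join(out)

print(add_lsd_first("128", "367"))   # "594", i.e. 495 written LSD-first
```

By contrast, the leading digit of the usual most-significant-first notation can depend on a carry chain running through every lower position, so nothing can be committed until the whole sum is known.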
Yes. This has already been demonstrated by "Teaching Arithmetic to Small Transformers" https://arxiv.org/abs/2307.03381 . I'm not sure what OP adds beyond showing that you can do it via the embedding itself rather than the tokenization.
> We first demonstrate that conventional training data is not the most effective for arithmetic learning, and simple formatting changes can significantly improve accuracy. This leads to sharp phase transitions as a function of training data scale, which, in some cases, can be explained through connections to low-rank matrix completion. Building on prior work, we then train on chain-of-thought style data that includes intermediate step results. Even in the complete absence of pretraining, this approach significantly and simultaneously improves accuracy, sample complexity, and convergence speed. We also study the interplay between arithmetic and text data during training and examine the effects of few-shot prompting, pretraining, and model scale. Additionally, we discuss length generalization challenges.
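For concreteness, a rough sketch of the kind of formatting change studied there: keep the prompt in normal order but write the target sum reversed, so every target digit is computable from context already generated. The exact templates are the paper's; `make_example` and the `a+b=` format here are just my own illustration.

```python
import random

def make_example(max_digits: int = 3) -> str:
    """Emit one training line with the answer written least significant digit first."""
    a = random.randrange(10 ** max_digits)
    b = random.randrange(10 ** max_digits)
    reversed_sum = str(a + b)[::-1]   # e.g. 128+367 -> 495 -> "594"
    return f"{a}+{b}={reversed_sum}"

random.seed(0)
for _ in range(3):
    print(make_example())   # lines of the form a+b=reversed(a+b)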
This is an interesting idea but probably hard to verify.
As a tangent: positional systems were originally invented with the least significant digit first, I believe.
The Babylonian sexagesimal system was like that, as was the Arabic one (where the first digit is on the right). The most-significant-digit-first convention arose when right-to-left numbers were adopted into left-to-right scripts without being reversed in writing. To this day we read the more common smaller numbers least significant digit first, to varying degrees:
16 = six-teen (English), sechzehn (German: "six-ten")
98 = achtundneunzig (German: "eight-and-ninety"), achtennegentig (Dutch: "eight-and-ninety"), ثمانية وتسعون (Arabic: "eight and ninety")