I compared GPT-4 Turbo against my previous tests on GPT-4, and the results are quite interesting. GPT-4 Turbo is better at arithmetic: it makes fewer errors when multiplying four-digit numbers, and significantly fewer with five-digit numbers. Its error rate on five-digit numbers is still high, but much lower than GPT-4's rate on four-digit numbers. Multiplying floats of the form XX.MMMM and YYY.ZZZZ produces errors in the 5th digit, which is an order of magnitude better than GPT-4.
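In case anyone wants to rerun this kind of comparison, the harness is roughly the following sketch (not my exact script; `ask_model` is a hypothetical stand-in for whatever API call you make to each model):

```python
import random

def error_rate(ask_model, digits: int, trials: int = 100) -> float:
    """Fraction of n-digit multiplications a model gets wrong.
    `ask_model` is a hypothetical stand-in: any function that sends
    a prompt to GPT-4 / GPT-4 Turbo and returns the reply text."""
    errors = 0
    for _ in range(trials):
        a = random.randint(10 ** (digits - 1), 10 ** digits - 1)
        b = random.randint(10 ** (digits - 1), 10 ** digits - 1)
        reply = ask_model(f"What is {a} * {b}? Reply with only the number.")
        try:
            wrong = int(reply.strip().replace(",", "")) != a * b
        except ValueError:
            wrong = True  # unparseable reply counts as an error
        errors += wrong
    return errors / trials

# e.g. compare error_rate(gpt4, 5) vs error_rate(gpt4_turbo, 5)
```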
But that's exactly the point I was getting at: the fact that it merely "improves" with slightly larger numbers yet still fails at really big ones shows that it isn't "reasoning" about math in a logical way.
For example, once you teach a grade schooler the basic process for addition, they can add two 30-digit numbers correctly fairly easily (whether they want to do it or not is a different story). The fact that LLMs still make errors on larger numbers suggests that they're not really "learning" the rules of arithmetic.
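To be concrete, the "basic process" is just digit-by-digit addition with a carry; a few lines of Python capture the entire rule set a grade schooler learns, and nothing about it changes between 3 digits and 30:

```python
def schoolbook_add(a: str, b: str) -> str:
    """Add two non-negative decimal numbers digit by digit,
    right to left, carrying exactly as taught in grade school."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry, digits = 0, []
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

x = "123456789012345678901234567890"
y = "987654321098765432109876543210"
assert schoolbook_add(x, y) == str(int(x) + int(y))  # works at any length
```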
Of course it isn't. It approximates. I bet you'd get better results by increasing the depth of the network, since each additional layer yields a more accurate approximation. I have an idea for achieving this without significantly increasing the number of layers, and I'm currently working on it as a side project. However, the idea might prove to be useless after all, as it requires training the model from scratch with a lot of synthetic data mixed in. Experiments on small models look promising, but they're too small to be conclusive, and I can't afford to train a larger model from scratch for a side project.
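If anyone wants to poke at the depth claim on a toy scale, here's roughly the kind of throwaway experiment I mean (a sketch with made-up hyperparameters, not my actual project): train MLPs of different depths to approximate multiplication and compare their test error.

```python
import torch
import torch.nn as nn

def make_mlp(depth: int, width: int = 64) -> nn.Sequential:
    """MLP mapping (a, b) -> a*b; the hypothesis is that deeper
    nets approximate the product surface more accurately."""
    layers, d_in = [], 2
    for _ in range(depth):
        layers += [nn.Linear(d_in, width), nn.ReLU()]
        d_in = width
    layers.append(nn.Linear(d_in, 1))
    return nn.Sequential(*layers)

def train_and_eval(depth: int, steps: int = 2000) -> float:
    torch.manual_seed(0)
    model = make_mlp(depth)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        x = torch.rand(256, 2)                    # synthetic data: a, b in [0, 1)
        y = (x[:, 0] * x[:, 1]).unsqueeze(1)
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():                         # held-out test error
        x = torch.rand(4096, 2)
        y = (x[:, 0] * x[:, 1]).unsqueeze(1)
        return nn.functional.mse_loss(model(x), y).item()

for depth in (1, 2, 4):
    print(depth, train_and_eval(depth))
```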
Isn't it actually just impossible for it to do this well on arbitrarily large inputs, even from a computational complexity point of view, if it doesn't know it's allowed to do step-by-step multiplication (addition is maybe OK)? A transformer spends a fixed amount of compute per output token, so without writing out intermediate steps there's always an input size beyond its reach. I'm not sure this is a criticism of its ability to reason. It's similar to asking someone to do addition in 5 seconds with no paper: of course at some point they won't be able to do it for a large enough number. BTW, I strongly disagree that the average grade schooler can add two 30-digit numbers without making a mistake, even with paper.
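To put the complexity point concretely: schoolbook long multiplication of two n-digit numbers takes on the order of n² single-digit steps, which is exactly the work being skipped when you demand a one-shot answer. A quick sketch that counts those steps:

```python
def long_multiply(a: str, b: str) -> tuple[str, int]:
    """Schoolbook long multiplication of decimal strings; also counts
    single-digit multiplies to show the work grows quadratically."""
    steps, total = 0, 0
    for i, db in enumerate(reversed(b)):       # one partial row per digit of b
        row, carry = 0, 0
        for j, da in enumerate(reversed(a)):
            prod = int(da) * int(db) + carry
            row += (prod % 10) * 10 ** j
            carry = prod // 10
            steps += 1
        total += (row + carry * 10 ** len(a)) * 10 ** i
    return str(total), steps

for n in (4, 5, 30):
    a = "9" * n
    product, steps = long_multiply(a, a)
    assert product == str(int(a) ** 2)
    print(f"{n}-digit inputs: {steps} single-digit multiplies")
    # 4 -> 16 steps, 5 -> 25 steps, 30 -> 900 steps
```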
It isn't fair to expect an LLM to do arithmetic itself. It should be able to delegate to various specialized sub-processors; I don't think we humans really do anything different.
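Something in this spirit, as a toy sketch (the regex router here is obviously a stand-in for a real function-calling layer deciding when to hand off):

```python
import re

def exact_arithmetic(expression: str) -> str:
    """Specialized sub-processor: evaluate +, -, * on integers exactly.
    Toy version: strict left-to-right, no operator precedence."""
    tokens = re.findall(r"\d+|[+\-*]", expression)
    result = int(tokens[0])
    for op, num in zip(tokens[1::2], tokens[2::2]):
        n = int(num)
        result = result + n if op == "+" else result - n if op == "-" else result * n
    return str(result)

def answer(question: str) -> str:
    """Toy router: hand arithmetic to the sub-processor instead of
    asking the language model to 'intuit' the digits."""
    match = re.search(r"[\d+\-* ]{3,}", question)
    if match and any(op in match.group() for op in "+-*"):
        return exact_arithmetic(match.group())
    return "(defer to the LLM for everything else)"

print(answer("What is 48315 * 92734?"))  # 4480443210, exact every time
```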