Isn't it more that they don't have ready access to the much-more-fundamental concept of decimal numbers?
My understanding was that numbers get tokenized into chunks, and the model learns associations between those chunks, the same as if it were breaking apart English words.
So "2+2=4" isn't being treated that differently from "all's well that ends well." This might lead to a kind of Benny's Rules [0] situation, where sufficient brute force can make a collection of overfitted, non-arithmetic rules appear to work.
[0] https://blog.mathed.net/2011/07/rysk-erlwangers-bennys-conce...