Hi, the author here. Thanks for posting this! The Python behind the scenes series covers how different aspects of Python are implemented in the CPython interpreter. This time we'll ask and answer the following question: how is Python able to support arbitrarily large integers? More specifically, we'll learn:
* How CPython represents arbitrary-precision integers.
* How CPython implements arbitrary-precision arithmetic, such as addition and multiplication.
* How CPython's implementation of arbitrary-precision arithmetic compares to other implementations.
MicroPython uses much more efficient tagged integers: unaligned (low-bit-tagged) pointer values represent the small integers directly, and once a value goes out of range it can be replaced with an aligned pointer to a heap-allocated bigint. (A sketch of the idea is below.)
I think the only reason not to have that is maintaining CPython compatibility, but it would have been nice if they could have moved to it for the 3.0 break.
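For readers unfamiliar with the trick: since heap objects are aligned, the low bit of a word is always clear in a real pointer, which leaves it free to act as a tag. A minimal sketch of that idea (hypothetical names, not MicroPython's actual code):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical value representation: heap objects are at least 2-byte
 * aligned, so a real pointer always has its low bit clear.  A word with
 * the low bit set is a small integer stored directly in the word. */
typedef uintptr_t value;

static inline bool     is_small_int(value v)      { return (v & 1) != 0; }
static inline value    make_small_int(intptr_t i) { return ((uintptr_t)i << 1) | 1; }
static inline intptr_t small_int_value(value v)   { return (intptr_t)v >> 1; } /* assumes arithmetic shift */

/* Everything else is an ordinary aligned pointer, e.g. to a heap-allocated
 * bignum once the value no longer fits in the tagged range. */
static inline bool is_heap_object(value v) { return !is_small_int(v); }
```

The payoff is that small integers never touch the heap at all: arithmetic on them is a handful of machine instructions plus an overflow check, and only overflow forces allocation of a real bignum object.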
IMO, that excuse doesn't hold water on 64-bit machines. On 64-bit machines, not using tagged pointers (which let the CPU do fast arithmetic on word-sized integers) and instead using the slow, branchy sign-magnitude representation that CPython uses for integers is unconscionable, in terms of both performance and memory usage.
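For contrast, here is a simplified sketch in the spirit of CPython's PyLongObject (see longintrepr.h); this is not the real definition, but it shows the shape of the representation: every int is a separate heap object, and even small additions go through a digit-by-digit carry loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified sign-magnitude bignum, modeled loosely on CPython's layout. */
typedef uint32_t digit;                         /* 30 significant bits per digit */
#define DIGIT_BITS 30
#define DIGIT_MASK (((digit)1 << DIGIT_BITS) - 1)

typedef struct {
    ptrdiff_t size;      /* sign of `size` is the sign of the number;   */
                         /* abs(size) is the number of digits           */
    digit digits[];      /* magnitude, least-significant digit first    */
} bignum;

/* Even 1 + 1 runs a loop like this (assumes na >= nb; `out` must have
 * room for na + 1 digits). */
static void add_magnitudes(const digit *a, size_t na,
                           const digit *b, size_t nb,
                           digit *out)
{
    digit carry = 0;
    size_t i = 0;
    for (; i < nb; i++) {
        digit s = a[i] + b[i] + carry;
        out[i] = s & DIGIT_MASK;
        carry  = s >> DIGIT_BITS;
    }
    for (; i < na; i++) {
        digit s = a[i] + carry;
        out[i] = s & DIGIT_MASK;
        carry  = s >> DIGIT_BITS;
    }
    out[i] = carry;
}
```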
This is a post from 16 years ago, and maybe someone will come along with a solution that resolves his concerns (granted, I feel like it would be hard to come up with a great solution in a language like C; a ubiquitous unwrapping mechanism plus better type checking would make this doable, though).
It might still be possible to change, by changing the macros that unwrap integers to support tagged integers and then waiting a decade or two before actually starting to use tagged integers. Things that hadn't been recompiled in a decade or two but were for some reason running on the latest interpreter would break. Presumably the number of breakages would be negligible.
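Concretely, something along these lines (hypothetical names, not CPython's real macros): if extension code only ever unwraps integers through a macro, the interpreter can later start handing out tagged words and recompiled extensions keep working; only code that dereferences the pointer directly breaks.

```c
#include <stdint.h>

typedef uintptr_t value;
typedef struct { intptr_t v; } heap_int;   /* stand-in for a heap-allocated int */

#define IS_TAGGED(x)   (((value)(x)) & 1)
#define UNTAG(x)       ((intptr_t)(x) >> 1)
#define HEAP_VALUE(x)  (((heap_int *)(x))->v)

/* Today this could compile down to just HEAP_VALUE; a future interpreter
 * could flip on the tagged path without changing the macro's users. */
#define VALUE_AS_NATIVE(x)  (IS_TAGGED(x) ? UNTAG(x) : HEAP_VALUE(x))
```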
They bring up GMP, which I'm surprised never to see discussed here. It underpins so many tech stacks (GCC relies on it) and is basically always involved in the cryptography piece of those stacks, given how many schemes rely on huge numbers.
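For anyone who hasn't touched it directly, the C interface is pleasantly simple; a minimal example using the mpz_* integer functions (link with -lgmp):

```c
#include <stdio.h>
#include <gmp.h>

int main(void)
{
    mpz_t base, result;

    mpz_init_set_str(base, "123456789012345678901234567890", 10);
    mpz_init(result);

    mpz_pow_ui(result, base, 10);     /* result = base^10, far beyond 64 bits */
    gmp_printf("%Zd\n", result);

    mpz_clears(base, result, NULL);
    return 0;
}
```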
Interesting to note the text in the "comparisons to other implementations" section:
"How fast is CPython's implementation of bignums compared to other implementations? ... One important thing to know about these results is that the fastest programs use bignums provided by the GMP library and not the bignums provided by the language. If we exclude the programs that use GMP bindings, we get the following results... "
I assumed readers would follow the link to see the original results. GMP is undoubtedly the state of the art in arbitrary-precision arithmetic. The point of this table was to compare the bignum implementations provided by different languages, so programs that use GMP bindings are excluded. It would probably make sense to also include the best result that uses GMP, but I think GHC is quite representative of GMP's performance.
Amazing that Wilfred Hughes and yourself took such a straightforward approach instead of complaining that a lot of the programs used GMP. (Take that as praise.)