Python behind the scenes #8: how Python integers work (tenthousandmeters.com)
163 points by rbanffy on Feb 9, 2021 | hide | past | favorite | 15 comments



Hi, the author here. Thanks for posting this! The Python behind the scenes series covers how different aspects of Python are implemented in the CPython interpreter. This time we'll ask and answer the following question: How is Python able to support arbitrarily large integers? More specifically, we'll learn:

  * How CPython represents arbitrary-precision integers.
  * How CPython implements arbitrary-precision arithmetic, such as addition and multiplication.
  * How CPython's implementation of arbitrary-precision arithmetic compares to other implementations.
I welcome your feedback and questions. Thanks!


MicroPython uses much more efficient tagged integers: unaligned pointer values represent the small integers, and once a value goes out of range it is replaced with an aligned pointer to a bigint.

I think the only reason not to do that is maintaining CPython C API compatibility, but it would have been nice if they could have moved to it as part of the 3.0 break.


Guido has been strongly opposed to the use of tagged integers:

https://mail.python.org/pipermail/python-dev/2004-July/04614...


IMO, that excuse doesn't hold water on 64-bit machines. On 64-bit machines, not using tagged pointers (i.e., not using the CPU for fast arithmetic on word-sized integers) and instead using the slow, branchy sign-magnitude representation that CPython uses for integers is unconscionable, in terms of both performance and memory usage.


This is a post from 16 years ago, and someone might yet come along with a solution that resolves his concerns. (Granted, I feel it would be hard to come up with a great solution in a language like C; a ubiquitous unwrapping mechanism plus better type checking would make this doable, though.)


> this is a post from 16 years ago

Sure, but the only opportunity to change this was along with the other breaking changes in Python 3.0, which is now 12 years old.


It might still be possible to change: first update the macros that unwrap integers so they also support tagged integers, then wait a decade or two before actually starting to use tagged integers. Extensions that hadn't been recompiled in a decade or two, but were for some reason running against the latest interpreter, would break. Presumably the number of breakages would be negligible.


This isn't portable C. You can't assume that some pointers will be invalid and free for other uses.


The runtime also controls the alignment of allocations.


You can have a fallback that always uses int objects on architectures without that alignment guarantee.


They bring up GMP, which I'm surprised never to see discussed here. It underpins so many tech stacks (GCC relies on it, for one) and is almost always involved in their cryptography pieces, given how many of them rely on huge numbers.


Interesting to note the text in the "comparisons to other implementations" section:

"How fast is CPython's implementation of bignums compared to other implementations? ... One important thing to know about these results is that the fastest programs use bignums provided by the GMP library and not the bignums provided by the language. If we exclude the programs that use GMP bindings, we get the following results... "


Amazing!

If we don't want the programs that used the GMP library, then we can simply not show those results!

Except "Some languages rely on GMP to implement built-in bignums. They are marked with an asterisk (*)." — why are they shown?

(And why isn't there a C program + GMP measurement?)


I assumed the readers would follow the link to see the original results. GMP is undoubtedly the state of the art in arbitrary-precision arithmetic. The point of this table was to compare the bignum implementations provided by different languages, so programs that use GMP bindings are excluded. It would probably make sense to include the best result that uses GMP, but I think GHC is quite representative of GMP's performance.

I'm not the first to draw such a comparison. This article goes into some more detail: http://www.wilfred.me.uk/blog/2014/10/20/the-fastest-bigint-...


Amazing that Wilfred Hughes and yourself took such a straightforward approach instead of complaining that a lot of the programs used GMP. (Take that as praise.)




