Hacker News

yes, that is true. but aot compilers never make things slower than interpretation, and they can afford more expensive optimizations

also, even mature jit compilers often only make limited improvements; jython has been stuck at near-parity with cpython's terrible performance for decades, for example, and while v8 was an enormous improvement over old spidermonkey and squirrelfish, after 15 years it's still stuck almost an order of magnitude slower than c https://benchmarksgame-team.pages.debian.net/benchmarksgame/... which is (handwaving) like maybe a factor of 2 or 3 slower than self

typically when i can get something to work using numpy it's only about a factor of 5 slower than optimized c, purely interpretively, which is competitive with v8 in many cases. luajit, by contrast, is goddam alien technology from the future

with respect to your int×int example, if an int×int specialization is actually vastly superior, for example because the operation you're applying is something like + or *, an aot compiler can also insert the guard and inline the single-instruction implementation, and it can also do extensive inlining and even specialization (though that's rare in aots and common in jits). it can insert the guards because if your monomorphic sends of + are always sending + to a rational instance or something, the performance gain from eliminating megamorphic dispatch is comparatively slight, and the performance loss from inserting a static hardcoded guess of integer math before the megamorphic dispatch is also comparatively slight, though nonzero
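to make that concrete, here's a toy sketch in c of the kind of guard-then-inline code an aot compiler could emit for a + that's usually small-integer addition (the tagged-value layout and names here are made up for illustration; generic_add stands in for the full dispatch path):

```c
#include <stdint.h>

/* hypothetical tagged value: low bit set means small integer,
   low bit clear means pointer to a heap object */
typedef uintptr_t value;

#define IS_SMALLINT(v)   ((v) & 1)
#define TO_SMALLINT(n)   ((((value)(n)) << 1) | 1)
#define FROM_SMALLINT(v) ((intptr_t)(v) >> 1)

/* stand-in for the slow path: the full, possibly megamorphic,
   dispatch on the + message */
static value generic_add(value a, value b) {
    (void)a; (void)b;
    return TO_SMALLINT(0);
}

/* what the compiler could emit for 'a + b': a cheap static guard,
   then the single-instruction integer add; otherwise fall through
   to the normal dispatch, having lost only the guard's cost */
static inline value add(value a, value b) {
    if (IS_SMALLINT(a) && IS_SMALLINT(b))
        return TO_SMALLINT(FROM_SMALLINT(a) + FROM_SMALLINT(b));
    return generic_add(a, b);
}
```

the point being that when the guess is right the guard is nearly free, and when it's wrong you just pay for one extra test-and-branch before the dispatch you'd have done anyway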

this can fall down, of course, when your arithmetic operations are polymorphic over integer and floating-point, or over different types of integers; but it often works far better than it has any right to. in most code, most arithmetic and ordered comparison is integers, most array indexing is arrays, most conditionals are on booleans (and smalltalk actually hardcodes that in its bytecode compiler). this depends somewhat on your language design, of course; python using the same operator for indexing dicts, lists, and even strings hurts it here

meanwhile, back in the stop-hitting-yourself-why-are-you-hitting-yourself department, fucking cpython is allocating its integers on the heap and motherfucking reference-counting them




There is already an AOT compiler for Python: Nuitka[0]. But I don't think it's much faster.

And then there is mypyc[1] which uses mypy's static type annotations but is only slightly faster.

And various other compilers like Numba and Cython that work with specialized dialects of Python to achieve better results, but then it's not quite Python anymore.

[0] https://nuitka.net/

[1] https://github.com/python/mypy/tree/master/mypyc


thanks, i'd forgotten about nuitka and didn't know about mypyc!


Check out:

https://shedskin.github.io/

Python to C++ translation


> fucking cpython is allocating its integers on the heap and motherfucking reference-counting them

And here I thought it was shocking to learn recently that v8 allocates doubles on the heap. (I mean, I'm not a compiler writer, I have no idea how hard it would be to avoid this, but it feels like mandatory boxed floats would hurt performance a lot)


nanboxing as used in spidermonkey (https://piotrduperas.com/posts/nan-boxing) is a possible alternative, but i think v8 works pretty hard to not use floats, and i don't think local-variable or temporary floats end up on the heap in v8 the way they do in cpython. i'm not that familiar with v8 tho (but i'm pretty sure it doesn't refcount things)
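for illustration, a toy version of the nanboxing trick in c (the tag bit here is made up, and real engines also have to canonicalize nans produced by arithmetic; this just shows the core idea that every non-double hides in the payload space of a quiet nan):

```c
#include <stdint.h>
#include <string.h>

/* a boxed value is just 64 bits: real doubles are stored verbatim,
   everything else is a quiet nan with a tag and payload */
typedef uint64_t value;

#define QNAN    UINT64_C(0x7ff8000000000000)
#define TAG_INT UINT64_C(0x0001000000000000)  /* hypothetical tag bit */

static value box_double(double d) {
    value v;
    memcpy(&v, &d, sizeof v);  /* bit-copy, no conversion */
    return v;
}

static double unbox_double(value v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

static int is_double(value v) {
    /* any bit pattern outside the quiet-nan space is a real double */
    return (v & QNAN) != QNAN;
}

static value box_int(int32_t i) {
    return QNAN | TAG_INT | (uint32_t)i;
}

static int32_t unbox_int(value v) {
    return (int32_t)(v & UINT64_C(0xffffffff));
}
```

so a float costs no allocation at all; the price is a tag check on every use and some care around genuine nans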


> i think v8 works pretty hard to not use floats

Correct, to the point where at work a colleague and I actually looked into how to force the use of floats even when we initialize objects with a small-integer number (the idea being that ensuring our objects get the correct hidden class the first time might help the JIT, and avoids wasting time on integer-to-float promotion in tight loops). Via trial and error in Node we figured out that using -0 as a number literal works, but (say) 1.0 does not.

> i don't think local-variable or temporary floats end up on the heap in v8 the way they do in cpython

This would also make sense - v8 already uses pools to re-use common temporary object shapes in general IIRC, I see no reason why it wouldn't do at least that with heap-allocated doubles too.


so then the remaining performance-critical case is where you have a big array of floats you're looping over. in firefox that works fine (one allocation per lowest-level array, not one allocation and unprefetchable pointer dereference per float), but maybe in chrome you'd want to use a typedarray?
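the layout difference, sketched in c (sum_unboxed is roughly what a float-specialized array or typedarray buys you, one allocation and sequential loads; sum_boxed is roughly what per-float boxing costs, a dependent pointer chase per element):

```c
#include <stddef.h>

/* unboxed: floats packed contiguously in a single allocation,
   so the loop does sequential, prefetchable loads */
static double sum_unboxed(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* boxed: one heap cell per float, reached through a pointer,
   adding an unprefetchable dependent load per element */
static double sum_boxed(double *const *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += *a[i];
    return s;
}
```

same arithmetic, same answer, very different cache behavior once the array is big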


As I understand it, V8 keeps track of an ElementsKind for each array (or, more precisely, for the elements of every object; arrays are not special in this sense). If an array only contains floats, then they will all be stored unboxed and inline. See here: https://source.chromium.org/chromium/chromium/src/+/main:v8/...

I assume that integers are coerced to floats in this mode, and that there's a performance cliff if you store a non-number in such an array, but in both cases I'm just guessing.

In SpiderMonkey, as you say, we store all our values as doubles, and disguise the non-float values as NaNs.


thank you for the correction!


Maybe; at that point it's basically the struct-of-arrays vs array-of-structs trade-off, except with significantly worse ergonomics and less pay-off.


I very much agree with your comment on memory allocation. Everybody is focusing on the JIT, but allocating everything on the heap, with no possibility of packing multiple values contiguously in a struct or array, will still be a problem for performance.



