Hacker News

yes, that is true. but aot compilers never make things slower than interpretation, and they can afford more expensive optimizations

also, even mature jit compilers often only make limited improvements; jython has been stuck at near-parity with cpython's terrible performance for decades, for example, and while v8 was an enormous improvement over old spidermonkey and squirrelfish, after 15 years it's still stuck almost an order of magnitude slower than c https://benchmarksgame-team.pages.debian.net/benchmarksgame/... which is (handwaving) like maybe a factor of 2 or 3 slower than self

typically when i can get something to work using numpy it's only about a factor of 5 slower than optimized c, purely interpretively, which is competitive with v8 in many cases. luajit, by contrast, is goddam alien technology from the future

with respect to your int×int example, if an int×int specialization is actually vastly superior, for example because the operation you're applying is something like + or *, an aot compiler can also insert the guard and inline the single-instruction implementation, and it can also do extensive inlining and even specialization (though that's rare in aots and common in jits). it can insert the guards because if your monomorphic sends of + are always sending + to a rational instance or something, the performance gain from eliminating megamorphic dispatch is comparatively slight, and the performance loss from inserting a static hardcoded guess of integer math before the megamorphic dispatch is also comparatively slight, though nonzero
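to make that concrete, here's a toy sketch in c of the kind of guard-then-inline code an aot compiler could emit for a + that's usually small-integer addition (the tagged-value layout and names here are made up for illustration; generic_add stands in for the full dispatch path):

```c
#include <stdint.h>

/* hypothetical tagged value: low bit set means small integer,
   low bit clear means pointer to a heap object */
typedef uintptr_t value;

#define IS_SMALLINT(v)   ((v) & 1)
#define TO_SMALLINT(n)   ((((value)(n)) << 1) | 1)
#define FROM_SMALLINT(v) ((intptr_t)(v) >> 1)

/* stand-in for the slow path: the full, possibly megamorphic,
   dispatch on the + message */
static value generic_add(value a, value b) {
    (void)a; (void)b;
    return TO_SMALLINT(0);
}

/* what the compiler could emit for 'a + b': a cheap static guard,
   then the single-instruction integer add; otherwise fall through
   to the normal dispatch, having lost only the guard's cost */
static inline value add(value a, value b) {
    if (IS_SMALLINT(a) && IS_SMALLINT(b))
        return TO_SMALLINT(FROM_SMALLINT(a) + FROM_SMALLINT(b));
    return generic_add(a, b);
}
```

the point being that when the guess is right the guard is nearly free, and when it's wrong you just pay for one extra test-and-branch before the dispatch you'd have done anyway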

this can fall down, of course, when your arithmetic operations are polymorphic over integer and floating-point, or over different types of integers; but it often works far better than it has any right to. in most code, most arithmetic and ordered comparison is integers, most array indexing is arrays, most conditionals are on booleans (and smalltalk actually hardcodes that in its bytecode compiler). this depends somewhat on your language design, of course; python using the same operator for indexing dicts, lists, and even strings hurts it here

meanwhile, back in the stop-hitting-yourself-why-are-you-hitting-yourself department, fucking cpython is allocating its integers on the heap and motherfucking reference-counting them




There is already an AOT compiler for Python: Nuitka[0]. But I don't think it's much faster.

And then there is mypyc[1] which uses mypy's static type annotations but is only slightly faster.

And various other compilers like Numba and Cython that work with specialized dialects of Python to achieve better results, but then it's not quite Python anymore.

[0] https://nuitka.net/

[1] https://github.com/python/mypy/tree/master/mypyc


thanks, i'd forgotten about nuitka and didn't know about mypyc!


Check out:

https://shedskin.github.io/

Python to C++ translation


> fucking cpython is allocating its integers on the heap and motherfucking reference-counting them

And here I thought it was shocking to learn recently that v8 allocates doubles on the heap. (I mean, I'm not a compiler writer, I have no idea how hard it would be to avoid this, but it feels like mandatory boxed floats would hurt performance a lot)


nanboxing as used in spidermonkey (https://piotrduperas.com/posts/nan-boxing) is a possible alternative, but i think v8 works pretty hard to not use floats, and i don't think local-variable or temporary floats end up on the heap in v8 the way they do in cpython. i'm not that familiar with v8 tho (but i'm pretty sure it doesn't refcount things)
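for illustration, a toy version of the nanboxing trick in c (the tag bit here is made up, and real engines also have to canonicalize nans produced by arithmetic; this just shows the core idea that every non-double hides in the payload space of a quiet nan):

```c
#include <stdint.h>
#include <string.h>

/* a boxed value is just 64 bits: real doubles are stored verbatim,
   everything else is a quiet nan with a tag and payload */
typedef uint64_t value;

#define QNAN    UINT64_C(0x7ff8000000000000)
#define TAG_INT UINT64_C(0x0001000000000000)  /* hypothetical tag bit */

static value box_double(double d) {
    value v;
    memcpy(&v, &d, sizeof v);  /* bit-copy, no conversion */
    return v;
}

static double unbox_double(value v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

static int is_double(value v) {
    /* any bit pattern outside the quiet-nan space is a real double */
    return (v & QNAN) != QNAN;
}

static value box_int(int32_t i) {
    return QNAN | TAG_INT | (uint32_t)i;
}

static int32_t unbox_int(value v) {
    return (int32_t)(v & UINT64_C(0xffffffff));
}
```

so a float costs no allocation at all; the price is a tag check on every use and some care around genuine nans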


> i think v8 works pretty hard to not use floats

Correct, to the point where at work a colleague and I actually looked into how to force the use of floats even when we initialize objects with a small-integer number (the idea being that ensuring our objects get the correct hidden class the first time might help the JIT, and avoids wasting time on integer-to-float promotion in tight loops). Via trial and error in Node we figured out that using -0 as a number literal works, but (say) 1.0 does not.

> i don't think local-variable or temporary floats end up on the heap in v8 the way they do in cpython

This would also make sense - v8 already uses pools to re-use common temporary object shapes in general IIRC, I see no reason why it wouldn't do at least that with heap-allocated doubles too.


so then the remaining performance-critical case is where you have a big array of floats you're looping over. in firefox that works fine (one allocation per lowest-level array, not one allocation and unprefetchable pointer dereference per float), but maybe in chrome you'd want to use a typedarray?
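the layout difference, sketched in c (sum_unboxed is roughly what a float-specialized array or typedarray buys you, one allocation and sequential loads; sum_boxed is roughly what per-float boxing costs, a dependent pointer chase per element):

```c
#include <stddef.h>

/* unboxed: floats packed contiguously in a single allocation,
   so the loop does sequential, prefetchable loads */
static double sum_unboxed(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* boxed: one heap cell per float, reached through a pointer,
   adding an unprefetchable dependent load per element */
static double sum_boxed(double *const *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += *a[i];
    return s;
}
```

same arithmetic, same answer, very different cache behavior once the array is big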


As I understand it, V8 keeps track of an ElementsKind for each array (or, more precisely, for the elements of every object; arrays are not special in this sense). If an array only contains floats, then they will all be stored unboxed and inline. See here: https://source.chromium.org/chromium/chromium/src/+/main:v8/...

I assume that integers are coerced to floats in this mode, and that there's a performance cliff if you store a non-number in such an array, but in both cases I'm just guessing.

In SpiderMonkey, as you say, we store all our values as doubles, and disguise the non-float values as NaNs.


thank you for the correction!


Maybe; at that point it's basically the struct-of-arrays vs array-of-structs trade-off, except with significantly worse ergonomics and less pay-off.


I very much agree with your comment on memory allocation. Everybody is focusing on the JIT, but allocating everything on the heap, with no possibility of packing multiple values contiguously in a struct or array, will still be a problem for performance.



