Oh, very interesting! This is somewhat similar to a hack project I did recently, but more dynamic -- going from bytecode and doing the assembly at runtime amps things up a bit! Mine works from a Python AST and is much more static, but it also supports quite a lot more Python syntax. http://benhoyt.com/writings/pyast64/
Indeed, I had already linked to your really cool project at the end. Perhaps I should have made it stand out more. The AST approach is more stable, as the Python bytecode can change between any two releases, while the AST changes at a slower, evolutionary pace.
I tried writing a bytecode optimizer a few months ago, and actually got some basic inlining, constant propagation, and other such optimizations working, but then realized it was gonna be fairly useless, since bytecode changes between Python versions and isn't necessarily even consistent across different Python implementations. So I abandoned that project and started working on an AST-based optimizer. Sadly, I actually deleted the bytecode project (which is otherwise not something I normally do) as a means of getting myself to stop sinking more time into it, but that means I'll have to mentally reconstruct some of the techniques I came up with if I want the AST-based version to go anywhere...
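To give a flavor of what the AST route looks like, here's a minimal constant-folding sketch using the standard ast module (written fresh for this comment, not recovered from that deleted project; ast.unparse needs Python 3.9+):

import ast
import operator

# Map AST operator node types to their runtime implementations.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}

class ConstantFolder(ast.NodeTransformer):
    """Fold binary operations whose operands are literal constants."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first, so 2 * 3 + 4 folds fully
        op = _OPS.get(type(node.op))
        if (op is not None
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            try:
                value = op(node.left.value, node.right.value)
            except Exception:
                return node  # e.g. 1 // 0: leave it in place to fail at runtime
            return ast.copy_location(ast.Constant(value), node)
        return node

tree = ast.fix_missing_locations(ConstantFolder().visit(ast.parse("y = 2 * 3 + 4")))
print(ast.unparse(tree))  # y = 10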
def sumitup(n):
    total = 0
    for i in range(n):
        total = total + i
    return total
It was optimized quite well, but still had loops. I know this is a lot to ask, but I would have expected it to be possible to specialize it to a loopless variant:
def sumitup(n):
    if n < 0:
        return 0
    else:
        return n*(n-1) // 2
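For what it's worth, that closed form is just Gauss's formula (the sum of 0 through n-1 is n*(n-1)/2), and the n < 0 guard is needed because range(n) is empty for negative n, so the loop returns 0 where the bare formula would give a positive number. A quick self-contained check that the two behaviors agree:

# The loop computes sum(range(n)); verify it matches the guarded
# closed form everywhere, including negative n where range(n) is empty.
for n in range(-5, 100):
    assert sum(range(n)) == (0 if n < 0 else n * (n - 1) // 2)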
I'll try to just talk about what makes Python hard to run quickly (especially as compared to less-dynamic languages like JS or Lua).
The thing I wish people understood about Python performance is that the difficulties come from Python's extremely rich object model, not from anything about its dynamic scopes or dynamic types. The problem is that nearly every operation in Python has multiple points at which the user can override the behavior, and these features are used, often very extensively. Some examples are inspecting the locals of a frame after the frame has exited, mutating functions in place, or even something as banal as overriding isinstance. These are all things that we had to support, that are used often enough that we have to support them efficiently, and that don't have analogs in less-dynamic languages like JS or Lua.
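For anyone who hasn't run into these features, all three are reachable from ordinary Python code (sys._getframe is a CPython implementation detail, and the toy names here are mine):

import sys

# 1. Inspecting the locals of a frame after the frame has exited:
def leak_frame():
    secret = 42
    return sys._getframe()  # hand the live frame object to the caller

f = leak_frame()
print(f.f_locals["secret"])  # 42, read after the function returned

# 2. Mutating a function in place by swapping its code object:
def greet():
    return "hello"

def shout():
    return "HELLO"

greet.__code__ = shout.__code__
print(greet())  # HELLO

# 3. Overriding isinstance via a metaclass __instancecheck__ hook:
class AnyMeta(type):
    def __instancecheck__(cls, obj):
        return True  # claim that everything is an instance

class Duck(metaclass=AnyMeta):
    pass

print(isinstance(42, Duck))  # True, even though 42 is an int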
Newer versions of Python have an in-band way of detecting whether an object's behavior has been overridden. Conceivably this could have been done completely internally to the VM. Defaulting to closed and requiring an explicit step to override behavior would have made optimizing Python far easier.
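To make the "closed by default" point concrete, here's a hedged sketch of the guard pattern an optimizer can use today: check that a hook still has its default value before taking a fast path (illustrative only, not how CPython actually implements its checks):

# Cache the default hook; the identity check below is the guard.
_DEFAULT_INSTANCECHECK = type.__instancecheck__

def guarded_isinstance(obj, cls):
    if type(cls).__instancecheck__ is _DEFAULT_INSTANCECHECK:
        # Nothing overrode the hook for this class, so a plain subclass
        # test suffices (ignoring __class__ assignment; this is a sketch).
        return issubclass(type(obj), cls)
    return isinstance(obj, cls)  # overridden: do the full dynamic dispatch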
A couple of years ago I was toying around with libjit in Python: wrote some bindings, tracked down (most of) the segfaults, and then got distracted by other things. I was going to dust it off to try to hook into PEP 523, but totally forgot about it until you posted that link.
What is your impression of libjit? It looks really nice. I've only tried GNU Lightning, and I made Python bindings for it (although today I would have used cffi to interface with it instead of my ctypes approach, because of the header files): https://github.com/cslarsen/lyn
I'd love to see a comparison of the various JIT libraries. I guess both give you optimizations and register allocation for free. Of course, the JITting speed is quite important as well.
It seemed pretty solid, though I didn't do anything too complicated with it.
Mostly, messing around with it taught me that I had a lot to learn about compiler construction, which is what I've been slowly working on since then.