
Sorry, I don't know the internals of the Python VM. I honestly don't know why the Erlang VM is faster than the Python VM, but it is (in single-threaded raw compute, not taking GC into account)!

Re: Graal, there is Erjang, but nobody uses it.




> I honestly don't know why erlang vm is faster than python vm.

One reason is that on platforms that support computed goto, when a BEAM file is loaded, the list of bytecodes for each function is translated into a list of machine code addresses that implement each bytecode (a.k.a. threaded code). Bytecode dispatch then skips one layer of indirection vs. CPython. CPython looks up each bytecode's machine code address in an array every time it dispatches a bytecode. (Or, if the platform doesn't support computed goto, CPython uses a regular switch statement, which hopefully gets compiled to a jump table.)
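To make the difference concrete, here is a minimal C sketch, assuming GCC/Clang's labels-as-values extension; it isn't real CPython or BEAM code, just an illustration. The first interpreter indexes a handler table on every dispatch, roughly what CPython's computed-goto build does; the second translates the bytecode into a list of handler addresses once, at load time, so each dispatch is a single indirect jump, roughly the BEAM approach described above.

    /* Minimal sketch (not real CPython or BEAM code) of the two dispatch
       styles, using GCC/Clang's labels-as-values extension. */
    #include <stdio.h>

    enum { OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };

    /* CPython-style: every dispatch indexes a handler table with the opcode. */
    static long run_table_dispatch(const unsigned char *code) {
        static void *table[] = { &&push1, &&add, &&print, &&halt };
        long stack[16], *sp = stack;
        const unsigned char *ip = code;

        goto *table[*ip++];              /* one extra table load per dispatch */
    push1:  *sp++ = 1;                   goto *table[*ip++];
    add:    sp--; sp[-1] += sp[0];       goto *table[*ip++];
    print:  printf("%ld\n", sp[-1]);     goto *table[*ip++];
    halt:   return sp > stack ? sp[-1] : 0;
    }

    /* BEAM-style direct threading: the loader rewrites the opcodes into a
       list of handler addresses once, so run-time dispatch is a bare
       indirect jump with no table lookup. */
    static long run_direct_threaded(const unsigned char *code, int len) {
        static void *table[] = { &&push1, &&add, &&print, &&halt };
        void *threaded[64];              /* fixed size just for illustration */
        for (int i = 0; i < len; i++)    /* done once, at load time */
            threaded[i] = table[code[i]];

        long stack[16], *sp = stack;
        void **ip = threaded;

        goto **ip++;
    push1:  *sp++ = 1;                   goto **ip++;
    add:    sp--; sp[-1] += sp[0];       goto **ip++;
    print:  printf("%ld\n", sp[-1]);     goto **ip++;
    halt:   return sp > stack ? sp[-1] : 0;
    }

    int main(void) {
        const unsigned char prog[] = { OP_PUSH1, OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };
        run_table_dispatch(prog);
        run_direct_threaded(prog, sizeof prog);
        return 0;
    }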


Interesting, so it's very implementation-specific. Could CPython do something similar? Or is it the Erlang language semantics that allow for this optimization?


There's nothing stopping CPython from using a threaded interpreter, though doing it in C requires a compiler that supports GCC's nonstandard labels-as-values[0]/computed goto (and relies on undefined behavior if you use goto to jump across functions).

Threaded code is a pretty standard implementation strategy for some languages (notably Forth), and my understanding is that it was even a pretty common compiler implementation strategy in the 1970s and early 1980s. It's a pretty easy optimization, particularly for a stack-based VM or a VM with a small number of registers.

The main downside is memory usage: plain Python bytecode can be memory-mapped directly from disk, shared across processes, and simply discarded by the OS under memory pressure rather than written out to swap, whereas the translated threaded code is ordinary per-process writable memory. There's also a bit of startup overhead at load time, when the translation happens.

Though I'm a bit surprised that most stack-based interpreters don't initially load classes with each function implemented as a dead-simple threaded stub: a pointer to the entry point of the regular bytecode interpreter plus a pointer into the memory-mapped buffer of on-disk bytecode. Based on performance counters, they could then JIT the hotspots into regular threaded code.
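A hypothetical sketch of that scheme: the types, threshold, and names below are made up, and "threaded" here is simple call-threading (an array of handler function pointers) to keep the example portable. A freshly loaded function only carries a pointer to its (notionally mmapped) bytecode, and is translated once a crude hotness counter trips.

    /* Hypothetical sketch of the lazy scheme described above; not taken
       from any real VM. */
    #include <stdio.h>
    #include <stdlib.h>

    enum { OP_INC, OP_PRINT, OP_HALT };

    typedef void (*Handler)(long *acc);

    typedef struct {
        const unsigned char *bytecode;  /* would point into the mmapped image */
        int                  length;
        Handler             *threaded;  /* built only once the function is hot */
        unsigned             call_count;
    } Function;

    static void op_inc(long *acc)   { (*acc)++; }
    static void op_print(long *acc) { printf("%ld\n", *acc); }

    /* Cold path: walk the on-disk bytecode directly. */
    static void interpret(const Function *fn) {
        long acc = 0;
        for (int i = 0; i < fn->length; i++) {
            switch (fn->bytecode[i]) {
            case OP_INC:   op_inc(&acc);   break;
            case OP_PRINT: op_print(&acc); break;
            case OP_HALT:  return;
            }
        }
    }

    /* Hot path: the bytecode has already been translated into handler
       pointers, so there is no decode step per instruction. */
    static void run_threaded(const Function *fn) {
        long acc = 0;
        for (int i = 0; fn->bytecode[i] != OP_HALT; i++)
            fn->threaded[i](&acc);
    }

    /* One-time translation, triggered by a crude hotness counter. */
    static void promote(Function *fn) {
        static const Handler table[] = { op_inc, op_print, NULL /* OP_HALT */ };
        fn->threaded = malloc(fn->length * sizeof *fn->threaded);
        for (int i = 0; i < fn->length; i++)
            fn->threaded[i] = table[fn->bytecode[i]];
    }

    static void call_function(Function *fn) {
        if (++fn->call_count == 3 && fn->threaded == NULL)
            promote(fn);                 /* hot enough: pay the translation cost */
        if (fn->threaded) run_threaded(fn);
        else              interpret(fn); /* cold: keep using the mmapped bytes */
    }

    int main(void) {
        const unsigned char code[] = { OP_INC, OP_INC, OP_PRINT, OP_HALT };
        Function fn = { code, sizeof code, NULL, 0 };
        for (int i = 0; i < 5; i++)
            call_function(&fn);
        free(fn.threaded);
        return 0;
    }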

[0] https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html


Part of the issue is that, because Python is so dynamic, a lot of lookups have to be done to do something as simple as a function call. Tight loops of arithmetic (for example) in Python are actually fairly fast, but if you call a function it has to do a load of work to figure out what code you actually want to run. It might have to:

Look up which object the method belongs to, look up the namespace, check if there's an implementation of __getattr__ or equivalent, maybe call that, get the result and then actually call the function. More complicated cases get worse.
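As a rough illustration, here is a toy model in C (nothing like CPython's actual lookup code, with every name made up) of the kind of lookup chain a dynamic method call walks before any user code runs, next to a direct call whose target is known at compile time.

    /* Toy model (not CPython's real lookup code; every name is made up)
       of the checks a dynamic method call can involve before the call. */
    #include <stdio.h>
    #include <string.h>

    typedef long (*Method)(long);

    typedef struct {            /* stand-in for a __dict__ entry */
        const char *name;
        Method      fn;
    } Entry;

    typedef struct {
        Entry  *dict;
        int     len;
        Method (*getattr)(const char *name);  /* stand-in for __getattr__ */
    } Class;

    typedef struct {
        Class *cls;
        Entry *dict;            /* instance __dict__ */
        int    len;
    } Object;

    static Method lookup(Entry *dict, int len, const char *name) {
        for (int i = 0; i < len; i++)
            if (strcmp(dict[i].name, name) == 0)
                return dict[i].fn;
        return NULL;
    }

    /* Every obj.meth(arg) pays for something like this before the call. */
    static long call_method(Object *obj, const char *name, long arg) {
        Method m = lookup(obj->dict, obj->len, name);             /* instance dict */
        if (!m) m = lookup(obj->cls->dict, obj->cls->len, name);  /* class dict    */
        if (!m && obj->cls->getattr) m = obj->cls->getattr(name); /* __getattr__   */
        return m ? m(arg) : -1;  /* real Python would raise AttributeError */
    }

    static long twice(long x) { return 2 * x; }

    int main(void) {
        Entry  cls_dict[] = { { "twice", twice } };
        Class  cls = { cls_dict, 1, NULL };
        Object obj = { &cls, NULL, 0 };

        printf("%ld\n", call_method(&obj, "twice", 21));  /* dynamic lookup: 42 */
        printf("%ld\n", twice(21));                       /* direct call:   42 */
        return 0;
    }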

Erlang/Elixir are much simpler and get their flexibility from functional programming. There's much less indirection, so there's less work for the interpreter to do.



