> I honestly don't know why erlang vm is faster than python vm.
One reason is that on platforms that support computed goto, when a BEAM file is loaded, the list of bytecodes for each function is translated into a list of addresses of the machine-code handlers that implement each bytecode (a.k.a. threaded code). Bytecode dispatch then skips one layer of indirection vs. CPython: CPython looks up each bytecode's handler address in an array every time it dispatches a bytecode. (Or, if the platform doesn't support computed goto, CPython falls back to a regular switch statement, which hopefully gets compiled to a jump table.)
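Roughly what that looks like in C, as a minimal sketch (not BEAM's actual loader): the opcode names and toy program are invented, and the `&&label` / `goto *addr` syntax is GCC's labels-as-values extension. The load-time loop replaces each bytecode with the address of its handler, so dispatch is a single indirect jump with no per-instruction table lookup.

```c
/* Minimal sketch of direct-threaded dispatch (BEAM-style), using GCC's
 * labels-as-values extension. Opcodes and program are invented; the point
 * is that the load-time pass replaces each opcode with the address of its
 * handler, so dispatch is one indirect jump with no table lookup. */
#include <stdio.h>

enum { OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };

int main(void) {
    /* Handler addresses, taken with the &&label extension. */
    static void *handlers[] = { &&op_push1, &&op_add, &&op_print, &&op_halt };

    /* "Bytecode" as it might sit on disk: computes 1 + 1 and prints it. */
    unsigned char bytecode[] = { OP_PUSH1, OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };

    /* Load-time translation: bytecodes -> handler addresses (threaded code). */
    void *threaded[sizeof bytecode];
    for (size_t i = 0; i < sizeof bytecode; i++)
        threaded[i] = handlers[bytecode[i]];

    int stack[16], *sp = stack;
    void **ip = threaded;

    /* Dispatch: one indirect jump per instruction, no per-dispatch lookup. */
    goto **ip++;

op_push1: *sp++ = 1;              goto **ip++;
op_add:   sp--; sp[-1] += sp[0];  goto **ip++;
op_print: printf("%d\n", sp[-1]); goto **ip++;
op_halt:  return 0;
}
```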
Interesting, so it's very implementation-specific. Could CPython do something similar? Or is it the Erlang language semantics that allow for this optimization?
There's nothing stopping CPython from using a threaded interpreter, though doing it in C requires a compiler that supports GCC's nonstandard labels-as-values[0]/computed goto, plus undefined behavior (using goto to jump across functions).
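For contrast, here is roughly the shape of table-indexed computed-goto dispatch, which is what CPython's ceval loop already does when computed gotos are available; the opcodes and program are again invented. The difference from the threaded sketch above is the `dispatch_table[*ip++]` lookup on every single dispatch, which a load-time translation pass would eliminate.

```c
/* Sketch of table-indexed computed-goto dispatch, roughly the shape of
 * CPython's ceval loop with computed gotos enabled. The bytecode is
 * interpreted as-is, with a table lookup on every dispatch. */
#include <stdio.h>

enum { OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };

#define DISPATCH() goto *dispatch_table[*ip++]

int main(void) {
    static void *dispatch_table[] = { &&op_push1, &&op_add, &&op_print, &&op_halt };

    /* No load-time translation step: raw bytecode is the program. */
    unsigned char bytecode[] = { OP_PUSH1, OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };
    unsigned char *ip = bytecode;

    int stack[16], *sp = stack;
    DISPATCH();

op_push1: *sp++ = 1;              DISPATCH();
op_add:   sp--; sp[-1] += sp[0];  DISPATCH();
op_print: printf("%d\n", sp[-1]); DISPATCH();
op_halt:  return 0;
}
```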
Threaded code is a pretty standard implementation strategy for some languages (notably Forth), and my understanding is that it was even a fairly common compiler implementation strategy in the 1970s/early 1980s. It's an easy optimization, particularly for a stack-based VM or a VM with a small number of registers.
The main downside is memory usage: plain bytecode can be memory-mapped directly from disk, shared across processes, and discarded by the OS under memory pressure rather than written out to swap, whereas the translated threaded code is per-process writable memory. There's obviously also a bit of startup overhead doing the translation at load time.
Though I'm a bit surprised that most stack-based interpreters don't initially load each function as a dead-simple threaded stub: a pointer to the start of a regular bytecode interpreter plus a pointer into a memory-mapped buffer of the on-disk bytecode. Based on performance counters, they could then JIT the hotspots into regular threaded code.
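A hypothetical sketch of that scheme follows; all names here (Function, interp_bytecode, translate_to_threaded, HOT_THRESHOLD) are invented for illustration, the interpreter and translator are just print stubs, and a plain call counter stands in for real performance counters.

```c
/* Hypothetical lazy scheme: every function starts out as "run the plain
 * bytecode interpreter over this (notionally mmapped) bytecode", and only
 * functions that get hot are translated into threaded code. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    const unsigned char *bytecode;   /* would point into the mmapped file */
    size_t               len;
    void               **threaded;   /* NULL until the function gets hot */
    unsigned             call_count;
} Function;

enum { HOT_THRESHOLD = 3 };          /* unrealistically low, for the demo */

/* Stand-ins for a real table-lookup interpreter and a load-time translator
 * like the ones sketched earlier in the thread. */
static void interp_bytecode(const unsigned char *code, size_t len) {
    (void)code;
    printf("interpreting %zu bytes of shared bytecode\n", len);
}
static void interp_threaded(void **code) {
    (void)code;
    printf("running threaded code\n");
}
static void **translate_to_threaded(const unsigned char *code, size_t len) {
    (void)code;
    printf("translating %zu bytes to threaded code\n", len);
    return malloc(len * sizeof(void *));   /* private, non-shared copy */
}

static void call_function(Function *f) {
    if (f->threaded) {                       /* hot path: already threaded */
        interp_threaded(f->threaded);
        return;
    }
    if (++f->call_count >= HOT_THRESHOLD)    /* counter stands in for perf counters */
        f->threaded = translate_to_threaded(f->bytecode, f->len);
    interp_bytecode(f->bytecode, f->len);    /* cold path: shared bytecode */
}

int main(void) {
    static const unsigned char code[] = { 0, 1, 2, 3 };
    Function f = { code, sizeof code, NULL, 0 };
    for (int i = 0; i < 5; i++)              /* gets hot partway through */
        call_function(&f);
    free(f.threaded);
    return 0;
}
```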