Building a baseline JIT for Lua automatically (2023) (sillycross.github.io)
133 points by lawrencechen 9 months ago | 29 comments



FYI - Mike Pall is back working on LuaJIT.

And v3.0 is underway.

https://github.com/LuaJIT/LuaJIT/issues/1092


This is the best news of the day for me.


Mike Pall is doing pretty amazing work. Lots of respect.


It was a very interesting read, in the context of Python getting a copy-and-patch JIT compiler in the upcoming 3.13 release [1], to better understand the approach.

[1] https://tonybaloney.github.io/posts/python-gets-a-jit.html


> It was a very interesting read, in the context of Python getting a copy-and-patch JIT compiler in the upcoming 3.13 release

Indeed! In fact, someone mentioned this article in the thread about the Python JIT from a few days ago:

https://news.ycombinator.com/item?id=38924826


I missed that. Thanks for the link!


The last (interpreter-only) version mentioned that neither GC nor modules were implemented. Did that change?

The JIT work is exciting, but even more exciting would be a faster, fully featured interpreter for platforms with runtime code-generation constraints (e.g. iOS), for integration into engines like Love.


There is already Luau if you need a sandbox. Neither Lua nor LuaJIT are sandboxes. There is also my libriscv project if you need a low-latency sandbox without a JIT.

See: http://lua-users.org/lists/lua-l/2011-02/msg01582.html

I'm not sure what you mean by code-generation constraints, though.


I haven't mentioned sandboxes and don't need them. As an example, Love integrates LuaJIT, but the JIT is disabled on iOS platforms. As the LuaJIT documentation mentions:

> Note: the JIT compiler is disabled for iOS, because regular iOS Apps are not allowed to generate code at runtime. You'll only get the performance of the LuaJIT interpreter on iOS. This is still faster than plain Lua, but much slower than the JIT compiler. Please complain to Apple, not me. Or use Android. :-p

So to return to my original comment, the improvement that I'm seeing here is a faster _interpreter_, which is something advertised on the luajit-remake repo.


Looks like LuaJIT is still going to be faster, because Deegen requires runtime code generation, and thus executable + writable pages, which the iOS platform does not allow.


Ah yes, it is indeed going to be faster.


> Neither Lua nor LuaJIT are sandboxes.

Maybe we have different definitions of “sandbox”, but I thought the Lua interpreter was one? That is, isn’t it safe (or can’t it be made safe) to embed the interpreter within an application and use it to run untrusted Lua code?


http://lua-users.org/wiki/SandBoxes

There is a lot of information there, but it doesn't seem to handle resource exhaustion or execution time limits, or give any guarantees. It does indicate that it's possible to use Lua as a sandbox, and it has a decent example of the most restrictive setup. But I would, for example, compare it with Luau's SECURITY.md.

From https://github.com/luau-lang/luau/blob/master/SECURITY.md:

> Luau provides a safe sandbox that scripts can not escape from, short of vulnerabilities in custom C functions exposed by the host. This includes the virtual machine and builtin libraries. Notably this currently does not include the work-in-progress native code generation facilities.

> Any source code can not result in memory safety errors or crashes during its compilation or execution. Violations of memory safety are considered vulnerabilities.

> Note that Luau does not provide termination guarantees - some code may exhaust CPU or RAM resources on the system during compilation or execution.

So even Luau will have trouble with untrusted code, but it does give certain guarantees and is specific about what is not covered. I think that's fair. And then there's libriscv.

From https://github.com/fwsGonzo/libriscv/blob/master/SECURITY.md:

> libriscv provides a safe sandbox that guests can not escape from, short of vulnerabilities in custom system calls installed by the host. This includes the virtual machine and the native helper libraries. Do not use binary translation in production at this time. Do not use linux filesystem or socket system calls in production at this time.

> libriscv provides termination guarantees and default resource limits - code should not be able to exhaust CPU or RAM resources on the system during initialization or execution. If blocking calls are used during system calls, use socket timeouts or timers + signals to cancel.

So, it is possible to provide limits while still running fast. I imagine many WebAssembly emulators can give the same guarantees.
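
For comparison, the closest you get with stock Lua is a debug-hook based instruction budget. A minimal sketch using the plain Lua C API (the names here are mine; this only bounds pure-Lua execution, and does nothing for memory use or for time spent inside C functions):

    // Sketch: abort a script after a fixed instruction budget via a count hook.
    // Covers runaway pure-Lua loops only; memory and C-function time stay unbounded.
    #include <cstdio>
    extern "C" {
    #include <lua.h>
    #include <lauxlib.h>
    }

    static void budget_hook(lua_State *L, lua_Debug *) {
        // Raising an error from the hook unwinds the script back to the pcall.
        luaL_error(L, "instruction budget exceeded");
    }

    int main() {
        lua_State *L = luaL_newstate();
        // Deliberately not calling luaL_openlibs(): start from an empty environment
        // and only expose what the host actually trusts.
        lua_sethook(L, budget_hook, LUA_MASKCOUNT, 1000000);  // fire every 1e6 VM instructions
        if (luaL_dostring(L, "while true do end") != 0)
            std::printf("aborted: %s\n", lua_tostring(L, -1));
        lua_close(L);
        return 0;
    }

That gets you a crude termination guarantee for Lua code, but nothing like the blanket guarantees quoted above.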


This is a beautiful piece of work. Connecting all the semantic levels is hard work, and this does it elegantly. It goes to show that old-fashioned technology like object files and linkers is still useful, and can still pay off in unexpected ways as part of new technology.


Object files and linkers are old fashioned? What replaced them?


I think by “old-fashioned” the OP means they are old technologies, not that they are obsolete.


I tried reading this post and it just went way over my head. Anyone have any good resources on background material to even start?


It's a template JIT with a strange implementation.

Instead of writing the bytes directly, it uses LLVM to compile functions that refer to external symbols, and then patches copies of those bytes at JIT time. That does have the advantage of being loosely architecture-agnostic.

Template JITs can't allocate registers across bytecodes, which usually hurts performance. That can be partially mitigated by picking the calling convention of the templates/stencils carefully; in particular, on a register architecture you don't want to flush everything to and from the stack on every jump.
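
Roughly, a stencil is an ordinary C++ function whose operands and successor are left as unresolved external symbols, so the compiler emits relocations the JIT can later patch. A purely illustrative sketch, not the actual luajit-remake code:

    #include <cstdint>

    extern "C" {
    // Placeholder symbols: their *addresses* are used as immediate values, so the
    // object file carries 64-bit relocations that the JIT later overwrites with the
    // real operand slots and the address of the next stencil. This compiles to an
    // object file but is not meant to link as-is.
    extern char g_dst_slot, g_lhs_slot, g_rhs_slot;
    void g_next_op(uint64_t *stack, void *pc);

    void op_add_stencil(uint64_t *stack, void *pc) {
        stack[(uint64_t)&g_dst_slot] =
            stack[(uint64_t)&g_lhs_slot] + stack[(uint64_t)&g_rhs_slot];
        // clang's musttail (or a custom calling convention like GHC's) turns the
        // continuation call into a plain jump, so stitched stencils run back-to-back
        // and the VM state stays in argument registers throughout.
        [[clang::musttail]] return g_next_op(stack, pc);
    }
    }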

It's not in the same league of engineering as LuaJIT, but then not much is.


This article is about "Copy-and-Patch", a just-in-time (JIT) compilation technique that is fast to compile with, produces reasonably efficient machine code, and is easier to maintain than writing assembly by hand.

The section "Copy-and-Patch: the Art of Repurposing Existing Tools" describes the heart of the method: use an existing compiler to compile a chunk of C/C++ code corresponding to some bytecode instruction, then patch the resulting object file to tweak the result (e.g. to fill in the instruction operands), much like symbol relocation during linking.

Given a stream of bytecode instructions, JIT compilation then reduces to copying code objects (named "stencils") from a library of precompiled stencils, one per bytecode instruction, and patching them as needed, which is very fast compared to running a full-blown compiler like LLVM from the syntax tree.
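
Concretely, the runtime side amounts to something like this (illustrative names and layout, assuming the operand placeholders were emitted as 64-bit absolute relocations; not the paper's actual data structures):

    #include <cstdint>
    #include <cstring>
    #include <vector>

    struct Stencil {
        std::vector<uint8_t> code;         // machine code lifted from the precompiled object file
        size_t dst_off, lhs_off, rhs_off;  // byte offsets of the operand relocation sites
    };

    // 'out' points into a buffer that will later be made executable (mmap + mprotect).
    uint8_t *emit_add(uint8_t *out, const Stencil &s,
                      uint64_t dst, uint64_t lhs, uint64_t rhs) {
        std::memcpy(out, s.code.data(), s.code.size());  // copy...
        std::memcpy(out + s.dst_off, &dst, sizeof dst);   // ...and patch, exactly like
        std::memcpy(out + s.lhs_off, &lhs, sizeof lhs);   // a linker applying
        std::memcpy(out + s.rhs_off, &rhs, sizeof rhs);   // relocations
        return out + s.code.size();
    }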

Of course, the resulting code is slower than what full-blown ahead-of-time (AOT) compilation produces, but the authors describe a few tricks to keep execution speed within a reasonable margin of AOT. For instance, they leverage tail calls to replace function calls with jumps, compile sequences of frequently associated bytecode instructions together, and so on.


My advice would be to read Piumarta's "Optimizing direct threaded code by selective inlining" paper [1] first, and then read the references from the Wikipedia article [2].

If the Piumarta paper is still over your head, take a look at its references, but they refer to Java, Smalltalk and Forth, which might be a distraction.

[1] https://groups.csail.mit.edu/pag/OLD/parg/piumarta98optimizi...

[2] https://en.wikipedia.org/wiki/Copy-and-patch
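
To give a flavor of what the Piumarta paper is about: in a direct-threaded interpreter, every bytecode is pre-resolved to the address of its handler, so dispatch is a single indirect jump; "selective inlining" then copies the machine code between handlers to build superinstructions. A toy dispatch loop (computed gotos are a GCC/Clang extension; this is my example, not the paper's):

    #include <cstdio>

    int main() {
        // Handler table: each opcode is represented by the address of its handler label.
        static void *handlers[] = { &&op_push1, &&op_add, &&op_print, &&op_halt };
        // Tiny "program", already translated to handler addresses (direct-threaded):
        // push1, push1, add, print, halt.
        void *code[] = { handlers[0], handlers[0], handlers[1], handlers[2], handlers[3] };
        void **ip = code;
        int stack[16], *sp = stack;

        goto *(*ip++);                                   // initial dispatch
    op_push1:  *sp++ = 1;                   goto *(*ip++);
    op_add:    --sp; sp[-1] += sp[0];       goto *(*ip++);
    op_print:  std::printf("%d\n", sp[-1]); goto *(*ip++);
    op_halt:   return 0;
    }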


Same here! Would love to understand what this is about!


I am using https://luajit.org/ in my GCC C++ project.

Can I use this faster Lua JIT in my project as a replacement? And if so, how so?

The existing luajit doesn't do v5.1, so it would be nice to use this newer engine at the newer baseline lua version level.


I'm skeptical that there's any JIT for any programming language that can match the raw performance and memory efficiency of LuaJIT.


The Mono VM is about twice as fast, and V8 is even a bit faster: https://github.com/rochus-keller/Oberon/blob/master/testcase...


I’m sure all the modern JavaScript JITs would beat LuaJIT for raw performance. JS JITs were already faster when I compared them several years ago and have only improved since, whereas LuaJIT has almost been standing still for a decade.


> whereas LuaJIT has almost been standing still for a decade

LuaJIT 2.1 improved a bit (~7%) from 2017 to 2023: http://software.rochus-keller.ch/are-we-fast-yet_LuaJIT_2017...


This is a baseline JIT, so it will be slower than LuaJIT (which has a sophisticated optimizer).


> The existing luajit doesn't do v5.1, so it would be nice to use this newer engine at the newer baseline lua version level.

LuaJIT only does v5.1, but does have some non-breaking features from v5.2 and a few optional 5.2 features which could break 5.1 compatibility.


For those interested, the ACM page for the paper has a good introductory video: https://dl.acm.org/doi/abs/10.1145/3485513



