The first thing to point out is that our implementation isn't pure WASM. We attempted to compile Postgres for WASM directly from source, but it was more complicated than we anticipated.
Crunchy's HN post provided some hints about the approach they took, which was to virtualize a machine in the browser. We pursued this strategy too, settling on v86 which emulates an x86-compatible CPU and hardware in the browser.
I’m out-of-domain but very curious about this part - it seems like a pretty extreme solution with a lot of possible downsides. Does this mean the “just compile native code to WASM” goal is still far off?
The problem is you need operating system features to run Postgres as-is, e.g. mapping memory, forking processes, manipulating files. What is missing is a WASM kernel that skips the x86 emulation but implements enough of the other stuff.
To pick just one of many hairy problems: Postgres uses global variables in each backend for backend-local state (truly global state lives in shared memory). How does this look in assembly, accounting for both the kernel and userspace components? This is the problem.
A general way to convey this is: the more system calls a piece of software uses, the more difficult a WASM target without architecture emulation becomes. And Postgres doesn't even require that many obscure ones.
Thanks, those are specific requirements I could definitely see WASM struggling to meet.
In my experience, in a large and mature enough codebase (particularly one that is already multi-platform, like Postgres appears to be), many of those requirements are wrapped in an abstraction layer to allow targeting new platforms. But some requirements (like memory mapping) could definitely be dealbreakers if the target platform doesn't naturally support them.
This solution still seems awfully complex (and probably not very efficient) but I certainly see why it's probably the "easiest" option.
I suppose Postgres is portable, but it's portable to multi-tasking operating systems with virtual memory (which puts it in a rather broad category of programs). This goes beyond wrapping how various system calls work on various platforms: it means changing how memory accesses are generated, e.g. so backend 1 sees memory location 1 for its global field, backend 2 sees memory location 2 for that same global variable, etc. Unlike system functionality, which is frequently wrapped, loading an address involves no function call and no error code (save ones generated by the processor, e.g. a segfault or bus error).
Long story short, I think the need to bypass MMU hardware emulation would prove among the most difficult problems. It will probably require assistance from the compiler, and I don't know enough about WASM to guess how mature such relocations would be.
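To make the per-backend globals point concrete, here is a minimal sketch in C for a Unix-like system. It is not Postgres code: `backend_local_counter`, `child_value_after_fork`, and `parent_value` are hypothetical names made up for illustration. It only shows the behavior being relied on, namely that after `fork()` the same global variable holds a different value in each process, with the kernel and MMU doing the separation rather than any wrapper function.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a piece of backend-local state.
 * Every process refers to it by the same name and virtual address. */
static int backend_local_counter = 1;

/* Fork a child "backend", let it overwrite the global, and return the
 * value the child ended up with (passed back through its exit status).
 * Returns -1 on error. new_value must fit in an exit status (0..255). */
int child_value_after_fork(int new_value)
{
    assert(new_value >= 0 && new_value <= 255);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: this plain store touches only the child's private copy
         * of the page; no function call or wrapper is involved. */
        backend_local_counter = new_value;
        _exit(backend_local_counter);
    }

    /* Parent: wait for the child and read back what it saw. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* The parent's view of the same global. */
int parent_value(void)
{
    return backend_local_counter;
}
```

Called from a `main`, `child_value_after_fork(42)` should return 42 while `parent_value()` still returns 1. The store in the child is an ordinary memory write, which is why there is nothing for an abstraction layer to intercept: a WASM target without an emulated MMU would need the compiler or runtime to arrange that separation some other way.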
Supabase developer here. I've tried compiling directly to WASM, but it did not go well. As I recall, there were features used by PostgreSQL that WASM didn't support yet. This is definitely something we'll revisit though, especially as WASM matures!