Lucet: Native WebAssembly Compiler and Runtime (fastly.com)
387 points by kickdaddy on March 28, 2019 | 98 comments



The wasm runtime wars are heating up! Exciting times :)

Really pumped to see this open sourced. And the performance properties look awesome.

One interesting thing we’re seeing in this space is sort of two parallel paths emerge: do you want to support JavaScript, or not? An example of the former is CloudFlare and their Workers platform. Hopefully they’ll follow Fastly’s lead and open source their runtime too, but it’s built on top of V8 because they want to support JavaScript. You also gain the additional advantage of all the engineering that Google puts into V8.

The other option is stuff like Lucet, wasmer, and wasmtime. By dropping the JavaScript requirement, you can build something that really screams, as seen here. You can partially regain some support via AssemblyScript, the TypeScript subset that compiles to wasm. But we haven't seen JavaScript compile directly to wasm yet, because if you want that, well, V8 exists. And you do have to build it all yourself.

JavaScript is one of the most popular programming languages in existence. Time will tell which approach is better, but it's really fun to watch all of this cool technology explode onto the scene right now.

(Disclaimer: I have connections to all of these projects in various ways. Everyone involved in all of them is doing great work.)


Sounds really exciting.

Are comparisons to things like the JVM or native binaries already possible?


I'm not 100% sure what you mean by that final sentence, could you maybe elaborate?


Sure.

Does WASM have better performance than the JVM? If not, could it have better performance theoretically? Is it more secure? How much slower than a regular binary would it be? etc.


Like Steve says, performance is a complicated thing to measure. For one view on it, Lucet ships with a suite of microbenchmarks that compare its execution of wasm with the same C code compiled natively. The `make bench` target runs these. The most alarming regressions are in simple functions that take string arguments - the arguments have to be copied into the sandbox, and then the results copied out, in order to run a very simple function.
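
To make the copy overhead concrete, here's a rough sketch of what a host has to do just to pass a string into the sandbox (the helper names are made up for illustration; this is not Lucet's actual API):

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical host-side helpers; real runtimes expose equivalents,
       but these exact names are illustrative only. */
    extern uint32_t guest_alloc(size_t len);       /* allocate in linear memory */
    extern uint8_t *guest_heap_base(void);         /* base of linear memory */
    extern int32_t  guest_call_strlen(uint32_t p); /* invoke the wasm export */

    int32_t call_wasm_strlen(const char *s) {
        size_t len = strlen(s) + 1;
        uint32_t ptr = guest_alloc(len);          /* reserve space in the sandbox */
        memcpy(guest_heap_base() + ptr, s, len);  /* copy the argument in */
        return guest_call_strlen(ptr);            /* results copy out the same way */
    }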

So, in those cases, we don't expect to match native, but things will get better when the GC proposal lands in WASM, which adds support for operating on memory regions outside of WASM linear memory. But in most applications we've experimented with, we haven't found this overhead to be a showstopper.


Ah, cool.

Performance is really difficult to properly measure, because each of these projects has a different performance profile, and new ones keep popping up, like Lucet did today! And if you write a benchmark, then something like https://hacks.mozilla.org/2018/10/calls-between-javascript-a... happens, and all of a sudden the numbers are all different. So it's really hard to speak about wasm generally this way; it's better to talk about specific implementations and use-cases, IMHO. And to understand that benchmarks need to be updated in order to stay relevant.

Security is an interesting axis; like the JVM (as far as I know), wasm is memory safe by design. But security is more holistic than that. One interesting thing about wasm is that you have to say up-front what things you want to call in the host, which provides the ability for the host to say "nope, you're not gonna be able to do that." And of course, logic bugs can lead to security vulnerabilities in all of these platforms.
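
For instance, with a clang wasm toolchain, every host call the guest can ever make has to be declared as an explicit import. A minimal sketch (the import name is made up):

    /* Guest side: nothing in the host is reachable except declared imports. */
    __attribute__((import_module("env"), import_name("send_packet")))
    extern int send_packet(const char *buf, int len);

    int exfiltrate(const char *secret, int len) {
        /* If the host declines to supply "send_packet" at instantiation
           time, the module never gets to run this at all. */
        return send_packet(secret, len);
    }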


I think wasm's memory safety is weaker than the JVM's. wasm's checking occurs at control flow points and the linear memory boundary, but does not extend down to individual objects. For example, arrays are not bounds-checked in wasm.
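
A tiny illustration, assuming a stock clang-to-wasm toolchain:

    int secret = 0x1337;      /* may sit right next to `a` in linear memory */
    int a[4] = {1, 2, 3, 4};

    int read_elem(int i) {
        /* Compiled to wasm, a[4] or a[5] need not trap: the access can
           still land inside linear memory, so it just reads whatever
           neighbors `a`. The JVM would throw an
           ArrayIndexOutOfBoundsException here. */
        return a[i];
    }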


The JVM and wasm have completely different memory safety stories on the inside of the sandbox, because wasm can run arbitrary C, so it has to cope with that somehow.

But the thing Steve is pointing out is that they're equivalent from the outside, where neither can corrupt the host environment.


But this is true of an ordinary Unix process too, via virtual memory. A Unix process can’t corrupt the kernel or other processes. But in practice there’s still a lot you can do with a buffer overflow vulnerability.


The issue with a Unix process is that it has the same rights as the user who ran it. The design goal of WASI is to provide a capability-based system[1], so an attacker exploiting a buffer overflow wouldn't be able to access things the original program wasn't supposed to access.

[1]: https://hacks.mozilla.org/2019/03/standardizing-wasi-a-webas...
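
A minimal sketch of what the capability model looks like from inside a WASI guest (assuming a wasi-sdk style toolchain; the paths are made up):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* Works only if the host granted a preopened directory capability
           covering this path. */
        FILE *ok = fopen("data/config.txt", "r");
        printf("data/config.txt: %s\n", ok ? "opened" : strerror(errno));

        /* No capability for the host filesystem root was granted, so this
           fails regardless of the Unix permissions of the user running
           the runtime. */
        FILE *no = fopen("/etc/passwd", "r");
        printf("/etc/passwd: %s\n", no ? "opened" : strerror(errno));
        return 0;
    }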


Then it's a bit bizarre to call it memory safe. I would say that "sandboxed" is the far more established term for what it is or does. If I used seccomp and/or user namespaces, would it be a helpful statement to say that Linux/C is memory safe? Can we say that Chrome or QEMU are implemented on a memory safe platform?


Wasm doesn't really have "arrays", so I don't think wasm checking them would really make sense. I guess you could argue that this is a distinction without a difference. You still won't get memory unsafety, which, in my mind, is the higher order bit. YMMV of course.


You do get memory unsafety. OpenSSL compiled to wasm would still be vulnerable to Heartbleed. The JVM would prevent it.


Please re-read what Rusky said; the point is about the boundary. Yes, wasm programs can mess up their own memory. That’s not what I’m talking about. I should be more clear about this in the future though, thanks.


I don't understand this point. We don't say that C is memory safe because of the kernel boundary, even though this boundary does provide important safety guarantees.

edit: I guess this viewpoint makes sense for the use case of "thousands of wasm programs in the same process." They are protected from each other, in a sense qualitatively the same as Unix processes. This is still a much weaker guarantee than the JVM provides.


The goal is to allow existing C code to run, so in the end you'll need the same freedom when it comes to memory access. The difference between this and plain process is that the sandbox uses a capability-based security model, instead of giving all the user's rights to the process [1].

[1]: https://hacks.mozilla.org/2019/03/standardizing-wasi-a-webas...


There is no mention of memory safety in this post. It's a very confusing term in this context. I wouldn't call running a binary with syscall interception (with qemu for example) "memory safe".


If the memory safety of your own code is a high priority, consider writing in Rust. If you're trying to get a C codebase to run everywhere, then that's fine too, but it seems unrealistic to expect the wasm runtime to magically make it memory safe.


It is possible to corrupt data on the stack due to out-of-bounds writes.
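
For example, a sketch (assuming a standard clang wasm target, where address-taken locals live on a shadow stack in linear memory):

    #include <string.h>

    int check_pin(const char *input) {
        int authorized = 0;
        char buf[8];
        strcpy(buf, input);  /* an overflow here stays inside linear memory,
                                so wasm does not trap... */
        return authorized;   /* ...but it may have clobbered `authorized`,
                                if that local was spilled to the shadow
                                stack next to `buf` */
    }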


Yes, bugs can always exist.


Not sure what you mean by that, but probably not that stack corruption is a necessary evil of any language.


Not when you use a VM that actually cares about security at all layers, but anyway WASM is doing everything better. /s


Using such a VM requires rewriting existing C code. Wasm's approach allows that to happen incrementally, where it matters, by using memory safe languages like Rust, instead of relying on the VM.


Not at all, as proven by memory tagging on Solaris/SPARC, iOS and upcoming Android/ARM v8.3.

So leaving memory tagging out of WASM was a deliberate design decision.

There are already better alternatives to write secure code if using C is not a requirement, so again nothing new on WASM other than its hype.


Sure, and we use software memory tagging with things like LLVM sanitizers. It's great.

But a) the hardware Wasm needs to support doesn't have tagging, and b) the software equivalent requires support from the allocator and has a large performance penalty.

So yes, it was a deliberate design decision, taken in order to support existing C programs on the Web.

This is really getting tiring- learning from the past is great, but fetishizing it to the point of denying anything new has any value is... not.


WASM wouldn't even be a thing if it wasn't for Mozilla politics against PNaCl.

Tiring is the continuous advertising that WASM is the second coming of Christ in VM implementations.


Those "politics" were there for a reason. Wasm is a solid improvement over PNaCl, which I will note you have curiously left out of all your other old-VM-worship claims.


I see.

Nice, sounds like the Android/Fuchsia approach security-wise :)


> Does WASM have better performance than the JVM? If not, could it have better performance theoretically? Is it more secure? How much slower than a regular binary would it be? etc.

Hard to answer as they're different domains. In theory, once optimized and JIT'd, the asm could be the same on non-GC'd sections. JVM bytecode is higher level which means it both benefits (i.e. can optimize more) and is hamstrung (i.e. GC). As for a "regular binary", the JVM knows no such thing and really neither does WASM. Also depends on what "regular" is (i.e. what does the code do) and which runtimes you use and how much is AOT'd vs JIT'd vs interpreted.


WASM has a subset of a subset of a percentage of features that JVM provides.

It's basically a runtime for a C/C++-like feature set [1]; the JVM has support for many of the features that are still on WASM's roadmap [2].

I would be very wary of any performance comparisons between WASM and ... pretty much anything else (except, possibly, bare C?).

[1] https://webassembly.org/docs/high-level-goals/

[2] https://webassembly.org/docs/future-features/


If I understood this correctly, that's what the Fastly people do. They compare it to bare C binaries and, aside from string performance, it holds up well.


Author here- happy to take questions.


How do you achieve the security guarantees for the content of the .so? Are they equivalent to those of wasm? If yes, how (technically) do you achieve the reported speedup in verifying the security?


The Lucet compiler and runtime each assume the other is implemented correctly, and it's the job of some external system to make sure the runtime only loads code that came from the compiler. (We have infrastructure at Fastly that already does this for our edge cloud, and every use case will have different requirements.)

The shared object code, when executed with lucet-runtime, should provide guarantees equivalent to those given by the wasm spec. However, we're not at 100% spec compliance yet, so there are some corners (e.g. import globals) that are not implemented. These don't cause a problem in practice, at the moment, because none of the toolchains we use to emit wasm use those features.

Loading code into the runtime is fast because it's mostly a call to `dlopen`, followed by deserializing some metadata (with effort made to make this as efficient as possible). Instantiating modules is fast because it amortizes the syscall overhead by having pools of instances mostly set up ahead of time.
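
To make the `dlopen` part concrete, here's a minimal sketch of that loading step in plain POSIX C (link with -ldl; the metadata symbol name is hypothetical, for illustration only):

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void) {
        /* The AOT-compiled wasm module is an ordinary native shared object. */
        void *mod = dlopen("./guest.so", RTLD_NOW);
        if (!mod) {
            fprintf(stderr, "load failed: %s\n", dlerror());
            return 1;
        }

        /* The runtime then deserializes metadata the compiler embedded
           (heap specs, export tables); "module_metadata" is a made-up
           symbol name. */
        void *meta = dlsym(mod, "module_metadata");
        printf("loaded; metadata at %p\n", meta);

        dlclose(mod);
        return 0;
    }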


> Loading code into the runtime is fast because it's mostly a call to `dlopen`

Does that mean that you measure only loading and not the compilation and verification in your “50 ms” time?


Instantiation takes under 50 microseconds (us) on the lab machine I tested on, and 65us on my colleague's laptop. Loading the code takes about that long as well (see thread: https://twitter.com/acfoltzer/status/1111387279434485760). Compilation is not counted towards any of those tallies.

In our use case, we do compilation on a control plane system and distribute the shared object file to the fleet, so we aren't too concerned by it. For applications where compilation time is a bigger concern, I'm super impressed by this (still early) work on a one-pass wasm compiler: https://github.com/CraneStation/lightbeam


> 50 microseconds

Yes, sorry for my error in writing, of course, 50 ms would be only 20 times per second. Thanks a lot for your explanations.


So do we keep a pool of VMs with the .so file already loaded, and instantiate only the module on demand - meaning allocation of memory and setup of other data structures?


Why did you write your own implementation rather than contribute directly to any of the projects with similar aims, e.g. wasmer or wasmtime?


We started the codebase that became Lucet in July 2017. It went through a few fast iterations, and by Jan 2018 we had Cranelift running with AOT compilation. In the early days of Cranelift that meant adding support for position-independent code output and filling in a bunch of gaps in the x86_64 encodings and other really low level stuff. At the time, wasmtime (under the name wasm-standalone, I think?) was a lot less mature, and wasmer was not public yet. So, in a sense, we did (& continue to) contribute to both of those runtimes via Cranelift and other dependencies like Faerie.

We spent the bulk of 2018 solving performance and integration problems, rather than cleaning up the codebase to the point where we could open source it. In the meantime wasmer was released, and both wasmtime and wasmer made a ton of progress. Now that our stuff is open source, it is easier for us to collaborate with other runtimes, possibly by moving more code back into the common Cranelift parent, or by adopting modules from each other.


Thanks for the answer, that makes a lot of sense.


I'm guessing Lucet was developed concurrently with the others, but unlike them, it had an extended period of internal development and testing first.


What would you say are the current main limitations of this approach? What is Lucet not meant to be good at?


At the moment, a lot of WASM tooling assumes instances interact with a JS engine. So, we aren't compatible with a bunch of existing tooling, like Rust's wasm-bindgen. We're working on our own tool to fill that hole in the ecosystem.

We also aren't 100% of the way to spec compliance yet. We have a plan to get there, but spent the last year or so putting the bulk of our effort into performance and integration with our edge cloud systems.


That's awesome. One of my main complaints with wasm-bindgen is it doesn't let you pull in any C libraries (vs. the older Emscripten path), effectively walling off one of Rust's awesome capabilities (interop with a large established ecosystem).

It would be awesome if whatever tool takes its place here plugged in cleanly with the cc crate. I've had a lot of success using Rust as the glue between a lot of existing C/C++ libraries and would love to carry that forward to the WASM world.


I didn't know about WASI before and it looks great. As a C/C++ dev, one of my concerns about WASM was that most native WASM runtimes interface via Emscripten conventions, which are not quite standardized.

Do you expect WASI will soon replace the current Emscripten export interfaces?


We hope so! Mozilla has been working on WASI support from Rust, and our team is working on supporting it from AssemblyScript as well: https://github.com/jedisct1/wasa.


Interesting! I wonder how this compares with other WebAssembly runtimes (Wasmer?)


Thanks for asking! I'm Syrus, I started the Wasmer project.

Lucet was just released, but these are the current differences as far as I can tell:

1. Wasmer is meant to execute any WebAssembly file and run on any platform. Wasmer currently runs on Mac, Linux and Windows. Lucet only works on Linux at the moment, since its main goal is to be executed on Fastly's infra.

2. We support multiple compiler backends: dynasm, cranelift and LLVM. Each with a tradeoff between runtime speed and compilation speed (more info here https://github.com/wasmerio/wasmer/tree/master/lib#backends ). Lucet's architecture only supports one backend at the moment.

3. Wasmer has multiple interfaces/ABI integrations, including Emscripten, and soon WASI. Lucet initially shipped with WASI support, which is awesome.

4. Wasmer has a C/C++ API, and its runtime can be integrated with other languages (C/C++, Rust and PHP at the moment).

And, of course... a link to the project if anyone else wants to take a look! https://github.com/wasmerio/wasmer


That is a good summary! Lucet's runtime has a C API as well. We haven't created bindings to languages beyond C and Rust, but it should be pretty straightforward using the C API.


Thanks for the correction! And congrats for the great work


Now that I see this, a question comes to mind: Why do we have yet another VM? Why didn't browsers just implement LLVM? Is it the sandbox?

Don't get me wrong, I'm excited to see wasm spread, but the question does cross my mind.


LLVM is not really a VM in the sense of the JVM. LLVM-IR is not platform independent, and was never intended for this use-case.

It is also significantly more complex than wasm, which is partially why wasm ended up working. Start small, add over time: that's the way of the web platform.


More importantly LLVM-IR isn't stable. It's intended to just be an intermediate between the bundled front-ends & back-ends. There's some limited compatibility provided ( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compa... ), but nothing close to suitable for an actually persisted format.


There's a subset of LLVM-IR that IS platform independent. PNaCl, and many others, rely on this.

The reason for wasm's victory is twofold:

1. WASM is a stack-based VM (LLVM IR is register-based). Additionally, the instructions chosen for wasm were inspired by the bytecode representations used in existing JS JIT compilers. This meant that WASM fit almost effortlessly into the browser compiler pipeline; compare this to adding the LLVM JIT engine that the NaCl philosophy calls for. WASM was easier to implement. Stack-based VMs are also (apparently) easier to sandbox, with a significantly lower attack surface.

2. Firefox was philosophically against NaCl/PNaCl. Considering how Safari (WebKit) eventually chose an LLVM JIT for its JS JIT, as of 2019 PNaCl would probably have won, given Microsoft (Chromium Edge), Google, and Safari all support LLVM JITs anyway.


Safari ditched LLVM JIT for B3 a couple of years ago and V8 has never used it, so it was never really established for JS engines.


Every reliable complex system started as a reliable simple system and evolved.


"Worse Is Better", but describing it from the other side.


I see, that's a very good point. Thank you.


The spiritual ancestor of wasm, PNaCl, was in fact based on a safe subset of the LLVM IR. I think LLVM/wasm interop will be hugely important, and will receive a lot of attention from the nascent wasm ecosystem.


Where were you when WebGL, WebRTC, HTML5, etc. were introduced? Did you ask "Why do we have yet another graphics API/real time communications API/video streaming API?" back then? No, you didn't. The web needs solutions that are built for the web, otherwise it's not going to work.

Especially the "Why not the JVM?" questions tick me off. The people asking this question must be living under a rock. The Web already tried Java applets and failed.


I suspect that the JVM isn't an option since Oracle v Google showed that it's not as open as we'd hoped and nobody wants to try the same trick with .NET.


Sun and Oracle are pretty open to JVM vendors that don't play license games.

https://en.wikipedia.org/wiki/List_of_Java_virtual_machines#...

Additionally, Google could have bought Sun, and decided to see how it would burn instead.


The JVM isn't an option because Java applets have failed. They are a massive security nightmare, and they only run Java, as opposed to everything (at least that's the promise of WASM).


Google’s PNaCl was essentially portable LLVM.


> With Lucet, Fastly’s edge cloud can execute tens of thousands of WebAssembly programs simultaneously, in the same process, without compromising security. [emphasis mine]

How does it handle Spectre, etc.?


We have a security document that addresses the big picture, and this specific concern as well: https://github.com/fastly/lucet/blob/master/SECURITY.md#cave... For speculative execution, we don't yet implement all of the mitigations possible in Lucet, but will in the near future.


Given that in-process sandboxing in the face of spectre is currently an unsolved problem, how do you plan on actually addressing it? The only established mitigation at this point is to just use process isolation, so what's the plan or is there just not one?


Even interprocess isolation is not a solved problem in the face of SMT (hyperthreading), so yes- there is none. You simply can’t deliver these kinds of shared metal products and defend against cache timing attacks. You also can’t deliver these kinds of products at this price and be profitable without sharing metal, so here we are. If you care about being isolated, you can’t share hardware- simple as that.


There's a huge difference in viability & mitigations here, though. In a shared metal system a PortSmash attack isn't likely to be practically viable, and worst case, sched affinity or similar is used to just firewall off processes from co-inhabiting the same physical core. The kernel scheduler could even do this in a clever way; there's a very clear path by which this is basically fully mitigated.

So all signs point to shared metal still being viable security-wise, with mitigations either already deployed or pretty straightforward. Shared hardware will remain totally fine.

Shared-process, though, nobody is really talking about making that viable from a security perspective.


So currently it does compromise on security? It seems extremely misleading to claim any security if you are just currently ignoring the last year's worth of major security issues.


From what I gather the thing is based on WASI, which is in early beta and for which many parts don't exist or don't work, networking and file access being a few of those [1]:

> Note that everything here is a prototype, and while a lot of stuff works, there are numerous missing features and some rough edges. One big thing that's not done yet is the actual mechanism to provide a directory as a pre-opened capability, to allow files to be opened. Some of the pieces are there (__wasilibc_register_preopened_fd) but they're not used yet. Networking support is also incomplete.

In other words, this is a somewhat premature announcement when it comes to fulfilling those promises.

[1] https://github.com/CraneStation/wasmtime/blob/master/docs/WA...


That sentence about pre-opened directory capabilities not being supported is actually outdated. They're supported now, so I've now updated the documentation. Thanks for pointing that out!


It also compromises on resource abuse: "lucet does not currently provide a framework for protecting against guests that consume excessive CPU time (e.g. via an infinite loop). These protections must be provided by the host environment."

I'm not sure how you're supposed to handle that, either, given the host environment usually does limiting at process granularity, but this doesn't use multiple processes.


Most OSes have a way to specify priority for a given thread.


Priority yes, but that's barely useful. cgroups, rlimits, cpulimit, etc... are far more useful here, and are all per-process.


You can use timer_create to arrange to deliver a signal after some amount of CPU time elapsed, then terminate the sandbox from the signal handler.
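
A minimal sketch of that approach on Linux (link with -lrt on older glibc; a real host would tear down just the offending guest instance rather than the whole process):

    #include <signal.h>
    #include <stdlib.h>
    #include <time.h>

    static void on_budget_exhausted(int sig) {
        (void)sig;
        _Exit(1);  /* stand-in for terminating the sandboxed instance */
    }

    int main(void) {
        struct sigaction sa = { .sa_handler = on_budget_exhausted };
        sigaction(SIGALRM, &sa, NULL);

        /* Deliver SIGALRM once this thread has consumed 100ms of CPU
           time; a blocked (non-spinning) guest never triggers it. */
        timer_t t;
        struct sigevent ev = { .sigev_notify = SIGEV_SIGNAL,
                               .sigev_signo  = SIGALRM };
        timer_create(CLOCK_THREAD_CPUTIME_ID, &ev, &t);

        struct itimerspec budget = { .it_value.tv_nsec = 100 * 1000 * 1000 };
        timer_settime(t, 0, &budget, NULL);

        for (;;) { }  /* stand-in for a guest stuck in an infinite loop */
    }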


It would be interesting to see if you could compile+run+reproduce this:

https://github.com/flxwu/spectre-attack-demo


Excited to see this come about and how it could be used with the OpenFaaS watchdog on Kubernetes. https://docs.openfaas.com/architecture/watchdog/ - is the 5 nanoseconds the time to fork at the OS level, or a kind of in-process hot-path performance?

I got an error with the example, however... is everyone else seeing the same thing?

    Unpacking wasi-sdk (3.0) ...
    Setting up wasi-sdk (3.0) ...
    Removing intermediate container d552f4538e26
     ---> 713ff6032205
    Step 8/8 : ENV WASI_SDK=/opt/wasi-sdk
     ---> Running in 4189f307a30e
    Removing intermediate container 4189f307a30e
     ---> a142a5620a28
    Successfully built a142a5620a28
    Successfully tagged lucet-dev:latest
    Lucet hasn't been installed yet... installing...
    Creating a RELEASE build
    cargo build --all --release --bins --lib
    error: failed to read `/lucet/pwasm-validation/Cargo.toml`

    Caused by: No such file or directory (os error 2)
    Makefile:11: recipe for target 'build' failed
    make: *** [build] Error 101


To answer the first question: it takes 50us to create a new instance from a loaded WebAssembly module. The module is compiled AOT into a shared object file, which is then loaded into the runtime using `dlopen`. We create instances from a region, which is basically a pool of memory that is already mostly set up, to minimize the computation & syscalls required in instance creation.
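
A rough sketch of the pool idea (sizes and names are made up; Lucet's real region code is more involved):

    #include <stddef.h>
    #include <sys/mman.h>

    #define SLOT_SIZE  ((size_t)8 << 20)  /* heap + stack + guard, per instance */
    #define POOL_SLOTS 1024

    static char  *pool;
    static size_t next_slot;

    void region_init(void) {
        /* One big mmap up front; the kernel work is paid once, not per request. */
        pool = mmap(NULL, SLOT_SIZE * POOL_SLOTS, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    }

    char *instance_create(void) {
        /* Hot path: hand out a pre-mapped slot; no syscalls needed. */
        return pool + (next_slot++ % POOL_SLOTS) * SLOT_SIZE;
    }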


You need to check out submodules: `git submodule init && git submodule update`. Sorry, multiple people have reported this problem, and we're adding it to the docs right now!


Indeed.

And even if you know that a project uses submodules, forgetting `--recursive` happens constantly.

The script will now automatically install the required submodules.


Native WebAssembly sure is an exciting topic. But I can't think of a single concrete use case for it, besides having another standard for secure bytecode to choose from. Help?


As the blog post mentions,

> Lucet is designed to take WebAssembly beyond the browser, and build a platform for faster, safer execution on Fastly’s edge cloud.

> Lucet is the engine behind Terrarium, our experimental platform for edge computation using WebAssembly. Soon, we will make it available on Fastly’s edge cloud as well.

You can see the Terrarium announcement here: https://www.fastly.com/blog/edge-programming-rust-web-assemb...

So that's at least one concrete use case for you! I'm sure we'll be seeing more pop up in the future; there's so much going on here.


It could take off for the same reason node.js took off: it's easy and simple to leverage your existing frontend codebase for backend use. Running the same webassembly directly in your backend environment may be easier than compiling to a different target and debugging the quirky differences.


For example, integrating third-party cross-platform code (plugins, mods, behaviors, etc.) into native projects (game engines, databases, etc.).


Can you comment on how big an effort it would be to support ARM platforms?


How will this work from a stack and a request/response flow perspective?

Meaning, am I calling it as a function from within my VCL config? Or am I mapping my service ID straight to a binary?


We're not quite there yet, but we're working on answering those questions right now. Stay tuned!


Why use this as opposed to Google NaCl or PNaCl?



The NaCl/PNaCl team is working on Web Assembly these days, for one.


(P)NaCl died a long, long time ago.


Google already deprecated them in favor of WebAssembly.


How does Lucet's WebAssembly performance compare to LuaJIT (one of the fastest JITs in existence) right now including VM warm-up time? Also, what's the GUI story like outside of browsers?


I have not compared it to LuaJIT, but given that Lucet uses an AOT architecture, it's hard to make a fair comparison.

There is no GUI story yet, but Lucet provides a WASI (https://wasi.dev) implementation, so as that standard evolves, we may be able to support GUIs through those interfaces.


What other fast JITs in existence are there that are worth noting?


HotSpot, the JavaScript JITs?



