Cervus: A WebAssembly subsystem for Linux (github.com/cervus-v)
198 points by dmmalam on May 30, 2018 | hide | past | favorite | 57 comments



The next step of evolution predicted by Gary Bernhardt. https://www.destroyallsoftware.com/talks/the-birth-and-death...


I've seen this talk linked so many times, and now that I've finally watched it I definitely recommend it; it's pretty good.


A few related matters which might seem unrelated until one starts seeing the bigger picture:

http://lampwww.epfl.ch/~amin/pub/collapsing-towers.pdf Amin, Nada; Rompf, Tiark - Collapsing Towers of Interpreters [January 2018]

http://bootstrappable.org/ (see also http://langsec.org )

https://docs.racket-lang.org/medic/index.html - paper explaining it: https://www.cs.utah.edu/plt/publications/fpw15-lf.pdf

sequel to that paper: https://dl.acm.org/citation.cfm?id=3136019

https://www.reddit.com/r/nosyntax

https://www.reddit.com/r/programming/comments/2gw9u8/program...

https://grothoff.org/christian/habil.pdf The GNUnet System

https://wiki.debian.org/SameKernel

http://drops.dagstuhl.de/opus/volltexte/2017/7276/pdf/LIPIcs... Wang, Fei; Rompf, Tiark - Towards Strong Normalization for Dependent Object Types (DOT)

https://www.reddit.com/r/MachineLearning/comments/7s9etv/r_b...

https://news.ycombinator.com/item?id=16343020 Symbolic Assembly: Using Clojure to Meta-program Bytecode - Ramsey Nasser

http://conal.net/papers/compiling-to-categories/

https://icfp17.sigplan.org/event/icfp-2017-papers-kami-a-pla...

https://www.reddit.com/r/asm/comments/7af7a4/def_con_25_xlog...

Each of these, and this, lays another brick in the road to a very, very different computational paradigm... I post this here without much explanation of how these parts fit together (and I've left out a lot of other very relevant stuff), and I apologize for that, but I simply lack the time to give this the proper writeup it deserves.


The end is neigh and it is the death of javascript.


Don't horse around. The word is `nigh`


I think it's the start of something wonderful...


From the readme:

> I'm busy with my College Entrance Examination until ~June 10, 2018

This dude is in high school?! When I was your age I thought I was smart for writing a youtube scraper in PHP...

Awesome work, really creative solution. Good luck to you.


From his profile: https://github.com/losfair

> High school student. Interested in rust, operating systems, distributed systems and virtualization.

Hehe, when I was his age, my friends called me a hacker because I knew how to use the ping and ipconfig commands on Windows.

Compared to him, I was really lame. What makes me sadder is that I'm still lame even today LOL.


If you want to feel better, I didn't know what a variable was until I was 28. I feel pretty lame compared to virtually everyone here on HN haha.


Well, then, here, take my hug (>^_^)>


<(^_^<)


Yay progress!

Really though, they seem quite bright. But I like seeing younger people excelling like this because it means the tutorials, wikis, blogs, etc. that their "forefathers" wrote are doing "good" in the world.


I believe the other person working on a wasm kernel in Rust is also in high school. They take two different approaches, I can’t wait to see how all of this turns out!


A real High School drama, not that fake stuff on the telly ;).


Yep, almost done though!


And now I'm sad


Highly related: https://github.com/nebulet/nebulet

“(Going to be) A microkernel that implements a WebAssembly "usermode" that runs in Ring 0.”

It’s inspired by the Microsoft experiment Singularity OS.


I've been wondering for a while why nobody was trying to run an entire VM at ring 0; the benefits would be significant. I was just not aware that's what Singularity/Midori were doing.

I'm glad more people are picking up on it.


Linux and the BSDs have had a VM for decades in the form of BPF.

Before that the exokernels absolutely loved VMs at ring 0.


Doesn't Ling/ErlangOnXen run in Ring0?


Ling has the concept of hypercalls, as if you were running plain Linux on Xen. So it's less an actual unikernel and more like a skin that molds itself to Xen and looks like a unikernel.

So yes. But no.


What are the benefits of doing so?


The cost of context switching is near zero. No need for expensive TLB flushes, for instance. No need for paged memory whose overhead sometimes accounts for up to 50% of program runtimes. Better isolation properties, since you can sandbox individual objects instead of whole processes.


> expensive TLB flushes

Note that modern CPUs store a tag of the current "address space ID" next to the TLB line, thus the cost of the flushing is heavily reduced.

> No need for paged memory

Unless you mean swapping, this doesn't apply as x86_64 long mode requires paging to be enabled.


> Note that modern CPUs store a tag of the current "address space ID" next to the TLB line, thus the cost of the flushing is heavily reduced.

Yes, tagged TLBs are much better, but the overhead is still not negligible. You can check the microkernel literature for all of the inefficiencies encountered in modern CPUs, and a language-based OS would eliminate most of them because protection is moved into the language itself.

> Unless you mean swapping, this doesn't apply as x86_64 long mode requires paging to be enabled.

a) This benefits from better caching since there's only one set of page tables, and b) CPUs are designed around the common uses, so if this OS design catches on, you'll start seeing CPUs that don't require page tables.


> Note that modern CPUs store a tag of the current "address space ID" next to the TLB line, thus the cost of the flushing is heavily reduced.

Except you flush the TLBs as part of the Meltdown mitigation.


It could also be inspired by many others.

I suggest some reading about Xerox PARC and ETHZ OSes.

JNode and CosmOS are also interesting ones.


That description sounds a lot like a unikernel-type system.


Having web assembly be a native subsystem... That's brilliant. Why has this not been attempted for Java or anything else for that matter? I guess you could count Microsoft's .NET implementation.

In any case, you could extend this beyond just user mode. Currently the domain of safe ring0 execution is eBPF as far as I know, but this would be way more approachable and other operating systems could implement it.

I've got no idea what the future is for this project but I really hope it doesn't stay in the realm of "fascinating but not practical."


> Having web assembly be a native subsystem... That's brilliant. Why has this not been attempted for Java or anything else for that matter?

There have been CPUs which could execute Java bytecode directly…

And not just actual Java processors, ARM has/had an extension for that: https://en.wikipedia.org/wiki/Jazelle


I was aware of the Java CPUs, but somehow that seems less interesting. Maybe because at that point, it's not so different from any other CPU architecture.


It has.

I remember there was a Linux Journal article about it, almost 20 years ago, regarding Java.

Also this is how mainframes like IBM i work. Binaries use the TIMI format and are JITed into native code at installation time or on demand, if needed.


A lot was going on with Java in the early days. On the one hand, Sun resisted it: first Java was not open source, then open source but with a heavy hand on it.

On the other, Richard Stallman also resisted everything Java, because it did not mesh with his vision of the future. (Prop up everything GPL, unless the proprietary solution is already an industry standard.) My favourite example is how GCC was on the cusp of getting a backend which could emit Java bytecode. Imagine that!

That could have kickstarted all the JVM languages a decade earlier. Alas, many forces were against such things.

Now, we have everything on everything, but most of all Javascript everywhere. Weird. :-)


You can write kernel modules for NetBSD in Lua.

http://mail-index.netbsd.org/source-changes/2013/10/16/msg04...


Isn't that what the JVM actually is? A Java subsystem for Linux et al.?


No, JVM makes syscalls to Linux. This makes calls, not syscalls, to Linux. There is no context switching.


WA in the kernel could lead to drivers being shipped as WA code with special interfaces. With safe interfaces to stuff like PCIe devices, the Linux kernel could transform into a more hybrid kernel, similar to NT. Many funs to be had!

I imagine it could also be useful to run user programs without having to switch rings all the time...


> I imagine it could also be useful to run user programs without having to switch rings all the time

That is pretty much exactly what this project is aiming at:

"Cervus implements a WebAssembly "usermode" on top of the Linux kernel (which tries to follows the CommonWA specification), enabling wasm applications to run directly in ring 0, while still ensuring safety and security."


Reminds me of how Forth was supposed to be the language for drivers and such, write once, run everywhere drivers.



Do not run untrusted code at ring 0, regardless of software sandboxing technology. It's just too risky!

Otherwise neat.


Why do you claim this?

1. Hardware sandboxing (i.e., ring not-0) isn't much better, as seen by Meltdown.

2. Most production-ready UNIXish kernels have had support for running untrusted code (namely BPF bytecode) in the kernel for decades.

3. Do you really trust all the code currently running in ring 0 on your computer? In particular, do you trust the executable loader, which handles complex untrusted input? What's the line between "code" and "not code"?

4. On most desktop machines there's a single user, and malware isn't particularly stymied by being unable to get to ring 0; it can still exfiltrate your files, stream your webcam, log into your bank, etc. Why is ring 0 more of a concern than untrusted code elsewhere? (In fact, on most Linux desktops, malware can wait until the user runs sudo, inject itself in, and then run insmod and get to ring 0 directly....)

5. These same machines make a practice of running untrusted, JITted JavaScript and WebAssembly all the time inside the same sandbox you think is too dangerous, and the sandbox works. Not perfectly, of course, but certainly much better than, say, the Linux kernel protects itself from local privilege escalations. Why is the same software sandbox too dangerous for use in kernelspace?


> 1. Hardware sandboxing (i.e., ring not-0) isn't much better, as seen by Meltdown.

Meltdown was a single Intel bug, it did not occur on other CPU architectures or on AMD chips. It was a result of asynchronous permission checking and it is a side-channel disclosure (a non-write bug). It is objectively not as bad as the tens of thousands of buffer overruns and memory write vulnerabilities in software.

> 2. Most production-ready UNIXish kernels have had support for running untrusted code (namely BPF bytecode) in the kernel for decades.

Actually, it is a security vulnerability as well. In fact, the Project Zero proof of concept for Variant 1 of Spectre was an attack on the BPF interpreter, not even a JIT; it's even worse with a JIT.

> 3. Do you really trust all the code currently running in ring 0 on your computer? In particular, do you trust the executable loader. which handles complex untrusted input? What's the line between "code" and "not code"?

There are levels of trust, of course. I trust the Linux kernel a heck of a lot more than, e.g. V8. And I work on V8. On the WebAssembly implementation. I didn't want to mention it, but yeah, no, I would not put my own code into the kernel.

> 4. On most desktop machines, there's a single user

Again, levels of trust. I would not, e.g. run most userspace software in the kernel, just because it's so broken it will probably bring down the system. All the other things you mention are made easier, not harder by running in the kernel.

> 5. These same machines make a practice of running untrusted, JITted JavaScript and WebAssemy all the time inside the same sandbox you think is too dangerous, and the sandbox works.

I don't want to scare you, but please don't labor under the assumption that web browsers are 100% secure. We have tons of bugs. I mentioned above that the WebAssembly implementation in Chrome is a lot of my work. We've had security vulnerabilities.

> Not perfectly, of course, but also certainly much better than, say, the Linux kernel protects itself from local privilege escalations. Why is the same software sandbox too dangerous for use in kernelspace?

Objectively, no, it isn't better than the Linux kernel. And yes, it is too dangerous. This is based on the hundreds of security bugs that I've been involved with while working on Chrome, and the hundreds more that I wasn't involved with, and the probably hundreds more that are hiding in there. Yes, we take security seriously, and we are very sober about this.


Thank you for working on V8 and the WebAssembly implementation. :)

I still think that this is a case of knowing how the sausage is made. I've operated public-facing Linux systems for many years (though I am neither a kernel nor a V8 developer, so you know what you're talking about more than I do), and the Linux kernel is ... not good. You take security seriously, find hundreds of bugs, and it scares you, which is great. The Linux kernel does not (remember the whole "security bugs are just normal bugs" thing, plus the resistance to architectural improvements that kill bug classes, which as far as I can tell you folks seem to be very excited about).

I would hope that you think that V8 + the associated Chrome sandbox (which, to be fair, I think does not have an equivalent in this project) is secure enough to be exposed to random JavaScript / WebAssembly from malicious parties on the internet running and updating 24/7, and keep things reasonably safe, because a billion people do exactly that. I'm not saying it's perfect or unbreakable - I'm just saying I definitely don't trust Linux to be secure against random userspace from malicious parties on the internet running and updating 24/7.


There's a difference between software sandboxing of native code and not being able to run native code at all. WA being an intermediate representation, it's a different story.

I'm not saying there may not be flaws in WebAssembly or what Cervus delivers.


Maybe my original comment wasn't clear. WebAssembly is a software sandboxing technology. Just because it is JIT compiled and has bounds checked memory does not make it secure enough to run in the kernel; a bug in the WASM engine would be a kernel exploit, and side channels would be kernel side channels.

Again, do not run untrusted code at ring 0, regardless of sandboxing technology.


I thought the point was that it became trusted code as it was transpiled into WA, due to all the checks and what not that could be applied.

So, as you said, a bug in the system would be a kernel exploit. How would this be any different from an exploit in today's kernels? The result would be the same: a bug fix to the kernel, or a fix to the transpiler.

My point is that running user programs in ring 3 protects the system from their bugs and exploits, but it does not protect the system from bugs in the kernel. In this case you have semantically moved where the bugs can be from the kernel to compile time.

A wasm engine bug would equal a kernel bug; if that system is working right, then there should be no issue with running in ring 0. Think of it as compile-time protection vs. run-time protection.

I have just woken up and feel I am doing a terrible job of conveying this.

(Side note: is the ring system not just another sandboxing technology, just facilitated by the CPU? And have we not just had major failures in some of those systems?)


> So as you said a bug in the system would be a kernel exploit. How would this be any different than a exploit in today's kernels?

Because now you have moved a massive amount of software (namely the engine implementation, which includes a dynamic compiler, memory management, a runtime system, etc. -- 850,000 lines of code for V8) into the kernel, and you have eschewed the simplest of hardware mechanisms (which have been very carefully designed and tested for 50 years, plus formally verified and proved correct by hardware designers) for a very complex set of software checks that are part of a rapidly changing software system that has had dozens upon dozens of security bugs.

The whole point of defense in depth is to add additional layers of security. E.g. in a browser, if software checks fail in ring 3 userspace, the sandboxing of system calls still doesn't allow a rogue process to even access the filesystem or make kernel calls. Then, on top of that, hardware address translation means that a compromised process cannot attack other processes. If it's all in one giant address space in ring 0, a single vulnerability compromises the entire system.

> My point is while user running programs in ring 3 protect from bugs and exploits from trashing the system, but they do not protect from bugs in the kernel from trashing the system.

The whole point is to reduce the TCB (trusted computing base). Bugs in the kernel are rarer because it's smaller, tested more thoroughly, has a clearer and simpler contract, changes slower, and is written by a smaller set of experts than, e.g. random userspace software.

> I have just woke up and feel I am doing a terrible job of conveying this.

No worries. Here's some background that might be useful for the discussion: https://en.wikipedia.org/wiki/Trusted_computing_base

In general, you want to minimize the trusted computing base (i.e. that running in ring 0), and you don't typically want to put a Turing machine inside it!


I don't think there is a massive amount of software required. The linked module above is just over 2k lines and compiles to around 250 KB. Clearly not all of the things you mentioned. All you need is an implementation of the wasm state machine. Keep in mind this is wasm, not JavaScript, and not asm.js. It's an entirely new platform-independent specification for a bytecode.

I would wager that an implementation in Rust is far safer than all of the code that goes into today's Linux kernel to run non-wasm binaries.

You linked me to TCB; the entire point of the OP's links to Nebulet and Cervus is that with this new strategy, an entirely new security model and a new way to think about and run untrusted code has come about. So linking to old ideals and papers explaining how computer security works today can only show how things are done now. When the talk about running in ring 0 comes up, they are challenging those very ideas, and doing a good job of showing how it can safely be done.

Barring bugs -- yes, bugs can happen -- both the old way and the new ideas being tossed around with wasm are able to provide a level of security. One relies on hardware that can't be easily changed (microcode, new CPU...) and is hard, in some cases impossible, to audit.

The new notion of compile time checking with wasm allows for a clean approach to ensuring bad programs don't crash the system. Because it does not rely on hardware, but code it can be updated and audited.

I am not arguing that somebody's Show HN fun toy is going to be better than 50 years of progress. But I am arguing that a few good years of investment could jump us past those 50 years into a new age of computing, not bogged down by legacy CPU architectures.


The parts of V8 handling WASM don't include the JS parser / compiler / optimizer.


This is a cool idea I've seen increasingly talked about, but in practice? I hope it never happens. Single-address-space computing is a bad idea.

https://www.cs.princeton.edu/~appel/papers/memerr.pdf


So NaCl and PNaCl are not good enough to be comparable with WASM?


NaCl relied on a lot of segment register tricks that are only available to ring 3, AFAIK.


Can someone explain what this is actually useful for, if you're not a kernel developer? Would people in userland care about anything like this?


Gary Bernhardt explains it better in this talk: https://www.destroyallsoftware.com/talks/the-birth-and-death...

But to summarize: jumps between kernel space and user space are expensive. Instead of doing that, we can run a well-vetted interpreter in kernel space, and run "userspace" programs in kernel space, inside the interpreter.

This actually isn't slower (or so it is claimed), because a JITed interpreter can run at native speed on hot code paths, and the inefficiencies for most workloads are more than made up for by not having expensive syscalls.

So what you end up with is something that is about as fast as normal compiled code for cpu-intensive workloads (maybe faster sometimes), much faster for workloads involving a lot of syscalls, and interpreted languages like python/javascript end up much faster as well, presuming they can take advantage of the efficient JIT implementation.

Personally, what most excites me about this technology path is that it should reduce the cost of interprocess communication to near zero. Combined with a shared object model and a capabilities system, it could be pretty awesome.


The most important applications are likely data processing -- batch and real-time -- and very high performance web applications. These are environments where:

* Different applications are typically isolated from one another by being on different machines.

* Application and hardware failures are handled with the "let it fail" philosophy, where individual machines are treated as disposable.

* Components are written in-house and typically are quite trusted (even if they don't deserve to be).

People in "userspace" do, on occasion, care a lot about the overhead of syscalls. Userspace networking -- https://lwn.net/Articles/713918/ -- is a different approach, where a functionality is moved wholly out of the kernel.


You can safely run userland code in the kernel context. Makes it faster.

Does make Spectre rather more potent though!



