I'd love to see an eBPF vs. WASM compare/contrast. These seem to be two emerging technologies for sandboxing code in non-GC-oriented runtimes. Both have compiler backends now. It would be very informative to see how their choices are similar and different. I'm also curious if there are differences in the requirements.
You can, it just won't be validated as the CFG read by the kernel has to be a DAG. But the compiler has no problems emitting code with arbitrary loops. cBPF is different; branch offsets are unsigned there.
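For anyone wondering what "the CFG has to be a DAG" means mechanically, here is a minimal Python sketch. It is not the kernel's verifier. It assumes a made-up toy instruction format (`Insn`, with a jump offset relative to the next instruction, as in eBPF) and does a depth-first walk of the control-flow graph, rejecting the program if it finds a back-edge, i.e. a loop.

```python
from typing import NamedTuple, Optional

class Insn(NamedTuple):
    """Hypothetical toy instruction, not the real eBPF encoding."""
    op: str                    # "jmp", "jcc" (conditional), "exit", or any other op (falls through)
    off: Optional[int] = None  # jump offset relative to the next instruction

def successors(prog: list[Insn], pc: int) -> list[int]:
    insn = prog[pc]
    if insn.op == "exit":
        return []
    if insn.op == "jmp":
        return [pc + 1 + insn.off]
    if insn.op == "jcc":
        return [pc + 1, pc + 1 + insn.off]  # fall-through and taken branch
    return [pc + 1]

def cfg_is_dag(prog: list[Insn]) -> bool:
    """Iterative DFS from insn 0; an edge to a node still on the DFS path is a back-edge (a loop)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = [WHITE] * len(prog)
    color[0] = GREY
    stack = [(0, iter(successors(prog, 0)))]
    while stack:
        pc, it = stack[-1]
        nxt = next(it, None)
        if nxt is None:              # all successors explored
            color[pc] = BLACK
            stack.pop()
            continue
        if not 0 <= nxt < len(prog):
            raise ValueError(f"jump out of range from insn {pc} to {nxt}")
        if color[nxt] == GREY:
            return False             # back-edge: the program contains a loop
        if color[nxt] == WHITE:
            color[nxt] = GREY
            stack.append((nxt, iter(successors(prog, nxt))))
    return True
```

A backward conditional jump makes the graph cyclic even when the loop would terminate at runtime, which is why this kind of check is conservative. Two paths merging at the same instruction (a diamond) is fine; only edges back onto the current path are rejected.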
I'm working on an exokernel built around an eBPF VM that'll let you use loops in certain circumstances (i.e., more or less regular threads that happen to be in kernel space and are preemptible).
> You can, it just won't be validated as the CFG read by the kernel has to be a DAG.
Are you saying the kernel bpf verifier can be disabled when loading a probe? AFAIK, there is no such option.
TBH, I cannot understand this statement. I've written a few thousand lines of BCC C code as eBPF and never encountered any reference to actually having loops in the code.
What do you mean? The source is all open, and the bytecode format is well documented. There's nothing stopping anyone from writing their own with different semantics.
There's no doc on my work, as it's a private, jerk-offy personal project that I work on when I get tired of Jira tasks and process.
You can patch your kernel, or write your own VM. BPF is ultimately just a kernel module under Linux. My VM runs in kernel space just fine as well (but is built on something that looks more like seL4 than Linux on the inside).
If you provided enough of std, or ported that VM to no_std, that Rust VM would work just fine in the kernel too.
I feel like we're talking on two different levels here.
It's sort of like how Oak was this neat virtual machine for running on an early-'90s PDA prototype. Then the writers of that VM realized that they had written a really general-purpose VM, cleaned it up, and released the first Java.
A VM this general (talking about eBPF now) hasn't been a first-class citizen of a mainstream kernel before. The devs are taking a very cautious approach (as they should), but ultimately eBPF is way bigger than a tracing tool. I wouldn't be surprised to see nearly everything you currently do with a kernel module ultimately being possible with eBPF too. Maybe even something like emulating other OSes' kernels as easily as you'd start another container.
One similarity seems to be the way they both restrict user access to the stack to prevent stack corruption or overflow. The article talks about how eBPF doesn't have an explicit stack pointer and doesn't give a function access to caller or callee stack frames. I have read similar things about how WASM protects its stack, though I don't know the details.
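For the eBPF side, here is a rough Python sketch of the kind of check involved, assuming the Linux conventions that r10 is a read-only frame pointer and each frame gets 512 bytes of stack addressed at negative offsets from it. The function names are made up, and the real verifier tracks much more (pointer types, whether stack slots were initialized, alignment).

```python
STACK_SIZE = 512  # per-frame stack limit on Linux (MAX_BPF_STACK)
FP = 10           # r10: read-only frame pointer

def check_stack_access(base_reg: int, off: int, size: int) -> None:
    """Hypothetical check: a load/store must fall entirely inside [fp - 512, fp)."""
    if base_reg != FP:
        raise ValueError("only frame-pointer-relative accesses are modeled here")
    if size not in (1, 2, 4, 8):
        raise ValueError(f"invalid access size {size}")
    if off < -STACK_SIZE or off + size > 0:
        raise ValueError(f"stack access [{off}, {off + size}) outside [-{STACK_SIZE}, 0)")

def check_write_dst(dst_reg: int) -> None:
    """Hypothetical check: no instruction may write r10, so a program cannot move its own frame."""
    if dst_reg == FP:
        raise ValueError("r10 is read-only")
```

Because the only handle on the stack is a fixed frame pointer and every offset is bounded to the current frame, there is no way to express "read my caller's frame" or "grow the stack", which is the property the comment describes.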
I mean, their runtime models might share similarities, but eBPF is not meant to be a general-purpose machine; it is kinda a kernel add-on for programmable tracing.
>it is kinda a kernel add-on for programmable tracing.
Well, that's the thing I personally mostly use it for via bpftrace and BCC, but that's not the only thing. It's being used for a lot of networking-related things too: XDP, Cloudflare uses it for a lot of their DDoS mitigation, etc.
There is ongoing work to support counted loops (including a working patchset that is under review), but in general it's true that an eBPF program does not allow a given segment of code to have multiple entry points (a generalization of the prior rule that jumps may only go forward in the instruction stream).
Wasm is not sandboxed. Wasm is simply another standard way of writing instructions that can be platform independent. Wasm has no standard library software for sandboxing. Sandboxing is entirely dependent upon the software implementing execution of Wasm instructions. Very few runtimes do this; fewer still come from reputable sources and do it without a JavaScript engine, and none put sandboxing first.
WASM implementations are sandboxed, though that can in theory be defeated: given that there were PoC Spectre attacks on JavaScript VMs, it must be possible to do the same on what would (practically) be a WebAssembly frontend to the same backend. But eBPF (or, more specifically, the validator) is designed to conservatively accept only programs that it can guarantee have certain semantics.
That's entirely dependent upon the runtime. Wasm is advertised as many things it's not. The only thing it is, is a set of instructions. Here, go through the list. https://github.com/appcypher/awesome-wasm-runtimes
And all of those basic instructions only operate in a protected memory space. There is no stack to manipulate, and no way to do syscalls or interact with the host environment that is not mediated by the runtime.
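To illustrate "only operate in a protected memory space", here is a toy model of a Wasm linear memory in Python. It follows the spec's shape for `i32.load`/`i32.store` (64 KiB pages, little-endian, an unsigned 32-bit address plus a non-negative static offset, trap if the access would run past the end), but the class and method names are hypothetical. This is an interpreter-style explicit check; JIT runtimes often get the same guarantee differently (e.g. with guard pages).

```python
import struct

PAGE_SIZE = 64 * 1024  # Wasm page size: 64 KiB

class Trap(Exception):
    """Raised instead of touching bytes outside the instance's linear memory."""

class LinearMemory:
    def __init__(self, pages: int) -> None:
        self.data = bytearray(pages * PAGE_SIZE)

    def _effective_address(self, addr: int, offset: int, width: int) -> int:
        # The dynamic address is an i32 reinterpreted as unsigned; the static offset is
        # added without wrapping, so a large offset cannot wrap back into bounds.
        ea = (addr & 0xFFFFFFFF) + offset
        if ea + width > len(self.data):
            raise Trap(f"out-of-bounds access [{ea}, {ea + width}) beyond {len(self.data)} bytes")
        return ea

    def i32_load(self, addr: int, offset: int = 0) -> int:
        ea = self._effective_address(addr, offset, 4)
        return struct.unpack_from("<i", self.data, ea)[0]

    def i32_store(self, addr: int, value: int, offset: int = 0) -> None:
        ea = self._effective_address(addr, offset, 4)
        struct.pack_into("<I", self.data, ea, value & 0xFFFFFFFF)  # store the low 32 bits
```

Every address the guest can form is an index into `data`, never a host pointer, so the worst a buggy or malicious module can do through loads and stores is trap or scribble over its own memory.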
Wasm instructions are not native instructions.
The spec [1] clearly states that WebAssembly is sandboxed.
A non-sandboxing implementation would be either non-compliant or have bugs (which admittedly they most likely do at this point, considering they are still new).
Which is all completely dependent upon the implementation of the executing software.
Let's take it from the beginning:
1. wasm is a set of instructions.
2. those instructions have to be turned into instructions the hardware understands to execute them.
Nowhere in here is there a requirement of sandboxing. With a sandbox, a 'protected memory space' is dependent upon the implementor. There's no such thing as a magic software sandbox that you just drop into software and, congrats, you're secure. You implement it.
> Nowhere in here is there a requirement of sandboxing
Let me quote the WebAssembly spec [1]:
> WebAssembly provides no ambient access to the computing environment in which code is executed. Any interaction with the environment, such as I/O, access to resources, or operating system calls, can only be performed by invoking functions provided by the embedder and imported into a WebAssembly module.
Those limitations are true of most virtual machines. Lua bytecode has no opcodes for doing I/O, invoking syscalls, or anything else that interacts outside the VM environment. AFAICT, Python is similar, with the exception of some opcodes for printing to stdout. To interact with the outside environment you must load and invoke code from modules, so access is intrinsically limited by whatever modules are permitted to be loaded. In Lua, a new VM state has no modules loaded at all (not even the string module) and no ability to load modules; the C application needs to explicitly load the package module to register "require" in the environment.
There is a separation between spec and implementation. That quote doesn't at all disagree with me. That's how Wasm is meant to be, but it doesn't give any detail as to how that's achieved programmatically. You can implement all of that and still have a vulnerable runtime because of how you implemented it. You can also just expose system access as functions, as some runtimes do because they don't care about sandboxing.
> You can implement all of that and still have a vulnerable runtime because of how you implemented it.
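A sketch of the "no ambient access" model both comments describe, in Python with hypothetical names (this is not any real runtime's API): the guest can only call what it declared as an import and the embedder actually supplied, much like a fresh Lua state with no libraries opened. It also shows the flip side: if the embedder passes something dangerous in `imports`, the guest gets exactly that.

```python
from typing import Callable

class LinkError(Exception):
    pass

class Instance:
    """Hypothetical toy module instance: guest code reaches the host only through
    functions the embedder explicitly supplies at instantiation time."""

    def __init__(self, required_imports: list[str],
                 imports: dict[str, Callable[..., object]]) -> None:
        missing = [name for name in required_imports if name not in imports]
        if missing:
            raise LinkError(f"unresolved imports: {missing}")
        # Keep only what the module declared; extra host functions are unreachable.
        self._imports = {name: imports[name] for name in required_imports}

    def call_import(self, name: str, *args: object) -> object:
        # The guest has no opcode for I/O or syscalls; this is its only door to the host.
        try:
            fn = self._imports[name]
        except KeyError:
            raise LinkError(f"{name!r} was not imported") from None
        return fn(*args)
```

Whether the result is a tight sandbox or not is decided entirely by what goes into `imports`, which is the part of the disagreement about spec versus runtime that this model cannot settle.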
This applies to every single sandboxed language in the world.
The instructions are designed so that sandboxing the actual core is trivial, and the spec says that you have to sandbox outside of deliberate pass-throughs. That's about the best you can possibly do.
If languages can qualify as sandboxed, it sounds like WASM qualifies. (And if they can't, then we're using a broken definition of "sandboxed".)
A language cannot qualify as sandboxed unless it comes with a runtime that has a sandbox. Wasm does not. There is no such thing as 'trivial to sandbox'. Wasm doesn't introduce anything new in terms of its instructions; it's bound by all the same mistakes and errors developers have made in sandboxing in the past.
So even if all possible runtimes have to be sandboxed or they're not actually implementing the language, it's not possible for a language to qualify as "sandboxed"?
Then I stand by what I said before. Your definition is broken, and you're making a semantic argument rather than actually discussing eBPF and WASM.
When you see someone say "sandboxed language" read it as "language where conforming implementations are by definition sandboxed". WASM meets that definition, as far as I can tell.
When you see someone say "WASM is sandboxed" read it as "any runtime that implements the WASM spec is sandboxed".
If wasm had a standard runtime everyone used, sure. It doesn't, runtimes are significantly fragmented. Therefore 'wasm is sandboxed' is not true, and in many cases those sandboxes are not at all being audited. It is very dangerous to make broad and demonstrably untrue statements about software security. Wasm is not a sandbox, wasm is a set of instructions. Your quality of sandbox, if at all, is up to what runtime you use. No amount of word play will change that.
The defined semantics for those instructions include sandboxing. If there is no sandbox, it's not WASM. It wouldn't be implementing the instructions as described in the spec.
You can argue that a sandbox might be low quality. That's fine. But it doesn't make it non-sandboxed.
If someone guesses what all the instructions are supposed to do and implements the wrong semantics, they didn't actually implement the same instruction set!
When all your security issues are violations of the spec, then it is not the language in the spec that is insecure.
> That's how Wasm is meant to be, but it doesn't give any detail as to how that's achieved programmatically
True, but a conforming implementation has at least some sandboxing, since it prevents arbitrary memory access. But the degree to which it is sandboxed depends on the functions that are exposed by the runtime.
I don't see where you're going with that. Doesn't that definition apply to eBPF as well? I feel like you're trying to win a purely semantic argument which doesn't really advance the discussion.
eBPF programs are supposed to finish in finite time, which is verified at load time. The kernel also provides some data structures like maps/arrays, which are of constant size as far as I can recall. Hence it is not suitable for general-purpose programs, unlike WASM.
eBPF is not for general-purpose computation; it mostly functions as a sophisticated query language for extracting data. It doesn't even have loops. It is not comparable to WASM.
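A toy model of what "constant size" means for maps, assuming semantics loosely like an eBPF hash map whose `max_entries` is fixed at creation. The class is hypothetical, not a kernel API: once the map is full, inserting a new key fails instead of allocating more memory, so the program's memory footprint is known before it runs.

```python
from typing import Optional

class FixedMap:
    """Hypothetical fixed-capacity map; capacity is chosen once, at creation time."""

    def __init__(self, max_entries: int) -> None:
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self.max_entries = max_entries
        self._data: dict[int, int] = {}

    def update(self, key: int, value: int) -> bool:
        # Overwriting an existing key never grows the map.
        if key not in self._data and len(self._data) >= self.max_entries:
            return False  # full: the update fails rather than allocating
        self._data[key] = value
        return True

    def lookup(self, key: int) -> Optional[int]:
        return self._data.get(key)

    def delete(self, key: int) -> bool:
        return self._data.pop(key, None) is not None
```

Bounded memory plus bounded execution is what makes the load-time guarantees possible, and it is also why general-purpose code that needs growing data structures is awkward to express this way.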
eBPF does have loops and it is absolutely for general purpose computation.
The verifier that exists in the Linux kernel works very hard to make sure that you are not allowed to load programs with unbounded loops but you most certainly can when those restrictions are lifted.
This is trivial. When people talk about eBPF they are talking about the implementation; it's like saying you could put Lisp code in C, if only you modified the C compiler.
The implementation that exists in Linux is perfectly capable of running unbounded loops. But it runs your program through a function that rejects your program if it can't prove that it terminates.
The point being that you could rip the eBPF implementation out of the kernel, remove the verifier check, and have a very usable VM.
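The claim here, that the verifier is a gate in front of an interpreter that can otherwise run loops just fine, can be illustrated with a toy in Python. The instruction encoding and the `verified`/`max_steps` parameters are hypothetical; this is not the kernel's interpreter or verifier. The toy verifier only accepts forward jumps, so the program counter strictly increases and every accepted program halts. `max_steps` stands in for whatever a real unverified mode would need (preemption, a watchdog) to stop a runaway loop.

```python
from typing import Optional, Sequence

MASK64 = (1 << 64) - 1  # eBPF registers are 64 bits wide

class VerifierError(Exception):
    pass

def verify(prog: Sequence[tuple]) -> None:
    """Toy verifier: reject backward jumps, so every accepted program terminates."""
    if not prog or prog[-1][0] != "exit":
        raise VerifierError("program must end with exit")
    for pc, insn in enumerate(prog):
        if insn[0] == "jlt":
            target = pc + 1 + insn[3]
            if target <= pc:
                raise VerifierError(f"back-edge from insn {pc} to {target}")
            if target >= len(prog):
                raise VerifierError(f"jump out of range at insn {pc}")

def run(prog: Sequence[tuple], r1: int = 0, *, verified: bool = True,
        max_steps: Optional[int] = None) -> int:
    """Interpret a toy program; r1 is the input, r0 the return value."""
    if verified:
        verify(prog)
    regs = [0] * 11  # r0..r10, as in eBPF
    regs[1] = r1 & MASK64
    pc = steps = 0
    while True:
        if not 0 <= pc < len(prog):
            raise RuntimeError(f"pc {pc} out of range")
        if max_steps is not None and steps >= max_steps:
            raise RuntimeError("step budget exhausted")
        steps += 1
        op, *args = prog[pc]
        pc += 1
        if op == "exit":
            return regs[0]
        elif op == "mov":   # dst = imm
            regs[args[0]] = args[1] & MASK64
        elif op == "addi":  # dst += imm
            regs[args[0]] = (regs[args[0]] + args[1]) & MASK64
        elif op == "add":   # dst += src
            regs[args[0]] = (regs[args[0]] + regs[args[1]]) & MASK64
        elif op == "jlt":   # if a < b (unsigned): pc += off
            a, b, off = args
            if regs[a] < regs[b]:
                pc += off
        else:
            raise ValueError(f"unknown opcode {op!r}")

# r0 = 1 + 2 + ... + r1, written with a backward jump (a loop).
SUM_TO_N = [
    ("mov", 0, 0),      # r0 = 0
    ("mov", 2, 0),      # r2 = 0 (counter)
    ("addi", 2, 1),     # loop: r2 += 1
    ("add", 0, 2),      # r0 += r2
    ("jlt", 2, 1, -3),  # if r2 < r1 goto loop
    ("exit",),
]
```

`run(SUM_TO_N, 10)` is rejected with `VerifierError`, while `run(SUM_TO_N, 10, verified=False)` returns 55: the same interpreter, with the gate removed.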
Not true: the instruction set of eBPF itself /is/ Turing complete. Linux is one implementation of it, and the verifier currently imposes restrictions in order to not destabilize the kernel (e.g. infinite loops). But that doesn't mean it's not Turing complete. E.g. in future there could potentially also be a mode that allows running unverified eBPF programs. Think of it like kernel modules, which are also not verified for safety but can be loaded with the right permissions, like CAP_SYS_MODULE. In future I can imagine a similar mode/option for eBPF as well, as long as the user has the right permissions to do so.