Not until hardware became fast enough and compilers performant enough to support this, and not until everyone had to learn their lessons with approaches like the JVM.
I meant that automatic garbage collection is not always the way to go for some people, especially in the days when JVM GCs were not as advanced as they are now.
Ok. Yes, ideally you'd allow the user of your VM to implement their own GC. But to make this fast you typically need to support special memory barrier instructions, like a modern CPU does.
Anyway, the GC-included approach is of course the one taken by Javascript and JVM.
In the mainstream it already caught on with UCSD Pascal and VB (it only compiled to native code as of VB 6.0, and P-Code was still an option), and in the server room it has been around for decades, with IBM and Unisys mainframes being the survivors of that approach.
Is this a reference to another old language whose name we mispronounce today? Or just a joke to say that by 2035 JavaScript will be so irrelevant that we won't even know how its name was pronounced?
The direct inspiration of WebAssembly was asm.js, which is what the talk is about. A future where everything targets asm.js and vendors don't take the next step and create wasm.
The talk predicts a lot of what is happening around wasm today.
Your response is technically correct but misses the entire point of the video.
To make a tl;dr: the video talks about a technology called asm.js that other applications end up being compiled into so they can run at near-native speeds on any system that supports asm.js. This technology becomes a compilation target, meaning that JS eventually gets forgotten as a language because everyone just compiles everything into asm.js, which is executed natively by the kernel.
Just do a s/asm.js/WASM/g and it'll suddenly start making sense.
Yes, the video is pretty amazing - Gary in 2014 managed to predict pretty much everything that's happening with WASM nowadays: the push for safe execution of software compiled in unsafe languages, a common VMesque instruction set, web browsers implementing it first and pushing the frontier in this regard, and now kernels also starting to be capable of running that in order to avoid the cost of switching rings.
What we're yet to see is this last point becoming the default, so that execution of user programs comes full circle by returning to kernel mode, except now a trusted WASM-to-native compiler/runtime is responsible for ensuring system safety. That might happen in the next few years.
Also the Common Lisp OS Mezzano[0] runs like that. It runs Doom and Quake by means of lowering C to LLVM-IR which is then lowered to a subset of CL which is then compiled by the trusted native Mezzano compiler.
I was playing around with server-side wasm by way of OpenCL compiled to that and interfaced via Deno. It felt pretty cool until I realized wasm is 32-bit and this limits memory to 4GB for whatever process you've compiled in it. Unfortunately this obliterated it as an option for our use case.
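For reference, that ceiling falls directly out of wasm32's memory model: linear memory grows in 64 KiB pages, and a 32-bit index can address at most 2^16 of them. A quick sanity check of the arithmetic:

```python
# wasm32 linear memory: 64 KiB pages, addressed by a 32-bit index
PAGE_SIZE = 64 * 1024   # 65,536 bytes per page
MAX_PAGES = 2 ** 16     # at most 65,536 pages fit under 32-bit addressing

max_bytes = PAGE_SIZE * MAX_PAGES
print(max_bytes == 2 ** 32)      # True: exactly the 32-bit address space
print(max_bytes / (1024 ** 3))   # 4.0 GiB addressable, total
```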
The idea is still really interesting, though. It's almost like micro-containerization. I was reading this and it's what sold me on attempting it:
The Memory64 proposal is coming along, with experimental support in Firefox, Chrome, Wasmtime, Node.js, and Deno. ([0] has the status of the current proposals with a supported matrix)
Hopefully it'll get fully supported soon!
wasm doesn’t have threads right now, so I wonder if you couldn’t just make multiple wasm processes and implement message passing between them to get around this limitation? I imagine if you’re building something where 4 gigs of Ram is a limitation, it would also benefit from using multiple CPU cores and you’d have to do multi-process for that anyway.
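A host-side sketch of that idea, using Python's multiprocessing as a stand-in for separate wasm instances (each worker represents an instance with its own sub-4 GiB linear memory, and the host shuttles messages between them; real message passing would go through the wasm runtime's import/export boundary instead):

```python
# Sketch: partition work across processes, each standing in for a
# separate wasm instance with its own (<4 GiB) linear memory.
from multiprocessing import Process, Queue

def worker(inbox: Queue, outbox: Queue) -> None:
    # Each "instance" receives one shard, sums it, and reports back.
    shard = inbox.get()
    outbox.put(sum(shard))

def parallel_sum(data, n=4):
    shards = [data[i::n] for i in range(n)]   # split across n instances
    inbox, outbox = Queue(), Queue()
    procs = [Process(target=worker, args=(inbox, outbox)) for _ in range(n)]
    for p in procs:
        p.start()
    for s in shards:
        inbox.put(s)
    total = sum(outbox.get() for _ in range(n))
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    data = list(range(1_000))
    print(parallel_sum(data) == sum(data))  # True
```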
So the performance win appears to be from relying on sandboxing to safely run everything in ring zero so they can bypass the overhead of system calls. Which is actually pretty cool, and depending on the workload could actually reasonably lead to performance being better than user space.
It's always refreshing to see interest in WebAssembly and the Wasmer projects!
This kernel project is a bit outdated since it's not using the latest Wasmer API (3.1), but it would be great to see the community pick it up and move it forward.
By making Wasm run in the kernel we were able to get really great runtime syscall speeds, since the program doesn't have to pay the cost of crossing kernel protection rings.
90% accurate. If you are on a single core system, you might avoid a lot of syscalls, yes. But the kernel is still going to be context switching your work out to actually do the io.
Sure. I guess the point is that the kernel still does work.
There's no official kernel mechanism where IO is done entirely in userland (unlike Intel's DPDK and SPDK). The kernel will still be context switching and doing IO. Single core was just an extreme example to make that visible and obvious, but the core point is that the kernel needs time too during io_uring, whatever the core count.
So long as you have multiple cores, you can make it so that userland and the kernel are on different cores with nothing else on them. You can also disable interrupts and just spin continuously both the user and kernel thread.
Are you proposing giving up a core or more of your system to the kernel, and moving your data between cores to ship it, as a win? There are scenarios where I can picture that paying off, where ultra-low-latency compute is key but higher overall latency is fine, but this technical possibility doesn't generally excite me.
We may technically be at 100% no context switching, but we killed 1/8th of our cores or whatever and stressed the fabric a lot more, just to satisfy a technical constraint we set up that sounds good but is actually foolish to ask for.
Going totally userland is probably smarter. Go all in on SPDK or DPDK (storage/networking). You'll need dedicated networking for the app. In general though, I think your top post was 90% right, and that's good enough, and right now hunting for 100% is a mis-goal.
Using DPDK not only requires dedicating one core to it, but also very often dedicating the NIC to your app. It's also a very heavyweight and complex framework, and disables all kinds of security.
io_uring is a compromise with a lot of advantages over that, and can be configured in different ways depending on how hard you want to optimize things.
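The stdlib doesn't expose io_uring, but the batching idea it relies on can be shown with vectored IO: `os.writev` submits many buffers in one syscall where a naive loop would pay the user/kernel crossing once per buffer. (A loose analogy only; io_uring goes further by queueing submissions and completions in memory shared with the kernel.)

```python
import os

buffers = [b"hello ", b"io_uring-style ", b"batching\n"]

r, w = os.pipe()

# Naive: one write() syscall, i.e. one ring crossing, per buffer:
#   for buf in buffers:
#       os.write(w, buf)

# Batched: a single writev() submits all buffers in one syscall.
written = os.writev(w, buffers)
os.close(w)

print(written == sum(len(b) for b in buffers))  # True
print(os.read(r, written))  # b'hello io_uring-style batching\n'
os.close(r)
```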
If you give up entire cores and make your fabric soak the cross-core traffic, the first cost alone is probably worse in almost all cases than just allowing context switching, and then it only gets worse from there.
This is a parade of disingenuous posting that favors minor technicalities over the main use case and reality. Shame on you. I was trying to inform a little while 90% agreeing, and you have brought confusion and misdirection at every turn. Seriously bad mojo, just awful. Almost no users will experience what you describe. io_uring is still phenomenally good, excellent, a world better, but you are using a 0.001% case, a technical possibility, to say you aren't wrong, while admitting nothing and offering no caveats to the 99.999% who won't experience what you are proposing. Rein in your ego and be reasonable; think of all the people you are misleading, even just a little bit.
If you seriously think io_uring completely supplants any and all reason for projects like kernel-wasm, I continue to disagree. I'd agree that io_uring captures a huge amount of the potential value and makes things much better. But there are still barriers that get crossed in io_uring, and considering other options is, in my humble view, interesting. I have seen zero willingness on your side to entertain any such possibility.
You're the one derailing the thread. I made a factual statement that is true. You argued it wasn't. I explained how it was true. Then you keep insisting with "but [...]" that are all irrelevant to the problem that this solution addresses.
Being in the kernel alone doesn't remove interrupt-driven context switching, or being scheduled out by other kernel threads. Those are all completely orthogonal concerns, which can all be fixed without needing to put more user code in the kernel.
The single-threaded example was exactly my counter to this kind of narrow-minded disregard. Yes, there are interrupts happening. But whether IO has to bridge the user/kernel barrier is still the main question here, and io_uring reduces that burden but emphatically does not free us from it; it simply defers, batches, and reduces the number of syscalls. You seem extremely unwilling to recognize this, and I don't get why you double down again and again, relying on 0.001% technical possibilities that almost no user would experience to justify claims that are essentially misleading.
You still haven't shown any compromise, and I think almost no one would agree with you. io_uring is great, but as much as you dodge the fact, there's still a kernel/user barrier, and there are systems like kernel-wasm or DPDK that keep processing entirely on one side or the other. As much as you are unable to admit it, that is an obvious, clear, and essential advantage.
> I didn't. io_uring allows to remove context switching entirely. That is a factual statement.
If you use io_uring, there is more kernel work than there otherwise would be. Work must still transit the user/kernel barrier, and that has a cost, which other schemes can avoid. Simple as that. Stop throwing smoke bombs and be real. Can you show any recognition whatsoever, any attempt to acknowledge a single statement I've made (as I have done with yours), rather than blowing up every single thing I've said? It seems not. This reads like bad-faith posting, not done in the spirit of finding out and discussing.
Yeah, basically by bypassing the OS VM memory switching/context-switch with their own brand of memory-switching/context-switching, but with one notable difference: stackless internal representation (IR) bytecodes.
Can we open up a new class of vulnerabilities in Mitre?
We already have things like io_uring to achieve this speedup.
As usual, this benchmark is comparing against an extremely naive implementation. It even uses a non-optimizing compiler!
A fair benchmark would take advantage of all the features that native code has available - huge pages, CPU acceleration, io_uring, and, most obviously, a compiler that supports -O2.
If they did that, the native code would smoke wasm.
To be fair, almost any JIT VM is capable of being "faster than native" (by which is meant pure AoT) in some circumstances because it has more information on the environment and circumstances of execution.
And faster than "native" means faster than userspace...
And by faster they mean IO performance, which is fine in principle, but people generally think about compute when they care about the performance overhead of WASM: there is nothing about WASM that should make IO slower than native, but everything about it makes compute slower than natively compiled programs.
I'm surprised no one is mentioning eBPF. I know the goals are slightly different but there seems to be a lot of overlap, e.g. would we really want both runtimes in the kernel?
I was thinking that these comments about the death of JS, in a thread about running WASM in OS kernel mode rather than inside the browser, are coming from ChatGPT-type bots, or that the commenters are product managers with no clue.
Sorry but respectfully, you might want to research things before jumping to conclusions. WebAssembly doesn’t have anything to do with a web browser. It’s literally just a virtual machine standard. The standard was written with web browsers in mind (i.e., secure sandboxes within the browser environment), but nothing about WebAssembly’s semantics and runtime environment requires a web browser.
When I put webassembly into google the very first result has this to say:
> WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine.
Yes, browsers implement this nowadays, but there are other implementations as well. You can compile it to LLVM bitcode, for example, and there's another implementation for GraalVM, etc.
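To make the "stack-based virtual machine" part of that definition concrete, here's a toy Python interpreter for a few wasm-flavored i32 opcodes (purely illustrative: real wasm is a validated binary format, and this sketch only borrows its instruction names):

```python
# Toy stack machine evaluating wasm-like i32 instructions.
def run(program):
    stack = []
    for op, *args in program:
        if op == "i32.const":
            stack.append(args[0])                # push an immediate
        elif op == "i32.add":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFFFFFFFF)   # wrap to 32 bits
        elif op == "i32.mul":
            b, a = stack.pop(), stack.pop()
            stack.append((a * b) & 0xFFFFFFFF)
        else:
            raise ValueError(f"unknown op {op}")
    return stack[-1]

# (2 + 3) * 4 — operands go on the stack, ops consume them:
print(run([("i32.const", 2), ("i32.const", 3), ("i32.add",),
           ("i32.const", 4), ("i32.mul",)]))  # → 20
```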
WASM isn't really "native". A C/Rust/etc compiler can target it, but it's very much a non-native ISA, with its own "syscall equivalents", its own limitations, etc.
But it doesn't have to run in a runtime; it can be compiled directly to a platform-native binary. Internally it has limitations, as any sandboxed code would, but it lets you take native code and compile it to a native binary with sandboxing built in.
The reason why it doesn't quite work out is that typically such a thing, AOT-compiled into native code, still carries its limitations and assumptions with it. It's technically native, but it experiences the native APIs only through a narrow peephole in the fence.
If we let this meaning of "native" take over, we're gonna need another word for actually native things.
That's fair, although I guess "having a runtime" can also refer to things like running a garbage collector (which WASM doesn't but Python does, and even Go does), even if it's part of the same binary. But we're entering the territory where these terms start to get poorly-defined I think (new tech is pushing the boundaries!)
Subjectively I think WASM's feature and non-feature list makes it plausibly kernel-friendly in a way that e.g. Python isn't, but there may not be a concrete line to draw around that.