Why does WebAssembly need the relooper algorithm, instead of gotos? (2019) (troubles.md)
144 points by tbodt on Feb 23, 2020 | 54 comments



"There is one thing that may have been a major factor to the decision not to adopt arbitrary CFGs for WebAssembly control flow. I believe that V8 is an exception to most compilers in that it doesn’t represent code as a CFG at all - it maintains roughly JS-compatible control flow all the way from front-end to codegen. In order to support this V8 would have to be converted to use CFGs, they’d have to implement something like Relooper internally, or they’d have to write a new WebAssembly runtime from scratch. Google was and is a major roadblock to getting this implemented as they have multiple people on the committee in charge of WebAssembly who have veto power."


> they have multiple people on the committee in charge of WebAssembly who have veto power

Note that even if Google did not have people on that committee, they'd just not implement the feature in their browser, and the feature would be dead in the water. Outside of niche cases where browser choice is controlled, no web dev would adopt features that > 70% of users can't use. So the committee only reflects power dynamics that exist in the greater browser sphere. In fact, non-Google browsers probably have more seats than they have influence by raw market share.


> So the committee only reflects power dynamics that exist in the greater browser sphere.

It doesn't only reflect them, but also amplifies them. Quietly nobbling things in committee is surely a lot easier than having to field questions about why you aren't implementing Web standards (insert quotation marks as desired). There's also the danger of being embarrassed when someone writes a library which runs slowly on your browser and fast on others.


And they already had PNaCl, which was literally vetoed by the other browser vendors. Politics.


Calling it “politics” is a disservice. PNaCl was closely tied to Chrome (+ PPAPI), but it paved the way for WASM. Mozilla’s asm.js was a different step leading in the same direction.


Yes, it was politics. It was open source and others were invited to contribute; instead, Mozilla came up with the convoluted asm.js as an alternative.

And it remains to be seen when WASM will reach feature parity with PNaCl; most likely five more years or so.

It is incredible how politics slow down innovation.


Eh, PNaCl had its own share of problems (the only feature it even had over asm.js was pthread-style threading, but performance was about even, and first-time startup speed for a new code blob was a lot slower than both asm.js and Wasm).

But as the "whole developer package", PNaCl quickly fell behind emscripten:

The whole feature area of interacting with the JavaScript and web-API side was basically non-existent, or at best difficult (it felt more like JNI in the Android SDK/NDK), while emscripten treated this as an important feature to get right from the start.

Emscripten realized early how important it is to support porting existing code bases and added shims for quite a few popular cross-platform APIs.

Working with the PNaCl SDK was just barely better than the Android NDK (which is pretty much the yardstick for the worst developer experience possible), and development just wasn't as fast and "pragmatic" as with emscripten and asm.js (despite emscripten basically being a "one man show" for a long time).

TL;DR: PNaCl nostalgia considered harmful ;)


WebAssembly hype is also considered harmful.

"Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code"

https://www.usenix.org/system/files/atc19-jangda.pdf

SIMD in PNaCl

https://developer.chrome.com/native-client/reference/pnacl-c...

Currently at phase 3 on WebAssembly

GC in PNaCl

https://chromium.googlesource.com/native_client/pnacl-llvm/+...

Currently at phase 1, on hold due to reference types, and it still remains to be seen how to proceed.

Tail Calls in PNaCl

https://developer.chrome.com/native-client/reference/pnacl-b...

Currently at phase 3.

WebAssembly is so much better that the migration documentation from PNaCl to WebAssembly is full of "Not Implemented", "In future HTML 5 API", "Being worked on", ....

https://developer.chrome.com/native-client/migration

The irony of all of this is that Chrome and Safari (mostly due to iOS) are what most developers care about nowadays, so we might end up with WebAssembly that works best on Chrome, coming full circle.

And the current WebAssembly development experience is also not something to be proud of; see Ashley Williams's talk at the WebAssembly Summit.

I do agree about the sorry state of the Android NDK, despite many improvements in the last 10 years; to the point that I think making it such an unbearable experience to use C and C++ is one of the mechanisms Google uses to improve Android's security.


I actually agree that WebAssembly is currently overhyped ;) But I guess that's one of the side effects of being a 'web technology'.

(e.g. the difference in code size and performance versus plain old asm.js isn't nearly as big as many people think).

> making it such an unbearable experience to use C and C++ is one of the mechanisms Google uses to improve Android's security

...as if Android has a spectacular security track record vs iOS ;)


I guess many just aren't paying attention to iOS land:

https://support.apple.com/en-us/HT210918


WebAssembly makes a number of strange choices that seemingly fly in the face of convention; this is one of them. It would be really nice if someone took some time to make a FAQ on the website to answer questions like “why doesn’t WebAssembly do x like everything else”, because otherwise it just seems that they’re doing things differently because they can…


A lot of those bad WebAssembly decisions are carried over from Emscripten/Asm.js and don't have any real rationale behind them beyond that.

Obviously, since JS doesn't support arbitrary control flow, Emscripten/Asm.js would need to convert to some simplified form of control flow (although not necessarily fully structured). It seems that this solution was carried over to WebAssembly without much deep thought given.

There does seem to be some backwards association with Java's bytecode verification complexity issues (which are the result of a bad design, not unstructured control flow). In the big GitHub Issue (https://github.com/WebAssembly/design/issues/796) on the topic, one of the core V8 developers even claims that the Java experience justifies the WebAssembly decision. However, the funclets proposal for WebAssembly (https://github.com/WebAssembly/funclets/blob/master/proposal...) shows that you can support arbitrary control flow with linear time verification if you choose a different design than Java.


> It seems that this solution was carried over to WebAssembly without much deep thought given.

That's definitely not true - a lot of thought was given at the time to this. I'm not saying the optimal conclusion was arrived at (nor am I saying the opposite), but it definitely wasn't for lack of effort and consideration.

I don't remember all the details, but some of the factors were:

* Experience with structured control flow in compilers like Emscripten, Mandreel, and Cheerp, which showed that it's pretty easy to re-structure a CFG in the toolchain, and that there aren't significant downsides when the VM receives that output.

* VM people being concerned about the efficiency and complexity of non-structured control flow, and preferring the structured approach that has known, simple ways to construct SSA, etc. (see titzer's comment for more details)

* Not having a proposal like funclets on the table. (The idea for that came much later.)


WASM is a typed bytecode format where the stack state must be validatable at compile-time, guaranteeing that no validated program can mess up the stack at runtime. Having unstructured control flow would make it pretty much impossible to do that, wouldn't it?


To get linear time validation, you need conditions at each control flow transfer that can be checked locally but together imply that stack usage in the function is well-formed.

The funclet approach (in more ordinary terminology) is to ensure that the stack difference from function invocation upon entry to a basic block is the same across entries to that basic block, and that those additional stack entries all have the same (statically specified) type in each invocation. This is very natural when generating code from an SSA optimizer.
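
A minimal sketch of that kind of check, assuming a hypothetical encoding where every block declares the value types on its entry stack (illustrative only, not the actual funclets binary format):

    # Hypothetical opcodes and per-block stack signatures; each block is
    # checked in isolation against the declared signatures, so the whole
    # pass is a single linear walk with no fixpoint iteration.
    def verify(blocks, sigs):
        for name, code in blocks.items():
            stack = list(sigs[name])        # stack types on entry, by declaration
            for op, *args in code:
                if op == "const.i32":
                    stack.append("i32")
                elif op == "add.i32":
                    if stack[-2:] != ["i32", "i32"]:
                        raise TypeError(f"{name}: add.i32 needs two i32s")
                    stack[-2:] = ["i32"]
                elif op == "br":
                    # A branch only has to match the declared signature of
                    # its target; we never look inside the target block.
                    if stack != list(sigs[args[0]]):
                        raise TypeError(f"{name}: stack {stack} does not match {args[0]}")
        return True

    blocks = {
        "entry": [("const.i32", 1), ("br", "loop")],
        "loop":  [("const.i32", 2), ("add.i32",), ("br", "loop")],
    }
    verify(blocks, sigs={"entry": (), "loop": ("i32",)})

Because a branch only has to agree with the declared signature of its target, each block can be checked on its own, which is what keeps the whole pass linear.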

Java bytecode traditionally didn't contain any of this branch target or type information, and required the bytecode verifier to do an iterative combined control/data-flow analysis. Even now that they added some of that type information with stack maps, it's still more complicated than it needs to be due to the historical legacy of this approach.


Nah. I made a low-level VM in 2001 similar to the funclets proposal cwzwarich linked to (Hi, Cameron), and verification was a fast local check. Each funclet had a signature: you collect the signatures before you start verifying the code, then track changes to the stack as you walk through the code, using the signatures for the effects of any calls you encounter, and finally insist that the funclet matches its declared effect. Control flow within a funclet is reducible. (I didn't call them funclets, but it sounds like the same basic scheme on glancing through. One simplification: my VM could loop only through tail calls.)


Not impossible, as in the JVM there'd be a pass which verifies that the stack usage is deterministic along all program paths. It's nice not having to do that verification step, but stack usage still has to be checked anyway, so whether it's actually a win is debatable.


Completely unstructured, yes. However, there are many stack-based VMs with both gotos and verification. Generally speaking, the constraint is that for any label that is a jump target, the stack must have the same count and type of elements for all possible origins. It's not hard to verify.


There are other ones that can't reasonably be blamed on legacy, though; for example, the lack of memory permissions.


By memory permissions, do you mean setting page-level permissions, like mprotect on Unix? I’d love to see WebAssembly add support for those, along with mmap-like functionality. But there, at least, unlike with control flow, there are some fundamental portability issues to consider. The only performant way to implement page-level permissions is using the native MMU. But different targets have different page sizes: most Unix systems have 4 KB pages, but iOS has 16 KB pages; Windows has 4 KB pages which can be protected individually, but separate memory mappings can only be on 64 KB-aligned boundaries; and Linux can be configured with multiple page-size options. And some little microcontrollers don’t have an MMU at all, yet there are people trying to run WebAssembly on microcontrollers for some strange reason.
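
As a tiny illustration of that variability (Python, just querying the host; the rounding helper is a hypothetical example of what a runtime would have to do with any page-level request):

    import mmap

    # Protection granularity of the host: 4 KB on most Linux/x86 systems,
    # 16 KB on iOS / Apple Silicon; Windows additionally only places new
    # mappings on 64 KB boundaries.
    print("page size:", mmap.PAGESIZE)
    print("mapping granularity:", mmap.ALLOCATIONGRANULARITY)

    def round_up(nbytes, granularity=mmap.PAGESIZE):
        # Any hypothetical wasm "mprotect" region would silently be widened
        # to the host's granularity, whatever that happens to be.
        return -(-nbytes // granularity) * granularity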


> By memory permissions, do you mean setting page-level permissions, like mprotect on Unix?

Yes, exactly.

> But different targets have different page sizes: most Unix systems have 4 KB pages, but iOS has 16 KB pages; Windows has 4 KB pages which can be protected individually, but separate memory mappings can only be on 64 KB-aligned boundaries; and Linux can be configured with multiple page-size options.

I haven't looked into it much, but perhaps WASI could abstract some of this, or make this information available to applications?

> And some little microcontrollers don’t have an MMU at all, yet there are people trying to run WebAssembly on microcontrollers for some strange reason.

It might be a little bit overhyped ;)


I assume that an important design criterion is to enable slim and fast verification, and single-pass compilation from WASM to SSA in JIT backends and native compilers. To do that you want a reducible flow graph.

This will become more important especially if WebAssembly is moving beyond the browser.


META: I don't understand why some authors go out of their way to obscure important pieces of information:

* Who wrote this article?

* When was this written?

I was lucky to find these pieces of information on GitHub [1].

1: https://github.com/Vurich/troubles.md/blob/master/content/po...


It's a common problem with news articles too - you get linked something and you don't know whether it relates to a current event or something interesting 3 years ago.


Something this article doesn't appear to spell out is that the structured format WASM is constrained to means that program CFGs will never contain irreducible loops. As noted, this approach can make it harder for the AoT compilers (they have to untangle the knots to generate valid WASM bytecode), but it makes it much easier for the JIT compilers to perform analysis and transformations at runtime. In that context, it seems like a sensible engineering decision.
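
For a concrete sense of what "irreducible" means here: a loop is irreducible when it can be entered at more than one point, and a quick way to test a whole CFG is repeated T1/T2 reduction. A small Python sketch, with made-up example graphs:

    # T1/T2 reducibility test: repeatedly delete self-loops (T1) and fold any
    # non-entry node with a single predecessor into that predecessor (T2).
    # The CFG is reducible iff it collapses down to the entry node alone.
    def is_reducible(succ, entry):
        succ = {n: set(s) for n, s in succ.items()}
        changed = True
        while changed:
            changed = False
            for n in succ:
                succ[n].discard(n)                      # T1: drop self-loops
            preds = {n: set() for n in succ}
            for n, targets in succ.items():
                for t in targets:
                    preds[t].add(n)
            for n in list(succ):
                if n != entry and len(preds[n]) == 1:
                    (p,) = preds[n]
                    succ[p] |= succ.pop(n)              # T2: fold n into its
                    succ[p].discard(n)                  #     unique predecessor
                    changed = True
                    break
        return set(succ) == {entry}

    # ordinary while-loop shape, single entry into the loop header -> True
    print(is_reducible({"a": {"b"}, "b": {"c", "d"}, "c": {"b"}, "d": set()}, "a"))
    # two blocks jumping into the middle of each other -> False (irreducible)
    print(is_reducible({"a": {"b", "c"}, "b": {"c"}, "c": {"b"}}, "a"))

Relooper-style algorithms (and node splitting) exist precisely to rewrite graphs like the second one into the first shape.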


The article says about the Stackifier algorithm: "no link for the latter, it’s only mentioned by this name in LLVM internal discussions".

Since the release of this article, I wrote a blog post explaining how Stackifier works, for those who are interested:

https://medium.com/leaningtech/solving-the-structured-contro...


Thanks! I used your article to successfully implement Stackifier.


I am glad it helped somebody :)


sigh

I am partly responsible for why WebAssembly is this way. You can thank/blame me for the if-else bytecodes. They are indeed a form of compression, as they don't add expressive power. I measured carefully, and they make a big difference in code size. That's why they're there!

The structured control flow requirement is to benefit all consumers. It is only a burden on producers that come from CFGs, not from ASTs and other tree-like IRs. If you have a dominator tree, then you can generate structured control flow in linear time. LLVM does this a particular way, but there are straightforward algorithms.
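
For context, and not the structuring pass itself: the dominator information such an algorithm starts from can be computed with a simple fixpoint iteration. A simplified Python sketch (production compilers use the Cooper/Harvey/Kennedy or Lengauer-Tarjan algorithms instead):

    # Dominator sets by fixpoint iteration: dom(entry) = {entry}; for every
    # other reachable node n, dom(n) = {n} plus the intersection of dom(p)
    # over its predecessors p. The immediate-dominator tree (what a
    # structuring pass walks) falls out of these sets.
    def dominators(succ, entry):
        nodes = set(succ)
        preds = {n: set() for n in nodes}
        for n, targets in succ.items():
            for t in targets:
                preds[t].add(n)
        dom = {n: set(nodes) for n in nodes}
        dom[entry] = {entry}
        changed = True
        while changed:
            changed = False
            for n in nodes - {entry}:
                if not preds[n]:
                    continue                            # unreachable node
                new = {n} | set.intersection(*(dom[p] for p in preds[n]))
                if new != dom[n]:
                    dom[n] = new
                    changed = True
        return dom

    # diamond CFG: a -> b, c; b, c -> d
    print(dominators({"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}, "a"))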

No, this wasn't Google throwing around its veto power or something like that. There is a good reason why control flow is structured, as hinted in comments here.

1. Structured control flow rules out irreducible loops. Irreducible loops cause problems for all JIT compilers in all browser engines, not just V8, and even in JVMs. Things get really complicated, particularly in register allocation. [1]

2. Structured control flow guarantees a stack discipline for the use of labels, mirroring the stack discipline for values. This is not only a nice symmetry, it means that a consumer that needs to allocate space per label can reuse that space as soon as a control construct is closed. That is essentially optimal for use of consumer space resources.

[1] No kidding. If you have an irreducible loop in Java bytecode, which is possible, you will never be JITed and will get stuck running 100x slower in the interpreter. We thought this through very carefully in V8. If you allow irreducible loops in Wasm, you force all engines to either stick to their lowest execution tier and run 2-100x slower, do relooping themselves, or handle the general case of irreducible loops, spending multiple person-years complicating their optimizing tiers' backends for a case that is incredibly rare (and probably introducing lots of bugs). In V8 we would have probably gone for the relooper option because the other two options are bad. So that's a loss, because now the engine is more complicated, doing a rewrite of the code that could as well be done better and more efficiently offline by a producer. And there is no benefit because the engine's code would be no better than what the producer would have come up with. So we'd choose the lesser of the complexity options, but get no performance benefit, in order to avoid the absurdly bad performance hit of not being able to use the optimizing tier. Bad tradeoff no matter how you slice it, IMHO.

I am fully convinced we made the right choice here.

We should have communicated better, and the relooper algorithm and tools should be textbook, off-the-shelf stuff.


It's bizarre to me that there are complaints about the if-else bytecodes. You can call them "weird" but they're very easy to reason about, make it easy to write toy examples, are easier to generate from a compiler, and are easier to read and understand in disassembly. At the point where the author started talking about customized compression to recover the size gains they should have realized why the bytecodes exist! Early on in the spec process it was very, very useful to have them.

Anyone questioning whether the lack of goto was the result of an invocation of veto power can look at the design repo and see that lots of control flow consideration and discussion happened in the open before the group finally agreed upon a solution. Many of the players involved were not Google employees at the time of the decision (I don't know if they are now):

https://github.com/WebAssembly/design/issues/33

https://github.com/WebAssembly/design/issues/44

I definitely recall that people had disagreements about how control flow should work and had different goals or priorities but it was a pretty detailed and drawn-out decision-making process. I don't think it was really possible for everyone to walk away from the table happy.


As context, at the time of those issues, the control-flow restrictions were a compromise to help get the "MVP" off the ground, with the understanding that "more expressive control flow" was expected to be added later:

https://github.com/WebAssembly/design/blob/master/FutureFeat...


Unfortunately, we don't actually have data which supports this. The only data that was collected at the time showed that if/else compresses better than what wasm has without if/else. There are other ways we could have compressed control flow, but we didn't do the experiments.


> If you have an irreducible loop in Java bytecode, which is possible, you will never be JITed and will get stuck running 100x slower in the interpreter.

The JVM doesn't JIT any irreducible loops?


As far as I understood, the Java language and many other high-level languages can't express such loops (no arbitrary gotos), so JITing them would be a specific optimization for bytecode crafted by other means. Also, it probably doesn't happen often with bytecode generation libraries.

So basically a lot of potential for bugs to optimize a super-rare case that could also be solved instead by whoever produced that code.


Interestingly, I came up with an algorithm similar to Relooper when I translated C to Python code. First I translate all C code to mostly equivalent Python code, but including some gotos, and then I get rid of those gotos by introducing some more loops. The current solution is simple, but the resulting function can be slow.

https://github.com/albertz/PyCParser/blob/080a7b0472444fd366...
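
The generic shape of that transformation (a sketch of the idea, not the linked code itself) is a single loop dispatching on a variable that names the next block:

    # C-ish original with a goto:
    #
    #       i = 0;
    #   top:  if (i >= n) goto done;
    #         total += a[i]; i++; goto top;
    #   done: return total;
    #
    # goto-free Python translation: one loop plus a state variable.
    def sum_array(a):
        total, i, state = 0, 0, "top"
        while True:
            if state == "top":
                state = "done" if i >= len(a) else "body"
            elif state == "body":
                total += a[i]
                i += 1
                state = "top"
            elif state == "done":
                return total

    print(sum_array([1, 2, 3, 4]))   # 10

It always works, but recovering the original loop/if nesting where the control flow allows it is what keeps the result from being slow.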


Could someone reduce the GOTO complaints to the arguments for why it is harmful? (Preferably without the considerations?)


The article cites Dijkstra's "considered harmful" essay, which asserts that "goto" should be removed from "higher level" languages, not "machine language".

Without reading the remainder of the article, this would seem to undermine any further assertions.

UPDATE: after reading the article, it seems the opposite is being shown. my bad.


It’s brought up to prove the opposite. I would suggest reading it through, it’s quite good.


Yeah, I figured that might be happening, once I started reading further. Blame the reader.


Unpopular opinion: GraalVM is what should have been webassembly.

Reasons:
* True polyglotism
* Existing, feature-complete implementation
* Compatibility with the existing JVM platform, and through polyglotism, compatibility with almost any platform/library


GraalVM doesn't solve the same problems as WebAssembly: it has no stable, portable format for those polyglot program definitions.

GraalVM is certainly something you could use as part of a WebAssembly implementation, though.


how is Graal more polyglot than wasm?! Does it support low level languages as well? Besides, it's only one implementation (controlled by Oracle) and will probably remain so forever — who would want to risk a lawsuit by reimplementing it, I wonder.

wasm is already in all major browsers. It's succeeding where no JVM-based approach ever could.


GraalVM does support low level languages, and that's one of the most impressive things about it. For example, TruffleRuby runs both Ruby and its associated C extensions on it, bringing major performance improvements due to sharing an optimizer.


Graal is licensed under the GPLv2. It's free software!


GPLv2 isn't free enough to be used in a commercial browser.


Nitpick: I assume you meant "GPLv2 isn't free enough to be used in a proprietary browser".


Which is why BSD has been such a commercial success and now rules the UNIX clones.


Is there any patent grant to go with it?


WebAssembly exists because there was an incremental path from JS to asm.js to wasm. Better solutions are irrelevant unless there's a similar kind of path to them.


a single existing implementation is a problem, not a feature for the web platform


You can always write a clone if you're willing to throw money at it.


Sounds like much more work than implementing a wasm VM + runtime. Besides, isn’t it owned by a company that is notoriously litigious against reimplementations?


GraalVM has started getting wasm support, so there is still a path forward.


WASM support as in "you can run stuff you can run in GraalVM on WASM" or as in "you can run WASM on GraalVM"? I assume the latter, which is, frankly, relatively unimpressive. It would have been pretty amazing to be able to run everything available in Graal in the browser - but yep, Graal is not a standard, and I'm not sure it could ever be put into one (because of the insane complexity and implementation effort I suspect in there). Would still have been cool, though, one can dream.

WASM will still end up being a pretty universal platform however, with it being everyone's compilation target. A lot like Java, but maybe without some of the mistakes.



