> I believe that's not even using the streaming compilation mentioned in the article
That's correct. Streaming compilation would finish earlier, but might actually benchmark more slowly because you'd be adding in the time that the compiler is idle and waiting for the network to catch up.
Preloading the .wasm file in the test lets us measure just the speed of the compiler, independent of the network.
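For illustration, a minimal sketch of that kind of measurement (the file name and details are made up for the example; this is not the article's actual test harness):

    // Hypothetical: preload the .wasm bytes first, then time only the
    // compile/instantiate step, so the network is out of the picture.
    const response = await fetch('big-module.wasm');   // made-up file name
    const bytes = await response.arrayBuffer();        // download finishes here

    const t0 = performance.now();
    const { module, instance } = await WebAssembly.instantiate(bytes);
    const t1 = performance.now();
    console.log(`WebAssembly.instantiate took ${(t1 - t0).toFixed(1)}ms`);

    // With streaming compilation you'd pass the Response instead, so compilation
    // overlaps the download:
    //   await WebAssembly.instantiateStreaming(fetch('big-module.wasm'));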
An interesting benchmark would be the total energy usage of the CPU while loading a page in each browser. The compiler managing to idle waiting for network packets should theoretically allow some of the CPU cores to enter sleep states. (Not all, since others are still busy rendering the page, and at least one is busy doing whatever non-DMA kernel bits are involved with receiving the network packets.)
It should be the case that if the same amount of work is done, the energy used will be the same. If it takes less work to compile the WebAssembly, then less energy is used (holding all other parameters the same). If you have to idle a CPU, then you probably use more energy (again holding all other parameters the same), because you will spend more time to accomplish the same amount of real work while wasting a little energy on the idled core, which does extra but non-productive work. To run this experiment you cannot let other CPU parameters change as a result of cores being idled (e.g. frequency getting boosted on the non-idle cores by dynamic frequency scaling). Thinking about CPU energy use is interesting :)
The real killer with energy usage in browsers is idle wake ups per second. If you've got a lot of tabs open and they're all running timers, waking up, hitting the network, etc. then they're keeping the CPU from going into low power state and thus wasting a lot of energy.
Even though I'd prefer to use Firefox I tend to stick with Safari due to the battery life advantage which really shows when you open a lot of tabs.
That's for max frequency on a given process node. Power scales with voltage squared. But that doesn't say anything about wasted power. And dynamic scaling screws that up in modern chips.
I believe I could summarize things by saying the only way you can really save energy* doing the same work+ is by using a different semiconductor process (either power/leakage-reduction-focused or smaller).
* For serious values of "energy"
+ Where the same work is not always true for a given task, if one optimizes an algorithm
For chips, power scales with voltage squared. It is also true that P=IV (since both are true, these observations cannot be in contradiction; for chips, the current must itself be roughly proportional to voltage). Glossing over some details, turning a transistor on (off) is the same as charging (discharging) a capacitor. The energy stored on a capacitor is 1/2 C V^2. If you switch the transistor periodically (say with frequency f) you spend 1/2 C V^2 of energy f times per second, and energy per unit time is power. The capacitance is normally ignored when discussing how power changes because for a given design it is a fixed quantity.
I think it is linear for frequency and non-linear for voltage, i.e. P~fCV^2. But in many current CPUs, the feature that adjusts frequency also adjusts voltage. That's why I stipulated that, for my comments to be true, such shenanigans as dynamic frequency (and voltage) scaling must be "turned off." I think the OP was asking what happens to CPU energy when you load the web page with and without the optimized compilation. The OP was interested in core sleep states, but I think that dynamic frequency scaling is a confounding factor. It would be interesting to see the measurements w/ and w/out that feature, perhaps.
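As a rough sketch of why that matters (toy numbers only, assuming the simplified P~fCV^2 model above, and that DVFS scales voltage roughly with frequency; both are simplifications):

    // Relative dynamic power under P ~ f * C * V^2 (toy illustration only).
    const C = 1;                                  // fixed for a given design
    const power = (f, V) => f * C * V * V;

    // Fixed voltage: halving frequency halves power (but doubles runtime,
    // so the energy for the same task is roughly unchanged).
    console.log(power(1.0, 1.0) / power(0.5, 1.0));   // 2

    // With DVFS, lower frequency also allows lower voltage, so power falls
    // roughly with the cube of frequency, which is exactly the confounding
    // factor I'd want disabled for the experiment.
    console.log(power(1.0, 1.0) / power(0.5, 0.5));   // 8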
To increase the frequency, you also have to increase the voltage so that the transistors charge faster, otherwise they won't be able to switch in the shorter time.
This is on macOS 10.13.2. I'd love to run Firefox, but the battery savings and reduced heat from using Safari make it too hard to pass up in this regard.
A two-tier JIT. Interesting to see tiered JIT compilation catch on the way it has. I seem to remember a few years ago reading that the Java HotSpot team had given up on tiered JIT compilation as being not worthwhile.
How far we've come. A whirlwind tour of today's JITs (apologies for the million links):
The only downside I'm aware of is that it increases the pressure on the code cache. If your code cache is not large enough, it will thrash as methods are discarded and then recompiled. We had significant performance problems with a server and it took quite a while until we realized that was the cause. A cache of 256 MB was more than enough for us running a 2 million LOC monolith under Tomcat, so the absolute memory use isn't that significant. (Reference we found while researching: http://engineering.indeedblog.com/blog/2016/09/job-search-we...).
Once you know this is an issue, it's easy to monitor, but it is one more thing that can go wrong in the JVM.
Oops, I wasn't clear. I'd meant that, if I recall correctly, the HotSpot team initially experimented with combining the 'client' and 'server' JITs for tiered compilation, but decided it was a lot of complexity for little gain, and didn't commit.
Only a couple years later did they re-attempt it and stick with it.
I could be mistaken here, and I wasn't able to find anything online to support me.
.NET focus always was native code, either AOT with NGEN or JIT on load.
The only variants of .NET with interpreter support were from 3rd party implementations, and the .NET Micro Framework, used in NETduino.
And now their focus seems to be to improve their AOT story.
Another interesting evolution was Android: from Dalvik and its basic JIT, to ART with AOT on installation, to the ART reboot with an interpreter written in assembly, followed by a JIT and an AOT code cache with PGO.
Android optimizes for battery life, but it's also worth noting that Dalvik had a really rudimentary JIT, giving none of the benefits of JIT compilation and all of the drawbacks, so ART with AOT was a good upgrade.
But tiered compilation is in a different league: it's about speculating on what's going to happen based on what the process has witnessed thus far. The point of tiered compilation is to profile and guard things at runtime and recompile pieces of code as conditions change, which is how you can optimize virtual call sites and other dynamic constructs. You can't do that ahead of time, because the missing part is the deoptimizer that can revert optimizations when their assumptions are invalidated.
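As a toy analogy in hand-written JS (this is just to illustrate the shape of speculation and deoptimization, not what an engine actually emits):

    class Circle { constructor(r) { this.r = r; } area() { return Math.PI * this.r * this.r; } }
    class Square { constructor(s) { this.s = s; } area() { return this.s * this.s; } }

    // The profiling tier has only ever seen Circles at this call site, so the
    // optimizing tier speculates on that and "inlines" Circle.area.
    function totalArea(shapes) {
      let sum = 0;
      for (const s of shapes) {
        if (s instanceof Circle) {
          sum += Math.PI * s.r * s.r;   // speculated, devirtualized fast path
        } else {
          sum += s.area();              // guard failed: the analogue of deopting
        }                               // and recompiling with new type feedback
      }
      return sum;
    }

    console.log(totalArea([new Circle(1), new Circle(2), new Square(3)]));

An AOT compiler can't emit the fast path unless it can prove the receiver type, which is exactly the proof it usually can't make for dynamic call sites.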
It's really interesting actually, because you can profile a C++ app and use that to optimize your AOT compilation, but the compiler is still limited by the things it can prove ahead of time, or otherwise it would be memory unsafe.
I wrote "And now their focus seems to be to improve their AOT story.", I didn't say anything about .NET Core.
Should have been more explicit, as I was referring to CoreRT and .NET Native.
> But tiered compilation is in a different league, being about speculating what's going to happen depending on what the process has witnessed thus far.
Just as ART was refactored on Android 7 and 8. ART with pure AOT is only for Android 5 and 6.
Is it actually a JIT? It's just compiling everything unconditionally. I guess the fact that the second tier replaces previously compiled functions with more optimized versions makes it a JIT? Or does the definition of JIT require recompiling in response to information about which code would benefit most?
> Is it actually a JIT? It's just compiling everything unconditionally.
Still counts as JIT in my book, but you're right that it's a bit subtle.
Unix-style configure/build/install isn't considered JIT.
Installing a .Net application is pretty similar, but we don't consider it JIT.
In the usual .Net model, what's distributed is IR rather than source-code. Compilation to native code happens at install time. The build-and-install process is less explicit than the Unix way, and it's less error-prone (fewer dependency issues and issues with the compiler not liking your source code).
Really it's a very similar model to the Unix one, but we call one JIT and not the other.
Oracle Java, of course, only ever compiles to native code at runtime, and never caches native code. 'Proper' JIT. (This may be set to change in the near future though.)
Interestingly, .Net seems to be moving in the direction of full static compilation, or they wouldn't be asking devs to rebuild UWP apps to incorporate framework fixes - https://aka.ms/sqfj4h/
It might be fun to make a source-based distribution where every binary in /usr/bin started off as a link to a script that built and installed the requested executable (over the top of the link), before executing it.
Or how about a FUSE filesystem on Linux to do the same? That sounds like an interesting idea. Just don't make the mistake of accidentally typing some obscenely large binary like firefox, chrome, or clang...
I think it would need to be integrated into the package management system pretty tightly (or have one of its own) to get all of the shared library dependencies.
Do you think for something to be a JIT it must only compile code immediately before it's used?
In that case the only real JIT I know of is basic-block-versioning. I think almost all JITs will compile branches or methods to some extent before they are actually needed.
Yours is probably not a reasonable definition therefore. I think a JIT is just a compiler that can compile as the program is running.
> Do you think for something to be a JIT it must only compile code immediately before it's used?
I mean, that's more or less what the name "just-in-time compiler" implies. I'm aware that the name is not necessarily a precise definition, but I'm not sure how far the definition stretches. Does JIT have a precise agreed-upon definition, or is it somewhat more vaguely defined?
No, these terms never have precise meanings, and debating them too much doesn't achieve much. But if your definition doesn't actually work for any examples of the thing you're defining except one, then it's probably wrong.
Ok, fair enough. I was worried that there was some precise definition I had missed, but if that's not the case, I agree there's no point in debating it.
Well, there's at least one definition that's pretty noncontroversial, if not terribly satisfying or precise: it's not a JIT if you compile well in advance of any indication the program needs to be run.
Whether that lazy-compilation strategy is fine-grained or not isn't clearcut, I believe. I think if you distribute a C program with a bash bootstrapper calling plain old gcc to compile and run the C code only when needed, even gcc might be considered a (coarse-grained, rather rudimentary) JIT in that context.
To me it's a JIT if the compiler is needed to run the code.
If it means compile on start, it still requires the compiler to be used at load time.
Non JIT would mean you can distribute the code without the compiler. If you can't do that, its JITTED or interpreted, if instead of requiring a compiler to be present you require an interpreter.
Unsurprising, considering that browsers load code on demand and, despite the original vision for Java, the JVM and CLR tend to be used for apps for which it's acceptable to have slow startup time.
Although, as always with articles on WebAssembly, it keeps repeating that wasm is faster than JavaScript, without ever mentioning the limitations of wasm wrt. JS (no GC, no interaction with the DOM or with JS libraries besides numbers, etc.). And that means there are zillions of developers who keep being misled into thinking stuff like "Why don't you compile to wasm to make your stuff faster?". That includes absurdities like "We should write a compiler from JavaScript to wasm to make all our JS faster!"
No one is saying you should do your whole application in WASM, though; it's really just like native extensions in any dynamic language: of course, you're not going to get GC or interaction with dynamic parts of the language in your extension, but the reasons to use it might be:
- libraries written in another language, such as SQL.js
- hot spots of an application that can benefit from fast number crunching (e.g., gaming, visualization)
- truly cross-platform at native performance
etc.
I don't think anyone serious enough to use WASM in their application is assuming that using wasm will make all their stuff faster. It won't. It's just another performance tool, with its benefits subject to performance methodologies.
Sidebar: Google Docs is an interesting application from this perspective, given they render the entire application in a canvas, and the application itself is probably not written in JS. I'm excited to see what the future holds for tools like Google Docs.
I understand all that. I have been writing compilers for more than 10 years, including the compiler of Scala.js. I am not criticizing wasm nor its benefits.
What I profoundly dislike is that such good articles about wasm, written by excellent technical people, all silently ignore that. I am absolutely certain that the authors know all about it, but they don't mention that to their audience, which, for the majority, doesn't know. Therefore, that very silence (not just glossing over, but actual silence) brings misinformation to the masses.
I don't think that's absurd at all. WebASM really should be the ASM of the web, everything should be compiled to it. A little pre-compilation of JS to WebASM makes sense to me
As far as not making sense goes, sorry but your post doesn't make any to me, either.
Why would we possibly return to a manual memory management, raw pointer oriented, assembly language level of abstraction from the much richer and safer abstraction that JS already has? Wasm doesn't even have any notion of Characters or Strings! You really want to return to the days of each project having their own String libraries, because it's all built on top of raw asm?
Webasm is not for JS-style code! You can't have "a little pre-compilation of JS to WebASM", that makes zero sense. We already have the incredibly complex JIT compilation of JS to x64/ARM/etc, which necessarily interacts with the garbage collector, type system, permissions & security, browser debugging/profiling tools, etc., none of which wasm has any notion of.
Wasm is a raw C-ish sandbox environment for cross-compilation of static languages to expose their behavior to javascript, for straight-line CPU performance in number crunching as in games, software rendering, numerical analysis, compression/decompression, etc.
You don't really write wasm by hand. You use a higher level language that compiles to it. I wouldn't mind using go for the web, for example, if I wanted more performance, and it takes care of all the concerns you mentioned above for you...
> Why would we possibly return to a manual memory management, raw pointer oriented, assembly language level of abstraction from the much richer and safer abstraction that JS already has?
Just wait until the JVM and Flash Runtime are ported to wasm. Downloaded and compiled on every page load :).
>JavaScript, without ever mentioning the limitations of wasm wrt. JS (no GC, no interaction with the DOM or with JS libraries besides numbers, etc.).
DOM will die as soon as the industry moves to one or two good GUI toolkits that run under Webassembly and are way faster to use than the cumbersome present combination of HTML+CSS+CSS preprocessor+JS libs.
I'm nearly certain that this will not be the case. Once you reinvent everything that the DOM does, it's highly unlikely you'll end up faster than the DOM.
Everyone thinks that the rendering engines in browsers are easy to beat in terms of performance. I thought that too, until I implemented one. They are definitely beatable, but not easily, and certainly not with an architecture like that of Qt or GTK.
I'm not so sure. You don't need to reinvent everything that the DOM does, as the DOM is burdened down with all kinds of backwards compatibility concerns and conflicting design philosophies.
E.g., I don't think any sane design of a UI toolkit would include the ability to read and modify the string representation of the UI code at runtime - yet it's a critical feature for the DOM.
Likewise, you wouldn't necessarily need the ability to access and mutate arbitrary nodes of the document tree at any time. (including mutations that might change which CSS selectors apply to a node)
E.g., you could only expose higher-level widgets instead or only expose variables that feed into a template. That would allow optimisations which aren't possible with CSS and DOM.
Finally, a WASM toolkit would be shipped with a particular website anyway, so it wouldn't need to be general-purpose.
On the other hand, there is a great incentive for website operators to make their site into a single unparseable blob: Ad-blockers. If every site had its own internal data representation and internal rendering engine, that would make it almost impossible for ad-blockers to modify certain parts of the site while leaving others intact.
> You don't need to reinvent everything that the DOM does, as the DOM is burdened down with all kinds of backwards compatibility concerns and conflicting design philosophies.
Those can largely be avoided, and they typically don't cause global performance impacts.
> E.g., I don't think any sane design of a UI toolkit would include the ability to read and modify the string representation of the UI code at runtime - yet it's a critical feature for the DOM.
That isn't a problem. innerHTML is lazily computed from the tree structure: if you don't use it, you don't pay for it.
> Likewise, you wouldn't necessarily need the ability to access and mutate arbitrary nodes of the document tree at any time. (including mutations that might change which CSS selectors apply to a node) E.g., you could only expose higher-level widgets instead or only expose variables that feed into a template.
The main benefit of this would be to eliminate restyling, but cascading is really useful from a design point of view. That's why we've seen native frameworks such as Qt and GTK+ move to style sheets. And if you reinvent restyling, it'll be a ton of work to do better—remember that Servo and Firefox Quantum have a parallel work-stealing implementation of it. I've never seen any native toolkit that even comes close to that amount of performance effort.
> That isn't a problem. innerHTML is lazily computed from the tree structure: if you don't use it, you don't pay for it.
I'm not paying for it, the DOM implementation is - with increased complexity. (E.g., HTML parsing suddenly becomes a time-critical operation because some wiseguy decided to implement animations for his website using setTimeout and innerHTML.)
And they can't drop it because a lot of sites rely on it - however, if you wrote a new, limited-purpose renderer on top of WASM, you could decide to drop it and simplify the implementation without losing much utility.
> And if you reinvent restyling, it'll be a ton of work to do better
But that's kind of my point - if you can control which parts of the tree are exposed and which mutations are valid, you might not need to implement restyling at all. (Or in reduced scope)
I'm not talking about cascading in general, but about how you can make arbitrary changes to the DOM after initial load, which the restyler has to fully support.
> I'm not paying for it, the DOM implementation is - with increased complexity. (E.g., HTML parsing suddenly becomes a time-critical operation because some wiseguy decided to implement animations for his website using setTimeout and innerHTML.)
We're talking about performance here, not implementation complexity. Besides, it's not a win in terms of complexity if sites ship a limited subset of the Web stack to run on top of the full implementation of the Web stack that's already there.
> But that's kind of my point - if you can control which parts of the tree are exposed and which mutations are valid, you might not need to implement restyling at all. (Or in reduced scope)
Sure, you can improve performance by removing useful features. But I think it'll be a hard sell to front-end developers. Qt and GTK+ didn't add style sheets and restyling for no reason. They added those features because developers demanded them.
My point is that writing custom UI renderers using canvas and WASM might become a reasonable thing to do. For that you don't need to stick to the web stack at all, you can invent whatever language, API and data model fits your needs. Those can be a lot simpler than the DOM and therefore easier to implement with good performance.
I don't think it's really possible to do much better than the next-gen browser architecture (Servo, fully-fleshed-out Quantum) if you support the entire feature set of browsers (once typed CSSOM is a thing, anyway). You can certainly do better in constrained environments, though. For example, Leo Meyerovich is doing really neat things with data viz, where all the elements to be laid out have the same shape and you can take advantage of that to do things like GPU layout.
But if you're making your own UI kit, couldn't you just eschew CSS and the like? I was under the impression that part of the reason browser rendering is such a gnarly process is because of the reflow issues that CSS and HTML layout quirks/changes can cause. I would assume that you can implement a couple of layouts that prevent those sort of pitfalls and thus speed up rendering...
Please correct me (you know a lot, and I'm betting some of my assumptions are wrong).
Sure, you could get rid of CSS, but in favor of what? You probably need something just like flexbox, and it's not easy to beat an optimized implementation of CSS flexbox in terms of layout performance (especially if parallelized). You could eliminate the restyling step by not having cascading and selector matching, but that hurts productivity and maintainability, which is why you see frameworks like GTK+ moving toward CSS-like styling. There's no free lunch here...
No, the DOM will stay for a long time, since CSS is actually a really great way to build UIs.
Last time I checked C/C++ based UI libraries, even text selection was a problem. If there were a cross-platform way to build UIs as good and feature-rich as a modern browser, then the DOM would slowly die.
That's the reason we have so many Electron based apps, because it makes UI building really simple.
There isn't anything about Electron that I feel makes it simpler than Delphi, WinForms, JavaFX, Android, Cocoa, Qt, or XAML, other than being easier for those who grew up with HTML/CSS.
Caveat: it's been a long time since I've used GUI toolkits like GTK+ and Qt, so this may be an out-of-date perspective.
When I think of GUI toolkits, I think of lots of imperative code to build out an interface, e.g., "Create a window. Add a vertical box layout. Create button1. Change button1.font to xxx. Change button1.style to bold. Set the minimum height of button1 to 20px. Add button1 to the box. Create button2. Add button2 to the box. Tell the box to grow button2 when it is resized. Create button 3..."
The declarative style of HTML/CSS seems so much better. The grouping of elements becomes apparent just by looking at how they are nested, with no need to keep track of what gets added to what. And CSS gives you a really rich ability to select groups of elements, style them, try out new styles, reuse styles across pages, and so on.
CSS has definitely gotten really complicated. But then, I could never build anything in a GUI toolkit without constantly referencing the API docs to figure out how to do this or that, either...
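To make the contrast concrete, here's roughly the imperative style I have in mind, using the plain DOM API purely as a stand-in for a widget toolkit (the details are made up):

    // Imperative: the structure only emerges from the sequence of calls.
    const box = document.createElement('div');
    box.style.display = 'flex';
    box.style.flexDirection = 'column';

    const button1 = document.createElement('button');
    button1.textContent = 'OK';
    button1.style.fontWeight = 'bold';
    button1.style.minHeight = '20px';
    box.appendChild(button1);

    const button2 = document.createElement('button');
    button2.textContent = 'Cancel';
    button2.style.flexGrow = '1';    // grow button2 when the box is resized
    box.appendChild(button2);

    document.body.appendChild(box);

With HTML/CSS the same grouping is visible at a glance from the nesting, and the styling lives in reusable selectors.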
That certainly used to be the case, but look at something like QML for Qt, and it's typically far more declarative and succinct than (most) HTML, eg: http://doc.qt.io/qt-5/qmlfirststeps.html
I'm not a frontend whiz by any means, but I've always found the widget-centric (GUI) approach fit my mental model better than the HTML centric one.
SPA architectures help, but I find most HTML designers tend to prefer raw HTML to any composed/widget approaches.
CSS (as a concept) is actually quite great, which is why you've seen the older GUI approaches adopt it. Qt itself is also leaning more towards a reactive approach where you interact with an abstract data model, and the UI reflects the updates.
> Usually imperative UI code tends to be a thing only among developers that dislike RAD tooling, or game devs using immediate mode UIs.
Actually I think that most game UIs (the menus, settings, inventories - not the actual game) are done quite well - maybe immediate mode GUIs have a place outside of gamedev?
We’re nowhere near the point where that would be even vaguely feasible on one crucial point: accessibility.
If you’re talking about rendering everything on a canvas, well, there’s been the occasional discussion about making it a11y-friendly, exposing content in it to screen readers and so forth, but nothing has really happened with it.
Your WebAssembly GUI toolkit is going to be completely invisible to screen readers.
Declarative UI systems that look a lot like HTML+CSS+JS keep cropping up in other domains besides the web. I think that, apart from increasing freedom to replace the JS part with other languages to specify behavior, the basic model will be around for a long time.
You can already replace the DOM with a canvas renderer. Flipboard does it with a React canvas renderer, and yet it hasn't taken off, so why would wasm change that?
I'll do you another. If 2D screens ever cease being the main way of interacting with computers, something like a DOM-less WASM will take over consumer computing, and the DOM will get washed away in the process.
Moving from 2D to 3D does not imply that applications will all use immediate mode. And having some standard way of doing retained mode 3D (which is the major feature provided typical 3D game engines and there is no reason why DOM could not be at least partially used as the underlying data model) seems like one of the requirements for that shift to actually happen.
So in the end we really do end up in the alternate future where every website is just a single giant SWF file - except instead of Flash, it's WASM. Hooray! /s
You can import C (and many other) libraries without re-implementing them in JS for browsers. In my company's case, we compile the C-written libopus almost directly, to encode audio streams into Opus in browsers rather than on servers. In this way we benefit from much less data traffic and lower server CPU load.
Yes. The C lib is compiled into wasm, and you can use the wasm-compiled code in a browser through JS wrappers. So you could say you're importing the C lib on the JS client side anyway.
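A minimal sketch of what the JS-wrapper side of that can look like (the file name, export names and calling convention here are invented for the example; a real libopus build typically goes through Emscripten-generated glue):

    // Hypothetical: load a wasm build of a C encoder and call one export.
    const { instance } = await WebAssembly.instantiateStreaming(
      fetch('opus-encoder.wasm')                       // made-up file name
    );
    const { malloc, free, encode_frame, memory } = instance.exports;  // assumed exports

    // Copy one PCM frame into the module's linear memory...
    const pcm = new Float32Array(960);                 // 20ms frame at 48kHz
    const inPtr = malloc(pcm.byteLength);
    new Float32Array(memory.buffer, inPtr, pcm.length).set(pcm);

    // ...and let the C code, compiled to wasm, do the number crunching.
    const outPtr = malloc(4000);
    const written = encode_frame(inPtr, pcm.length, outPtr, 4000);
    const packet = new Uint8Array(memory.buffer, outPtr, written).slice();

    free(inPtr);
    free(outPtr);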
Note that wasm's main objective is to run non-JS code on browsers, not for a faster JS.
I can't tell whether you're being sarcastic or not, so I will answer as if you are not.
Yes, it is absurd. Because your wasm that was compiled from JS needs to embed an entire implementation of the dynamic nature of JS. And to make all those dynamic features remotely fast, you cannot just compile them as is. You need to use a JIT to be able to perform speculative optimizations. But then, where's the JIT? Oh, it's built inside your wasm code. And basically you end up shipping a JS interpreter+compiler+JIT as part of your wasm, instead of just the .js code. Parsing and compiling all of that will be much, much worse than parsing the .js code and feeding it to the already existing JS interpreter+compiler+JIT that is in the browser.
It’s really inaccurate to call that “JavaScript” any more. You’re talking about a subset of JS that would never need to perform a GC! Really just “the subset of JavaScript which also happens to be C”.
asm.js was not a well-behaved subset of JavaScript. It was a well-behaved subset of assembly that happened to be encoded in JavaScript.
There is virtually no human-written JS code that is amenable to compilation to wasm in a meaningful way. At the very least, you need a (mostly) sound type system to be able to compile to wasm with a positive expected ROI.
The speed of wasm comes in a large part from the fact that it is entirely statically typed, which means we don't need the speculative optimizations (and their deoptimization guards) all over the place.
Exactly, plus in a dynamic environment a tracing JIT can outperform AOT compiled code. This is true for both the JVM and .NET CLR, both mature platforms.
JS is a garbage-collected language. It would require reimplementing most of the runtime, or be stuck with code that does marshalling for everything and would possibly be slower than plain JS.
> Modern CPUs also don't have hardware GC support. Intel i432 was the last attempt at it.
Intel i432 was far from the last attempt. Besides all of the Lisp HW developed after it, Azul made CPUs in the 2000s with hardware support for GC. Acceleration of concurrent copying collection requires a surprisingly low amount of CPU support.
I had my history reversed regarding i432 vs Lisp HW.
You are right, I also forgot about Azul, but eventually they dropped it because it wasn't worthwhile anymore, just like what happened with all the other specialized hardware implementations.
Azul (and AFAIK most lisp machines) does not do garbage collection in hardware. In these contexts the "GC HW" involves hardware acceleration of read/write barriers required by concurrent and incremental GCs (which otherwise requires the compiler/JIT to inline implementation of barriers into user code). The reason why Azul can now use stock amd64 CPUs is that they found that you can abuse the MMU to provide exactly this kind of HW accelerated barriers.
Out of curiosity, using just released versions of browsers on this 2015 mac pro:
Firefox 57: WebAssembly.instantiate took 2990.2ms (4.1mb/s)
Chrome 63: WebAssembly.instantiate took 8736.9ms (1.4mb/s)
Safari 11.0.2: WebAssembly.instantiate took 10341ms (1.2mb/s)
If more speed is about to arrive, wow.
I'm curious what optimisations are needed / valuable for wasm files to improve streaming performance. I'm assuming if, e.g.:
    def foo(baz):
        bar(baz)
    ...
    def bar(baz):
        baz = baz + 1
Then compilation would start and get stuck until it had a definition for bar? If so, presumably the next build time optimisations for a website will be to shuffle the code around in to as optimal an order as possible so as to improve streaming compilation speed?
Function declarations are independent from function bodies, so think of C/C++ header/source file splits. You don't need to know what code is in bar if you know it takes one argument of type int and returns an int; that's all you need to know to call it successfully, so you can compile foo in your example perfectly fine. You just need to patch up the call location later when bar is resolved to an actual address (this is the "link" step in a typical AOT compilation chain, or done by the loader if it's a dynamic dependency).
Considering the major optimization in compiling is inlining, knowing the function body is very important to compilation, but I guess that can be pushed off until the next tier.
WebAsm is an intermediate language, not a source language. Initial inlining and other optimizations have already been performed long before it hits your browser. There could potentially be a JIT or similar doing a secondary optimization pass in the browser if something is hot, but it's probably going to be largely considered a codegen issue rather than a runtime issue.
Don't forget about the local compilation phase from C/C++/Rust to WebAssembly before you ever hit the browser. At that point, LLVM is free to optimize and inline just like with any other binary target.
I imagine the first pass could inline functions that are already compiled and skip functions that have yet to be compiled. Maybe tools that generate wasm will start reordering the functions they send to allow optimal first-pass inlining.
I have similar, just slightly higher numbers on 2017 Mac Pro. What's baffling to me is that they are something like 30% lower than they are on my over 6 year old low end desktop with i3.
I'm not familiar with Web Assembly, but the recent trend is that as downloads become faster, web performance in a vanilla browser becomes slower, because websites just send more stuff to you. Pages grow toward infinity. Also, if, like @sjrd mentioned, this code can't manipulate the DOM or can use only a restricted set of JS objects, then where will the gain be? Is this intended to be used for number-crunching code in the browser runtime? Help bitcoin miner scripts? What's the purpose then?
DOM manipulation is already on the cards, should be out soon. This is likely a separate sub-team of people just working on making it as fast as possible.
3) Stuff that into Electron and distribute to Mac/Linux/Windows
Why distribute the electron wrapped wasm on Windows instead of using the real native Windows app? It's more consistent this way! Single codebase! Developer efficiencies!
Small deviation - write a native linux or mac app (instead of targeting Windows for the initial app)... mostly because I feel like developing on those platforms is so much more enjoyable.
Good point. Linux is probably the easier one because you'd need to build your UI toolkit into it and we don't have the source for Cocoa or whatever this year's Windows UI toolkit is called.
I realize this is probably the most likely way native WASM apps will be implemented, because it's the most obvious to web developers, but I also think it's the wrong approach.
Webassembly.org's own docs mention that it's intended to be agnostic about its runtime environment[0]. Electron is for packaging HTML, CSS and JS into a "native" application, but WASM doesn't actually need that if it's running outside the web.
Why not a native runtime on top of a cross-platform library like SDL? Just because it's "Web Assembly" doesn't mean it has to be limited to webdev paradigms.
@sjrd is talking about limitations for JS not other languages.
C, C++, Rust and others already have their own DOM/JS support.
WASM is exciting for statically typed languages especially. JS is not the target. It might eventually benefit from faster parsing but that's not the motive now.
To wit, as described in their blog post:
https://blogs.windows.com/msedgedev/2017/04/20/improved-
Edge validates and compiles wasm code lazily. Thus, this simplistic benchmark isn't really measuring compile time on Edge. In contrast, Firefox, Chrome and Safari are doing some amount of AOT compilation before WebAssembly.instantiate() resolves.
Here's a stupid question, but is the result of the Firefox and Chrome "instantiate" exactly the same? Is the compilation doing the same job, or could one be performing more optimizations? I.e., faster compilation but slower execution.
Could someone explain how Edge is performing so well or any references to what they have done in this regard? Has the Edge team already implemented this streaming and tiering compiler technique?
You tell Edge, "compile this", and Edge replies "done!" when all it has actually done is verify that it's valid WASM. When you then call a function, it gets compiled.
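So a fairer comparison on Edge would probably have to time the first call into the module as well as the instantiate step, something along these lines (the export name is made up):

    // Sketch: measure instantiate *and* first-call time separately, so lazy
    // per-function compilation shows up in the numbers instead of hiding.
    const bytes = await (await fetch('benchmark.wasm')).arrayBuffer();

    let t = performance.now();
    const { instance } = await WebAssembly.instantiate(bytes);
    console.log(`instantiate: ${(performance.now() - t).toFixed(1)}ms`);

    t = performance.now();
    instance.exports.main();   // hypothetical export; first call pays the lazy compile cost
    console.log(`first call: ${(performance.now() - t).toFixed(1)}ms`);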
No offense to Yehuda in general (he is doing great work), but Ember.js is so ignorant of any JS-size recommendations that it seems weird to quote Yehuda in that context.
does someone here, familiar with webassembly semantics, know if it’s theoretically possible to start streaming execution of code? I.e. as soon as the “main” (?) function is in, and block on every function call which is not yet compiled, recursively? Or could the last block of webassembly bytecode potentially change the semantics of the first?
Sooner or later, that’s an avenue people will want to explore, I assume?
Interesting article...I did not realize that the WASM needs to be compiled into machine code on the client system, I just assumed it would be directly interpreted by the JS engine.
As a side note, it is interesting to see that multithreaded compilation of a single page provides significant performance benefits here...this is usually not done with C/C++ code compilation from what I understand about it
Well, the difference between "interpreted" and "compiled" has become very blurry during the last 20 years. These days, most "interpreted" programming languages are actually compiled to machine code on the client system.
This includes the JVML, of course, but also JavaScript, Python (with PyPy), etc. PHP isn't quite there yet, but it's coming.
> As a side note, it is interesting to see that multithreaded compilation of a single page provides significant performance benefits here...this is usually not done with C/C++ code compilation from what I understand about it
It's slightly different, but native code is typically compiled concurrently, too. The meat of it is often handled by the build system rather than the compiler itself, but that's not so different.
Thanks to pressure from HHVM I assume. Nothing was happening in the PHP language for freaking three years.
To be fair, the benchmarks usually take a WordPress or Drupal installation and do a requests-per-second measurement, which IMO is a real-world benchmark.
No hate, I just don't get why HHVM doesn't get any love for what they did. Maybe because, from HPHPc to HHVM, they seriously gave PHP some competition and people kind of got mad.
> Nothing was happening in the PHP language for freaking three years.
To be fair, that's not because they were sleeping, but because they attempted to do something that proved too hard (Unicode support) and had to abandon it. That's why PHP skipped version 6.
>No hate, I just don't get why hhvm doesn't get any love for what they did.
I don't know - I expected to see a ton of Hack projects show up here but it's like no one cared about the language except as a wake-up call to PHP. Maybe the involvement of Facebook put people off.
> As a side note, it is interesting to see that multithreaded compilation of a single page provides significant performance benefits here...this is usually not done with C/C++ code compilation from what I understand about it
Well, that's because typically all cores are maxed out during a parallel build of large-scale C++ software, so there's no need to go any further.
With link-time optimization it's a different story…hence the work some compilers (like rustc for Rust) are doing to parallelize builds of single compilation units.
There's a simple-but-useful WebAssembly Explorer at https://mbebenita.github.io/WasmExplorer/ that interactively shows the C/C++ -> WASM Text -> x86 ASM path that WebAssembly takes.
> I did not realize that the WASM needs to be compiled into machine code on the client system
It doesn't need to be. This is a choice they've made. Other implementations of WASM could interpret it if they wanted.
The Church-Turing thesis tells us that any program you can compile to machine code can also be interpreted, so it is not possible that any language needs to be compiled into machine code.
Total Aside. As a compiler and runtimes guy, I'm super excited for streaming compilation. I think stuff like this and ethereum for distributed computation is really cool stuff! :D
Streaming compilation is the way it was always historically done. One reason is that computers used to not have enough RAM to store whole non-trivial programs in AST or another intermediate form.
The second reason is that this approach matches how the underlying theory of languages and automata works. One can view a modern AST-producing compiler frontend as a compiler that compiles its input into a program that builds the resulting AST.
On the other hand, many modern optimization passes simply cannot be done in a streaming manner, or even by any pushdown automaton.
That's great news. On http://8bitworkshop.com/ I'd like to offer some additional WASM modules on-demand but they take 15+ seconds to load. (It seems 50% of the time is parsing and 50% module instantiation)
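One thing that might already help, depending on browser support, is overlapping the download with compilation and keeping the compiled Module around for later instantiations, roughly like this (URL and import details made up):

    // Sketch: compile an on-demand module while it streams in, and cache the
    // compiled WebAssembly.Module in memory so later loads skip recompilation.
    const moduleCache = new Map();

    async function loadModule(url, imports = {}) {
      if (!moduleCache.has(url)) {
        // compileStreaming starts compiling as bytes arrive, instead of
        // waiting for the whole download before parsing begins.
        moduleCache.set(url, await WebAssembly.compileStreaming(fetch(url)));
      }
      // Instantiation (imports, memory setup) is still paid per instance.
      return WebAssembly.instantiate(moduleCache.get(url), imports);
    }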
Guess what, downloading compiled executable code is even faster. Is that where we are heading? Flash 2.0? Wouldn't it be great to save all the electric power that's used to compile the very same code on millions of computers every day?
Sigh... let's have this thread again. Needs to be effectively sandboxed. Modern JS as a compiler target is just as opaque. JIT compilation is a win on energy relative to the energy spent in a slow interpreter. JITing WASM will take less energy than JITing emscripten-js (nevermind the energy to send over the wire)
First of all, I wasn't talking about source code; I was comparing the output of a C/Rust->wasm compiler to that of a C/Rust->x86 compiler. Since the wasm virtual machine has a JIT, I believe compilation to wasm isn't too aggressive with optimizations. And since those make the binary bigger, I assume the wasm output would be lighter than an x86 one. I didn't benchmark it, though.
And if you compare the size of the binary output with the size of the source code, the binary is bigger in many cases because of optimizations (and runtime size, for small programs). Additionally, the source code can be gzipped with a good compression factor whereas the binary cannot. So 99% of the time, the source code is lighter to send over the internet than the compiled binary.
Does wasm do runtime code specialization? I wonder if there will end up being a way to do timing attacks against the optimizing wasm compiler/linker step ... Is it possible to set up code such that the optimization time depends on the runtime-inferred type of an 'x' that you aren't supposed to have access to ...?
The term you’re looking for is speculation, not specialization, but no, I don’t think it does either. C++ and other languages targeting WASM often do type specialization, but it’s entirely done before the browser sees WASM, and has nothing to do with what you’re describing. (Which would be speculative compilation).
I’d imagine that nobody does speculative compilation since the benefit is too low given how fast the network is. Also, yes, there would be security concerns.
Caching of compiled code! As I read it, they want to cache the wasm bytecode at the client level. What if servers did the caching instead? Group clients by the architectures they use and serve the cached bytecode to the right 'groups' of clients.
That would assume you trust the server not to give you malicious machine code (which you of course cannot!). wasm is specified in such a way that it is still sandboxed by the VM that compiles it. If you fetch arbitrary machine code, you cannot verify it and that leads to huge security holes!
> But there’s no good reason to keep the compiler waiting. It’s technically possible to compile WebAssembly line by line. This means you should be able to start as soon as the first chunk comes in.
Maybe they can optimize further by speculating what the next line will be...
It runs in existing browser VMs, which have been pretty battle tested.
Another interesting note is that threads are now on hold for WebAssembly due to Spectre; that is, SharedArrayBuffer has been disabled. Hopefully it can be re-enabled in the future.
This cracks me up. Modern web browsers really started to evolve in the '90s, when security problems really ramped up. You used to just download executables and run them on your computer because the functionality wasn't there otherwise. Flash and Java applets were the initial answer to that before JavaScript and HTML evolved. We've come almost full circle to browsers basically being little VMs that can do anything again, the main reason they were developed in the first place. Most people's entire computer experience is now in the browser, and here come executables again, which will require another internal layer to mitigate problems.
The end state ends up looking a lot like (the user-facing side of) an operating system, except that:
* the filesystem is cloud storage (Drive/Dropbox/what have you -- the Unhosted (https://unhosted.org/) architecture)
* the apps are insecure but open-source by requirement (interpreted JS)
* ... running in a controlled sandbox (the browser)
* ... using a standard UI language (HTML/CSS)
* with functionality modifiable/overridable by user preference (extensions)
It's pretty much the ecosystem you would want if you were building this from scratch! Except you'd want HTML/CSS/JS to be much more intelligently designed from the start (I'm waiting so eagerly for the day that browsers natively run more scripting languages than just JS...)
It never could be done in the 90s because everything ran too slowly, but it's feasible now.
Actually, browsers are designed from the ground up to handle insecure code! That's pretty awesome, but comes at a cost: The mentioned additional layer, battery, speed?
And it's platform independent!
I'm often reminded of the XKCD quip that web pages are in fact the easily installable executables that so much of the marketplace was looking for over the past few decades.
> On a desktop, we compile 30-60 megabytes of WebAssembly code per second. That’s faster than the network delivers the packets.
Funny enough, on my workstation it seems to compile something more like 60-80 MiB/s, to keep up with my network which was recently upgraded to gigabit.
Very impressive stuff, I hope workstation CPUs can keep pace with networks.
I guess I don't understand the push to wasm. Why not just embed HotSpot, or a branch of it? Is there any difference?
Or going the other way, could hotspot be replaced with a wasm jit by compiling java to wasm? I know they have slightly different memory models, but I don't understand why they seem to be treated so separately.
Why Hotspot? There are many bytecode languages, with their own execution environments - .NET/CLR, Parrot, BEAM, etc. wasm is an attempt to design one specifically for the web, rather than trying to shoehorn one made for a different environment.
HotSpot's optimization and code generation are far superior to any other VM's.
But that is kind of my point - there are advanced vms out there. I don't see why the web needs its own vm apart from them. All the differences I see are fairly minimal.
But the bytecode and stdlib it works with are the least suitable. I could go on for a long time about JVM insns and the class model and GC assumptions that make it bad for the web. Similarly, I could go on forever about the stdlib that supports this bytecode (strings, threads, class loaders, etc) and why it's bad for the web too.
I'm not sure the JVM allows for streaming code; it's centred around class loading and classes in general. Shipping in fragments of code would require quite an overhaul of that entire architecture. Lambdas and other constructs were added much later via JSR-292 and invokedynamic; streaming stuff in will require quite a bit of shoehorning.
Streaming in this case means that you can start generating (or even executing) code before you have the complete input. There are some mostly ignorable reasons (having to do with bytecode verification) why you should not stream-JIT Java bytecode. On the other hand, in the whole ecosystem you will not gain anything worthwhile, given that you need the JVM state to be essentially complete before you start executing anything, and at a slightly lower level the .class file format is designed to be compact rather than meaningfully streamable.
Hotspot and wasm are both stack languages. All the other differences seem to be rather small (you could change or ditch the security model fairly easily), and wasm needs some bytecode verification too.
wasm is not a browser plugin, it is part of the browser. That means the plugin architecture does not need to exist (and it has already been removed by all major browsers). Building wasm support into browsers was not an easy task, and "just embed HotSpot" is not any easier.
This was tried (search for "LiveConnect"). It failed for many reasons, but the one that's most relevant to today is probably that most interesting client-side stuff (that isn't already written in JS) is written in C and C++, not a JVM language.
If that is why it failed, then wouldn't wasm fail for the same reason?
It looks like LiveConnect was a much bigger thing that MS pushed.
I'm not really understanding how this means anything. Why not just compile JS et al. to Java bytecode? Or is wasm just another NIH project in a long list of them in the JavaScript world? That is what it is looking like.
Sure, HotSpot itself couldn't be used straight up, but the changes would certainly be much less than creating a whole new VM.
> If that is why out failed, then wouldn't wasm fail for the same reason?
Web Assembly targets C and C++ as source languages, unlike the JVM.
> Why not just compile js et al to java bytecode?
Because JS and Java semantics are different, and emulating JS semantics on top of the JVM is slow.
> Sure hotspot itself couldn't be used straight up, but the changes are certainly much less than creating a whole new vm.
The Web Assembly VM shares as much code as possible with the engine's JS VM. This is obviously better than using HotSpot, as the relevant code is already shipping in browsers.
Nobody is going around rewriting code for no reason.
> Web Assembly targets C and C++ as source languages, unlike the JVM.
So wasm (and hence JavaScript, if you're translating JS into wasm) is closer to C/C++ than to Java? Considering all the UB in C, this cannot possibly be true.
> Because JS and Java semantics are different, and emulating JS semantics on top of the JVM is slow.
Given that Java's JavaScript implementation is pretty comparable, not very big, and even written in Java, this also doesn't seem to be true at all.
Once you add gc into wasm, it is almost guaranteed to be closer to java than c/c++.
Yes. For example, Web Assembly has unsigned integer arithmetic and explicit memory allocation/deallocation, neither of which the JVM has.
> Considering all the UB in C, this cannot possibly be true.
Undefined behavior is a concern of the compiler of the source language. Web Assembly doesn't compile C or C++. It simply defines a VM, the semantics of which are designed to be relatively free of undefined behavior.
> Given javas JavaScript implementation is pretty comparable, not very big, and even written in java, this also doesn't seem to be true at all.
I would highly doubt it's performance competitive at the level that browsers are at now. It strikes me as likely impossible to get performance competitive on, say, SunSpider if you aren't highly tuned for it.
Nashorn is anywhere from faster (on primitive-heavy computation) to 50% slower post-JIT. Graal is supposed to be even faster.
> Web Assembly has unsigned integer arithmetic
This is your idea of a major difference that affects implementation so much that a separate VM needs to be written? Many Java programs essentially do manual memory management already, too. Those aren't the big differences. Things like safe memory access are bigger issues. And once wasm gets GC, it will be even closer to Java and JS.
I hope Graal outperforms everything else, so we can stop pretending like WASM is something different than what Java has been trying to do.
Firefox Nightly: WebAssembly.instantiate took 227.6ms (54.4mb/s)
Chrome Canary: WebAssembly.instantiate took 8576ms (1.4mb/s)
Wow.
(Edit: And I believe that's not even using the streaming compilation mentioned in the article, it's just the new baseline compiler in action)