Emscripten (LLVM-to-JavaScript compiler) 1.0

Smerity · on April 10, 2011

Emscripten, given the recent advancements in JavaScript interpreters, holds immense promise. There is a problem, however: the size of the JavaScript "executables".

As their executables are already competitive in size with their C/C++ counterparts, this is not easily fixed. CPython's JavaScript runtime is 2.8 megabytes, whilst its C runtime is 2.2 megabytes, less than a 30% increase (about 27%). With compression the JS size decreases, but not by enough.

A 2 megabyte JS file is likely prohibitively large. Even if common JS "executables" were hosted by Google, as is currently done for jQuery and other libraries, the problem is only mildly alleviated.

Emscripten is an amazing technical feat, but I fear that because of this it will see little use. The project's main focus is currently improving performance, but is anyone looking into decreasing the size, or improving the handling, of these "executables"?

pohl · on April 10, 2011

The creators of GWT faced the same problem, and they have addressed it by making some shrewd choices.

The first such choice was to support a strictly static subset of Java - one that eliminates reflection - so that the compiler can prove exactly which lines of code will be used in the application, and aggressively eliminate those that are not. The compiler does this by building a graph rooted at the entry point and following method calls. For example, if your GWT application uses the ArrayList class, but it never uses its remove() method, then the code for remove() is not included in the resulting JavaScript output. They refer to this as the "pay as you go" design philosophy.
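The "pay as you go" pruning described above is essentially a reachability pass over the call graph. Here is a minimal sketch of that idea, assuming a toy call graph with hypothetical method names and no reflection or virtual dispatch (the real GWT compiler handles far more than this):

```python
from collections import deque

# Hypothetical call graph: each method maps to the methods it calls.
CALL_GRAPH = {
    "Main.onModuleLoad": ["ArrayList.add", "ArrayList.get"],
    "ArrayList.add": ["ArrayList.ensureCapacity"],
    "ArrayList.get": [],
    "ArrayList.ensureCapacity": [],
    "ArrayList.remove": ["ArrayList.shiftLeft"],  # never called from the entry point
    "ArrayList.shiftLeft": [],
}


def reachable(graph, entry):
    """Return every method reachable from entry by following call edges (BFS)."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        method = queue.popleft()
        for callee in graph.get(method, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def prune(graph, entry):
    """Keep only methods the entry point can actually call; drop the rest."""
    live = reachable(graph, entry)
    return {m: calls for m, calls in graph.items() if m in live}


if __name__ == "__main__":
    kept = prune(CALL_GRAPH, "Main.onModuleLoad")
    print(sorted(kept))
    # ['ArrayList.add', 'ArrayList.ensureCapacity', 'ArrayList.get', 'Main.onModuleLoad']
```

This only works because the graph is knowable at compile time; if code could call methods by name through reflection, the compiler could no longer prove that remove() is dead.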

Other minification techniques are used in addition to this, but the most impact comes from choosing a strict static subset of the source language.

The next choice that the creators of GWT made was to make the compiler aware of a special method, GWT.runAsync(). When the compiler encounters this method in the aforementioned graph, it knows that it can break the JavaScript into pieces, so that code on the far side of that call is loaded lazily when it is encountered, rather than being shipped in the initial bundle. This "developer guided code splitting" saves the user from having to wait for a 5 megabyte download when they only need to wait for 1 megabyte, etc.
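The splitting idea can be sketched as two reachability passes: one from the entry point that treats runAsync() calls as boundaries, and one from each split target, minus whatever is already in the initial bundle. This is a simplified model under stated assumptions: the method names are hypothetical, nested split points inside a deferred fragment are not followed, and GWT's handling of code shared between several split points is not modeled.

```python
from collections import deque

# Hypothetical program. CALLS holds ordinary call edges; SPLITS holds
# GWT.runAsync()-style edges whose targets may be downloaded later.
CALLS = {
    "onModuleLoad": ["renderInbox"],
    "renderInbox": ["formatDate"],
    "openSettings": ["formatDate", "themeEditor"],
    "themeEditor": ["colorPicker"],
    "formatDate": [],
    "colorPicker": [],
}
SPLITS = {"onModuleLoad": ["openSettings"]}  # onModuleLoad -> runAsync(openSettings)


def closure(start, calls):
    """Methods reachable from start via ordinary calls only (split edges are boundaries)."""
    seen = {start}
    queue = deque([start])
    while queue:
        for callee in calls.get(queue.popleft(), []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def split_program(entry, calls, splits):
    """Return the initial download and one lazily loaded fragment per split point."""
    initial = closure(entry, calls)
    fragments = {}
    for method in initial:
        for target in splits.get(method, []):
            # Code already in the initial bundle is not downloaded twice.
            fragments[target] = closure(target, calls) - initial
    return initial, fragments


if __name__ == "__main__":
    initial, fragments = split_program("onModuleLoad", CALLS, SPLITS)
    print(sorted(initial))                    # ['formatDate', 'onModuleLoad', 'renderInbox']
    print(sorted(fragments["openSettings"]))  # ['colorPicker', 'openSettings', 'themeEditor']
```

Note that formatDate, which both sides use, stays in the initial bundle, so the deferred fragment only carries code the user has not needed yet.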

There's no reason that Emscripten cannot use these techniques, other than an unwillingness to make the hard choice of sacrificing some expressiveness in the source language.

tlrobinson · on April 10, 2011

"A 2 megabyte JS file is likely prohibitively large."

Tell that to Gmail. They load at least 2MB of code. New Twitter loads 1MB. Facebook used to load that much, but it appears they've cut it down a lot recently.

It's certainly prohibitive for webpages and lightweight page-oriented web applications, but not single-page web apps. I would expect this to be used more for the latter.

salmonsnide · on April 10, 2011

An even bigger problem is speed. It only runs at ~10% of native speed.

This is definitely cool, but I think something like Google's Native Client is needed to make C/C++ work on the web. Maybe Emscripten could be used as a fallback for browsers without NaCl support.

azakai · on April 10, 2011

> An even bigger problem is speed. It only runs at ~10% of native speed.

Currently that is basically the situation, yes (but it depends on the benchmark - some run much faster). However, improvements to Emscripten can probably get that up to 25% of native speed. As LLVM and JS optimizers like the Closure Compiler get better, that will improve even more.

And JS engines are getting faster too. Things like type inference will work particularly well on code like this, since the code is basically statically typed (but not explicitly).

So, in the long term I believe this approach can yield code that runs at very decent speeds. Plus, when comparing to NaCl it will have two main advantages: (1) it doesn't require a plugin and runs everywhere, even on iPads, and (2) it doesn't have the sandboxing overhead of NaCl (NaCl code itself is fast, but communication with outside code on the web page is slow).

exit · on April 10, 2011

Maybe browsers should have a system for explicitly caching by hash across domains.
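A rough sketch of what caching by hash could look like, assuming a hypothetical browser-wide cache in which a page declares the SHA-256 digest of the script it wants. Because the key is derived from the content itself rather than from a URL, a copy fetched for one domain can be reused safely for another. The class and its `load` method are invented for illustration; this is not an existing browser API.

```python
import hashlib


class HashCache:
    """Hypothetical browser-wide cache keyed by SHA-256 digest, not by URL or domain."""

    def __init__(self):
        self._store = {}    # hex digest -> script bytes
        self.downloads = 0  # how many times the network was actually used

    def load(self, expected_digest, fetch):
        """Return the script whose SHA-256 is expected_digest.

        fetch is a zero-argument callable standing in for a network request;
        it is only called when no site has cached this content yet.
        """
        cached = self._store.get(expected_digest)
        if cached is not None:
            return cached
        body = fetch()
        self.downloads += 1
        # Only cache content that really matches the declared digest,
        # so one site cannot poison the cache for another.
        if hashlib.sha256(body).hexdigest() != expected_digest:
            raise ValueError("downloaded script does not match declared digest")
        self._store[expected_digest] = body
        return body


if __name__ == "__main__":
    runtime = b"/* a large compiled runtime would go here */"
    digest = hashlib.sha256(runtime).hexdigest()
    cache = HashCache()
    # Two different sites declare the same digest but host the file at different URLs.
    cache.load(digest, lambda: runtime)  # first site: downloaded
    cache.load(digest, lambda: runtime)  # second site: served from cache
    print(cache.downloads)  # 1
```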

illumen · on April 10, 2011

Yeah, that would be useful. Caching by digital signature would be useful too.

wahnfrieden · on April 10, 2011

Fair criticism, but this still has plenty of potential for projects that are only one step away from LLVM rather than two (i.e. something written in C), which don't need a separate runtime.

wahnfrieden · on April 10, 2011

It occurred to me that this will be useful for things like Chrome extensions too, where the code is downloaded once and stored locally, and the user doesn't mind a longer download.

Nate75Sanders · on April 10, 2011

Don't you think that, as time goes on and bandwidth increases, a 2MB download will be something we won't even notice?

Ruudjah · on April 10, 2011

Maybe include common libraries in browsers.

wladimir · on April 10, 2011

Your idea has the drawback that browser updates might be few and far between (for some users), so if libraries are embedded in the browser, you can never rely on a particular version of a particular library being available.

A more general caching system would be better. For example, "Cache by digital signature", as some people above mention.

jimmyjazz14 · on April 10, 2011

In my opinion this is probably the most exciting thing being done with Javascript currently; I can't wait to see where this goes.

snotrockets · on April 10, 2011

To quote Brendan Eich (if I'm not mistaken): “JavaScript is x86 for the web”

nitrogen · on April 10, 2011

This looks very useful for now, but shouldn't we be trying to get LLVM-based native client standardized for the long term?

lautis · on April 10, 2011

If we wanted to have LLVM-based native clients, Emscripten would definitely be needed.

Distributing binaries isn't feasible, as they aren't portable. Chrome would like to address this by distributing LLVM bitcode[1] and then compiling it to a native binary using LLVM. Portable Native Client (PNaCl) was introduced as an idea a year ago, but AFAIK there hasn't been much development since, mostly because LLVM itself isn't very portable.

If native clients were to become the path for future web apps, we couldn't reasonably expect the whole world to switch to LLVM overnight. There would have to be some fallback. As both Portable Native Client and Emscripten work on LLVM bitcode, Emscripten would be the natural choice.

[1] LLVM's name for their intermediate language, low-level bytecode

ieefransi · on April 11, 2011

How are they not portable when translated to JavaScript? Granted, this will probably never run Quake in the browser, but if you want a SHA-256 implementation translated to JavaScript ... I seriously doubt there's a better tool.

All the difficult C stuff instantly usable from JavaScript, from GWT, from ...

That's got to be a boon, right?

bzbarsky · on April 10, 2011

One problem is that, last I checked, LLVM is not really all that portable across different hardware architectures (e.g. you can't use the same LLVM bitcode for both x86 and AMD64 code generation, or for both little-endian and big-endian hardware, etc.).
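To make the endianness point concrete: when clang compiles C to bitcode, it resolves things like sizeof(void *) and struct layout for one specific target, and any C code that reinterprets an int's bytes (say via a cast such as `*(unsigned char *)&value`) behaves differently depending on the target's byte order. The sketch below uses Python's standard struct module only to show the two layouts of the same 32-bit value; it illustrates the problem, not how LLVM represents it internally.

```python
import struct

# The same 32-bit integer laid out in memory under each byte order.
value = 0x01020304
little = struct.pack("<I", value)  # x86 / AMD64 layout
big = struct.pack(">I", value)     # layout on a big-endian target such as classic PowerPC or SPARC


def first_byte(layout: bytes) -> int:
    """What C code reading the lowest-addressed byte of the int would see."""
    return layout[0]


print(little.hex(), first_byte(little))  # 04030201 4
print(big.hex(), first_byte(big))        # 01020304 1
```

Pointer width is the analogous problem for x86 versus AMD64: a pointer is 4 bytes on one and 8 on the other, so any struct containing one has a different size and field offsets.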

mckoss · on April 10, 2011

Amazing. I tried to run the current CPython port on an iPad and the browser crashed. :-(

thasmin · on April 10, 2011

Can someone give an example of what this would be used for?

vessenes · on April 10, 2011

He mentions he'd like to use it to get windowing environments compiled over to JS for use in the browser. I scoffed until I played around with his real, live Python 2.7 running on JavaScript in Chrome. YOW! That's just super cool.

Jebdm · on April 10, 2011

It looks like you can use it as essentially an LLVM backend, meaning that any compiler/interpreter that uses LLVM can compile into Javascript (for use in the browser).

irfn · on April 10, 2011

Imagine a hugely useful C library being compiled into JavaScript for use in a browser environment, and even in Node.js.

skrebbel · on April 10, 2011

On the latter example: you could also just compile that hugely useful C library directly as a Node.js plugin.