Introducing SIMD.js (hacks.mozilla.org)
210 points by rnyman on Oct 30, 2014 | hide | past | favorite | 75 comments



Contrary to popular opinion, I think this is actually a terrible idea. By doing this, we are basically adding SIMD to the JS 'virtual machine'. IMO, we shouldn't standardize an API for things that are so low level; this is really an implementation detail of the virtual machine.

I would suggest two alternate approaches:

1. First, come up with a standard SIMD specification across CPUs, in some ways like the GL specification (for GPUs) or even the WebRTC stuff (for p2p connections). Once you have done that, define a JS API.

2. Identify proper use cases. The examples are quite bogus currently. They show a Mandelbrot set. Well, all of this is done on the GPU. Image manipulation is best done with CSS-level filters. Once we identify the use cases, we can then maybe add APIs to specifically address them, if it's important for the web in general.

In its current form, this is really a 'because we can' API. Where does this end? What if we just exposed raw socket and networking APIs instead of a WebRTC-style API? What about pixel manipulation of images?

Also see: http://www.mail-archive.com/webkit-dev@lists.webkit.org/msg2.... The Apple guys are saying they can do all this even without the API.


For historical reasons, every language that wanted one built its own SIMD API instead of building a language-independent API first: C++, C#, Dart, OpenCL, and so on, and it's not easy to go back and change that. However, there is a lot of commonality between the SIMD APIs in those languages, so we are following de facto standards here.

And, we certainly do have real-world use cases in mind. See [0] for one example. We're also interested in using SIMD for video codec development which can't always be done on GPUs.

[0] http://blogs.unity3d.com/2014/10/07/benchmarking-unity-perfo...

And if, in the future, Apple can show that auto-vectorization in this domain is more successful than their attempts so far have shown it to be, then JS engines can always just go back to implementing SIMD.js via a simple polyfill which the engine can auto-vectorize, leaving very little baggage in the language or implementations.
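For the curious, the polyfill fallback mentioned here can be sketched in plain JavaScript: a float32x4 is just four lanes with 32-bit float rounding, which a Float32Array already provides. The names below are illustrative, not the actual SIMD.js API surface:

```javascript
// Minimal sketch of a float32x4 polyfill: lanes stored in a Float32Array
// so that rounding matches 32-bit float semantics.
function float32x4(x, y, z, w) {
  return Float32Array.of(x, y, z, w);
}

// Lane-wise add; an engine could recognize this pattern and emit one
// SIMD instruction, or simply run the loop on hardware without SIMD.
function float32x4_add(a, b) {
  var r = new Float32Array(4);
  for (var i = 0; i < 4; i++) r[i] = a[i] + b[i];
  return r;
}
```

The point is that the same source works everywhere: with engine support it becomes one vector instruction, and without it the loop still computes identical results.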


I'm also not so sure about this. I admit to having limited compiler design experience, but the WebKit FTL guys do, and they seem to be saying that many things that can be done with SIMD primitives can be done better with automatic vectorization, since you can specialize code for the specific processor it's being executed on. On top of that, it's easier for the programmer if they can write ordinary code and get SIMD performance.

Are there cases where algorithms simply cannot be expressed in a way that automatic vectorization could transform them into SIMD instructions? Or is this just a way to avoid implementing automatic vectorization in JavaScript engines?


Autovectorization has been an area of intense compiler effort for a decade or more and by and large the primary customers of it (games, video codecs, etc.) prefer the intrinsics. It's perceived as too unreliable and brittle to be relied upon, and it's easy to see why: given a choice between having to think about what the compiler's alias analysis, overflow analysis, loop trip count analysis, etc. will do and just writing an intrinsic and calling it a day, programmers will choose the latter.

This applies regardless of how good the autovectorization really is: it's in a weird catch-22 kind of space where adding more and more features to your autovectorizer can actually reduce its perceived reliability, by making the answer to "will this vectorize?" harder and harder for a programmer to answer at a glance. <xmmintrin.h> has a lot of problems, but it's reliable, and at the end of the day that's what history has shown that game devs and video codec authors want.


> (games, video codecs, etc.) prefer the intrinsics

Nit: Most of the projects I'm familiar with (libav/ffmpeg, x264, etc) prefer to break out the SIMD into hand-written functions, instead of relying on intrinsics or even inline asm. This avoids problems with register allocation and code gen, consistency/portability between compilers, etc.

Otherwise, yes, autovectorization is hard, both for application developers and compiler writers. Application code needs to be structured in a very precise way, and the correctness of C -> SIMD transformations needs to be proven. Intrinsics and hand-written SIMD aren't going away.


> This avoids problems with register allocation and code gen, consistency/portability between compilers, etc.

Yup. Which just goes to show: reliability is king.


I appreciate that autovectorization is hard to do well. However, I think the world of JS is different from the world of C/C++. JS optimization is already pretty unpredictable, since the language is dynamic (both in terms of typing and e.g. object memory layout, with the exception of typed arrays); JS primitives are farther from the metal; the optimizations that JS engines perform are implementation-specific, rarely well-documented and always in flux; and it's difficult to see what machine code actually runs for a given JS function. SIMD instructions may make some sense for JS as a compiler target, but they seem to make less sense for JS as a language that doesn't have integers or 32-bit floating point numbers. On top of that, most users of vectorization are targeting a specific architecture or even CPU, whereas JS code is meant to run anywhere. It doesn't seem like there's been much work to alter the language or tools to make it easier for programmers to reason about other sources of unpredictability, so why so much emphasis on SIMD?

I'll admit some ignorance here, but it also seems to me that a JIT may also have some advantages WRT autovectorization as compared with a static compiler, since you can collect runtime information about aliasing and loop trip count before choosing to vectorize. But if the point is to make performance easier to reason about, why not start with the rest of the language before worrying about vectorization?


> It doesn't seem like there's been much work to alter the language or tools to make it easier for programmers to reason about other sources of unpredictability, so why so much emphasis on SIMD?

But there certainly have been such efforts! Standards bodies have added features like Typed Arrays, Math.fround, etc., and work is ongoing on Classes, Typed Objects, and Modules. All of those things make performance more predictable.
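For instance, Math.fround and typed arrays each pin down behavior an engine would otherwise have to infer, which is what makes performance predictable. A small illustrative sketch:

```javascript
// Math.fround rounds to the nearest 32-bit float, so an engine can keep
// the value in a single-precision register instead of a double.
var x = Math.fround(0.1);     // 0.1 is not exactly representable in float32
var sum = Math.fround(x + x); // the whole chain can stay single-precision

// Typed arrays pin down memory layout: exactly 4 bytes per element,
// unlike a plain JS array of numbers whose representation is up to the engine.
var buf = new Float32Array(4);
buf[0] = x;
```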

There are also better devtools all the time, which help you understand performance issues better.

And there is also asm.js which aims to make a certain type of JavaScript extremely predictable.

A final point - the unpredictability you mention is exactly why a SIMD API is needed. JavaScript is more unpredictable than C and C#, but even those have added SIMD APIs, because even in their predictable worlds, autovectorization wasn't good enough.


The key is that Mozilla is betting hard on Emscripten/asm.js building marketshare/mindshare into the future, and on its being perceived as exactly as performant and reliable as C/C++ running in a native process. SIMD.js should of course be able to run from non-asm.js code... but that's more of a bonus (since that's an almost-strict subset of the work required to get it working in asm.js code, AFAIK).


> video codecs, etc.) prefer the intrinsics.

Prefer assembly. Intrinsics usually make a disaster of register allocation and you lose much of your performance to needless load/stores.


Well, if it's a choice between autovectorization and intrinsics...

Lately I've been rather disappointed in how minimal the gains are in reducing register spills from intrinsics on modern CPUs, with their wide decode/issue, 16 registers, and dual load pipelines - by the time a loop is complex enough that a compiler spills, extra load/store uops are almost free from a micro benchmark perspective. The macro gains from smaller code and reduced cache usage are a bit bigger, but still depressingly minor for the effort expended.

But if you care about 32-bit x86 that's another story of course.


So, one of the real reasons reducing register spills does not help is not related to what you suggest; it's because on modern x86, they play games with what looks like "memory" to you, so you really aren't actually spilling into "memory" anyway :)


You (and a lot of people) make it sound like it's magic, but it's not - http://blog.stuffedcow.net/2014/01/x86-memory-disambiguation...


It's not magic. But it's not what that blog post is talking about.

On some of these processors, 128 bytes of stack or so is not really "memory" (in the sense of being stored with memory), so spilling is not that bad.


That's the magic I'm talking about because it's not true; memory is memory and stack memory isn't treated specially by the processor. What it does have is a store buffer, which applies to all memory accesses and is what store forwarding uses to bypass L1.


I'm simply going to disagree with you on this one, because I can't make my evidence public :)


What about meeting the compiler in the middle? I like the Matlab/NumPy/BLAS approach: ask the developer for the high-level vector operation (e.g. vector addition, inner product, matrix multiplication...), and then have the library/runtime turn that into SIMD instructions.
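A sketch of what that could look like, with hypothetical `vecAdd`/`dot` helper names: the caller states the whole high-level operation, and the library/runtime is free to lower it to SIMD internally:

```javascript
// Hypothetical high-level vector library. The caller never touches lanes;
// the runtime can unroll these loops into SIMD instructions on its own.
function vecAdd(a, b, out) {
  for (var i = 0; i < out.length; i++) out[i] = a[i] + b[i];
  return out;
}

function dot(a, b) {
  var s = 0;
  for (var i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}
```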


"Autovectorization has been an area of intense compiler effort for a decade or more and by and large the primary customers of it (games, video codecs, etc.) prefer the intrinsics. It's perceived as too unreliable and brittle to be relied upon, and it's easy to see why: given a choice between having to think about what the compiler's alias analysis, overflow analysis, loop trip count analysis, etc. will do and just writing an intrinsic and calling it a day, programmers will choose the latter."

100% true for C++ (though it would be more accurate to say "4 decades" if you want to count Fortran autovectorization, which has been going on since the late '70s).

But, I'll point out, plenty of the time they end up writing slower intrinsics than the compiler's autovectorization would have produced for the same code.

(Plenty of the time they don't, too).

Additionally, all of the problems you mentioned are due to specific issues in C/C++. In other languages, autovectorization is not just "relied upon", it's basically "part of the standard" (see, e.g., Fortran 95).

Given that all of the brittleness you talk about is precisely because of the lack of pointer safety, alignment issues, and all sorts of things that simply only exist in C/C++, where programmers have a lot of control, I'm not sure it makes sense to base your argument on the experience of a language that is very different from the one this API was designed for.

All that said, truthfully, IMHO, neither autovectorization nor intrinsics at the level you're talking about makes for a good programming model in most languages.

The intrinsics at this level don't get used effectively: among other reasons, they codegen differently on platforms that don't have the exact same SIMD semantics, which is "all of them" :P

I know you guys are trying to avoid this by limiting the ops available/etc. It is, IMHO, a losing game.

So you end up with the same problem: People write loops that are really bad on some platforms, and good on others.

Autovectorization knows what the target looks like, but doesn't trigger in some cases people want it to.

In the end, I think doing things like Halide is a lot more useful as a programming model than simd.js

simd.js is a usable implementation mechanism for some of those programming models, but I would not sell it as the programming model itself.

In fact, almost the exact set of intrinsics mentioned in simd.js was available as generic vector operations in GCC (you can create a 32x4 float vector in a platform-independent way, do normal ops on it, and it will codegen down to lower-level vector ops, without ever seeing xmmintrin). It was simultaneously not high-level enough and not low-level enough.

People resorted to the lower-level platform-specific intrinsics to get better performance, or wrote higher-level libraries to get better abstractions.

In any case, I'm sure it's faster than what you have now, and certainly an advance. I'd just be careful of thinking it's going to work all that well outside of targeted use cases.


Yes, we don't expect everyone will want to program to the bare SIMD.js API directly for everything; it's also intended to provide basic functionality that higher-level libraries and even specialized languages, like Halide, can be built on (when they aren't running on ARB_compute_shader).


> "will this vectorize?"

Yes, but _will it blend?_


I think it's wrong to present this as an either-or situation. Compilers can and do use both explicit operations and auto-vectorisation in the same program. GCC, for example, does some auto-vectorisation on loops, but also allows you to use the primitives directly.

Why wouldn't we want the same in JS?


I think you're rushing to judgement pretty quickly here. For example,

> 2. Identify proper use cases. The examples are quite bogus currently. They show a Mandelbrot set.

That's just one tiny demo. See the GitHub repo discussions for lots of debate on use cases:

https://github.com/johnmccutchan/ecmascript_simd/issues

This has been discussed very intensively, with input from people from Intel, Mozilla, and Google. Feel free to join in as well.

In general, while I sympathize to some extent with you and the Apple position here, you are fighting a strong trend in the industry. C has a SIMD API, C# has a SIMD API, Dart has a SIMD API, etc. - all of those were created for good reasons. Autovectorization would be great - hopefully Apple can prove it beats the API approach - but no compiler has proven it thus far, hence SIMD APIs in all those languages I mentioned.


Thanks for the link! I tried going through the issues (randomly clicking) but I am not seeing use cases. https://github.com/johnmccutchan/ecmascript_simd/issues/89 https://github.com/johnmccutchan/ecmascript_simd/issues/85 https://github.com/johnmccutchan/ecmascript_simd/issues/84

They are good technical discussions. What I am looking for is use cases for people to use in websites.


Vertex skinning [1] (code at [2]) is a classic example. (This code is not vectorized, but it could easily be and is usually vectorized via intrinsics in games.) Any 3D game that wants to animate human characters over a long period of time is likely to be using this technique. Since we want games to run on the Web, of course, this is a use case for Web sites.

[1]: http://en.wikipedia.org/wiki/Skeletal_animation

[2]: https://github.com/h4writer/arewefastyet/blob/master/benchma...
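For readers who don't want to dig through the links, here's a minimal scalar sketch of linear-blend skinning (hypothetical data layout; the function names are illustrative). Games typically vectorize exactly this inner loop with intrinsics:

```javascript
// Apply a 4x4 matrix (column-major Float32Array(16)) to a 3D point.
function transformPoint(m, p) {
  return [
    m[0] * p[0] + m[4] * p[1] + m[8]  * p[2] + m[12],
    m[1] * p[0] + m[5] * p[1] + m[9]  * p[2] + m[13],
    m[2] * p[0] + m[6] * p[1] + m[10] * p[2] + m[14]
  ];
}

// Linear-blend skinning: the skinned position is the weight-blended sum
// of each influencing bone's matrix applied to the rest-pose position.
function skinVertex(p, bones, indices, weights) {
  var out = [0, 0, 0];
  for (var i = 0; i < indices.length; i++) {
    var t = transformPoint(bones[indices[i]], p);
    out[0] += weights[i] * t[0];
    out[1] += weights[i] * t[1];
    out[2] += weights[i] * t[2];
  }
  return out;
}
```

Run over tens of thousands of vertices per frame, those multiply-adds are where SIMD lanes pay off.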


I am no expert on this, but if games are the primary target, would it make sense to make some sort of vector(ization) API? Or some tessellator API?

It's honestly baffling that we keep away from threads in JavaScript but do all these optimizations, like SIMD, which are very minor. I mean, practically every CPU out there has multiple cores. Workers don't cut it because they require copying data.


> I am no expert on this, but if games are the primary target, would it make sense to make some sort of vector(ization) API? Or some tessellator API?

Both of those would be harder, and less general. Game developers are asking for SIMD; they aren't asking for specialized APIs.

> It's honestly baffling that we keep away from threads in JavaScript but do all these optimizations, like SIMD, which are very minor. I mean, practically every CPU out there has multiple cores. Workers don't cut it because they require copying data.

Threads aren't easy to "just add" to JS. Making a thread-safe GC perform as well as today's highly-optimized single-threaded GCs is hard. And not even counting the engineering effort required, none of the millions of lines of JavaScript out there is thread-safe. Of course we will need a way to do threads eventually, but it's much harder than SIMD.


>Making a thread-safe GC perform as well as today's highly-optimized single-threaded GCs is hard.

That is entirely a problem of your own making. You decided to bet hard on single-threaded, dynamic JavaScript being a suitable model for all end-user software. It turns out it isn't, but it's too late now.

This kind of thing is exactly why JavaScript isn't a good choice as a general-purpose VM platform. Which you are presumably aware of, because you aren't writing the next generation of Firefox in JavaScript; you are creating Rust. But apparently what isn't good enough for Mozilla is good enough for everybody else…

Which brings us to the real problem with the web - everybody except the browser vendors is a second class citizen.


SIMD optimisations aren't very minor. For games anyway, we have a 90-10 rule (90% of time is spent inside 10% of the code); I'm sure that exists in other areas too. One of the most common operations performed is multiplying 4x4 matrices, which can see 300% speed-ups with the correct SIMD operations. Getting that speed-up in the most-called parts of your code is anything but minor.
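For reference, here's the scalar version of that 4x4 multiply. Each output column is a linear combination of A's columns, which is exactly what maps onto four float32x4 multiply-adds per column in SIMD form (function name and layout are illustrative):

```javascript
// Scalar 4x4 matrix multiply, column-major Float32Array(16) layout.
// A SIMD version loads each column of A as one float32x4 and does four
// broadcast-multiply-accumulate steps per output column.
function mat4mul(a, b) {
  var out = new Float32Array(16);
  for (var col = 0; col < 4; col++) {
    for (var row = 0; row < 4; row++) {
      var s = 0;
      for (var k = 0; k < 4; k++) {
        s += a[k * 4 + row] * b[col * 4 + k];
      }
      out[col * 4 + row] = s;
    }
  }
  return out;
}
```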


There are experiments with threads in JS; however, that would be a much bigger change to the language than adding something like a SIMD API, so it is more controversial and takes longer to sort out.


Skinning hasn't been done on the CPU for about a decade now. GPUs are much better at it.


There are lots of issues; it can take a while to find all the use cases in them (work has been ongoing for some time). But from memory I can recall that

https://github.com/johnmccutchan/ecmascript_simd/issues/59

is interesting here; it mentions, among other things, the major SIMD-using portion of an important real-world codebase (IMVU).

There might also be relevant discussion on the Emscripten repo,

https://github.com/kripken/emscripten/issues?q=is%3Aopen+is%...

as some debate happened there while implementing code generation that emits SIMD.js.


>2. Identify proper use cases.

SIMD is used a lot in all kinds of things dealing with video, so that's one area where it could find uses. I wouldn't mind being able to do (realtime) video processing in the browser. Some relevant reading that mentions SIMD.js briefly (though mostly that it wasn't really useful at all at the time of writing):

http://tp7.pw/articles/javascript-video-filtering/


Physics simulation is a pretty big one too - at least it's what we used when we were learning to use SIMD intrinsics in C.


It looks like the vector proposals aren't tied to any particular implementations and could be used to help extract parallelism via SIMT, VLIW, microthreads, or various other techniques in addition to the SIMD units you'd find in x86 or ARM.


> What about pixel manipulation of images?

We got this years ago. You can do per-pixel stuff via the Canvas API.
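Right - once you have the ImageData pixels, per-pixel work is just a loop over an RGBA byte buffer. A sketch on a standalone Uint8ClampedArray (the same element type `ctx.getImageData(...).data` returns in a browser):

```javascript
// Per-pixel manipulation over an ImageData-style RGBA buffer:
// Uint8ClampedArray, 4 bytes per pixel, channels in R,G,B,A order.
function invert(pixels) {
  for (var i = 0; i < pixels.length; i += 4) {
    pixels[i]     = 255 - pixels[i];     // R
    pixels[i + 1] = 255 - pixels[i + 1]; // G
    pixels[i + 2] = 255 - pixels[i + 2]; // B
    // pixels[i + 3] (alpha) left untouched
  }
  return pixels;
}
```

In a browser you'd follow this with `ctx.putImageData(imageData, 0, 0)` to write the result back.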


Oh. Dear. God. JavaScript is going to become the x86 of our time, i.e., the historically awful patchwork of kludges that gets passed on to each generation to kludge anew, and which also runs the entire world. JavaScript: we don't have integers, but, dammit, we have vector instructions. FML.

Maybe it's not too late to take up painting.


We haven't hit the insanity limit quite yet.

It looks like Alon Zakai is working on something called the "Emterpreter," a bytecode format that Emscripten could compile to, sacrificing some runtime performance for faster startup. (JS parse time is quite a big deal for Emscripten applications.)

I am hoping that this effort goes really really well. So well that browser engines start natively supporting this bytecode.

Then JS can actually die.


It's rapidly moving into hardware development as well, eg:

https://tessel.io/

https://www.kickstarter.com/projects/gfw/espruino-javascript...

http://nodebots.io/

This is what programming is now.


No, it's rapidly moving into toy hardware platforms targeted at web developers.


> JavaScript: we don't have integers

More like "we don't have 64-bit ints, yet."



>SIMD.js is originally derived from the Dart SIMD specification

I am so ungodly sick of Google pushing Dart technologies everywhere. I really am starting to think their strategy is embrace, enhance, exterminate.


If you don't like Dart, then you should actually be in favor of this move. It negates what was previously an advantage of Dart over JavaScript.


Apologies, but we don't push Dart. You don't have to use it. Google writes millions of lines of JavaScript a year and wants a better language, that's all.

As pcwalton says, SIMD.js evens out an advantage that was in Dart's favor, while at the same time making JavaScript a better compile target for languages like Dart and for asm.js source languages like C++. It's also a really clean API that fits nicely onto typed arrays. What's not to like?


Now I'm just waiting for int64.js, which should appear when ES7 value objects come along http://www.slideshare.net/BrendanEich/value-objects2


Same here. I'm also looking forward to asm.js being a target for JVM, CLR and other garbage collected languages through Typed Objects. https://wiki.mozilla.org/Javascript:SpiderMonkey:OdinMonkey#...

Also, am hoping 'JavaScript Shared Memory, Atomics, and Locks' gets accepted by TC-39. https://docs.google.com/document/d/1NDGA_gZJ7M7w1Bh8S0AoDyEq...


Not being able to have 64-bit integers in node.js on a 64-bit Linux kernel can be a considerable issue. This fix can't come soon enough; having to wait for the version after the next is too long.


What we really need is to spread WebCL - https://www.khronos.org/registry/webcl/specs/1.0.0/

1. It is an already-standardized language and API. There exists a lot of code for WebCL.

2. WebCL engines can run on the CPU; they can use all CPU cores, SIMD, etc., while still being part of the web browser (no special drivers required). It will give us much better performance than asm.js, SIMD.js, Google's Native Client, or any other "unstandardizible" things.



I wouldn't say DOA, which sounds final. As the comment there says, it doesn't make sense currently. But that could change.


Well, that's what Microsoft said about Android in 2008. No wonder Firefox is losing popularity, when they refuse to innovate.


I'm taking my crayons and going home!


Non-Mozilla technologies are not welcome in Firefox and never have been.

That is why we have been stuck for 15 years without a lossy image format that supports transparency (meanwhile, they put lots of effort into supporting an animated PNG format that was invented by Mozilla and that nobody else in the world cares about or uses). Mozilla's NIH syndrome holds back the web.


This is demonstrably untrue, as we've implemented a number of specs that originated elsewhere. (WebAudio, WebRTC). Just because something has a spec doesn't mean it is sensible to implement, however. As Vlad points out, there's no mobile implementation of that technology, so it doesn't make sense for us to push it, especially when there's an alternative spec that does have traction.

As for image formats, there was a table going around Twitter that I can't find now showing implementation of new image formats by browser. Basically Google implemented WebP, Microsoft implemented JPEG XR, and Apple implemented JPEG2000 (IIRC). There is no consensus in that space.


>As for image formats, there was a table going around Twitter that I can't find now showing implementation of new image formats by browser. Basically Google implemented WebP, Microsoft implemented JPEG XR, and Apple implemented JPEG2000 (IIRC). There is no consensus in that space.

There is no consensus because Mozilla have spent the last 15 years rejecting every proposed format. Mozilla (repeatedly) rejected JPEG2000 long before Chrome even existed, so you can hardly cite Chrome's lack of support as an excuse for your inaction.


Neat, looks like JS is getting SIMD speedups because of fixed type objects.

Would really love to see the speedup on fixed tuple transforms and convolution algorithms.

When I saw the JS speed wars originally, I started writing an Adobe curve-apply algorithm in JS, to apply .acv curves live using Canvas, but I essentially hit CPU limits and shelved it.

https://github.com/t3rmin4t0r/io.dine/blob/master/lib/iodine...

http://notmysock.org/code/iodine/

I want to rewrite it using something like the newly introduced float32x4, assuming I can read out the images into RGBA tuples.

Something like a blur would actually be possible once this is fast.


That applycurve function has what appears to be a gather, which is unfortunately outside of where we expect to be with the initial iteration of SIMD.js, as there isn't widespread CPU SIMD hardware support for it yet. However, the brighten function immediately above it, for example, is supported by features in the SIMD.js spec today.


Hey, I have implemented Curves transform in JS too! :) You can find it at http://www.Photopea.com (Ctrl+M). It works pretty fast even on large images.


It's official: JavaScript is the new assembly. I'll wait for the new Python; I never liked low-level languages.


ClojureScript already exists! So do Elm, Funscript, Purescript, Haxe, and a bunch of other things which transpile (might as well just start calling it "compile" now) to JS.


I liked the "original" assembly more; at least it was designed to help humans write and read code.

EDIT: That said, this is good news, in a sense.


You make me wonder which assembly you're talking about. x86-64 feels heavy with legacy design: many ways to get the same result, common pitfalls regarding memory management, and some surprisingly specific gems to make certain things fast… such as SIMD.

(I know little about other instruction sets.)


I'm just talking about design goals - the original assembly was introduced so that people wouldn't have to remember the numeric instructions of the processor. The instruction sets have been legacy since even before that, with slight improvements occasionally, but at least the target audience was people from the beginning. With asm.js, the target has been compilers from the start. With most assembly dialects the instructions have a clear purpose, whereas with asm.js you have arcane rules on what makes a value stay an integer, etc. - enough that I think most sane people would rather write against a traditional assembly dialect than asm.js.

Maybe that's a good thing, at least then the goal is that people who care about performance don't write JS. But I'd prefer to just have the compile target as numbers then since you can't read the result anyway.
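To illustrate the "arcane rules" in question: asm.js declares types through coercion idioms rather than keywords, so `x|0` means int32 and `+x` means double. A tiny sketch (it runs as ordinary JS even where no asm.js validator is present):

```javascript
// Minimal asm.js-style module: type annotations are spelled as coercions.
function AsmModule() {
  "use asm";
  function add(x, y) {
    x = x | 0;          // parameter x: int32
    y = y | 0;          // parameter y: int32
    return (x + y) | 0; // result truncated back to int32
  }
  return { add: add };
}
```

The `|0` on the return value is what gives the int32 wraparound semantics that plain JS numbers don't have.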


Before assembly, people wrote in binary machine code.


Python translated to JavaScript. The fastest-executing code ever.


https://rfk.id.au/blog/entry/pypy-js-faster-than-cpython/ "on a single carefully-tuned benchmark, after JIT warmup."


And this is why we need to simply switch to LLVM. It gives us a sane bytecode, allows any language, allows real pre-browser optimization, etc. Mozilla is moving in that direction with Rust and Servo, Google has already experimented with it via PNaCl, and WebKit/Apple already actively compiles JS via LLVM.

For backward compatibility, just use and extend emscripten. As a bonus, this would allow the DOM interface to be reworked so we can get it right.


Does it really allow any language? That sounds nice in theory, but how many complete languages have been fully implemented on top of vanilla LLVM, and which version would it need? I see many experimental frontends that may one day be complete, but by then LLVM will be at a new version and will likely be using a different bytecode.

I like the idea of having a language specifically for compiling to, but either way someone has to build the prototype implementation. This idea of building LLVM, the JVM, or the CLR into browsers might be good, but code is what matters when it comes to proposing standards.


Well, LLVM heavily relies on not having forward compatibility (new versions emit LLVM IR that can't be used by older versions), and backward compatibility is merely an afterthought.


That's a standards issue. If the W3C froze a particular version as the "one true version", then whatever experimentation the LLVM guys want to do with other versions wouldn't matter.

Another solution is to version your code <script type="llvm" src="foo.ll" version="3.6.1" />

I'm already of the belief that the W3C needs to require versioning of JS code instead of stuff like 'use strict' (if something is bad, just remove it in a new version and move on). For backwards compatibility, if there's no version, assume ECMAScript 3.


Khronos Group standardized LLVM IR 3.2 as SPIR (Standard Portable Intermediate Representation). It is an existence proof that standardizing LLVM IR is possible.

https://www.khronos.org/spir/


I actually work a bit with SPIR. They standardized a subset of LLVM IR 3.2 as SPIR 1.2.

SPIR 2.0, on the other hand, is already based off LLVM IR 3.4.


Where are the Bytex16 types? Or the Wordx8 types? Who the hell is using floats for every data value?


They're coming. The initial code for them is written; it's just waiting for a few other things to get checked in first.

Also, quite a lot of people use floats for every data value.


Neat! Is anyone playing with js crypto and this new stuff?



