JavaScript in Parallel: Web Workers and SharedArrayBuffer

i_s · on March 4, 2017

It is really how disappointing the options to share data with workers are in the browser.

Combine the limited options options to transfer data quickly, and the limited API available in the worker itself (no version of a DOM, even a gutted one for doing measurements), it is no surprise how few opportunities to use them effectively there are.

When it comes to transferring objects, it is so bad people are resorting to JSON.stringifying messages. [0]

Seems like it would be easy to just add an Immutable Array and Map to the standard library, and let people use that on workers without these silly limitations. What am I missing?

[0] https://nolanlawson.com/2016/02/29/high-performance-web-work...

rl3 · on March 4, 2017

I used web workers extensively in a JS project a little over a year ago, and it was a nightmare. Granted what I was attempting was crazy, but it shouldn't have been as painful as it was.

Basically the "gotchas" are in the browser implementations. Each browser's web worker implementation is a little different, and those little differences (bugs) have huge implications in terms of what you can and can't do. For example: one browser may relay direct worker-to-worker messages via the main thread, so if your main thread is blocked your workers can't talk to each other, and that largely defeats the entire purpose of direct worker-to-worker communication.

Fortunately some of these issues have been fixed since (including that one, if memory serves), but it's slow going. I believe what happened was workers were introduced 5+ years ago, met with little interest initially, and the APIs have rotted for years until recently when interest picked up again, especially since SharedArrayBuffer.

My qualm with SharedArrayBuffer is that it kind of sucks to use in pure JS projects, because you have to serialize/deserialize everything to and from the buffer. With the Emscripten toolchain's pthreads support, as far as I'm aware you just compile your code and the heap lives inside SharedArrayBuffer. You don't have to write boilerplate serialization code, so compared to plain JavaScript it's seamless experience in that regard.

My advice to anyone using web workers in a pure JavaScript project is to use them as their name implies: offloading long-running calculations. If you try to treat them as true threads, you're going to have a bad time. Especially if your inter-worker messaging volume is high and you have frequent interdependent (blocking) calculations.

That said, the work I did was prior to SharedArrayBuffer. For very high-performance projects, it will likely be prudent to use SharedArrayBuffer itself as a messaging medium between workers.

nojvek · on March 4, 2017

Ios has a ui thread and a separate thread for application logic. React native makes good use of this to make applications silky smooth.

In Web land, the overhead of communication with workers is costly that it makes sense to do things in main thread. Preact had an worker diffing implementation but that was slower.

We once had to unzip on worker and pass to main thread, it was faster just to do it on UI thread.

nolanl · on March 4, 2017

(I'm the author of the article you cite.)

For the record, Chrome has been working on improving their postMessage performance as of M57 [1], so my benchmark should probably be re-run.

Assuming they end up in the same ballpark as Safari and Edge, the only browser for which stringification would still make sense is Firefox. (And maybe they have improved in the past year too. :))

But a lot of the conclusions from that article came from the fact that non-stringified postMessage perf was so bad in Chrome, which especially impacted performance on Android devices.

[1]: https://twitter.com/cramforce/status/824273332258209792

albertTJames · on March 3, 2017

If multithread is a possibility for the future of javascript why not make promises and async multithread and keep the same syntax we are using now... What is there to gain with workers ? It seems to me like an unnecessary addition ... but I am interested in the point of view of specialists on the matter. I may be wrong, but promises and async are for me a great formalism upon which one could build a future version of js that is multithreaded.

btilly · on March 4, 2017

Short answer, race conditions.

Here is a longer answer. Node currently uses cooperative multi-tasking. Each function call owns the CPU until it gives up the CPU by returning. Therefore all operations are implicitly atomic. Which makes them very easy to reason about.

As soon as you move to multithreading, NOTHING is atomic unless you lock it. You can even have problems with something as simple as:

    globalCounter = globalCounter + 1;

(If one thread is suspended between reading on the RHS and writing on the LHS, another thread can fetch fetch/write the value, and then that update gets lost when the first thread continues execution.)

There have been many cooperative async programming systems in the past. Every one that has moved to preemptive (which multi-threading is) has uncovered a lot of subtle, hard to spot, and hard to fix bugs because of losing implicit atomic guarantees.

So you go back to the safe solution of locking everything. But now locking/unlocking takes away a bunch of performance, limits parallelism, and creates the possibility for things like deadlocks. And now you might as well not bother with multiple CPUs! (See Python's GIL for a well-known example of this result.)

The challenge therefore is how to add some pre-emptive multitasking while avoiding creating too many unexpected nasty race conditions.

tps5 · on March 4, 2017

Here's my understanding of this:

Aysnc and threads are fundamentally different.

Aysnc refers to intelligently pausing/resuming many different operations. This is fantastic for a lot of tasks, especially tasks that require io. A task can be queued up and a callback can be attached to it. Then, while waiting for some condition to be met, your code can keep running. This results in "non-blocking code," which is familiar to all JS programmers.

Threads, in this context (web workers), refer to CPU cores. The Async model described above is all handled by a single CPU core. Most web applications don't require more than one core, but some do (or, at the very least, the demand is there). Using web workers, you can access other CPU cores, each of which has its own stack and its own separate Async event model.

The problem (from this article) with web workers is that data cannot be shared between CPU cores in JavaScript. Any data you want to pass from your main thread to a web worker must be copied, as in a bitwise copy. This article is about a solution for that, a way to share data between different threads in JavaScript, using a new standard that has been accepted by ECMA.

As far as your question, most of the time you don't want or need your promises or Async code to be handled in another thread. It would be insane to offload every single non-blocking line of code to another thread. Threads are much "heavier" than Async. Languages that have good support for threads also have Async.

However, wrapping web workers inside promises (or async/await) is absolutely something that makes sense, and something you can do now.

ic4l · on March 4, 2017

Well for one workers do not share scope, and at the moment sharing memory between the two is quite difficult. Second is race conditions like @btilly explained.

Right now workers are mainly used for extreme situations and eventually make it into libraries that others use.

I have created a library called Task.js that surfaces this idea into a promise compatible interface where you can just convert a pure function into a worker function. The end result is a promise supported function thats sends the function to a worker with your provided args and resolves when its done (also supports multiple workers and automatic queuing).

https://github.com/icodeforlove/task.js

msoad · on March 3, 2017

Because you can't write and modify object that are shared with other thread while the process is going on without consequences:

    var str = '{}'
    await JSON.parseAsync(str);
    str = '{"foo": 1}' // this line might run before line above

sebringj · on March 3, 2017

Ah so you mean call other js processes through async/yield type stuff. I think that is cognitively easier than introducing new concepts like this has with a familiar way to do things. I agree with you. There would have to be some other syntax like "thread" to note it is not in the main thread but acted just like async or something.

var somevalue = async doSomething();

var someExpensiveVAlue = thread doExpensiveThing();

var lotsOfExpensiveThings = Thread.all(threads);

voxic11 · on March 3, 2017

This is how it works in dotnet.

var val = await Task.Run(() => doSomeThreadedWork(param));

Indicates to the runtime that the specified delegate may be run on another thread. This doesn't explicitly start a new thread but rather just allows the delegate to execute on one of the thread pool threads that is managed by the runtime. It uses heuristics to decide how many threadpool threads to maintain and whether to actually use one of them to execute your delegate. A similar model would be very useful to have in JavaScript.

tracker1 · on March 5, 2017

The issue is that in .Net there are locking primatives to access shared values... where as in Node/JS you would need something that only allowed passing of strings, other primatives, and SharedArrayBuffer or similar objects that don't change underneath unexpectedly.

ic4l · on March 4, 2017

I kept finding myself needing to toss stuff in a background thread, and ended up making this: https://github.com/icodeforlove/task.js

It makes wrangling multiple workers much easier, and doesn't require you to have external JS files. (also works in node.js)

demo: http://s.codepen.io/icodeforlove/debug/ZOjBBB/NjrYzwzWdzLA

ralusek · on March 5, 2017

Looks impressive, well thought out interface. That's also an impressive grid of browser compatibility.

impostervt · on March 4, 2017

What do people actually use Web Workers for? The examples I've seen, including this one, seem contrived. I keep hoping they'll change it to allow background image manipulation, but I haven't seen much real progress in that front.

Klathmon · on March 4, 2017

I've used it for background image manipulation before.

Paint the image to a canvas, then grab the imagedata off of it, split it into as many parts as you have threads, then use the "transferrableObjects" property of postMessage to zero-copy transfer the data to each worker to be processed, transferred back, and re-stitched together.

It's pretty powerful and suprisingly easy to work with once you understand it.

[0] is a snippet from the code, but be gentile... It was a personal project where I was trying out polymer 0.5 and made a lot of questionable design choices...

Also, I've heard of the idea of using webworkers as a "first class" platform. That is do all of the core parts of your application in them and only use the "main" thread as a "ui" thread. I haven't gotten a chance to try it out, but it seems like a great idea that could really work well in some SPAs.

[0] https://github.com/Klathmon/stitchpics/blob/master/app/eleme...

bartread · on March 4, 2017

That's interesting but, AFAIK (and, believe me, I'd be happy to be corrected), what you can't do is create a canvas element (even one not attached to the DOM), and paint directly to it using the standard 2D context and drawing primitives.

Like I say, more than happy to be corrected, because that sort of thing would be incredibly helpful. Really, anything that lets you mess with a disconnected DOM in the background and then attach it in the foreground could be useful but (and, again, I'm very happy to be corrected), I don't think you can do this.

vanderZwan · on March 4, 2017

You're talking about OffscreenCanvas. It's available in Firefox and nowhere else:

https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCa...

Having said that, if you want fast image manipulation, you're probably better off doing direct manipulation of Uint8Clamped arrays anyway, since that can be much faster.

bartread · on March 7, 2017

Yeah, exactly that, although sadly Firefox represents a minority of traffic for me so OffscreenCanvas isn't a realistic option.

To your point about direct manipulation, I think you may be right. I'm talking about drawing rather than image processing, and no way I want to effectively re-implement a bunch of Canvas2D primitives, but what I could do is draw the image once on the foreground thread, extract the raw bytes, and then rotate those using image processing into a bunch of target arrays in the background. I'd then pass those arrays back to the foreground and create a set of corresponding images.

The only issue here, and I'd need to benchmark this, is that it's not clear to me how much of the time is spent drawing versus actually creating the images. If creating images is expensive this might not yield much of a gain.

The upshot is that this starts to sound like quite a bit of work when, what I could do, and which would be much simpler, is reduce the fidelity slightly: e.g., by rendering only 180 or 120 images, and tweaking the minimum angular velocity appropriately to avoid jerky animation.

And then I suppose I'm back to the point that somebody else made, which is that Web Workers may not be that useful in the real world. :/

Klathmon · on March 4, 2017

And if you can get away with it and don't mind a lot of bit shifting, it's even better working with the Uint32array which packs all 3 colors and alpha into one element which reduces your loops by 4X

vanderZwan · on March 4, 2017

Because matching the native 32 word size is better for the prefetcher, right?

Wouldn't most CPU's these days be smart enough to detect advancing the index by 4, and then using offsets?

So:

    for(let i = 0; i < someUint8Array.length; i += 4){
      let R = someUint8Array[i], G = someUint8Array[i+1],
        B = someUint8Array[i+2], A = someUint8Array[i+3];
      // ... manipulations here
    }

Klathmon · on March 4, 2017

I'm honestly not sure why it is, but across all browsers on both ARM and x86_64 arches it was almost 3x faster than doing what you wrote.

I have a feeling it's more of a JS JIT thing than a CPU prefetcher thing, but honestly I'm not really sure.

In my program I linked above, it was actually faster to use Uint32array everywhere and then use functions to pull the 4 color values from it and another function to push the 4 values back to a uint32.

Granted, it's been over a year since I last benchmarked that code, but I did reuse some of the image code recently and found iterating over a Uint32array to be significantly faster. (And funnily enough, manually unrolling the loop of Uint32array to something similar to what you wrote gave an additional small performance boost, but it was small enough to be not worth the extra weirdness in the code to me)

vanderZwan · on March 4, 2017

Thanks for the info, that might save me some benchmarking time myself in the near future ;)

Klathmon · on March 4, 2017

If it helps, I took the class I made that converts between 4 Uint8Clamped values to a Uint32 value and vice versa into an NPM package at [0].

At the very least it can show you some of the gotchas with bit shifting in JS (like how values often look negative until they are placed into a Uint32array then they become positive integers, and how you need to check for endianness)

[0] https://github.com/Klathmon/BitPacker.js

vanderZwan · on March 4, 2017

Would not have thought of checking for endianness, thanks!

Klathmon · on March 4, 2017

As the other commenter said, you are right that you can't work with canvas in the worker, but you can work directly with the array of image data which you'd often need to anyway for much processing.

I saw a library a while ago that was trying to re-implement the canvas primitives using only typed arrays so it was worker safe but I'm not sure what happened to it.

bartread · on March 7, 2017

Thanks - that might be interesting. See my comment above but, yes, I'd quite like to avoid implementing those primitives. I could get round it by drawing one image, and then processing the bytes for the rotations, but:

- This might lead to the odd jaggy artifact because I'm rotating bitmaps rather than vectors,

- It might not speed things up that much because I still need to create images from these arrays,

- It starts to seem like quite a lot of work; maybe I'd be better off reducing the number of images slightly, and compensating by speeding up the minimum angular velocity.

You can see what I'm up against at https://arcade.ly/games/asteroids/. (Also, see my comment above, where I've made substantially the same points.)

phpnode · on March 4, 2017

Anything that does significant work that would otherwise result in an unresponsive UI. For example code on the following page is compiled as you type, without a webworker this would give a very horrible user experience because compilation can take a long time for very large input - https://codemix.github.io/flow-runtime/#/try

ridiculous_fish · on March 4, 2017

I used WebWorkers for some math - basically lots of Fourier transforms, which took a few seconds to complete.

I expected to instantiate a WebWorker with a function, and was surprised to discover you have to give it a separate JavaScript file. It is not easy to adopt.

gp7 · on March 4, 2017

You can just call toString on a function and Blob the string. You have to keep your head straight about the fact that scopes won't transfer (no closures), which is probably why the API doesn't do it for you.

bartread · on March 4, 2017

Yeah, this ticked me off as well because the use of Web Workers feels extremely un-idiomatic, but what you've said makes sense. If you just passed a function it might dupe people into thinking closures would work or, and this is perhaps worse, force the standard down the route of copying all the values available in the current closure into the worker's scope, which just sounds like a terrible idea to me.

ridiculous_fish · on March 4, 2017

This didn't work in my case because I had references to other functions, for complex number handling and the like. I ended up just loading the whole minified .js file again in the Worker, which worked though it felt ridiculous.

btbuildem · on March 4, 2017

One of the native functions available in a web worker is importScript - it allows to load include other scripts (eg lodash) into the worker scope.

vanderZwan · on March 4, 2017

I wonder if bundlers like WebPack could make life simpler here..

Klathmon · on March 4, 2017

It really really does. Webpack's worker stuff make it so stupid simple that i was kicking myself for not using it a long time ago.

gliese1337 · on March 4, 2017

I've used them to do real-time audio and video transcoding, and for doing (non-security sensitive) hashing and decryption operations.

They are an excellent fit for anything that requires heavy number-crunching, and little or no DOM manipulation or network communication.

avgp · on March 4, 2017

We are experimenting with using SharedArrayBuffers and fetch to stream data, process it and fill it into a SharedArrayBuffer to then render with WebGL in the main thread. This has been working pretty great so far, but we'll have to wait for SharedArrayBuffer to get wider adoption.

Another thing was to offload tasks such as encoding GIF frames, which is also working pretty well.

vanderZwan · on March 4, 2017

Well, I'm not using them yet, but I working on a SPA where I have multiple react-virtualized-select[0] components, each with thousands of options. Creating the fastFilter options[1] for those takes multiple seconds. Although I only do it once and then re-use it, that still freezes up the page at the start, so if I could do that in parallel without freezing the UI that would improve the app for sure.

I guess for off-screen image manipulation (if for whatever reason you can't use WebGL and shaders) you'd have to resort to manual pixel manipulation by sending Uint8ClampedArrays back and forth. At least they're transferable by default.

[0] https://github.com/bvaughn/react-virtualized-select

[1] https://github.com/bvaughn/react-select-fast-filter-options

joshontheweb · on March 4, 2017

I run a timeout function that checks for delays in execution and I cross reference this with what is running at the time and move that to a web worker. In my cause this ends up being when I do heavy lifting like merging lots of Float32Arrays (audio stream data) or doing audio encoding.

Edit: Shameless Recruiting Plug. If any of you out there are js performance pros and well acquainted with this sort of thing, please reach out as I'm looking for help optimizing further and I'm a bit out of my depth on some of it. Currently this would be a contract position but could grow to something more if that is desired.

pesfandiar · on March 4, 2017

I used it to create a sandbox for an online JavaScript coding practice site (if you're interested in the write-up: http://www.pesfandiar.com/blog/2016/05/12/javascript-online-...). I should say it's a very niche use case and the sandbox is not exactly secure, but it's nice to have a separate disposable environment to run your scripts.

throwanem · on March 4, 2017

I used them last year for parsing and analyzing data from 250-300M XML files, to fulfill a business need. Took a couple of days to implement and worked quite nicely.

driverdan · on March 4, 2017

A few years back I built an app that took in data from APIs and did a lot of calculations for visualization. The page could take 5+ sec to render for large data sets, freezing everything while loading. If I built it today I'd use Web Workers.

jhgg · on March 4, 2017

We are experimenting with webworkers to power a very complicated autocomplete and scoring system in our client. So far so good. We're able to keep the UI running at 60fps while we match, score and sort results in a web-worker.

nol13 · on March 4, 2017

Extracting large .zip's of images to process before uploading, reading and writing large excel files.

Basically to move as much processing client side as possible/reasonable so the server doesn't have to do it.

rcfox · on March 4, 2017

I work on a browser-based electronics design tool. We use web workers when taking large, complex polygons and transforming them into a bunch of triangles.

SirensOfTitan · on March 4, 2017

Maybe heavy JSON parsing? Better to parse off UI-thread than drop a frame.

* Did not read article yet and am not sure about SharedArray structure.

ww520 · on March 4, 2017

Build thumbnail of images, encode audio/video, upload a file in background. Anything that takes more than a few seconds.

colordrops · on March 4, 2017

In an embedded application with low resources you could push processing into a worker if your CPU is multi core.

bhouston · on March 4, 2017

Clara.io users web workers to decompression mesh data in parallel.

franciscop · on March 4, 2017

If you don't like to have to load an external file (+ an extra request) you can use my library uwork: https://github.com/franciscop/uwork

It has some limitations, but for long lived process intensive and async functions is perfect. It has a really clean and easy syntax where you don't need to learn everything about Web Workers and their APIs to be able to use it.

If you are already using promises you might not even need anything besides wrapping the function in a callback.

z3t4 · on March 4, 2017

This is a very nice article! There's however a bug in the code, where it will "finish" when the last worker is done:

  if(msg.data.offset + msg.data.length === buffer.byteLength)

While you most likely want to wait for all workers to be done before showing the results. In this code however, the last worker will always finish last because it has more work to do (higher numbers).

It's often cheaper to scale horizontally, by spreading the work between physical machines, then to add more cores to a shared memory. So it's not such a big deal to have a single threaded program, and single threaded code is easier to reason about. SharedArrayBuffer will however be nice in JavaScript because it allows optimization is games and such, allowing you to have parallel for loops.

buttershakes · on March 5, 2017

Workers are really bad. Its anecdotal but I've written several projects and more often then not there is a weird browser quirk, a memory leak, or some other nastiness hidden in the worker implementation. I think my next project I'll go with Emscripten pthreads implementation and see if it's better.

euroclydon · on March 4, 2017

I'd just like some high level guidance on how my browser makes use of threads internally. I feel like I could have further optimized a couple web apps with that knowledge.

Waterluvian · on March 4, 2017

Excluding web workers, just one thread per tab/window. So there's really no consideration for optimization via. parallelism unless you use web workers.

And if I'm wrong, this is the quickest way to get the right info. :)

tshaddox · on March 4, 2017

I'm pretty sure you're wrong. At the bare minimum, setTimeout must use a clock in another thread, since it doesn't block your primary thread code. I'm fairly sure that most of all of the JavaScript APIs that use callbacks are using additional threads, including the ubiquitous XMLHttpRequest.

Waterluvian · on March 5, 2017

I'm not sure how the inner plumbing works. But since the queue just pops onto the stack, there's no true parallelism and therefore no optimization opportunity.

That being said, I'm not sure you're correct. There is no concern of blocking the main thread, since functions queued up via. `setTimeout()` are not tracked by some non-blocking timer. The queue is only inspected if the stack is empty. So if anything is happening, we just ignore the queue until nothing is happening. `setTimeout()` only guarantees a message will be processed after x milliseconds, not on-time.

https://developer.mozilla.org/en/docs/Web/JavaScript/EventLo...

tshaddox · on March 5, 2017

You can definitely block the main thread. XHR requests can be run synchronously by calling .open() with false as the third argument. And when you do asynchronous XHR requests, the browser can and will run multiple requests in parallel. Of course, the callbacks just get added to the event queue and run one at a time to completion on the main thread. And granted, it's hard to call this an "optimization opportunity," since you should almost never make synchronous XHR requests anyway.

Waterluvian · on March 6, 2017

True. But I'd call synchronous XHR a legacy item that you should probably never use.

SureshG · on March 4, 2017

A dart example - https://github.com/filiph/prime_finder

markdog12 · on March 4, 2017

I don't think your example uses SharedArrayBuffer? Nice to have Dart example though.

quadyeast · on March 6, 2017

<= instead of < would make the first few numbers come out correctly:

for(var n=2; n <= Math.floor(Math.sqrt(candidate)); n++)

tjfontaine · on March 4, 2017

I actually misread the title as "JavaScript in Peril", which depending on you feel about the features ...

ww520 · on March 4, 2017

What are the statuses of Web Worker support in the mobile browsers? Android? iOS?

nreece · on March 4, 2017

All browsers (except Opera Mini) support it: http://caniuse.com/#feat=webworkers

Klathmon · on March 4, 2017

pretty damn good. iOS all the way back to 6 i think? and android since around 4.4 (or chrome all the way back since forever)

btbuildem · on March 4, 2017

Opera Mini doesn't support it, the rest of them do

jpalomaki · on March 4, 2017

+1 for providing good summary in the beginning. I wish more articles had these.