You know, I'm not entirely sure how I feel about this. On the one hand: yeah, I ...

dherman · on May 5, 2016

[I'm a colleague of the OP and Mozilla/TC39 member, i.e. someone who cares a lot about the JS programming model :)]

I'm enthusiastic about SharedArrayBuffer because, unlike threads in traditional languages like C++ or Java, we have two separate sets of tools for two very separate jobs: workers and shared memory for _parallelism_, and async functions and promises for _concurrency_.

Not to put too fine a point on it, shared memory primitives are critical building blocks for unlocking some of the highest performance use cases of the Web platform, particularly for making full use of multicore and hyperthreaded hardware. There's real power the Web has so far left on the table, and it's got the capacity to unleash all sorts of new classes of applications.

At the same time, I _don't_ believe shared memory should, or in practice will, change JavaScript's model of concurrency, that is, handling simultaneous events caused by e.g. user interface actions, timers, or I/O. In fact, I'm extremely excited about where JavaScript is headed with async functions. Async functions are a sweet spot between on the one hand the excessively verbose and error-prone world of callbacks or often even hand-written promise-based control flow and on the other hand the fully implicit and hard-to-manage world of shared-memory threading.

The async culture of JS is strong and I don't see it being threatened by a low-level API for shared binary data. But I do see it being a primitive that the JS ecosystem can use to experiment with parallel programming models.

pfooti · on May 5, 2016

Yes, the thing I in particular worry about is the event dispatch system. The last thing we need there is multithreaded event dispatch, where multiple handlers fire at the same time, possibly resulting in race conditions on state managing objects.

But on closer inspection of the post, this implementation seems to be highly targeted at certain kinds of compute-bound tasks, with just the shared byte array based memory. It's well-partitioned from the trad ui / network event processing system in a way that makes me optimistic about the language.

yosefk · on May 6, 2016

I'm curious about 2 things:

1. How is the accidental modification of random JS objects from multiple threads prevented - that is, how is the communication restricted to explicitly shared memory? Is it done by using OS process underneath?

2. Exposing atomics greatly diminishes the effectiveness of automated race detection tools. Is there a specific rationale for not exposing an interface along the lines of Cilk instead - say, a parallel for loop and a parallel function call that can be waited for? The mandelbrot example looks like it could be handled just fine (meaning, just as efficiently and with a bit less code) with a parallel for loop with what OpenMP calls a dynamic scheduling policy (so an atomic counter hidden in its guts.)

There do exist tasks which can be handled more efficiently using raw atomics than using a Cilk-like interface, but in my experience they are the exception rather than the rule; on the other hand parallelism bugs are the rule rather than the exception, and so effective automated debugging tools are a godsend.

Cilk comes with great race detection tools and these can be developed for any system with a similar interface; the thing enabling this is that a Cilk program's task dependency graph is a fork-join graph, whereas with atomics it's a generic DAG and the number of task orderings an automated debugging tool has to try with a DAG is potentially very large, whereas with a fork-join graph it's always just two orderings. I wrote about it here http://yosefk.com/blog/checkedthreads-bug-free-shared-memory... - my point though isn't to plug my own Cilk knock-off that I present in that post but to elaborate on the benefits of a Cilk-like interface relatively to raw atomics.

AgentME · on May 6, 2016

1. You can't ever get a reference to regular objects that exist in other threads (workers). Communication with workers is limited to sending strings, copies of JSON objects, transfers of typed arrays, and references to SharedArrayBuffers.

2. I assume it was done at a low level so that multi-threaded C++ could be compiled to javascript (asm.js/WebAssembly).

yosefk · on May 6, 2016

For (1) does this mean that everything in the global namespace barfs when called from a worker thread?

(2) sounds like it might need a larger set of primitives, though I'm not sure.

AgentME · on May 6, 2016

1. Web workers don't share a javascript namespace or anything with the parent page. They're like a brand new page (that happens to not have a DOM). Outside of SharedArrayBuffer, there's no shared memory.

finishingmove · on May 6, 2016

As someone who doesn't know much about how parallelism primitives are implemented, I need to ask why SharedArrayBuffer needs a length to be specified? From my layman viewpoint, this seems too low-level to be used for casual everyday applications.

snissn · on May 5, 2016

> On the other hand, I quite like the single-threadedness of javascript.

Douglas Crockford's strategy of taking a language, identifying a subset of it, calling it "The Good Parts" and sticking to it is a great motivation to welcome new features, let them evolve but keep your distance from them until they're fleshed out. This has pretty much been the M.O. of Javascript and IMO has worked great..

zzzcpan · on May 5, 2016

> I understand exactly when and where my javascript code will be interrupted

That's why callbacks, promises, async/await and all that are neither multitasking, nor multithreading. They are all about control, while multithreading is all about parallelism and is essentially a very low-level specialized thing, that nobody should be using, unless absolutely necessary.

chrisseaton · on May 5, 2016

> multithreading is all about parallelism

This just isn't true. Why do you think people wrote multi-threaded applications back when almost all machines had just one processor and just one core? Threads give you concurrency as well, even if you don't want or need parallelism.

zzzcpan · on May 6, 2016

Of course they do. Anything can give you concurrency. But pretty much anything can make concurrency easier, than threads.

> Why do you think people wrote multi-threaded applications back when almost all machines had just one processor and just one core?

Almost none did. Popular networking servers were either preforking, forking or asynchronous. Desktop GUIs were event driven. Threads weren't even very usable on most systems at that time, i.e. up until a decade and a half ago or so, weren't they?

chrisseaton · on May 6, 2016

Threads weren't very usable on most systems until around 2001? No, I don't know where you've got that idea from but it's not the case.

Java had threads in 1996. The Windows 32 API had threads from at least Windows 95. Windows NT had them since 1993. I don't know when Linux got threads and couldn't find anything, but I would presume it was the mid 90s at the very latest. In fact I don't think any of these threading APIs will have even changed much since the mid 90s. They weren't new ideas at the time either!

Look at this thread programming manual from 1994 which on page 3 lists five benefits of using threads at the time, only one of which is utilising the relatively rare multiprocessors. http://www4.ncsu.edu/~rhee/clas/csc495j/MultithreadedProgram...

kibibu · on May 6, 2016

Linux got threads in 1996 when kernel 2.0 introduced the "clone" syscall, allowing multiple processes to share address space. LinuxThreads was built on top of this to implement the pthreads api.

nl · on May 6, 2016

I was using threads on Windows NT in 1998 (on single core, single processor machines). They were perfectly reliable.

dspillett · on May 6, 2016

Multi-threading was far more popular on Windows then other OSs because starting new processes was so damned expensive.

On many other systems the overhead of bringing up a new process was so much closer to that of bringing up a new thread that you only needed threads for really high performance parallel code and/or when you needed fast shared memory and/or were very memory constrained. Any time the individual tasks are fairly independent, had noticeable bottlenecks other then CPU, and data larger then the process itself, say a web server, processes were more than adequate and you don't need to worry about certain potential concurrency issues.

nl · on May 6, 2016

Yes, this is all true. But also many threading implementations (especially Linux) back then were pretty bad. Solaris and Windows were the only places where it made sense to use them.

See also "Green Threads" in early Java implementations.

nly · on May 6, 2016

Exactly. Threading actually enables a simple asynchronous blocking programming model through locks (at the expense of introducing loads of potential locking hazards).

zzzcpan · on May 6, 2016

Blocking presumes synchronicity. Locks are by definition synchronization primitives.

rubiquity · on May 5, 2016

Great point. I find myself reminding people about this all too often.

pfooti · on May 5, 2016

Well, I'm old enough to remember coding for Mac OS 8, where "multitasking" was indeed cooperative - I had to say "oh, you can interrupt me here, if you want" at different places in my code, which meant bad actors could lock the system of course. It wasn't great.

On the other hand, in the uncommon event I do have some weird javascript thing that's going to take a long time (say, parsing some ridiculously-big JSON blob to build a dashboard or something), I know I can break up my parse into promises for each phase and that I won't be locking up other UI processing as badly during that process. So: not exactly multitasking / threading as you say, but still a handy thing to think about.

spenuke · on May 5, 2016

I'm still totally ignorant of the new primitives in the original link, so maybe that's why I'm confused, but: are you saying that as of today, wrapping a big parsing job into a promise frees up the event loop? I really don't think that's the case, is it? JSON.parse is gonna be blocking/synchronous whenever it happens.

Can you explain a bit more of the implementation you're describing?

roblabla · on May 5, 2016

You can use setTimeout to "free up the event loop". Using setTimeout(fun, 0) will run fun after the event loop has been freed up IIRC. NodeJS has a function called setImmediate that does exactly that.

JSON.parse as implemented is going to be blocking. But it's possible to implement an asynchronous, non-blocking JSON parser

See also : http://stackoverflow.com/questions/779379/why-is-settimeoutf...

Edit : requestAnimationFrame is a better alternative to setTimeout(fun, 0), as it allows the browser to update the UI.

pfooti · on May 5, 2016

Not the parsing part, but the processing part. Assume I've got a big pile of data and am calculating stuff like correlations on it. If I break the process up into chunks, I can go chunk.then(nextChunk).then(afterThat) etc etc. JSON.parse still blocks, but it's the post-processing I'm talking about.

Sanddancer · on May 6, 2016

I disagree that parallelism is inherently a low-level specialized thing. There are a lot of operations, like handling any sort of media, that are naturally parallel. If I'm parsing through an image file, I know exactly where it begins and ends, and with very few bits that are dependent on other bits, it just makes sense to be able to spread that operation through the many cores that exist in a modern machine.

ihsw · on May 5, 2016

Welp, the idea is portability. This is a bridge between other platforms into JS, and -- as you mentioned -- its usage is largely specialized.

Most people don't really know what typed arrays are, but they're in ES6 nevertheless.

CiPHPerCoder · on May 5, 2016

A multi-threaded JavaScript also means this becomes more troubling:

https://github.com/nodejs/node/issues/5798

Node.js uses OpenSSL instead of the operating system's CSPRNG. The biggest argument for "WONTFIX" is "Node.js is single-threaded so OpenSSL fork-unsafety isn't a concern for us".

If JavaScript becomes multi-threaded, it's not unreasonable to expect Node.js to follow. If it does follow, expect random numbers to repeat because of OpenSSL's broken RNG.

slavik81 · on May 5, 2016

> "Node.js is single-threaded so OpenSSL fork-unsafety isn't a concern for us"

I don't see this quote within your linked issue and, as far as I can tell, there's no discussion of multi-threading.

CiPHPerCoder · on May 5, 2016

I use quotes differently than journalists. I use them to indicate "this is a separate sentence that expresses an idea mid-sentence" and to indicate tone shift, not as a quote for a specific person. I use a > prefix for direct quotes.

That exact string isn't from the Github issue, it's a summary of one argument dismissing some of the OpenSSL RNG's worst issues.

Here are two direct quotes if that's what you want:

> forking is not an issue for node.js

> The bucket list of fork-safety issues that would have to be addressed is so long that I think it's safe to say that node.js will never be fork-safe.

There was also off-ticket discussion on IRC where similar arguments were made.

slavik81 · on May 5, 2016

Forking and threading are different things. Forking creates new processes and duplicates memory. It raises entirely different issues from multi-threading, which does neither. See: http://stackoverflow.com/q/2483041/331041

They discussed forking, but did not discuss multi-threading.

brazzledazzle · on May 7, 2016

So for OpenSSL's RNG vs. the operating system's CSPRNG is there a difference between forking and multi-threading?

slavik81 · on May 7, 2016

Yes. The problem with an in-process RNG and forking is that the RNG state is duplicated, so both processes get the same sequence of numbers. Multithreading just needs locking to prevent corruption because the state is shared.

brazzledazzle · on May 8, 2016

Thanks for taking the time to explain.

atoponce · on May 5, 2016

Meh. Operating system CSPRNGs can be slow, whereas userspace CSPRNGs seeded from the OS CSPRNG can be fast, fast, fast. Explain how the OpenSSL RNG is broken, and why it's a bad idea to rely on it.

CiPHPerCoder · on May 5, 2016

That's explained in painstaking detail in the Github issue I linked to.

atoponce · on May 5, 2016

Okay. I read the whole thread (ugh). Arguments for using the kernelspace CSPRNG basically boils down to this:

1. Kernelspace CSPRNGs generally don't change, are well audited, and are generally accepted to be secure.

2. Userspace CSPRNGs don't provide any additional security benefit.

3. OpenSSL is a questionable security product with it's history of vulnerabilities.

So, with that said, let's look at each of them.

For the first point, I don't fully agree. The Linux kernel CSPRNG has changed from MD5 based to SHA-1 based. I have heard chatter (I don't have a source to cite this) that it should move to SHA-256 with the recent collision threats of SHA-1. There is also a separate movement to standardize it on NIST DRBG designs (CTR_DRBG, Hash_DRBG, HMAC_DRBG- https://lkml.org/lkml/2016/4/24/28). Starting with Windows Vista, Microsoft changed their CSPRNG to FIPS 186-2 or NIST SP 800-90A (depending on Windows version), which could be hash-based or AES counter based. OpenBSD changed from using Yarrow and arcfour to ChaCha20. So, no, kernelspace CSPRNGs change all the time.

For the second point, I greatly disagree. First, you need a specific RNG to compare to. It's considered "unsafe" to use MD5 as a CSPRNG, although that would require pre-image attacks on MD5, of which it still remains secure. Additionally, a userspace AES-256-CTR_DRBG is theoretically more secure than an AES-128-CTR_DRBG design. While that matters little it terms of practical use, the reality is that AES-256 has a larger security margin than AES-128, as I understand it. Same for using SHA-256-Hash_DRBG instead of SHA-1-Hash_DRBG. Userspace CSPRNGs can be more secure than kernelspace.

Finally, as far as I know, attacks on OpenSSL have been overwhelmingly CBC padding oracle attacks, in one form or another. There have been a couple RNG vulnerabilities with OpenSSL (https://www.openssl.org/news/vulnerabilities.html), but same with say the Linux RNG (https://github.com/torvalds/linux/commit/19acc77a36970958a4a...). So, I'm not sure this is a valid point.

The biggest reason why you should use a userspace CSPRNG is performance. System RNGs generally suck. The Linux kernel can't get much faster than about 15-20 MiBps. Similarly with Mac OS X and FreeBSD. OpenBSD can get about 80 MiBps (on testing with similar hardware), but that's just painful for a single host trying to serve up HTTPS websites, when the HDD (nevermind SSDs) can read data off at 100 MiBps without much problem. The kernelspace CSPRNG can't even keep up with disk IO.

Userspace CSPRNGs can get into 200-300 MiBps without much problem, and with AES-NI (provided that you're using AES-128-CTR_DRBG), 2 GiBps (https://pthree.org/2016/03/08/linux-kernel-csprng-performanc...).

But, I do agree with one very serious concern on using userspace RNGs in general: they can introduce bugs and vulnerabilities that don't exist with the system CSPRNG. Expecting a developer to get this right, especially one who is not familiar with cryptographic pitfalls, can be a massive challenge. But this isn't the case with the OpenSSL RNG.

So, I guess I don't see the point to move the node.js CSPRNG dependency from OpenSSL to kernelspace.

ryuuchin · on May 6, 2016

> Microsoft changed their CSPRNG to FIPS 186-2 or NIST SP 800-90A

It's changed once, in Vista SP1. Since then it's only used AES256 in CTR mode as a DRNG as specified in NIST 800-90. So I'm not sure it's fair to say it changed that much. Linux's CSPRNG has also not seen much change other than to make it more resilient in certain conditions (there was some paper on it IIRC) and to add hardware RNG support (e.g. rdrand).

> 3. OpenSSL is a questionable security product with it's history of vulnerabilities.

I don't think this is the (main) argument against its CSPRNG although it may be one of them. My understanding is the main argument against it is that it's overly complicated by design (e.g. entropy estimation, how it's initialized (especially on Windows)). You could also probably argue that it may be showing its age with its use of SHA1 but you could say the same for the Linux kernel as well.

If you want to look at a userspace CSPRNG done right (or what I believe to be one done right) just take a look at BoringSSL's[1]. In the case where there is a hardware RNG it will create a ChaCha20 instance, keyed with the OS's CSPRNG, and use that ChaCha20 instance to filter the rdrand output (as to not use it directly or xor it). If there isn't a HW RNG then it will just use the OS CSPRNG directly.

There's no entropy estimation, no way to seed it, and by design it's simple and fast. You're correct that the system's CSPRNG may not be fast enough, in fact the BoringSSL dev's mentioned this[2] citing TLS CBC mode. This is probably more a problem on Linux than Windows due to the design of the CSPRNG (Linux's is pretty slow).

So with everything being said I would argue that it's always the correct choice to use the system CSPRNG unless it otherwise can't satisfy your needs. In which case just use BoringSSL then.

As a side note if you really need to generate A LOT of random numbers just use rdrand directly. You should be able to saturate all logical threads generating random numbers with rdrand and the DRNG (digital RNG) should still not run out of entropy.

[1] https://boringssl.googlesource.com/boringssl/+/master/crypto...

[2] https://www.imperialviolet.org/2015/10/17/boringssl.html (under the "Random number generation" section)

atoponce · on May 9, 2016

> If you want to look at a userspace CSPRNG done right (or what I believe to be one done right) just take a look at BoringSSL's[1].

BoringSSL just uses /dev/urandom directly. It's not a userspace CSPRNG. And as you pointed out, for GNU/Linux systems, it's slow. This is why userspace designs, such as CTR_DRBG, HMAC_DRBG, and Hash_DRBG exist- so you can have a fast userspace CSPRNG with backtracking resistance.

Case in point. On my laptop:

$ pv < /dev/urandom > /dev/null

1.02GB 0:01:20 [13.3MB/s] [ < => ]

$ openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt < /dev/zero | pv > /dev/null

2.13GiB 0:00:11 [ 198MiB/s] [ <=> ]

And on a server with AES-NI:

$ pv < /dev/urandom > /dev/null

2.19GiB 0:01:06 [ 20MiB/s] [ <=> ]

$ openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt < /dev/zero | pv > /dev/null

31.9GB 0:00:34 [ 953MB/s] [ < => ]

I've seen other hardware with AES-NI that can go north of 2 GiBps, as I already mentioned. Although not backtracking resistant, those are fast userspace CSPRNGs, that are clean in design.

I've designed userspace CSPRNGs that adhere to the NIST SP 800-90A standards. They're seeded from /dev/urandom on every call, and perform much better than relying on /dev/urandom directly. I won't say they're bug free, but if you read and follow the standard (http://csrc.nist.gov/publications/nistpubs/800-90A/SP800-90A...), it's not too terribly difficult to get correct, and PHP, Perl, Python, Ruby, and other interpreted languages can outperform the kernelspace CSPRNG.

ryuuchin · on May 9, 2016

> BoringSSL just uses /dev/urandom directly.

Only if there's no hardware RNG support which I admit can happen (it's not perfect, I freely admit that). I suspect that for Google's use on their servers it's a non-issue (assuming where they use it and need the high(er) speed stuff they'll always have rdrand support). If there is rdrand support then it will only reseed from /dev/urandom after every 1MB (or 1024 calls) of generated random data (per thread).

daveroberts · on May 5, 2016

I know very little about the subject, but when multi-core processors were first introduced, I remember that the general wisdom was that server software could benefit, from its highly concurrent nature, but games and graphics would not, because they are generally single threaded. I wonder if the general wisdom at the time was wrong, or if there has been a big shift to take advantage of multiple cores. I'm guessing the later.

HappyTypist · on May 5, 2016

Maybe for your standard web app, but try porting Unity or Unreal Engine over without better parallelism.