Backpressure and Unbounded Concurrency in Node.js (voxer.com)
41 points by jessaustin on June 18, 2015 | hide | past | favorite | 15 comments



There are two Node projects that I've been part of which illustrate some of these problems:

The first one is a hobby project of mine, admittedly a small app with no more than 300 users connected through websockets. Node was fantastic for this use case (and I replicated this model afterwards for a largish gambling site I contracted on): listening for connections, keeping them open, managing their relatively minimal state, and buffering relatively small streams. At no point in the app is there more than one stream per connection in the air. Backend requests are issued quickly and return quickly.

This works so well because the actual backend processing is a separate construct. Even though websocket connections can and do stay open for days, they don't drain a lot of resources (except for a small amount of memory for state and data buffers). No part of the service - except for the dead-simple broker core - will ever be in a contentious state or stranded midway through an action waiting on an unreliable communication partner.

However, this isn't always feasible. My second example is a Node.js project I consulted on for a bigger company. The architecture was quite different: there were the same client-facing persistent connections to manage, but on top of that the backend logic was so immensely stateful and data-laden that a large chunk of it was actually implemented right in Node. Because streams and callbacks can seem so misleadingly elegant, this led to situations where lots of file streams, database connections, and all sorts of backend systems had to be kept in lock-step with the frontend-facing Node.js client connections.

It's no surprise this construct choked, a lot. Some of these are genuinely hard problems, for example where large and long-running data streams are concerned, but in the end we managed to make the entire thing stable by decoupling client-facing mechanisms from backend resources, and yes: by putting some blunt restrictions in place.

Like with any technology, you need to be aware of the tradeoffs.


Running out of CPU will cause problems regardless of what platform/framework/system you're using. This problem has no meaningful software solution (whether you leave your requests pending in a queue, as in Node.js, or reject them outright, the end result is that your user is not being serviced); it's a hardware problem.


While your premise:

> Running out of CPU will cause problems regardless of what platform/framework/system you're using.

is technically correct, your subsequent conclusion is faulty in this context, since Node.js does not intrinsically use more than one CPU:

> A single instance of Node runs in a single thread. (source: https://nodejs.org/api/cluster.html)

Therefore, in the context of a "meaningful software solution": if you want to utilize more than one logical CPU on a machine (of which modern server hardware typically has many), then choosing a platform capable of running on more than one CPU allows a solution to do so by definition.

Node.js, according to the project's documentation quoted above, does not provide this support.


A single Node.js process does in fact use more than a single CPU core when it performs IO operations (it uses threads behind the scenes). But you would be right in thinking that your own code runs in a single thread/process by default.

That said, Node.js makes it really easy to spawn and communicate with other Node.js processes - https://nodejs.org/api/child_process.html. To compare it with Go: the main feature Go has over Node.js is that it lets you run functions (defined in the same source file) concurrently as goroutines. Node.js forces you to separate concurrent work into different files run as separate processes.

I think this feature of Go is cool at first, but I don't think you would use it that often in a large-scale app. Usually you want to separate processes into different source files (for the same reason that you would want to define different classes in separate files).

There are many Node.js modules which automatically leverage multiple CPU cores. I'm the main author of one such module: http://socketcluster.io


This problem extends to other event-driven environments as well. When you have a thread pool it's easy to know your limit (e.g. N threads can support N concurrent requests, and N is usually a known, reasonable number), but in an event-driven environment concurrency can feel limitless. For a typical event-driven HTTP server, the default limit may simply be the operating system's file descriptor limit.

You could limit fds, but to me this is crude (because you need fds for things other than incoming requests), and in some cases an fd limit isn't enough. In our case, we've built event-driven workers using ZeroMQ, where the number of peers is usually stable, but these peers may each issue a variable number of parallel requests over their ZeroMQ connections. To ensure we can still have backpressure in this situation, we have a setting to limit the number of active requests at once. The number to choose here feels a little arbitrary (as the limit is entirely enforced in-app, and it has nothing to do with hard OS limits like threads or fds), but just some number that we know the CPU can support in general.
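A minimal sketch of that kind of in-app cap (the Limiter name and API are illustrative, not our actual code):

```javascript
// A tiny in-app concurrency cap: at most `max` tasks run at once, and
// excess callers wait in a FIFO until a slot frees up.
class Limiter {
  constructor(max) {
    this.max = max;
    this.active = 0;
    this.queue = [];
  }
  async run(task) {
    if (this.active >= this.max) {
      // no free slot: park until a finishing task hands us one
      await new Promise((resolve) => this.queue.push(resolve));
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--;
      const next = this.queue.shift();
      if (next) next(); // hand the freed slot to the next waiter
    }
  }
}

// Usage: five 10ms jobs through a cap of 2; peak concurrency stays at 2.
const limiter = new Limiter(2);
let running = 0, peak = 0;
const job = () => new Promise((resolve) => {
  running++;
  peak = Math.max(peak, running);
  setTimeout(() => { running--; resolve(); }, 10);
});
Promise.all([1, 2, 3, 4, 5].map(() => limiter.run(job))).then(() => {
  console.log('peak concurrency:', peak); // never exceeds the cap of 2
});
```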


Couldn't Node just use a system thread per accepting socket per port, and then signal that thread to block when oh-shit-panic mode kicks in? I doubt very many Node programs are accepting on more than one port anyway.


Or...

Put a machine in front of the ones running any system that lacks the ability to manage incoming request rates, and have it perform the load balancing/throttling, using something like:

http://www.haproxy.org

Best fight? Not to be in one.


I think Node does a non-blocking accept in the main thread rather than an alternate thread, but in any case, yes, the solution would be to stop accepting when some maximum open connection count is reached.


Jane Street's Async library is a great example of backpressure (or as they call it, pushback) done right. Specifically, their Pipe module[0] is what implements this concept very nicely. In essence, a Pipe is a stage in a pipeline. You shove values into one end, and they pop out the other end. Then, you can build on a pipe by map-ing over it, filter-ing over it, doing whatever you want over it. Each time you add another operation like this, you're extending the pipeline with another stage, all of which can proceed concurrently.

A Pipe is an unbounded FIFO. But, you can give it a size budget. What that size budget does is determine when writes to a Pipe will block. So if a Pipe has a size budget of 0, then any write will return a Deferred.t (i.e., a promise) that will become determined only when the value has been sent downstream to the next stage of the pipe. If the size budget is 1 on the other hand, then the Deferred.t of the first write will become determined immediately, allowing computation to proceed. If on the second write the first value hasn't been sent downstream, then it'll block. Once there's only one value waiting in the first stage of the Pipe, the second Deferred will become determined.

Another nice module in the library that addresses the connection pooling issue at the end of the blog post (and really that's all it amounts to) is the Throttle module[1]. Here you can create a Throttle object with however many connections you like, say 5, to however many servers you like, say 5 or possibly fewer with some redundancy in there. Whenever you want a connection to one of these servers, you go through the Throttle object to get the connection, do the work, and release it automatically when your operation completes (or throws an exception). If 5 connections are already active, then you block until one of them gets relinquished. If you want to fail instead of block, you can query the Throttle object to check the number of jobs running, and if there aren't any free slots, fail.

It's a really nice library. I think that you can't compile Async to JavaScript using js_of_ocaml just yet, but it'll probably happen sometime in the near future.

By the way, this is all OCaml.

[0]: https://ocaml.janestreet.com/ocaml-core/111.28.00/doc/async_...

[1]: https://ocaml.janestreet.com/ocaml-core/111.28.00/doc/async_...


It's an okay discussion, but it would be infinitely better with some simple code that actually causes node to fail under load because of back-pressure problems (and then patch the test such that node survives).


I haven't used Node.js in ~3 years (thank god!) but if I understand this correctly, Node.js doesn't have an internal way to stop receiving data, which in turn prevents Node.js from signaling backpressure to the client by way of the client's buffers filling up (assuming the client is using TCP).

That seems odd given how easy a feature like this usually is to implement. I know Node.js doesn't allow its users to use threads in their programs, but they should consider carefully using threads in a few more places internally to implement necessary functionality like this.


I think you might find it useful to read the comments in this question on backpressure in Node.js. It points to an easy solution built into Node's streams. http://stackoverflow.com/questions/25237013/node-js-unbounde...


This article (the article we're commenting on, not the article you linked) precisely points out why Streams will not save you.


The article we are commenting on points out that backpressure is not easily handled with Node.js's given tools, and it describes a way to keep a process from crashing from memory overload. I understood the comment above as saying Node is bad for not having a way to deal with backpressure using its own tools. The link I shared shows that Node.js does have a built-in tool, streams, to handle that exact problem. As other comments have said, too many connections will always eventually crash a single server. So my question is: what will save you? Can it be done in the JS part of Node? Can it be done in a Node C/C++ module? I look at myself as a beginner when it comes to these things, so I would surely enjoy an informative response.


Currently feeling the downvote wrath for saying not-so-nice things about a system that didn't design for reliability in the smallest possible ways.



