A Solution to CPU-intensive Tasks in IO Loops

ot · on Feb 5, 2012

A watchdog thread can be running every n milliseconds. This is very low load on the system. [...] If the loop has not moved onwards since the last sample or two it can be deemed stalled. [...] But you can move the other events in the affected loop to a fresh thread; you can go sideways when you’ve detected a blocking task.

Congratulations, you just invented (a very inefficient version of) pre-emptive multitasking.

willvarfar · on Feb 6, 2012

(article author)

Spot on except for the bit about being inefficient; presumed performance is very much the thing that makes the complexity worth it.

oconnore · on Feb 6, 2012

> presumed performance is very much the thing that makes the complexity worth it.

Presumed performance/complexity of what? That isn't even a comprehensible English sentence, never mind a well thought out rebuttal to the OP.

bkirwi · on Feb 6, 2012

It seems clear to me that's he's talking about the performance / complexity of the algorithm that the article was about? There was significant discussion of the performance in the article, but the original commenter didn't have anything to contribute but a sassy comment. I'd be interested in hearing more about why he/she disagrees with the author's arguments, but frankly, a one-line criticism deserves a one line rebuttal.

Also -- I'm not sure the comparison is accurate. Preemptive multitasking involves suspending the running process, while the author's suggestion involves emptying the queue behind the blocking process... or am I misunderstanding?

oconnore · on Feb 6, 2012

But there are two algorithms in context here, and the more complex/efficient one is the OS preemptive scheduler. It then appears that 'willvarfar' is agreeing with 'ot' in a sentence that is structured as a critique. It confused me...

> Also -- I'm not sure the comparison is accurate. Preemptive multitasking involves suspending the running process, while the author's suggestion involves emptying the queue behind the blocking process... or am I misunderstanding?

I think the author's suggestion is to transparently spawn new threads when the event queue becomes blocked by a long running process. As soon as you do this, you lose a major advantage of the event loop in the first place -- namely that you don't have to worry about concurrency control because event handlers are atomic. I don't think (and it doesn't grep) he mentions anything about emptying the queue; why would that ever be a good idea?

willvarfar · on Feb 6, 2012

spot on :)

yxhuvud · on Feb 6, 2012

What are your thoughts on the disruptor pattern? It seems to me to be something that tries to eat the cake of multithreading while still having a lot of the simplicity of eventing.

( https://code.google.com/p/disruptor/ )

willvarfar · on Feb 6, 2012

Its a very different approach. I do like their presentations and I've really got a thing for the "mechanical sympathy" term they coined.

At a glance I'd say its not really the solving the same problems and isn't generally applicable to, say, application web servers.

I would love to be educated on that score ;)

jamii · on Feb 6, 2012

The disruptor is just a very efficient event loop. It doesn't prevent you from blocking the handling thread.

yxhuvud · on Feb 6, 2012

True, if you do blocking stuff in the handler instead of in the worker threads.

rektide · on Feb 6, 2012

The final question/answer is resolved around,

Hellepoll has the concept of a task tree - tasks can be subtasks of others, and this simplifies tidy-up when one aborts. This explicit linking of tasks and callbacks can be used to determine what gets migrated when a task blocks, to ensure that the cascade of events associated with a request do not themselves get split, but fire in the originating thread and in the right order even if at some point the thread triggers the blocking watchdog.

He asks, in closing,

I am not a node.js user but I wonder if this approach could be transparent in node and not actually break any of the API contract there?

This is the work being done on 0.8, with domains and isolates. It is explicitly to allow this kind of task/work parenting to be made: https://groups.google.com/forum/#!msg/nodejs/eVBOYiI_O_A/-mA...

equark · on Feb 6, 2012

Isolates have been removed.

http://groups.google.com/group/nodejs/browse_thread/thread/c...

moonchrome · on Feb 6, 2012

Why don't you just lock each connection handler to one thread at a time and dispatch events on a thread pool ? That way connection level events are always synchronous, but event handlers are spread over the thread pool, you get optimal load balancing because events fill the pool (no processes) and thread pool can use it's own logic to grow if one channel handler used blocking IO and is blocking the a pool thread.

This is pretty much what netty does with OrderedMemoryAwareThreadPoolExecutor ?

http://netty.io/docs/stable/api/org/jboss/netty/handler/exec...

           -------------------------------------> Timeline ------------------------------------>

 Thread X: --- Channel A (Event A1) --.   .-- Channel B (Event B2) --- Channel B (Event B3) --->
                                      \ /
                                       X
                                      / \
 Thread Y: --- Channel B (Event B1) --'   '-- Channel A (Event A2) --- Channel A (Event A3) --->

willvarfar · on Feb 6, 2012

It does have advantages from code-organisation, but not from all-out performance.

One of the complexities of this is ensuring that a selector doesn't return a handle as being read/writable when another thread is still executing a previous trigger.

halayli · on Feb 6, 2012

Check out lthread_compute_begin() and lthread_compute_end() functions. It allows you to block inside a coroutine without affecting other coroutines. (example at the end of the page)

I prefer coroutines over IO loops because they result in simpler and cleaner code. And with lthread_compute feature, you get the advantages of real threads + the lightness of coroutines.

https://github.com/halayli/lthread

jconley · on Feb 6, 2012

This is almost exactly what ASP.NET and IIS do in recent iterations.

http://blogs.msdn.com/b/tmarq/archive/2007/07/21/asp-net-thr...

rektide · on Feb 6, 2012

I like how they admit right out that they pass off requests to a ThreadPool and that it's non-optimal for application performance.

Similar to IIS 6.0 (classic mode, a.k.a. ISAPI mode), the request is still handed over to ASP.NET on an IIS I/O thread. And ASP.NET immediately posts the request to the CLR Threadpool and returns pending. We found this thread switch was still necessary to maintain optimal performance for static file requests.

Author goes on to say it was done because static file serving is blocking, which ate away too heavily at a unified threadpool between IIS and ASP.NET.

I'd call out SEDA here again. Passing work items around between executors is in-advisable. That said, the typical problem to "you need a responsive request handler" is "offload the work item asap and yield," and done well there's many many right ways for that effort to look like SEDA.

jconley · on Feb 6, 2012

It's a pretty good trade-off with a heterogeneous application workload.

There is a "to the metal" mode in ASP.NET/IIS now, mentioned in that article, that lets you use the IOCP thread directly in your applications and avoid that context switch. But, of course, if you do blocking things there it will completely block your web server.

rektide · on Feb 6, 2012

So the server has multiple threads. If a handler blocks in one thread, another thread can pick up incoming requests. So far, so good. In return for needing to carefully synchronise access to shared state, we get to efficiently share that state (even if its just a hot cache of secure session cookies - things you don’t want to be validating every incoming request etc) between many threads and multiplex incoming requests between them.

Sharing state is bad, m-kay? Allow Node to do it's thing (So the server has multiple threads. If a handler blocks in one thread, another thread can pick up incoming requests.).

It's aggrieving that this model requires any given handler to be able to service any given request, tbh. Shared state is a folly. A serializing token scheme might work well: if a request fails to find the data local to it's core, it passes a serializing token of the request around the ring of handlers, asking either a, for the required data, or b, take the token and run the data.

Serializing tokens are a concept Matt Dillon spoke of often at the inception of DragonflyBSD; much like locks, except that ownership is not relinquished, someone always hold the token, but instead phase changed, yielded to another. it's a responsive less a stateful ownership.

Sadly that token ownership negotiation requires some kind of interruption in the currently-occupied worker thread: if that thread could be interrupted to do other things, this serializing token negotiation might be an acceptable argument (ending with a) no, i'm busy using that set, b) sorry, i had the data and was free, so i completed it, or c) here's the data, i'm busy and not using it). But it does still require thread-interruption. If the worker thread can yield frequently and resume, finding the answer might be a small enough invisible enough calculation to help plaster over there being interruptions altogether; that's essentially the hope. The result would be the mating of green threading to with location aware latency aware multi-processing.

willvarfar · on Feb 6, 2012

Tokens are very interesting.

A key thing I've learned on my travels is how critical shared state is to performance.

At its crudest, lots of web frameworks read a request, wait to receive it all, dispatch it to a handler, gather all the output from the handler, buffered, and then write it.

This is great from the code organization perspective.

Hellepoll is so very much faster because it streams the requests.

Without a green threading approach, this necessarily makes the code slightly uglier, and much more for the coder to juggle.

Of course there are at least three layers in an event loop framework:

* the programmer making the event loop itself who has to juggle everything;

* the prorgammer making the various protocol handlers who has to understand everything too but fits the framework conventions rather than creates them;

* finally the programmer who is using the framework who gets to use the utility and decoration and wrappings made by the protocol programmer and who hopefully doesn't have to understand in anything but the broadest terms the way the framework ticks.

I'm a big fan of Clojure and STM, but that's correctness over performance. I would hope whoever makes some other platform puts as much effort into hiding and protecting the inner shared state that is so critical to performance.

marshray · on Feb 6, 2012

Do you have a link describing the "serializing tokens" concept?

I'm looking for other uses of this boost::asio concept of a 'strand'. http://www.boost.org/doc/libs/1_49_0_beta1/doc/html/boost_as... Specifically: a serialization constraint wrapping an arbitrary set of handlers.

superrad · on Feb 6, 2012

Doesn't erlang effectively do this, allowing its processes to only execute a certain amount of vm instructions before allowing a switch to another process?

willvarfar · on Feb 6, 2012

Yes, it is one of the neat languages with the green threads that I alluded to towards the end of the article.

Your description also fits CPython, though ;)