Why Events Are A Bad Idea For High-Concurrency Servers (2003)

rdtsc · on May 2, 2014

Events (to translate to more common lingo) are another way today of saying callbacks, promises, async. Usually a big select/poll/epoll system call sits at the bottom of those. Callback chains (abstracted as promises, deferreds or futures for example) branch out from it. As in select-c1()->c2()->c3()...

The popularity of that been going up and down in waves for various reasons. Lack of multi-core machines in early 2000s was one reason. Fast CPU speeds was another. Expensive RAM was also important (threads would need a larger context while a callback context is very small).

This technique is performant still, but in specific situations and mostly in cases were callback chains are short. So something like select->c1()->c2() is nice. Some servers like HAProxy or nginx fall in that category. Those are battle tested very performant pieces of code. Another category is demos. Short examples of "look at my async code example". Those also have very shallow callback chains and also look simple and pretty. [+]

What doesn't work for events too well is deep callback chains. This where real world business logic gets to be implemented. With layers of API. Shopping carts, payment processing, various backend databases. Now you are up to ...->c14()->c15(). That doesn't work in my opinion. The concurrency context instead of being in one thread executing instructions c1....c15 is now spread across many function interspersed with then()'s or yields()'s. Oh and usually errors need to be handled, so the chain becomes a tree with some splits into errbacks for example.

I think we are seeing a second wave of resurgence of threads in forms of goroutines in Go. Rust tasks. Dart Isolates. That is a nice thing to see, I general I believe those are better at handling concurrency.

[+] Python greenlet module (gevent is based on it) is an interesting case of a hybrid. At its core it is based on an event loop but then switches to green threads (think of them as co-routines). So user effectively spawns threads and sends messages between them for example.

ryanjshaw · on May 2, 2014

I don't build web servers so perhaps I'm going off topic, but I have found it useful to sometimes combine both models - which may not be all that surprising because both models developed to satisfy different requirements, which can crop up in a single problem at the same time.

A concrete example: I'm experimenting with a small analytics service where one part of the implementation is that I need to scrape about ~10 million forum posts across ~500k web pages.

A purely event-driven approach makes scraping very easy but throttling requests, handling errors and retries quickly becomes messy and difficult to reason about. A purely thread-driven approach makes throttling much simpler but scraping harder.

I found that implementing scraping with events (i.e. push a URL to a queue + specify a callback type + state) and downloading with threads (i.e. process the queue(s) with a request throttling window) much more manageable with some nice side benefits (e.g. persistent queues are easy to implement).

I haven't looked at goroutines, Rust tasks or Dart Isolates -- are you suggesting that one model can be sufficient for general purpose computing?

jlouis · on May 2, 2014

How could you forget Erlang? It has been avoiding events for years and has done so with high success. Same with Haskell and Standard ML :)

lmm · on May 2, 2014

I think of Erlang as an event-oriented system; actor messages do not necessarily receive exactly 1 reply (or any reply at all), which seems to be the critical property that the paper is talking about. Why do you say Erlang avoids events?

pron · on May 2, 2014

The key point discussed in the article is blocking. Erlang processes, like Go's goroutines and Quasar fibers on the JVM block without sacrificing any of the performance associated with asynchronous callback-based styles. The article says that there is no problem with the thread abstraction, and blocking in particular -- there's just a problem with the implementation of kernel threads that makes blocking expensive.

Callbacks, and the various attempts to make them more palatable (composable promises, monads), change the abstraction to get around implementation problems. Lightweight threads (or user-mode threads) simply fix the implementation, and preserve the threading/blocking abstraction.

_pctq · on May 2, 2014

Sorry if this is a naive question (I'm more a user than a developer in that field), but isn't the point of multiprocess webservers precisely to keep that callback chain small (especially if we talk about multiple API calls, which is spawned in multiple requests) ?

rdtsc · on May 2, 2014

A multiprocess web server (like Apache for example) would usually not have callback chains. It would listen on a socket, in the accept() call, and when a new connection comes in it would return with a client connection socket and then spawn a process (or pick one from an existing pool) and let it handle it. It wouldn't have any callback chains (or I guess you can think of the spawn() / fork() / pick_from_process_pool() as 1 callback ).

If it is OS processes (or Erlang lightweight processes) then you gain failure isolation. If something has gone wrong and a child process crashes, only that request will fail, others will keep working. Real OS processes are usually on the heavy side.

In Go you might create a channel and spawn a goroutine for example.

_pctq · on May 2, 2014

Ok, thanks. So I guess we're talking about events as a problem for things like node.js or eventmachine.

Out of curiosity, you mentioned nginx as using callback chain ; has it any difference with apache in that regard ?

rdtsc · on May 2, 2014

It is usually consumes less memory for the same amount of concurrent connections. The callback chain for it should be pretty short. Instead of spawning a process, it would get an event that there is data on a client socket. Then parse http (this might happens across multiple callbacks, and then handle it (serve the file).

nitrogen · on May 2, 2014

I'm a fan of EventMachine, EM-synchrony, and Sinatra-synchrony for a Fiber-based cooperative Ruby green threads system backed by an event reactor, but the author of at least one those projects has abandoned the concept entirely in favor of native threads.

acjohnson55 · on May 2, 2014

This is an interesting topic, which I've been thinking about a lot. It seems like ten years later, we have the same duality going, but with successor models. We've gotten rid of the shared state that makes threading a pain to yield Actor Model/CSP and we've reified callbacks into promises/futures, events into streams, and mutable data into signals to make each encapsulated and composable in Reactive frameworks. It seems that the former model has really taken off in imperative programming and the latter in functional style.

A paper from EPFL [1] does a great job of breaking down the problems of the Observer/Event pattern and how reactive patterns shore up the workflow. It also contrasts the resulting framework with others, including actor models. Interestingly, both models seem to have settled on message passing, the main difference being whether it's the sender's or receiver's responsibility to initiate the connection. In an actor model, the sender must know the receiver's address in advance, and receivers generally accept messages from anywhere. In a reactive model, receivers are responsible for connecting to a message source, and the sender has no control over whom receives its messages. Unsurprisingly, since Odersky is one of the authors of the paper, Scala has a great deal of support for both models.

I'm not sure I fully understand the contrast between the models. The Principles of Reactive Programming [in Scala] Coursera course [2] actually devotes subsections to both models respectively, but rather oddly does not make much of a comparison between the two. I haven't read anything that fully explores this duality, or perhaps whether there is a model that generalizes/unifies them.

Does anyone know of good resources on this topic?

[1] http://lampwww.epfl.ch/~imaier/pub/DeprecatingObserversTR201...

[2] https://www.coursera.org/course/reactive

joosters · on May 2, 2014

The paper doesn't seem to discuss the important concept that writing threaded code is hard. Once your process starts using threads, suddenly every bit of state in it can be shared by all threads. Concurrency issues, locking and so on become major headaches. Reasoning about code paths and data accesses becomes amazingly complicated. We may think that we're coding experts, but throw us some threads and we'll screw up the code for them.

Also, this paper is from over ten years ago. If the concept is correct, surely lots of threaded servers would have appeared and their speed would have been beating benchmarks all over the place. Where are these servers?

rdtsc · on May 2, 2014

> The paper doesn't seem to discuss the important concept that writing threaded code is hard.

It mentions that using threads is hard. The implication is events are harder.

> suddenly every bit of state in it can be shared by all threads.

Long overlapping chains of callbacks have similar issues. Points where concurrency contexts switch are much larger (between callbacks instead of between machine instructions) but in a large system the danger is real.

> Reasoning about code paths and data accesses becomes amazingly complicated.

Based on working on large systems based on callbacks and based on threads, I've come to the conclusion that both are hard, but callbacks are much harder and messier.

> Where are these servers?

There are everywhere. Java, C++ backend systems. Twitter is using Scala. WhatsApp is using Erlang. PostgreSQL has background processes that help. Apache is the most popular web server.

Now are you saying "where they are?" because we don't read about them on HN every day? Yeah, nobody talks about stuff that works and is established and boring. So perhaps picking technology based on what is read or discussed on HN might not be the best approach ;-)

joosters · on May 2, 2014

I didn't mean that threaded servers don't exist, that's pretty obvious. I meant, as my comment stated, that there aren't threaded servers popping up and demonstrating this fantastic performance/scalability. Hell, Apache moved away from threads.

Twitter and WhatsApp could be good examples, but no-one outside the company can really say that each server there scales well. How many servers does Twitter run? Who knows? So who knows if they are each 'scaling to 100,000 threads and achieving excellent performance', as per the paper's claims?

As for which model is harder or not, I guess that is an arguable matter of opinion. When the paper is claiming that one model is a 'more natural' programming style then it's clearly moving beyond objective reasoning.

jlouis · on May 2, 2014

Erlang routinely runs 100,000 processes in the same memory space and scales very well. The reason it is possible to even handle this is that it is not based on shared memory but on message passing.

There are two things at play here: a) Is the solution easy to design/code. b) Is the solution fast enough.

There is far too much focus on (b) and relatively little attention is given to (a). The claim is that writing code in a direct style in a highly concurrent environment (Go, Erlang, Haskell, SML, ...) is easier and achieves roughly the same performance (within a factor of, say, 2).

My experience is that this is true. I have seen fast evented and threaded systems. But I do feel the threaded systems were easier to write.

tel · on May 2, 2014

Another side effect of focusing on (b) is that it frees up the resources someone else (a runtime implementer) has to focus on (a) independently. This is just another corollary of "build it, build it right, build it fast".

In particular, consider the new MIO IO manager included in Haskell GHC 7.8. It's increasing performance of IO-bound lightweight threading by orders of magnitude... all performance provided completely for free to consumers of these abstract threading services who've spent all their time making their algorithms correct, not fast.

otterley · on May 4, 2014

With respect to WhatsApp, Rick Reed gave a talk in which he discussed how he pushed Erlang to serve 2 million concurrent connections on a single server. http://www.erlang-factory.com/upload/presentations/558/efsf2...

ArkyBeagle · on May 2, 2014

It's not clear that events are harder. They are different. I have the biases of someone in the realtime space, where events have been de riguer for a long time, so....

State modelling seems pretty powerful, and it always surprises me when people resist it.

dozzie · on May 2, 2014

> Once your process starts using threads, suddenly every bit of state in it can be shared by all threads.

Only when you use highly inconvenient model of shared memory. Message passing and shared-nothing combined, as it is done in Erlang or Go, are way easier to reason about, giving at the same time more natural way of structuring code than events handling.

joosters · on May 2, 2014

I agree! It's a shame that the most common concurrent programming models are a choice between share-nothing (fork) and share-everything (threads). Anything that gives a programmer something between the two extremes is an improvement.

merijnv · on May 2, 2014

Or when using the highly inconvenient model of mutable state ;)

Purely functional language can share state between threads just fine, since any state will be immutable (barring any explicit communication primitives).

Nursie · on May 2, 2014

>> We may think that we're coding experts, but throw us some threads and we'll screw up the code for them.

I dispute this.

With a functional* approach to writing discrete, modular code with locking as fine-grained and minimal as possible, it's not that hard.

Lot's of threaded servers have appeared. They are, as others have said, absolutely everywhere.

*(not pure functional, just functional as an approach, i.e. writing C code in a side-effect and state-free manner as far as possible)

Jtsummers · on May 2, 2014

Unfortunately this (in a language like C where it's not encouraged/enforced by design) requires discipline and experience. As evidenced by some colleagues, without experienced leadership on a project a group of journeyman programmers can produce some incredibly creatively broken multithreaded code. The experience part can be taught, discipline can be encouraged and developed with the right leaders/mentors. But without motivated individuals on the team (to learn this particular aspect of development) or mentors/leaders pushing/pulling them along, it's unlikely to happen until it's too late.

But, some people have to fail before they learn. Just wish they hadn't failed on such a big project.

timdoug · on May 2, 2014

A bit of historical context: this paper came out in 2003, the same year NPTL (in RH9) and Linux 2.6.0 (first stable kernel with epoll, 2.5.44 introduced it in 2002) were released.

stormbrew · on May 2, 2014

It is worth noting that kqueue was introduced in FreeBSD 4.1 in 2000, /dev/poll in Solaris 8 at about the same time, and though somewhat different in nature, IOCP and Overlapped I/O were in Windows NT from before even that.

So there was already experience with high performance event driven mechanisms before epoll.

That said, the examples presented in this paper do appear to be linux examples.

zackmorris · on May 2, 2014

Events are just a pattern running on top of nonblocking I/O. I preferred nonblocking I/O when I first started network programming, but now I prefer blocking I/O running on separate threads.

The reason is that languages like Go (and really any language that uses pipes instead of a shared memory space) move the programmer from a tangled mess of logic and callback hell to individual processes that each resemble the main loop we’re all used to. Cumbersome tasks like fetching a file over HTTP end up looking like opening a local file. The thread just blocks until the data arrives, which the programmer generally doesn’t have to worry about. That was the original vision behind sockets (that everything is a file).

I have high hopes that generators and coroutines will merge the two paradigms. That’s because the danger with events is that their logic is a state machine and tends towards unmanageability after a few dozen states. The programmer discovers this late in a project when edge cases and failures can’t be easily isolated or fixed. We see this every day on websites whose javascript chokes and the page has to be reloaded. But the danger with threads is concurrency issues (atomicity/mutexes/semaphores etc). The programmer discovers those early on and has trouble writing even the simplest stable program, especially in a low level language like C or java. So the learning curve with blocking I/O is steep enough that I think that’s why it never went mainstream.

Anyway, if we only use channels/pipes and no shared memory, and not even allow threads access to the filesystem, then coroutines and preemptive threads become basically equivalent. I think the endgame will be a methodology in approachable languages like javascript/Go that works like Erlang, with the ability to spawn threads on different processors or even different machines, running over something like ZeroMQ or WebRTC. If we can throw in a proof of work system like Bitcoin, and be able to utilize the graphics card with CUDA/OpenCL but without ugly syntax, we’ll really have something because we’ll finally have distributed computing and processor speed won’t really matter anymore. I think something like NumPy/Octave/MATLAB could run this way and be several orders of magnitude faster than anything today, with little additional cost since so many machines are sitting idle anyway.

lmm · on May 2, 2014

So this paper's result depends entirely on their dynamic stack size technique. Was this ever implemented in real systems?

skybrian · on May 2, 2014

Yes, Go has stacks that start out tiny and grow and shrink dynamically.

jpollock · on May 2, 2014

If you can have one thread per transaction then you can avoid event loops. If you cannot then you will need to write an event loop. In the systems I work with, the transactions are long held (months) and high in number (millions). Unless we went with a language that had lightweight threads we have to use event polling. Rewriting in Erlang would come with different problems...

There are architectural benefits to event loops too. You get built in interception and if your code is free of side effects cheaper redundancy.

dap · on May 2, 2014

The paper ignores what I think is one of the biggest advantages of event-based systems over threaded ones, which is that the pathological failure modes when things go operationally sideways are often much nicer with event-based systems. It's not just that threaded code is "harder" -- as others have pointed out, event-based code is hard in a different way. But when you make a network call with a lock held, you not only block the operation in question, but anything else which needs the same lock, or which would like to use the same thread. The worst part it is that it will work most of the time -- until that network call blocks for a long time for some reason, at which point latency bubbles propagate through the application. With an event-based system in an environment that doesn't allow you to block the event loop without going way out of your way (as Node.js does, for example), the operation in question will obviously stop until the network call completes, but that won't affect anything that's not intrinsically tied to that call.

A related failure mode I've seen a lot is when an operation gets stuck with a lock held and you can't even call into the server to ask the status of the operation because that request blocks on the lock that's held. Of course, the better way to do this is to avoid holding the lock during a blocking operation, and instead use the lock only to set a flag indicating the operation is pending. That way, other threads can observe the current state while the operation is pending. The point is that you have to make these intermediate states explicit. But IMO, that's the hardest part of the event-based system anyway.

tiagobraw · on May 2, 2014

Ah, this paper is classic... It is a response to the other (equally classic) paper "Why threads are a bad idea" [0].

In fact, threads and events are dual [1], and one can be made as efficient and the other (given a proper implementation).

[0] http://www.stanford.edu/~ouster/cgi-bin/papers/threads.pdf [1] Lauer and Needham. “On the Duality of Operating System Structures” in 1978;

swah · on May 2, 2014

Also http://pt.slideshare.net/e456/tyma-paulmultithreaded1

tegeek · on May 2, 2014

This papers is from 2003. Now in 2014, Functional Programming gives enough power to event-driven systems that its hard to beat by any other tools/tech/technique.

The new mantra is Reactive Programming (http://www.reactivemanifesto.org/)

Reactive Systems are event-driven but highly concurrent, scalable, easy to maintain and extend because of the "high level" concepts of "Signals & Behaviors" instead of "low level" event handlers.

dozzie · on May 2, 2014

The manifesto seems silly. Reactive systems are defined as being scalabe, the very thing we want to achieve by using the approach (reactive approach in this case). Yet, the system needs to be scalable first in order to be called "reactive". Something went wrong.

seanmcdirmid · on May 2, 2014

The systems community has a very different understanding and use for events when compared to the PL community. All these papers are talking about wouldn't make sense from the reactive programming perspective (I work in both worlds and deal with the difference all the time).