See also Aeron[1] which is basically a networked superset of the Disruptor, featuring (IIRC) a just-as-fast machine-local implementation. Made by some of the same people as the LMAX Disruptor. It's not quite ready for general use AFAICT, but it's quickly getting there...
There's also a really informative presentation[2] by Martin Thompson on some of the ideas that they're using to get really low latency and high throughput.
I really don't want to put this down. This looks very interesting and is nicely presented. But I have trouble recognizing the main idea(s) behind the "LMAX Disruptor". To me, this all boils down to:
"You get better performance if you implement your event queue via a pre-allocated ring buffer instead of (dynamically allocated) linked-lists or arrays."
It's been many years since I saw the presentation, but I believe they made a point of mentioning that ring buffers are not a new technology. They mentioned how such data structures are used all the time in network devices, such as switches.
The disruptor is an actual library based on ring buffers (others have already pointed out some other features).
In practical terms, this library has been immensely popular. Within a couple years of disruptor's release, practically every (java based) trading firm was using them. I've even seen c++ implementations, only vaguely resembling this library, being referred to as 'disruptors.'
Beyond the library, the original (and subsequent) presentations by the authors popularized the concepts of "mechanical sympathy" (understand the abstraction layer beneath what you are currently working in) and introduced a whole new set of developers to modern CPUs, thinking about cache lines, etc.
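To make the cache-line point concrete, here's the classic padding trick those talks walked through (a sketch of the general idea, not the library's actual code; the 64-byte line size is an assumption, and some JVMs can eliminate apparently-unused fields, which is why later approaches moved to sturdier schemes like the @Contended annotation):

```java
// A hot counter padded so it occupies its own cache line and is not
// "falsely shared" with a neighbouring field that another thread writes.
// Assumes 64-byte cache lines; field names are illustrative.
class PaddedCounter {
    long p1, p2, p3, p4, p5, p6, p7; // padding before (7 * 8 = 56 bytes)
    volatile long value;             // the actual hot field
    long q1, q2, q3, q4, q5, q6, q7; // padding after
}
```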
What you are missing is that with the disruptor pattern you can do this lock-less while keeping the order: The producer gets a sequence number before accessing the space in the queue. This seqno guarantees the order of consumption so there will be no reordering of events on the consumer side. After creating the message, the producer commits this id. Meanwhile the consumer waits for the next seqno to become committed and consumes it.
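The claim-then-commit sequence described above can be sketched in Java for the single-producer case (class and method names are mine, not the Disruptor's API; wrap-around protection against overwriting slots the consumer hasn't read yet is omitted for brevity):

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal single-producer/single-consumer sketch of the seqno protocol:
// the producer claims a sequence, writes the slot, then commits the seqno;
// the consumer waits for its next seqno to be committed before reading.
class SeqRing {
    final Object[] slots;
    final int mask;                                  // capacity is a power of two
    final AtomicLong committed = new AtomicLong(-1); // last published seqno
    long next = 0;                                   // producer-local claim counter

    SeqRing(int capacityPow2) {
        slots = new Object[capacityPow2];
        mask = capacityPow2 - 1;
    }

    // Producer: claim a sequence, write the entry, then commit it.
    long publish(Object msg) {
        long seq = next++;                // claim (single producer: no CAS needed)
        slots[(int) (seq & mask)] = msg;  // write into the pre-allocated slot
        committed.set(seq);               // commit: makes the entry visible
        return seq;
    }

    // Consumer: wait until the wanted sequence is committed, then read it.
    Object take(long seq) {
        while (committed.get() < seq) Thread.onSpinWait();
        return slots[(int) (seq & mask)];
    }
}
```

Because sequence numbers are handed out in order and committed in order, the consumer sees events in exactly the order they were claimed, with no lock on the hot path.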
I have implemented a multiple producer, single consumer variant in C for logging and it works quite well. Loggers can write their message concurrently to the ring buffer, but the order will be correct in the log file.
IIRC, the data structure that they used was not thread-safe. There eventually had to be an interlocked compare exchange operation which slowed throughput. Also, I couldn't imagine this working correctly in out-of-order cores without memory barriers at the very least.
Yes, you need to use CAS and barriers to make it thread safe. I have also used a condvar for the consumer to avoid spinning while waiting for log messages.
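A hedged sketch of that multi-producer variant in Java (illustrative names, not anyone's actual logger): producers race on a CAS to claim distinct sequence numbers, then commit strictly in sequence order so the single consumer still sees a totally ordered stream. A real logger would park the consumer on a condition variable rather than spin, as mentioned above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Multiple producers, single consumer. CAS hands out unique seqnos;
// each producer waits for its predecessor to commit before committing,
// which keeps the consumer-side ordering correct without locks.
class MpscRing {
    final String[] slots = new String[1024];         // capacity 1024, power of two
    final AtomicLong claim = new AtomicLong(0);      // next seqno to hand out
    final AtomicLong committed = new AtomicLong(-1); // highest committed seqno

    void log(String msg) {
        long seq;
        do { seq = claim.get(); }                    // CAS loop: claim a unique slot
        while (!claim.compareAndSet(seq, seq + 1));
        slots[(int) (seq & 1023)] = msg;             // write the message
        while (committed.get() != seq - 1)           // commit strictly in order
            Thread.onSpinWait();
        committed.set(seq);
    }

    String read(long seq) {                          // single consumer
        while (committed.get() < seq) Thread.onSpinWait();
        return slots[(int) (seq & 1023)];
    }
}
```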
The Disruptor is an example of solid engineering - the fact that there's nothing "new" is a point in its favor. Its creators acknowledge that similar approaches are common in network switches, etc. I would say these techniques aren't well known among Java programmers. Not new != widely known.
Mostly yes. Perhaps the thing to note is that the Disruptor moves the logic for manipulating the indices outside of the ring buffer itself, allowing multiple independent consumers to exist.
The idea behind the Disruptor is to move data between threads in a contention-free manner.
ArrayBlockingQueue [1] uses a ring buffer backed by a fixed-size array, too. It is really hard to say where the speed-up comes from without seeing the code, but it is certainly not from using a ring buffer, because both of the compared implementations do so. I guess they were just more careful when they designed the auxiliary state and the operations on it. For example, ArrayBlockingQueue explicitly tracks the number of elements in the queue with a separate field - instead of deriving it from the insert and remove pointers - and therefore this field is contended by readers and writers. Also, preallocating entries for all slots to work around Java's lack of support for arrays of value types certainly removes some overhead from the allocator.
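The contended-field point can be illustrated with a toy single-producer/single-consumer queue (a sketch of the design idea, not ArrayBlockingQueue's or the Disruptor's code): each counter is written by only one side, and occupancy is derived rather than stored in a field both sides mutate.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy SPSC ring: `tail` is written only by the producer, `head` only by
// the consumer, so no single field is write-contended by both sides.
// Size is derived from the two counters instead of kept in a shared field.
class DerivedSizeQueue {
    final Object[] buf = new Object[16];       // capacity 16, power of two
    final AtomicLong tail = new AtomicLong(0); // insert pointer (producer-only)
    final AtomicLong head = new AtomicLong(0); // remove pointer (consumer-only)

    boolean offer(Object e) {
        long t = tail.get();
        if (t - head.get() == buf.length) return false; // full
        buf[(int) (t & 15)] = e;
        tail.set(t + 1);
        return true;
    }

    Object poll() {
        long h = head.get();
        if (h == tail.get()) return null;               // empty
        Object e = buf[(int) (h & 15)];
        head.set(h + 1);
        return e;
    }

    long size() { return tail.get() - head.get(); }     // derived, never stored
}
```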
EDIT: Just realized that the code is on GitHub - will have a look.
That is why I mentioned it in the first place. The paper reads very much like they think using a ring buffer provides a performance advantage over other queue implementations, but then they compare two implementations that both use ring buffers. So their improved performance definitely does not stem from using a ring buffer but from implementation details like the synchronization mechanisms used or preallocating and reusing queue entries.
It's been a while, but didn't the performance turn out to be similar to other data structures in multi-socket scenarios because it requires interlocked -type instructions?
[1] https://github.com/real-logic/Aeron
[2] https://www.infoq.com/presentations/aeron-messaging