A Beautiful Race Condition (mailinator.blogspot.com)
64 points by shawndumas on Feb 4, 2011 | 20 comments



Nice analysis and demonstration that even relaxing concurrency requirements is not as easy as it looks.

Interestingly, when trying to reason about and verify correct synchronisation of data structures, I immediately start drawing boxes with lots of arrows for pointers (just like those in the article) even though I don't normally need any visual aids to design my code. Somehow, holding a complex data structure in my head is no problem, but as soon as I try to think about concurrent access to something as simple as a linked list, I need to draw each state. Anyone else got any good strategies for concurrency analysis?


I used to have to do something similar (on a whiteboard, though) whenever I needed to write some slightly complicated concurrency code (I'm a systems developer). However, after years of writing multi-threaded and synchronization code, I've learned that the trick is to.... abstract (?) the representation in your head.

If you try to "flatten out" the concurrency, it'll be this HUGE representation that'll never fit in your skull. The trick is to think in multiple dimensions instead of flattening it out. Sort of queue the other threads in the back of your head, and pretend you're the OS. Whenever you reach a line of code for the current thread you're "imagining/debugging in your head", loop over the other consumers/producers in your model and ask yourself "what happens to this guy at this moment, or what happens if this guy was doing x, y, or z?"

Basically, don't hold the structure, hold the events. Expand the other objects as needed in your head, then stash them away. Don't try to hold them all in there, because they'll never fit.
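
For what it's worth, here's a toy Java sketch (mine, not from the article) of that exercise on the simplest possible case, a shared counter; walking a second thread through each line of increment() is exactly the "what is this guy doing right now?" question:

    // Two threads share one counter. Ask at every line: where could the other thread be?
    class Counter {
        private int count = 0;

        void increment() {
            int tmp = count;   // thread A reads 5; what if thread B runs increment() right here?
            tmp = tmp + 1;     // A computes 6, but B has also read 5 and computed 6
            count = tmp;       // both write 6, and one increment is silently lost
        }
    }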

Incidentally, since adopting this approach I still often get the urge to pick up a pen or marker and scribble down the concurrency model - except 2 seconds later I give up because it's so hard to efficiently represent multi-threaded environments/code in 2D. And that's the same difficulty you have trying to jam that representation in your head.

Remember the description of the components, not the components themselves, I guess. I hope this isn't as confusing as I fear I've made it out to be!


Yeah, I do that for simpler things (a bunch of mutexes, condition variables, etc.), but as soon as I look at anything lock-free, my brain just fails to take it all in.


Have you seen Microsoft CHESS?

http://research.microsoft.com/en-us/projects/chess/

Tests different interleavings of multithreaded code. Maybe there are other tools like this out there.


That looks seriously cool. Too bad it's Win32/.NET only. (The Visual Studio Team System price tag is probably hefty, too, but may well be worth it for this alone.)

It actually seems like there's something similar in valgrind: http://valgrind.org/docs/manual/hg-manual.html - I'll have to check that out.


Anyone else got any good strategies for concurrency analysis?

Probably not what you're really looking for but, maybe you could try to eschew shared state when at all possible?

I, myself, am absolutely no good with threaded code. Message-passing between multiple processes is just such a nicer way to reason about things, without worrying about race conditions. I've been using 0MQ recently, and it's really great for this, since it's easy to get it working across machines even, as well as having common messaging patterns "just work". Try it out sometime.
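
Not 0MQ itself, but here's a minimal plain-Java sketch of the same idea with a BlockingQueue standing in for the socket: one thread owns all the state, everyone else just sends it messages. (Names and values are made up for illustration.)

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class MessagePassingCache {
        public static void main(String[] args) throws InterruptedException {
            final BlockingQueue<String[]> inbox = new ArrayBlockingQueue<String[]>(1024);

            Thread owner = new Thread(new Runnable() {
                public void run() {
                    Map<String, String> cache = new HashMap<String, String>(); // only this thread ever touches it
                    try {
                        while (true) {
                            String[] msg = inbox.take();   // block until a message arrives
                            cache.put(msg[0], msg[1]);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            owner.setDaemon(true);
            owner.start();

            // Any number of producer threads can do this; no locks, no shared mutable state.
            inbox.put(new String[] { "user:42", "some cached value" });
        }
    }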


When I have the choice, I use Clojure's STM or multiprocessing or whatever other means possible for concurrency. Unfortunately, I often can't avoid "hard" multithreading, e.g. in kernel code or on game consoles. I've managed to get fairly good at it in general (and try to avoid being too clever), but I'm always on the lookout for better ways.


I remember reading this article for the first time and being overcome by the urge to erase it from my memory. Every time it comes up I worry that I may have committed this error in applications still in production. I know for sure that I've wondered on multiple occasions about whether you could use a regular HashMap as an imperfect cache, and I'm not sure I've always (especially when I first started writing code) erred on the side of caution and opted for ConcurrentHashMap.
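
For the curious, the pattern I mean looks something like this (a made-up sketch, not code from the article):

    import java.util.HashMap;
    import java.util.Map;

    // The tempting "imperfect cache": surely the worst case is a few lost or stale entries?
    public class NotActuallyOkCache {
        private final Map<String, String> cache = new HashMap<String, String>(); // shared by many threads

        public String lookup(String key) {
            String value = cache.get(key);   // a get() racing a resize can spin forever
            if (value == null) {
                value = expensiveCompute(key);
                cache.put(key, value);       // concurrent put()s are what trigger and corrupt the resize
            }
            return value;
        }

        private String expensiveCompute(String key) { // stand-in for the real work
            return key.toUpperCase();
        }
    }

Swapping in java.util.concurrent.ConcurrentHashMap (plus putIfAbsent if duplicate work matters) is a one-line change.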


If you have a library class or function that is not explicitly marked as thread safe, do not assume it is, regardless of whether or not it "should" be under any system of logic or reason.

Why not?

Because you have no idea what it might be doing behind the scenes. Even "immutable" objects might have mutable state under the covers of encapsulation (e.g. caching).
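
For example, a hypothetical (made-up) value class that looks immutable from the outside but lazily caches a derived field:

    // Looks immutable, but the lazily cached field is hidden shared mutable state.
    public final class Fingerprint {
        private final byte[] data;
        private String hex;                // computed on demand, cached without synchronization

        public Fingerprint(byte[] data) {
            this.data = data.clone();
        }

        public String toHex() {
            if (hex == null) {             // two threads can race here; whether that is benign
                hex = computeHex();        // depends on details the javadoc never promised you
            }
            return hex;
        }

        private String computeHex() {
            StringBuilder sb = new StringBuilder();
            for (byte b : data) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        }
    }

This particular race happens to be mostly harmless, but the caller has no way to know that without reading the source.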

And even if you somehow manage to prove that the current version IS safe, perhaps the next version won't be. After the code has been running perfectly for years and the library has been upgraded dozens of times, good luck ever finding your subtle, intermittent threading disaster.

Thread safety is an implementation detail unless you are explicitly told otherwise.


Nice analysis. Points up the need for synchronization primitives around complex data structures.

Which brings up its own issues: scheduling blockages. Resizing a hash map, garbage collecting, flushing file buffers: it doesn't matter. Putting synchronization primitives around usually-fast-but-occasionally-dog-slow operations is the sad result of most naive multithreaded attempts.

It results in apparently-hung GUI threads, stuttering performance, deadlocks that usually don't happen and are terrible to diagnose.
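
A made-up sketch of the kind of thing I mean; the same lock that guards a quick lookup also gets held across the occasional slow path:

    import java.util.HashMap;
    import java.util.Map;

    public class StutteringCache {
        private final Map<String, String> cache = new HashMap<String, String>();

        // Background thread: usually fast, occasionally dog slow.
        public synchronized void refresh(String key) {
            cache.put(key, slowFetchFromDisk(key));   // the lock is held for the whole fetch
        }

        // GUI thread: "just a map lookup", yet it blocks whenever refresh() is mid-fetch.
        public synchronized String get(String key) {
            return cache.get(key);
        }

        private String slowFetchFromDisk(String key) {
            try {
                Thread.sleep(2000);                   // stand-in for disk or network latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "value for " + key;
        }
    }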


I don't have anything to add regarding race conditions. But I do want to mention how awesome the mailinator service is. I've been using it for years and recommend it highly.


Cool analysis. tl;dr:

If you're using a non-thread safe structure in a multi-threaded way and think you understand what kinds of data corruption you might get (and are ok with that), you still might get totally different and unexpected kinds of data corruption than you planned for (read the article for details).
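
If you want to see it for yourself, a tiny (hypothetical) harness like this will do it; on an unlucky run the concurrent resizes leave a cycle in a bucket's linked list and a later get() simply never returns:

    import java.util.HashMap;
    import java.util.Map;

    public class BeautifulRace {
        public static void main(String[] args) throws InterruptedException {
            final Map<Integer, Integer> map = new HashMap<Integer, Integer>();

            for (int t = 0; t < 4; t++) {
                final int offset = t * 1000000;
                new Thread(new Runnable() {
                    public void run() {
                        for (int i = 0; i < 100000; i++) {
                            map.put(offset + i, i);   // unsynchronized puts racing the resize
                        }
                    }
                }).start();
            }

            Thread.sleep(1000);
            System.out.println(map.get(42));          // may print a value, print null, or spin forever
        }
    }

Whether any given run hangs, corrupts, or works fine is exactly the problem.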


Reading the javadoc helps as well sometimes. HashMap's doc says "Note that this implementation is not synchronized", and that's enough for me not to use this class as a global unsynchronized cache.


So, the question is: does a hashmap implementation exist where accessing and normal writing are not serialized, but expanding the table takes the appropriate locks?


In Java, use ConcurrentHashMap.

If you're particularly interested in pushing this to the edge, check out Cliff Click's non-blocking hash map.
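
Which is what the parent asked for: reads don't block, writes only contend locally, and resizing is handled internally. A quick usage sketch:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class ConcurrentCacheDemo {
        public static void main(String[] args) {
            ConcurrentMap<String, String> cache = new ConcurrentHashMap<String, String>();
            cache.putIfAbsent("key", "value");     // atomic check-then-put, no external lock needed
            System.out.println(cache.get("key"));  // safe to call concurrently, even mid-resize
        }
    }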


Here you go:

Collections.synchronizedMap(new HashMap());


He wrote "accessing is not serialized".


That's why I use Microsoft.FSharp.Collections.Map instead. No multi-threading issues with concurrent access and no performance-killing locks.


A very cool analysis. I will be filing this under "what part of 'undefined' don't you understand?"


This is a complete aside from any contents from the article...

This is what I see....

23."You will be lynched" 24.A Beautiful Race Condition

That's Just Wrong. Completely by accident, but still Just Wrong.



