I've been doing some real-time audio programming lately, which requires the same techniques. You can't take a lock or even malloc in this kind of code, so you're forced to implement things like lock-free ring buffers to communicate with the rest of the world. In my experience this is 10x harder than conventional lock-based parallelism and, unfortunately, I don't see any FP lang coming to the rescue in this domain any time soon.