An interesting property of CPU vs. I/O concurrency, to me, is that while I/O-heavy concurrency works fine with cooperative or reduction-counting scheduling, CPU-heavy concurrency basically requires a preemptive scheduling approach (i.e., multiple native threads).
A lot of people would assume that on a single-core machine, you could get away with running, for example, Erlang with only a single native scheduler-thread for its tasks. This would be true until the first time you called a C-FFI function that took a full CPU-second to return. Then all the other tasks in your system would wake up wondering what year it is and how their beards grew so long. Erlang added "dirty schedulers" (basically, extra scheduler-threads that will get spawned above-and-beyond the default one-per-core-with-CPU-affinity set) precisely so the preemption of CPU-heavy tasks could be pushed off into the OS's capable hands.
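To make the stall concrete, here is a rough analogy in Python rather than Erlang: an `asyncio` event loop stands in for a single cooperative scheduler thread, and `time.sleep` stands in for a native call that never yields. The names `blocking_call`, `ticker`, and `run` are made up for this sketch, and handing the call to a thread pool is only loosely comparable to what dirty schedulers do inside the BEAM.

```python
import asyncio
import time


def blocking_call(seconds: float) -> None:
    """Hypothetical stand-in for a native call that never yields to the scheduler."""
    time.sleep(seconds)


async def ticker(stamps: list, count: int, interval: float) -> None:
    """A well-behaved task: records a timestamp, then yields for `interval` seconds."""
    for _ in range(count):
        stamps.append(time.monotonic())
        await asyncio.sleep(interval)


async def run(offload: bool) -> float:
    """Run the ticker next to one blocking call; return the largest gap between ticks."""
    stamps: list = []
    tick = asyncio.create_task(ticker(stamps, 10, 0.01))
    await asyncio.sleep(0)  # let the ticker record its first timestamp
    if offload:
        # Hand the call to a separate OS thread so the OS does the preempting.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, blocking_call, 0.2)
    else:
        # Run the call on the scheduler thread itself: nothing else can run.
        blocking_call(0.2)
    await tick
    return max(b - a for a, b in zip(stamps, stamps[1:]))


inline_gap = asyncio.run(run(offload=False))
offloaded_gap = asyncio.run(run(offload=True))
print(f"largest tick gap, call on the scheduler thread: {inline_gap:.3f}s")
print(f"largest tick gap, call on a separate thread:    {offloaded_gap:.3f}s")
```

With the call running inline, the ticker cannot record its second timestamp until the call returns, so the largest gap is at least the full 0.2 seconds. With the call offloaded, the ticker keeps its roughly 10 ms rhythm.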
> This would be true until the first time you called a C-FFI function that took a full CPU-second to return.
As you said, up until the last release that would be true in general. Even with 16 CPUs, running C NIFs inside the VM that block for longer than 1 ms at a time would probably be a bad idea.
A word of warning: think very carefully before compiling native code (NIFs, drivers) into the middle of the Erlang VM. You lose reliability, fault tolerance, and predictable low-latency response. Good candidates for NIFs are things like computing a hash function, or something similar that runs in constant time and is easily tested. Not something like fetching a piece of data from a database.
I think the best case for dirty schedulers in Erlang isn't for running CPU-intensive tasks within the same VM that's doing your IO, though; it's for running a separate Erlang node, on a separate machine, and sending it CPU-heavy work to do over the distribution protocol. Effectively, an isolated Erlang node + dirty schedulers + NIFs is a souped-up, easier-to-code-for C node.
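The isolation argument can be sketched the same way, again in Python and only as an analogy: risky or CPU-heavy work runs in a separate OS process, so a crash there shows up as a failed result instead of taking down the process that handles I/O. The helper `run_isolated` is hypothetical, and a child process on the same machine is a much weaker form of isolation than a separate Erlang node reached over the distribution protocol.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 5.0) -> tuple:
    """Run `code` in a separate Python process; return (exit_code, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout.strip()


# CPU-heavy work that behaves: the caller gets the result back as text.
ok = run_isolated("print(sum(i * i for i in range(1000)))")

# Work that crashes hard (os.abort() stands in for a segfault in native code):
# the child dies, and the caller only sees a non-zero exit code.
crashed = run_isolated("import os; os.abort()")

print("well-behaved job:", ok)
print("crashing job exit code:", crashed[0])
```

The caller survives both runs. Had the crashing code been linked into the caller's own process, the way a NIF is linked into the VM, the abort would have killed everything.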