It's not so much that I'm "hung up" on local physical concurrency; it's just that I don't see any reason to ignore easy gains. You can write maintainable, correct, concurrent programs that scale across cores today if you use technologies other than Node. So why wouldn't you?
I haven't actually had a project to use Node on yet, though I have an affinity for its Purity of Essence approach, and I've used Twisted a fair amount.
If a system is actually going to scale or be highly available, it already needs a socket pool to distribute load across machines. Since that same mechanism works for distributing across processes on one machine, why bother adding another layer to pool native threads? It just adds to the complexity budget.
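For concreteness, here's a minimal sketch of that process-per-core pattern using Node's cluster module (the port and the one-worker-per-core count are arbitrary choices for illustration):

    // Process-per-core with a shared listening socket: the master forks
    // workers and hands accepted connections out among them, so each
    // worker stays single-threaded with no locks in sight.
    var cluster = require('cluster');
    var http = require('http');
    var numCPUs = require('os').cpus().length;

    if (cluster.isMaster) {
      for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
      }
    } else {
      http.createServer(function (req, res) {
        res.writeHead(200);
        res.end('handled by pid ' + process.pid + '\n');
      }).listen(8000);
    }

Point being, the distribution layer you'd build for multiple boxes already covers this case for free.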
Thread pooling could easily get you much lower (1/N) latencies on an unloaded system, but at high volume the gains usually aren't big compared to just running another N-1 single-threaded processes per box, and it would take more concerted effort to keep request latencies consistent.
It seems obvious that it's worth compromising the model and adding locks or STM for an N00% gain on a solitary request, but what if that shrinks to a 10% gain when you're pumping out volume? Consider, too, that not everyone wants their individual app processes to scale across cores: actual simultaneous execution within one address space comes at a cost.
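Here's the back-of-envelope version, with made-up numbers (8 cores, 80ms of CPU per request, perfectly parallelizable work, which is the best case for threads):

    // Hypothetical numbers, just to show the shape of the tradeoff.
    var cores = 8;
    var workMs = 80;

    // Unloaded: a thread pool can split the one in-flight request
    // across every core, so its latency drops to roughly workMs / cores.
    var threadedLatencyMs = workMs / cores;    // ~10ms, the 1/N win

    // Saturated: there are always >= cores requests in flight, so both
    // setups are limited by the same total CPU. Aggregate throughput is
    // the same either way: cores * (1000 / workMs) requests per second.
    var throughputPerSec = cores * (1000 / workMs); // 100 req/s for both

    console.log(threadedLatencyMs, throughputPerSec);

The 700% latency win on an idle box buys you essentially nothing once the box is busy, which is exactly when you care about throughput.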
There are probably some cases at scale where the gains from thread pooling are substantial, but I suspect a lot of them involve work that wasn't very event-ish to start with: big-data, batch-flavored stuff where Hadoop would work great, not least because of data locality.
Anyway, that's a good reply, and your ImageMagick case study is interesting. It just goes to show how individualized "scaling" really is. Thanks for taking the time.