Threads are not useful for I/O-bound code. The requests get serialized into a single queue inside the network or disk driver anyway. In fact, threads make things worse by 1) issuing multiple out-of-order requests to the disk, and 2) wasting time and memory on switching thread contexts back and forth. For CPU-bound tasks it makes sense to create multiple threads, but only up to the number of cores actually available; beyond that, it makes things worse again through cache thrashing. Thread pool(s) + message queues is what has worked best for me.
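
To make the "thread pool + message queue" idea concrete, here is a rough sketch in Python. The names `worker` and `run_pool` are made up for illustration, and squaring a number is just a stand-in for real CPU-bound work. The pool size defaults to the core count reported by `os.cpu_count()`, which is my assumption for "actual cores available". Note that in CPython the GIL keeps pure-Python threads from running CPU work in parallel, so treat this as a sketch of the structure rather than a benchmark-ready setup.

```python
import os
import queue
import threading


def worker(tasks, results):
    """Pull messages off the task queue until the None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this worker
            break
        # Stand-in for CPU-bound work (hypothetical example task).
        results.put((item, item * item))


def run_pool(items, num_workers=None):
    """Process items with a fixed pool of threads fed by one queue."""
    # Assumption: cap the pool at the core count reported by the OS.
    num_workers = num_workers or os.cpu_count() or 1
    tasks = queue.Queue()
    results = queue.Queue()

    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()

    # Producers only enqueue messages; they never touch worker state.
    for item in items:
        tasks.put(item)
    # One sentinel per worker so every thread shuts down cleanly.
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()

    # All workers have finished, so draining the queue here is safe.
    out = {}
    while not results.empty():
        key, value = results.get()
        out[key] = value
    return out
```

Calling `run_pool(range(5))` returns a dict mapping each input to its square. The threads are created once and reused for every task, so there is no per-task thread creation, and all communication goes through the two queues instead of shared mutable state.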