I don’t understand this. Context switching takes microseconds, while I/O latencies are typically in the millisecond range. I’d expect thread overhead to be negligible in an I/O-bound application, especially if you take steps to reduce the memory used per thread.
The important distinction is between operation latency and operation rate. Modern I/O devices are highly concurrent and support massive throughput. A device can have millisecond latency while still completing an operation every microsecond. In these cases the operation latency doesn't matter; your thread has to handle events at the rate the device completes operations. If that is a million operations per second, then from the thread's perspective you have one microsecond to handle each operation. The number of context switches a core can perform per second is much lower by comparison.
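To make the arithmetic concrete, here is a small Python sketch using the figures from the comment (a million operations per second, 1 ms latency). The 5 µs context-switch cost is an assumption chosen for illustration; the real figure depends on hardware, OS, and cache effects. The in-flight count uses Little's law: average concurrency equals throughput multiplied by latency.

```python
def per_op_budget_us(ops_per_sec: float) -> float:
    """Time available to handle each completion, in microseconds."""
    return 1_000_000 / ops_per_sec


def ops_in_flight(ops_per_sec: float, latency_s: float) -> float:
    """Little's law: average number of operations outstanding = rate * latency."""
    return ops_per_sec * latency_s


def switches_per_sec(switch_cost_us: float) -> float:
    """Upper bound on context switches per second for one core doing nothing else."""
    return 1_000_000 / switch_cost_us


RATE = 1_000_000     # completions per second (figure from the comment)
LATENCY = 0.001      # 1 ms per operation (figure from the comment)
SWITCH_COST = 5.0    # ASSUMED context-switch cost in microseconds, illustrative only

if __name__ == "__main__":
    print(per_op_budget_us(RATE))         # 1.0 microsecond per completion
    print(ops_in_flight(RATE, LATENCY))   # ~1000 operations outstanding
    print(switches_per_sec(SWITCH_COST))  # 200000.0 switches per second
```

Under these assumed numbers, a design that pays one context switch per completion tops out around 200,000 completions per second per core, well short of the 1,000,000 the device can deliver.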
In these systems, you may issue a thousand concurrent I/O operations before the first one returns a result. Threads don't wait for the first operation to finish; they keep a deep pipeline of I/O operations in flight so that they always have something to do.
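A minimal sketch of that pipelining pattern, assuming a simulated device: a single Python thread uses `asyncio` to keep up to `depth` requests outstanding and submits a new one each time a request completes. `fake_io` and `run_pipeline` are hypothetical names, and `asyncio.sleep` merely stands in for real device latency; a production system would submit requests to the kernel or device instead.

```python
import asyncio
import time


async def fake_io(op_id: int, latency_s: float) -> int:
    # Hypothetical stand-in for a device request that completes after latency_s.
    await asyncio.sleep(latency_s)
    return op_id


async def run_pipeline(n_ops: int, depth: int, latency_s: float) -> list[int]:
    """Keep up to `depth` requests in flight; refill as each one completes."""
    completed: list[int] = []
    in_flight: set[asyncio.Task] = set()
    next_id = 0
    while next_id < n_ops or in_flight:
        # Top up the pipeline before waiting, so the device is never idle.
        while next_id < n_ops and len(in_flight) < depth:
            in_flight.add(asyncio.create_task(fake_io(next_id, latency_s)))
            next_id += 1
        # Handle whatever has finished; the rest stay outstanding.
        done, in_flight = await asyncio.wait(
            in_flight, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            completed.append(task.result())
    return completed


async def main() -> None:
    start = time.perf_counter()
    results = await run_pipeline(n_ops=1000, depth=100, latency_s=0.01)
    elapsed = time.perf_counter() - start
    # Serially this would take ~10 s; with 100 in flight it takes roughly 0.1 s.
    print(len(results), f"{elapsed:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

The thread never blocks on any single request: total time scales with `n_ops / depth * latency_s` rather than `n_ops * latency_s`, which is why throughput, not latency, sets the pace of work the thread must keep up with.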
> Modern I/O devices are highly concurrent and support massive throughput. A device can have millisecond latency while still executing an operation every microsecond. In these cases, the operation latency doesn't matter, your thread has to handle events at the rate the device executes operations.
This is true for some applications, like an OLAP database or similar. It’s not true for the typical user-facing app, where you want to finish each request as soon as possible because a user is waiting and every millisecond costs you money.