And I think you've completely missed my point.

My assumptions:

* Server built around async I/O

* Clients with low bandwidth and high latency, but whose aggregate traffic is insufficient to saturate the server's bandwidth

* Caching etc. on the server side so that server-side I/O bandwidth isn't the limiting factor

* Sufficient concurrent connections that you need more than one webserver

Under these assumptions, each webserver doesn't wait, in the OS-blocking sense, on any given connection. It processes I/O completions as they come due, as fast as it can. It is this processing that is CPU-bound; if less than 100% of the CPU is utilized, that means there are periods where no I/O completions are currently due.
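To make the model above concrete, here's a minimal sketch (names and the echo protocol are my own, not from the comment) of an event-loop server: one thread, no OS-level blocking per connection, each coroutine suspending at `await` until its I/O completion is ready.

```python
# Minimal asyncio sketch of the async model described above: a single
# event loop serves many concurrent sockets; the loop only runs code
# when some connection's I/O completion is ready.
import asyncio

async def handle_client(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter) -> None:
    # The coroutine suspends at each await; meanwhile the loop services
    # other connections -- no thread is blocked in the OS sense.
    data = await reader.readline()
    writer.write(b"echo: " + data)
    await writer.drain()          # waits only on this socket's buffer
    writer.close()
    await writer.wait_closed()

async def main() -> None:
    server = await asyncio.start_server(handle_client, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

# asyncio.run(main())  # one thread, arbitrarily many slow clients
```

The per-connection cost here is just the suspended coroutine's state plus the kernel's socket structures, which is exactly the memory cost described below.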

There are memory costs per concurrent connection: whatever state is needed to resume processing when the related I/O completes, plus the kernel structures representing the open socket.

But these costs don't magically add up to "waiting". To the degree that the webserver is tied up in "waiting", it is that the kernel is idle between CPU interrupts generated by the networking hardware. In other words, it's doing nothing, and if you have too many machines spending their time doing nothing, you can eliminate some of them.

Now consider the same assumptions in the synchronous case: this moves the memory and CPU costs around and increases them. The memory cost per concurrent connection increases, to store the stack etc. The CPU cost increases because context switches are now required around each I/O completion. These costs can be substantial; they can add up to a limiting factor in themselves, but as part of the machine's memory or CPU limits, not its I/O.
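For contrast, here is the same echo handler in the synchronous, thread-per-connection style the paragraph describes (again an illustrative sketch, not anyone's production code): every connection now pays for a full thread stack, and the scheduler context-switches around each blocking read or write.

```python
# Synchronous contrast sketch: one OS thread per connection. Each thread
# reserves its own stack (typically hundreds of KB to MB of virtual
# memory) and blocks inside recv/send, forcing a context switch whenever
# its I/O completes.
import socket
import threading

def handle_client(conn: socket.socket) -> None:
    with conn:
        data = conn.makefile("rb").readline()  # blocks this whole thread
        conn.sendall(b"echo: " + data)

def serve(sock: socket.socket) -> None:
    while True:
        conn, _addr = sock.accept()
        # The per-connection memory cost the text mentions lives here:
        # a dedicated thread, stack and all, per open socket.
        threading.Thread(target=handle_client, args=(conn,),
                         daemon=True).start()
```

Same observable behavior as the async version; what changed is where the per-connection memory and CPU costs land, which is precisely the point above.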

Consider this: why would one need more than one webserver if the servers were "I/O bound", not on server-side bandwidth, but rather on spoon-feeding slow clients?

Since it's not server-side bandwidth, it can only be because of either memory or CPU bounds. And where do those bounds come from? They come from the CPU and memory costs associated with concurrent connections, as well as the CPU and memory costs of processing any given request. You can minimize the costs associated with concurrent connections by using an async I/O design for the server. But minimizing the CPU and memory costs of per-request processing is pure gravy in terms of reducing the number of machines you need to keep running to process X concurrent requests.

And it is here that the myth lies. Just because you may need to keep spoon-feeding very slow clients, such that decreasing the CPU cost of any given request would not visibly affect the client's perceived latency, that doesn't mean that optimizing the server for CPU usage is pointless. The less CPU time you spend per request, the less hardware you need for the same load; likewise for memory.
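A back-of-envelope illustration of that point, with entirely hypothetical numbers (none of these figures come from the comment): client-perceived latency may be dominated by the slow link, yet CPU cost per request still sets the fleet size.

```python
# Hypothetical capacity arithmetic: machines needed to supply the
# aggregate CPU time demanded, independent of how slowly clients drain
# their responses.
import math

def machines_needed(requests_per_sec: float,
                    cpu_ms_per_request: float,
                    cores_per_machine: int = 16) -> int:
    """Machines required to cover the total CPU-seconds per second."""
    cpu_seconds_per_sec = requests_per_sec * cpu_ms_per_request / 1000.0
    return math.ceil(cpu_seconds_per_sec / cores_per_machine)

# 100k req/s at 2 ms CPU each = 200 core-seconds/sec -> 13 machines;
# halve the per-request CPU cost and the fleet nearly halves, even
# though each slow client still takes seconds to receive its response.
print(machines_needed(100_000, 2.0))  # 13
print(machines_needed(100_000, 1.0))  # 7
```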

Another way to think of it: how can e.g. nginx work as a reverse proxy, if it runs on a single machine and is spoon-feeding lots of slow clients? Async design, and offloading CPU/memory requirements to other machines, that's how.
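This is not nginx, but a toy sketch of the same principle (all names and the line-based protocol are assumed): an async relay that buffers the upstream response, releases the backend connection quickly, and then spends as long as needed dribbling bytes to a slow client, all on one event loop.

```python
# Toy async relay illustrating the reverse-proxy idea: the (fast)
# backend connection is held only briefly, while spoon-feeding the slow
# client costs just a suspended coroutine, not a thread.
import asyncio

BACKEND = ("127.0.0.1", 9000)  # hypothetical upstream address

async def relay(client_reader: asyncio.StreamReader,
                client_writer: asyncio.StreamWriter) -> None:
    request = await client_reader.readline()
    # Fetch and buffer the whole upstream reply, then free the backend.
    up_reader, up_writer = await asyncio.open_connection(*BACKEND)
    up_writer.write(request)
    await up_writer.drain()
    body = await up_reader.read()      # upstream closes after replying
    up_writer.close()
    await up_writer.wait_closed()
    # Now dribble it out to the slow client; only this coroutine waits.
    client_writer.write(body)
    await client_writer.drain()
    client_writer.close()
    await client_writer.wait_closed()

async def main() -> None:
    server = await asyncio.start_server(relay, "127.0.0.1", 8080)
    async with server:
        await server.serve_forever()
```

The CPU- and memory-heavy work lives on the backend machines; the relay's job is just shuffling completions, which is why a single such box can front a large number of slow connections.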



