
Well...

In a dynamic web application, at first, you will nearly always be database bound. Faster algorithms and faster programming-language implementations on web servers will do nothing whatsoever for this (and increasing the concurrent load of requests the web server can handle will in fact only overload the DB even more).

That's when you start doing caching, and that's why caching has such dramatic effects.
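To make that concrete, here's a minimal read-through sketch (Redis is assumed as the cache, and query_db is a placeholder for whatever actually hits the database):

    # Minimal read-through caching sketch. Assumes a local Redis instance;
    # query_db is a stand-in for the real database access layer.
    import json
    import redis

    cache = redis.Redis()

    def query_db(sql, *params):
        # placeholder for the real DB call (psycopg2, SQLAlchemy, whatever)
        raise NotImplementedError

    def get_user(user_id, ttl=300):
        key = "user:%d" % user_id
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)               # cache hit: no DB round trip
        row = query_db("SELECT * FROM users WHERE id = %s", user_id)
        cache.setex(key, ttl, json.dumps(row))      # expire after ttl seconds
        return row

Every hit on that path is one query the database never sees, which is why the effect is so dramatic.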

Once you've got your caching going nicely and your DB humming along, you will end up bandwidth bound. Not by your own pipe, but by clients; you may have a nice fat line running out of the data center, but your users may be on anything down to mobile phones or even dialup, and you'll only be able to push responses to them at the speed they can handle. This is the spoon-feeding problem, and once again algorithms and language implementations on the server can't do anything at all to help it.

That's when you start putting fast, light, highly-concurrent reverse proxies (nginx appears to be winning the market share battle) in front of your actual web servers, and once again you will see a drastic effect. Or you combine caching and proxy into one component and do Varnish.
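The mechanism is simple enough to sketch: the proxy slurps the app server's response into its own buffers as fast as the backend can produce it, then dribbles it out at whatever pace the slow client can accept. A toy version using Python's asyncio (addresses are made up, and it assumes the upstream closes its connection after replying; nginx does this in C with far more care):

    # Toy buffering reverse proxy. It reads the whole upstream response
    # quickly, freeing the app server, then spoon-feeds the slow client
    # at whatever rate the client drains data.
    import asyncio

    UPSTREAM = ("127.0.0.1", 8000)      # assumed app-server address

    async def handle_client(reader, writer):
        request = await reader.read(65536)             # good enough for a toy
        up_reader, up_writer = await asyncio.open_connection(*UPSTREAM)
        up_writer.write(request)
        await up_writer.drain()
        body = await up_reader.read(-1)                # read until upstream EOF
        up_writer.close()                              # app server is free again
        writer.write(body)                             # trickles out as the
        await writer.drain()                           #   slow client acknowledges
        writer.close()

    async def main():
        server = await asyncio.start_server(handle_client, "0.0.0.0", 8080)
        async with server:
            await server.serve_forever()

    asyncio.run(main())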

Once you've done this, you might finally start to reach a point where you're genuinely I/O or CPU bound on a server that's actually running your application code. Or you might not; there are other roadblocks you might run into first.

At any rate, optimizing CPU usage is, for the vast majority of websites, a waste of time at least until you've been through the phases I've outlined above. And, generally, I think you'll find that's the advice (the "myth") you've been hearing: fiddling with programming languages and algorithms is effectively wasted effort until you've dealt with quite a few other (and more important, performance-wise) things.




> Once you've got your caching going nicely and your DB humming along, you will end up bandwidth bound. Not by your own pipe, but by clients [...] This is the spoon-feeding problem, and once again algorithms and language implementations on the server can't do anything at all to help it.

Here. Here is where you made the mistake in your assertions.

> That's when you start putting fast, light, highly-concurrent reverse proxies (nginx appears to be winning the market share battle) in front of your actual web servers

My point is that the number of web servers you need in this spoon-feeding case is inversely proportional to how CPU-optimized your servers are. The spoon-feeding problem, as you put it, is just bookkeeping in the kernel (keeping track of open sockets) plus iteratively processing I/O as it comes due, and that work is CPU-bound, unless you're actually approaching 64K open sockets, in which case a reverse proxy won't do and you'll need DNS tricks, etc.


You have misunderstood the problem. The fact that clients sit behind thin, low-bandwidth/high-latency lines means that your webserver spends its time waiting for them. That is the very definition of I/O-bound processing. Solving the spoon-feeding issue is much more complex than simple bookkeeping in the kernel, because (from the angle we're looking at it) it is independent of the server's design and operation.


And I think you've completely missed my point.

My assumptions:

* Server built around async I/O

* Clients with low bandwidth and high latency, but in aggregate insufficient to saturate the server's bandwidth

* Caching etc. on the server side so that server-side I/O bandwidth isn't the limiting factor

* Sufficient concurrent connections that you need more than one webserver

Under these assumptions, each webserver doesn't wait, in the OS blocking sense, for any given connection. It processes I/O completions as they come due, as fast as it can. It is this process that is CPU-bound; if less than 100% CPU is utilized, it means that there are periods where no I/O completions are currently due.
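In other words, the per-connection "waiting" lives in the kernel's readiness tables, not in the application. A bare-bones sketch of that model with Python's selectors module (port and payload are made up for illustration): one thread, and work happens only when the kernel says a socket is ready.

    # One thread, one readiness loop. Per-connection state is just a socket
    # plus the bytes still owed to that client; nothing blocks on any single
    # slow connection.
    import selectors
    import socket

    sel = selectors.DefaultSelector()
    RESPONSE = b"X" * 1000000            # pretend this is a rendered page

    def accept(listener):
        conn, _ = listener.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_WRITE, {"pending": RESPONSE})

    def serve(conn, state):
        sent = conn.send(state["pending"])       # send what the client can take
        state["pending"] = state["pending"][sent:]
        if not state["pending"]:
            sel.unregister(conn)
            conn.close()

    listener = socket.socket()
    listener.bind(("0.0.0.0", 8080))
    listener.listen()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, None)

    while True:
        for key, _ in sel.select():              # idle only when nothing is due
            if key.data is None:
                accept(key.fileobj)
            else:
                serve(key.fileobj, key.data)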

There are memory costs per concurrent connection: whatever is needed to pick up processing as the related I/O completes, and for some structures representing the open socket in the kernel.

But these costs don't magically add up to "waiting". To the degree that the webserver is tied up in "waiting", it's only that the kernel is idle between interrupts from the networking hardware. In other words, it's doing nothing, and if you have too many machines spending their time doing nothing, you can eliminate some of them.

Now consider the same assumptions in the synchronous case: this moves the memory and CPU costs around and increases them. The memory cost per concurrent connection goes up, to store a stack etc. The CPU cost goes up because context switches are now required between I/O completions. These costs can be substantial; they can add up to a limiting factor in themselves - but as part of either the memory or CPU limits of the machine, not I/O.
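For contrast, the synchronous version of the same toy sketch: one thread per connection, a stack pinned for every slow client, and a context switch each time a client drains a bit more data.

    # Thread-per-connection counterpart of the sketch above. Each slow
    # client now holds a thread (and its stack) for the whole transfer.
    import socket
    import threading

    RESPONSE = b"X" * 1000000

    def serve(conn):
        conn.sendall(RESPONSE)       # blocks this thread at the client's pace
        conn.close()

    listener = socket.socket()
    listener.bind(("0.0.0.0", 8080))
    listener.listen()

    while True:
        conn, _ = listener.accept()
        threading.Thread(target=serve, args=(conn,)).start()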

Consider this: why would one need more than one webserver if the servers were "I/O bound", and not on server-side bandwidth, but rather on spoon-feeding clients?

Since it's not server-side bandwidth, it can only be because of either memory or CPU bounds. And where do those bounds come from? They come from the CPU or memory cost associated with concurrent connections, as well as the CPU and memory costs of processing any given request. You can minimize the costs associated with concurrent connections by leveraging an async I/O design of the server. But minimizing the CPU and memory costs of per-request processing is pure gravy in terms of reducing the number of machines you need to keep running to process X concurrent requests.

And it is here that the myth lies. Just because you may need to keep spoon-feeding very slow clients, such that decreasing the CPU cost of any given request would not visibly affect the client's perceived latency, that doesn't mean that optimizing the server for CPU usage is pointless. The less CPU time you spend, the less hardware you need for the same load; likewise for memory.
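A back-of-envelope with made-up numbers makes the point: halving per-request CPU cost halves the number of app servers for the same load, even though no individual slow client gets its page any sooner.

    # Illustrative arithmetic only; every number here is invented.
    requests_per_sec  = 20000        # assumed peak load
    cores_per_machine = 16

    def machines_needed(cpu_ms_per_request):
        core_seconds_per_second = requests_per_sec * cpu_ms_per_request / 1000.0
        return core_seconds_per_second / cores_per_machine

    print(machines_needed(4))        # 5.0 machines
    print(machines_needed(2))        # 2.5 machines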

Another way to think of it: how can e.g. nginx work as a reverse proxy, if it runs on a single machine and is spoon-feeding lots of slow clients? Async design, and offloading CPU/memory requirements to other machines, that's how.



