
I wonder where this myth that "[websites] on the internet [are] I/O bound, not CPU bound" (seemingly implying that optimizing CPU usage is a waste of time) comes from.

Since the I/O bound generally comes from latency rather than total throughput, the number of concurrent connections a single webserver can handle is often limited by how much memory and CPU each connection costs. If you have an asynchronous design for your server, concurrent connections don't cost physical threads; they just cost bookkeeping overhead in the kernel. The faster you can switch between those connections and finish the work on them when I/Os complete, the fewer physical machines you need to serve a website to a given number of users.

Or to put it another way, the saturation point of processing asynchronous I/Os is CPU-bound, even when speeding up individual requests is I/O-bound.
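
Concretely, here's the sort of design I mean (a minimal sketch using Python's asyncio, purely for illustration; the host, port, and echo handler are placeholders, not anything from a real server):

    # One thread, one event loop: each open connection costs a small coroutine
    # object plus kernel socket state, not an OS thread of its own.
    import asyncio

    async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        while True:
            data = await reader.read(4096)   # suspends until the client's data is ready
            if not data:
                break
            writer.write(data)
            await writer.drain()             # yields while a slow client catches up
        writer.close()
        await writer.wait_closed()

    async def main() -> None:
        server = await asyncio.start_server(handle, "127.0.0.1", 8080)
        async with server:
            await server.serve_forever()

    if __name__ == "__main__":
        asyncio.run(main())

A slow client only delays its own coroutine; the loop keeps switching to whichever connections have I/O due, which is exactly the CPU-bound switching work described above.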




Well...

In a dynamic web application, at first, you will nearly always be database bound. Faster algorithms and faster programming-language implementations on web servers will do nothing whatsoever for this (and increasing the concurrent load of requests the web server can handle will in fact only overload the DB even more).

That's when you start doing caching, and that's why caching has such dramatic effects.
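
The basic cache-aside pattern looks something like this (a Python sketch with hypothetical names; query_db() stands in for whatever the real data layer is):

    import time

    CACHE = {}            # sql -> (timestamp, result)
    TTL_SECONDS = 60.0

    def query_db(sql):
        ...               # hypothetical: the expensive database round trip

    def cached_query(sql):
        now = time.monotonic()
        hit = CACHE.get(sql)
        if hit is not None and now - hit[0] < TTL_SECONDS:
            return hit[1]             # served from memory; the DB never sees it
        result = query_db(sql)        # cache miss: pay the DB cost once
        CACHE[sql] = (now, result)
        return result

Every hit is a query the database never has to execute, which is why the effect is so dramatic once the hit rate climbs.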

Once you've got your caching going nicely and your DB humming along, you will end up bandwidth bound. Not by your own pipe, but by clients; you may have a nice fat line running out of the data center, but your users may be on anything down to mobile phones or even dialup, and you'll only be able to push responses to them at the speed they can handle. This is the spoon-feeding problem, and once again algorithms and language implementations on the server can't do anything at all to help it.

That's when you start putting fast, light, highly-concurrent reverse proxies (nginx appears to be winning the market share battle) in front of your actual web servers, and once again you will see a drastic effect. Or you combine caching and proxy into one component and use Varnish.

Once you've done this, you might finally start to reach a point where you're genuinely I/O or CPU bound on a server that's actually running your application code. Or you might not; there are other roadblocks you might run into first.

At any rate, optimizing CPU usage is, for the vast majority of websites, a waste of time at least until you've been through the phases I've outlined above. And, generally, I think you'll find that's the advice (the "myth") you've been hearing: fiddling with programming languages and algorithms is effectively wasted effort until you've dealt with quite a few other (and more important, performance-wise) things.


> Once you've got your caching going nicely and your DB humming along, you will end up bandwidth bound. Not by your own pipe, but by clients [...] This is the spoon-feeding problem, and once again algorithms and language implementations on the server can't do anything at all to help it.

Here. Here is where you made the mistake in your assertions.

> That's when you start putting fast, light, highly-concurrent reverse proxies (nginx appears to be winning the market share battle) in front of your actual web servers

My point is that the number of web servers you need in this spoon-feeding case is inversely proportional to how CPU-optimized your servers are. The spoon-feeding problem, as you put it, is just bookkeeping in the kernel (keeping track of open sockets) and iteratively processing I/Os when they come due, and it is CPU-bound, unless you're actually approaching 64K open sockets, in which case a reverse proxy won't do and you'll need DNS tricks etc.


You have misunderstood the problem. The fact that clients sit behind lean, low-bandwidth/high-latency lines means that your webserver spends its time waiting for them: the very definition of I/O-bound processing. Solving the spoon-feeding issue is much more complex than simple bookkeeping in the kernel, because it is independent (from the angle we're looking at it) of the server's design and operation.


And I think you've completely missed my point.

My assumptions:

* Server built around async I/O

* Clients with low bandwidth and high latency, but in aggregate insufficient to saturate the server's bandwidth

* Caching etc. on the server side so that server-side I/O bandwidth isn't the limiting factor

* Sufficient concurrent connections that you need more than one webserver

Under these assumptions, each webserver doesn't wait, in the OS blocking sense, for any given connection. It processes I/O completions as they come due, as fast as it can. It is this process that is CPU-bound; if less than 100% CPU is utilized, it means that there are periods where no I/O completions are currently due.
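
To make that concrete, here's roughly what "processing completions as they come due" looks like at the lowest level (a sketch using Python's standard selectors module; nothing here is tied to any real server):

    import selectors
    import socket

    sel = selectors.DefaultSelector()

    def accept(srv):
        conn, _ = srv.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, echo)

    def echo(conn):
        data = conn.recv(4096)        # socket is ready, so this doesn't block
        if data:
            conn.send(data)           # sketch only; a real server would buffer partial writes
        else:
            sel.unregister(conn)
            conn.close()

    srv = socket.create_server(("127.0.0.1", 8000))
    srv.setblocking(False)
    sel.register(srv, selectors.EVENT_READ, accept)

    while True:
        for key, _ in sel.select():   # idle only while no I/O completion is due
            key.data(key.fileobj)     # dispatch: pure CPU work per ready socket

The only place this process ever "waits" is the single select() call, and it only sits there when there is literally nothing due on any connection.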

There are memory costs per concurrent connection: whatever is needed to pick up processing as the related I/O completes, and for some structures representing the open socket in the kernel.

But these costs don't magically add up to "waiting". To the degree that the webserver is tied up in "waiting", it is that the kernel is idle between CPU interrupts generated by the networking hardware. In other words, it's doing nothing, and if you have too many machines spending their time doing nothing, you can eliminate some of them.

Now consider the same assumptions, except in the synchronous case: what this does is move the memory and CPU costs around and increase them. The memory cost per concurrent connection increases, to store the stack etc. The CPU costs increase because now context switches are required in between each I/O completion. These costs can be substantial; they can add up to being a limiting factor in themselves - but as part of either the memory or CPU limits of the machine, not I/O.
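
The synchronous version of the same sketch makes those extra costs visible (again Python, again purely illustrative):

    import socket
    import threading

    def handle(conn):
        with conn:
            while True:
                data = conn.recv(4096)    # blocks this thread; the kernel parks its stack
                if not data:
                    break
                conn.sendall(data)        # blocks again while the slow client drains

    srv = socket.create_server(("127.0.0.1", 8001))
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

Every connection now carries a full thread stack, and every blocking recv/send costs a context switch; that overhead shows up against the machine's memory and CPU limits, not as "I/O" in any sense that a faster disk or network would fix.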

Consider this: why would one need more than one webserver if the servers were "I/O bound", and not on server-side bandwidth, but rather on spoon-feeding clients?

Since it's not server-side bandwidth, it can only be because of either memory or CPU bounds. And where do those bounds come from? They come from the CPU or memory cost associated with concurrent connections, as well as the CPU and memory costs of processing any given request. You can minimize the costs associated with concurrent connections by leveraging an async I/O design of the server. But minimizing the CPU and memory costs of per-request processing is pure gravy in terms of reducing the number of machines you need to keep running to process X concurrent requests.

And it is here that the myth lies. Just because you may need to keep spoon-feeding very slow clients, such that decreasing the CPU cost of any given request would not visibly affect the client's perceived latency, that doesn't mean that optimizing the server for CPU usage is pointless. The less CPU time you spend, the less hardware you need for the same load; likewise for memory.

Another way to think of it: how can e.g. nginx work as a reverse proxy, if it runs on a single machine and is spoon-feeding lots of slow clients? Async design, and offloading CPU/memory requirements to other machines, that's how.


I never understood that either. For years I've been hearing people say that web apps shouldn't worry about CPU speed, but in 100% of cases over the past ten years I've found servers with faster processors to be much less sluggish when serving both dynamic and static requests. In fact I've often perceived a bigger benefit from a faster processor than from a marginally faster disk. I must be crazy, I suppose...


When people say I/O-bound, I think they mean disk-bound. Using less CPU really won't help you there.

Whether Web servers really are disk-bound is a different issue.

> If you have an asynchronous design for your server...

This is PHP we're talking about.


Asynchronous design just removes some of the memory pressure and a chunk of CPU loss from context switching. If your thread stacks are short and you have plentiful memory, you're still CPU-bound.

But yes, if you have to hit a serialized disk somewhere for each and every request, then you really are screwed. But I don't think that's really often the case, is it? If it were, it would be a miracle that most web pages ever come up in under a second.


Yeah, I pretty much gave up on reading the article at that line. It would be pretty awesome if we lived in a magical world where the performance of code running on a web server didn't matter, but that's just not the case. To know that, one need simply run top on a busy web server.


Yeah, exactly. The reality is that most web applications don't handle IO sanely, and have a 100-400MB process sit idle while waiting for the results of a database query (or for a slow client to suck down the rendered page).

Using a proper threading system (and I mean lightweight threads, not OS threads) makes any application CPU intensive instead of IO-intensive. If the app doesn't use all the CPU, it's performing as well as possible. If it does use all the CPU, then making it run faster is a net gain for your users.
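
As a sketch of what that looks like (asyncio tasks standing in for the lightweight threads; fetch_from_db() is a hypothetical stand-in):

    import asyncio

    async def fetch_from_db(query):
        await asyncio.sleep(0.05)          # stand-in for a database round trip
        return f"rows for {query!r}"

    async def handle_request(i):
        rows = await fetch_from_db(f"SELECT {i}")   # while this waits, other
        return rows.upper()                         # requests get the CPU

    async def main():
        # Thousands of in-flight requests cost only small task objects, so the
        # process is either doing useful CPU work or genuinely has nothing due.
        results = await asyncio.gather(*(handle_request(i) for i in range(1000)))
        print(len(results))

    asyncio.run(main())

No bloated process ever sits idle waiting on the query; the moment one request is parked on I/O, the CPU moves to another.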

This is PHP, though, and you would have to jump through a lot of hoops to remove the IO overhead. At least HipHop uses libevent, so apps won't be blocked by slow clients anymore.


> This is PHP, though, and you would have to jump through a lot of hoops to remove the IO overhead. At least HipHop uses libevent, so apps won't be blocked by slow clients anymore.

Really, in PHP you don't have to jump through many hoops at all to minimize disk I/O. Employ APC for opcode and user-variable caching, build PHP's Memcached (with the 'd' on the end, not the old 'Memcache' lib) into your data layer, and you're good to go.
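
The shape of that data layer, sketched in Python rather than PHP just to show the pattern (pymemcache and load_user_from_db() are my stand-ins, not anything from the comment above; the difference from a plain in-process cache is that memcached is shared across processes and machines):

    import json
    from pymemcache.client.base import Client

    mc = Client(("localhost", 11211))

    def load_user_from_db(user_id):
        ...   # hypothetical: the expensive query the cache is meant to avoid

    def get_user(user_id):
        key = f"user:{user_id}"
        cached = mc.get(key)
        if cached is not None:
            return json.loads(cached)              # cache hit: no DB, no disk
        user = load_user_from_db(user_id)          # miss: hit the DB once...
        mc.set(key, json.dumps(user), expire=300)  # ...then keep it hot for 5 minutes
        return user

The PHP version described above has the same shape: check APC/Memcached first, fall back to the query, write the result back with a TTL.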

If you're letting any process sit idle, much less a '100-400 MB' one, you're doing it wrong--in any language.



