Hacker News

Almost every single website on the internet is I/O bound, not CPU bound

This claim is patently false.

Under load, a surprisingly large percentage of applications are CPU bound. Disclaimer: I make my living doing web application load tests on a consultative basis--I don't have a study to cite offhand, just experience. I should publish a study, but that's a different issue.




It seems like you're both right. You qualified your statement with "under load". Are a majority of websites/web applications under a significant load?

I'd guess the real answer is that most websites are I/O bound due to the cost of sending data between the client and server.

However, heavier applications and sites with significant traffic then require more CPU processing, thus becoming CPU bound.

I don't have any real expertise, but from everything I've read and seen discussed, this is my interpretation of the issue.


While I don't argue your point, I've noticed a number of recent comments on HN that say: "I'm a consultant, therefore I am right."

My concern is that there is a winnowing of the data that any consultant will be able to comment on:

In your case, the web sites you can base your conclusions on are the ones that feel the need to hire a consultant to do load testing. This is not necessarily a truly representative sample.


Surely there is a bias in my sample, but my sample is representative enough to counter a generalization about "almost every single website."


That's interesting. I'm sure some people here, like me, believe the opposite, since I/O is so much slower than processors now. Can you give a bit more of an explanation as to what kind of applications are CPU bound and why?


I/O is slow if you actually do any, but RAM is so cheap that many Web sites shouldn't need much I/O.


I think that by I/O in this case people mean network I/O, not disk I/O.

Edit: Sorry, maybe I was jumping to conclusions. Databases and files are of course not always cached in-memory.


What wmf said sums it up. As for who is CPU bound, it often points to a need for caching or indexing. All it takes to become CPU bound is to use more CPU than your hardware can offer. Once you are CPU bound, you're unlikely to push traffic through fast enough to become network or disk I/O bound: if you're stuck on the CPU, the speed of those other things won't matter much.
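A minimal sketch of the caching point, assuming a hypothetical CPU-heavy `render_page` step and using Python's `functools.lru_cache` as a stand-in for a real cache layer: recomputing a response burns CPU on every request, while a cache turns repeat requests into a dictionary lookup.

```python
from functools import lru_cache

CALLS = 0  # counts how often the expensive body actually runs

@lru_cache(maxsize=1024)
def render_page(page_id: int) -> str:
    """Hypothetical expensive, CPU-heavy render step."""
    global CALLS
    CALLS += 1
    # Stand-in for template rendering / serialization work.
    return "".join(str(i) for i in range(page_id, page_id + 1000))

# 100 requests for the same page: the CPU-heavy body runs only once.
for _ in range(100):
    render_page(42)
print(CALLS)  # 1
```

The same idea applies at any layer: memcached in front of the database, a reverse proxy in front of the app server, or a memoized function inside it.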


I can't speak to your experience, but if you look at computing hardware, I/O is still by far the limiting factor. There's also a gap between how applications could ideally be built and how they are actually built: many applications end up CPU bound simply because of poorly designed DB queries.
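One common version of the badly-designed-query problem is a lookup with no index behind it. A small sketch with an in-memory SQLite table (schema and data entirely hypothetical): without an index the lookup scans every row, and with one it becomes a cheap B-tree search. SQLite's `EXPLAIN QUERY PLAN` shows the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"user{i}@example.com") for i in range(10_000)],
)

query = "SELECT id FROM users WHERE email = 'user9999@example.com'"

# The plan's detail string is the last column of each EXPLAIN QUERY PLAN row.
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]

print(before)  # e.g. "SCAN users" -- full table scan, CPU grows with row count
print(after)   # e.g. "SEARCH users USING INDEX idx_users_email (email=?)"
```

The exact plan text varies between SQLite versions, but the SCAN-vs-SEARCH distinction is the point: the scan does work proportional to the table size on every request, which shows up as CPU, not disk, once the table is in cache.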


In the real world, all that matters is how the web app is actually built. There is always a bottleneck, and after you remove that bottleneck, there is always another. IO is just one of many things that can bottleneck an application, as your example points out.


If everything is cached in memory then I/O is out of the picture... unless you count context-switching/paging between cache levels as I/O (and in that case anything that intensively uses the CPU is also an I/O operation).

In any case, when you're as large as Facebook, a 50% reduction in CPU cost has a far bigger impact than the same 50% reduction on a site with 1000 hits/month. If even a small amount of CPU overhead (small compared to DB or disk overhead) can be shaved off, you can cut the number of machines needed to serve that load, reducing costs.
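The arithmetic behind that point can be sketched as follows; all the numbers here (request rates, CPU time per request, cores per machine) are hypothetical.

```python
import math

def machines_needed(requests_per_sec: float, cpu_sec_per_request: float,
                    cores_per_machine: int = 16) -> int:
    """Machines required to serve a purely CPU-bound load."""
    total_cpu_sec_per_sec = requests_per_sec * cpu_sec_per_request
    return math.ceil(total_cpu_sec_per_sec / cores_per_machine)

# Big site: 1M req/s at 10 ms of CPU each, then with CPU cost halved.
big = machines_needed(1_000_000, 0.010)
big_halved = machines_needed(1_000_000, 0.005)
print(big, big_halved)  # 625 313 -- the 50% cut frees ~312 machines

# Tiny site: ~1000 hits/month rounds to one machine either way.
small = machines_needed(0.0004, 0.010)
print(small)  # 1
```

The optimization is identical in both cases; only the scale makes it worth engineering effort.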


One simple way to state it: as traffic increases, you are first I/O bound; then, as ongoing requests pile up, you become increasingly CPU bound; and when memory is exhausted and requests pile up really quickly, you briefly become I/O bound again before you crash (or start denying access).



