Alex Gaynor - Thoughts on HipHop PHP (alexgaynor.net)
71 points by twampss on Feb 3, 2010 | 30 comments



Almost every single website on the internet is I/O bound, not CPU bound

This claim is patently false.

Under load, a surprisingly large percentage of applications are CPU bound. Disclaimer: I make my living doing web application load tests on a consultative basis--I don't have a study to cite offhand, just experience. I should publish a study, but that's a different issue.


It seems like you're both right. You qualified your statement with "under load". Are a majority of websites/web applications under a significant load?

I'd guess the real answer is that most websites are I/O bound due to the process of sending data between the client and server.

However, heavier applications and sites with significant traffic then require more CPU processing, thus becoming CPU bound.

I don't have any real expertise, but from everything I've read and seen discussed, this is my interpretation of the issue.


While I don't dispute your point, I've noticed a number of recent comments on HN that say: "I'm a consultant, therefore I am right."

My concern is that there is a winnowing of the data that any consultant will be able to comment on:

In your case, the web sites you can base your conclusions on are the ones that feel the need to hire a consultant to do load testing. This is not necessarily a truly representative sample.


Surely there is a bias in my sample, but my sample is representative enough to counter a generalization about "almost every single website."


That's interesting. I'm sure some people here, like me, believe the opposite, since I/O is so much slower than processors now. Can you give a bit more of an explanation as to what kind of applications are CPU bound and why?


I/O is slow if you actually do any, but RAM is so cheap that many Web sites shouldn't need much I/O.


I think that by I/O in this case people mean network I/O, not disk I/O.

Edit: Sorry, maybe I was jumping to conclusions. Databases and files are of course not always cached in-memory.


What wmf said sums it up. As for who ends up CPU bound: it often points to a need for caching or indexing. All it really takes to be CPU bound is to use more CPU than your hardware can offer. Once you are CPU bound, you're unlikely to push traffic through fast enough to be network or disk I/O bound; if you're stuck on the CPU, the speed of the network and disk won't matter much.


I can't speak to your experience, but if you look at computing hardware, IO is still by far the limiting factor. Plus, if you look at how applications could ideally be built versus how they are actually built, there's a very different story. Many applications are CPU bound simply because of poorly designed DB queries.


In the real world, all that matters is how the web app is actually built. There is always a bottleneck, and after you remove that bottleneck, there is always another. IO is just one of many things that can bottleneck an application, as your example points out.


If everything is cached in memory then I/O is out of the picture... unless you count context-switching/paging between cache levels as I/O (and in that case anything that intensively uses the CPU is also an I/O operation).

In any case, when you're as large as Facebook, a 50% reduction in CPU cost has a far bigger impact than the same 50% reduction on a site with 1000 hits/month. If even a minimal amount of CPU overhead (compared to DB or disk overhead) can be shaved off, you can cut the number of machines you need to distribute that load, reducing costs.
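To put rough numbers on that, here's a back-of-the-envelope sketch in PHP. Every number below is invented for illustration (this is not Facebook's actual load): halving CPU cost per request roughly halves the machines needed for the same peak traffic.

    <?php
    // Back-of-the-envelope only; all figures are assumptions, not measurements.
    $peakRequestsPerSecond = 200000; // assumed peak load
    $coresPerMachine       = 8;      // assumed hardware
    $cpuMsPerRequest       = 20;     // assumed CPU cost per request before optimization

    // One core has 1000 ms of CPU per second to spend.
    $requestsPerMachine = $coresPerMachine * (1000 / $cpuMsPerRequest);
    $machinesBefore = ceil($peakRequestsPerSecond / $requestsPerMachine);

    // A 50% reduction in CPU cost per request doubles per-machine throughput.
    $machinesAfter = ceil($peakRequestsPerSecond / ($requestsPerMachine * 2));

    printf("machines needed: %d before, %d after\n", $machinesBefore, $machinesAfter);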


One simple way to state it: as traffic increases, you are first IO bound; then, as ongoing requests pile up, you become increasingly CPU bound; and when memory is exhausted and requests start to pile up really quickly, you become IO bound again, briefly, before you crash (or start denying access).


Because of differences like this ... I believe that the work done on HipHop represents a fundamentally smaller challenge than that taken on by the teams working to improve the implementations of languages like Python, Ruby, or Javascript.

I think this is true. On the other hand, it's not Facebook's job to do research. If they can benefit equally from solving a simpler problem, then that's what they should do.

I agree with the author's points, I just want to make explicit that Facebook didn't make wrong choices. It's just that their work is probably of limited value to everyone else.


There are many more people working on PHP apps than there are people working on dynamic language runtimes. Facebook's work is of limited value to everyone else for a limited value of 'everyone else'.


I'm taking the author's arguments at face value, that their project is of limited value to others because most PHP apps aren't CPU bound, and that they only handle a subset of the language.


My thoughts exactly; this is really not too exciting. (I will disagree that web apps are never CPU-bound. That has always been the bottleneck for me.)

I haven't seen the source code yet, but I think Chicken Scheme did everything HipHop does, just many years earlier. And there are some hard problems to solve when implementing Scheme, notably handling continuations. PHP is much simpler in comparison.

I do see good things coming out of this project, though. As people are enticed by the biggest temptress in computing, speed, they'll learn to run and deploy web applications that aren't as simple as "ftp this HTML file to a directory". Once PHP's only advantage over other languages is gone, and developers realize that it wasn't that big of an advantage, they'll switch to better programming languages. Then we can forget PHP ever happened, and the field can move on!


You think PHP is more likely to decline in popularity because there's now a way to run it faster? I suspect you are in for a very long surprise.


Firstly, there's the question of what problem HipHop solves.

It reduces infrastructure expenses of the top 100 web properties so much that the pain of building it, testing it, rolling it out, administering it, and maintaining it is worth it.


I wonder where this myth comes from that "[websites] on the internet [are] I/O bound, not CPU bound", seemingly implying that optimizing CPU usage is a waste of time.

Since the I/O bound generally comes from latency, rather than total throughput, the number of concurrent connections a single webserver can handle is often proportional to how much memory and CPU resources each connection uses. If you have an asynchronous design for your server, concurrent connections don't cost physical threads - they just cost the bookkeeping overhead in the kernel. The faster you can switch between those connections, and get finished with work on them when I/Os complete, the fewer physical machines you need to serve a website to a given number of users.

Or to put it another way, the saturation point of processing asynchronous I/Os is CPU-bound, even when speeding up individual requests is I/O-bound.
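For anyone who hasn't worked with that model, here's a minimal sketch of the idea in PHP using stream_select. A stock Apache+mod_php setup doesn't work this way; this is only to illustrate that an idle connection costs a little bookkeeping, not a thread:

    <?php
    // Toy single-process server: many slow connections, no per-connection threads.
    $server = stream_socket_server('tcp://0.0.0.0:8080', $errno, $errstr);
    stream_set_blocking($server, false);
    $clients = array();

    while (true) {
        $read   = array_merge(array($server), $clients);
        $write  = null;
        $except = null;
        // Sleep until *some* socket is ready; CPU is only spent when there is work.
        if (stream_select($read, $write, $except, null) === false) {
            break;
        }
        foreach ($read as $sock) {
            if ($sock === $server) {
                // New connection: the only per-connection cost is this array entry
                // plus the kernel's socket bookkeeping.
                if ($conn = stream_socket_accept($server, 0)) {
                    stream_set_blocking($conn, false);
                    $clients[(int)$conn] = $conn;
                }
            } else {
                // A request arrived (or the client went away): do a bit of work, move on.
                $data = fread($sock, 8192);
                if ($data !== '' && $data !== false) {
                    fwrite($sock, "HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok");
                }
                unset($clients[(int)$sock]);
                fclose($sock);
            }
        }
    }

The faster each iteration of that loop finishes, the more concurrent connections one box can carry, which is exactly why it ends up CPU-bound at saturation.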


Well...

In a dynamic web application, at first, you will nearly always be database bound. Faster algorithms and faster programming-language implementations on web servers will do nothing whatsoever for this (and increasing the concurrent load of requests the web server can handle will in fact only overload the DB even more).

That's when you start doing caching, and that's why caching has such dramatic effects.
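To illustrate the kind of caching I mean, here's a minimal cache-aside sketch in PHP with Memcached (the table, column, and key names are made up). The expensive query only runs on a cache miss:

    <?php
    // Setup, once per process:
    //   $cache = new Memcached();
    //   $cache->addServer('127.0.0.1', 11211);
    // $db is assumed to be an already-connected PDO handle.
    function get_user_profile(Memcached $cache, PDO $db, $userId) {
        $key = 'user_profile:' . (int)$userId;

        $profile = $cache->get($key);
        if ($cache->getResultCode() === Memcached::RES_SUCCESS) {
            return $profile; // cache hit: no database round-trip at all
        }

        // Cache miss: hit the database once, then keep the row for five minutes.
        $stmt = $db->prepare('SELECT * FROM users WHERE id = ?');
        $stmt->execute(array($userId));
        $profile = $stmt->fetch(PDO::FETCH_ASSOC);

        $cache->set($key, $profile, 300);
        return $profile;
    }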

Once you've got your caching going nicely and your DB humming along, you will end up bandwidth bound. Not by your own pipe, but by clients; you may have a nice fat line running out of the data center, but your users may be on anything down to mobile phones or even dialup, and you'll only be able to push responses to them at the speed they can handle. This is the spoon-feeding problem, and once again algorithms and language implementations on the server can't do anything at all to help it.

That's when you start putting fast, light, highly-concurrent reverse proxies (nginx appears to be winning the market share battle) in front of your actual web servers, and once again you will see a drastic effect. Or you combine caching and proxy into one component and do Varnish.

Once you've done this, you might finally start to reach a point where you're genuinely I/O or CPU bound on a server that's actually running your application code. Or you might not; there are other roadblocks you might run into first.

At any rate, optimizing CPU usage is, for the vast majority of websites, a waste of time at least until you've been through the phases I've outlined above. And, generally, I think you'll find that's the advice (the "myth") you've been hearing: fiddling with programming languages and algorithms is effectively wasted effort until you've dealt with quite a few other (and more important, performance-wise) things.


> Once you've got your caching going nicely and your DB humming along, you will end up bandwidth bound. Not by your own pipe, but by clients [...] This is the spoon-feeding problem, and once again algorithms and language implementations on the server can't do anything at all to help it.

Here. Here is where you made the mistake in your assertions.

> That's when you start putting fast, light, highly-concurrent reverse proxies (nginx appears to be winning the market share battle) in front of your actual web servers

My point is that the number of web servers you need in this spoon-feeding case is inversely proportional to how CPU-optimized your servers are. The spoon-feeding problem, as you put it, is just bookkeeping in the kernel (keeping track of open sockets) and iteratively processing I/Os when they come due, and is CPU-bound, unless you're actually approaching 64K open sockets, in which case a reverse proxy won't do and you'll need DNS tricks etc.


You have misunderstood the problem. The fact that clients sit behind lean, low-bandwidth/high-latency lines means that your webserver spends its time waiting for them. The very definition of I/O bound processing. Solving the spoon-feeding issue is much more complex than simple bookkeeping in the kernel, because it is independent (from the angle we're looking at it) of the server's design and operation.


And I think you've completely missed my point.

My assumptions:

* Server built around async I/O

* Clients with low bandwidth and high latency, but in aggregate insufficient to saturate the server's bandwidth

* Caching etc. on the server side so that server-side I/O bandwidth isn't the limiting factor

* Sufficient concurrent connections that you need more than one webserver

Under these assumptions, each webserver doesn't wait, in the OS blocking sense, for any given connection. It processes I/O completions as they come due, as fast as it can. It is this process that is CPU-bound; if less than 100% CPU is utilized, it means that there are periods where no I/O completions are currently due.

There are memory costs per concurrent connection: whatever is needed to pick up processing as the related I/O completes, and for some structures representing the open socket in the kernel.

But these costs don't magically add up to "waiting". To the degree that the webserver is tied up in "waiting", it is that the kernel is idle between CPU interrupts generated from the networking hardware. In other words, it's doing nothing, and if you have too many machines spending their time doing nothing, you can eliminate some of them.

Now consider the same assumptions, except in the synchronous case: what this does is move the memory and CPU costs around and increase them. The memory cost per concurrent connection increases, to store the stack etc. The CPU costs increase because now context switches are required in between each I/O completion. These costs can be substantial; they can add up to being a limiting factor in themselves - but as part of either the memory or CPU limits of the machine, not I/O.

Consider this: why would one need more than one webserver if the servers were "I/O bound", and not on server-side bandwidth, but rather on spoon-feeding clients?

Since it's not server-side bandwidth, it can only be because of either memory or CPU bounds. And where do those bounds come from? They come from the CPU or memory cost associated with concurrent connections, as well as the CPU and memory costs of processing any given request. You can minimize the costs associated with concurrent connections by leveraging an async I/O design of the server. But minimizing the CPU and memory costs of per-request processing is pure gravy in terms of reducing the number of machines you need to keep running to process X concurrent requests.

And it is here that the myth lies. Just because you may need to keep spoon-feeding very slow clients, such that decreasing the CPU cost of any given request would not visibly affect the client's perceived latency, that doesn't mean that optimizing the server for CPU usage is pointless. The less CPU usage you spend, the less hardware you need for the same load; likewise for memory.

Another way to think of it: how can e.g. nginx work as a reverse proxy, if it runs on a single machine and is spoon-feeding lots of slow clients? Async design, and offloading CPU/memory requirements to other machines, that's how.


I never understood that either. For years I've been hearing people say that web apps shouldn't worry about CPU speed, but in 100% of cases over the past ten years I've found servers with faster processors to be much less sluggish when serving both dynamic and static requests. In fact I've often perceived a bigger benefit from a faster processor than from a marginally faster disk. I must be crazy, I suppose...


When people say I/O-bound, I think they mean disk-bound. Using less CPU really won't help you there.

Whether Web servers really are disk-bound is a different issue.

If you have an asynchronous design for your server...

This is PHP we're talking about.


Asynchronous design just removes some of the memory pressure and a chunk of CPU loss from context switching. If your thread stacks are short and you have plentiful memory, you're still CPU-bound.

But yes, if you have to hit a serialized disk somewhere for each and every request, then you really are screwed. But I don't think that's really often the case, is it? If it were, it would be a miracle that most web pages ever come up in under a second.


Yeah, I pretty much gave up on reading the article at that line. It would be pretty awesome if we lived in a magical world where the performance of code running on a web server didn't matter, but that's just not the case. To know that, one need simply run top on a busy web server.


Yeah, exactly. The reality is that most web applications don't handle IO sanely, and have a 100-400MB process sit idle while waiting for the results of a database query (or for a slow client to suck down the rendered page).

Using a proper threading system (and I mean lightweight threads, not OS threads) makes any application CPU intensive instead of IO-intensive. If the app doesn't use all the CPU, it's performing as well as possible. If it does use all the CPU, then making it run faster is a net gain for your users.

This is PHP, though, and you would have to jump through a lot of hoops to remove the IO overhead. HipHop does use libevent, so at least apps won't be blocked by slow clients anymore.


>This is PHP, though, and you would have to jump through a lot of hoops to remove the IO overhead. HipHop does use libevent, so at least apps won't be blocked by slow clients anymore.

Really, in PHP you don't have to jump through many hoops at all to minimize disk I/O. Employ APC opcode and user variable caching, build PHP's Memcached (with the 'd' on the end, not the old 'Memcache' lib) into your data layer, and you're good to go.
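A rough sketch of that layering (function and key names are made up for illustration; the APC opcode cache itself needs no code, since enabling the extension caches compiled scripts automatically):

    <?php
    // Two cache tiers before you ever touch disk or the DB:
    //   1. APC user cache: per-server, in-process, no network hop.
    //   2. Memcached: shared across web servers, one fast network hop.
    function cached_fetch(Memcached $memcached, $key, $ttl, $loader) {
        $value = apc_fetch($key, $hit);
        if ($hit) {
            return $value;
        }

        $value = $memcached->get($key);
        if ($memcached->getResultCode() !== Memcached::RES_SUCCESS) {
            // Only now touch the slow backing store (DB, files, whatever $loader does).
            $value = $loader();
            $memcached->set($key, $value, $ttl);
        }

        apc_store($key, $value, $ttl);
        return $value;
    }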

If you're letting any process sit idle, much less a '100-400 MB' one, you're doing it wrong--in any language.


He may have meant "it is not as CPU-bound as in the case of rendering or en/decoding video".



