The discussion of epoll/select does talk about the event-driven scenario and the...

axod · on June 8, 2008

Sure, I just expected the doc to go a little more like:

1 process per connection (bad) -> 1 thread per connection (better?) -> 1 process (bingo!)

neilc · on June 8, 2008

~1 process per core, that is (unless the daemon is completely I/O bound).

rcoder · on June 9, 2008

This is only true if you think your application can do a better job of scheduling than the underlying OS kernel. In many (if not most) cases, this is false.

For HTTP servers hosting static content, you may be able to out-perform the OS thread scheduler. For most non-trivial apps, you're probably wrong.

Small fork()-ed processes can still compete with clever poll(), /dev/epoll, or kqueue()-based servers, if you can keep each instance lightweight.

neilc · on June 9, 2008

This is only true if you think your application can do a better job of scheduling than the underlying OS kernel. In many (if not most) cases, this is false.

Why is it false? As the other reply notes, you have more domain knowledge than the kernel's scheduler does. You also didn't need to pay the overhead of entering the kernel to context switch (which is why context switches between userspace threads are cheaper than between kernel threads). In the case of fork()-based servers, you also need to flush the TLB (depending on the CPU architecture).

I'd be curious to see links that support your last claim: AFAIK, it is fairly well-known that event-based daemons using epoll/kqueue are the most performant technique for writing scalable network servers. See C10K, etc.

rcoder · on June 9, 2008

The problem is that, in the real world, monolithic polling network servers have to manage all the transaction-specific context for your application in a single binary.

I'm not challenging the fact that the C10K architecture wins if we're talking about static content. Real webapps don't live and die by their static content performance, though: the critical path is through the dynamic content generation, which means that select()/poll()/et. al. don't buy you much, unless you can hook your database client events and application thread scheduling inside that loop as well.

Green (a.k.a. userspace) threads are one good solution, but fork() isn't the performance-killer that people seem to think it is, either.

staunch · on June 9, 2008

> This is only true if you think your application can do a better job of scheduling than the underlying OS kernel. In many (if not most) cases, this is false.

Why wouldn't you be able to virtually always do a better job of scheduling events (when you control all the code competing for resources) than a generic OS scheduler?

> Small fork()-ed processes can still compete with clever poll(), /dev/epoll, or kqueue()-based servers, if you can keep each instance lightweight.

Do you know of any examples of ultra highly scalable fork()ing servers?

rcoder · on June 9, 2008

> Why wouldn't you be able to virtually always do a better job of scheduling events (when you control all the code competing for resources) than a generic OS scheduler?

Simply put, I would say that "you" (where "you" is an average webapp developer) probably don't understand scheduling, event-driven programming, or memory management as well as the average kernel developer. I of course don't mean this to extend to anyone in this thread. Imagine, though, giving your average PHP or ASP developer, who may struggle to implement a basic sort algorithm, the problem of implementing cooperative multitasking in a scalable way.

> Do you know of any examples of ultra highly scalable fork()ing servers?

Under real-world workloads, I still consider Apache to be "highly scalable." In my experience, given 1:1 investment in hardware for front-end and database servers, the RDBMS craps out long before the webapp, so being able to accept another 5000 incoming connections is only going to hurt you.

axod · on June 9, 2008

If you use long lived connections, apache simply does not cut it any more IMHO

axod · on June 9, 2008

You're thinking that having the OS save everything in a context switch is better than you just calling a different function to do a piece of work for one of your connections? Not really.

rcoder · on June 9, 2008

Forive the reductio ad absurdum, but it seems that by your argument, we'd all be better off simply implementing our webapps within the init daemon, and handling all scheduling and resource-protection within our application logic.

Personally, I like having my orthogonal processes running in isolation from each other, mediated through a tested, scalable kernel.

Obviously, there's a balance to be found between needless context switches and dangerous coupling of function, but I've found that, as often as not, a process context per user is actually a pretty good point to come down on within that continuum.

axod · on June 9, 2008

I guess it depends how many users you want to scale to :)

rcoder · on June 9, 2008

I'm less concerned with cramming as many users onto a single box as possible than I am with insuring a high quality of service for each active user.

If you have 5000 concurrent sessions in a single OS-level process, you better have some pretty damn good fault-recovery mechanisms built into that process.

When you get right down to it, this is the same basic argument gets used to support Erlang: namely, that it supports large number of cooperating, isolated processes, rather than shoving everything into a single monolithic server.

I trust that the Linux (or BSD, Solaris, or other reasonable OS) kernel can handle scheduling a few thousand processes smoothly, so long as those processes fit into available RAM.

What kills big Apache (or other fork()-driven) server environments isn't context switching between backends, it's swapping.

That being said, you're right about it being all about scale. If you're like Google, or Facebook, and can afford to hire engineers who do nothing but worry about scaling to 10K sessions on each HTTP host by moving your critical-path code into hand-tuned C++ or Java inside the polling server, more power to you. If you're in the normal world most of us live in, where scalability via hardware is more economical than via developer time, then fault isolation may prove more useful than raw throughput.

axod · on June 9, 2008

Good point. In practice though I like to keep 1 networking process/thread, and use the other cores for other tasks, such as database access, and other operations that might block. Or just long operations that need to get done without compromising other connections etc.

After all, the actual networking code isn't cpu heavy.

That way you don't have to deal with any synchronization or concurrency issues in the networking code at all.