Work has started on the next generation of Apache web server (h-online.com)
89 points by Tsiolkovsky on March 8, 2011 | 35 comments



The 2.3 branch is a testing branch (it was alpha before 2.3.11 and has been beta since 2.3.11).

What we will see next is 2.4 (the general/public release). And it won't really be stable / production-ready until about six months after that, around 2.4.8.

http://httpd.apache.org/docs/2.3/new_features_2_4.html

http://httpd.apache.org/docs/2.3/developer/new_api_2_4.html

The MPM for Windows is really great: it takes full advantage of the threaded nature of processes on Windows ... You can spin up hundreds of threads in one process without much of an impact and have each thread handle one connection (with a low keepalive of 1 or 2 seconds to hold the client).
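
As a rough sketch of what that looks like in the config (standard mpm_winnt directives, but the numbers are illustrative placeholders, not recommendations):

    # mpm_winnt: one control process, one multi-threaded child
    <IfModule mpm_winnt_module>
        ThreadsPerChild       400
        MaxRequestsPerChild     0
    </IfModule>
    KeepAlive          On
    KeepAliveTimeout   2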

I'm mostly interested in Apache on Windows and have toyed around with the idea of including v2.3 in my WAMP distribution ( http://www.devside.net/server/webdeveloper ) in the experimental branch, along with some MySQL replacements such as MariaDB and Drizzle.


Apache hacker here. AMA.


Why is Apache so slow compared to Nginx, Cherokee, Lighttpd and many other web servers?


Different design goals. Apache is meant to be robust, extensible and portable.

* Robust: that's reflected in its internal API, which makes it nearly impossible to leak resources.

* Extensible: witness the gazillion modules out there.

* Portable: compiles and runs on very exotic or outdated systems. SCO, IRIX, Digital UNIX, VMS, the list goes on.

nginx and such were designed from the ground up with performance in mind - and with success - but the trade-off is a lack of portability and an API that is much harder to program to.


Well gee. It's been a while since I've used Digital UNIX, Irix, SCO, or VMS.

Because they've become totally irrelevant.

Seems like a waste of resources when there are more pressing things to do than worry about the 5 people who (a) use VMS and (b) demand a bleeding-edge Apache.


No need to be so acerbic. They are providing a valuable service to you, for free. Apache is older, much more ubiquitous, supports more modules, has a more familiar setup and configuration system for many people, runs on more platforms, etc. It has more legacy behind it, which is why it's so popular, but also why it moves more slowly. Nginx is much newer, doesn't have the same legacy issues, and so could afford to focus on speed.

There's a reason that Apache is installed on just about every random web host you can find and has a module for every language or environment you need to deal with, while Nginx is a bit more of a specialty web server, usually used for dedicated sites that can spend the time to tune carefully for the highest performance. They both have their place.

It's great that Apache is still innovating, moving towards loadable MPMs as well as adding an evented MPM. But it's not a bad thing that they're moving slower than a newer server like Nginx; there's room for more than one great free web server in the world.


Preforked and threaded blocking IO consumes far more resources than event-driven non-blocking network IO. Process/thread stacks consume a significant amount of memory. This is why nginx ruins Apache at almost anything at scale (especially reverse proxying). Apache falls over, consuming grandiose amounts of memory, at a few million connections per day, while nginx can easily handle 10 million connections per day on a single-core machine with 256 MB of RAM. It's ridiculous.

On the other hand, for long-lived connections, you're better off using blocking IO since it's more efficient. Most web traffic is not long-lived, though. See Paul's interesting article on NIO vs IO performance:

http://mailinator.blogspot.com/2008/02/kill-myth-please-nio-...

http://www.mailinator.com/tymaPaulMultithreaded.pdf

And don't forget about Slowloris. If you see your competition using Apache, you really don't need to worry. :)

Disclaimer: I have nothing personal against Apache and used it for six years or so. I've just moved on to better software. If it works for you, that's great.


I still use Apache because I trust it, and because of the ease with which I can configure a Python server (via mod_wsgi), a Ruby server (Passenger) and a Perl server, while still being able to make use of the same modules I've been using for years.
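
As a rough sketch, the mod_wsgi side of that is only a few directives (the paths and the app name here are made up for the example):

    # Python app in mod_wsgi daemon mode
    LoadModule wsgi_module modules/mod_wsgi.so
    WSGIDaemonProcess myapp processes=2 threads=15
    WSGIProcessGroup  myapp
    WSGIScriptAlias   /myapp /var/www/myapp/app.wsgi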

Also, a properly configured Varnish placed in front of Apache ruins Nginx at almost anything at scale. I've seen it.

      Most web traffic is not long-lived, though.
But most web traffic is blocking. Going NIO requires caching - which is a huge penalty and a PITA; and doing it while not having actual users doesn't make sense.


You should look at Unicorn as an alternative to Passenger for your Rails apps. You just send it a SIGUSR2 for zero-downtime deploys. Pretty nice setup.

https://github.com/blog/517-unicorn http://unicorn.bogomips.org/
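
Roughly (assuming the pid file locations below are whatever you configured for Unicorn):

    # re-exec the running master with the new code, zero downtime
    kill -USR2 $(cat /path/to/unicorn.pid)
    # once the new master is up, gracefully retire the old one
    kill -QUIT $(cat /path/to/unicorn.pid.oldbin)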

Varnish isn't a webserver. You can put Varnish in front of nginx as well. :)

Agreed on the blocking.

On large sites, I've been doing a single HAProxy instance -> nginx instance on each webserver -> Unicorn app server on each webserver with really good results.


Will these two things, included in the new Apache, change this?

Event MPM: The Event MPM is no longer experimental but is rather fully supported.

Asynchronous support: Better support for asynchronous read/write for supporting MPMs and platforms.


Yes, I believe so. I'm really excited that the event MPM is no longer experimental. I had tried it way back in the day, but mod_php could not be safely used with it as mod_php was (is?) not thread-safe. I wasn't aware of FCGI at the time.


Mostly due to three reasons:

1) People don't know how to use PHP with it (as a module or over FastCGI), or which MPM to use with each; there's a rough sketch at the end of this comment.

2) People don't know how to configure the MPM settings to let Apache make full use of server resources (without under-utilizing or over-utilizing them).

3) Apache is a full, well-rounded web server... It's not a specialty server (e.g., one serving mostly static content) that can strip out everything else and excel at a single feature. Other servers are stripped down compared to Apache.

* Not the AMA guy (nor an Apache hacker), but I have a little-known WAMP distribution (called WampDeveloper, formerly Web.Developer Server Suite) going since 2003, with over 250,000 unique-IP downloads between 2003 and 2006 (I stopped counting), and I have worked on thousands of issues for users and clients since the start.
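
To make 1) concrete, the two common setups look roughly like this (real directive names, but the paths and values are placeholder assumptions):

    # Option A: prefork MPM + mod_php (mod_php is not thread-safe)
    LoadModule php5_module modules/libphp5.so
    AddHandler application/x-httpd-php .php

    # Option B: worker/event MPM + PHP run as FastCGI via mod_fcgid
    # (the directory serving .php also needs Options +ExecCGI)
    LoadModule fcgid_module modules/mod_fcgid.so
    AddHandler fcgid-script .php
    FcgidWrapper /usr/bin/php-cgi .php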


The only reason I still run Apache is that a lot of PHP software requires mod_rewrite to create pretty URLs, and I haven't had the time to rewrite those rules in Lua so I can use Lighttpd.
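
(The typical front-controller rule those apps rely on is just a few lines of mod_rewrite, e.g. the stock WordPress-style block:)

    RewriteEngine On
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]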

But on the same server, using PHP-FPM over FastCGI on both Lighttpd and Apache, Lighttpd is serving almost 4x the requests that Apache is, with less memory overhead. When I tried switching the sites currently running on Lighttpd back to Apache not too long ago, so I'd only have to maintain a single server, Apache died a quick, miserable death. It just could not keep up. I've had friends look over my config (datacenter techs who help people scale their stuff, mostly porn) and they said it looked fine. Apache was the limiting factor here.

I moved from prefork to threaded, which helped a little, but not much. Apache was just using a lot of memory and overall did not provide the performance I wanted. I switched back to Lighttpd, the load on the server went down, and the websites were as responsive as before.


"made Apache die a miserable fast death. It just could not keep up, I've had friends look over my config (datacenter techs, help people scale their stuff, porn mostly) and they said it looked fine."

Apache crashing under load is indicative of seriously bad MPM settings: specifically the number of processes / threads allowed (and some other settings such as the KeepAlive timeout, how PHP is being used, etc.).

With a properly configured MPM, anything coming over the process / thread limit goes into a backlog queue which is 511 entries long. Anything over this just gets dropped.

To reiterate, Apache does not crash under load (too many requests); it crashes due to too many processes / threads being used (an MPM setting) and sometimes due to leaking modules such as mod_python/mod_ruby.
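
For a rough idea of the knobs in question on the worker MPM (real directives, but the numbers below are illustrative and need to be sized to your RAM and per-process footprint):

    <IfModule mpm_worker_module>
        ServerLimit          16
        ThreadsPerChild      25
        MaxClients          400      # ServerLimit x ThreadsPerChild
        MinSpareThreads      25
        MaxSpareThreads      75
    </IfModule>
    KeepAlive            On
    KeepAliveTimeout     2
    ListenBacklog        511        # the backlog queue mentioned above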


What type of setup do you use to analyze, debug, and follow the code?

Any tips on getting started with module development?


Another apache hacker answering :-)

I use emacs and gdb to muck around in the code. Apache has a lot of backwards-compat warts with regard to which header files define things, unfortunately, but grep (or ack) and ctags will get you around.

The best way (by far) to learn about module hackery is with Nick Kew's book, http://www.amazon.com/Apache-Modules-Book-Application-Develo...


Another vote for Nick Kew's book. He is a regular poster on the modules-dev mailing list[1], by the way.

`httpd -X` and `gdb httpd $(pidof httpd)` are a great help when debugging.

[1] http://mail-archives.apache.org/mod_mbox/httpd-modules-dev/


Any plans for native support of WebSockets?


Are there any significant barriers to using the event MPM as it stands in 2.2?

Does it fall back to worker when it's used under SSL?


No. The main problem the event MPM addresses is HTTP keep-alive.

With pre-fork and threaded, every connection takes up a process or thread. This sucks for keep-alive because it can take a while before the client issues the next request.

The event MPM puts that idle connection in a kqueue/epoll/etc. pollset and recycles the process/thread for another request.

It's a conceptually simple change but it has some profound performance implications.


As a single datapoint, I run a site that handles many billions (this is not rhetorical or an exaggeration) of hits per month using mpm_event: I consider it the only sane/stable way of configuring Apache.


As I found out to my sadness, the only sane configuration is not the one which ships with apt-get apache2.

I like many things about Nginx. The fact that it is a production-capable web server out of the box, for example. I know that is probably not a high-priority design goal for Apache for historical reasons, but it seems a very sensible default in 2011.


For the record, "apt-get install apache2-mpm-event" is all you need to switch to mpm_event: I believe it isn't supported on all platforms, and, if I were a conservative Debian-based distribution using apt-get, I'd prefer "predictable behavior across all of our targets" over "highest performance configuration on any given system".

(I also believe there are some corner cases where mpm_event may burn you on 2.2, and both Apache and Debian are going to play it safe there; but, when I looked into this in detail a forever or so ago, I determined that whatever issues there were didn't apply to my setup. As an example: if SSL were an issue, I doubt I would ever use Apache to directly serve SSL anyway, as that's what SSL accelerators are for.)

That said, I still wouldn't leave the defaults in place with regard to "number of servers / threads"; although it isn't like nginx doesn't need the same configuration: the default values of worker_{processes,connections} are almost certainly inappropriate for your specific setup. I also use nginx (as a load balancer), and I have those values at 64/10096, up from the Debian default of 1/1024.
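
For reference, on the nginx side that is literally two knobs (these are my values from above, not defaults or recommendations):

    worker_processes  64;
    events {
        worker_connections  10096;
    }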

The important thing, to me, is what a technology makes possible, not how well it is configured out of the box. Example: it is more damning to me that nginx only does HTTP/1.0 to upstream servers it is proxying for, a reasonably fundamental limitation of the codebase, than any transgression they could make in their default configuration.

Seriously: production is not about "out of the box", it never was, and it likely is never going to be. If you are trying to run a production server using "out of the box" defaults you are going to be forever disappointed by the performance and functionality of the offerings.

To make a minor modification to a statement I've made before (http://news.ycombinator.com/item?id=2145967 is the reference) about database servers:

We (as a civilization) simply do not have the science and theory yet to make the practicalities of setting up and maintaining anything at this level of complexity a totally seamless and simple process with well-understood performance characteristics unless you constrain absolutely every single variable.


      If you are trying to run a production server using "out of the box" defaults you are going to be forever disappointed by the performance and functionality of the offerings.

I have sold hundreds of thousands of dollars of software to thousands of people which has run for years without (web-server related) incident on nginx. If I am fated to be struck down for my ignorance of nginx internals, that doom must be further down the road.

Where I do see failure is when I put the world's most popular blogging software on Apache, turn on caching and give it a gig of RAM to play with, and then watch the server get denial-of-serviced by the totally innocent actions of any ten readers attempting to access the website in a 15 second interval. Thank you, KeepAlive. Ten is not a big number on the Internet. Apache has been DOSed by my younger brother's comic book writing advice blog.
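
For the curious, the arithmetic behind that failure mode is roughly this (illustrative numbers, not measurements from my brother's box):

    # prefork + mod_php on a 1 GB box
    # ~40 MB per child             ->  1024 / 40  ~= 25 workers before swapping
    # KeepAliveTimeout 15          ->  an idle reader pins a worker for 15 s
    # 10 readers x ~4 connections  ->  ~40 workers wanted in one 15 s window
    # 40 wanted > 25 available     ->  backlog, swap, and the site falls over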

I accept that, for historical reasons, the Apache project does not optimize for being useful without your being a master of arcane trivia, like knowing what options the server is compiled with (!), just to operate a college student's blog (!!) without it falling off the Internet. Much of the software developed these days and pitched at web developers, including software that I write, has as a guiding design principle that you should be able to start using it in five minutes by following simple instructions, and that its sensible defaults should mostly work. I think this development model is categorically better than the software-by-the-experts, software-for-the-experts model.

P.S. apt-get mysql gets you a database which won't fall over if you try to host a comic book blog on it.


I'm sorry, but I've never tried to run a comic book blog. I have, however, had nginx brought to its knees with the default configuration on a system that could trivially handle the load it was receiving, requiring modifications to the exact same parameters one has to reconfigure after doing an apt-get install apache2-mpm-event to get good performance.

Frankly, it seems like your real issue here is that no one told you to apt-get install apache2-mpm-event instead of apache2, and if anyone would be to blame regarding that it would be Debian/Ubuntu (you will note I specifically stated "if I were a conservative Debian-based distribution using apt-get", not "the Apache project").

Again: this isn't a problem with the default configuration, this is you installing the wrong thing. From my perspective you may as well be complaining that you typed "apt-get install apache" and got Apache 1.3 (yes, I know this doesn't actually happen on Ubuntu: it simply doesn't work), when that "should" have given you Apache 2.x.


In 2.2, the event MPM doesn't support SSL, but in 2.3/2.4 it does.

In fact, IIRC, if you are using Apache as a reverse proxy (mod_proxy), the event MPM should become the standard, as it will detach the worker thread both for keepalive on the front end and while waiting on the response from the backend.


Will apache support Google's SPDY at some point? Also, how much is having to support all the legacy platforms slowing down development of new features?


http://code.google.com/p/mod-spdy/

It has already had support since the start (though not from the ASF)...

It would be unwise to release a new protocol without releasing a proof-of-concept tie-in for an already existing web server, unless you want the project to be DOA.


Anything new on Waka?


Interesting that lighttpd, nginx and Apache are now all going to have some degree of Lua integration in request processing. It's really useful having a real programming language built right in.
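
For a taste, a minimal httpd mod_lua handler looks roughly like this (a sketch based on the 2.3/2.4 mod_lua API; the mapping below is an assumed example):

    -- hello.lua, wired in with:  LuaMapHandler /hello /path/to/hello.lua
    function handle(r)
        r.content_type = "text/plain"
        r:puts("Hello from Lua inside httpd\n")
        return apache2.OK
    end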


Agreed, and lua is so nice to embed that it makes it easy. The only thing that compares is tcl, and, well... Yeah, go lua!


Does this mean that apache2 + the Lua support are going to be the next node.js? On a more serious note, this is super-exciting.


If you're interested in something like Node.js for Lua, check out Ignacio Burgueño's LuaNode.

https://github.com/ignacio/LuaNode


One of the things that impresses me the most about Apache httpd (besides stability and configurability) is the amazing level of integration with Perl. I'm not much of a Perl hacker myself, but I've done a little development of C stuff for Apache, and from what I can see all of the C API is also accessible from mod_perl.


Apache has a lot of distance to make up. Nginx with PHP-FPM and uWSGI is very fast and light on memory. I also save time with my setup by not having to configure both Nginx and Apache (since Apache is way too slow to serve static content).



