> nginx and haproxy were really close; it's almost not significant enough to say that one is faster or better than the other. But if you look at it from an operations standpoint, it's easier to deploy and manage a single nginx server instead of stud and haproxy.
From an operations standpoint, haproxy has other features (failover, CLI management, clustering) that actually make it a much better load balancer. I usually install all three (haproxy, stud, nginx) because they are each very good in their specific niche. As for the simplicity of installation, that can be handled with a configuration manager.
As a simple reverse proxy for small setups, there is almost no difference between the two, especially when running on a VM. With nginx you do miss many of haproxy's advanced balancing features, but again, this config was a basic reverse proxy, not really load balancing anything.
I haven't worked on these in a couple years, but on real hardware, haproxy could push much more bandwidth. We could saturate 10Gb ethernet fairly easily at the time, which wasn't possible at all with nginx.
+1 to your second paragraph. I've previously carried out what I considered a pretty fair benchmark: real hardware, realistic server setup, well-informed configuration of both nginx and haproxy, etc. HAProxy had significantly better performance in my results. Not questioning their results, just surprised.
I'll be more than happy to see if my configurations can be tweaked. And I was surprised as well, but it seems that it always boils down to the configuration files: a small change can sometimes yield great results, as seen with specifying ciphers for SSL.
Like I said, running as a simple reverse proxy, they should be pretty close. The added latency is really the differentiator when going to a single backend, and that's about all you're measuring. Your control itself is still <400 rps.
Now add dozens of backend servers, and start throwing a couple orders of magnitude more traffic at it, and the differences will really start to show.
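For a sense of scale, a rough haproxy backend block for that kind of test would look something like this (names and addresses made up):

```
backend websocket_nodes
    balance roundrobin            # spread new connections across all the nodes
    server node1 10.0.0.11:8080 check
    server node2 10.0.0.12:8080 check
    server node3 10.0.0.13:8080 check
    # ...and so on, for dozens of servers
```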
Please note: I'm not saying your numbers aren't useful. Many people will be using a setup like this, and would like to know what the differences would be, if any.
Thanks for your feedback. I did do some testing with multiple backends (4 different servers) but it didn't show any substantial changes. I'll see if I can do some more in-depth testing in a while.
What the option does is close the connection between the proxy and the backend so that HAProxy will analyse further requests instead of just forwarding them over the already established connection.
To be fair, I don't know what nginx does - whether connections are kept open or shut down - so I'm not sure that it'd be a fair comparison.
Also interesting are the HAProxy built-in SSL times. I'm surprised they're so slow. Perhaps the cipher is also the culprit there; the cipher can be specified in HAProxy as well.
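Something along these lines, with the cert path and backend name as placeholders:

```
frontend https_in
    bind :443 ssl crt /etc/haproxy/site.pem ciphers RC4-SHA:AES128-SHA
    option http-server-close       # close the backend connection after each request
    default_backend ws_servers
```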
The `http-server-close` option did not change anything, but adding the ciphers squeezed out the same performance as Nginx. I'll update the tests accordingly. Thanks for the heads-up!
How many requests are made per connection? In order to better gauge performance we need a 3-axis plot, where the response rate is measured against various requests-per-connection values and connection rates.
Before each test the WebSocket server is reset and the proxy re-initiated. Thor then hammers the proxy server with x connections at a concurrency of 100. For each established connection one single UTF-8 message is sent and received. After the message is received the connection is closed.
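The invocation is roughly the following; the amount is just an example value, the hostname is a placeholder, and I'm paraphrasing the flags, so double-check them against thor's README:

```
# open a fixed number of connections against the proxy, 100 at a time,
# sending one UTF-8 message per connection
thor --amount 10000 --concurrent 100 wss://proxy.example.org:443
```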
See https://gist.github.com/3rd-Eden/5345018 for the output of the openssl s_client for those ciphers. You'll see that `cipher : RC4-SHA` is used here, which is one of the fastest, if not the fastest, ciphers available.
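For anyone who wants to run the same check against their own endpoint, it boils down to something like this (hostname is a placeholder):

```
# see which cipher the server negotiates; -cipher limits what the client offers
openssl s_client -connect proxy.example.org:443 -cipher RC4-SHA < /dev/null | grep -i cipher
```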
I meant "local" as in "not on the other side of a continent". Here's me pinging a nearby DNS server that's 40 miles away from my desk:
```
% ping -c 10 -A 4.2.2.1
PING 4.2.2.1 (4.2.2.1) 56(84) bytes of data.
64 bytes from 4.2.2.1: icmp_req=1 ttl=56 time=3.24 ms
64 bytes from 4.2.2.1: icmp_req=2 ttl=56 time=2.88 ms
64 bytes from 4.2.2.1: icmp_req=3 ttl=56 time=2.95 ms
64 bytes from 4.2.2.1: icmp_req=4 ttl=56 time=2.90 ms
64 bytes from 4.2.2.1: icmp_req=5 ttl=56 time=2.95 ms
64 bytes from 4.2.2.1: icmp_req=6 ttl=56 time=2.91 ms
64 bytes from 4.2.2.1: icmp_req=7 ttl=56 time=2.90 ms
64 bytes from 4.2.2.1: icmp_req=8 ttl=56 time=2.87 ms
64 bytes from 4.2.2.1: icmp_req=9 ttl=56 time=2.94 ms
64 bytes from 4.2.2.1: icmp_req=10 ttl=56 time=2.94 ms
--- 4.2.2.1 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 1806ms
rtt min/avg/max/mdev = 2.875/2.952/3.247/0.112 ms, ipg/ewma 200.705/3.019 ms
```
I can't imagine why a local websocket echo service should be 5x slower than this.
hipache is built on top of http-proxy; that's why I haven't included it in the tests. They seem to have switched to a fork of http-proxy, but there aren't big (if any) performance modifications, as far as I've seen from the commits.
You can enable TLS 1.2 ciphers, but there is a large percentage of clients unable to use them, so the fallback is RC4.
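In nginx terms that kind of policy looks roughly like the snippet below; the exact cipher list is only illustrative, not a recommendation:

```
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
# prefer a TLS 1.2 AES-GCM suite, keep RC4-SHA around for clients that can't do better
ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:RC4-SHA;
ssl_prefer_server_ciphers on;
```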
> At the moment, the attack is not yet practical because it requires access to millions and possibly billions of copies of the same data encrypted using different keys
Some benchmarks are designed to measure absolute performance, in order to answer questions like "How many servers should we buy, if we expect to handle X hits/second?" or "What's the limit of our app stack's scalability, in hits/second?" But the OP is NOT benchmarking for absolute performance.
The OP here is benchmarking toward a different purpose: comparing relative performance. These tests are designed to answer a totally different kind of question: "Which of these app stacks performs best, given the same hardware budget for each stack?" or "How many extra servers do we need to buy, if we want to use Apache/mod_php instead of Nginx/php-fpm?"
With relative-performance benchmarks, we have to assume that it's valid to extrapolate from small servers to large, and from one server to many. That is, if Nginx beats Apache by 1000% on a lonely 500MHz Pentium 3 box, what can we predict about Nginx vs. Apache performance on a dozen-strong cluster of quad-core, dual-socket 3.6GHz machines? In more general terms: how well does each application "scale up/out, horizontally"?
The answer depends on the application type and software architecture. For example, most modern web servers are multi-thread/process apps, with minimal shared state in between. Also, modern web stacks generally push cross-request state into a separate datastore layer (if any). As a result, modern web apps tend to scale up linearly, to the performance limits of the datastore layer. Until your database becomes a bottleneck, you can expect that 2x web servers == 2x hits/second.
He's benchmarking relative performance, but we don't know what else is running on the host machine during the benchmark. What if one of the other users on the host was running a very cpu/network intensive process during only one of his test runs?
Because most people obviously don't have a datacenter in their own basement. And the common mistake people make when benchmarking is running the servers on their own machine and then using the same machine to benchmark the server it's running.
You need to have multiple (powerful) machines for this. Also, spinning up machines in the cloud is quite easy to do and allows people to reproduce the same test results, because you have access to exactly the same environment.
>You need to have multiple (powerful) machines for this
Oh really? For a simple http 'hello world' comparison benchmark (where you are interested in relative numbers, not absolute ones)?
All you need is one old and slow laptop (for the test contenders) and one modern and mighty machine (for the test script). The only thing you have to be sure about is that the test script can generate more load than the test contenders can handle. Even if the old laptop isn't slow enough, you can just add some predictable and stable load to the cpu/disks/network/whatever is the bottleneck for them. You can use tools that are available for that, or even quick & dirty hacks like a 'while(1) {do some math}' one-liner that effectively makes your 2-core CPU a 1-core one while running with high system priority.
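Something as dumb as the line below, pinned to one core and run at high priority, will do; the core number is arbitrary:

```
# burn one of the two cores with a high-priority busy loop,
# leaving the test contenders effectively single-core
sudo taskset -c 1 nice -n -20 sh -c 'while :; do :; done'
```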
You could use cluster compute instances, which use hardware virtualisation and, I think, give you the whole machine. With spot pricing, you could run them for about 21¢ an hour.
Unless you're looking for absolute performance numbers, doing it on a VM on someone else's server is probably the most realistic deployment scenario.
And yet, running the same test on my own varnish server yields 33k requests/sec with 50 concurrent requests and 28k requests/sec with 200 concurrent requests.
Something tells me that perhaps Tod Sul is doing something wrong.