> nginx and haproxy were really close; it's almost not significant enough to say that one is faster or better than the other. But if you look at it from an operations standpoint, it's easier to deploy and manage a single nginx server instead of stud and haproxy.
From an operations standpoint, haproxy has other features (failover, CLI management, clustering) that actually make it a much better load balancer. I usually install all three (haproxy, stud, nginx) because they are each very good in their specific niche. As for the simplicity of installation, that can be handled with a configuration manager.
As a simple reverse proxy for small setups, there is almost no difference between the two, especially when running on a VM. With nginx you do miss many of haproxy's advanced balancing features, but again, this config was a basic reverse proxy, not really load balancing anything.
I haven't worked on these in a couple years, but on real hardware, haproxy could push much more bandwidth. We could saturate 10Gb ethernet fairly easily at the time, which wasn't possible at all with nginx.
+1 to your second paragraph. I've previously carried out what I considered a pretty fair benchmark: real hardware, realistic server setup, well-informed configuration of both nginx and haproxy, etc. HAProxy had significantly better performance in my results. Not questioning their results, just surprised.
I'll be more than happy to see if my configurations can be tweaked. And I was surprised as well, but it seems that it always boils down to the configuration files: a small change can sometimes yield great results, as seen with specifying ciphers for SSL.
Like I said, running as a simple reverse proxy, they should be pretty close. The added latency is really the differentiator when going to a single backend, and that's about all you're measuring. Your control itself is still <400 rps.
Now add dozens of backend servers, and start throwing a couple orders of magnitude more traffic at it, and the differences will really start to show.
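For a sense of scale, a rough haproxy backend block for that kind of test would look something like this (names and addresses made up):

```
backend websocket_nodes
    balance roundrobin            # spread new connections across all the nodes
    server node1 10.0.0.11:8080 check
    server node2 10.0.0.12:8080 check
    server node3 10.0.0.13:8080 check
    # ...and so on, for dozens of servers
```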
Please note: I'm not saying your numbers aren't useful. Many people will be using a setup like this, and would like to know what the differences would be, if any.
Thanks for your feedback. I did do some testing with multiple backends (4 different servers) but it didn't show any substantial changes. I'll see if I can do some more in-depth testing in a while.
What the option does is close the connection between the proxy and the backend so that HAProxy will analyse further requests instead of just forwarding them over the already established connection.
To be fair, I don't know what nginx does - whether connections are kept open or shut down - so I'm not sure that it'd be a fair comparison.
Also interesting are the HAProxy built-in SSL times. I'm surprised they're so slow. Perhaps the cipher is also the culprit there; the cipher can be specified in HAProxy as well.
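Something along these lines, with the cert path and backend name as placeholders:

```
frontend https_in
    bind :443 ssl crt /etc/haproxy/site.pem ciphers RC4-SHA:AES128-SHA
    option http-server-close       # close the backend connection after each request
    default_backend ws_servers
```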
The `http-server-close` option did not change anything, but adding the ciphers squeezed out the same performance as Nginx. I'll update the tests accordingly. Thanks for the heads-up!
How many requests are made per connection? In order to better gauge performance we need a 3-axis plot, where the response rate is measured against various requests-per-connection values and connection rates.
Before each test the WebSocket server is reset and the proxy re-initiated. Thor then hammers the proxy server with x connections at a concurrency of 100. For each established connection one single UTF-8 message is sent and received. After the message is received the connection is closed.
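The invocation is roughly the following; the amount is just an example value, the hostname is a placeholder, and I'm paraphrasing the flags, so double-check them against thor's README:

```
# open a fixed number of connections against the proxy, 100 at a time,
# sending one UTF-8 message per connection
thor --amount 10000 --concurrent 100 wss://proxy.example.org:443
```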
See https://gist.github.com/3rd-Eden/5345018 for the output of the openssl s_client for those ciphers. You'll see that `cipher : RC4-SHA` is used here, which is one of the fastest, if not the fastest, ciphers available.
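For anyone who wants to run the same check against their own endpoint, it boils down to something like this (hostname is a placeholder):

```
# see which cipher the server negotiates; -cipher limits what the client offers
openssl s_client -connect proxy.example.org:443 -cipher RC4-SHA < /dev/null | grep -i cipher
```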
I meant "local" as in "not on the other side of a continent". Here's me pinging a nearby DNS server that's 40 miles away from my desk:
```
% ping -c 10 -A 4.2.2.1
PING 4.2.2.1 (4.2.2.1) 56(84) bytes of data.
64 bytes from 4.2.2.1: icmp_req=1 ttl=56 time=3.24 ms
64 bytes from 4.2.2.1: icmp_req=2 ttl=56 time=2.88 ms
64 bytes from 4.2.2.1: icmp_req=3 ttl=56 time=2.95 ms
64 bytes from 4.2.2.1: icmp_req=4 ttl=56 time=2.90 ms
64 bytes from 4.2.2.1: icmp_req=5 ttl=56 time=2.95 ms
64 bytes from 4.2.2.1: icmp_req=6 ttl=56 time=2.91 ms
64 bytes from 4.2.2.1: icmp_req=7 ttl=56 time=2.90 ms
64 bytes from 4.2.2.1: icmp_req=8 ttl=56 time=2.87 ms
64 bytes from 4.2.2.1: icmp_req=9 ttl=56 time=2.94 ms
64 bytes from 4.2.2.1: icmp_req=10 ttl=56 time=2.94 ms
--- 4.2.2.1 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 1806ms
rtt min/avg/max/mdev = 2.875/2.952/3.247/0.112 ms, ipg/ewma 200.705/3.019 ms
```
I can't imagine why a local websocket echo service should be 5x slower than this.
hipache is built on top of http-proxy; that's why I haven't included it in the tests. They seem to have switched to a fork of http-proxy, but there aren't big (if any) performance modifications, as far as I've seen from the commits.
You can enable TLS 1.2 ciphers, but there is a large percentage of clients unable to use them, so the fallback is RC4.
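In nginx terms that kind of policy looks roughly like the snippet below; the exact cipher list is only illustrative, not a recommendation:

```
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
# prefer a TLS 1.2 AES-GCM suite, keep RC4-SHA around for clients that can't do better
ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:RC4-SHA;
ssl_prefer_server_ciphers on;
```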
> At the moment, the attack is not yet practical because it requires access to millions and possibly billions of copies of the same data encrypted using different keys
Some benchmarks are designed to measure absolute performance, in order to answer questions like "How many servers should we buy, if we expect to handle X hits/second?" or "What's the limit of our app stack's scalability, in hits/second?" But the OP is NOT benchmarking for absolute performance.
The OP here is benchmarking toward a different purpose: comparing relative performance. These tests are designed to answer a totally different kind of question: "Which of these app stacks performs best, given the same hardware budget for each stack?" or "How many extra servers do we need to buy, if we want to use Apache/mod_php instead of Nginx/php-fpm?"
With relative-performance benchmarks, we have to assume that it's valid to extrapolate from small servers to large, and from one server to many. That is, if Nginx beats Apache by 1000% on a lonely 500MHz Pentium 3 box, what can we predict about Nginx vs. Apache performance on a dozen-strong cluster of quad-core, dual-socket 3.6GHz machines? In more general terms: how well does each application "scale up/out, horizontally"?
The answer depends on the application type and software architecture. For example, most modern web servers are multi-thread/process apps, with minimal shared state in between. Also, modern web stacks generally push cross-request state into a separate datastore layer (if any). As a result, modern web apps tend to scale up linearly, to the performance limits of the datastore layer. Until your database becomes a bottleneck, you can expect that 2x web servers == 2x hits/second.
He's benchmarking relative performance, but we don't know what else is running on the host machine during the benchmark. What if one of the other users on the host was running a very cpu/network intensive process during only one of his test runs?
Because most people obviously don't have a datacenter in their own basement. And the common mistake people make when benchmarking is running the servers on their own machine and then using the same machine to benchmark the server it's running.
You need to have multiple (powerful) machines for this. Also, spinning up machines in the cloud is quite easy to do and allows people to reproduce the same test results, because you have access to exactly the same environment.
>You need to have multiple (powerful) machines for this
Oh really? For a simple http 'hello world' comparison benchmark (where you are interested in relative numbers, not absolute ones)?
All you need is one old and slow laptop (for the test contenders) and one modern and mighty machine (for the test script). The only thing you have to be sure about is that the test script can generate more load than the test contenders can handle. Even if the old laptop isn't slow enough, you can just add some predictable and stable load to the cpu/disks/network/whatever is the bottleneck for them. You can use tools that are available for that, or even quick & dirty hacks like a 'while(1) {do some math}' one-liner that effectively makes your 2-core CPU a 1-core one while running with high system priority.
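Something as dumb as the line below, pinned to one core and run at high priority, will do; the core number is arbitrary:

```
# burn one of the two cores with a high-priority busy loop,
# leaving the test contenders effectively single-core
sudo taskset -c 1 nice -n -20 sh -c 'while :; do :; done'
```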
You could use cluster compute instances, which use hardware virtualisation and, I think, give you the whole machine. With spot pricing, you could run them for about 21¢ an hour.
Unless you're looking for absolute performance numbers, doing it on a VM on someone else's server is probably the most realistic deployment scenario.
And yet, running the same test on my own varnish server yields 33k requests/sec with 50 concurrent requests and 28k requests/sec with 200 concurrent requests.
Something tells me that perhaps Tod Sul is doing something wrong.