Hacker News new | past | comments | ask | show | jobs | submit login
C500k in Action at Urban Airship (urbanairship.com)
89 points by cscotta on Aug 24, 2010 | hide | past | favorite | 21 comments



I'm curious what kind of OS tuning was required for 500k concurrent connections. It's obvious that some ulimit tuning is necessary, but I'm wondering if any other hard limits beyond that were hit? Also, in case the original author is reading, you might find this interesting as well. It's an article on trying to reach 1MM concurrent connections with Erlang.

http://www.metabrew.com/article/a-million-user-comet-applica...


We modified /etc/sysctl.conf to have really small buffers per socke, modified ulimit to allow for 999999 fds, and modified /proc/sys/net/ipv4/ip_local_port_range to max out ephemeral ports per connection.


Thanks for doing this, definitely very interested in the follow up posts. Having just finished working for a company that 'claim' to be experts in this area it is very interesting to see you guys wipe the floor with your connections/node ratio.


  Jumping from 35,000 connections per node on an EC2 
  Small instance to over half a million on a single 
  EC2 Large...
You moved the goalpost. I think the conclusion would make a lot more sense if you compared apples to apples when it comes to your server... either test everything on a small or on a large.


Apologies for not publishing the EC2 Small number in the original post. It was 120,000 for the pure Java NIO implementation, but I'd like to explain a bit more on that one.

We hit 120k clients at the time the process was killed by the kernel, but had about 500mb free memory left when the OOM kill occurred. Here's why (and please correct me if I'm mistaken - I'm not a kernel expert, but have done a lot of reading in this area over the past couple months):

Small EC2 instances are restricted to running a 32-bit kernel. The 32-bit Linux kernel (2.6 series) allocates memory into three zones, with the first and smallest slot reserved for DMA (~16mb), the next portion reserved for kernel functions ("low memory", ~896mb), and the rest allocated as "high memory" for userspace. There are very limited options to configure this allocation, aside from recompiling and maintaining a custom kernel for our purposes, which we do not want to do. The Hugemem kernel allocates low memory differently, but is no longer being actively recommended (that I can see), and only makes sense for servers with > 4GB anyway.

Because TCP sockets are allocated in low memory, and its size is relatively fixed, 120k sockets will exhaust low memory despite about 500mb of high memory being free and unallocated. At this point, the kernel has no memory left allocated to itself to do work, so the OOM killer steps in and shoots the process.

The 64-bit kernel makes no distinction between "low" and "high" memory, and does not suffer this problem. We switched to an EC2 Large instance in order to avoid this limitation and properly test the service. In the end, if Amazon were to offer a 64-bit Small instance, it would be ideal for our application and would push our cost per client even lower, though it's currently at a very acceptable spot.

While the metrics in this post focus on connections per node, our ultimate goal involves both maximizing the number of connections per instance, in addition to minimizing the number of instances required (essentially, we want to [safely] maximize density across the board). The sentence you quote refers to this cumulative goal rather than the particular comparison of implementations and instance types. I should've been more clear.

Anyhow, pardon this omission from the original post. At some point, I might write a bit more about the TCP/kernel-level issues we ran into if anyone's interested.


It would be interesting to know what the "slow" software looked like on the big instances, then. Still, a "large" instance is not fifty times the size a "small" instance, so it looks like you're easily coming out ahead.


It is apples-to-apples if you're comparing cost, which (I think) is what UA really cares about. But they didn't do the math explicitly and, like you, I was curious, so:

A small instance costs 8.5¢/hr, so if it handles 35,000 connections with Eventlet that's 0.00024¢/hr/conn. Put another way, that's $57.60 to handle a million connections for a day.

A large instance costs 34¢/hr, so at 500,000 connections that's 0.000068¢/hr/conn, or $16.32 for a million connections for a day.

So the NIO implementation apparently saves UA 72% over the Eventlet implementation. Impressive!

(This is where I wait for someone from UA to show up and correct my numbers...)


Eventlet stopped being performant at around 60k iirc. (I work for UA)


Can someone clarify what a spike is? It doesn't appear to refer to just resource load...

"...we spent several hours fanning out that spike to include three versions"


It's a spike as in a spike solution -- an end-to-end prototype to see if a chosen method of solving the problem is feasible, but without writing tests or handling all error cases or the like.

http://c2.com/xp/SpikeSolution.html


thank you.


Why the difference when working with code written in Scala?

What does Scala do to make the number of connections 50% less?


I'd be particularly interested to know what the JVM spikes were doing with their respective connections. I'd love to profile some roughly-equivalent implementations using Erlang/Twisted/libevent/et al.

FOR SCIENCE


Someone mentioned this in the comments, but I'm curious about the outcome of the Node.js spike as well.


Does keeping a socket open affect the handset battery life much?


Not in our tests, and nobody in our beta test period had any comments. The logic around controlling the connection is pretty intelligent in what it does.

A connection that's open isn't problematic at all, assuming it's being used - lots of setups/teardowns can be an issue (which is why polling apps will consistently take up a large portion of your battery life). The main issue is if the app is using a radio when it shouldn't be. Luckily, we can control this pretty well with our handling code and have been able to balance out needs pretty well between delivering a message as quickly as possible and not destroying battery life.

If you're an Android dev, we'd love your feedback on it in general :)


Interesting, but I wonder why they didn't take a look at Erlang - I mean this is pretty much what it was meant to do.


Perhaps they did not have a cool enough accent? 'EARRRRRLang'

Kidding - I would be very interested in more on why Scala failed as well. Both Scala and Erlang are implementations of the Actor model and should theoretically be very well suited to this.


When trying to support the maximum number of connections per box the model with the best conceptual fit doesn't necessarily have the best functional characteristics. Actor models can have much higher per-connection resource usage. Message passing isn't free, and wrapping each connection in an Actor/Greenthread structure isn't free either.

At the end of the day storing sockets in a hashmap is a pretty compact data structure. If you can minimize synchronization/locking, threads shared state is extremely efficient from a CPU and memory standpoint. Maybe not so efficient from a developers time/sanity standpoint though. :-)

Do note that this article is purely about an edge server though. Its job is essentially to hold sockets and communicate with internal queues. The message passing model is alive and well, albeit at a higher level.


I, too, am wondering why the Scala implementation didn't fare as well as the Java implementation.

Offtopic: I'm not an expert on functional programming (just getting my feet wet; forgive me for being so naive), but after reading Scala By Example I have this nagging feeling that a functional implementation of an algorithm might actually run slower than an imperative implementation of the same algorithm. I mean, you're copying data all around the place, creating new instances of objects instead of modifying them in place, using a ton of recursion everywhere. Could anyone offer some perspective on this?


Facebook has some great notes on their use of Erlang in the Chat node.

http://www.facebook.com/note.php?note_id=51412338919




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: