Finally, Python may not be the best language for this real-time endpoint. Eventlet is a fantastic Python library and it allowed us to build something extremely fast that has scaled to several thousand concurrent connections without breaking a sweat on launch day, but it has its limits. There is a large body of work out there on handling a large number of open connections, using Java's NIO framework, Erlang's mochiweb, or node.js.
I wrote justin.tv's chat backend, in Python, using the Twisted network libraries. It has scaled to peaks of more than half a million concurrent chat connections, on 8 fairly modest commodity servers. Python is more than capable here, with the right networking approach. Feel free to ask me anything about it.
I'd like to second this. Although Twisted's documentation and 5-year-old bugs will have you cursing the library's name at times, it really does perform well once you figure it all out.
We're using Python + Twisted for our XMPP servers at HipChat (http://www.hipchat.com) and neither has let us down. Hopefully we can reach 500k+ connections on such a small number of servers as well.
One specific note about Twisted/Python vs node.js: The combination of Python's yield and Twisted's defer.inlineCallbacks decorator makes it really easy to write and maintain nonblocking code. In node I find it far too easy to get lost in a sea of callback functions. Here's what you get to do in Twisted: http://enthusiasm.cozy.org/archives/2009/03/python-twisteds-...
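To make the yield-based style concrete, here is a toy sketch that uses only the standard library. It is not Twisted's implementation or API: `ToyDeferred`, `inline`, `lookup`, and `greet` are hypothetical names made up for illustration. The idea it tries to show is that a small driver can resume a generator each time a pending result arrives, so the nonblocking code reads top to bottom instead of as nested callbacks.

```python
class ToyDeferred:
    """Hypothetical stand-in for a deferred: a result that arrives later."""

    def __init__(self):
        self.callbacks = []
        self.done = False
        self.result = None

    def add_callback(self, fn):
        # Run immediately if the result is already here, else remember fn.
        if self.done:
            fn(self.result)
        else:
            self.callbacks.append(fn)

    def fire(self, result):
        self.done = True
        self.result = result
        callbacks, self.callbacks = self.callbacks, []
        for fn in callbacks:
            fn(result)


def inline(gen_func):
    """Toy decorator: resume the generator whenever a yielded ToyDeferred fires."""

    def wrapper(*args, **kwargs):
        gen = gen_func(*args, **kwargs)
        final = ToyDeferred()

        def step(value):
            try:
                pending = gen.send(value)  # run until the next yield
            except StopIteration as stop:
                final.fire(stop.value)     # generator returned: we are done
                return
            pending.add_callback(step)     # resume later with the result

        step(None)
        return final

    return wrapper


# Hypothetical "network" layer: lookups stay pending until we fire them.
pending_lookups = {}


def lookup(key):
    d = ToyDeferred()
    pending_lookups[key] = d
    return d


@inline
def greet(user_id):
    # Reads like blocking code, but each yield hands control back to the driver.
    name = yield lookup(user_id)
    room = yield lookup("room-for-" + name)
    return "%s joined %s" % (name, room)


outcome = greet(42)
pending_lookups[42].fire("alice")                # first result arrives
pending_lookups["room-for-alice"].fire("lobby")  # second result arrives
```

With callbacks alone, `greet` would need one nested function per lookup; here the two waits are just two `yield` lines.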
With Node.js, you can use defers (promises) as well. As long as you feel that promises aren't too much mental overhead (obviously you don't), I would recommend using promises for any sizable Node.js application.
That's awesome, thanks for the offer, I'll definitely take you up on it and ping you if we run into problems scaling up that live endpoint using Python.
As a side-note, this is why I love the software engineering community: everyone's always so willing to share knowledge and help each other out. Let's hope we never lose that as a community!
Yep, but we liked Eventlet better because of the simpler programming model and better support for things like psycopg2 (our database driver). So far it's worked out great--it's held up really well under our current amount of load.
That said, Tornado is a really neat piece of software, and it worked very well for FriendFeed and others. It really just came down to personal preference.
In what way does Tornado not support psycopg2? Because the Tornado code includes some convenience functions for working with an unmodified MySQLdb driver, I assume that FriendFeed also used the MySQLdb driver with no modifications. If the MySQLdb driver works with Tornado, shouldn't psycopg2 also work?
By using an unmodified MySQLdb driver, FriendFeed did not make asynchronous calls to the database. I searched around and found this comment w/ response from Paul Buchheit about the issue: http://news.ycombinator.com/item?id=1479301
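A rough sketch of why a synchronous driver matters inside a single-threaded event loop, using asyncio from the standard library as a stand-in for Tornado's loop. `blocking_query` is a hypothetical placeholder for a driver call (simulated with `time.sleep`), and handing it to a thread pool is just one possible workaround, not something the thread says FriendFeed or Convore did.

```python
import asyncio
import time


def blocking_query():
    # Hypothetical stand-in for a synchronous database driver call.
    time.sleep(0.2)
    return "row"


async def handler_blocking():
    # Calls the driver directly: the whole event loop stalls while it runs.
    return blocking_query()


async def handler_offloaded():
    # Runs the same call in a worker thread so the loop can serve others.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_query)


async def time_two_requests(handler):
    # Serve two "requests" at once and measure the wall-clock time.
    start = time.monotonic()
    await asyncio.gather(handler(), handler())
    return time.monotonic() - start


blocking_elapsed = asyncio.run(time_two_requests(handler_blocking))
offloaded_elapsed = asyncio.run(time_two_requests(handler_offloaded))
print("direct call: %.2fs, thread pool: %.2fs" % (blocking_elapsed, offloaded_elapsed))
```

With the direct call, the two requests run back to back; with the thread pool, they overlap. If queries are consistently fast, the stall from the direct call may be small enough not to matter.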
Do you need to make async calls to the database because your queries are slow?
We're also using Twisted a lot at my company, IQ Engines. It's powerful and has done a pretty good job for us so far. I just wish the documentation, which is crucial for getting new coworkers up to speed, was better. Do you guys continue to build new services at Justin.TV using it, or have you migrated to other event loops, e.g. Tornado, gevent, etc.?
We use it for new things. I think the last thing I wrote with it was a custom dynamic DNS server we needed, and that was just a few weeks ago.
Yeah, the documentation is awful. Fortunately I've pretty much always just assumed that all documentation is going to be bad, so I pretty much never even bother trying - I just read the source.
Congrats on the launch! I've actually found myself using Convore more often than I thought I would, particularly because I'm often in places that block IRC.
Have you looked at distributed counting in Cassandra for your counting needs yet[0][1]? Great info on its development and use at Twitter. It seems like you too have lots of interesting things to count. Your initial choice of Postgres for everything was a bit interesting. Your problem seems like a perfect fit for a hybrid solution (which you are already implementing by way of redis) that I think more and more companies will come to embrace by hook or by crook.
Yeah, I've been following that work for quite a while, it's really impressive! I'm a big fan of Cassandra in general (in fact, I wrote an example application to help teach beginners about it: https://github.com/ericflo/twissandra).
The decision to go with PostgreSQL over Cassandra (or another distributed system like Riak or HBase) was simply because it gave us the most flexibility to change our product quickly while we operate at low scale. And if I'm honest with myself, we're operating at very low scale right now.
In the future as we scale up, Cassandra's distributed counters will be one of the first places we look.
I realize you may also not want to answer this, and that's also alright, but I figure I may as well ask, as I really am curious (and maybe someone else will answer from the perspective of their company): why not? Whenever anyone has asked me for a stat on Cydia, whether it be active devices over some time range, daily revenue, costs related to <insert-subproject-here>, or what have you, I tend to break out a quick SQL query and provide an exact answer (assuming the question is about something I can measure: not all are); is this stupid of me? I've noticed a ton of companies refuse to disclose numbers, and I've always assumed that the result will be that anyone listening will just assume "ok, so almost none then" unless you give them a good answer and back up how you calculated it, but does this actually "hurt the cause"? (I do not have the benefit of the years of startup experience that you have access to by being a part of Y Combinator, so I try to get advice whenever I can ;P.)
The top 20 public groups topped out at about 250 users last night; add on a bit for the other public groups and a little for private groups. Maybe 400 concurrent users max so far?
Just my guess, but I'd be surprised if it's orders of magnitude different.
That's exactly the approach we're taking. We're not going to spend any time prematurely worrying about various scaling things until we can forecast that it's legitimately going to be a problem.
Given that you know of IRC (and were inconvenienced that it was blocked), have you tried irccloud.com? The people who did that are also on HN, and it has worked out great for the friends of mine I've gotten to use it so far (I have a lot of friends with inconsistent network connections or who are not very technical that I wanted to be able to access my IRC server).
Sort of a random query, but why are you using Celery with Redis? Last time I looked into it, the documentation basically said "you can, but you should really use AMQP".
Also, from 2.2 on, Redis support is complete. It's not as reliable as AMQP (no message acknowledgements, and you can lose minutes of messages when not using append_only mode).
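For anyone wondering what "no message acknowledgements" costs in practice, here is a toy in-memory sketch. It does not use Celery, Redis, or AMQP; `PopQueue` and `AckQueue` are hypothetical classes that only mimic the two delivery styles, and the crash is simulated by simply never finishing the task.

```python
from collections import deque


class PopQueue:
    """Hypothetical broker without acks: once popped, a message is gone."""

    def __init__(self, messages):
        self.ready = deque(messages)

    def get(self):
        return self.ready.popleft()


class AckQueue:
    """Hypothetical broker with acks: delivered messages wait for confirmation."""

    def __init__(self, messages):
        self.ready = deque(messages)
        self.unacked = {}
        self.next_tag = 0

    def get(self):
        message = self.ready.popleft()
        tag = self.next_tag
        self.next_tag += 1
        self.unacked[tag] = message  # remembered until the worker acks it
        return tag, message

    def ack(self, tag):
        del self.unacked[tag]

    def requeue_unacked(self):
        # What a broker would do when a consumer disappears mid-task.
        for tag in sorted(self.unacked):
            self.ready.append(self.unacked[tag])
        self.unacked.clear()


# Without acks: the worker takes the task, then crashes before finishing.
no_ack = PopQueue(["send-email"])
no_ack.get()
remaining_without_ack = list(no_ack.ready)

# With acks: same crash, but the broker still holds the unconfirmed task.
with_ack = AckQueue(["send-email"])
with_ack.get()  # worker crashes here and never calls ack()
with_ack.requeue_unacked()
remaining_with_ack = list(with_ack.ready)
```

In the first case nothing is left to retry; in the second the task goes back on the queue for another worker.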
Kind of surprising to not see node+socket.io in there, but it is nice to see some Python projects (Celery/Eventlet) doing the same job. I'd be curious to see if they end up swapping that out if/when scaling becomes an issue.
Swapping to node.js doesn't solve any of their scalability problems. The problems they have to solve with scaling their current system will be identical to how they will have to scale stream servers written in node.
I only skimmed the article, but my understanding was that they're using long polling to connect the clients to the message queue. In that case, using node.js and socket.io (as swanson suggested) could considerably reduce the number of concurrent connections since clients with WebSocket or Flash support wouldn't need to hold connections open waiting on publish events.
That approach isn't unique to node.js, of course, but the combination of node.js and socket.io would be a valid way to improve scalability [0, Fig. 3] of most any app relying on long polling.
You can close your browser, go home, re-open your browser, and you've lost nothing. You can see that as a permanent "screen" IRC.
Also, the fact that it's web-based gives them the opportunity to build the protocol as they want. For instance, they have the possibility to prettify code snippets, play video, show images (as they are already doing), etc.
It is also a bit different, as each "channel" is a "group" in Convore where you can have multiple topics. So basically, for the Django community, it's as if you had #django-performance, #django-host, #django-debug, and on and on. So, the second you join a group, you can start new topics.
So, basically, I know it is possible with IRC if you stretch it, for instance by building your own IRC client or hacking with mIRC scripts (been there, done that). Also, you can of course change the IRC protocol and host it yourself. But then, Convore just comes with all that for free, with a beautiful web-based interface.
Note also that I feel it's more serious, as you need to log in with your Facebook/Twitter, which means fewer trolls.
The founder of Convore really liked IRC, so if you want, it is IRC+Twitter 2.0.
You don't need to log in with your Facebook/Twitter account; I was able to sign up just fine with an email address. It's just one of the authentication options they provide. (It also lets them import your contacts from those services, which is handy for people who are big into Facebook or Twitter.)