An HTTP reverse proxy for realtime (fanout.io)
150 points by jkarneges on April 9, 2013 | 43 comments



This is awesome! I released something exactly like it a few years ago, hookbox (MIT licensed): https://github.com/hookbox/hookbox/blob/master/docs/source/i...

Basic idea is this: You put all your real-time stuff in a message queue (MQ) which communicates directly with the browser. For authentication / authorization and various other forms of permission / logging, you have the MQ communicate with the web framework via http callbacks (Webhooks) and a standard REST API. So the architecture is:

User <--Websocket--> MQ :: publish/subscribe

MQ --Webhooks--> PHP/Django/Servlets/etc. :: user signed on, user joined a channel, etc.

PHP --REST--> MQ :: publish(msg), remove(user, channel), etc.

The key is to include cookie information in the callbacks from MQ -> PHP so the callback happens in the context of the user session. Suddenly you can do things like write a chat app in 30 lines of php + js, or a persistent time series in 20, and it really feels magical.
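To make the three arrows above concrete, here is a minimal in-memory sketch of the pattern. It is not Hookbox's actual API: the class names, method names, the cookie value, and the channel rules are all hypothetical, and plain method calls stand in for the Websocket, webhook, and REST hops.

```python
from dataclasses import dataclass, field


@dataclass
class WebApp:
    """Stands in for the PHP/Django side: owns sessions, answers webhooks."""
    sessions: dict  # cookie value -> username (hypothetical session store)

    def on_subscribe(self, cookie: str, channel: str) -> bool:
        # The callback carries the browser's cookie, so the decision is made
        # in the context of that user's session.
        user = self.sessions.get(cookie)
        return user is not None and channel in ("lobby", f"private-{user}")


@dataclass
class MessageQueue:
    """Stands in for the MQ that browsers talk to directly."""
    app: WebApp
    subscribers: dict = field(default_factory=dict)  # channel -> list of inboxes

    def subscribe(self, cookie: str, channel: str, inbox: list) -> bool:
        # MQ --webhook--> app: ask for permission before the user joins.
        if not self.app.on_subscribe(cookie, channel):
            return False
        self.subscribers.setdefault(channel, []).append(inbox)
        return True

    def publish(self, channel: str, msg: str) -> int:
        # app --REST--> MQ: fan the message out to everyone on the channel.
        inboxes = self.subscribers.get(channel, [])
        for inbox in inboxes:
            inbox.append(msg)
        return len(inboxes)
```

The point of the sketch is that the MQ never needs its own notion of users: it forwards the cookie and lets the web framework decide.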

I actually started Hookbox almost as a statement of irony, because I was really frustrated about the major pushback I was getting to sockets in web browsers at the time. I'd just finished writing/submitting the initial proposal for Websocket, and I wrote this tongue-in-cheek piece about the mismatch between typical web development and network server programming: http://svwebbuilder.wordpress.com/2008/10/20/html5-websocket...

So Hookbox started as a 2-3 day project that took on a life of its own for a while and ended up being really useful. This project was one of my smaller open source codebases, and to this day I receive tons of interest and requests for maintenance, though I abandoned it years ago for lack of time.

I'm sure there's a huge market for this sort of thing. It's great to see Pushpin, I'll definitely check it out!


This is a great product, thanks for building it.

In a previous project that I worked on, we were spinning up complex pieces of infrastructure using Chef integrated with a fronting Rails app. Realtime updates were always tough to orchestrate, with custom RabbitMQ feedback from the Chef clients being pushed out to Rails clients using JS.

I believe a solution like this one would come in very handy for pushing out realtime updates for long running infrastructure requests from a distributed system. Kudos!


Nice! We're working on something similar in Ruby EM at http://firehose.io/ if anybody is interested.

Glad to see more streaming REST implementations coming alive!


Brad, firehose looks cool too. Are you guys using it in production? (Should I be worried that the Travis build is failing?)


Yep! We pump a lot of messages through this per day. Not sure why Travis is failing, probably a recent em-hiredis bump. I'll check it out.


I've started to feel like I am falling behind with all these interesting libs, servers, frameworks, protocols, languages, and platforms recently.

Especially if you are a student trying to keep up with all this new stuff. It is becoming really hard to decide what to learn next or what to focus on.


Keep an eye on what's out there, surface level among all the things that are useful (or just interesting to you). Dive in when you have an actual problem to solve. Don't decide before you dive in, use that surface knowledge to know what your options are and where to start but don't make decisions with it (can be tricky).

And as others are saying, know your fundamentals, and know your theory. Get experience doing something real start to finish as often as possible and in order to do that you will have to dive into things. That's how to decide what to learn next.


Upvote.

At the startups where I've worked, things move quickly, and unless you have a personal motivation to learn something, you can't keep up.

It's best to get a cursory view of your available options (star them, bookmark them, commit them to memory) and when the situation arises for a specific problem to solve, you'll know of a handful of options to further investigate.


Don't get blinded by this stuff. It's always moving; it's fun coming up with new ways to do the same thing. Plus, everyone thinks they can do a better job (myself included!). Kinda like how kids think they'll be the best parent or teacher ever when they grow up.

Focus on understanding various modes of thinking, the various key paradigms of programming. Work on understanding basic algorithms, so that given a problem, you can do some rough math and have a general idea of the bounds involved. Spend some effort understanding the low-level stuff.

Long polling's probably been used for over a decade. You should be able to figure out why long polling exists. Think about, for instance, implementing an IRC chat room in the browser, without using any special new features, just HTML and old JS. Once you understand that, you should be able to skim this article and understand the point and value of the tool without having to get into the details. For me, the point of reading such articles is to see if someone came up with a new way of viewing something, an idea or perspective that might expand how I deal with another problem in the future. Only secondarily do I think I'll use most of the tools I read about (although it's nice to know such a thing exists, for the day you do need it).


Learn the fundamentals, not the frameworks. The frameworks will come and go, but the fundamentals change much more slowly. For example, in this case, if you don't already know, learn how long polling works and what the alternatives are, learn how web sockets work, learn how comet works, etc. It will lead on to how TCP/IP works at a basic level anyway.


I've written memqueue (https://github.com/halayli/memqueue/) for similar purposes.

Memqueue is a revision-based queue server with a REST API. Multiple consumers can poll the same queue at a different pace by using revisions. A revision is sort of a cursor that allows a consumer to specify where to poll from in the queue. If a connection drops, it's not a problem: you continue from the revision you stopped at after you reestablish a connection. Each time a new message arrives in the queue, the revision is bumped by one and consumers are expected to poll from the new revision.

It also allows you to specify message & queue expiries so you don't have to manage memory growth.
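For readers who want to see the revision idea in code, here is a rough in-memory sketch. It is not memqueue's actual implementation or REST API: the class and method names are hypothetical, and message and queue expiry are left out entirely.

```python
class RevisionQueue:
    """Toy revision-based queue: a revision is a cursor into the message log."""

    def __init__(self):
        self._messages = []  # the message at index i was stored as revision i + 1

    @property
    def revision(self) -> int:
        # The current revision equals the number of messages stored so far.
        return len(self._messages)

    def push(self, message) -> int:
        # Each new message bumps the revision by one.
        self._messages.append(message)
        return self.revision

    def poll(self, since: int):
        """Return (messages newer than `since`, revision to poll from next)."""
        if since < 0:
            raise ValueError("revision must be non-negative")
        return self._messages[since:], self.revision
```

Because each consumer keeps its own cursor, a slow consumer or one that reconnects after a dropped connection simply polls again from the last revision it saw.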


I like seeing mongrel2 as part of this stack. It seems like the perfect fit for an architecture like this, and I get the feeling that it's heavily underused.


Mongrel2's use of ZeroMQ messages to manage the low-level protocol was a great influence on Pushpin's design. In fact, one of the other components in the stack, Zurl, is basically the inverse of Mongrel2 (doing outbound HTTP instead of inbound). Love all these little worker components.


This looks really interesting. We've been using the nginx-push-stream-module to achieve something similar, but the public facing HTTP API design isn't very flexible, so something that lets you configure this explicitly would be great.

https://github.com/wandenberg/nginx-push-stream-module


Thanks for the comment. At my last job, we were unable to migrate an existing API over to a certain realtime solution without breakage, and that was motivation to create something more versatile.


Personal peeve: There is still a name collision between the popular use of "real-time" to describe "live" updates (usually with the defining aspect being "without polling") and the strict definition of "real-time", which implies that events are delivered to the customer within a guaranteed period of time. Which is obviously not true here.


Can you give an example of any 'true "real-time"' projects implementing any aspect of HTTP? I think this ship has sailed.


Yeah, for example, there are web services whose design goal is a request completed in <30ms so they can guarantee their clients timely data based on an SLA. If this "real-time" project can't guarantee the time of transactions, it's not "real-time".


How is this even possible? All it takes is a clogged router buffer 'somewhere' along your network path and that <30ms web request is gonna get blown away. Am I missing something here?

I'm probably just being ignorant lol. Can you point me to an example company offering a service like this?

Thanks.


I can't, but you could for example look at the financial services industry or providers of real-time data feeds.

In general these services are not available through the public internet so routers with clogged buffers are not usually an issue. But it depends on the SLA.


I realize HTTP is extremely well known and documented, so it's relatively easy to use for backend communications, but I've always wondered how efficient it actually is considering how old the standard is.


It's actually fairly good now that most of the improvements in 1.1, like pipelining, compression, keep-alives and so on, can be used with browser support. However, it isn't perfect; SPDY is better still, as it allows for things like muxing and pre-emptive resource downloading.


I'm glad you built this Justin. I have wanted this functionality many, many times in the social gaming world and never had the time/budget to build it and instead hacked something together. I love seeing how XMPP has influenced the design of very useful and much more pragmatic solutions to near-real-time communications.


That's nice, but I have a question about the mechanism.

In the document I saw that Pushpin sends a response to the client while it's waiting for the response from the web application, right? How is that possible? I mean, if you send a response to the client, you can't send any more responses after that.


If Pushpin is told by the web application to hold a request open, then it does not send anything to the client (at least not in the long-polling case; streaming works differently, but I've yet to write about it). So when the application publishes data at a later time, it is the first payload the client sees.
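As a rough illustration of the hold behavior described above, here is a toy synchronous model. It is only a sketch of the idea, not Pushpin's implementation or wire protocol: the `HoldingProxy` class, the `("hold", channel)` instruction format, and the dictionary used as a "response" are all made up for this example.

```python
class HoldingProxy:
    """Toy model of a proxy that can hold client requests open per channel."""

    def __init__(self, backend):
        # backend: callable taking a path and returning either
        # ("respond", body) or ("hold", channel). Hypothetical format.
        self.backend = backend
        self.held = {}  # channel -> list of pending client responses

    def handle(self, path):
        action, value = self.backend(path)
        response = {"sent": False, "body": None}
        if action == "hold":
            # Nothing is sent to the client yet; the request just waits.
            self.held.setdefault(value, []).append(response)
        else:
            response.update(sent=True, body=value)
        return response

    def publish(self, channel, body):
        # The published data becomes the first payload each waiting client sees.
        waiting = self.held.pop(channel, [])
        for response in waiting:
            response.update(sent=True, body=body)
        return len(waiting)
```

In this model the client receives exactly one response per request: either the backend's immediate answer or the later published payload, never both.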


This is really cool. To make sure I understand, using it would replace the need to use services like pusher.com? Also, if I were using, say, Sinatra with nginx and unicorn, where would this sit in the pipeline?


To be clear, this is software, so comparing it to Pusher.com is a bit of an apples to oranges comparison. Pushpin is more comparable to Socket.io, Juggernaut, Faye, etc.

What makes Pushpin special is that you can control what the outside-facing HTTP exchanges look like, which makes it good for implementing APIs, and may also be useful if you're just really anal about how your client/server interactions work. :)

The cost is that you need to design your protocol and write client code (the other solutions already have their own special protocols and come with corresponding JavaScript libraries ready to go). So whether or not Pushpin is good for you depends on what level of control you're after.

In the pipeline, Pushpin goes in the very front, just behind a load balancer (if any). The reason for this is you could put instances of Pushpin in different geographic locations, all fronting an application in a single location. So you want it the furthest out, closest to any users that might be connected to it.


Thanks!


Cool! This sort of architecture looks great for our Rails API we're developing. I'd be curious to see benchmarks to know how many connections I can expect to be maintained by a single instance of pushpin.


AFAICT, you can do the same with newer versions of HAProxy and nginx. Is there any advantage of using this over those well tested and proven alternatives?


Nginx has the Push Stream Module, which is similar but not quite as versatile. I don't think you could implement the "incremental counter" API described in the Pushpin article, for example. Whether or not this matters to you comes down to how much control you need.

I'm not aware of such functionality in HAProxy but would love to hear about it (I'm an HAProxy fan :)).


While "incremental counter" is a cool idea, I fail to see how it really improves anything. You're just moving the same problem from one layer to another. Node.js does all Pushpin does and you can write "simple chat server in 30 lines of code" as well, without the need to set up another piece of software.

This looks like a band-aid for developers still stuck with Python, PHP, whatever... technology from the 2000s. I see I've got a down-vote from fanboys already, but that's just the way HN works: it's hard to have a constructive conversation, but it's easy to hate.

P.S. @jkarneges, this reply was not directed at you, but whoever down-voted my simple question. I mean, how can you down-vote a question? Let's not question anything and spread love, a la Facebook "like-button-only" style. :(


I've upvoted your original post to see if that helps.

The article does play up the compatibility with legacy frameworks. However, the proxy approach itself was designed independently of this, and the versatility turned out to be a bonus. Some background: http://blog.fanout.io/2013/02/10/http-grip-proxy-hold-techni...

Basically, I'm positing that as a system gets larger, then moving the problem to an outer layer is good design, even if all of your backend code is event-driven. You can use Pushpin and Node together with a straight face. :)


Awesome Justin,

finally something that every web developer understands immediately and really takes the pain out of realtime for a lot of people.


mirror: http://google.com/search?q=cache:http://blog.fanout.io/2013/...

If you are using wordpress and not wp-super-cache, when a page becomes popular your server is going to have a bad time.

update: looks like they fixed it


Thanks for the tip! I've enabled wp-super-cache now, and the page seems to be working again.


Love the idea and clean API! Is this production quality?

Also, just curious - how come qt is required?


Current state of code is "it probably works". We will be hardening it over the next few weeks though to get it production ready (will be deploying it on fanout.io, to replace our older code).

Qt, because... it's a nice C++ event-driven lib. :)


Thanks, can't wait to try it out.

Must look into Qt then sometime. :)


It's possible to link to the Qt core and network components without bringing in the GUI.


Does each publish_http_response_async release all pending requests or just one?


It releases all requests held on the specified channel.


Impressive!



