Million User Webchat with Full Stack Flux, React, Redis and PostgreSQL

megaman821 · on Feb 16, 2015

The Flux terminology is confusing to me. It looks like the Observer pattern to me.

* Stores contain Observables

* Components (or Views) contain Observers

* Actions are Proxies

So the article is basically saying the Observer pattern is scalable, but uses the buzz-phrase "Full Stack Flux" instead. To make it even worse it is only a theoretical application of this pattern.

remon · on Feb 16, 2015

I suppose there is some value to this as a thought experiment but pretty much every tier in that architecture has breaking flaws. The most relevant being that anything that has an audience of 1 million users cannot run on an architecture that has single points of failure.

chucksmart · on Feb 16, 2015

TLDR;'thought experiment', feh, if this was actual tested result would be much more interesting.

_almosnow · on Feb 16, 2015

But if the author didn't "thought experiment" about that, how else would he bait you into clicking...

aantix · on Feb 16, 2015

Physically the hardware can't handle it or are you just advocating that it shouldn't?

remon · on Feb 17, 2015

Hardware fails. This architecture has single points of failures which means if one of those instances fails the entire system goes down. This is generally not acceptable for environments with 1 million active users (which implies high business value)

gfodor · on Feb 16, 2015

AFAICT, Facebook engineered their client side code around Flux in order to eliminate two-way data binding in their user interface code, which leads to all sorts of issues. I don't imagine they push the pattern into the server. Their relay stuff still relies upon a smart data tier which understands a query language called GraphQL.

I definitely don't think they considered implementing a dumb dispatcher and store layer on the database server using stored procedures. (This seems terrifying to me, I don't see the upside.)

It's an interesting experiment but I think this might be an example of being too aggressive in trying to generally apply a design pattern that was motivated by a specific problem.

bsaul · on Feb 16, 2015

Great post, but has this design been implemented to effectively handle something close to a million users ( or even 100k and show that no part is overheating) ?

As in every new and complicated design, i'm a bit skeptical of rules of thumb calculations. You never know what the wrong latency issue at the wrong place can do...

moe · on Feb 16, 2015

but has this design been implemented to effectively handle something close to a million users

Obviously not. Some of his numbers are off by an order of magnitude.

E.g. he claims "10 million messages/sec" for a single redis instance.

In reality redis tops out at well under one million messages/sec; http://redis.io/topics/benchmarks

The design is almost comically bad (single source of truth for a "scalable" chat app?!). This is either an attempt at parody or this guy must be suffering from a rather severe case of second system effect...

[1] http://en.wikipedia.org/wiki/Second-system_effect

elierotenberg · on Feb 16, 2015

Oh, you're absolutely right, 10M/s is a mistake, which I've corrected.

Few remarks though:

- it's still way below the number of actions we're talking about here (~100k/s)

- since redis is only used as a generic MQ and not as a store, it can be sharded at the app level without the pain usually associated with redis clustering

- I've deployed a similar (but less performant) design for the player of a gaming website, which is in use in production for more than a year, and works like a charm (we're talking ~5-50k users per channel on a daily basis). This is definitely a "second-system" pattern, but I try to avoid the associated pitfalls :)

I'd be genuinely interested by your feedback!

thedufer · on Feb 16, 2015

Have you ever used redis MQ at scale? I'm guessing no; the redis server is not your bottleneck. The fact that every server proc has to parse every message puts a hard ceiling on the amount of traffic you can handle. Intelligent routing is, I believe, the answer here. I've spoken with antirez on this and he agrees that at the scales you're talking about, redis MQ doesn't cut it.

moe · on Feb 16, 2015

Feedback?

How about just not making wild claims about byzantine fantasy designs that you never tested under any kind of load.

There has been a lot of research in messaging architectures, some of the best message brokers are free. As it happens, none of them have any resemblance to your proposed design.

RabbitMQ has been benchmarked[1] to 1 million messages/sec on 30 servers and works very well for many people.

Why not start with that?

[1] http://blog.pivotal.io/pivotal/products/rabbitmq-hits-one-mi...

elierotenberg · on Feb 16, 2015

This benchmark is indeed very interesting.

I think I may have failed to express my point, though. I'm not building a message queue, as it is certainly a very hard problem that has been engineered for years by people way smarter than me :) I'm merely leveraging the goodness of their implementations (in my case redis, but RabbitMQ is also an option I've considered explicitly in my post).

The chat is a contrived example to show that even under high load, full-scale flux over the wire is a reasonable option. As for "any kind of serious load", well, maybe my example fails to meet the requirements, but unless I'm building Facebook, I think I've faced something serious enough to be able to think about my next step.

jacquesm · on Feb 16, 2015

If you're building a large scale chat service you are implicitly also building a message queue.

And as for the high load you haven't actually experienced high load until you put this into production with a million users.

To make that clearer: you can design a system for any number of users, the only relevant question is how it held up in practice and as long as you haven't had a million concurrent users you just don't know (and probably it won't).

moe · on Feb 16, 2015

I'm not building a message queue

That may be the kernel of the problem here; you built a subset of a message queue without realising it.

RabbitMQ has a websocket plugin[1]. Just make your javascript connect directly to a RabbitMQ cluster and you have a solid, scalable foundation - almost for free.

[1] http://www.rabbitmq.com/blog/2012/05/14/introducing-rabbitmq...

bsaul · on Feb 16, 2015

I was under the impression that the design was indeed never tested, but at least the author had some real life experience of building a moderately sized chat service.

I can understand why people like you are pissed by this kind of blog post which reads a bit too much like an ad, but i think it's still good that people are trying to reinvent the wheel with completely new technologies, because sometimes it leads to surprising results.

Maybe the OP should add some warnings in the blog, saying that it's an highly experimental design that people shouldn't try to use for their own projects at the moment...

shripadk · on Feb 16, 2015

I agree. "Number of messages a single Node broker can handle: ~10k per second without JSON stringification memoization, ~100k per second with JSON stringification memoization (Nexus Flux socket.io does that for you)." <- Can't be possible. The numbers should be lower with JSON stringification/parsing instead of being 10x.

thedufer · on Feb 16, 2015

"Memoization" is the key word there. Apparently the author is expecting a lot of identical messages.

pothibo · on Feb 16, 2015

I thought it was someone from Slack talking about their infrastructure (or similar).

Instead, it's some abstract theory. I believe everyone can dream of any architecture.

jacques_chester · on Feb 16, 2015

GameRanger has six million users, tens of thousands to hundreds of thousands of whom will be active simultaneously. His chat problem involved has moderate fan-out.

Scott Kevill does it on a single machine (last time I checked) with hand-rolled C++ and close attention to the details of how the Linux networking stack works.

itsbits · on Feb 16, 2015

Single Machine??...really...would love to know more about the stack..

pratikch · on Feb 16, 2015

I am also interested to know, Do you've any links on this?

jacques_chester · on Feb 16, 2015

Scott gets very little press -- mostly when he picks up support for games after the publisher or matchmaking service abandons one.

I used to live in Perth and we'd sometimes hang out and mock the front page of HN.

http://www.gameranger.com/about/

marktangotango · on Feb 16, 2015

Very interesting, do you have links available?

vonklaus · on Feb 16, 2015

Do you have anything like this up on github?

thomasfl · on Feb 16, 2015

This could be the inspiration for a great open source project, and become something that could easily be deployed to a cloud hosting platform. Basically it's the same as firebase, but with some of the react and flux goodness like server side rendering. Or somebody could package this as a product.

ylesaout · on Feb 16, 2015

Where did you see there is server side rendering? Flux and React does not imply server side rendering out of the box.

elierotenberg · on Feb 16, 2015

Actually this whole stack is designed with SSR in mind, see the related package react-nexus (which I will soon blog about, too!) :)

fauigerzigerk · on Feb 16, 2015

One problem I have with using postgres listen/notify as a general purpose message queue is that it requires polling (At least that was the case when I last looked at it). Of course you can use a blocking wrapper around the polling code but it still causes unnecessary roundtrips.

Erwin · on Feb 16, 2015

Your database connection is just a socket, so you can add that file descriptor to the set of file descriptors you are waiting IO on, if you are using a classic select/poll based system. See an example in the pscyopg2 docs here: http://initd.org/psycopg/docs/advanced.html#asynchronous-not...

Once that FD is active, you call the poll() method and your notify payload becomes available to you.

fauigerzigerk · on Feb 16, 2015

You are right. It seems that this was an issue with older versions of libpq: "In prior releases of libpq, the only way to ensure timely receipt of NOTIFY messages was to constantly submit commands" http://www.postgresql.org/docs/9.4/static/libpq-notify.html

nunwuo · on Feb 16, 2015

Huh? The entire point of listen/notify is exactly that you don't have to poll.

fauigerzigerk · on Feb 16, 2015

And yet in previous versions of libpq polling was the only way to use listen/notify. I'm not sure if it was a limitation of libpq or the server implementation itself: http://www.postgresql.org/docs/9.4/static/libpq-notify.html

It's still not a particularly simple interface as you have to check for notifications after every single SQL command if I understand it correctly.

polskibus · on Feb 16, 2015

As far as I understand, this design is based on similar goals as the Meteor/MongoDB project - instead of reading oplog, you listen for notifications.

I wonder how would Meteor be better than this proposal given that both are node based.

alxndr · on Feb 16, 2015

OT: as someone trying to learn my way around Erlang's OTP framework, I'd be interested in a walkthrough of an OTP based system compared to this.

razzi · on Feb 16, 2015

Anybody know how to efficiently make smexy graphs like that? Did Adobe have a hand in their creation?

cha_os · on Feb 16, 2015

This is so interesting - keep up the good work!