While the article isn't wrong, it's increasingly my belief that "messaging" is something of a misnomer, because it's used to refer to two rather different concepts: Events and jobs.

We've been using RabbitMQ for messaging for a long time. Some messages are purely reactive: An object was changed, so some other subscribing application needs to react by updating its data structures to reflect the change. This is an event.

Other messages represent jobs. Jobs are tasks that are queued up for processing. Processing photos, for example, is a classic job.

Jobs don't fall that well into the RabbitMQ/AMQP framework because it's designed for messaging and is, despite support for things like durability and dead-letter exchanges, largely about ephemeral data. RabbitMQ is pretty good with events; its excellent routing support, for example, makes it trivial to build a simple pub/sub architecture, and it's pretty fast.

But one big difference between events and jobs is that events are "fan out" (every logical type of listener gets routed one copy of each event, because they may be subscribing for different reasons), whereas jobs are directed (there is only one processor that can do the job, and there is always just one processor working on a single job at any given time).

Another difference is that a job can be in various states. With events, you don't care about completion: A consumed event has been consumed. But with jobs it's terribly useful to track the final status of everything. Jobs should be pausable and long-running jobs should be able to report on their progress ("34% of subjects mutated") along the way. Jobs should arguably be capable of being re-run, even if they succeeded. The performance of jobs -- how long they took, how long they had to wait, their failure rate, the rate of their various progress metrics (516 photos processed), etc. -- is something you want to track and aggregate.

But as it happens, relational databases excel at the sort of stuff you need for jobs. I haven't made any decisions yet, but I'm really itching to create a job management app on top of Postgres, unless I can find an existing system that is equally stable. I've played with the idea of a hybrid system: Register jobs in Postgres, use RabbitMQ as the trigger mechanism. But this puts a burden on the processor, which now has to use two APIs.
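
To make that concrete, here's a minimal sketch of the kind of table I'm imagining (psycopg2 against a hypothetical "jobs_demo" database; all names are invented, this is not an existing system):

    import psycopg2

    # Sketch of a job table: a state column for the lifecycle, timing
    # columns for the metrics mentioned above, and a progress counter
    # for long-running jobs.
    conn = psycopg2.connect("dbname=jobs_demo")
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS jobs (
                id          serial PRIMARY KEY,
                type        text NOT NULL,      -- e.g. 'process_photos'
                state       text NOT NULL DEFAULT 'pending',
                            -- pending -> running -> succeeded | failed
                payload     json,
                progress    integer DEFAULT 0,  -- e.g. photos processed so far
                created_at  timestamptz DEFAULT now(),
                started_at  timestamptz,
                finished_at timestamptz
            )""")

Wait time is then just started_at - created_at, and all the aggregation falls out of plain SQL.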

(On a related note, a big pet peeve of mine is how flaky RabbitMQ is. Coming from Postgres, which is famously stable, I find RabbitMQ easily the least reliable piece of software in our application stack. We lose messages every week due to network instability and RabbitMQ bugs. There's something to be said for Postgres' master/slave design here. It may not be a cool clustered Erlang project, but it just works.)




We used an rdbms for our workflow engine (what you are calling jobs) and it works great. We needed to react to state changes and process workflows within seconds, with dozens of workers executing different jobs in parallel. This is well within the limits of any rdbms.

The workflow system itself took a week or so to build along with a client library and our first actual workflow. Trying to add a messaging system on top for push notification would have been crazy. Polling every second for work in the db works just fine for us. I'm sure we could push that polling interval arbitrarily lower if we needed to (we might never get it as low as RabbitMQ, but that is fine).

So if your tps and latency needs fit in an rdbms, I would recommend just operating one system instead of two.


Sounds like a pragmatic solution. If you're using Postgres you can use "listen" and "notify" to avoid polling, though.
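
For example, a worker can sleep until Postgres wakes it. A minimal sketch with psycopg2, assuming a made-up "new_job" channel:

    import select
    import psycopg2
    import psycopg2.extensions

    conn = psycopg2.connect("dbname=jobs_demo")
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cur = conn.cursor()
    cur.execute("LISTEN new_job")

    while True:
        # Block until the server signals, with a 5-second safety timeout.
        if select.select([conn], [], [], 5) != ([], [], []):
            conn.poll()
            while conn.notifies:
                notify = conn.notifies.pop(0)
                print("woken up by", notify.channel)
                # ...claim and process the next job here...

The producing side is then just a NOTIFY new_job after inserting the job row.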


The hard part of distributing work from a central dispatcher is that you wind up with a central controller needing a complete model of the overall system.

I work on a component of the Cloud Foundry PaaS. The current generation works more or less as you describe -- there's a cloud controller backed by a PostgreSQL database. It sends start and stop commands to a pool of execution agents over a messaging bus.

The next generation of that runtime system ("Diego") is actually inverting this control. Execution agents no longer receive instructions to perform tasks or long-running processes. Instead they bid on them in an auction.

The controller is radically simplified and the overall system converges on a satisficing solution for distributing work efficiently (and they ensure it does so with extensive simulations based on the actual code). It's actually more robust because it doesn't rely on heavy efforts to ensure that the controller is always up to date. And it doesn't fall into inconsistent states if the controller vanishes.

Onsi Fakhouri did a brief presentation which is worth watching: https://www.youtube.com/watch?v=1OkmVTFhfLY


I would be interested in a debate over these strategies between the cattle.io creator and one of the cf guys..


Essentially it's based on experiences of scaling up CF installations for customers, and for the public service version (Pivotal Web Services).

A high-intelligence controller turns into a giant "god object". It attracts new functionality like a magnet. Everything depends on it and it has to know everything about everything.

In practice it begins to be both the leading cause of failure and the main scaling bottleneck.

The auction-based model allows any number of auctioneers and executors to participate without any one component needing global state.

Essentially, we rediscovered the economic calculation problem. And we're solving it by distributing knowledge across the system and letting agents solve their own problems locally.


Creator of cattle.io here. There are far too many topics to address here, but I’ll give you a brief overview of how cattle.io works. The high level orchestration system’s role is to manage the state of resources and their state changes. When states are changed events are fired that may or may not be required to be handled to ensure that that state transition can be completed. The important thing here is that the high level orchestrator does not actually perform the key logic to complete an operation. That logic is delegated to an agent or microservice. The scheduling of which agent or microservice handles each event is specific to the nature of the operation (long running, targeted to hypervisor, etc).


Thanks for the reply Darren. I had actually read all of the cattle.io docs and have started poking through the source code. My initial reaction was that I really liked how the orchestration of resources was modelled as a type of state machine. It really clicked for me and I could see how something like this would have value on its own as a sort of run-deck or the like. It also simplifies the agents.. I can see how creating a .Net agent would be dead simple, and I'm a firm believer that if you're orchestrating on Windows you'll end up wishing you were using .Net eventually if you aren't already.

But to the topic at hand, I was a bit confused about how the bidding comes into play with Diego. After reading the Diego design notes, https://github.com/cloudfoundry-incubator/diego-design-notes , I believe this mostly has to do with scheduling. So it seems to be an alternative to the omega-style serialized scheduler of Flynn, or however Fleet handles this. My current understanding is that stampede.io relies on Fleet to handle scheduling; I'm not sure how stock cattle.io goes about it.. It would seem CF still has high-level orchestration (going from staging, then running, etc) and that a comparison between the two systems would be more involved than I had thought (that and cattle.io concerns itself with IaaS as well) :|


Fleet is used only to bootstrap the system. It has very little to do with the larger architecture of stampede or cattle. I view orchestration and scheduling as two completely different topics. When looking at IaaS the scheduling needs are initially quite simple. As such I don't have a mature scheduler and really don't intend on building one. As your requirements mature, more complex scheduling is needed. I've gone down that path many times and it snowballs so quickly into a very difficult problem.

With cattle I’ve purposely decided to defer the scheduling needs to something like Mesos or possibly YARN. I don’t intend to try to reinvent that wheel. I have yet to do that integration.

The CF Diego approach is one that seems very common today. In general all of the Docker “orchestration” platforms out there today are mostly just Docker plus a scheduler. This approach is great, as long as you don’t need complex orchestration. As the CF folks point out, complex orchestration is hard. If you can remove the need to orchestrate and build a system that needs only a scheduler, it is generally much simpler and easier to scale.

The problem I have is that stampede focuses heavily on real world application loads and the boring practical issues found in crufty old applications in the wild. These apps have requirements that necessitate complex orchestration. As such, with stampede I tackle orchestration head on. I try to find a sane and scalable approach to it. Which is really quite difficult, but it is my area of expertise. I feel if I have a platform that can excel at both orchestration (from the native capabilities of cattle) and scheduling (by leveraging a mature scheduler framework) I’ll have an extremely capable base to build the next generation of infrastructure.


Thanks for the clarification and insight :) I was mistakenly under the impression that stampede was utilizing Fleet's scheduling abilities.. Assumptions :[


State machines and protocols are an excellent way to model the problem of managing the lifecycle of a single task or LRP; they don't solve the problem of the overall system. It's the latter which is actually much harder.

Consider the classic termite-and-wood simulation[0]. In this simulation I have a population of termites, randomly scattered on a grid. And flakes of wood, also randomly distributed. I want to gather this wood.

A current-generation approach would be to, for example, divide the grid into subgrids, assign local termites to perform exhaustive search, progressively aggregating into larger piles in successive rounds. This would require substantial engineering to ensure that my central wood-gathering algorithm has checkpointing, failover, robust message passing, reruns etc etc.

Alternatively, I can give each termite 3 simple rules:

1. Walk randomly.

2. If you don't have a piece of wood in your jaw and you come across one, pick it up.

3. If you have a piece of wood in your jaw and you come across another piece of wood, drop the wood in your jaw.

This latter solution doesn't have difficult -- in some cases impossible -- engineering requirements. And it works very well.
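
A toy version of those three rules fits in a few lines, just to show how little machinery the decentralised approach needs (grid size, counts and rounds are all arbitrary; dropping happens on an adjacent free cell):

    import random

    SIZE, TERMITES, CHIPS, ROUNDS = 40, 50, 200, 20000

    wood = set()
    while len(wood) < CHIPS:
        wood.add((random.randrange(SIZE), random.randrange(SIZE)))

    termites = [[random.randrange(SIZE), random.randrange(SIZE), False]
                for _ in range(TERMITES)]  # x, y, carrying-wood flag

    for _ in range(ROUNDS):
        for t in termites:
            t[0] = (t[0] + random.choice((-1, 0, 1))) % SIZE  # rule 1: wander
            t[1] = (t[1] + random.choice((-1, 0, 1))) % SIZE
            here = (t[0], t[1])
            if not t[2] and here in wood:       # rule 2: pick up a free chip
                wood.discard(here)
                t[2] = True
            elif t[2] and here in wood:         # rule 3: drop next to wood
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    spot = ((t[0] + dx) % SIZE, (t[1] + dy) % SIZE)
                    if spot not in wood:
                        wood.add(spot)
                        t[2] = False
                        break
    # No coordinator anywhere, yet the chips end up in a handful of piles.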

Most systems at this scale eventually wind up pushing intelligence out to the nodes, because (forgive my wankery here) it's impossible in a relativistic universe to have perfect simultaneous knowledge of phenomena separated by space and time. You need to subdivide the problem so that agents can solve their own problems and rely on emergent behaviour in the overall system.

[0] http://ccl.northwestern.edu/netlogo/models/Termites


This makes a lot of sense for certain problems in the domain. I've thought a few times about how to introduce the right bit of agency into actors in order to have them accomplish a larger goal without a central authority.

I created a couchbase cluster joiner that would have the nodes converge into a single initial cluster based on ec2 tags. I thought about how people would group up on their own when walking into a room and came up with:

  Join a group
  Judge value of group(is it the largest + tie breaker)
  Leave and join group of highest value
The problem wasn't so hard as the termites but it removed the need for a central controller.. And I was proud of myself :)
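
In toy form it converges almost immediately (node count and rounds arbitrary; a sketch, not the real joiner):

    N = 10
    group = {n: n for n in range(N)}  # every node starts as its own group

    for _ in range(3):                # a few rounds of the join dance
        for n in range(N):
            sizes = {}
            for g in group.values():
                sizes[g] = sizes.get(g, 0) + 1
            # value of a group: its size, ties broken by lowest id
            best = max(sizes, key=lambda g: (sizes[g], -g))
            group[n] = best           # leave and join the most valuable group

    assert len(set(group.values())) == 1  # one cluster, no central controller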


> I've played with the idea of a hybrid system: Register jobs in Postgres, use RabbitMQ

That is similar to what I've been using at $DAY_JOB for a low volume work queue that really, really needs to function exactly on time and never, ever have an issue. The only difference is I use MariaDB/Galera and dropped the MQ entirely as it could handle the volume easily [it peaks @ 100 r/s for hours ... of course, when you're only allowed 100 r/s against an API...].

The only reason I did it was because of time-based rate limiting on external APIs where the vendor refuses to increase our bucket size.

Tbh, I'd only consider it for a low volume message queue. If you want something that operates on the master/slave model I'd use Redis as the queue.

https://github.com/resque/resque https://github.com/blog/542-introducing-resque

If it is good enough for github, it's probably good enough for you. ;)


There is not very much relational about jobs; you just need a good persistence mechanism. Have you seen Gearman (http://gearman.org)? It is designed explicitly as a job manager.


There's lots of stuff that is relational about jobs the moment you have a large number of them, and need to be able to search and filter by different statuses, different users, different sources, date and time, type of job etc., or run reports over them (what percentage of jobs are in which states? average latency to start processing? average latency to completion?)

You can work around that by putting metadata about the jobs into an RDBMS, or collating it separately, so you can certainly make do without an RDBMS...

Or you could just put them in the RDBMS in the first place and place a few indexes and optionally triggers to create log entries on state changes.
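
The trigger variant is only a few lines of PL/pgSQL. A sketch (all names invented; assumes a jobs table with id and state columns):

    import psycopg2

    conn = psycopg2.connect("dbname=jobs_demo")
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS job_log (
                job_id    integer,
                old_state text,
                new_state text,
                at        timestamptz DEFAULT now());

            CREATE OR REPLACE FUNCTION log_job_state() RETURNS trigger AS $$
            BEGIN
                IF NEW.state IS DISTINCT FROM OLD.state THEN
                    INSERT INTO job_log (job_id, old_state, new_state)
                    VALUES (NEW.id, OLD.state, NEW.state);
                END IF;
                RETURN NEW;
            END $$ LANGUAGE plpgsql;

            DROP TRIGGER IF EXISTS jobs_state_log ON jobs;
            CREATE TRIGGER jobs_state_log BEFORE UPDATE ON jobs
            FOR EACH ROW EXECUTE PROCEDURE log_job_state();
        """)

After that every state change leaves a timestamped row you can report over.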


These all sound like attributes of a job definition - not separate entities to which a job is related. You are correct they would need to be indexed.


Gearman looks decent, though it does not have persistent queues, and relies on an external storage such as an RDBMS to provide that. Not sure if there is that much gained. Still makes the database a SPOF, for example.


It depends on the database back-end you use; it can use libmemcached which opens up the possibility to use a CouchBase cluster if you really need to go that far.


Forgot to say that a decent job manager should be sufficiently composable that you can integrate a GUI and high-level monitoring into it by exploiting its API.

I consider the persistence layer a private implementation detail of a system. If Gearman controls an RDBMS schema, for example, I don't want to bypass Gearman to mess with that directly, ever. But as far as I can see, Gearman doesn't have an API to access anything in the queue.


Indeed. And we know that Queues Are Databases

    http://research.microsoft.com/pubs/69641/tr-95-56.pdf
But the issue here, as you've identified, is different concepts being dealt with at different levels.

You can make the query more efficient by using a different schema. You can move the polling function to a single poller which then triggers the event out. And you can even have the database do the trigger - for most databases; certainly Oracle and Postgres.


I hear you. I had to build a job queue system back in 2005 at one job - pretty much what you said - insert as either ready or pending, start to consume as in-process (important to have a transaction around the find-and-mark operation, of course), mark complete at end of job. Using a DB allowed easy querying to find out what was going on recently, as well.
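
In today's terms the find-and-mark step would look something like this sketch (psycopg2; the state names match the above, everything else is invented):

    import psycopg2

    conn = psycopg2.connect("dbname=jobs_demo")

    def claim_next_job():
        # One transaction around find-and-mark, as described above.
        with conn, conn.cursor() as cur:
            cur.execute("""
                SELECT id, type FROM jobs
                 WHERE state = 'ready'
                 ORDER BY id
                 LIMIT 1
                   FOR UPDATE""")
            row = cur.fetchone()
            if row is None:
                return None  # empty queue, or a rival worker won; just retry
            cur.execute("UPDATE jobs SET state = 'in_process' WHERE id = %s",
                        (row[0],))
            return row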

On the other hand, last year I worked on some JMS stuff that was essentially also just a job processing system. To me, the JMS stuff seemed like a total waste and unneeded complication vs DB queue-tables, other than integrating with other software that wanted to use JMS. On the other hand, I can't see the "enterprise" minions that developed that debacle reasoning about concurrency well enough to do the job with a DB. E.g. - using event driven listeners to dequeue and acknowledge the messages, and then placing them in a list (i.e. - another queue) in memory, even though they had not been processed yet!

Yet another example of resume padding, where it's important to have 3 to 5 years experience with some tool, since the intelligence to reason about what the tool actually must do, and why, won't be recognized.


When I did billing at yahoo, about a decade ago, the European billing system (Yahoo had about ten billing systems at the time, thanks to a combination of turf wars and acquisitions) was built almost entirely around a job management system in an RDBMS, where every little thing that was done (issue an invoice; charge a credit card; refund etc.) would be the result of a process picking up the next applicable job, looking at the state, deciding what the next step was, processing, logging what was done, and updating state.

Nothing fancy anywhere, which is how it should be (it was billing; we were processing millions of dollars of payments).

The nice thing about a proper job management system was that each state transition on a job was logged, and every state transition was the result of executing a quite small amount of code with a quite small amount of input data, so if/when anything went wrong, it was a matter of going through the audit trail until something didn't match up, and it meant we knew what component had done something wrong, and what data it had been processing.

Wherever you can afford that overhead (or can't afford not to take that overhead), it makes things so nice to deal with.


Replying to myself because I forgot two aspects that messaging services handle poorly: Scheduling and prioritization.

Scheduling is important for jobs, but not events. Prioritization is useful for both. But RabbitMQ doesn't even support assigning priorities to messages.

Scheduling can be done with RabbitMQ by having a cron job that publishes a job message. But that means you're statically tied to a cron configuration. A more flexible configuration would let clients post future jobs to a special queue (or queues, perhaps organized by "daily", "weekly" etc.) that is read by an external scheduler app that reschedules the messages into the proper worker queue. But this stuff gets (unnecessarily) complicated fast.
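
Compare that with jobs in a database, where scheduling (and prioritization, for that matter) collapses into ordinary columns. A sketch, with run_at and priority being invented column names:

    import psycopg2

    conn = psycopg2.connect("dbname=jobs_demo")
    with conn, conn.cursor() as cur:
        cur.execute("""
            SELECT id, type FROM jobs
             WHERE state = 'pending'
               AND run_at <= now()           -- scheduling: skip future jobs
             ORDER BY priority DESC, run_at  -- prioritization: a sort key
             LIMIT 1
               FOR UPDATE""")
        print(cur.fetchone())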


I'm pretty curious as to your statement 'Register jobs in Postgres, use RabbitMQ as the trigger mechanism. But this puts a burden on the processor, which now has to use two APIs.'

Does that mean you know of a way to do it with just a DB and without a MQ?

If you store the jobs in the database and don't have a parallel MQ system, exactly what triggers the job to be run?

For example, a job might be to process a batch of pictures. To create the job, you would insert a row into the database. But when you insert the row, how would another machine know to wake up and start processing it? Is that other machine just polling? If so, how do you ensure two machines on the cluster don't pick up the same job?


Postgres has "listen"/"notify", which lets you easily signal between concurrent clients; one notification is delivered to exactly one listener. You can't pass along any information, though you can use the signal name as a way to target listeners.


Actually, when you NOTIFY a channel in Postgres it gets delivered to all clients who have previously run LISTEN on that channel. And you can send an optional payload (up to 8000 bytes):

http://www.postgresql.org/docs/9.3/static/sql-notify.html


Oops, my mistake. That's actually disappointing, since it means that for every message, N workers will all wake up and needlessly try to grab the message.

You can get around that by using an advisory lock to make access exclusive, but they will still all wake up and will still have to issue a select.
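
Something like this sketch (the lock key 42 is arbitrary):

    import psycopg2

    conn = psycopg2.connect("dbname=jobs_demo")
    with conn.cursor() as cur:
        # Every woken worker races for one application-defined lock; only
        # the winner touches the jobs table, the rest go back to sleep.
        cur.execute("SELECT pg_try_advisory_lock(42)")
        if cur.fetchone()[0]:
            try:
                pass  # ...select and run the next job here...
            finally:
                cur.execute("SELECT pg_advisory_unlock(42)")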


Why not have N channels for N listeners? These are very lightweight (no persistence/retry, etc. -- essentially just a way to avoid SELECT polling by consumers).


That ties the channel names to the number of listeners, which opens you up to race conditions when the pool of workers changes. And you would have to keep track of which channels aren't busy at any given time, in order to avoid waking up busy workers that can't accept a job. Not impossible, just more stuff that could go wrong.


I think you're right, that would need to be built into whatever job management logic you implemented before the notification was sent. This type of architecture is more suitable for a workflow or job management system rather than fanning out data processing to N workers.


Completely agree. I also think projects should draw a clearer distinction between queues and messaging. ZeroMQ, for example, is not a queue; it's a toolkit for building a message bus!


> I've played with the idea of a hybrid system: Register jobs in Postgres, use RabbitMQ as the trigger mechanism. But this puts a burden on the processor, which now has to use two APIs.

I've had exactly the same idea for my next project. Seriously, I'm quite surprised that something like that doesn't seem to exist already.


I built a similar mechanism using SQS and DynamoDB instead of RabbitMQ and PostgreSQL, respectively. The nice part is it requires no servers for job management. (Just workers obviously.)

https://github.com/gfodor/skyrunner

(It's still a bit half baked, but could be useful as a starting point.)


Nice, thank you. I've been looking for exactly this -- SQS and DynamoDB seem like a good combination.


You may not care, but I personally avoid SQS and Dynamo for lock-in purposes. I'd rather run my own systems and have the option of moving to GCE or something later.


I've found in the past that moving code between different queue and job coordination systems is pretty straightforward. Operationally of course you need to learn the new infrastructure but that is unrelated to fears about vendor lock-in.


Dynamo is more of a beef than SQS. I've found that Dynamo usage tends to grow. Like a fungus. Applying sufficient fire to remove it later seems beyond many shops' ability to marshal.

(Also, SQS is really weird about latency and places pretty ugly limits on responsiveness.)


Out of interest, what do you think the benefits of such a system would be? I'm concerned that it would only add complexity.

For my part I plan to hide the queue processing in a central app (we're all SOA), so that workers don't need to know anything about databases, serialization formats or AMQP. They would just wait on a socket to get new jobs.


A message queue stores, well, messages, so it is really good for events. But I also need to persist state machines that run for months. They must be queryable and so on. A db is good at this. On the other hand, I've had a very unpleasant experience with the database-as-IPC-channel and do not wish to ever go near that again. So using a rdbms to persist state and a message queue for the events seems like the conservative approach. Am I overlooking something obvious here?


Not at all, just wondering about your thinking. Note that I think using a database for IPC is a bad idea if you're actually using tables and such to coordinate; Postgres' listen/notify IPC is much better.

That said, it has issues, as pointed out elsewhere in this thread; when you notify, all listeners wake up, which is not what you want with a one-job-per-worker dispatch system.

If you abstracted the listening to a central job manager app, however, you could use "listen" to wake up just the job manager, and then use that wake-up to dispatch the next job to an available worker.


RabbitMQ's consistent hash exchange is pretty useful to route jobs to where you need them.

https://github.com/rabbitmq/rabbitmq-consistent-hash-exchang...
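
A sketch with pika, assuming the rabbitmq_consistent_hash_exchange plugin is enabled. The twist with this exchange type is that the binding key is a weight, while the routing key is what gets hashed:

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()
    ch.exchange_declare(exchange="jobs", exchange_type="x-consistent-hash")

    # Binding key = relative weight of each queue, not a routing pattern.
    for q in ("worker-1", "worker-2"):
        ch.queue_declare(queue=q)
        ch.queue_bind(exchange="jobs", queue=q, routing_key="10")

    # Messages with the same routing key always hash to the same queue,
    # so everything for job 1234 consistently lands on one worker.
    ch.basic_publish(exchange="jobs", routing_key="1234", body=b"resize photos")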


So basically you're saying that a pub/sub system is (obviously) hard to model with an RDBMS, while a job queue is a perfect use case for an RDBMS because jobs have states, right?


Also because you want persistence (durability) for jobs; overhead and latency are okay for jobs, whereas events want super low latency and don't need persistence.


Just for your radar: I've used pgq from Skype for queueing in PostgreSQL a couple of times, and liked it.

http://wiki.postgresql.org/wiki/Londiste_Tutorial_(Skytools_...

Once was implicit, when I used Londiste for replication. More recently it was for queueing, from Lisp.


I have seen it before, never used it. Thanks for the reminder.



