
The hard part of distributing work from a central dispatcher is that you wind up with a central controller needing a complete model of the overall system.

I work on a component of the Cloud Foundry PaaS. The current generation works more or less as you describe -- there's a cloud controller backed by a PostgreSQL database. It sends start and stop commands to a pool of execution agents over a messaging bus.

The next generation of that runtime system ("Diego") is actually inverting this control. Execution agents no longer receive instructions to perform tasks or long-running processes. Instead they bid on them in an auction.

The controller is radically simplified and the overall system converges on a satisficing solution for distributing work efficiently (and they ensure it does so with extensive simulations based on the actual code). It's actually more robust because it doesn't rely on heavy efforts to ensure that the controller is always up to date. And it doesn't fall into inconsistent states if the controller vanishes.
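To make the idea concrete, here's a toy sketch of auction-based placement (the names and bidding formula are invented for illustration, not Diego's actual API): each executor bids based only on its own local view of free capacity, and the auctioneer just picks the best bid. No component needs global state.

```python
# Hypothetical sketch of auction-based work placement.
class Executor:
    def __init__(self, name, capacity):
        self.name = name
        self.free = capacity  # only this node knows its own load

    def bid(self, task_size):
        # Decline if the task doesn't fit; otherwise a lower bid
        # is better, so bid lower the more headroom we have.
        if task_size > self.free:
            return None
        return task_size / self.free

    def run(self, task_size):
        self.free -= task_size

def auction(executors, task_size):
    bids = [(e.bid(task_size), e) for e in executors]
    bids = [(b, e) for b, e in bids if b is not None]
    if not bids:
        return None  # nobody can take it; retry later
    _, winner = min(bids, key=lambda be: be[0])
    winner.run(task_size)
    return winner

pool = [Executor("cell-a", 8), Executor("cell-b", 4)]
winner = auction(pool, 2)
print(winner.name)  # cell-a wins: most headroom
```

If an executor dies, it simply stops bidding; nothing in the controller has to be reconciled against a stale model of the world.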

Onsi Fakhouri did a brief presentation which is worth watching: https://www.youtube.com/watch?v=1OkmVTFhfLY




I would be interested in a debate over these strategies between the cattle.io creator and one of the CF guys.


Essentially it's based on experiences of scaling up CF installations for customers, and for the public service version (Pivotal Web Services).

A high-intelligence controller turns into a giant "god object". It attracts new functionality like a magnet. Everything depends on it and it has to know everything about everything.

In practice it begins to be both the leading cause of failure and the main scaling bottleneck.

The auction-based model allows any number of auctioneers and executors to participate without any one component needing global state.

Essentially, we rediscovered the economic calculation problem. And we're solving it by distributing knowledge across the system and letting agents solve their own problems locally.


Creator of cattle.io here. There are far too many topics to address here, but I'll give you a brief overview of how cattle.io works. The high-level orchestration system's role is to manage the state of resources and their state changes. When states change, events are fired that may or may not need to be handled to ensure that the state transition can be completed. The important thing here is that the high-level orchestrator does not actually perform the key logic to complete an operation. That logic is delegated to an agent or microservice. Which agent or microservice handles each event is scheduled according to the nature of the operation (long-running, targeted to a hypervisor, etc.).
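A minimal sketch of that pattern (names invented for illustration, not cattle.io's actual code): the orchestrator only tracks resource state and fires transition events; registered handlers, standing in for agents or microservices, do the real work.

```python
# Hypothetical sketch: state-tracking orchestrator that delegates
# all real work to event handlers (the "agents").
class Orchestrator:
    def __init__(self):
        self.state = {}     # resource -> current state
        self.handlers = {}  # event name -> handler function

    def on(self, event, handler):
        self.handlers[event] = handler

    def transition(self, resource, new_state):
        event = f"{resource}.{new_state}"
        handler = self.handlers.get(event)
        if handler is not None:
            handler(resource)  # agent performs the actual logic
        self.state[resource] = new_state

orch = Orchestrator()
orch.on("vm-1.starting", lambda r: print(f"agent booting {r}"))
orch.transition("vm-1", "starting")
print(orch.state["vm-1"])  # starting
```

The orchestrator stays thin: it knows what state things should be in, never how to get them there.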


Thanks for the reply, Darren. I had actually read all of the cattle.io docs and have started poking through the source code. My initial reaction was that I really liked how the orchestration of resources was modelled as a type of state machine. It really clicked for me and I could see how something like this would have value on its own as a sort of run-deck or the like. It also simplifies the agents. I can see how creating a .NET agent would be dead simple, and I'm a firm believer that if you're orchestrating on Windows you'll end up wishing you were using .NET eventually if you aren't already.

But to the topic at hand, I was a bit confused about how the bidding comes into play with Diego. After reading the Diego design notes, https://github.com/cloudfoundry-incubator/diego-design-notes , I believe this mostly has to do with scheduling. So it seems to be an alternative to the omega-style serialized scheduler of Flynn, or however Fleet handles this. My current understanding is that stampede.io relies on Fleet to handle scheduling; I'm not sure how stock cattle.io goes about it. It would seem CF still has high-level orchestration (going from staging, then running, etc.) and that a comparison between the two systems would be more involved than I had thought (that, and cattle.io concerns itself with IaaS as well) :|


Fleet is used only to bootstrap the system. It has very little to do with the larger architecture of stampede or cattle. I view orchestration and scheduling as two completely different topics. When looking at IaaS, the scheduling needs are initially quite simple. As such I don't have a mature scheduler and really don't intend on building one. As requirements mature, the need for more complex scheduling grows. I've gone down that path many times and it snowballs so quickly into a very difficult problem.

With cattle I’ve purposely decided to defer the scheduling needs to something like Mesos or possibly YARN. I don’t intend to try to reinvent that wheel. I have yet to do that integration.

The CF Diego approach is one that seems very common today. In general, all of the Docker "orchestration" platforms out there today are mostly just Docker plus a scheduler. This approach is great as long as you don't need complex orchestration. As the CF folks point out, complex orchestration is hard. If you can remove the need to orchestrate and build a system that needs only a scheduler, it is generally much simpler and easier to scale.

The problem I have is that stampede focuses heavily on real-world application loads and the boring, practical issues that exist in crufty old applications in the wild. These apps have requirements that necessitate complex orchestration. As such, with stampede I tackle orchestration head on. I try to find a sane and scalable approach to it, which is really quite difficult, but it is my area of expertise. I feel that if I have a platform that can excel at both orchestration (from the native capabilities of cattle) and scheduling (by leveraging a mature scheduler framework), I'll have an extremely capable base to build the next generation of infrastructure.


Thanks for the clarification and insight :) I was mistakenly under the impression that stampede was utilizing Fleet's scheduling abilities. Assumptions :[


State machines and protocols are an excellent way to model the problem of managing the lifecycle of a single task or LRP; they don't solve the problem of the overall system. It's the latter which is actually much harder.

Consider the classic termite-and-wood simulation[0]. In this simulation I have a population of termites, randomly scattered on a grid. And flakes of wood, also randomly distributed. I want to gather this wood.

A current-generation approach would be, for example, to divide the grid into subgrids, assign local termites to perform exhaustive search, and progressively aggregate into larger piles in successive rounds. This would require substantial engineering to ensure that my central wood-gathering algorithm has checkpointing, failover, robust message passing, reruns, and so on.

Alternatively, I can give each termite 3 simple rules:

1. Walk randomly.

2. If you don't have a piece of wood in your jaw and you come across one, pick it up.

3. If you have a piece of wood in your jaw and you come across another piece of wood, drop the wood in your jaw.

This latter solution doesn't have difficult -- in some cases impossible -- engineering requirements. And it works very well.
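The three rules above fit in a few lines. Here is a toy version of the termite model (a simplification of the NetLogo original, with invented parameters): each termite acts only on what it can see in its own cell, yet wood ends up clumped with no coordinator.

```python
import random

# Toy termite/wood-chip model: three purely local rules.
random.seed(0)
SIZE = 10
wood = set()
while len(wood) < 15:
    wood.add((random.randrange(SIZE), random.randrange(SIZE)))

def neighbors(pos):
    x, y = pos
    return [((x + dx) % SIZE, (y + dy) % SIZE)
            for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]]

class Termite:
    def __init__(self):
        self.pos = (random.randrange(SIZE), random.randrange(SIZE))
        self.carrying = False

    def step(self):
        self.pos = random.choice(neighbors(self.pos))   # rule 1: random walk
        if not self.carrying and self.pos in wood:
            wood.discard(self.pos)                      # rule 2: pick it up
            self.carrying = True
        elif self.carrying and self.pos in wood:
            empty = [n for n in neighbors(self.pos) if n not in wood]
            if empty:
                wood.add(random.choice(empty))          # rule 3: drop nearby
                self.carrying = False

termites = [Termite() for _ in range(10)]
for _ in range(5000):
    for t in termites:
        t.step()

# Wood is conserved: chips on the grid plus chips being carried.
total = len(wood) + sum(t.carrying for t in termites)
print(total)  # 15
```

Nothing in the loop tracks global progress, yet run it long enough and the chips pile up.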

Most systems at this scale eventually wind up pushing intelligence out to the nodes, because (forgive my wankery here) it's impossible in a relativistic universe to have perfect simultaneous knowledge of phenomena separated by space and time. You need to subdivide the problem so that agents can solve their own pieces locally and rely on emergent behaviour in the overall system.

[0] http://ccl.northwestern.edu/netlogo/models/Termites


This makes a lot of sense for certain problems in the domain. I've thought a few times about how to introduce the right bit of agency into actors in order to have them accomplish a larger goal without a central authority.

I created a Couchbase cluster joiner that would have the nodes converge into a single initial cluster based on EC2 tags. I thought about how people group up on their own when walking into a room and came up with:

  Join a group
  Judge the value of the group (is it the largest? + tie breaker)
  Leave and join the group of highest value

The problem wasn't as hard as the termites', but it removed the need for a central controller. And I was proud of myself :)
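Those three steps can be sketched like this (names and the value function are hypothetical, not the actual joiner): each node repeatedly moves to the most valuable visible group, and the population converges to one cluster with no central authority.

```python
# Hypothetical sketch of decentralized cluster convergence.
def best_group(groups):
    # Largest group wins; ties break on the lexicographically
    # smallest member, so every node picks the same winner.
    nonempty = [g for g in groups if g]
    return min(nonempty, key=lambda g: (-len(g), min(g)))

def converge(nodes):
    groups = [{n} for n in nodes]  # each node starts alone
    changed = True
    while changed:
        changed = False
        for node in nodes:
            current = next(g for g in groups if node in g)
            target = best_group(groups)
            if target is not current:
                current.discard(node)  # leave...
                target.add(node)       # ...and join the best group
                changed = True
        groups = [g for g in groups if g]
    return groups

clusters = converge(["node-a", "node-b", "node-c", "node-d"])
print(len(clusters))  # 1: everyone converged on a single group
```

The deterministic tie breaker is what keeps two equal-sized groups from endlessly trading members.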



