I haven't read the Omega paper yet, but plenty of people are running long-runnin...

hendzen · on April 21, 2014

From the Omega paper (section 4.2):

  Mesos achieves fairness by alternately offering all 
  available cluster resources to different schedulers,
  predicated on assumptions that resources become available
  frequently and scheduler decisions are quick. As a result,
  a long scheduler decision time means that nearly all 
  cluster resources are locked down for a long time, inaccessible
  to other schedulers. The only resources available for other
  schedulers in this situation are the few becoming available
  while the slow scheduler is busy. These are often insufficient
  to schedule an above-average size batch job, meaning that
  the batch scheduler cannot make progress while the service
  scheduler holds an offer. It nonetheless keeps trying, and
  as a consequence, we find that a number of jobs are abandoned 
  because they did not finish scheduling their tasks by
  the 1,000-attempt retry limit in the Mesos case (Figure 7c).
  This pathology occurs because of Mesos’s assumption
  of quick scheduling decisions, small jobs and high resource
  churn, which do not hold for our service jobs. Mesos
  could be extended to make only fair-share offers, although
  this would complicate the resource allocator logic, and the
  quality of the placement decisions for big or picky jobs
  would likely decrease, since each scheduler could only see
  a smaller fraction of the available resources. We have raised
  this point with the Mesos team; they agree about the 
  limitation and are considering to address it in future work.

It's worth noting that Andy Konwinski was a coauthor on both Mesos & Omega, so I'd hope they (the Omega authors) represented Mesos' capabilities accurately. I don't have any personal experience running Mesos in production; I'm just going off what was written.
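To make the quoted pathology concrete, here is a toy sketch in Python. It is not the paper's simulator and not Mesos code; the function name, the parameters, and the one-attempt-per-tick model are all assumptions made for illustration. It assumes the service scheduler is holding an offer of every free resource, that the batch scheduler retries once per tick, and that the batch scheduler only ever sees what has been freed since the offer was made (plus the locked pool once the service scheduler finishes deciding).

```python
def batch_attempts(free_resources, churn_per_tick, service_hold_ticks,
                   batch_job_size, retry_limit=1000):
    """Toy model (hypothetical, for illustration only).

    Returns (attempts, scheduled) for a single batch job that arrives just
    as the service scheduler receives an offer of all free resources.
    """
    locked = free_resources  # everything free sits inside the service scheduler's offer
    visible = 0              # what the allocator can still offer the batch scheduler

    for attempt in range(1, retry_limit + 1):
        # Assumption: once the service scheduler has decided, it hands the
        # whole pool back (it keeps nothing, to keep the model minimal).
        if attempt > service_hold_ticks and locked:
            visible += locked
            locked = 0

        # Resources freed by finishing tasks during this tick.
        visible += churn_per_tick

        # Assumption: the batch job is placed all at once or not at all.
        if visible >= batch_job_size:
            return attempt, True

    # Retry limit reached: the job is abandoned.
    return retry_limit, False


if __name__ == "__main__":
    # Slow service scheduler: holds the offer for 5000 ticks.
    print(batch_attempts(2000, 1, 5000, 1500))  # (1000, False)
    # Quick service scheduler: holds the offer for 1 tick.
    print(batch_attempts(2000, 1, 1, 1500))     # (2, True)
```

With low churn and a long hold, the batch job burns through its retry limit even though the cluster had enough free capacity the whole time; with a quick decision it lands on the second attempt. The numbers are made up, only the shape of the effect is the point.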

necubi · on April 21, 2014

Ah, interesting. My personal experience with Mesos has included clusters that are basically all transient services (like MapReduce jobs) or all long-running services, but not both. I can see how mixing the two might lead to pathological scheduling decisions.