Mesos achieves fairness by alternately offering all
available cluster resources to different schedulers,
predicated on assumptions that resources become available
frequently and scheduler decisions are quick. As a result,
a long scheduler decision time means that nearly all
cluster resources are locked down for a long time, inaccessible
to other schedulers. The only resources available for other
schedulers in this situation are the few becoming available
while the slow scheduler is busy. These are often insufficient
to schedule an above-average size batch job, meaning that
the batch scheduler cannot make progress while the service
scheduler holds an offer. It nonetheless keeps trying, and
as a consequence, we find that a number of jobs are abandoned
because they did not finish scheduling their tasks by
the 1,000-attempt retry limit in the Mesos case (Figure 7c).
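A minimal sketch of that dynamic (not Mesos code; the tick counts, CPU figures, and variable names are all made up for illustration): one slow scheduler holds an offer covering nearly everything that is free, so the batch scheduler can only retry against the trickle of resources freed elsewhere, and burns through its retry budget.

    import random

    SERVICE_HOLD_TICKS = 60   # how long the slow service scheduler sits on its offer
    CHURN_PER_TICK = 2        # at most this many CPUs trickle back per tick elsewhere
    BATCH_JOB_CPUS = 150      # an above-average batch job
    RETRY_LIMIT = 1000        # the per-job retry limit mentioned above

    visible_cpus = 0          # all the batch scheduler can see while the offer is held
    retries = 0
    scheduled = False

    while retries < RETRY_LIMIT and not scheduled:
        # The allocator has just offered (nearly) everything free to the service
        # scheduler, which spends SERVICE_HOLD_TICKS deciding. During that window
        # the batch scheduler can only retry against newly freed resources.
        for _ in range(SERVICE_HOLD_TICKS):
            visible_cpus += random.randint(0, CHURN_PER_TICK)
            if visible_cpus >= BATCH_JOB_CPUS:
                scheduled = True
                break
            retries += 1
            if retries >= RETRY_LIMIT:
                break
        # Once the offer is declined, pending service work means the allocator
        # immediately re-offers everything, including the trickle, so the batch
        # scheduler starts over from nothing.
        visible_cpus = 0

    if scheduled:
        print(f"batch job scheduled after {retries} retries")
    else:
        print(f"batch job abandoned at the {RETRY_LIMIT}-attempt retry limit")

With these (invented) numbers the trickle never adds up to a big batch job within one hold period, so the job is always abandoned at the retry limit, which is the behaviour the paper reports.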
This pathology occurs because of Mesos’s assumptions
of quick scheduling decisions, small jobs, and high
resource churn, which do not hold for our service jobs. Mesos
could be extended to make only fair-share offers, although
this would complicate the resource allocator logic, and the
quality of the placement decisions for big or picky jobs
would likely decrease, since each scheduler could only see
a smaller fraction of the available resources. We have raised
this point with the Mesos team; they agree about the
limitation and are considering addressing it in future work.
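A similarly hypothetical sketch of the "fair-share offers" extension mentioned above: rather than offering everything free to one scheduler at a time, the allocator would split the free pool across schedulers, so no single slow scheduler can lock up the cluster, but any single offer then covers only a fraction of the available resources.

    def fair_share_offers(free_cpus, schedulers):
        """Split the currently free resources evenly across the registered schedulers."""
        share, remainder = divmod(free_cpus, len(schedulers))
        return {name: share + (1 if i < remainder else 0)
                for i, name in enumerate(schedulers)}

    print(fair_share_offers(1000, ["service", "batch"]))
    # -> {'service': 500, 'batch': 500}: neither offer alone can place a job
    #    needing, say, 700 CPUs, which is why placement quality for big or
    #    picky jobs would likely suffer.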
It's worth noting that Andy Konwinski was a coauthor on both Mesos & Omega, so I'd hope they (the Omega authors) represented Mesos' capabilities accurately. I don't have any personal experience running Mesos in production; I'm just going off what was written.
Ah, interesting. My personal experience with Mesos has included clusters that are basically all transient services (like MapReduce jobs) or all long-running services, but not both. I can see how that might lead to pathological scheduling decisions.