generally a huge fan of kubernetes but it's stunning what a did-it-ourselves dirtbag k8s opted to be every step of the way with regard to scheduling.
Facebook has really really good talks about managing process scheduling at scale, talking about how they leverage cgroups to do the right thing.
kubernetes seems to not give a fuck. they have their own resource systems they cooked up. shit gets scheduled in a huge massive cgroup. any order or control is userland, totally ignorant of the kernel's controls. there's no hierarchies, no priorities, everything is absolute, schedule or die. it's such a ginormous piece of shit, so unbelievably willfully ignorant of all the good kernel technology that exists. it tries to make sure the kernel never has a role & that's just a huge mistake, just deeply tragic.
one notable side effect of this is that while the kernel has many ways to make multi-tenant scheduling fairly reasonable, kubernetes has a variety of wild harebrained schemes, all of which detour around how easy the job would be if different pods could be scheduled in different cgroups. but that's somehow too blindingly obvious for kubernetes, which instead tries to mediate what to run entirely by itself.
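to make the "hierarchies and priorities" point concrete, here's a rough sketch of how cgroup v2 `cpu.weight` splits cpu between sibling cgroups when everything is busy: each child gets its parent's share times its weight over the sum of its siblings' weights. the tenant tree, the names and the weights below are all made up for illustration, and this is just the arithmetic, not anything kubernetes or the kernel actually runs.

```python
from dataclasses import dataclass, field


@dataclass
class Cgroup:
    name: str
    weight: int = 100  # cgroup v2 cpu.weight defaults to 100 (valid range 1..10000)
    children: list["Cgroup"] = field(default_factory=list)


def cpu_shares(node, share=1.0, prefix=""):
    """Return {leaf path: fraction of CPU}, assuming every leaf is fully busy.

    Under contention, siblings split their parent's share in proportion
    to their cpu.weight values.
    """
    path = f"{prefix}/{node.name}"
    if not node.children:
        return {path: share}
    total = sum(child.weight for child in node.children)
    result = {}
    for child in node.children:
        result.update(cpu_shares(child, share * child.weight / total, path))
    return result


# Hypothetical multi-tenant layout: two teams, weighted 3:1 against each other.
tree = Cgroup("tenants", children=[
    Cgroup("team-a", weight=300, children=[
        Cgroup("web", weight=200),
        Cgroup("batch", weight=50),
    ]),
    Cgroup("team-b", weight=100, children=[
        Cgroup("api"),
    ]),
])

shares = cpu_shares(tree)
for path, fraction in sorted(shares.items()):
    print(f"{path}: {fraction:.0%}")
```

the nice part is that nothing is absolute here: a tenant's batch job only gets squeezed when its neighbours actually want the cpu, and idle share flows to whoever is busy.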