I actually disagree with the first recommendation as written - specifically, not...

jeffbee · on May 17, 2020

If you ask for 0.001 CPU share, you might get it. I would advise caution. You that pod gets scheduled on a node with another node that asks for 4 CPUs and 100MB of memory, it's not going to get any time.

otterley · on May 17, 2020

It depends. If the second pod requests 4 CPUs, it doesn't necessarily mean that the first pod can't use all the CPUs in the uncontended case.

A lot of this depends on policy and cooperation, which is true for any multitenant system. If the policy is that nobody requests CPU, then the behavior will be like an ordinary shared Linux server under load - the scheduler will manage it as fairly as possible. OTOH, if there are pods that are greedy and pods that are parsimonious in terms of their requests, the greedy pods will get the lion's share of the resources if it needs them.

The flip side of overallocating CPU requests is cost. This value is subtracted from the available resources, making the node unavailable to do other useful work. Most of the time I see customers making the opposite mistake - overallocating CPU requests so much that their overall CPU utilization is well under 25% during peak periods.

jeffbee · on May 17, 2020

Most people would be thrilled to get anything close to 25% CPU util. I guess one of the big missing pieces fro Borg that hasn't landed in k8s is node resource estimation. If you have a functional estimator, setting requests and limits becomes a bit less critical.

coredog64 · on May 18, 2020

1000% agree. Former employer had a proprietary app scheduler that worked like this. We would frequently tell users to request as little CPU as possible. Extra CPU would be shared, but if you made an unreasonable request you’d never get scheduled in the shared environment.

marekaf · on May 18, 2020

I agree! I assumed not to trust anyone with all the greedy pods they can schedule.

The example you are describing is probably not super common but I will try to rephrase my blogpost so that it reflects this comment:)

otterley · on May 18, 2020

Sorry, what is not super common? With my customers I rarely see incidents due to CPU starvation of a pod in their K8S clusters.