...that's just the push model. "Signalling" via "well, the load balancer has a max of 10 sessions per server" is enough.
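To make the "10 sessions max per server" point concrete, here is a rough sketch of a push balancer where the per-server cap is the only signalling. `PushBalancer`, `acquire` and `release` are names made up for illustration, not any real load balancer's API, and the least-loaded pick is just one reasonable assumption.

```python
class PushBalancer:
    """Push model sketch: the balancer picks the server; the cap is the only 'signal'."""

    def __init__(self, servers, max_sessions=10):
        self.max_sessions = max_sessions
        # Active session count per server, tracked on the balancer side.
        self.active = {name: 0 for name in servers}

    def acquire(self):
        # Pick the least-loaded server (assumed policy; ties go to the first listed).
        name = min(self.active, key=self.active.get)
        if self.active[name] >= self.max_sessions:
            return None  # every server is at its cap; caller queues or rejects
        self.active[name] += 1
        return name

    def release(self, name):
        # Called when a session ends, freeing a slot on that server.
        self.active[name] -= 1
```

No server ever has to ask for work here: the balancer already knows how many sessions each one holds, so there is no extra round trip.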

The pull model just adds an unnecessary RTT.

> Off the top of my head I'm trying to think how a host could actively signal to an LB that it's ready for more requests... I suppose it's trivial and common to use a health check. Is it even a change to say that these need to be updated to achieve your goal of host to LB pulling?

Like this.

A server rarely decides not to serve the next request after serving the previous one, so push is optimal for short requests. And if it doesn't want more, it can just signal that via the health check.
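A minimal sketch of that health check signalling, assuming the usual convention that 200 means "keep sending" and 503 means "leave me alone". `Backend`, `eligible` and the `/healthz` path are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    accepting: bool = True  # the server flips this when it wants no more traffic

    def health(self):
        # What a GET on a (hypothetical) /healthz endpoint would return.
        return 200 if self.accepting else 503


def eligible(backends):
    """Names of the backends the balancer may keep pushing requests to."""
    return [b.name for b in backends if b.health() == 200]
```

The server only has to speak up in the rare case where it wants out; the common case costs nothing extra.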

Pull makes more sense for latency-insensitive jobs like "take a task from the queue, do it, and put the results back". Say you make a video encoding service that scales itself dynamically in the background; there "just do one encode and exit" is commonplace.
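For comparison, a rough sketch of the "take one task, do it, put the result back, exit" worker, using Python's in-process `queue.Queue` as a stand-in for a real job queue. `encode` and `run_worker_once` are placeholders invented for this example, not a real encoder API.

```python
import queue


def encode(job):
    # Stand-in for the real work (e.g. a video encode); purely hypothetical.
    return f"{job}.encoded"


def run_worker_once(tasks, results, timeout=1.0):
    """Pull one task, process it, put the result back, then return."""
    try:
        job = tasks.get(timeout=timeout)  # the pull: the worker asks for work
    except queue.Empty:
        return False  # nothing to do; the worker can simply exit
    results.put(encode(job))
    tasks.task_done()
    return True
```

The extra wait on `get` doesn't matter here, because nobody is sitting on the other end waiting for a response in milliseconds.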