Not an expert in load balancing, but in similar problems (work sharing / work stealing, MPI get/put) it makes sense to pull only if you can pull fast enough to avoid incurring prohibitive latency at every request/message.
Multithreading-based work stealing à la Cilk relies on extremely cheap thread creation and scheduling to minimize communication.
In another similar situation, HPC switches are credit-based, so that until you hit congestion you can “instantaneously” know whether a remote is ready to receive.
This isn't a formal explanation, of course.
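To make the credit-based idea above concrete, here's a minimal sketch (the names and numbers are hypothetical, not any real switch or NIC API): the sender starts with as many credits as the receiver has buffer slots, spends one credit per message, and gets credits back as the receiver drains its buffer. Until the credits run out, "is the remote ready?" is answered locally, with no extra round trip.

```python
# Toy credit-based flow control. Hypothetical names; not a real switch/NIC API.
from collections import deque

class Receiver:
    def __init__(self, slots: int):
        self.slots = slots
        self.buffer = deque()

    def accept(self, msg) -> None:
        # The credit scheme guarantees the sender never overruns the buffer.
        assert len(self.buffer) < self.slots, "sender violated its credit limit"
        self.buffer.append(msg)

    def drain_one(self) -> bool:
        """Process one buffered message; each drained slot frees one credit."""
        if self.buffer:
            self.buffer.popleft()
            return True
        return False

class Sender:
    def __init__(self, receiver: Receiver):
        self.receiver = receiver
        self.credits = receiver.slots   # initial credits = remote buffer size

    def try_send(self, msg) -> bool:
        if self.credits == 0:
            return False                # congested: no credit, don't transmit
        self.receiver.accept(msg)
        self.credits -= 1
        return True

    def credit_returned(self) -> None:
        self.credits += 1               # receiver freed a slot

# Send until credits run out, then recover a credit and send again.
rx = Receiver(slots=2)
tx = Sender(rx)
print([tx.try_send(i) for i in range(3)])   # [True, True, False]
if rx.drain_one():
    tx.credit_returned()
print(tx.try_send(99))                      # True
```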
Edit: after some thought, that's not really the distinction that matters for load balancing. Pushing to the least loaded queue already requires knowledge of the remote state. So the real difference between pull and push is having one queue vs. several. In that sense it's like supermarkets: a single queue feeding every cashier is more efficient than the traditional one queue per cashier. Supermarkets have a choice to make because of other constraints, but if you're optimising purely for load balancing, a single queue is strictly better, as long as there's only one input (a quick simulation sketch below illustrates the gap).
Having enough workers to do the work shouldn't be any more constraining than it was before.
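To put a rough number on the single-queue argument, here's a quick toy simulation (all parameters invented; Poisson arrivals, exponential service, FIFO everywhere). It compares one shared queue feeding four workers against round-robin assignment into four per-worker queues at the same total load.

```python
# Toy comparison: one shared FIFO queue vs. per-worker queues (made-up numbers).
import heapq
import random
import statistics

def make_jobs(n, arrival_rate, service_rate, seed=1):
    rng = random.Random(seed)
    t, jobs = 0.0, []
    for _ in range(n):
        t += rng.expovariate(arrival_rate)
        jobs.append((t, rng.expovariate(service_rate)))
    return jobs                          # list of (arrival_time, service_time)

def shared_queue_waits(jobs, workers):
    free = [0.0] * workers               # heap of times each worker becomes free
    heapq.heapify(free)
    waits = []
    for arrival, service in jobs:
        start = max(arrival, heapq.heappop(free))   # next job goes to the first free worker
        waits.append(start - arrival)
        heapq.heappush(free, start + service)
    return waits

def per_worker_queue_waits(jobs, workers):
    free = [0.0] * workers               # each worker drains its own FIFO queue
    waits = []
    for i, (arrival, service) in enumerate(jobs):
        w = i % workers                  # round-robin assignment at arrival time
        start = max(arrival, free[w])
        waits.append(start - arrival)
        free[w] = start + service
    return waits

jobs = make_jobs(n=50_000, arrival_rate=3.6, service_rate=1.0)   # 4 workers => ~90% load
for name, waits in [("shared queue", shared_queue_waits(jobs, 4)),
                    ("per-worker queues", per_worker_queue_waits(jobs, 4))]:
    print(f"{name:18s} mean wait: {statistics.mean(waits):.2f} (in mean service times)")
```

The shared queue should come out well ahead, simply because no worker ever sits idle while a job waits in another worker's queue.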
The article does nicely mention that simple round robin actually has lower latency, because some traffic gets lucky and lands on under-utilized machines. Unfairness helps some traffic go faster. The queue will probably eliminate this, but the unfairness advantage comes at the cost of a lot of other traffic ending up in long queues on workers, so it wasn't really a good thing anyway. The p90+ is usually awful.
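For a back-of-the-envelope version of that tail argument (all rates made up): if you model random per-request assignment, as a crude stand-in for round robin, each worker behaves like an independent M/M/1 queue, while a single shared queue in front of the same workers is M/M/c. At the same utilization, the per-worker tails come out far worse.

```python
# Rough analytic tail comparison; hypothetical rates, deliberately crude model.
import math

def erlang_c(c, a):
    """P(an arrival has to wait) in an M/M/c queue; a = lambda/mu is the offered load."""
    b = (a ** c / math.factorial(c)) / sum(a ** k / math.factorial(k) for k in range(c + 1))
    rho = a / c
    return b / (1 - rho * (1 - b))

def wait_percentile(p_wait, decay, q):
    """q-th percentile of queueing delay when P(W > t) = p_wait * exp(-decay * t)."""
    return max(0.0, math.log(p_wait / (1 - q)) / decay)   # 0 if most arrivals don't wait

lam, mu, c = 3.6, 1.0, 4           # hypothetical: 4 workers at 90% utilization
rho = lam / (c * mu)

for q in (0.90, 0.99):
    per_worker = wait_percentile(rho, mu * (1 - rho), q)              # per-worker M/M/1
    shared = wait_percentile(erlang_c(c, lam / mu), c * mu - lam, q)  # shared M/M/c
    print(f"p{int(q * 100)} wait: per-worker queues ~{per_worker:.1f}, shared queue ~{shared:.1f}")
```

With these made-up numbers the per-worker p90 is roughly 4x the shared-queue p90 (about 22 vs. 5 mean service times), which is exactly the "long queues on workers" cost.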