One thing I found interesting: if you go with PEWMA and create a scenario where the cluster is stressed, and then add 1 server, it pummels the shit out of the new server and you get a brief surge in failed requests.
Not sure if that is a real world issue, or just with the simulation...
This is very likely a bug in the simulation. My simplified implementation of PEWMA prioritises servers that have had no traffic, in order to send at least 1 request to all servers. There will be a window, until this new server serves its first request, where it is considered the highest priority server.
I doubt very much that this would be part of any real-world implementation.
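For illustration, here's a minimal sketch of how that kind of prioritisation creates the window (hypothetical names, not the article's actual code):

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.ewma_latency = None  # no traffic seen yet

def pick_server(servers):
    # Treat "no data yet" as the best possible latency so that every
    # server gets at least one request. The bug: under heavy load,
    # every pick in the window before the new node's first response
    # comes back lands on that node, because its EWMA is still None.
    return min(servers, key=lambda s: 0.0 if s.ewma_latency is None
               else s.ewma_latency)
```

Every request dispatched during that window piles onto the new node at once, which lines up with the brief surge in failures.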
I'm not familiar with PEWMA, but real load balancers sometimes have this problem. Either because dynamic weighting slams the new server, which shows zero load, or because the new server needs to do some sort of cache warming, whether that's disk or code or JIT or connection establishment or ???, early requests are often handled slowly.
Most load balancers should have a way to do some sort of slow start for newly added or newly healthy servers. That could be an age factor on weighting, or an age factor on max connections, or ???. Some older load balancers are just not great at this, so you develop experience-based rules like 'always use round robin, leastconn will kill your servers with lumpy loads'. All that said, and a repeated theme across my comments in this thread: the more sophisticated your load balancing is, the harder your load balancer needs to work, and the sooner you need to figure out how to load balance your load balancers.
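To sketch the "age factor to weighting" idea (hypothetical numbers and names; HAProxy's `slowstart` server option is a real example of the same concept), you scale each server's effective weight by how long it has been in rotation:

```python
import time

SLOW_START_SECONDS = 30.0  # assumed ramp-up window

def effective_weight(base_weight: float, added_at: float) -> float:
    """Ramp a server's weight from a small floor up to base_weight
    over SLOW_START_SECONDS after it joins the pool, so a new or
    newly healthy node isn't handed a full share immediately."""
    age = time.monotonic() - added_at
    ramp = min(max(age / SLOW_START_SECONDS, 0.05), 1.0)
    return base_weight * ramp
```

Implementations vary the ramp shape, and some cap things like max connections during the window instead of (or as well as) weight.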
It does happen in the real world as well; at least, that's what I was told when I started my first job as a system admin.
The reason people cited to me back then was that the balancer usually isn't particularly smart when balancing: it only sees a free node, so every new request is routed to it. The errors (mostly timeouts) happen once those requests actually start to get processed.
Normally, a node gets a steady stream of requests over time, so the load is constant (generally speaking, a request requires the most resources at the same relative point in its lifecycle). When all the requests are fresh, they all hit that same bottleneck at the same time, causing the timeouts.
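A toy model of that synchronisation effect, with entirely made-up numbers, just to show the shape of the problem:

```python
# Toy model: each request needs its peak resources PEAK_OFFSET
# seconds into its lifecycle. Staggered arrivals spread those peaks
# out; a burst of fresh requests on a new node stacks them all up.
PEAK_OFFSET = 2.0
PEAK_WIDTH = 0.5

def peak_load(start_times):
    times = [t / 10 for t in range(120)]
    return max(sum(1 for s in start_times
                   if abs((t - s) - PEAK_OFFSET) < PEAK_WIDTH)
               for t in times)

staggered = [i * 0.5 for i in range(10)]  # warmed-up node
burst = [0.0] * 10                        # freshly added node

print(peak_load(staggered))  # 2: only a couple peak together
print(peak_load(burst))      # 10: every request peaks at once
```

The warmed-up node sees a modest rolling peak; the fresh node sees every request's peak simultaneously, which is when the timeouts land.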
The answer is to aggressively scale horizontally, then quickly decommission until you're back to baseline.
Or just accept the failed requests
It's been over 10 years though; it might've been improved since.
I don't know anything about this subject, but my first thought (which may be wrong) would be to just set the weight of the new server to match one of the servers that are already receiving traffic (perhaps one of the lower-ranked ones). That way, it would not be overloaded so easily, and it would adjust its ranking after a while.
I guess my explanation was lacking then, as that wouldn't help. Reducing the weight below that of the old nodes might work, but it would also extend the duration you're overloaded, which would also cause requests to fail.
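For concreteness, the parent's suggestion would amount to something like the following hypothetical sketch, reusing the `Server` shape from above. Per the objection, it removes the instant slam but does nothing about the underlying overload:

```python
from statistics import median

def add_server(pool, new_server):
    # Seed the newcomer's EWMA with the pool median instead of
    # "unknown", so a latency-based picker doesn't treat it as the
    # best server in the cluster. It still gets picked readily
    # (median latency, zero history), just not exclusively.
    known = [s.ewma_latency for s in pool if s.ewma_latency is not None]
    new_server.ewma_latency = median(known) if known else None
    pool.append(new_server)
```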