I think you’re talking about SPAs specifically. Many have race conditions in frontend code that don’t show up on fast connections or when all resources load with the same speed and consistency. Open the developer console next time it happens; I bet you’ll find a “foo is not a function” or similar error caused by something not having initialized yet and the code not properly awaiting it. If an SPA’s core loop errors out, loading halts, and even a previously loaded or partially loaded page can go blank or partially blank. Refreshing will load the already retrieved resources from cache, which often “fixes” the problem.
You see it in backend code too. For example, Golang's context.WithTimeout is used to time out HTTP requests and database calls that may be taking too long. This is particularly irksome with microservices, where multiple services run their own timeouts that interfere with one another.
It is becoming fashionable to quell 99th-percentile latency spikes (i.e. the 1 in 100 requests that take substantially longer) by terminating those requests, which may not always be in the best interest of the user even if it is convenient for the devops teams and their promotion packets.
Thanks for sharing. I wasn’t aware that was a thing; it seems to be a way of manipulating the appearance of performance rather than actually improving it. We log all slow calls so we can find out what they’re running up against. Knowing that a call took more than 5ms against a p99 of 5ms is a pretty poor internal signal, but being able to trace which calls took 15 or 75s (versus those that took less but would have been killed anyway) is extremely helpful.
Perhaps probabilistically terminating calls would work better? I assume the decision has to be made ahead of time with timeout contexts, if they’re anything like cancellation tokens, so even if you give just 5% of all your inbound requests a deadline 10000x as long, you’ll still get some useful info to work with.
As a user, I would absolutely hate it. I somehow frequently run into pockets of badly written or badly architected code that cause some of my requests to take a minute or more to be fulfilled on an otherwise responsive server. If I had to retry “just” twenty times for a request to go through, I’d lose my mind.
Well, I hope it’s clear that this is just malpractice. Nobody should set their deadline to their p99 latency unless the result of the call is completely irrelevant to the success of the top-level request. Deadlines should be set to a huge amount of time, much longer than your tail latency but sufficiently less than infinity to protect your backend from running out of resources with too many requests in flight. For example, if your p99 latency is 1ms, you might set your timeout to 60s or something like that.