This. It's a lot easier to horizontally scale a system whose operations already have consistently low latency: you can keep raking in the benefits and cranking up throughput without a whole lot of thought.
It's much more expensive and complex to take an operation with erratic latency and bring that latency down by throwing more resources at it. As far as I can tell, the usual design course is to make all your major actions either pure or idempotent, so that parallel (and redundant!) requests can be made safely... which is a large (worthy, but large) engineering effort. And then you're talking about scaling to 2x capacity or more just so that redundant-request thing can be your default behavior (a sketch of the pattern below).
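To make the redundant-request idea concrete, here's a minimal Java sketch of hedging: fire the same idempotent call at two replicas and take whichever answers first. The replica calls are hypothetical placeholders, not any particular RPC library.

    import java.util.concurrent.CompletableFuture;

    public class HedgedRequest {
        // Hypothetical stand-ins for real RPC clients hitting two replicas.
        // This only works safely if the call is pure or idempotent.
        static String callReplicaA() { return "result-from-A"; }
        static String callReplicaB() { return "result-from-B"; }

        public static void main(String[] args) {
            CompletableFuture<String> a =
                CompletableFuture.supplyAsync(HedgedRequest::callReplicaA);
            CompletableFuture<String> b =
                CompletableFuture.supplyAsync(HedgedRequest::callReplicaB);

            // anyOf completes as soon as the first replica responds; the
            // loser's work is wasted, which is why this roughly doubles cost.
            Object first = CompletableFuture.anyOf(a, b).join();
            System.out.println(first);
        }
    }

The losing request still runs to completion and burns capacity, which is exactly the 2x-scaling cost mentioned above.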
Another approach you can use in some cases with the JVM, often the simplest, is to configure the JVM so it doesn't GC during normal operation (give it a lot of memory), then either spawn a fresh JVM to take over, or pull the machine that's about to GC out of your load-balanced pool, run a full GC, and put it back in.
Doing the manually triggered & staggered GC trick on a pool of machines you control can give you very low latency guarantees, since no production request will ever hit a GC-ing JVM.
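Roughly, the rotation looks like the sketch below. The LoadBalancerClient interface here is hypothetical (substitute whatever API your load balancer actually exposes), and the drain window is an assumed number; triggering the GC itself can be done with something real like `jcmd <pid> GC.run`.

    import java.util.List;

    public class GcRotation {
        // Hypothetical load-balancer API; not a real library.
        interface LoadBalancerClient {
            void deregister(String node);    // drain: stop routing new requests
            void register(String node);      // put the node back in rotation
            void triggerFullGc(String node); // e.g. run `jcmd <pid> GC.run` on the host
        }

        static void rotate(LoadBalancerClient lb, List<String> nodes)
                throws InterruptedException {
            // One node at a time, so the pool is never down more than one
            // machine and no production request lands on a GCing JVM.
            for (String node : nodes) {
                lb.deregister(node);
                Thread.sleep(5_000); // assumed drain window for in-flight requests
                lb.triggerFullGc(node);
                lb.register(node);
            }
        }
    }

The trade-off is operational: you're trading GC pauses for a small, scheduled loss of pool capacity, which is easy to reason about and easy to stagger.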