Why is this counterintuitive? You're describing scheduling -- your high-priority service can preempt the low-priority service and use more of your CPU (or whatever your bottleneck is) whenever it has pending requests.
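A minimal sketch of that preemption idea, assuming a single worker draining one priority queue (the priority constants, `submit`/`run` names, and the string "jobs" are all illustrative, not from any real scheduler):

```python
import heapq

HIGH, LOW = 0, 1  # lower number = higher priority

class PriorityScheduler:
    """Toy scheduler: queued high-priority work always runs before batch work."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so equal-priority jobs stay FIFO

    def submit(self, priority, job):
        heapq.heappush(self._heap, (priority, self._seq, job))
        self._seq += 1

    def run(self):
        order = []
        while self._heap:
            _, _, job = heapq.heappop(self._heap)
            order.append(job())
        return order

sched = PriorityScheduler()
sched.submit(LOW, lambda: "batch-1")
sched.submit(LOW, lambda: "batch-2")
sched.submit(HIGH, lambda: "request-1")  # arrives last, runs first
print(sched.run())  # ['request-1', 'batch-1', 'batch-2']
```

Real systems preempt mid-job (via the OS scheduler, cgroups, etc.) rather than only between jobs, but the queue-ordering intuition is the same.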
If you have a high-availability, latency-sensitive service that's processing incoming requests in real time, you want it to be overprovisioned (to keep queue lengths short, to soak up a DoS before other mitigations kick in, etc.). But that's wasteful if you're idle half the time. (Or generally, off-peak -- you might explore spinning nodes up and down dynamically, but you probably still want some baseline amount of capacity in each of your availability zones.)
Another dimension you could explore (if your problem space allows it) is varying the amount of work you perform depending on your current load -- for instance, you could imagine a video server that serves 4k at light load, then degrades to 1080p at moderate load, and then degrades even further to 720p at heavy load.
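A sketch of that degrade-under-load idea, assuming a load fraction in [0, 1]; the thresholds and tier names are made-up placeholders:

```python
def pick_resolution(load: float) -> str:
    """Map current load (0.0-1.0) to a serving quality tier."""
    if load < 0.5:
        return "4k"      # light load: serve full quality
    elif load < 0.8:
        return "1080p"   # moderate load: degrade gracefully
    else:
        return "720p"    # heavy load: shed work, stay up

print(pick_resolution(0.3))   # 4k
print(pick_resolution(0.7))   # 1080p
print(pick_resolution(0.95))  # 720p
```

In practice you'd want hysteresis around the thresholds so quality doesn't flap as load oscillates near a boundary.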
That said, it depends on your cost model. If you're dealing with bare metal (or you are the cloud) where your principal operating expenses are space and power, then you might be fine with servers idling if power is expensive. If depreciation of your servers drives your costs (or if you get charged the same amount regardless of your actual utilization), then you might want to run your servers hotter.
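A toy cost model makes the two regimes concrete (all numbers below are invented for illustration):

```python
def cost_per_request(fixed: float, power_at_full: float,
                     utilization: float, capacity_rps: float) -> float:
    """Hourly cost divided by requests served that hour.

    fixed: hourly cost you pay regardless of load (depreciation, space).
    power_at_full: hourly power cost at 100% utilization, scaled linearly.
    """
    hourly_cost = fixed + power_at_full * utilization
    requests = capacity_rps * utilization * 3600
    return hourly_cost / requests

# Depreciation-dominated: running hotter cuts cost per request sharply.
print(cost_per_request(fixed=1.00, power_at_full=0.05, utilization=0.2, capacity_rps=100))
print(cost_per_request(fixed=1.00, power_at_full=0.05, utilization=0.9, capacity_rps=100))

# Power-dominated: cost per request is much flatter, so idling is cheap to tolerate.
print(cost_per_request(fixed=0.05, power_at_full=1.00, utilization=0.2, capacity_rps=100))
print(cost_per_request(fixed=0.05, power_at_full=1.00, utilization=0.9, capacity_rps=100))
```

With fixed costs dominant, going from 20% to 90% utilization cuts cost per request by roughly 4x in this toy model; with power dominant, the improvement is marginal.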
If you have two workloads with very different operational profiles (e.g. low-latency low-volume vs. high-latency high-volume), you could also process each workload in a different copy of the system -- sorta-kinda the hot-path/cold-path pattern.
It might sound like duplicating your system would make it more complex, but having two systems each doing one very specific thing tends to be simpler than one single system doing very different things.
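One way to picture the split: each request is routed to its own dedicated pipeline rather than one system juggling both profiles. This is a hypothetical sketch; the queue names, classifier field, and request shape are all assumptions:

```python
from queue import Queue

hot_queue = Queue()   # low latency, low volume: kept short, overprovisioned
cold_queue = Queue()  # high latency, high volume: batched, run near 100% busy

def route(request: dict) -> None:
    """Send latency-sensitive work to the hot path, bulk work to the cold path."""
    if request.get("latency_sensitive"):
        hot_queue.put(request)
    else:
        cold_queue.put(request)

route({"id": 1, "latency_sensitive": True})
route({"id": 2, "latency_sensitive": False})
print(hot_queue.qsize(), cold_queue.qsize())  # 1 1
```

Because the two paths never share a queue, you can provision, tune, and deploy each one for its own profile independently.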
In short, the classic setup is to co-schedule:

- a high-priority, latency-sensitive service with arbitrarily low utilization, and
- a low-priority, high-utilization service like batch processing to keep overall use near 100%