When this was my main focus at work, we had thresholds for CPU utilization, memory, and I/O, but you couldn't simply throw more traffic at a collection of servers until they hit one of those thresholds. Most of those thresholds were engineering limits, not steady-state targets, and if you ran up to them you generally had no elasticity left in the system for unplanned business events. We also broke things out by what was on the critical path for revenue generation versus what was running the business or value-add.

You also have to adjust your thresholds based on your DR/BCP plan, your fault tolerance design, and your operational requirements. If you were hot/cold between 2 sites or AZs with a 95% SLA, you could run things hotter than if you were hot/hot between 2 sites with a 99.9% SLA. It wasn't unusual for our CPU usage threshold to be 30%.
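To make the failover arithmetic concrete, here is a rough sketch of how a steady-state threshold could fall out of the site layout. The function name, the 80% engineering limit, and the 25% burst headroom are all hypothetical numbers picked for illustration, not the figures we actually used, and it assumes load is spread evenly and that a cold standby has the same capacity as the site it replaces.

```python
def max_steady_state_utilization(
    active_sites: int,
    sites_lost: int,
    engineering_limit: float,
    burst_headroom: float = 0.0,
) -> float:
    """Highest normal-day utilization per site that still leaves the surviving
    sites under the engineering limit after a failover (hypothetical model)."""
    if active_sites < 1 or not 0 <= sites_lost < active_sites:
        raise ValueError("need at least one surviving active site")
    surviving = active_sites - sites_lost
    # After failover each survivor carries active_sites / surviving times
    # its normal load, so the normal-day cap shrinks by the inverse ratio.
    post_failover_cap = engineering_limit * surviving / active_sites
    # Hold back a fraction of what is left for unplanned business events.
    return post_failover_cap * (1.0 - burst_headroom)


# Hot/hot: two sites share the load and either one must carry all of it alone.
hot_hot = max_steady_state_utilization(2, 1, 0.80, 0.25)

# Hot/cold: one site carries the load; the standby (assumed equal capacity)
# takes over the whole load, so per-site load does not grow on failover.
hot_cold = max_steady_state_utilization(1, 0, 0.80, 0.25)

print(f"hot/hot:  {hot_hot:.0%}")
print(f"hot/cold: {hot_cold:.0%}")
```

With these made-up inputs the hot/hot case lands at about 30% and the hot/cold case at about 60%, which is the kind of gap the SLA and failover design can open up.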

Ultimately the critical piece of knowledge is where your utilization vs latency curve turns into a hockey stick, and what bottlenecks drive that. We had to learn the hard way exactly what kind of throughput we could expect out of each layer of our infrastructure, because in many cases latency would spike under stress without a matching change in utilization, even though the two tracked each other under normal conditions. Learning how to pick out those early warning signs was an art.
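One way to put a number on the hockey stick from load-test data is a max-distance-from-chord heuristic: normalize both axes, draw a straight line from the first sample to the last, and take the point that sits furthest below it. This is only a sketch, not the method we used. The `find_knee` name and the sample numbers are made up, and it puts offered load on the x-axis instead of utilization, since utilization was not always a reliable signal under stress.

```python
from typing import List, Optional, Tuple


def find_knee(samples: List[Tuple[float, float]]) -> Optional[float]:
    """Return the load at which latency bends upward, or None if the
    data is too short or too flat to tell. samples are (load, latency)."""
    pts = sorted(samples)
    if len(pts) < 3:
        return None
    x_min, x_max = pts[0][0], pts[-1][0]
    y_min = min(latency for _, latency in pts)
    y_max = max(latency for _, latency in pts)
    if x_max == x_min or y_max == y_min:
        return None

    best_load, best_gap = None, 0.0
    for load, latency in pts:
        x = (load - x_min) / (x_max - x_min)
        y = (latency - y_min) / (y_max - y_min)
        # For a curve that stays flat and then shoots up, x - y is the
        # vertical distance below the chord; it peaks at the bend.
        gap = x - y
        if gap > best_gap:
            best_load, best_gap = load, gap
    return best_load


# Hypothetical load test: requests per second vs p95 latency in ms.
load_test = [
    (100, 20), (200, 21), (300, 22), (400, 24),
    (500, 30), (600, 80), (700, 200),
]
print(find_knee(load_test))
```

On this made-up data the bend shows up at 500 requests per second, one step before latency visibly blows up, which is roughly where you would want the early warning to fire.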

We did some analysis to determine the mix of transactions for different types of business activity: BAU, an annual big event that changed the profile permanently, and a periodic big event that represented an impulse or one-time change. From that we could project expected CPU utilization with a model for each mix of transactions, driven by business metrics. We based all of our capacity recommendations on business volume projections by event type, so that we weren't asking our BAs for things like transactions per second by API, which in most cases would have just gotten us blank stares. Instead we'd ask them for a sales projection and apply our model to it.
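A stripped-down sketch of that kind of model might look like the following. Everything in it is hypothetical: the API names, the CPU cost per call, the calls-per-sale mixes, and the 64-core fleet are placeholders, and the real model was more involved than a single linear formula. The point is only that the input from the business side is a sales rate and an event type, not per-API traffic.

```python
from typing import Dict

# Hypothetical CPU cost of one call to each API, in milliseconds.
CPU_MS_PER_CALL: Dict[str, float] = {
    "browse": 5.0,
    "cart": 12.0,
    "checkout": 40.0,
}

# Hypothetical average API calls generated per completed sale, by event type.
CALLS_PER_SALE: Dict[str, Dict[str, float]] = {
    "BAU": {"browse": 30, "cart": 3, "checkout": 1},
    "periodic_event": {"browse": 12, "cart": 2, "checkout": 1},
}


def projected_cpu_utilization(
    event_type: str, peak_sales_per_second: float, total_cores: int
) -> float:
    """Translate a sales projection into expected fleet CPU utilization."""
    mix = CALLS_PER_SALE[event_type]
    # CPU time one sale costs across all the API calls it triggers.
    cpu_ms_per_sale = sum(calls * CPU_MS_PER_CALL[api] for api, calls in mix.items())
    # CPU-seconds consumed per wall-clock second equals cores kept busy.
    busy_cores = peak_sales_per_second * cpu_ms_per_sale / 1000.0
    return busy_cores / total_cores


for event, sales_rate in [("BAU", 50.0), ("periodic_event", 200.0)]:
    utilization = projected_cpu_utilization(event, sales_rate, total_cores=64)
    print(f"{event}: {utilization:.1%}")
```

The projected figure can then be compared against whatever steady-state threshold applies to that tier, and the gap tells you how much capacity to ask for.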