> Scheduled tasks are a great way to brown-out your downstream dependencies.
I've been trying to convince my division to prioritize adding events to our stats dashboard.
Comparing response times to CPU times is just the expected level of effort for interpreting the graphs. But you don't have any visibility into how a cron job, service restart, rollback, or reboot of a host caused these knock-on effects. And without that data you get these little adrenaline jolts at regular intervals when someone reports a false positive. That's especially true in pre-prod, where corners get cut on hardware budgets and deploying a low-churn service may mean it's down for the duration.
This might end up being a thing I have to do myself, since everyone else just nods and says that'd be nice.