> if I can automate a runbook can I not just make the system heal itself automatically
Today the runbooks are still codified by a human. We are experimenting with data to see whether we can generate accurate runbooks for different scenarios, but haven't had much luck with it yet. I do think some percentage of issues will be abstracted away in the near future, with machines doing the healing automatically.
> you should invest in toil reduction to stop having those repeated problems.
Most teams I speak to say they try their best to avoid repeating the same issue. Users typically use PlayBooks for:
(a) A generic scenario where an issue has been reported or alerted on and you want to test 3-4 hypotheses (potential failure reasons) at once.
(b) A case where you want to run a definitive sequence of steps.
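
To make (a) concrete, here is a rough sketch of checking several hypotheses at once. It is not the PlayBooks API; the check functions, their names, and the hard-coded placeholder values are all hypothetical stand-ins for whatever queries your own monitoring supports.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical checks: each returns True when its hypothesis looks like the cause.
# The hard-coded numbers are placeholders; real checks would query your systems.
def disk_nearly_full() -> bool:
    usage_percent = 91
    return usage_percent > 90

def deployed_recently() -> bool:
    minutes_since_deploy = 240
    return minutes_since_deploy < 30

def upstream_errors_elevated() -> bool:
    error_rate = 0.002
    return error_rate > 0.01

def run_hypotheses(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every check concurrently and report which hypotheses held."""
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = {name: pool.submit(check) for name, check in checks.items()}
        return {name: future.result() for name, future in futures.items()}

results = run_hypotheses({
    "disk nearly full": disk_nearly_full,
    "deployed recently": deployed_recently,
    "upstream errors elevated": upstream_errors_elevated,
})

for name, held in results.items():
    print(f"{name}: {'likely' if held else 'ruled out'}")
```

Case (b) would be the same idea with the steps run one after another instead of concurrently.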