
In the Google SRE book they say that if a service has reliability much better than the stated SLO they artificially introduce errors to get closer to the error budget.

This is to prevent over-reliance on the measured SLO rather than the stated SLO in upstream services.




This sounds very odd.

Why ride the line? It would take just one major issue and you'd be way over your error budget.

For testing/simulations I can see why you'd introduce the errors.


My understanding is that they don't bring the service down to exactly the SLO. To prevent over-reliance on a service, it can be sufficient to introduce some level of failure, which may still leave reliability well above the SLO.

http://danluu.com/google-sre-book/#chapter-4-service-level-o...
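
To make "some level of failure" concrete, here is a minimal sketch of a wrapper that fails a small fraction of calls on purpose. The names `with_fault_injection`, `InjectedFault`, and `failure_rate` are made up for illustration; this is not how Google or the SRE book implements it, just one way the idea could look.

```python
import random


class InjectedFault(Exception):
    """Raised instead of calling the real handler when a fault is injected."""


def with_fault_injection(handler, failure_rate, rng=None):
    """Wrap `handler` so that roughly `failure_rate` of calls fail on purpose.

    `failure_rate` is a hypothetical knob in [0.0, 1.0]; it would be chosen
    so that real plus synthetic failures stay inside the error budget.
    """
    rng = rng or random.Random()

    def wrapped(*args, **kwargs):
        # random() returns a float in [0.0, 1.0), so a rate of 0.0 never
        # injects a fault and a rate of 1.0 always does.
        if rng.random() < failure_rate:
            raise InjectedFault("synthetic failure")
        return handler(*args, **kwargs)

    return wrapped


# Example: fail about 0.01% of lookups even though the backend is healthy.
lookup = with_fault_injection(lambda key: f"value-for-{key}", failure_rate=0.0001)
```

Callers that never see a failure tend to skip retries and fallbacks, so even a tiny rate like this keeps those code paths exercised.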


After one major issue, you stop artificially inserting errors. The errors are inserted at a rate low enough that you can turn the injection off in a timely manner and still stay within budget.
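
A rough sketch of that arithmetic, under my own assumptions: the budget is counted in failed requests per window, and half of it is held back as a reserve for real incidents. The function names and the `reserve_fraction` parameter are hypothetical, not something from the SRE book.

```python
def error_budget(slo, expected_requests):
    """Failed requests the SLO allows in the window (e.g. slo=0.999)."""
    return (1.0 - slo) * expected_requests


def safe_injection_rate(slo, expected_requests, organic_failures,
                        reserve_fraction=0.5):
    """Fraction of requests that could be failed on purpose.

    Only the part of the budget outside the reserve is spendable, and
    failures that already happened on their own are subtracted first.
    Returns 0.0 once that spendable part is used up, which corresponds
    to turning the injection off after a major issue.
    """
    budget = error_budget(slo, expected_requests)
    spendable = budget * (1.0 - reserve_fraction) - organic_failures
    if spendable <= 0:
        return 0.0
    return spendable / expected_requests


# 99.9% SLO over 1,000,000 requests allows about 1,000 failures.
# With 100 organic failures and half the budget reserved, about 400
# failures are left to inject, i.e. a rate of roughly 0.0004.
rate = safe_injection_rate(0.999, 1_000_000, organic_failures=100)

# After an incident that burns 600 failures, injection stops.
rate_after_incident = safe_injection_rate(0.999, 1_000_000, organic_failures=600)
```

The reserve is what answers the "one major issue" worry: the injected errors only ever spend the part of the budget you were willing to give up.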


Netflix has Chaos Monkey. The purpose is to find unexpected flaws and risks.



