
This is an idea that I hope becomes more widespread among operations folks: prioritizing which alerts to fix based on how many of them result in wakeup calls.

At the Velocity conference this year, Etsy gave an amazing talk on sleep and being on call (I can't seem to find it on YouTube?). They released an open source app that links their on-call system to a sleep tracker (Jawbone or Fitbit) at https://github.com/etsy/opsweekly. They also had a nice graph showing how they were woken up less over the year because of this system.
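
To make the ranking idea concrete, here's a rough sketch of how you might count wakeups per alert. This is not how opsweekly does it. The `count_wakeups` function, the alert names, and the timestamps are all made up for illustration, and it assumes you can export your pages and the on-call person's sleep intervals as plain timestamps.

```python
from collections import Counter
from datetime import datetime


def count_wakeups(pages, sleep_intervals):
    """Count, per alert name, the pages that fired while on-call was asleep.

    pages: iterable of (alert_name, fired_at) tuples (hypothetical export format)
    sleep_intervals: iterable of (sleep_start, sleep_end) datetimes
    """
    wakeups = Counter()
    for alert_name, fired_at in pages:
        # A page counts as a wakeup if it lands inside any sleep interval.
        if any(start <= fired_at < end for start, end in sleep_intervals):
            wakeups[alert_name] += 1
    return wakeups


# Made-up sample data: two nights of sleep and four pages.
sleep_intervals = [
    (datetime(2024, 7, 1, 23, 0), datetime(2024, 7, 2, 7, 0)),
    (datetime(2024, 7, 2, 23, 30), datetime(2024, 7, 3, 6, 30)),
]
pages = [
    ("disk_full", datetime(2024, 7, 2, 2, 15)),
    ("disk_full", datetime(2024, 7, 3, 3, 40)),
    ("queue_backlog", datetime(2024, 7, 2, 14, 0)),  # daytime, not a wakeup
    ("cert_expiry", datetime(2024, 7, 2, 23, 45)),
]

# Alerts that cost the most sleep come first; those are the ones to fix first.
for alert_name, n in count_wakeups(pages, sleep_intervals).most_common():
    print(f"{alert_name}: {n} wakeup(s)")
```

Sorting by wakeups rather than by raw page count means a noisy daytime alert doesn't crowd out the one that keeps dragging someone out of bed.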

Using this metric as a second layer behind your primary error budget (the more downtime the app has, the fewer chances your devs get to deploy new features) is a nice way to keep your operations staff very happy.



