
Fantastic idea. When firefighters screw up, people die. So we have a large set of very practical guidelines and most of them exist because somebody died.

I'd love to see outages (for example) managed with the Incident Command (IC) system. Maybe some companies do this already, IDK. But IC works very well in any "WTF is happening?" emergency situation.




Yeah, topics like that have already been covered fairly well. I'm more interested in "tactical" level discussions. Avoiding (or accounting for) tunnel vision, having muscle-memory level familiarity with your tools, the importance of pre-planning, etc. Specific things that can be meaningful to an individual or small team.


At Google, that's the standard way to manage incidents: https://landing.google.com/sre/sre-book/chapters/managing-in...

(I assume it's the case in many other places)



