
In another comment, I pointed out a mistake of mine that was a major factor in an outage.

I also screw up all the time in ways that would cause outages if we didn't have automated tests, TSan/ASan, code reviews, a staging environment, various safety checks, experiment gates, pre-mortems, slow rollout procedures, an alert on-duty SWE and on-call SRE, etc.

Today one of my mistakes was caught early in the prod phase of our push. That's much later than I would like but still before it did any real damage. I submitted the bad code last Wednesday and have been out sick with the flu (and caring for my preschool-aged kids) since then, so my awesome team handled my problem for me.



