
Plus it would be weird to present just that specific information, outside of the context of a post mortem / failure chain analysis type discussion.



That's true, though they are also saying things like "We created a permanent fix for the bug and began deploying it at 17:25." "Permanent fix" sort of implies they understood the issue really well.


That's my point though. Even though they may understand the immediate flaw in their code that caused the issue, there's not much use (for them or their customers) in just talking in detail about that specific flaw.

I'd go so far as to argue that the specifics of the flaw are immaterial right now. At this stage, the important thing is that they have identified a specific code change that was the proximate cause of the issue, and have a mitigation in place. Contrast this with more mysterious and hard-to-track-down failures ("We are working to understand why our systems are down and will post another update in 30 minutes").

What will take time, and what will be interesting, is the failure tree analysis. (You might hear the phrase "failure chain" or "root cause", but IMO it's quite rare for things to be so linear.) That can help identify opportunities to improve processes at many different levels of the product lifecycle.

Humans are fallible, and there's no way we can write bug-free software, so the solution has to be more robust than "hope that every member of our organization never makes a mistake again".


Yes, I was saying I would have avoided words like "permanent fix", because it sets unrealistic expectations.



