Dispatch – Open-source release of Netflix's crisis management framework

iruoy · on Feb 24, 2020

You can also read it here: https://outline.com/p3PBUY

Medium really annoys me, because I can't even scroll without disabling my adblocker.

athenot · on Feb 24, 2020

I have Safari configured to open all medium.com pages directly in Reader Mode. Takes care of all the noise/interruptions they throw up on the screen.

ksec · on Feb 25, 2020

How do you do that?

athenot · on Feb 25, 2020

- Open a medium URL.

- In your toolbar, click the site settings icon (might need to edit your toolbar if it's not there).

- Check: "When visiting this website [X] Use Reader when available"

ksec · on Feb 25, 2020

This is brilliant! Thank you.

regnerba · on Feb 24, 2020

Link doesn't work for me.

asdkhadsj · on Feb 24, 2020

Weird, failed for me too - I refreshed and then it worked. Wonder why.

iudqnolq · on Feb 25, 2020

I use Outline on average around once a day, and for the past few months it's needed a refresh somewhere between 10% and 20% of the time. I assumed that had to do with them fetching the page, so I'm surprised a cached GET failed.

rhizome · on Feb 24, 2020

took two tries for me, too

goojal · on Feb 24, 2020

do not trust a link from some stranger on "hacker"news!

madrox · on Feb 25, 2020

Incidents basically represent engineering culture in extremis. Seeing how large orgs manage incidents really says a lot about culture. It's interesting to see Netflix go so far to automate what amounts to trivial amounts of manual labor in (hopefully) rare instances. It says a lot about how they think about making mistakes and the developer experience of working through a crisis.

lifeisstillgood · on Feb 25, 2020

So much of this is generalisable to just running a project or a company (comms, collecting metadata, making smart automation decisions to save time, effort, and duplication).

There is a deep business transformation lurking here. As a post here says, Netflix clearly has at its heart "just automate it all".

luord · on Feb 25, 2020

Python, VueJS and Postgres.

That right there is my favorite stack for prototyping. Though, admittedly, I only say that because none of my prototypes have taken off (yet).

jf___ · on Feb 24, 2020

Fun to see `sentry.io` as one of the dependencies, kind of an interesting level of recursion on an incident mgmt app

doublerabbit · on Feb 24, 2020

So in other words another over-complicated ticket system

athenot · on Feb 24, 2020

No, there's a lot more that goes into handling incidents that affect large production systems. Getting things back up as fast as possible, coordination, communication, getting the right action items out of it. There are tradeoff decisions that need to be made, executives and big customers picking up the phone.

This kind of tooling is what arises in an effort to automate and streamline incident response. When you're operating at Netflix's scale, each minute is precious, and if a tool manages to save 45 seconds on each incident, it can be quite valuable.

rhizome · on Feb 24, 2020

The incident workflow about halfway down reads to me like a lingoed-up version of "create a bug, escalate, put a Slack (etc.) URL in the bug, send the bug to blamees/ondutys, message boss(es), finish fix and push, schedule a meeting for the next day." Which, it turns out, I guessed reasonably well, having read the rest of the article. I mean, there's decades behind this very use-case, and at the end of the day it's possible to hook out to Slack from RT, too. But they're not using RT, true.

https://rt-wiki.bestpractical.com/wiki/WorkFlow#Modeling_Wor...

I don't have a problem with the work -- like I said, it's a persistent use-case -- it's just the way it's described here, as if it wasn't and with puffery. And the thin-ness of my skin with this is not the issue!

edoceo · on Feb 24, 2020

I'm a small team and we had to get a custom CIC for all these reasons, just much smaller.