Then you certainly understand the importance of SLOs, and how SLAs regulate the trade-off between reliability and feature velocity.
Let’s say I’m RobinHood. Let’s pick an SLO. I think a three-nines (99.9%) monthly SLO is a good start; that budgets ~43 minutes of downtime per month. Maybe I can argue for a more aggressive SLO, but let’s pick this one - because I think it will keep users relatively happy, as trades aren’t blocked for more than an hour at worst. I drive an agreement with stakeholders that if we fall out of this SLO, we drop all feature work and focus on hardening reliability.
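To put numbers on that budget, here is a rough sketch of the arithmetic. It assumes a 30-day month and a simple availability SLO (fraction of time up); the function names are mine, not from any particular SRE tool.

```python
def error_budget_minutes(slo: float, window_days: float = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window.

    Assumes the SLO is a plain fraction of wall-clock time (e.g. 0.999).
    """
    total_minutes = window_days * 24 * 60
    return (1 - slo) * total_minutes


def budget_consumed(downtime_minutes: float, slo: float, window_days: float = 30) -> float:
    """How many times over the error budget a given outage is."""
    return downtime_minutes / error_budget_minutes(slo, window_days)


if __name__ == "__main__":
    # Three nines over an assumed 30-day month.
    print(round(error_budget_minutes(0.999), 1))      # 43.2
    # A full-day outage measured against that budget.
    print(round(budget_consumed(24 * 60, 0.999), 1))  # 33.3
```

So under these assumptions a whole-day outage burns roughly 33 months' worth of a three-nines budget in one go.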
RobinHood was out for a whole day. This is unacceptable. It points to a complete organizational fuck up - product and feature development have too much power and priority at the expense of reliability.
I’m not sure that RobinHood has ever heard of SLOs or reliability engineering. I really hope their leadership is smart enough to hire and empower the right people that will drive organizational change.
Why would they burden themselves and their feature velocity with SLOs/SLAs when they can build a 5 billion dollar company insanely quickly even though they have downtime?
The users are not saying "We measured your five nines and I'm going to quit if you have 6 minutes more downtime."
Sure, they lose some users who get annoyed, but they have a 5.6 billion dollar company. Some users will go; a lot more are coming.
Users are saying “you were down for an entire day and I lost money - I’m out”.
Your reliability target is a product decision. Maybe with the right features the market will tolerate a shitty, unreliable financial service that falls over for an entire day. Or maybe RobinHood will go from a 5.6 billion dollar company to a zero dollar company because users hate them.
Point is, high reliability is a choice based on priorities - one that RobinHood does not seem to care about. And I will certainly stay the fuck away from their platform.
This works in the acquisition phase, which I suspect Robinhood is nearing the end of.
Once their userbase moves into the retention or conversion phases (competitors have $0 trades now, too), mistakes like this are much more costly in the long term.
You're missing the point. Reliability and performance are features in financial markets. They are key features for brokerages, which constantly advertise them to differentiate themselves. These companies lay undersea cables to shave a few milliseconds off latency and pay a very hefty premium to be colocated in the same DC/rack as the stock exchange. Therefore performance and reliability are inseparable.
Nobody is debating whether people will continue using RH, and that was never the issue. RH has massively damaged its reputation and reputation _is_ everything.