
Nobody said that. Things are already broken. "Move fast and fix things" is what they need to do.



Well, to be honest, we don't know how badly things are broken; only they do. But moving fast while doing things perfectly is the holy grail of software development, and it's not as easily achieved as you make it appear.


I only said it needed to be done fast, regardless of how much work it takes. Of course it's not easy.

I'm surprised by all the misinterpretation in this thread. It seems to reflect the laid-back West Coast/SV attitude that isn't a good fit for high-pressure, time-sensitive work in other industries.


You seem to think (or at least imply) that hard things can be done fast if only you work hard enough, that it's just a matter of trying. That is just not true, and I'm not sure how else your posts should be interpreted. I've spent days with a team (yes, a competent team) just tracking down a bug, let alone fixing it (although once it's tracked down, the fix is usually relatively quick). If this issue involves multiple systems in a highly complex environment, then it could very well take a while to address fully, no matter how hard they work at it.


Because that's usually the case. Crunch time, disaster recovery, and emergency fixes are common in every sector from video games to aerospace. If you can't fix it, then switch to a secondary, rebuild from backup, throttle users, process manually, or do anything other than be completely down.

RH wasn't prepared with any contingency plan. They should have a resolution for their users - even if they can't find or fix the original cause. That's the failure I'm talking about.

See the two other users in this thread who describe similar high-pressure situations.


Depends on what the issue is. It could be something that can be quickly fixed, or not. Although, going by the lack of a resolution, I'm assuming it's not.

The reality is, without knowing more about what's causing this, it's impossible for either of us to say. If there is indeed some fundamental bottleneck that wasn't previously known, then I certainly won't be surprised if it takes a while to sort out.

Now, you can say they should've load tested, capacity planned, etc. But we are where we are. They still can't go back in time to turn this into a quickly fixable problem if it currently isn't one.

Edit: I'm also pretty disappointed that we don't know more about the root cause. As a user, I'd want to know what the issue was and what they plan to do about it, so I can evaluate whether I should trust them going forward.


> needed to be done fast, regardless of how much work

This is the part you don't understand. There is a difference between digging ten one-foot-deep holes and digging one ten-foot-deep hole. People need time to plan how to coordinate and get on the same page before everyone can work at their own pace. That is the part that is not parallelizable, and it is the rate-determining step.


The context is all lost here. Plenty of other companies and industries have emergency-action and disaster-recovery procedures. People don't work at their own pace; they work to the deadline with solid procedures. They can fix or replace entire components to restore service ASAP because that's the priority.

If this sounds unfamiliar or onerous, it's because you and others may never have experienced teams that work this way. Robinhood clearly lacks this experience and disaster planning.




