
> This should be fixed within days even if it means redesigning the entire system.

Please tell me you do not manage a team of software developers.




For the last 20 years I have been working in heavily regulated environments; in cock-ups like this, where your retail clients got royally screwed (for a second time in such a short period) on a repeat issue, the hammer will fall HARD on their heads. Anyone who wanted to sell to minimize losses has been severely damaged.

There are many ways to resolve these issues FAST. The easy way is to throw money at it. Go tomorrow (literally), bypass all procurement controls, go to a mega big provider, and scale this asap. Any financial services firm with a half-decent IT department has done the paper exercise for this scenario (at least for the purposes of BCP/DRP). The slow/better/mature way may be too slow, especially with the current market conditions.

The RH folks will definitely get a visit from the SEC and their external auditor, and their external auditor will get a visit from the SEC. (Their auditor will be in the deepest of shits: how come they failed to spot such a going-concern issue? What the hell were they looking for on their audits? Did they only send juniors over there?)

I feel sorry for the retail traders that got knocked down. I think anyone locked in after buying at 27-28k (US30) should wait 6 months to break even, and after the US elections (irrespective of the winner) there will probably be a rally.


This is why you would use IB for a retail brokerage account versus the "cooler app" if you care about execution and uptime.


They aren't that fantastic either. They had issues today as well; I couldn't exit out of spreads, so I had to liquidate everything, and even that was failing. It was a hot mess. Edit: I didn't actually want to liquidate all my holdings, but it was better than staying in and risking everything else.


"redesigning the entire system" was an exaggeration. They have a serious problem if that's really what they need to do.

Rolling out fixes fast, even if they require intensive changes, is completely reasonable and expected in many industries, though. They should have the talent and procedures to get it done. This isn't some startup web app; it's a multi-billion-dollar broker managing millions in client funds. Having 3 outages in 2 weeks borders on incompetence.

EDIT: What exactly is everyone disagreeing with?


> Rolling out fixes fast, even if they require intensive changes, is completely reasonable and expected in many industries though.

This is a financial trading platform. Do you understand the risks of potentially introducing a different bug?


While I don't agree with OP about replacing an entire system overnight, I do remember a friend who worked on a trading floor, and when they had bugs, their manager would say "the traders are taking a 30-minute lunch, you have that long to fix it," with the clear implication that they'd be fired if they didn't.

So I'm not sure the state of the art for trading platforms is as rock solid as everyone is implying, and Robinhood seems to be far off from whatever gold standard there is (see the infinite leverage bug). So I don't think it's crazy for them to move quickly to fix it.

I'd never be able to stomach the pressure, and I wouldn't wish it on others, but it doesn't seem crazy.


I understand, but the risks need to be weighed against restoring service for users who might be losing money and against avoiding regulatory fines.

Changing a single line can introduce a different bug. Use proper QA and testing to catch as many as possible, as with any development.

My emphasis is on getting things fixed quickly. They need to do whatever it takes to get systems online asap. Not sure what's so controversial about that.


Because move fast and break things doesn't work in finance.


Nobody said that. Things are already broken. Move fast and fix things is what they need to do.


Well, to be honest, we don't know how badly things are broken; only they do. But moving fast while doing things perfectly is the holy grail of software development, and it's not as easily achieved as you make it appear.


I only said it needed to be done fast, regardless of how much work. Of course it's not easy.

I'm surprised by all the misinterpretation in this thread. It seems to reflect the laid-back West Coast/SV attitude that isn't a good fit for high-pressure, time-sensitive work in other industries.


You seem to think (or at least imply) that hard things can be done fast if only you work hard enough at it, that it's just a matter of trying. This is just not true; I'm not sure how else your posts should be interpreted. I've spent days with a team (yes, a competent team) just tracking down a bug, let alone fixing it (although usually once it's tracked down it's relatively quick to fix). If this issue involves multiple systems in a highly complex environment, then it could very well take a while to address fully, no matter how hard they work at it.


Because that's usually the case. Crunch time, disaster recovery, and emergency fixes are common in every sector from video games to aerospace. If you can't fix the root cause, then switch to a secondary, rebuild from backup, throttle users, process manually, or do anything other than be completely down.
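
To make the "throttle users" option concrete, here's a rough sketch of a load-shedding guard in plain Python. It's purely illustrative (the LoadShedder class, the flaky_upstream stub, and the thresholds are all invented for this example, not anything RH actually runs): once the recent error rate crosses a threshold, only a fraction of requests get through and the rest get a fast "try again" answer instead of hanging.

    # Illustrative sketch only -- not Robinhood's actual architecture.
    # "Throttle users" fallback: when the backend looks degraded, shed a
    # portion of incoming requests so the service stays partially up
    # instead of going completely down.
    import random

    class LoadShedder:
        """Tracks recent successes/failures and sheds load when errors spike."""

        def __init__(self, max_error_rate=0.2, window=100, admit_fraction=0.25):
            self.window = window                  # how many recent calls to track
            self.max_error_rate = max_error_rate  # failure ratio that triggers shedding
            self.admit_fraction = admit_fraction  # share of requests admitted while degraded
            self.results = []                     # True = success, False = failure

        def record(self, ok):
            self.results.append(ok)
            if len(self.results) > self.window:
                self.results.pop(0)

        def degraded(self):
            if not self.results:
                return False
            return self.results.count(False) / len(self.results) > self.max_error_rate

        def admit(self):
            # Healthy: admit everything. Degraded: admit only a random fraction,
            # giving the rest a fast "retry later" answer instead of a timeout.
            return (not self.degraded()) or random.random() < self.admit_fraction

    def flaky_upstream(order_id):
        """Stand-in for the real order backend; fails ~40% of the time here."""
        if random.random() < 0.4:
            raise RuntimeError("upstream error")
        return {"order_id": order_id, "filled": True}

    shedder = LoadShedder()

    def handle_order(order_id):
        if not shedder.admit():
            return {"status": "throttled", "retry_after_s": 5}
        try:
            result = flaky_upstream(order_id)
            shedder.record(True)
            return {"status": "ok", **result}
        except RuntimeError:
            shedder.record(False)
            return {"status": "error", "retry_after_s": 5}

    if __name__ == "__main__":
        from collections import Counter
        print(Counter(handle_order(i)["status"] for i in range(500)))

The point isn't the specific numbers; it's that a degraded answer delivered instantly beats an app that is simply dark.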

RH wasn't prepared with any contingency. They should have a resolution for their users - even if they can't find or fix the original cause. That's the failure I'm talking about.

See the 2 other users in this thread that describe similar high-pressure situations.


Depends on what the issue is. Could be something that can be quickly fixed or not. Although going by the lack of a resolution I’m assuming it’s not.

Reality is, without knowing more about what’s causing this it’s impossible for either of us to say. If there is indeed some fundamental bottleneck that was previously not known, then I certainly won’t be surprised if it takes a while to sort out.

Now you can say they should’ve load tested, capacity planned etc etc. But we are where we are. Still can’t go back in time to turn this into a quickly fixable problem if it’s currently not.

Edit: also pretty disappointed that we don't know more about the root cause. As a user I'd want to know what the issue was and what they're planning to do about it, so I can evaluate whether I should trust them going forward.


> needed to be done fast, regardless of how much work

This is the part you don't understand. There is a difference between digging ten one-foot-deep holes and digging one ten-foot-deep hole. People need time to plan how to coordinate and then get on the same page so that everyone can work at their own pace. That is the part that is not parallelizable, and it is the rate-determining step.


The context is all lost here. Plenty of other companies and industries have emergency action and disaster recovery plans. People don't work at their own pace; they work to the deadline with solid procedures. They can fix and replace entire components to restore service ASAP because that's the priority.

If this sounds unfamiliar or onerous, it's probably because you and others have never worked on teams that do this. Robinhood is clearly lacking this experience and disaster planning.




