On the professional side of this, if you're an engineer at RH in the thick of it: many have been there. It seems dire now, but in a few years the fog, panic, and haze of no sleep will become a story you tell your peers at happy hour.
Many will cast stones - but they have been there too. If they haven't, well, maybe their day will also come. You may feel bad at the moment, but the best way forward professionally is "We try our best tomorrow."
If this were an outage directly caused by a natural disaster, I could understand. This outage was an availability problem. This clearly points to some prioritization problems within the leadership layers if robust and resilient infrastructure was not emphasized.
The prioritization problems may not be due to ignorance or malice though, and may be justifiable if there are other fires that are burning brighter. It's still pointing to problems though, and I think it's completely legitimate for engineers to question the stability of the company when this sort of thing happens.
At the very least, as an engineer I would be asking some pointed questions of my leadership. Maybe not dusting off the resume yet, but I'd still want reassurance internally that the leadership problems that caused this are being addressed.
Sometimes you just have to cut them some slack. Have you engineered a highly available cluster before? I'm not talking about the hot-standby Postgres master that gets called on once every 2 years; I'm talking about a 180-node Cassandra cluster that's doing 15,000 writes a second 24/7 and peaking at 60,000 writes a second every day, where you have to do node replacements every week or two because of the high load.
Or a 200-node Hadoop cluster that's doing the electrical metering and billing for 8 million people, and is NOT allowed to stop.
Or the trading platform that's running sub-millisecond trades, where downtime means $300,000 USD per minute.
These are systems I have engineered over the last 10 years, and I can say: these things are complex and can fail in 1,000 different ways, and while you're monitoring 999 of them, that one thing you're not looking at is festering under the surface (your monitoring system is tracking IRQ hardware interrupt response times, right???)
Part of being in a team is everyone pulling together, and yes, it's stressful at the time, but even very good management can't see all ends, just like very good engineering can't predict everything. I don't think it's useful to start pointing the finger at management and "asking some pointed questions of leadership", because sometimes everyone is doing their best. Yes, we should analyse our failures so we can do better, but your tone is very accusatory. I believe a better approach is an all-inclusive chat about how we can do better, and management saying "great job, engineering" for fixing it and giving them a break after the stressful event.
Does the duration of their downtime suggest a “1/1000” unmonitored oversight? Or is it more like a threshold that was met and probably could/should have been observed?
And FWIW, they have downtime every day and weekend, at least in a virtual sense; the load does drop off in a very real sense too. You are spiritually correct: they should pull together and sort it out, and they owe nobody money here (don’t use a discount broker if you want some sort of guarantee about trades), but as a general rule you should never feel too sorry for a banker under just about any circumstances. The harshest lesson here, for everybody, was that the only thing they would do for you was give you some commission-free trades, but that won’t work with this one, so a non-apology is what you get.
I think you may be focusing on the finger instead of the thing that it's pointing at.
The post reads to me like all those examples were meant to be concrete examples to drive home a more general argument that complex systems are, well, complex, and that there's an element of hubris in taking potshots from the peanut gallery.
I think the original point in this sub-thread boils down to: basic micro-level human error like typos + bad configuration deploys is completely understandable (to a certain extent), but macro level failures that happen by ignoring obvious trends and best practices is malfeasance.
Personally I don't think Robinhood will ever release a full honest post-mortem and so we'll never know (and never be able to judge fairly).
If the system failed by virtue of being too complex, that is also malfeasance, because any devops/SRE worth their salt (as might be expected at a 7 BILLION DOLLAR company) should smell unnecessary complexity from a mile away and slowly refactor it away over the course of several years - which, looking at Robinhood's downtime history, they never did.
The closest example to Robinhood's engineering woes is Reddit, which throughout its early history made fairly poor infrastructure and data modeling decisions but has since repaired and improved on them. We should hold Robinhood to higher expectations than Reddit for obvious reasons. Them having similar engineering capability to circa-2012 startup Reddit is INEXCUSABLE.
As with any big system, spinning it up is much harder than bringing it down. After an outage, they have to stay offline to audit their systems to ensure that all the nodes are synchronized, all queued trades have been processed, and no accounts are in invalid states. I'm sure they could have restarted in a matter of minutes, but the risk is ridiculously high.
No doubt there are many complex systems and they inevitably go down. Every provider has suffered meaningful outages.
I think the issue here isn’t so much that the system went down but the blog post.
It’s very light on details and doesn’t go far enough in terms of re-establishing trust with the customers that were affected. Which by the looks of it is everyone attempting any trade most of the day on Monday.
On the other hand, they've had plenty of time and resources to do just that in a reliable fashion, it's not like it's one guy in his bedroom (I hope!). It's not like they are volunteers doing this open source for the community, they are getting paid (very well, I assume) to run the system. And Management is getting paid (even better, I assume) to make sure the priorities are right and correct decisions are taken. "Who could've known there might be a lot more traffic" sounds like somebody failed in Management, and engineering might have failed by not foreseeing the issue and/or informing Management.
Sure, don't burn people at the stake, but "hey, it's hard, don't blame them, they are doing their best" doesn't cut it for me. I'm sure they're expecting to be paid and not for someone to "do their best" to pay them.
Can you give me a concrete example of a massive distributed system that has zero downtime?
Because the largest distributed system I have seen and worked on was at Apple (or maybe DFP at Google) - and even though they had some of the smartest people in the world and literally billions of dollars behind them, there were still an endless list of problems and downtime events.
The point isn't that "a system cannot fail", the point is "if the system fails, it's no big deal, shit happens, cut them some slack" is a weird way to look at it for corporate systems, especially in sensitive areas.
If you're running an HA system and you only need one nine to express your availability percentage, then sure, sure, you have the smartest people and you're doing such a great job, and yeah, yeah, show me one system that has 100% uptime, etc.
I didn't say it's no big deal; you're extrapolating and exaggerating my words because your argument is weak.
My point was that failure is inevitable in any complex system, and I was responding to the parent's point: he immediately pointed the finger at management in an accusatory way, and I was saying that's not constructive.
Also, your point "They expect to be paid" is actually implicitly "I expect management to do their best to pay me" - there could be a failure in the payroll system, there could be a failure in the banks, there could be many reasons outside management's control that mean I'm not getting paid. I can say "why don't you have redundant payroll systems" (which is a stupid waste of resources given the cost/benefit/low failure rate). But my point is again: complex systems have failures - and SOMETIMES, JUST SOMETIMES, YOU CAN CUT THEM SOME SLACK.
When a fiduciary breaks their duty to their clients, you don’t cut them slack. You sue them. This isn’t like Silicon Valley where you can get away with antics like this.
You must be new here, welcome to late stage capitalism. Nobody rich goes to jail, and lawsuits are cost of business. You just factor them into the 5 billion dollar company, pay your 300M dollar fine and walk away a billionaire.
Google doesn’t target zero downtime. The marginal cost is too high. For important services (like Search page and ads) they aim for 5 nines uptime (99.999%), which translates to 5 minutes of downtime per year.
Then surely you understand the importance of SLOs, and how SLOs and SLAs regulate the trade-off between reliability and feature velocity.
Let’s say I’m RobinHood. Let’s pick an SLO. I think a three-nines monthly SLO is a good start; that budgets ~43 minutes of downtime per month. Maybe I could argue for a more aggressive SLO, but let’s pick this one, because I think it will keep users relatively happy as trades aren’t blocked for more than an hour at worst. I drive an agreement with stakeholders that if we fall out of this SLO, we drop all feature work and focus on hardening reliability.
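For anyone who wants to sanity-check those numbers, here's a quick sketch of how an error budget falls out of an availability SLO (the function name and the figures are purely illustrative):

```python
def downtime_budget_minutes(slo: float, window_days: float) -> float:
    """Allowed downtime for a given availability SLO over a window."""
    return (1 - slo) * window_days * 24 * 60

# Three nines over a 30-day month: roughly 43 minutes of budget.
print(downtime_budget_minutes(0.999, 30))      # ~43.2
# Five nines over a year: the ~5 minutes quoted upthread for Google's targets.
print(downtime_budget_minutes(0.99999, 365))   # ~5.3
# A full 6.5-hour trading day of downtime blows the monthly budget ~9x over.
print(6.5 * 60 / downtime_budget_minutes(0.999, 30))  # ~9.0
```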
RobinHood was out for a whole day. This is unacceptable. It points to a complete organizational fuck up - product and feature development have too much power and priority at the expense of reliability.
I’m not sure that RobinHood has ever heard of SLOs or reliability engineering. I really hope their leadership is smart enough to hire and empower the right people that will drive organizational change.
Why would they burden themselves and their feature velocity with SLOs/SLAs when they can build a 5 billion dollar company insanely quickly even though they have downtime?
The users are not saying "We measured your 5 9's and I'm going to quit if you have 6 minutes more downtime"
Sure they lose some users who get annoyed, but they have a 5.6 billion dollar company, some users will go, a lot more are coming
Users are saying “you were down for an entire day and I lost money - I’m out”.
Your reliability target is a product decision. Maybe with the right features the market will tolerate shitty unreliable financial services that falls over for an entire day. Or maybe RobinHood will go from a 5.6 billion dollar company to a zero dollar company because users hate them.
Point is, high reliability is a choice based on priorities - one that RobinHood does not seem to care about. And I will certainly stay the fuck away from their platform.
This works in the acquisition phase, which I suspect Robinhood is nearing the end of.
Once their userbase turns into the retention or conversion (competitors have $0 trades now, too) phases, mistakes like this are much more costly in the long term.
You're missing the point. Reliability and Performance are features in financial markets. They are key features for brokerages, which they constantly advertise to differentiate themselves. These companies lay undersea cables to shave off a few milliseconds of latency and pay a very hefty premium to be colocated in the same DC/rack as the stock exchange. Performance and Reliability are therefore inseparable.
Nobody is debating whether people will continue using RH and that was never the issue. RH has massively damaged its reputation and reputation _is_ everything.
> Or the trading platform that's running sub-millisecond trades, where downtime means $300,000 USD per minute.
I mean, I'll bite. Assuming you only traded 6 hours a day (i.e. US hours), that'd be a $27bn-a-year strategy, and the only way for returns to be linear and trading to be sub-millisecond is market making/arbitrage.
Kudos, these are moderate-sized systems you've built over your career. There are a lot bigger and more mission-critical systems in the world, and you might build them one day.
I understand GP's tone wasn't exactly nice here. But here's the rub with RH's outage. RH is unfortunately in an industry (Finance, Healthcare, Aviation, Food, etc.) where people _need_ to trust them to be successful. The consequences of failure in these industries are catastrophic, not only for them but for their clients. Sure, failures happen, but the scale at which RH has failed and the lukewarm response they've put out have pissed people off. I don't recall any brokerage, old or new, that has failed so catastrophically and responded to it so poorly. If you think you have a worse example, I am all ears.
I don’t remember them offering any apology or explanation at all.
That’s an exchange mind you where things like the global price of oil and s&p futures trade. Not a small boutique brokerage.
Further they have planned downtime every week & at that point still had planned daily downtime I think.
I think Robinhood screwed up. I think they should learn a hard lesson. But people thinking that trading is some high reliability industry haven’t spent any time in it.
The scary thing to me is are healthcare, aviation & food the same?
Part of AWS's sell with elasticity is only spending what you need, but those industries have redundancies or unused capacity.
Someone in one of these threads said there's a hidden DNS within VPCs that can fail and isn't scaled, so if that's true, they might just have to architect around that unless they can get AWS to change it. It's on RH for not knowing that but it's also kind of on AWS too.
But as far as what you can do, you can really only split your cash across brokerages if you want to engineer the same redundancy yourself. Otherwise, RH would need to route everything to another exchange to keep satisfying orders, and even that is just another system that could fail. Keeping all of your money in one brokerage doesn't seem ideal if you want to completely avoid downtime. Doing the same redundancy yourself with those industries isn't really practical.
Boeing's failures have killed hundreds of people. Governments still pay them and people still fly on their planes. Stores sell salmonella-contaminated products all the time and people still shop there. RH's failure pales in comparison. Crypto exchanges fail all the time; people still use them. RH may lose a few customers in the short term, but I see no reason they won't bounce back: they provide a product people like, and the majority of people don't like change and will stay with them provided stability returns soon.
Non-technical people don't want a technical apology, they just want an 'our bad, working on it', which is what was provided. The company will be fine. Whether they should be is another question altogether.
Technically, people still fly the old Boeing planes that don't crash. The 737 MAX is still not in service, and there is a likelihood that it may never go back into service. All future orders are cancelled, and there isn't a clear pathway to the plane being re-certified and, more importantly, to people trusting them again.
High trust systems require just that, high trust. And once broken it's hard to re-establish.
Crypto exchanges certainly have their fair share of downtime issues, but don't forget that crypto exchanges for a long time operated purely for early adopters, as crypto wasn't something that everyone traded. There was also less competition available, because again the industry was newer and there were fewer choices.
And certainly Coinbase helped popularize crypto trading, and they had their fair share of issues, but I don't believe they had an outage of this exact magnitude, and again they were in an early-adopter area where mistakes are seen as part of the process. If not expressly, then at least subconsciously.
I think that we have entered a new 'trust' phase, where we pretty much don't care about it and just want familiarity. Look at Facebook, privacy has been violated a thousand times, and we still keep logging in. Experian is still chugging along. People used and paid for AOL for years when they did not have to.
Online consumption is different than in person. You go to a restaurant and the food is bad you probably don't go back. Online the bulk of consumers just keep going back because that's what they are used to. We love our favorites.
I remember all of AWS going down a couple years ago.
Boeing itself is fine even though one product killed hundreds. Robin Hood is going to be fine. This will be forgotten in a week.
It is not about scale, it is about the fact that people lost real money. If you can’t make it work you should not be in that business, and I don’t really care how hard they work.
I carry reasonable investment balances - I’m not an active trader but in this space I expect availability. I’d never put my money on RH - and this has nothing to do w risk profile
I've been trading for years, would not keep a penny on that platform. They've effectively cut off all liquidity for their customers for at least 2 days during high market volatility. You are missing out on tax loss harvesting, buying dips etc.
Nothing was stopping the Robinhood customers from opening an eTrade or TD Ameritrade account or something and doing their trading out of that platform for the duration of the outage. Robinhood isn't really an institutional platform in my understanding anyway.
I was a primary contributor on a migration of time series data to Scylla. As an anecdote, I once emailed our business contact about tracking down why we appeared to have data inconsistencies between our new (Scylla-backed) and old systems. I thought the e-mail got lost since we never heard back...until 8 months later (long after we had de-prioritized the migration since our old system was "good enough"), when they asked if we had tried the newly released version, which fixed a data loss issue.
Blew. My. Mind. Not only because of the radio silence and then dropping back in out of the blue as if no time had passed, but also because they had a data loss issue.
So I re-checked out my previous branch, upgraded Scylla versions, and sure enough, the data differences we were noticing before appeared to be resolved. I couldn't believe the amount of time I had spent combing through my code to see if I had a hard-to-detect bug somewhere...but nope, it was ScyllaDB (although I am sure there were plenty of other bugs...they just weren't the cause of this specific symptom).
I am actually a fan of ScyllaDB and what it's trying to do. Performance was great (as advertised) and management was simple enough; but they are going to need to work pretty hard to convince me "instability" is just a rumor after that experience not too many years ago.
Well, we moved to it from Cassandra. It's yet to fall over and we're querying it maybe 200k times per second. No changes were needed from the client driver side of things too. YMMV.
I've seen bigger, scarier, potentially costlier time-based bugs personally. I don't think this would make me reevaluate my employment if I were at Robinhood. As the parent says, you either learn these lessons the hard way or you haven't learned them yet. That doesn't translate to a "leadership failure."
Your smaller point about prioritization is spot on though. I don't believe I've seen any similar incidents lead to business-ending outcomes. I personally point to Sony or, more recently, Equifax as examples of the disparity between actual business impact and technical abhorrence. In light of that, why is it worth trying to preemptively solve technical challenges instead of business needs? Every calorie spent on "what if" subtracts from "what's needed."
Reminds me of the book Showstopper and the personal stories in it - it's about the creation of Windows NT. Pretty interesting how things were not so different some 30 years ago.
Important step though: have a retro (maybe many) and write a report explaining what was messed up and how you might mitigate it in the future. It looks like it's going to be a good one. If you can share a sanitised version publicly, that would hopefully make it all a little bit more worth it.
I think I speak for everyone here if I say that, if that report is public and interesting, everyone on this thread will be happy to get you a drink.
Robinhood opened up stock trading to a large portion of the population that would otherwise not have been interested in traditional trading platforms with high commissions.
Their success helped to pressure companies such as TD and Schwab to mostly get rid of commissions as well, which is great for the average trader
I think Robinhood has a lot of problems, but to say they're not pushing any boundaries ignores the huge changes they've brought to the industry.
The fees I am referencing were imposed by brokers, not exchanges. Our exchanges have stock splits, but that still doesn't make a $10 fee on a single $50 share very palatable to the small-time investor.
Having worked as a professional investor since 2012, I can say these outages can happen anywhere. I've seen day-long outages at exchanges where tens or hundreds of billions of dollars would have been trading, and at brokers where who knows how much would have traded. I've also experienced these outages at retail companies that are more established, including TD Ameritrade (I became a customer when ThinkOrSwim was acquired). I have also seen brokers screw over individuals on a significant scale without real ramifications.
The fact that Robinhood is telling people anything about the outage is only because they are the company they are, operating in the startup world/mentality.
To the people thinking they should be compensated in some way...If you are doing >$1m daily volume, maybe you can contact them to see what they can do but even then, I doubt it. The way this should be handled is to have multiple executing brokers. You can implement offsetting positions if needed and transfer positions when your main account becomes available, if you are using a broker that can clear. Right now it seems Robinhood is working to implement clearing but you could still go to neutral or put on your positions.
> The fact that Robinhood is telling people anything about the outage is only because they are the company they are, operating in the startup world/mentality.
Yep. Intercontinental Exchange and Eurex, two huge capital markets exchanges, routinely have multi-hour outages and don't even acknowledge that they've happened, let alone explain them.
I have mixed feelings of sympathy about this whole RH thing.
Anyone who has used RH regularly should be well aware of how inept it is. Any spike in volume or volatility, even on a single stock, brings it to its knees pretty often. Like not just the last week, but even during calm periods. I've personally lost 20-30% on positions solely because RH was bugging out; thankfully I use RH just for "fun trades", usually <$100.
I cannot fathom having the balls to trade any real amount of money on the platform while being aware of these long term issues.
On the flipside I feel for new users and perhaps even generally inactive users who weren't aware of RH's incredible flakiness. I'd imagine (or hope to) the losses of most of those users were small, assuming they were new or casual and just testing the waters.
Even if one of my small plays hit it big on RH, the money would just go to my main account on TD (which has been smooth all week shy of a few hiccups Fri morning during record volume). It's been obvious for a long time that RH should not and cannot be trusted. If you're trading options with a $60K account on RH, well, I don't even have words for that level of ignorance.
I abandoned Coinbase after having difficulties getting a few thousand bucks out of there. It worked out in the end.
Problems with my data I can tolerate up to a point. Problems with my money I absolutely can not tolerate. As you said, it's unfathomable how people can trade money on a platform that's flaky.
The interesting thing about working for a UK challenger bank - I now have visibility into all of the outages going on at large, high-street banks here.
Complete outages are rare, and well-publicised, but things go wrong a lot more[1] than you might think without any communications to customers that anything is wrong, sometimes outright denying[2] that there's a problem.
Everything has outages. Is this the new narrative now that we've moved on from the leap year thing? That RobinHood is just a bunch of shitty engineers?
There are no public details about the root cause.
I think RH is bad for people in general, but this pile-on is outrageous.
Robinhood crashing isn't an isolated unfortunate "well it happens to everyone" moment.
RH has constantly had issues at least since I started using it over a year ago. I didn't notice it really at first, but I also didn't know much about anything trading related back then. It didn't take long though for me to have my first "incident" where my market orders were seemingly vanishing into the abyss as the underlying moved. I'm not talking seconds, I'm talking minutes. For a market order on high liquidity options. Never mind trying to get filled at anything besides the ask (buying) or bid (selling).
RH has had serious underlying issues for a long time now. This incident didn't happen in vacuum. The writing has been in huge block letters on the wall for a long time.
> There are a couple of situations where outages are not normal or acceptable: 1. Dealing with other people's money 2. Monitoring/managing other people's health
Generally true, but there are a couple of exceptions to this rule: if everyone knows that the company is brand new and does not have an established reputation, then using that app requires a general acceptance of risk.
Robinhood was brand new, and outages should have been expected. The problem with Robinhood isn't the outage; it's that it was marketed to college students gambling with their parents' money, who know just enough about the stock market to be dangerous, but not enough to invest properly.
From what I've heard, the "teams" maintaining most of these aren't paid half as much as a mid-level FAANG team.
Luckily for everyone, those industries are so old, they have accidental redundancy built in (paper records for old doctors who can't be arsed to use a computer, etc.).
Saying 'everything has outages' is kind of disingenuous. There are many computer systems in the world today that can be considered to have practically perfect up-time. Mainframes have uptime measured in decades. I realize the concept of 1 gigantic iron box in a heavily-fortified installation with 2N+1 redundancies throughout is still not enough to ensure 100% uptime. But, when is the last time you swiped your credit card and had a failure to process the transaction?
I know quite a few people that were personally affected by this and lost money due to the two outages and they are all pulling their money from Robinhood. The fact that they can't offer any compensation might be a big problem for them, since they already have zero trading fees, which is what most brokerages offer as compensation.
Personally it doesn't pass the smell test for me. The load was much higher the previous week and load problems go away once the load disappears. They probably had a lot less load the rest of the day, so the fact they were down the entire day suggests it was something else. I would need a fully transparent post mortem before I believed anything they said.
You can't process the backlog on a trading platform. If i put in a trade at 2:20 pm and the system goes down, I don't want my trade to execute next morning at market open. That's insane. Especially the RH flavor of YOLO infinite leverage call option nonsense.
Exactly, you have to default to fill or kill within the trading day. You just can’t treat certain products like a standard queue... sometimes time is the most important component
FYI, FillOrKill/ImmediateOrCancel are not the same as a day order.
FoK/IoC means “do not queue this order”. It’s immediately filled (or not) (or, for IoC, partially) based on whatever orders are already in the book, and then you’re done.
Whereas a day order is queued until the end of the day or until it’s filled, whichever comes first.
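To make the distinction concrete, here's a toy Python sketch of the time-in-force semantics described above (illustrative names and logic only, not any real matching engine):

```python
from dataclasses import dataclass
from enum import Enum

class TimeInForce(Enum):
    FOK = "fill_or_kill"          # fill the whole order now, or cancel it
    IOC = "immediate_or_cancel"   # fill what's available now, cancel the rest
    DAY = "day"                   # rest in the book until filled or session end

@dataclass
class Order:
    qty: int
    tif: TimeInForce

def handle(order: Order, qty_available_now: int) -> str:
    """Illustrative handling only: FoK/IoC never queue, a day order does."""
    if order.tif is TimeInForce.FOK:
        return "filled" if qty_available_now >= order.qty else "canceled"
    if order.tif is TimeInForce.IOC:
        filled = min(qty_available_now, order.qty)
        return f"filled {filled}, canceled {order.qty - filled}"
    filled = min(qty_available_now, order.qty)
    return f"filled {filled}, {order.qty - filled} resting until end of day"
```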
Load problems don't go away when the load disappears. If the system isn't engineered very carefully (this takes a lot of work!), you may have cascading failures that may take hours to resolve, especially if you have bad retry policies (their mention of thundering herd problem seems to indicate that they might).
I would strongly caution anyone who thinks this subject is trivial ("just add a bit of load shedding and you're done"). I wrote a bit about my team's work (including a simplified view of some of the considerations that go into how we do retries) here: https://landing.google.com/sre/sre-book/chapters/handling-ov...
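The linked chapter covers this in depth; as a minimal illustration of one client-side piece, here's a sketch of capped exponential backoff with full jitter, which is the usual first defense against retry-driven thundering herds (an assumption-laden toy, not the policy from the book or anything Robinhood runs):

```python
import random
import time

def call_with_backoff(fn, max_attempts=5, base_delay=0.1, max_delay=30.0):
    """Retry fn() with capped exponential backoff and full jitter.

    The randomized sleep spreads retries out so that clients recovering from
    the same outage don't all hammer the backend at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, cap))
```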
They specifically said it led to a DNS failure. They didn't mention anything else, like corrupt data, etc. Sure, there are plenty of ways that problems other than load can cause significant outages, but what Robinhood specifically said was that they had load issues that led to a DNS failure. They should be more forthcoming about exactly what happened if they want people to trust them.
This is the correct sentiment. People who put anything more than play money into Robinhood should not be surprised when their financial life is ruined.
Quick example: They bought puts on Friday and couldn't unload them for a full day + following morning.
Monday morning puts were down - it was obvious the market was recovering in a big way. Instead of cutting losses at ~20% in the morning they lost ~99% of their position. Some lost 100% since the options expired EOD.
Yes and the point is that today it looked the same as yesterday in the morning but it didn't turn out to be a bounce. It wasn't "obvious" that everything would rise on Monday. Only in retrospect.
Is there a source you can cite for that? Why would anyone want retail investor order data? Especially since most of their orders execute immediately, so you can just get the trade data from the venue...
> Why would anyone want retail investor order data?
Former market maker here.
Retail flow is low risk. If I buy $100mm of institutional flow, I could get a bunch of corporate hedging orders. Or I could make a single bet against George Soros. With retail, one tends to find lots of small orders. Even if there are some with high information, i.e. they're smart money and I'm going to lose money trading against them, they're small enough to be manageable.
Retail is also low information. At an old job, we bought a prominent retail broker's options flow. The number of in-the-money unexercised options that would come through that pipe was mind-blowing. (Today, whoever was buying Robinhood's flow likely got the same.)
This is such an empty update. At the very least, they should have published a detailed postmortem or committed to one by a certain date. How are we supposed to know that they have learned their lessons?
I don’t work for them, but I am pretty sure we can blame the litigious nature of this industry for the lack of detail in the postmortem. Not everyone can afford to be cloudflare :)
Even for Cloudflare, I thought the company would get sued out of existence after the proxy data leak, but the finance industry/SEC etc. is a completely different ballgame.
I believe it's the fear of litigation rather than actual litigation. Other companies also manage to publish postmortems and don't get sued out of existence.
The compliance world isn’t quite as fast-moving as tech. Even a “high priority” business continuity post mortem at a financial institution is going to take at least a week for all of the lawyers & senior management to agree on the language.
Start from the email notification. They have been asking themselves the easy questions.
Just look at the top questions in their email:
* Are the funds in my account safe?
Yes, your funds are safe.
* Was my personal information affected?
No, your personal information was not affected.
* Can I use my Robinhood debit card?
Yes. If you have a debit card, you should have been, and should still be, able to use your card, but you may have had issues receiving notifications, viewing your balance, and seeing transactions in your app.
------------
The real question is: How is Robinhood compensating for the missed trades?
I think it's unlikely that Robinhood (or any brokerage) would compensate people for losses on hypothetical trades that could have been made during an outage. Such a policy would allow customers to pick their entry and exit points, and extract money from the brokerage at will.
Even if the trades were well-defined at the time the outage occurred, there would still be an asymmetry between people demanding compensation on their profitable trades while eschewing losses on their bad trades. It's doubtful any brokerage would be willing to eat that.
That works only if the holder can afford to exercise - and I can assure you that most RH users cannot afford to exercise a single contract (at least on most commonly traded stocks).
Otherwise it's on the broker to sell it at close to someone who can afford to exercise. And who knows if RH pulled that off or not.
I haven't seen official statement, but I did see a couple reddit threads where Robinhood exercised ITM calls without enough funding in the account, then collected the shares and paid the cash difference.
That's an interesting question. I suppose it's hypothetical in the sense that they now have to look at "what if" those options had been exercised; but unlike a spot trade that someone "would have" done, Robinhood might already have had obligations on its end of the original options trade.
Yeah, seriously: if you have more than a few hundred in options on Robinhood and you're waiting until the day they expire to unload them, you're dumb or don't care about your money.
No brokerage will do that. Here's an excerpt from the account agreement of Schwab, a respected discount broker:
> During periods of heavy trading and/or wide price fluctuations ("Fast Markets"), there may be delays in executing your order or providing trade status reports to you. […] Schwab is not liable to you for any losses, lost opportunities or increased commissions that may result from you being unable to place orders for these stocks through the Electronic Services.
This is absolutely not true. Broker-dealers and brokerages routinely credit clients for execution out of line with the market. Schwab does in fact give price adjustments for slowly or incorrectly handled orders.
The reason nobody will be compensated here is due to two things,
(1) There is no way to determine what a fair execution would have been, since clients couldn't submit orders in the first place.
(2) Clients will adversely select their losing trades for corrections and this would bankrupt Robinhood in about five minutes.
Maybe in some cases they go above and beyond their account agreement if they like you as a customer, but according to the agreement you sign with them, it's not their problem if things go bad in this way.
Unlikely to have compensation for trades, and only people with limit orders set before the outage would be able to claim damages.
It's no different than you breaking your phone or losing your network connection. Nothing is guaranteed to work all the time. RH might face fines for the extended nature of the outage though, especially since they've managed to avoid them for plenty of past mistakes so far.
If they compensate for missed trades due to service outages, then an attacker could take a position, repeatedly DDOS Robinhood until the position is favorable during a DDOS, and then demand reimbursement since they "would have" cashed out that favorable position.
It follows that Robinhood must never reimburse for outages.
I’d be interested to read a deep technical post-mortem like those which have become fairly standard among other big tech companies. Hoping Robinhood does the right thing here.
Still silence on the traders who lost tens of thousands of dollars? Are they going to be compensating or not?
This blog post doesn't appear to say anything. It's not an apology, it's not an explanation, it doesn't say what they're going to do in response.
This is after the incident in which there were no status updates or support availability for multiple hours. Why can't they commit to updates every hour or every 30 minutes?
You could view it as a business decision. Will they lose reputation and customers if they don't compensate for the outage? Do they expect that the long-term cost of that loss would be more than the one-time hit of paying out now?
They may not have a legal/contractual obligation here, but that doesn't mean that treating their customers poorly is without consequence.
The difference is regulation. There are very few regulations and oversight on cloud compute providers, whereas an average person cannot just spin up an app and begin selling securities in a month as you can being a cloud provider.
While RH's ToS does theoretically absolve them of technical issues, they are obligated to comply with 'best execution' securities mandates, no? Separately, it'd be extremely bad for business if they refused compensation.
The point is moot anyway, since they're offering "case-by-case" compensation.
Robinhood will have to deal with a flood of FINRA and SEC complaints from these outages. I'm unsure how much longer FINRA will allow them their broker dealer license with a copious amount of failure in the rear view mirror.
Arbitration is forced, but Robinhood is on the hook for the fees for everyone who decides to arbitrate. Robinhood users might not get anything, but they can still cause pain.
There were some people claiming that RH erroneously exercised their options on r/wallstreetbets. Could be a hoax, but if it isn't, then that seems like grounds for compensation.
Of course, no one complains when RH makes a mistake in the client's favor.
Those people don’t know what pin risk is. Basically, their long puts got exercised because automatic exercise is determined at 4pm and they didn’t object (creating a short position in the equity); their short puts didn’t get exercised because the stock rallied by 5pm and their counterparty was diligent and prevented auto-exercise (thus no long equity position to offset the short equity position). Robinhood couldn’t buy back the short equity position because the actual price rose above the put price, leading to a net loss.
Is it just me, or does it feel like the only people using Robinhood are college students gambling with their parent's money?
Given that many extremely smart people who have devoted their lives to the stock market cannot beat average returns, pointing to Robinhood users' lack of knowledge of "pin risk" seems to miss the greater point.
1) Somewhat pedantic: a big reason why performance is what it is is that at any real $$$, liquidity/volume becomes an issue. Lots of option markets are just not that liquid. If you play with only a few $k and Robinhood pays for much of the market friction, then you can potentially outperform the market at risk parity.
2) More real: for most people, active trading is not about investing, it is about easy and legal gambling. There is a thrill in throwing your money into high-risk options or skyrocketing meme stocks. Because markets are (relatively) efficient, the prices of these assets usually reflect their risk profile, so on average you should gain money (the flip side of it being hard to beat the market is that it is hard to severely underperform, on average, as long as you don't go all-in; normally friction costs make these kinds of strategies not work, but RH reduces that significantly). It ends up like going to a casino where on average you make a bit of money (but the high volatility means some people lose a lot and some people gain a lot).
That means there are no orders mishandled either. If no one has an SLA, then just switching the servers off without thinking about whether customers were planning on trading seems fully within their rights. This is terrible for their reputation, but that doesn't mean they are going to start handing out money because people argue they could have avoided losses if the servers had been up. It's going to be extremely difficult for any customers to back that up legally.
They can probably be fined by some authority, but that penalty isn’t the same as being liable for losses people claim they made because the site was inaccessible. The fine wouldn’t be paid to customers.
People lost the opportunity to place orders. Determining the actual cost is of course impossible since you don't know what orders people would have placed.
"Missed out" doesn't seem like the right phrase here. If you already owned the stock, you still held it, no?
So people who were going to continue to sell off got lucky that they couldn't make that trade, and people who were going to buy got unlucky?
Does anyone seriously expect compensation, or think that it's deserved, or is it group wishful thinking? How would it even work? Would they just take people's word for their supposed intent? Or are people wanting some sort of "here's a gift card" type deal?
This is not to defend RobinHood - I've personally kept my money with well-established companies cause conservative, old, proven systems seem like a good thing for a product in this space - but shit happens, no? There will be more good days, and more bad days, in the market, it's a long-run game anyway, and it's pretty easy to vote with your wallet in this space.
There could be folks holding leveraged Bear ETFs or similar after last week's downturn, who were waiting to see how the market moved Monday morning to decide whether to sell or hold. I could see those folks losing quite a bit of money due to the inability to sell off those types of positions after the market reversed course on Monday.
I suspect you're right though, that it's mostly sour grapes concerning the opposite case - inability to buy as the market rallied.
> There is an entire generation that has never traded through a crisis.
Given that most crises seem to occur roughly every 7-15 years, there will always be such a generation.
A hypothesis: the reason why crises occur roughly 7-15 years is because that is approximately the length of society's collective memory concerning monetary issues.
And even then, limit orders are placed on a best effort basis. I'm sure their terms of service say as much. I have had limit orders not get placed before on otherwise functioning platforms.
The close of today is effectively the open yesterday, so everyone is back where they were.
Of course the problem with the "compensate me" arguments is that a lot of people were going to make decisions that would have turned out poorly yesterday (indeed, the market is balanced and every transaction has a counterparty), though of course with the amazing clarity of hindsight few would recognize or admit that. So if they need to compensate for illusory lost trades, do some people have to pay them for losses they would have incurred?
[I get that there are some complex options that can legitimately be all downside when trading isn't available, but that's a less common option]
Genuine question: With no commission trading at places like Schwab and eTrade, is it even worth trading on Robinhood? For as far as I could remember (about 2 years ago), Robinhood has always failed to scale.
Options are completely free on Robinhood while they still have a per-contract fee at other brokerages. If you don't care about that then no, there's no reason to stick with Robinhood.
Additionally Robinhood self clears options (or for some other reason?) and does not charge the Options Clearing Corp fee of $0.055/contract or the Options Regulatory Fee of $0.0388/contract which all other brokers charge (incl. ones with $0 or flat rate commissions/fees like WeBull, Gatsby, Tradier). All you pay is the FINRA and SEC fees on sells of about a penny each for small trades.
Actually, if anyone knows of another broker who _doesn't_ charge these, please let me know. If you're first for the broker I'll give you $20 for the tip.
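For scale, a quick back-of-the-envelope using the per-contract figures quoted above (the 10-contract trade is hypothetical, just to show what the waived pass-through fees add up to):

```python
occ_fee = 0.055    # Options Clearing Corp fee, per contract
orf_fee = 0.0388   # Options Regulatory Fee, per contract
contracts = 10

per_contract = occ_fee + orf_fee
print(per_contract)              # ~$0.094 per contract
print(per_contract * contracts)  # ~$0.94 on a 10-contract trade
# versus the ~$0.65/contract commission mentioned downthread:
print(0.65 * contracts)          # $6.50 for the same trade
```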
Trust me, please trust me, you really really really want to be paying a competent broker when trading options.
If it's chump change you're trading, sure, use RH.
If it's serious money, the $0.65/contract or whatever pays for itself many times over. Even if it's just the ability to regularly get filled between the spread it pays for itself.
Yeah, there seems to be no end to the horror stories of options trades on Robinhood having significant delays before being filled, costing people far, far more than 65 cents.
Options are a derivative, meant for hedging. It's relatively recent that they've gained so much attention as a primary security for speculation. There are decent strategies to make consistent income, especially in selling options, but it takes discipline and capital.
Most people just want the high leverage and quick wins which usually ends badly.
I don't want to reveal too much, but basically undervalued far OTM spreads. I usually net -$0.02 on each trade but occasionally earn $1-10. If I use a "real" broker and pay $0.10-$0.65 per contract the math just doesn't work.
This happened to us at Hustle years ago. Basically if you run on AWS there’s a DNS server provided inside each VPC that usually works fine but which has no observable load metrics etc... so you don’t really know you are slamming it and are about to have a problem unless you audit your entire codebase.
Why? Well, that tiny DNS server has certain capacity constraints, and if you don't cache DNS lookups, by using an http/https agent for example (in NodeJS), you wind up looking up the same DNS info over and over and churning sockets like it's going out of style. If you run really, really hot, the poor thing falls over (rightly so).
The limits are high and DNS is fast so you usually don’t notice but when you are under load bugs like this come out of the woodwork. When it falls down you look up the AWS docs, lean back in your chair upon finding this isn’t an “elastic” part of AWS and say “FUUUUUUUUCK” so loud it can be heard from outer space.
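The comment above describes the Node.js fix (a keep-alive http/https agent). As a rough Python analogue of the same principle (reuse connections and memoize lookups so you stop hammering the VPC resolver), here's a sketch; example.com and the cache size are placeholders, and a real DNS cache must honor TTLs:

```python
import functools
import socket
import requests

# 1) Connection reuse: a requests.Session pools keep-alive connections, so
#    repeated calls to the same host reuse sockets instead of re-resolving
#    the name and opening a new connection every time.
session = requests.Session()
for _ in range(100):
    session.get("https://example.com/")

# 2) For code that resolves names directly, even a crude in-process cache
#    takes pressure off the VPC resolver (note: this ignores DNS TTLs,
#    which a real caching resolver must respect).
@functools.lru_cache(maxsize=1024)
def resolve(host: str, port: int = 443):
    return socket.getaddrinfo(host, port)
```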
If you are Robinhood though don’t you have some former Netflix SRE/DevOps beast on staff that knows this and so you run your own DNS and monitor it?
That's misleading. The way that this has worked for decades on Linux-based operating systems and on Unices is that one installs a local caching DNS proxy, choosing one of the many available: ISC's BIND, Bernstein's dnscache, unbound, dnsmasq, PowerDNS, MaraDNS, and so forth.
Every Unix system having a local caching DNS proxy was and is as much a norm as every Unix system having a local MTA. A quarter of a century ago, this would have been BIND and Sendmail. Things are more variable now.
To illustrate that this was considered the norm, here is a random book from the 1990s. Smoot Carl-Mitchell's _Practical Internetworking with TCP/IP and UNIX_ says, quite unequivocally:
> You must run a DNS server if you have Internet connectivity. The most common UNIX DNS server is the Berkeley Internet Name Daemon (BIND), which is part of most UNIX systems.
People sometimes think that this is not the case nowadays, and the fact that a computer is a personal computer magically means that a Unix or Linux-based operating system should offload this task and not perform it locally. They are wrong, and that is DOS Think. Ironically, they don't even get to play the resource allocation card nowadays. The amount of memory and network bandwidth that needs to be devoted to caching proxy DNS service on a personal computer is dwarfed by the amounts nowadays consumed by WWW browsers and HTTP(S).
There's no similar argument for a node in a datacentre.
Ideally, not only should every machine have a (forwarding/resolving) caching proxy DNS server, every organization (or LAN, or even machine) should have a local root content DNS server. A lot of (quite valid) DNS lookups stop at the root with fixed or negative answers. Stopping that from leaving the site/LAN/machine is beneficial.
Ironically, putting a forwarding caching proxy DNS service on the local end of any congested, slow, expensive, or otherwise limited link is advice that I and others have been handing out for over 20 years. It's exactly what one should be doing with things like Amazon's non-local proxy DNS server limited to 1024 packets/second/interface.
So the question is not whether a local DNS cache mechanism exists. It's whether it's set up by the company dishing out the VMs, and if not, why not. Amazon provides instructions on how to add dnsmasq, and clearly labels this as how to reduce DNS outages. So it's not even the case that Amazon is wrongly discouraging local caching proxy DNS servers.
The point of my comment wasn't to say "don't cache" but rather, don't expect that the OS is going to automatically do it for you (as would be the case on Windows and Mac).
Your VPC has a DNS server at .2 of your VPC CIDR block that is mounted via loopback on the dom0 and exposed to your VPC to let you do lookups via their DNS infra.
"Invisible?" I mean, everyone who builds AWS infra, even just single ec2 instances, is aware of it. It's definitely possible that application engineers aren't aware, though.
What scenarios cause this many DNS lookups though? Connections should be kept-alive after the IP translation, so if it's really new connections being setup constantly then wouldn't that show up as a major bottleneck first?
Running on Kubernetes this is easy, it's one of the first issues you hit.
Every DNS request for external domains turns into 10 if you don't explicitly configure FQDNs (dot at the end). This is because in the default configuration the resolver runs with ndots 5 to search all the possible internal Kubernetes and cloud-provider names. Then you have lookups for IPv4 and IPv6 in parallel. So for every external name you look up, you storm the upstream DNS with 10 requests for non existing domains.
Furthermore, the current default DNS service in Kubernetes doesn't have any kind of caching for these kinds of lookups (especially not NXDOMAIN) enabled.
But like I said, this is one of the first issues you hit running Kubernetes on Amazon. It is widely known and can easily be fixed by scaling up some more instances, changing ndots settings, using FQDNs or configuring caching. There is no way that this was the issue, it is plastered all over the internet, the logs are clear and the fixes can be implemented in minutes.
It also doesn't go down completely, the rate-limiter is packets/s on the interface.
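For what it's worth, the FQDN trick mentioned above is easy to illustrate (assuming a glibc-style resolver with a search list configured, as in a default Kubernetes pod):

```python
import socket

# A relative name with fewer than `ndots` dots (Kubernetes defaults to
# ndots:5) is tried against every domain in the search list, multiplying
# upstream queries, most of which come back NXDOMAIN.
socket.getaddrinfo("example.com", 443)

# A trailing dot marks the name as fully qualified, so the resolver asks
# for exactly this name and skips the search-list fan-out.
socket.getaddrinfo("example.com.", 443)
```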
It’s easy to have tens of thousands of DNS lookups per second if you don’t know what you’re doing or didn’t pay attention. Connections wouldn’t be the bottleneck if they are outbound.
Sad that there isn’t an actual apology anywhere to be found in the letter at all.
And now with the fed rate cut the interest on cash is only 1.3%, with more cuts expected later in the year, which was the last big differentiator. I don’t see how they don’t see massive net withdrawals going forward.
> And now with the fed rate cut the interest on cash is only 1.3%, with more cuts expected later in the year, which was the last big differentiator. I don’t see how they don’t see massive net withdrawals going forward.
This isn't really an issue because the fed rate cut impacts everyone. Other institutions will cut their interest rates as well. I know of a few banks (Canadian) that have already lowered their GIC rates.
If anything, this is actually good for RH. Now instead of comparing 1.8% at RH and 1% at another Financial Institution, you're comparing 1.3% and 0.5% -- a much bigger multiple.
most brokerages don’t actually pay anything. With another cut it’s going to be <1% vs 0%. Hardly anything even with a six figure balance. That’s my point.
Historic... Unprecedented... Thundering herd, a bunch of excuses to explain why they couldn't handle the volume that most real brokerages handle every second.
I'm curious about your thoughts on why a technical infrastructure which, by nature of being cloud-native, is supposed to be (and likely has been) architected as a highly elastic platform, has not stood the test of time in this regard.
Based on the information from Robinhood's careers site, their platform is largely based on the following technology stack:
- Python, Django, Django Rest Framework
- Go
- PostgreSQL
- Container and container orchestration technologies (Docker, Kubernetes)
- Microservice-oriented architectures and related OSS technologies (Kafka, Celery/RabbitMQ, nginx, Redis, Memcached, Airflow, Consul)
- Cloud-native infrastructure (AWS, GCP)
- Infrastructure as Code and configuration management (Terraform, SaltStack, Ansible, Chef, Puppet)
- CI/CD and test automation frameworks (Cypress.io, Jenkins, Appium, UIAutomation, Bazel)
Why would you use RH instead of a normal, mainstream brokerage like Vanguard, Fidelity, etc that already has (1) an app and (2) commission-free trades?
Easy answer: As someone who's used Vanguard for index funds and the like for a couple decades now, I had no idea they had an app or commission-free trades. They don't market this at all.
As a secondary answer, normal, mainstream brokerages have pretty bad tech, tbh. I don't expect it to be worse than Robinhood in terms of things like security, and I expect UX to be worse. (Side note: I just discovered that Vanguard actually has a secret security key option hidden under Account maintenance, so I can finally switch from sms 2fa. +1 to Vanguard.)
> Side note: I just discovered that Vanguard actually has a secret security key option hidden under Account maintenance, so I can finally switch from sms 2fa. +1 to Vanguard.
It looks like you still need security codes setup:
"You'll need to register for both security codes and security keys, however. That's because keys and codes go hand in hand—if you lose your key or don't have it, we'll need to send you a code in order for you to log on. In addition, you'll always need a code to access your accounts from a mobile device."
If an attacker can skip the security key you might as well not use one.
My brother has a Fidelity account and apparently even he was blocked from putting in orders online last Thursday, so I'm not sure they're immune either.
I don't think we will get a postmortem. Their lawyers will kill it because it will be an admission of guilt and open them up to even more legal liability.
I would argue that it is worse for a retail brokerage to be down for a day than it is for a trading firm to blow themselves up, though I suppose the latter was more about creating a disorderly market.
Maybe in another four years when they finally realize they still haven't fixed the leap bug. Didn't work out for this year apparently. Last leap year had the exact same problem. The problem is that the ticket is very low priority because right now it is working again and won't happen again until at least 2024 ... By then it will most likely be forgotten. Again.
I can't help but think this glitch was a good thing and Robinhood investors would do better if they traded less anyhow. According to an OpenFolio correlational study, traders who trade more than 12 times per year make 0.5% less than traders who trade less than 12 times per year. OpenFolio was one of the first three websites to have an API integration with Robinhood portfolios.
Companies like Robinhood regularly go down when markets are volatile. It was quite frustrating when the financial crisis was in full swing not being able to log in to my trading account. I reckon I would have made a killing.
"Multiple factors contributed to the unprecedented load that ultimately led to the outages. The factors included, among others, highly volatile and historic market conditions; record volume; and record account sign-ups. "
What a sad press release; I am sure people at their corporate office were sweating over this. The long and short of it is that users trusted the service would work, and possibly had a great deal invested, only to get a blame-deflecting comment when everything broke down: "OMG, we weren't prepared for what our users did!"
We live in a sad state of software. I expect things like this and the Equifax scandal to continue if things like software security, reliability, and performance aren't taken into account.
I don't know if it had anything to do with leap year, but I also checked dev tools and saw the same issue (requests for market data on March 3, 2020 on March 2, 2020 8AM PST). However, it was busted for both the website as well as the Android app (and I'm guessing iOS too) so it doesn't seem like it's purely a client-side problem unless all of their clients were built from the same source.
Sure, I've seen outages that are caused by DNS config problems. But I don't think I've ever seen one caused by a "thundering herd" overwhelming DNS servers.
Another giveaway that this is a lie is that support emails were getting a stock Postfix error message, which means that MX records at least were resolving.
Robinhood isn't a bitcoin company. That's just a feature they have. Its main product offering is commission-free trading--and their presence pushed a lot of big players to adopt the same offering. The wallstreetbets gang is silly and all, but I think they have really democratized stock trading and made the whole idea seem much more accessible. I think the founders are former finance guys. I hope this doesn't sound like guerilla marketing. I don't even use the app; I have used it, but I'm just not that interested in picking stocks. I just think it's cool as an ex-code-monkey-to-entrepreneur story.
With mutual fund companies, including Vanguard, generally you can open an account directly with them and buy directly from them (including partial shares, automatic monthly purchases, and dividend/capital gain reinvestments).
How much did it cost to place a trade for a $100 stock previously? RH definitely helped more people gain access to directly trading shares on the stock market, regardless of whether or not they were responsible in doing so.
It's cool that the founders of the company publish blog posts like this for a short outage. Hope other CEOs learn from this and become even more transparent in the future :D
Short outage? They were down for almost the entire trading day yesterday and for hours today. And there's barely any transparency in this post compared to standard post-mortems.
Just use Square’s Cash App! Free stock trades AND you can buy fractional shares AND a bunch of other stuff like P2P payments and bitcoin. I work there and so can say with some authority that we can handle more volume without going down than RH can.
Actually, bayonetz's posting is the only useful one in the comments for this article. Most of us are here for information from actual industry insiders, and this qualifies.
Here's some more inside info ...
If your "financial app" provider doesn't have a banking charter, run. None of the recent trendy fintech companies have a charter, and are thus clown cars.
Fidelity offers banking services and doesn't have a banking charter but they aren't a "clown car," they are one of the largest financial institutions in the world.