I'd be curious to know the root cause(s) of their outages. I presume they're using Kafka as an event bus, based on their Faust library[1]:
> Faust is a stream processing library, porting the ideas from Kafka Streams to Python.
> It is used at Robinhood to build high performance distributed systems and real-time data pipelines that process billions of events every day.
I evaluated several Kafka ingestion methods a while back (including Faust) and didn't find Python to be a great fit, so I wonder whether that has anything to do with it.
I looked at their job board to see what technologies they use and saw heavy use of Python. I know large systems can be built in Python, but for financial services something more performant and safe probably would have been better. But who knows, maybe Robinhood would never have had the adoption it's had thus far if it had chosen something else.
[1] https://faust.readthedocs.io/en/latest/