Launch HN: Arpio (YC W21) – Protect your business from AWS outages, ransomware
66 points by doug_neumann on Jan 28, 2021 | 41 comments
Hello HN! We’re Shaw [sterwill] and Doug [doug_neumann] and today we’re very excited to share Arpio with you (https://arpio.io). Arpio is a SaaS that protects AWS environments from downtime by making it easy to recover from outages, ransomware, cyber-attacks, and human error.

What that means is that when critical AWS services go down (like the Kinesis outage in November [0]), Arpio can launch identical workloads in a healthy region. Or if a bad actor does bad things in an AWS environment (like Code Spaces [1] or Webex Teams [2]), Arpio can quickly restore everything to an alternate AWS account.

Our story goes back to the big S3 outage of 2017. In February that year an AWS employee made a typo at the command line, and inadvertently took down much of AWS’s Northern Virginia region. That outage lasted 5 hours, and we were among the thousands of companies impacted.

All outages suck, but the timing on this one was particularly bad for our business. And worse, we had no control -- all we could do was wait for Amazon to get us back online. As you can imagine, the execs weren’t exactly happy about that...

With Arpio, we’re building the solution we wish we’d had back then. Arpio maintains an exact replica of your production AWS environment in a different region (that you choose) and optionally in a locked-down AWS account (that you own).

This recovery environment includes your data and your infrastructure, and it’s updated frequently as your environment evolves. It’s also checkpointed, so you can roll back to a prior state to recover from data corruption or ransomware. And when you aren’t using it, it’s dormant, so you don’t have to pay AWS for resources you don’t need.
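To illustrate the rollback idea, here is a minimal sketch of picking a restore point. It is an illustration only, not Arpio's implementation; the function name `pick_checkpoint` and the hourly schedule are assumptions made up for the example.

```python
from bisect import bisect_left
from datetime import datetime


def pick_checkpoint(checkpoints, incident_time):
    """Return the newest checkpoint taken strictly before incident_time.

    checkpoints must be sorted oldest-first. Returns None when every
    checkpoint was taken at or after the incident.
    """
    # bisect_left finds the first checkpoint >= incident_time, so the
    # entry just before that index is the last state known to be clean.
    idx = bisect_left(checkpoints, incident_time)
    return checkpoints[idx - 1] if idx > 0 else None


# Hypothetical hourly checkpoints and an incident detected at 10:30.
checkpoints = [datetime(2021, 1, 28, hour) for hour in (8, 9, 10, 11)]
restore_point = pick_checkpoint(checkpoints, datetime(2021, 1, 28, 10, 30))
print(restore_point)
```

The 11:00 checkpoint is skipped because it was taken after the corruption began, which is why keeping several checkpoints matters more than keeping only the latest copy.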

But when you need it (or want to test it), Arpio can have it up and running in a few minutes.

Disaster recovery is usually custom-engineered for a given workload. With Arpio, we’re building a general-purpose solution that works for most AWS workloads. We handle the complexity: every route table is rewired, every security group rule is correct, every private IP address is preserved, and every database hostname is aliased. Handling that complexity is what makes Arpio simple to implement. We can often get new customers onboarded in under an hour.

Arpio works today with EC2, EBS, RDS, ECS, ECR, ELB, VPC, IAM, ACM, Auto Scaling, Cognito, ElastiCache, and CloudWatch. We’re delivering Beanstalk and EFS support in the coming weeks. If we don’t yet support your environment, drop a comment below - we’d love to get your feedback on what we should build next.

We encourage you to take it for a spin. Or if you’re up for a chat, send me a note (doug[at]arpio.io) - I’d love to walk you through it in person.

So, HN, what do you think? We’re excited to get your feedback!

Thanks, Shaw & Doug

[0] https://aws.amazon.com/message/11201/

[1] https://news.ycombinator.com/item?id=7909791

[2] https://news.ycombinator.com/item?id=24319293




> Data never leaves your accounts.

That’s good. I’m wondering how you support Cognito though?

Do you just run custom ETLs in customer Lambdas?

And wouldn’t this cause all users to have to reset their passwords in the new pool?

Looks really interesting and definitely does things differently than “run terraform with a different region”.


Unfortunately, we aren't able to read users' passwords from Cognito user pools, so they do have to reset them in the recovery environment. We'd love to see an AWS API for exporting user secrets from user pools. We'd add support for that really quickly.

The experience is better for user pools that integrate with external identity providers, like SAML, since the IdP metadata can be fully replicated into the other region ahead of time.


That has been a barrier to us switching to Cognito. It'd save us a ton of money. Been asking AWS to support cross region replication for years now.


At this point I regret using it and while I have some limited experience with both Ping and Okta I think I'm ready to move us to Auth0.

AWS makes a simple move between user pools inexplicably difficult. They don't even offer an export and import in the same file format as far as I know (bulk export is all JSON via the CLI and import is CSV, but maybe I missed something).
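To make the format mismatch concrete, here is a rough sketch of the glue code it forces you to write: flattening `aws cognito-idp list-users` style JSON into CSV rows. The function name `users_json_to_csv` and the sample user are made up for illustration. The header is passed in by the caller on the assumption that you fetch the real one for your pool (for example with `get-csv-header`), and a real import job may require columns this sketch does not fill in. Passwords are not part of the export at all.

```python
import csv
import io
import json


def users_json_to_csv(list_users_json, header):
    """Flatten list-users style JSON into CSV text using the given header."""
    users = json.loads(list_users_json)["Users"]
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=header,
        restval="",             # columns the user lacks are left blank
        extrasaction="ignore",  # attributes not in the header are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    for user in users:
        # Attributes arrive as a list of {"Name": ..., "Value": ...} pairs.
        row = {attr["Name"]: attr["Value"] for attr in user.get("Attributes", [])}
        row["cognito:username"] = user["Username"]
        writer.writerow(row)
    return out.getvalue()


# Hypothetical export containing a single user.
sample = json.dumps({
    "Users": [{
        "Username": "alice",
        "Attributes": [
            {"Name": "email", "Value": "alice@example.com"},
            {"Name": "sub", "Value": "1234"},
        ],
    }]
})
header = ["cognito:username", "email", "middle_name"]
csv_text = users_json_to_csv(sample, header)
print(csv_text)
```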

This is ridiculous in my opinion given that once a pool is configured you cannot change attributes (say, turning on a middle name field).


Have they ever given you a response? Cognito has so much unrealized potential...


Ya but under NDA so can't go into it.


Gotcha. I sure hope that NDA'd response is "we're about to unleash 'Cognito2' that rights all the wrongs of the current Cognito."


Thanks for the answer.

We were just discussing Cognito being a pretty significant single point of failure in our current setup (any vendor would be, though). I wish they would offer replication as the other commenter mentioned.


Suggestion. If you support bringing services up on failure, you _could_ also do that during non emergency circumstances. Migrating between AZs or regions is something I would use.


We have done a couple of cross-region and cross-account migrations in recent months. It's the same technology but with a different business model. Happy to chat with you about it if you're curious.


Do you manage S3 replication as well? What about Lambda deployments?


Not yet, but they're on the near-term roadmap.

We've helped customers manually set up bucket replication for S3 when they've needed it. The plan is to eventually automate that, including backfilling existing objects as needed.
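For anyone setting this up by hand in the meantime, a minimal sketch of a whole-bucket replication rule might look like the following. The helper name `replication_config`, the rule ID, and the ARNs are placeholders made up for the example, not anything Arpio ships. It assumes versioning is already enabled on both buckets and that the IAM role exists.

```python
def replication_config(role_arn, dest_bucket_arn, rule_id="dr-replication"):
    """Build a ReplicationConfiguration dict that replicates a whole bucket."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [
            {
                "ID": rule_id,
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches every key
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": dest_bucket_arn},
            }
        ],
    }


# Placeholder ARNs for illustration only.
config = replication_config(
    "arn:aws:iam::111122223333:role/example-replication-role",
    "arn:aws:s3:::example-recovery-bucket",
)

# With boto3 installed and credentials configured, it could be applied as:
#   boto3.client("s3").put_bucket_replication(
#       Bucket="example-source-bucket", ReplicationConfiguration=config)
print(config["Rules"][0]["Destination"]["Bucket"])
```

A rule like this only covers objects written after it is enabled, which is why backfilling existing objects is a separate step.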

WRT Lambda, we roughly want to conquer virtual server workloads, then containerized workloads, and then we'll hit serverless workloads. Caveat being that everybody's "customer driven" these days and we can easily shift priorities if people really need Lambda support.


Nice! My company is a rather large S3 customer (among other services), and the first thing I thought of was "how much would this balloon our AWS bill, esp. S3". Would be curious to see what the cost (and time) requirements of replicating S3 would be.


FYI -- the "Protect your SaaS" button on the front page points to http://arpio.test:8888/


Oops! Fixed that last night and then overwrote the fix with my next deploy. There's a version control lesson in there for me.

It's fixed now.


Am I crazy for thinking that the solution to AWS outages is not in fact doubling down on your reliance on AWS? No one has mentioned it.


Well, I'd assert that every system fails eventually. And it's a pretty common practice that you deal with these failures by implementing a redundant system that can take over in case of a failure. Think about drives in a drive array, or redundant web servers behind a load balancer. This is just the same concept applied in the large for an entire cloud workload.


How do you handle consistency between nodes within a "cluster"? If I have nodes: A, B, and C and node A goes down triggering a failover to B or C how do you guarantee B & C are up-to-date?


Just to be clear, our solution is about failing over all of your environment to another region. So in our case, A, B, & C would all be coming up together from snapshots in that other region. But there are still consistency concerns as you mentioned.

We do everything we can to snapshot servers in quick succession (as much as we can with AWS), but they won't be fully consistent across nodes. We've found, though, that clustered systems like this have built-in capabilities to deal with these inconsistencies. It's kinda similar to if you lost power to all 3 systems and then brought them back up - they might not all be at the exact same point in time, and the application would need to sort that out.

If you tell me what application this is (is it a database platform?) I can do some quick research for you.


How do you ensure that your services will be up in the event of an AWS outage?


Our service runs multi-region active across 3 regions. We can survive a 2 region outage and still operate successfully.


side question :) How do you make those links in the submission field clickable?


What permissions do you ask for?


We need read access to your production environment to survey your infrastructure. Then, depending on the services you're using, we need the ability to back up data, share it with your recovery environment, and clean up old backups. We make it easy to set this up by giving you a CloudFormation template that includes the roles, policies, etc.

Some of the actions we need to perform can't be completely locked down via IAM policies. An example is the ModifySnapshotAttribute API that is used to share EBS snapshots with other accounts. IAM policies don't allow you to constrain which accounts those snapshots are shared with, and we don't want to be sharing your data with any account that isn't yours.

So, instead of asking for ModifySnapshotAttribute permissions directly, we include a Lambda function in the CloudFormation template. This function wraps the ModifySnapshotAttribute API and adds validation of the target account. Our role has Invoke permission for this function, but no permission to call ModifySnapshotAttribute itself, thereby eliminating that exfiltration vector.
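A wrapper along those lines could look roughly like the sketch below. This is an illustration of the pattern only, not the code in our template: the event field names, the `ALLOWED_ACCOUNTS` allow-list, and the `FakeEC2` stub are all made up for the example. The one real API call is boto3's `modify_snapshot_attribute`, which is how an EBS snapshot gets shared with another account.

```python
ALLOWED_ACCOUNTS = frozenset({"111122223333"})  # hypothetical recovery account ID


def handler(event, context=None, ec2=None):
    """Share an EBS snapshot, but only with an allow-listed account."""
    target = event["target_account"]
    if target not in ALLOWED_ACCOUNTS:
        # Refuse before touching AWS, so nothing is ever shared elsewhere.
        raise PermissionError(f"account {target} is not an approved recovery account")
    if ec2 is None:
        import boto3  # imported lazily so the validation can be tested offline
        ec2 = boto3.client("ec2")
    ec2.modify_snapshot_attribute(
        SnapshotId=event["snapshot_id"],
        Attribute="createVolumePermission",
        OperationType="add",
        UserIds=[target],
    )
    return {"shared": event["snapshot_id"], "with": target}


class FakeEC2:
    """Offline stand-in for the boto3 EC2 client; it only records calls."""

    def __init__(self):
        self.calls = []

    def modify_snapshot_attribute(self, **kwargs):
        self.calls.append(kwargs)


fake = FakeEC2()
result = handler(
    {"snapshot_id": "snap-0123456789abcdef0", "target_account": "111122223333"},
    ec2=fake,
)
print(result)
```

The calling role only needs permission to invoke the function, so the account check cannot be bypassed by calling the EC2 API directly.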

In the recovery environment, we need more permission to create/modify/destroy the infrastructure that we are managing for you, but we still take a least-privilege approach. We also include a Lambda function in that CloudFormation template to constrain the potentially dangerous APIs.


If you are failing anything back to my production environment, I assume you'll need permissions for that as well?


Is this the sort of thing where, as soon as you prove that the market exists, Amazon can develop the feature internally and kill your company?


AWS will wait until this company launches, gets meaningful traction, and then at re:Invent 2022, AWS will launch "AWS Region Replication Service," charge $0.02/GB copied across regions, and within a few years, Arpio will have to pivot to doing the same thing, except "multi-cloud" because that's a market AWS doesn't like to be in.


Spot-on with the multi-cloud story. But we don't plan to wait a few years :-).


I like to think our vision is bigger than that. Amazon offers a lot of building blocks. We're assembling those blocks into a solution so that AWS customers don't have to assemble the blocks themselves. As Amazon changes and enhances those blocks, we'll evolve what it looks like to build on top of them appropriately.

And before long, we envision offering analogous functionality for the other public clouds. We've found that the most valuable customers are using multiple clouds and want a single solution across them. So even if Amazon eventually solves this problem for AWS, the multi-cloud market should offer us plenty of opportunity.


I know this is totally irrelevant to your product, and it actually looks quite cool, but the name is one letter away from Arpaio, the name of arguably one of the most racist and anti-Latino "law enforcement" officials in the US. Might be worth thinking about.


We're aware of Joe, but we thought people might rather imagine ancient mythological creatures or recovery point objectives instead. :)


I immediately parsed the name as "ARP I/O". Then wondered if you were IPv4 specific.


I encourage you to strongly reconsider. This morning when I saw your post, the first and only thought that went through my head was "Oh this is about Joe Arpio". And it seemed like that connection was intentional: "protect your business" like a Sheriff would. I made the connection with "RPO" after reading your post.

Joe Arpio was convicted of criminal contempt of court [0] and is known for treating humans in a sub-human way [1].

I strongly urge you to reconsider this.

[0] https://www.phoenixnewtimes.com/news/sheriff-joe-arpaio-gets...

[1] https://archive.thinkprogress.org/twenty-years-of-in-tents-t...


We're definitely not fans of Joe, and were a little concerned about the association early on, but we've talked with hundreds of potential customers since we started and it hasn't come up yet. A lot of them get the RPO pun quickly, though!


Okay, thanks for the feedback, we'll definitely spend some time talking about it.


Thank you for your response Doug. There is a chance I'm an outlier who has read too much about Joe Arpio.

And apologies for not saying this before: heartiest congratulations for your launch, I wish you nothing but a resounding success, whatever the name may be.

As our friends on WSB say - rocket emoji and moon emoji!


That is literally all I could think of when I read the name. If you are an outlier, then I guess I am, too. My immediate, uncensored first thought after reading the title of the post was "What asshole names their business after that guy?". It was kind of hard to even parse the rest of the post after that.


Thanks, we really do appreciate the feedback. We're gonna run some tests on this.


FYI it's spelled "Joe Arpaio", which is actually not the same name as the product here.


Thanks, we really appreciate the support.


And my sincere apologies for raining on your parade. Congrats again.



