Ask HN: Is S3 down?

boulos · on Feb 28, 2017

Disclosure: I work on Google Cloud.

Apologies if you find this to be in poor taste, but GCS directly supports the S3 XML API (including v4):

https://cloud.google.com/storage/docs/interoperability

and has easy to use multi-regional support at a fraction of the cost of what it would take on AWS. I directly point my NAS box at home to GCS instead of S3 (sadly having to modify the little PHP client code to point it to storage.googleapis.com), and it works like a charm. Resumable uploads work differently between us, but honestly since we let you do up to 5TB per object, I haven't needed to bother yet.
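For illustration, a minimal sketch of the endpoint swap described above, in Python rather than the PHP client mentioned. It assumes the third-party boto3 library and HMAC interoperability keys created for the GCS project. The `ENDPOINTS` table and the `client_kwargs` / `make_client` helpers are hypothetical names, not part of either provider's SDK.

```python
# Sketch: one S3-style client factory, two providers.
# ENDPOINTS, client_kwargs and make_client are illustrative names only.
ENDPOINTS = {
    "aws": None,  # None lets boto3 use its default AWS endpoint
    "gcs": "https://storage.googleapis.com",  # host named in the comment above
}


def client_kwargs(provider, access_key, secret_key):
    """Build the keyword arguments for boto3.client("s3", ...)."""
    if provider not in ENDPOINTS:
        raise ValueError(f"unknown provider: {provider}")
    kwargs = {
        "aws_access_key_id": access_key,  # for GCS: an HMAC interoperability key
        "aws_secret_access_key": secret_key,
    }
    if ENDPOINTS[provider] is not None:
        kwargs["endpoint_url"] = ENDPOINTS[provider]
    return kwargs


def make_client(provider, access_key, secret_key):
    """Create the client; boto3 is imported lazily so client_kwargs works without it."""
    import boto3  # third-party: pip install boto3

    return boto3.client("s3", **client_kwargs(provider, access_key, secret_key))
```

The only provider-specific piece is the endpoint URL, which is why keeping it in one table makes a later switch cheap. Features that differ between the two services, such as resumable uploads, are outside what this sketch covers.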

Again, Disclosure: I work on Google Cloud (and we've had our own outages!).

NiekvdMaas · on March 1, 2017

Apologies if this is too off-topic, but I want to share an anecdote about some serious problems we had with GCS and why I'd be careful about trusting them with critical services:

Our production Cloud SQL started throwing errors that we could not write anything to the database. We have Gold support, so quickly created a ticket. While there was a quick reply, it took a total of 21+ hours of downtime to get the issue fixed. During the downtime, there is nothing you can do to speed this up - you're waiting helplessly. Because Cloud SQL is a hosted service, you can not connect to a shell or access any filesystem data directly - there is nothing you can do, other than wait for the Google engineers to resolve the problem.

When the Cloud SQL instance was up&running again, support confirmed that there is nothing you can do to prevent a filesystem crash, it "just happens". The workaround they offered is to have a failover set up, so it can take over in case of downtime. The worst part is that GCS refused to offer credit, as according to their SLA this is not considered downtime. The SLA [1] states: "with respect to Google Cloud SQL Second Generation: all connection requests to a Multi-zone Instance fail" - so as long as the SQL instance accepts incoming connections, there is no downtime. Your data can get lost, your database can be unusable, your whole system might be down: according to Google, this is no downtime.

TL;DR: make sure to check the SLA before moving critical stuff to GCS.

[1]: https://cloud.google.com/sql/sla

fidget · on March 1, 2017

The GCS being referred to by the GP is Google Cloud Storage, not Cloud SQL. You really do need failover set up though. That's true for basically any MySQL installation, managed or not.

adwf · on March 1, 2017

That isn't just a Google issue though. You'd have had the exact same trouble with AWS/RDS if you're running with no replica. The lack of filesystem access is a security "feature" for both. If you have no HA setup then you have no recourse but to restore to a new server from backup, or wait for your cloud provider to fix it.

avereveard · on March 1, 2017

RDS has snapshot backups you can create an instance from, iirc, so you can fix this kind of issue yourself.

Sure, you get downtime all the same, but not the part where you wait for support to resolve an instance crash.

unclebucknasty · on March 1, 2017

Yes, and RDS offers point in time recovery at that.

We've had to use it and can confirm that it works as advertised.

lbill · on March 1, 2017

Not using a failover is a bold choice (not stupid, just bold). A failover is like a good insurance policy: you pay for it, you hope that you'll never need it, but when shit happens you are very happy to have it!

TekMol · on March 1, 2017

21 hours sounds pretty long to me. What type of data was it and how long would you have waited until you continued with a backup of the data on a different machine?

NiekvdMaas · on March 1, 2017

We were definitely prepared to recover from a backup, but the support team told us: "the issue with the file system will likely persevere over a backup/restore". So this, in combination with the data loss you have when recovering from a backup, means we basically had no choice other than to wait till the issue was resolved.

JPKab · on Feb 28, 2017

I've used both Google Cloud and AWS, and as of a year or so ago, I'm a Google Cloud convert. (Before that, you guys didn't at all have your shit together when it came to customer support)

It's not in bad taste, despite other comments saying otherwise. We need to recognize that competition is good, and Amazon isn't the answer to everything.

eknkc · on Feb 28, 2017

We were on GCP for around a year. It was my decision; I really wanted to love GCP, and I initially did. But we recently switched to AWS.

I think there is little GCP does better than AWS. Pricing is better on paper, but performance per buck seems to be on par. Stability is a lot worse on GCP, and I don't just mean service outages like this one (of which they've had their fair share) but also individual issues like instances slowing down or the network acting up randomly. There's also the lack of service offerings: no PostgreSQL, functions never leaving alpha, no hosted Redis clusters, etc. Support is also too expensive compared to AWS.

Management interfaces are better on GCP and sustained use discount is a big step up against AWS reservations. Otherwise, I think AWS works better.

aardshark · on March 1, 2017

I haven't used AWS, but my experience with AppEngine and by extension GCP is similar.

Just last week I got an email saying that they'd discovered an issue on Google Cloud Datastore where certain (strongly consistent!) queries could have been returning incorrect results for a week long period and that I should check my logs to see if anything important had been affected in my application.

That's not the sort of behaviour that inspires confidence in a service.

fidget · on Feb 28, 2017

Functions are going beta this week

Moru · on Feb 28, 2017

And discontinued next year?

benley · on March 1, 2017

The standard "lol google will kill it in 6 months anyway" troll doesn't really apply to Google Cloud services. They know better than to be fickle with infrastructure offerings.

themihai · on March 1, 2017

Are you sure? Have you used appengine or any of their cloud libraries? I just found out that some of the services I wrote 3 years ago don't work anymore due to various breaking changes. It also very much applies to cloud services themselves! What happened to the email service (appengine powered)? I'm telling you: it no longer exists! Compare that with AWS SES, which gets better and better. I could go on and on all day long. Google cloud is nice on paper but fails in practice. If you consider the lock-in, it is not worth it on paper either.

davidjnelson · on March 1, 2017

That happened to me and my sites were down for four months. I lost 15 years of seo, and $30/mo revenue... They just shut down the Python app engine sites with no notice. I'll never use google cloud.

scrollaway · on March 1, 2017

What? That sounds like nonsense. Some sources please.

davidjnelson · on March 9, 2017

I wish it were. It was an enormous amount of effort to get my pagerank that high. I suppose I could post my google analytics before and after, but that's not really data I share with the public.

scrollaway · on March 9, 2017

I'm not doubting your site was "down for four months". I'm doubting the part where you said "They just shut down the Python app engine sites with no notice".

Most notably, I know many people who run these types of sites and outside of GAE being mediocre, I've never heard them complain about anything like that.

davidjnelson · on March 13, 2017

This was the very first version of app engine that rolled out in 2008 or so. It showed some type of "incompatible version" notice in the admin ui when I noticed it, and when I tried to redeploy the sites using the command line deploy tool. I switched everything to s3.

benley · on March 1, 2017

How much advance notice did they give when they were shutting down that email service? I bet it was like eight months or more. Things change, you have to deal with it sometimes. Seems kinda normal.

themihai · on March 2, 2017

People don't like to rewrite infrastructure code just b/c the provider decided it's not worth it anymore. When you sign up you consider the whole ecosystem, not individual services. The cloud platform is marketed as reliable and rock solid, something you can trust. That may be the case with AWS, but on Google you should expect experimental, cheap, and a high risk of things getting broken or even deprecated altogether. It behaves like a start-up with customers paying for experiments. It's OK for some use cases but you should be aware of that.

Moru · on March 1, 2017

No trolling, just tired of setting up things that just stop working and force me to work on a fix. I don't work for a big company with a dev team of 20 people, it's just me and customer support. I'm close to a burnout as it is, I don't need help with it.

foxylad · on March 1, 2017

I've used Appengine since 2009. Early on they deprecated the original master-slave datastore, but apart from that I've had zero refactoring around their services.

Other services are a different story - from my perspective Google are better at supporting legacy interfaces than most.

iampims · on March 1, 2017

They deprecated their alpha search api 2 years in iirc

fidget · on March 1, 2017

They have a minimum of 1 year deprecation policy in their terms of service.

ChristianBach · on March 1, 2017

I hear you, discontinuing products that you're dependent on is painful, but discontinuing services that you built your infrastructure around is an outright killer.

beambot · on March 1, 2017

Counterpoint from an email I received just last week (Feb 21st):

> We are writing to inform you that we are winding down sales and renewals of Google Site Search (GSS). Starting April 1st, 2017, new purchases and renewals of GSS will not be available.

kyrra · on March 1, 2017

GSS isn't under Cloud and doesn't have the same deprecation policy. Cloud explicitly states that it has a minimum 1-year turndown on any feature they disable.

beambot · on March 1, 2017

> They know better than to be fickle with infrastructure offerings

Site Search seems like an infra offering to me.

inimino · on March 1, 2017

Cloud is still competing with on premises solutions, right? One year is nothing, try ten or twenty.

newjersey · on March 1, 2017

> Cloud is still competing with on premises solutions, right? One year is nothing, try ten or twenty.

Not an expert by any means but I would put more weight to Google's ONE year promise over (to give an example) HPE's twenty years promise. I know it is a cheap shot because I am pretty sure HPE will be bought and sold at least once in the next twenty years.

xchaotic · on March 1, 2017

FWIW I am now dealing with a system that is supported by HPE for over 10 years now. Even if they get bought out, someone will inherit those support obligations. I am also in the camp of not trusting Google with anything.

epalmer · on March 1, 2017

The GSS shutdown gives us a year also. I just migrated to GSS a year ago. And they decommissioned their big appliance, Google Enterprise Search I think it was called.

We were users of the Google Mini Search appliance, went to a 3rd party in-house installed search solution that we did not like and then a year ago went to GSS. We are looking again for something suitable. The best part of the Google Site Search was search fidelity.

allizad · on March 1, 2017

Try open-source search solutions? :)

ExactoKnight · on March 1, 2017

They did it to their Google Analytics API. We spent four months building a dashboard off of it then Google fucking deprecated it. Thanks Google.

anotherturn · on March 1, 2017

Not entirely true. App engine deprecations happen all the time and they give about 1 year's notice. Most recently the channel api, before that prospective search, backends, etc etc

benley · on March 1, 2017

"oh no I only have _an entire year_ to deal with an API deprecation what ever will I do?" :-P

bartread · on March 1, 2017

Well, if it's something you've built your entire offering around it could be a simple fix, or it could be months of work, and that will vary by project. Bear in mind that this is completely non-value-adding work that you didn't plan on just to bring your project back to a functioning state.

I.e., some douchebag who has no interest or stake in what you do has just dumped a potentially substantial amount of technical debt into your product backlog and, quite possibly, prioritised it all the way to the top.

As somebody else noted above: I don't need people creating more work for me. I can do that quite well enough on my own, thanks very much, and for side-projects this kind of chopping and changing is a pain in the ass.

By definition, with side-projects time is limited, so you absolutely have to focus on the most valuable activities to the exclusion of all else. For this reason, I only consider AWS and Azure for my projects: Google are just too fickle. Lucky you, if you have the time to deal with their nonsense.

(Btw, I'm not dissing Google on a technical level - they obviously do great, interesting work, and they're certainly one of the pioneers of PaaS. I just don't need the hassle of having to fix stuff because they keep killing APIs, projects, services.)

anotherturn · on March 1, 2017

Yes exactly. The app specifically affected by the Channel API deprecation is a side project that serves a few thousand people. It marches along perfectly well, and I pay Google money for it each month - though the project itself makes no money. Now, I need to consider whether the shift from Channel API to Firebase (and the few days' work it'll take to do) is worth the investment, or if I should just shut it down.

imron · on March 1, 2017

> doesn't really apply to Google Cloud services

It might not, but doing it so much for other services destroys trust across the entire brand.

dstroot · on March 1, 2017

I, for one, would love to see Redis and Postgres on GCP. That would be enough to get me to switch I think.

doubleorseven · on March 1, 2017

Wait for the NEXT event. Redis will be announced.

espeed · on Feb 28, 2017

Me too. We switched to Google Cloud years ago at its inception and have never looked back -- always viewed it as a competitive advantage due to its solid, more advanced infrastructure -- faster network, reliable disks, cleaner UI that's easier to manage. Just a cleaner operation all the way around.

snackai · on March 1, 2017

What indeed is bad taste is your choice of Google Cloud over AWS. No, I really like GCP, use it at the core of many apps, but if people really want a decentralized web we need to use more than one provider. Don't "convert". Use both, redundancy ffs.

kiallmacinnes · on March 1, 2017

Pity I can't upvote more than once! :)

This whole idea of being angry at a vendor for deprecating something with 1yr notice is just ridiculous!

People need to realize they are choosing lock-in, and are choosing the risk of deprecation every time they decide to use a cloud service with no drop-in competitor, open-source equivalent, etc.

Own your choices people, don't blame others...

unclebucknasty · on March 1, 2017

Sounds great on paper, but this is infrastructure level stuff with real world constraints.

The expectation of stability beyond a year is certainly not unreasonable when you're asking people to build their businesses/infrastructure on your platform.

And building redundancy across providers can be impractical, owing to learning curve, cost duplication, higher outbound bandwidth costs, effort duplication, solution complexity, etc.

adjkant · on March 1, 2017

What's the point of decentralizing by putting 50% on one, 50% on the other, and no overlap of the groups? You used the word redundancy, but who is willing to actually do that true redundancy?

advisedwang · on Feb 28, 2017

I work in GCP support. I'm really curious: what do you feel changed that led to such improved support? I'd like to make sure we keep doing it.

anotherturn · on March 1, 2017

Chiming in as I noticed the change too. For a long time it was almost impossible to speak with a human - every query was directed to the extensive but often useless support pages. If a human did respond it often seemed like they weren't savvy enough to handle a microwave let alone solve infra issues.

Then, about a year or two ago - humans actually started responding to and fixing problems. A welcome change!

advisedwang · on March 1, 2017

Do you have one of the paid support packages, or is this your experience of our google groups/stack overflow etc?

vacri · on March 1, 2017

My experience of support with Google Apps for Business makes me very wary of using anything Google for critical business infra. Google products are nice, but as soon as you hit a problem or edge case, you're on your own in my experience.

keithnoizu · on March 1, 2017

This.

I used to work on the Azure Portal Team. As many negative things as I can say about Microsoft, they take making things just work for developers seriously, despite high prices and misc. service issues.

The since nixed compute container project I initially worked on really exemplified this.

I tend to use Colo or AWS when possible but I have a client that insisted on Google GCE and Endpoints.

I've spent so much time digging through source code, working around broken dev tooling, and dealing with incorrect or out-of-date documentation thanks to that requirement.

In my personal opinion Google has a way to go in mature tooling. Silent failures, or worse failures that don't result in build failures are not acceptable. Requiring paid support contracts to resolve an issue in google infra is not acceptable. Incredibly poor support for local dev environments is not acceptable.

After dealing with this stuff, I find it unlikely that I will ever rely on their systems in the future. AWS/Colo or, with reservations, Azure all the way.

spuiszis · on March 1, 2017

Wish I could +1 this more. Any time I get some error, I spend hours sifting through old documentation and forum posts.

DerpyNirvash · on March 1, 2017

Why not just open a support ticket?

foxylad · on March 1, 2017

Exactly. Because they've usually only experienced support for Google's free services, people assume all Google support is minimal - but it isn't. We pay $150 a month for silver support, and in the extremely rare (several years apart) case we need help, we get it.

vacri · on March 1, 2017

Google Apps for Business is not free.

foxylad · on March 1, 2017

True, it costs $5 a month - nearly free. It comes down to the eternal truth that there is no such thing as a free lunch, and expecting a $100/hour support person to be at your beck and call for $5 a month isn't realistic.

vacri · on March 1, 2017

In my GApps support experience, it's slow and inexperienced.

dbg31415 · on March 1, 2017

Spot on.

And good luck getting accurate documentation.

ehsankia · on March 1, 2017

Honestly, if you're a big service that millions of people use, you should not put all your eggs in a single basket and should probably use a mix, in case one of the clouds goes down like in this case.

Svenskunganka · on March 1, 2017

One of the biggest reasons you go for a cloud is because you don't want to deal with reliability & scaling issues, and there's a premium price attached to that. I think most companies using S3 in this case believed they put their eggs in different baskets when they put their data in there.

kiallmacinnes · on March 1, 2017

I can't believe anyone would have thought a dependency on a single AWS region, or even single service provider, would count as having eggs in different baskets, at least, I really hope nobody could think that!

I suspect though that most people affected deemed the risks and costs of failure low enough to be acceptable, and for many people it still is - even with this outage. But that's a conscious decision, rather than plain ignorance.

Salgat · on March 1, 2017

That depends on whether you're willing to pay for the cost of hosting all your content twice and the development overhead of managing that. Twice the persistence means twice the chance of an issue occurring.

nashadelic · on March 1, 2017

That's where tech like kubernetes helps in making your app/service portable. Or having common APIs, like between s3 and google cloud storage.

Twice the persistence means always having at least one backup, and thus the occurrence of downtime goes down, not up.
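The two claims above measure different things, and a small calculation separates them. This sketch assumes each store is unavailable independently with the same probability `p`; the 1% figure is made up for illustration, not a measured rate for any provider.

```python
def any_store_has_issue(p, n=2):
    """Chance that at least one of n independent stores is having an issue."""
    return 1 - (1 - p) ** n


def all_stores_down(p, n=2):
    """Chance that every one of n independent stores is down at the same time."""
    return p ** n


p = 0.01  # hypothetical per-store unavailability, for illustration only
print(f"one store down:            {p:.4f}")
print(f"either of two has issue:   {any_store_has_issue(p):.4f}")
print(f"both of two down at once:  {all_stores_down(p):.6f}")
```

Under these assumptions, the chance of having some issue to deal with roughly doubles (about 2%), while the chance of losing access to every copy drops sharply (about 0.01%). Correlated failures, such as a shared deployment bug or a shared DNS provider, would make the second number worse than this estimate.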

_wldu · on March 1, 2017

With containers, I think the devops overhead would be minimal.

tejasmanohar · on March 1, 2017

That's if _everything_ is in containers. Also, don't underestimate how much of a difference the host machine configuration can make... Docker containers share the host's kernel.

hkmurakami · on March 1, 2017

>(Before that, you guys didn't at all have your shit together when it came to customer support)

Sounds like it basically coincides with Diane Greene coming on board to run the show -- which is great news for all of us with increased competition on not just the technical front but also support (which is often the deal maker/breaker)

7ewis · on March 1, 2017

Is Diane really that good?

I was at a talk last year, where she spoke, and as much as I love Google, it was one of the most boring talks I've ever heard in my life. So monotone and uninteresting... and I'm probably one of the biggest Google fans out there.

contingencies · on March 1, 2017

I have no idea about the person in question, but stable and reliable infrastructure can be really boring. Unfortunately, it's also necessary.

hkmurakami · on March 1, 2017

I'm not familiar with her public speaking. But you want someone decidedly un-Google-like to run an enterprise software (non-engineering) operation.

Look at Safra Catz's public speaking (Oracle). Terrible public speaker, terrific operator [1].

[1] though we may easily disagree with their business practices.

jamesblonde · on March 1, 2017

I just wrote a piece reflecting on the s3 outage and the limitations of s3 metadata/replication:

https://medium.com/@jim_dowling/reflections-on-s3s-architect...

jamesblonde · on March 1, 2017

Or discuss it here: https://news.ycombinator.com/item?id=13760251

themihai · on March 1, 2017

GCP has always felt like a forever-beta product. On top of that you get a lot of lock-in, so I would never recommend GCP for a long-term project.

twakefield · on Feb 28, 2017

The brilliance of open sourcing Borg (aka Kubernetes) is evident in times like these. We[0] are seeing more and more SaaS companies abstract away their dependencies on AWS or any particular cloud provider with Kubernetes.

Managing stateful services is still difficult but we are starting to see paths forward [1] and the community's velocity is remarkable.

K8s seems to be the wolf in sheep's clothing that will break AWS' virtual monopoly on IaaS.

[0] We (gravitational.com) help companies go "multi-region" or on-prem using Kubernetes as a portable run-time.

[1] Some interesting projects from this comment (https://news.ycombinator.com/item?id=13738916)

* Postgres automation for Kubernetes deployments https://github.com/sorintlab/stolon

* Automation for operating the Etcd cluster: https://github.com/coreos/etcd-operator

* Kubernetes-native deployment of Ceph: https://rook.io/

dankohn1 · on Feb 28, 2017

Note that Kubernetes "builds upon 15 years of experience of running production workloads [on Borg] at Google" [0], but is different code than Borg.

In addition to Rook, Minio [1] is also working to build an S3 alternative on top of Kubernetes, and the CNCF Landscape is a good way of tracking projects in the space [2].

[0] https://kubernetes.io/ [1] https://www.minio.io/ [2] https://github.com/cncf/landscape

Disclosure: I'm the executive director of CNCF, which hosts Kubernetes, and co-author of the landscape.

twakefield · on March 1, 2017

Yes, I was admittedly over generalizing with my statement regarding open sourcing Borg.

jsmthrowaway · on March 1, 2017

Well, you're in the ballpark. I might be wrong, but I've heard they're not averse to the idea of open sourcing Borg and Omega (it wasn't that long ago that the Borg paper would have been nigh unthinkable, interestingly), but the litany of Google specific stuff that is baked in makes refactoring for public release a nonstarter. It's a huge codebase with lots of little tendrils to other internal infrastructure.

Anyway, one needs an on-ramp to containers on Google Cloud. And one can't open source the one that one has, which despite being nearly mature enough to own a driver's license, wouldn't really fulfill the precise need that Kubernetes fills without some frontend work. So one writes Kubernetes. An almost entirely different fundamental architecture, by the way, so it's interesting for those who've seen both to compare.

In other words, you're not entirely off the mark even with the generalization.

star-trek-fleet · on March 1, 2017

K8s is a better borg! It leaps forward and builds upon many years of experience operating the system.

013a · on March 1, 2017

Is there any way built in to Kubernetes to go multi-AZ, multi-region, or even multi-cloud? Is federation the answer to this?

I remember reading somewhere in the K8s documentation that it is designed such that nodes in a single cluster should be as close as possible, like in the same AZ.

qj_li · on March 1, 2017

Yes, see the blog http://blog.kubernetes.io/2016/07/cross-cluster-services.htm...

blantonl · on Feb 28, 2017

I have a component in my business that writes about 9 million objects a month to Amazon S3. But, to leverage efficiencies in dropping storage costs for those objects I created an identical archiving architecture on Google Cloud.

It took me about 15 minutes to spin up the instances on Google Cloud that archive these objects and upload them to Google Storage. While we didn't have access to any of our existing uploaded objects on S3 during the outage, I was able to mitigate not having the ability to store any future ongoing objects. (our workload is much more geared towards being very very write heavy for these objects)

It turns out this cost-leveraging architecture works quite well as a disaster recovery architecture.
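A minimal sketch of the write-side fallback this describes, not the commenter's actual pipeline. `InMemoryStore`, `StoreUnavailable` and `archive` are hypothetical stand-ins; a real version would wrap the S3 and Google Cloud Storage clients behind the same `put` method.

```python
class StoreUnavailable(Exception):
    """Raised by a store when it cannot accept a write."""


class InMemoryStore:
    """Stand-in for a bucket client; a real one would wrap an S3 or GCS SDK."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.objects = {}

    def put(self, key, data):
        if not self.available:
            raise StoreUnavailable(self.name)
        self.objects[key] = data


def archive(key, data, stores):
    """Write to the first store that accepts the object; return that store's name."""
    for store in stores:
        try:
            store.put(key, data)
            return store.name
        except StoreUnavailable:
            continue  # this provider is down, try the next one
    raise StoreUnavailable("all stores failed")


s3 = InMemoryStore("s3", available=False)  # simulate the outage
gcs = InMemoryStore("gcs")
print(archive("2017/02/28/object-1", b"payload", [s3, gcs]))
```

This only keeps new writes flowing. Objects already stored with the unavailable provider stay unreachable until it recovers, which matches the write-heavy workload described above.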

sachinag · on Feb 28, 2017

Opportunistic, sure. But I did not know about the API interoperability. Given the prices, makes sense to store stuff in both places in case one goes down.

khc · on Feb 28, 2017

I am surprised more people don't know about it. I get questions like https://github.com/kahing/goofys/issues/158 every now and then and to be fair I don't think they market it well: https://cloud.google.com/storage/docs/migrating

Disclosure: I don't work for google but have an upcoming interview there.

devmunchies · on Feb 28, 2017

"Disclosure: I don't work for google but have an upcoming interview there."

Disclosure: I took a tour there one time and have used google.

EDIT: I realized that I was being mean, but why was that disclaimer relevant?

Nexxxeh · on Feb 28, 2017

A few possible reasons, the most obvious being grandparent is disclosing a possible source of bias.

Also it could look suspicious if grandparent gets the job and at some point in the future someone looks back at this comment.

If in doubt, disclose. Especially in the tech industry, that's what Gamergate was actually about.

timv · on Feb 28, 2017

> I realized that I was being mean, but why was that disclaimer relevant?

Because:

- transparency is always good

- adding a small disclosure to the bottom of a post is very low impact

- someone who is interviewing for a job at a company is likely to have a set of biases that influence what they say even if they think that they're being honest and objective.

eric_h · on Feb 28, 2017

I think it's a fair disclosure of potential bias.

mbrookes · on Feb 28, 2017

Frankly, if you don't know the difference between a disclosure and a disclaimer, you shouldn't be commenting.

nodesocket · on Feb 28, 2017

Not poor taste at all. Love GCP. I actually host two corporate static sites using Google Cloud Storage and it is fantastic. I just wish there was a bucket wide setting to adjust the cache-control setting. Currently it defaults to 1 hour, and if you want to change it, you have to use the API/CLI and provide a custom cache control value each upload. I'd love to see a default cache-control setting in the web UI applying to the entire bucket.
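For illustration, a sketch of setting the value per upload as described above. It assumes the third-party google-cloud-storage Python library and default application credentials; the `CACHE_POLICY` table, its values, and the helper names are hypothetical.

```python
import os

# Hypothetical policy table: file extension -> Cache-Control value.
CACHE_POLICY = {
    ".html": "public, max-age=300",
    ".css": "public, max-age=86400",
    ".js": "public, max-age=86400",
}
DEFAULT_CACHE_CONTROL = "public, max-age=3600"  # the 1 hour default mentioned above


def cache_control_for(path):
    """Pick a Cache-Control value from the file extension."""
    ext = os.path.splitext(path)[1].lower()
    return CACHE_POLICY.get(ext, DEFAULT_CACHE_CONTROL)


def upload(bucket_name, path):
    """Upload one file with an explicit Cache-Control value."""
    from google.cloud import storage  # third-party: pip install google-cloud-storage

    blob = storage.Client().bucket(bucket_name).blob(os.path.basename(path))
    blob.cache_control = cache_control_for(path)  # set before uploading
    blob.upload_from_filename(path)
```

Keeping the policy in one table is a workaround for the missing bucket-wide setting: every upload path goes through `cache_control_for`, so the value only has to change in one place.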

I also want to personally thank Solomon (@boulos) for hooking me up with a Google Cloud NEXT conference pass. He is awesome!

dward · on March 1, 2017

Out of curiosity, are you also using the cloud CDN?

https://cloud.google.com/compute/docs/load-balancing/http/us...

nodesocket · on March 1, 2017

I found Google Cloud CDN a little overly complicated to set up, since you need to use load balancers.

I use CloudFlare. They handle generating an SSL certificate, can have a CNAME at the apex, full-site static caching, 301 http => https redirects, etc.

7ewis · on March 1, 2017

How did you get the pass?

Been trying to get one for IO (can't attend NEXT unfortunately)

i336_ · on Feb 28, 2017

Hopefully you're still there even though S3 is back up. I have an interesting question I really, really hope you can answer. (Potential customer(s) here!!)

There are a large number of people out there looking intently at ACD's "unlimited for $60/yr" and wondering what that really means.

I recently found https://redd.it/5s7q04 which links to https://i.imgur.com/kiI4kmp.png (small screenshot) showing a user hit 1PB (!!) on ACD (1 month ago). If I understand correctly, the (throwaway) data in question was slowly being uploaded as a capacity test. This has surprised a lot of people, and I've been seriously considering ACD as a result.

On the way to finding the above thread I also just discovered https://redd.it/5vdvnp, which details how Amazon doesn't publish transfer thresholds, their "please stop doing what you're doing" support emails are frighteningly vague, and how a user became unable to download their uploaded data because they didn't know what speed/time ratios to use. This sort of thing has happened heaps of times.

I also know a small group of Internet archivists that feed data to Archive.org. If I understand correctly, they snap up disk deals wherever they can find them, besides using LTO4 tapes, the disks attached to VPS instances, and a few ACD and GDrive accounts for interstitial storage and crawl processing, which everyone is afraid to push too hard so they don't break. One person mentioned that someone they knew hit a brick wall after exactly 100TB uploaded - ACD simply would not let this person upload any more. (I wonder if their upload speed made them hit this limit.) The archive group also let me know that ACD was better at storing lots of data, while GDrive was better at smaller amounts of data being shared a lot.

So, I'm curious. Bandwidth and storage are certainly finite resources, I'll readily acknowledge that. GDrive is obviously going to have data-vs-time transfer thresholds and upper storage limits. However, GSuite's $10/month "unlimited storage" is a very interesting alternative to ACD (even at twice the cost) if some awareness of the transfer thresholds was available. I'm very curious what insight you can provide here!

The ability to create share links for any file is also pretty cool.

ptrptr · on Feb 28, 2017

Now that's what I call a shameless plug!

scrollaway · on Feb 28, 2017

We would definitely seriously consider switching to GCS more if your cloud functions were as powerful as AWS Lambda (trigger from an S3 event) and supported Python 3.6 with serious control over the environment.

boulos · on Feb 28, 2017

Is there something about the GCS trigger that doesn't work for you? I hear you on Python 3, but I'm also curious about "serious control over the environment". Can you be more specific?

scrollaway · on Feb 28, 2017

Here are our main issues with Lambda, from highest-to-lowest priority:

- It supports Python 2.7 only. We need Python 3.4+ support.

- We can't increase CPU allocation without increasing RAM allocation, making them far more expensive than we need.

- Using psycopg2 on it is a PITA due to their handling of system dependencies.

- The system is entirely proprietary, making it impossible to run it locally for testing.

- Cloudwatch sucks for finding errors in the functions and is atrociously expensive.

- API gateway is an extremely crufty system, and used not to let you pass around binary data (this has changed)

- We can't disable/change the retry-on-error policy.

We have a pretty hard tie-in to S3 and Redshift, but when GCF can do better on a majority of these points, we'll begin moving to it. But yes, Python 3 at a minimum would be a requirement.

mypalmike · on March 1, 2017

> The system is entirely proprietary, making it impossible to run it locally for testing.

I assume that you are referring to emulating the triggering of lambdas behind API gateway...? I've found a project that sets up a node environment to do this. Very handy for js/lambda development. A google search suggests similar options may exist for python.

vikiomega9 · on Feb 28, 2017

On a curious note, how do you guys use lambda?

scrollaway · on Feb 28, 2017

It's a little outdated now, but this post details our pipeline: https://hearthsim.info/blog/how-we-process-replays/

RulerOf · on March 1, 2017

As someone who's literally just starting to look at Lambda, thanks for that quick read.

I had a lot of "chicken and egg"-type questions about using it, and seeing that critical step of bootstrapping the whole thing via the API Gateway was really informative.

simonebrunozzi · on Feb 28, 2017

I keep telling people that in my view, Google Cloud is far superior to AWS from a technical standpoint. Most people don't believe me... Yet. I guess it will change soon.

natbobc · on Feb 28, 2017

Google Cloud is the Betamax of cloud... while it might be technically superior it's not the only factor to consider. :)

boulos · on Feb 28, 2017

Aww... that seems a little early to call ;).

packetslave · on March 1, 2017

you don't comment for 4 years and THAT'S the comment you choose to return with?

natbobc · on March 1, 2017

Yep, replace "compiling" with "S3 recovery" in the following XKCD - https://xkcd.com/303/

joshontheweb · on Feb 28, 2017

What other factors make it doomed for failure like betamax?

natbobc · on March 1, 2017

I wouldn't say that it's doomed to failure but I do think it has a lot of ground to cover to catch-up. Google has a lot of great technology like TensorFlow, Kubernetes, and Go that will keep them relevant.

In support of my flippant remark, I see three indicators that hold parallels to Betamax, with detail to follow. I qualify that this is largely informed by my own anecdotal experience, specifically by objections and responses that I've received/observed while my peers and I have proposed or implemented cloud adoption at various companies.

Indicators

1. market share. 2. proprietary tech stack. 3. technical superiority syndrome.

Detail

1. Currently AWS has a major lead, then Azure, then Google. The implication is that market share translates to mindshare, which in turn yields blog articles, OSS libraries/tools, etc. This becomes a virtuous cycle.

For .NET shops that marketshare will tend to favour Azure on the premise that MS knows best.

2. Some of Google's technology stack has a learning curve that is unique to Google. Take GAE as an example and compare to AWS's nearest equivalent Beanstalk (or Heroku). Beanstalk requires few if any changes to an existing application whereas GAE requires that you do it the App Engine way. It might provide a number of benefits, but it's invasive. Containers are shifting the requirement, however not everyone is in a position or has the desire to start with containers on day 1.

Further, Google Cloud's project-oriented approach, while not a bad organisation mechanism, detracts from learning. If you assume the premise that exploration is part of learning, it forces the user to hold two items in their head: their objective and Google Cloud's imposed objective.

AWS on the other hand generally provides defaults that allow you to launch resources almost immediately after sign-up. Google's approach is better for long-term support, maintenance and organisation but the user needs to have the maturity to understand that benefit.

3. It may be technically superior, but that statement in and of itself is divisive and can scare some away. It is not enough to simply be technically superior, and from my observation the statements tend to originate from X/Googlers.

A number of people will latch onto feature set (for Betamax, the number of films available was a factor). The absence of features will often discount a choice out of the gate (even if those features are irrelevant). For example:

- regional coverage: AWS - 15 regions/~38 zones; Azure - 36 regions/zones; Google - 6 regions/18 zones

- partially/fully managed services: AWS is continually growing these, at a level that seems to outpace competitors.

- Outwardly Google appears to tackle the "hard problems" with technically superior solutions (e.g. TensorFlow, BigQuery) but often appears to neglect the "boring" problems a number of companies want as well (e.g. Cloud VDI's, SnowBall, etc).

- Some areas seem to be ossified due to tight coupling (e.g. servlet 3.0 and python support in GAE).

Summary

There is no silver bullet solution. Every provider will have an outage at some point and this could be a big reason that GCE won't be knocked out of the game. I also think Google is working really hard to build community and mindshare. I don't have a crystal ball so only time will tell what happens but technical superiority has rarely been the sole reason that drives adoption.

joshontheweb · on March 2, 2017

I appreciate you taking the time to explain. I'm in the process of making decisions on a new cloud storage provider so this is helpful.

notyourwork · on March 1, 2017

One service outage determines superiority? I prefer a lot more data than a single point.

joshontheweb · on Feb 28, 2017

I'm in the process of moving to GCS mostly based on how byzantine the AWS setup is. All kinds of crazy unintuitive configurations and permissions. In short, AWS makes me feel stupid.

joshontheweb · on March 3, 2017

I should add that someone from the AWS team reached out to me in response to this comment asking for feedback on how they can improve their usability. So I give them credit for that.

andmarios · on Feb 28, 2017

As far as I understand the S3 API of Cloud Storage is meant as a temporary solution until a proper migration to Google's APIs.

The S3 keys it produces are tied to your developer account. This means that if someone gets the keys from your NAS, he will have access to all the Cloud Storage buckets you have access to (e.g your employer's).

I use Google Cloud but not Amazon. Once I wanted a S3 bucket to try with NextCloud (then OwnCloud). I was really frightened to produce a S3 key with my google developer account.

BrandonY · on March 1, 2017

The HMAC credential that you'd use with the S3-compatible GCS API, also called the "XML API", does need to be associated with a Google account, but it doesn't need to be the main account of the developer. It can be any Google user account. I suggest creating a separate account and granting it only the permissions it needs. It'd be nice if service accounts (aka robot accounts) could be given HMAC credentials, but that's not supported. Service accounts can, however, sign URLs with RSA keys.

As another option, you can continue using the XML API and switch out only the auth piece to Google's OAuth system while changing nothing else.

There's a lot more detail available at: https://cloud.google.com/storage/docs/migrating

Disclaimer: I work on Google Cloud Storage.

andmarios · on March 1, 2017

Thanks for the advice. I think it would be even nicer if the HMAC credentials could be assigned to a specific bucket via an ACL.

I like GCS (and the gsutil tool) but occasionally a S3 style bucket is needed. For example you need a S3 bucket or a webdav server in order to send alerts with images from Grafana to Slack. A minor issue but nice to have if possible without having to deal with Amazon's control panel.

dividuum · on March 1, 2017

Is there any equivalent to the Bucket Policies that AWS provides (http://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucke...)? Cloud Storage seems to be limited to relatively simple policies without conditionals. For a few AWS IAM keys I set up a policy that limits write/delete access to a range of IPs (among other things). Something like that doesn't seem possible with what Google offers. Or am I missing something?

andmarios · on March 1, 2017

I am not familiar with AWS bucket policies, but AFAIK there isn't a way to set IP based access to GCS buckets.

To be honest, I do find the GCS permissions a bit complex. You have IAM, you have ACLs and you have S3 keys. Everything is set in a different place and ACLs aren't fully represented on the developers console. S3 keys give full access to everything, IAM service accounts give access per project and ACLs are fine grained (per bucket/object). On the other hand, IIRC, IAM has a write only setting, while ACLs do not. So I can have an account that can write only to all the buckets of my project but not an ACL (not that useful).

stef25 · on March 1, 2017

> OwnCloud

Kicked the tires, not impressed at all. Notes went missing from the interface; I could only get them back after manually digging through folders via FTP.

rynop · on Feb 28, 2017

"fraction of the cost" - how do you figure? Or are you just saying from a cost-to-store perspective?

Your Egress prices are quite a bit more compared to CloudFront for sub 10TB (.12/GB vs .085/GB).
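To put rough numbers on that gap, here is a minimal sketch using only the two per-GB prices quoted above. The monthly volume is a made-up figure for illustration, and real bills have tiers and other line items that this ignores.

```python
# Per-GB egress prices quoted in the comment above (USD, sub-10TB tier).
GCS_EGRESS_PER_GB = 0.12
CLOUDFRONT_EGRESS_PER_GB = 0.085


def egress_cost(gigabytes: float, price_per_gb: float) -> float:
    """Return the egress bill in USD for a flat per-GB price."""
    return gigabytes * price_per_gb


# Hypothetical monthly volume, chosen only for illustration.
monthly_gb = 1000

gcs = egress_cost(monthly_gb, GCS_EGRESS_PER_GB)
cloudfront = egress_cost(monthly_gb, CLOUDFRONT_EGRESS_PER_GB)
print(f"GCS: ${gcs:.2f}, CloudFront: ${cloudfront:.2f}, "
      f"difference: ${gcs - cloudfront:.2f}")
```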

Given the track record of S3 outages vs. the time you're up and sending egress, it seems like S3 wins on cost. If all you're worried about is cross-region data storage, you're probably a big player and have an AWS enterprise agreement in place, which offsets the cost of storage.

boulos · on Feb 28, 2017

Sorry, my comparison is our Multi Regional storage (2.6c/GB/month) versus S3 Standard plus Cross-Regional Replication. That's the right comparison (especially for outages like this one).

As to our network pricing, we have a drastically different backbone (we feel it's superior, so we charge more). But as you mention CloudFront, the right comparison is probably Google Cloud CDN (https://cloud.google.com/cdn/) which has lower pricing than "raw egress".

Spunkie · on March 1, 2017

So this is more compute related but do you know if there are any plans on supporting the equivalent of the webpagetest.org(WPT) private instance AMI on your platform?

Not only is webpagetest.org a google product but it's also much better suited for the minute by minute billing cycle of google cloud compute. For any team not needing to run hundreds of tests an hour the cost difference between running a WPT private instance on EC2 versus on google cloud compute could easily be in the thousands of dollars.

malloryerik · on March 1, 2017

Would use Google but I just can't give up access to China. Sad because I also sympathize with Google's position on China.

zoloateff · on Feb 28, 2017

boulos, not in bad taste at all - happy Google convert and GCS user; works very well for us, YMMV

zoloateff · on March 12, 2017

boulos, is App Engine Datastore the preferred way to store data, or Cloud SQL, or something else? Do you mind shedding some light on this? Thanks.

DenisM · on March 1, 2017

If you made a .NET library that allows easily connecting to both AWS and GCS by only changing the endpoint I would certainly use that library instead of Amazon's own.

Just saying, it gets you a foot in the door.

danielvf · on Feb 28, 2017

I had no idea this was an option. Great to know!

sandGorgon · on March 1, 2017

I have had problems integrating Apache Spark with Google Storage, especially because S3 is directly supported in Spark.

If you are API-compatible with S3, could you make it easy/possible to work with Google Storage inside Spark?

Remember, I may or may not run my Spark on Dataproc.

bluedonuts · on March 9, 2017

You can use the Google cloud storage connector (https://cloud.google.com/hadoop/google-cloud-storage-connect...) which works with hadoop (and therefore spark).

mbrumlow · on Feb 28, 2017

What is your NAS box doing with S3/GCS ?

boulos · on Feb 28, 2017

Remote backup (Synology). I've asked them more than once to directly support GCS, or even just to accept my damn patch ;).

gr2020 · on Feb 28, 2017

Are you using Hyper Backup? That seems to support S3-compatible destinations, including GCS, at least in DSM 6.1 -

https://www.synology.com/en-us/knowledgebase/DSM/help/HyperB...

gaul · on March 1, 2017

S3 applications can use any object store if they use S3Proxy:

https://github.com/andrewgaul/s3proxy

thejosh · on March 1, 2017

How about giving a timeline of when Australia will be launching? I see you're hiring staff, and have a "sometime 2017" goal on the site, but how about a date estimate? :)

philliphaydon · on March 1, 2017

Does GCS support events yet?

hyperpallium · on March 1, 2017

As Relay's chief competitor in this region, we of Windsong have benefited modestly from the overflow; however, until now we thought it inappropriate to propose a coordinated response to the problem.

espeed · on Feb 28, 2017

What software are you using for your NAS box?

pmarreck · on March 1, 2017

Classy parley. I'll allow it.

masterleep · on Feb 28, 2017

Competition is great for consumers!

cperciva · on Feb 28, 2017

S3 is currently (22:00 UTC) back up.

The timeline, as observed by Tarsnap:

    First InternalError response from S3: 17:37:29
    Last successful request: 17:37:32
    S3 switches from 100% InternalError responses to 503 responses: 17:37:56
    S3 switches from 503 responses back to InternalError responses: 20:34:36
    First successful request: 20:35:50
    Most GET requests succeeding: ~21:03
    Most PUT requests succeeding: ~21:52
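For anyone who wants the durations rather than the raw timestamps, here is a small sketch that subtracts the times listed above. It assumes all of them fall on the same UTC day, which the timeline implies.

```python
from datetime import datetime

# Timestamps (UTC, same day) copied from the timeline above.
FMT = "%H:%M:%S"
last_success = datetime.strptime("17:37:32", FMT)
first_503 = datetime.strptime("17:37:56", FMT)
back_to_internal_error = datetime.strptime("20:34:36", FMT)
first_success = datetime.strptime("20:35:50", FMT)

# Gap between the last request that worked and the first one that worked again.
total_outage = first_success - last_success
# How long S3 answered with 503 instead of InternalError.
window_503 = back_to_internal_error - first_503

print("No successful requests for:", total_outage)
print("503 phase lasted:", window_503)
```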

josephb · on Feb 28, 2017

Thanks for taking the time to post a timeline from the perspective of an S3 customer. It will be interesting to see how this lines up against other customer timelines, or the AWS RFO.

kaishiro · on March 1, 2017

Playing the role of the front-ender who pretends to be full-stack if the money is right, can someone explain the switch from internal error to 503 and back? Is that just them pulling s3 down while they investigate?

cperciva · on March 1, 2017

My guess based on the behaviour I've seen is that internal nodes were failing, and the 503 responses started because front-end nodes didn't have any back-end nodes which were marked as "not failing and ready for more requests". When Amazon fixed nodes, they would have marked the nodes as "not failed", at which point the front ends would have reverted to "we have nodes we can send traffic to" behaviour.

greenleafjacob · on March 1, 2017

Could be anything. Most likely scenario is the internal error is a load shedding error and the 503s were when the system became completely unresponsive. If it was a configuration issue then it is more likely that it would have directly recovered rather than going 'internal error -> 503 -> internal error'.

hmottestad · on March 1, 2017

503 is typically what we see when our proxy can't connect to the backend server. We usually get 500 with internal server error when we've messed up the backend server.

So it's likely that the first 500s were the backend for s3 failing, then they took the failing backends offline causing the load balancers to throw 503 because they couldn't connect to the backend.
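A toy model of that behaviour, purely as a sketch: the `front_end` function and the backend tuples are hypothetical and say nothing about how S3 is actually built. It only shows why a proxy with no healthy backends tends to answer 503, while a reachable but broken backend surfaces as a 500-style error.

```python
from typing import Callable, List, Tuple

# A backend is modelled as (healthy flag, handler); both are hypothetical.
Backend = Tuple[bool, Callable[[], str]]


def front_end(backends: List[Backend]) -> Tuple[int, str]:
    """Return (status, body) the way a simple proxy might."""
    healthy = [handler for is_healthy, handler in backends if is_healthy]
    if not healthy:
        # Nothing to route to: the proxy itself answers with 503.
        return 503, "Service Unavailable"
    try:
        return 200, healthy[0]()
    except Exception:
        # A backend was reachable but failed while handling the request.
        return 500, "InternalError"


def ok() -> str:
    return "object bytes"


def broken() -> str:
    raise RuntimeError("storage node failure")


print(front_end([(True, ok)]))       # healthy backend that works
print(front_end([(True, broken)]))   # backend marked healthy but failing
print(front_end([(False, broken)]))  # failing backend taken out of rotation
```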

Twirrim · on March 1, 2017

S3 is not a monolithic architecture, Amazon is a strong proponent of Service Oriented Architecture for producing scalable platforms.

There are a number of services behind the front end fleet in S3's architecture that handle different aspects of returning a response. Each of those will have their own code paths in the front end, very likely developed by different engineers over the years. As ever, appropriate status codes for various circumstances are something that always seems to spur debate amongst developers.

The change in status code would likely be a reflection of the various components entering unhealthy & healthy states, triggering different code paths for the front end... which suggests whatever happened might have had quite a broad impact, at least on their synchronous path components.

thenewregiment2 · on Feb 28, 2017

[flagged]

endersshadow · on Feb 28, 2017

Soundcloud recovering from this failure and S3 being operational are two separate issues. We use S3 and it will take us nominally an hour to recover after S3 went up, for example.

S3 has started working as of about 20 minutes ago, and things are running smoothly.

quakeguy · on Feb 28, 2017

Thanks!

jeffasinger · on Feb 28, 2017

There are other Amazon services that were affected. For example, we're still not seeing auto scaling groups working correctly.

ta_wh · on Feb 28, 2017

"[RESOLVED] Increased Error Rates

Update at 2:08 PM PST: As of 1:49 PM PST, we are fully recovered for operations for adding new objects in S3, which was our last operation showing a high error rate. The Amazon S3 service is operating normally."

https://status.aws.amazon.com/

thenewregiment2 · on Feb 28, 2017

oh the famous downvote for the smear campaign. just admit it.

joatmon-snoo · on Feb 28, 2017

You're getting downvotes because you don't understand that B being down is not an effective indicator of the status of A, even if B depends on A.

ta_wh · on Feb 28, 2017

Think you're mistaken, I don't have downvote privileges!

espeed · on Feb 28, 2017

Claiming a statement is false when it's demonstrably true is something that will likely get downvoted every time. It's misleading to others and fills the board with noise.

gamache · on Feb 28, 2017

A piece of hard-earned advice: us-east-1 is the worst place to set up AWS services. You're signing up for the oldest hardware and the most frequent outages.

For legacy customers, it's hard to move regions, but in general, if you have the chance to choose a region other than us-east-1, do that. I had the chance to transition to us-west-2 about 18 months ago and in that time, there have been at least three us-east-1 outages that haven't affected me, counting today's S3 outage.

EDIT: ha, joke's on me. I'm starting to see S3 failures as they affect our CDN. Lovely :/

traskjd · on Feb 28, 2017

Reminds me of an old joke: Why do we host on AWS? Because if it goes down then our customers are so busy worried about themselves being down that they don't even notice that we're down!

nabla9 · on Feb 28, 2017

Reminds me of an even older joke (from 80's or 90's):

Q: Why don't computers crash at the same time?

A: Because network connections are not fast enough.

(I think we are starting to get there)

contingencies · on March 1, 2017

These are both pretty good. Added to color fortune clone https://github.com/globalcitizen/taoup

xbryanx · on Feb 28, 2017

I'm getting the same outage in us-west-2 right now.

firloop · on Feb 28, 2017

The dashboard doesn't load, nor does content using the generic S3 url [1], but we're in us-west-2 and it works fine if you use the region specific URL [2]. In practice this means our site on S3/Cloudfront is unaffected.

[1]: https://s3.amazonaws.com/restocks.io/robots.txt

[2]: https://s3-us-west-2.amazonaws.com/restocks.io/robots.txt
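The two URLs differ only in the hostname. A minimal sketch of building either form, assuming path-style addressing exactly as in [1] and [2]; the helper name is made up for illustration:

```python
from typing import Optional


def s3_path_style_url(bucket: str, key: str, region: Optional[str] = None) -> str:
    """Build a path-style S3 URL.

    With no region this yields the generic s3.amazonaws.com form ([1] above);
    with a region it yields the region-specific form ([2] above).
    """
    host = "s3.amazonaws.com" if region is None else f"s3-{region}.amazonaws.com"
    return f"https://{host}/{bucket}/{key}"


print(s3_path_style_url("restocks.io", "robots.txt"))
print(s3_path_style_url("restocks.io", "robots.txt", region="us-west-2"))
```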

madmod · on Feb 28, 2017

Good catch. My bet is that because s3.amazonaws.com originally referred to the only region (us-east-1) the service that resolves the bucket region automatically is really hosted in us-east-1. I think AWS recommends using the region in the URL for that reason, however that is easier said than done I think. I would bet that a few of Amazon's services use the short version internally and are having issues because of it.

STRML · on Feb 28, 2017

Seeing it in eu-west-1 as well. Even the dashboard won't load. Shame on AWS for still reporting this as up; what use is a Personal Health Dashboard if it's to AWS's advantage not to report issues?

STRML · on Feb 28, 2017

Now it's in the PHD, backdated to 11:37:00 UTC-6. How could it take an hour to even admit that an issue exists? We have alerts set on this but they're useless when this late.

WaxProlix · on Feb 28, 2017

Same here, and it's 100% consistent, not 'increased error rates' but actually just fully down. I'd just stop working but I have a demo this afternoon... the downsides of serverless/cloud architectures, I guess.

synicalx · on March 1, 2017

Heh that "increased error rates" got a chuckle out of me, I guess 100% is technically an increase.

pm90 · on Feb 28, 2017

Well what if you'd hosted it on your hard drive and it crashed? It seems like the probability of either is similar nowadays.

jacobwg · on Feb 28, 2017

The difference there is you can potentially do something about it, vs having to wait on an upstream provider to fix an issue for everybody.

btgeekboy · on March 1, 2017

"you can potentially do something about it" vs. "you have to do something about it"

Perspective is everything.

JupiterMoon · on Feb 28, 2017

Grab different machine, git clone your repo, good to go.

What are the odds of the server with your repo and your own hard drive crashing at the same time?

_ao789 · on Feb 28, 2017

Strangely, your comment made me read this entire post about working out probabilities.. http://www.statisticshowto.com/how-to-find-the-probability-o...

Quite interesting really!

JupiterMoon · on March 2, 2017

If we assume that the events are largely uncorrelated+ then we are multiplying the probabilities and our chances of a wipeout are far lower.

+I would suggest that any single event taking down both my machine and GitHub's/Bitbucket's servers would be of such magnitude that I would no longer be worried about my project, being more focused on basic survival...
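As a quick illustration of the multiplication, with made-up numbers (both probabilities below are assumptions, not measurements of any real disk or hosting service):

```python
# Hypothetical per-day failure probabilities, for illustration only.
p_laptop_disk_fails = 0.001
p_repo_host_down = 0.001

# For independent events, P(A and B) = P(A) * P(B).
p_both = p_laptop_disk_fails * p_repo_host_down

print(f"Chance of losing both on the same day: {p_both:.6%}")
```

If the two failures share a cause, the independence assumption breaks and the real figure is higher than the product.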

all_usernames · on Feb 28, 2017

Our services in us-west-2 have been up the whole time.

I think the problem is globally accessible APIs are impacted. As others have noted, if you can use region/AZ-specific hostnames to connect, you can get through to S3.

CloudFront is faithfully serving up our existing files even from buckets in US-East.

illumin8 · on Feb 28, 2017

S3 bucket creation was down in us-west-2, because it relied on us-east-1 (I expect that dependency will get fixed after this), but all S3 operations should have continued to function in us-west-2, other than cross-region replication from us-east-1.

codelitt · on Feb 28, 2017

IIRC the console for S3 is global and not region specific even though buckets are.

seanp2k2 · on Feb 28, 2017

Also, cross-region replication is a new-ish thing: https://aws.amazon.com/blogs/aws/new-cross-region-replicatio...

Ph4nt0m · on Feb 28, 2017

Same outage in ca-central-1

ngtvspc · on Feb 28, 2017

I can confirm this as well.

gamache · on Feb 28, 2017

Huh, I'm not seeing it on my us-west-2 services. Interesting.

movedx · on Feb 28, 2017

My advice is: don't keep your eggs in one basket. AZs are a localised redundancy, but as Cloud is cheap and plentiful, you should be using two or more regions, at least, to house your solution (if it's important to you.)

EDIT: less arrogant. I need a coffee.

gamache · on Feb 28, 2017

But now you're talking about added effort. Multi-AZ on AWS is easy and fairly automatic, multi-region (and multi-provider) not so much. It's easy to say things like this, but people who can do ops are not cheap and plentiful.

movedx · on Feb 28, 2017

The only difficult aspect of multi-region use is data replication, which I can confirm is a (somewhat) difficult problem. This issue was with S3 which has an option to automatically replicate data from the bucket's region to another one. It's a check box. A simple bit of logic in the application and you can move between regions with ease.
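The "simple bit of logic" could look roughly like the sketch below. It assumes replication is already turned on and that you have one reader function per region; the reader functions here are stand-ins that simulate an outage, not real S3 client calls.

```python
from typing import Callable, Optional, Sequence

# A reader takes an object key and returns its bytes, or raises on failure.
Reader = Callable[[str], bytes]


def read_with_failover(key: str, readers: Sequence[Reader]) -> bytes:
    """Try each region's reader in order and return the first successful read."""
    last_error: Optional[Exception] = None
    for reader in readers:
        try:
            return reader(key)
        except Exception as err:  # deliberately broad for a sketch
            last_error = err
    raise RuntimeError(f"all regions failed for {key!r}") from last_error


# Stand-ins for real per-region clients (hypothetical behaviour).
def us_east_1(key: str) -> bytes:
    raise ConnectionError("InternalError from primary region")


def us_west_2(key: str) -> bytes:
    return b"replicated copy of " + key.encode()


print(read_with_failover("robots.txt", [us_east_1, us_west_2]))
```

Writes are the harder half: this only covers reads from a replica, and replication lag means the second region may briefly be behind.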

Even data replication has options for this, too.

And I work in Ops.

gamache · on Feb 28, 2017

Well, you've explained how to do multi-region in S3. Now let's cover EC2, ELB, EBS, VPC, RDS, Lambda, ElastiCache, API Gateway, and all the other bits of AWS that make up my services. And then we can move on to failover application logic.

movedx · on Feb 28, 2017

I picked out S3 as this issue is directly related to it, yet the solution is simple: turn on replication and have your application work with it (which is on the developers, not ops.)

EC2: why are you replicating EC2 instances or AMIs across regions? Why aren't you using build tools to automatically create AMIs for you out of your CI processes?

ELB: Eh? Why do I need ELBs to be multi-regional? I'm a little confused by this one, sorry.

EBS: My systems tend to be stateless, storing as much log, audit, or data in external systems such as RDS, DynamoDB, S3, etc. Storing things on the local system's storage is a bit risky, but if you have to there are disk replication solutions available. EFS comes to mind for making that easier. Backups also come to mind in the event of data loss.

VPC: Why does a VPC need to be cross regional? This one is also lost on me.

RDS: Replication is easy -- it's done for you. Convincing developers their application needs to potentially work with a backup endpoint to the data is harder than data replication problems at times. More often than not, it's simply a case of switching to a read-only mode whilst you recover the write copy of your RDS instance, but this is the role of the developers, not ops.

Lambda, ElastiCache, API Gateway... all these things aren't arguments against my original point: architect correctly. Yes it involves more work (from the developer's perspective, mostly), but more often than not in the event of a failure you're left head and shoulders above your nearest competition and left soaking up the profits as a result.

Based on your responses, however, I think we can safely agree to disagree and move on.

Have a great day! I hope you weren't too badly affected by the S3 outage!

EDIT: typo.

webo · on Feb 28, 2017

>EC2: why are you replicating EC2 instances or AMIs across regions?

Exactly to avoid single region outages?

ec109685 · on March 1, 2017

I think the point was that you shouldn't replicate but just deploy to both.

vacri · on March 1, 2017

Gamache's point is that making your production environment cross-regional means setting up all those things in another region and managing them as well. It's not a tickbox.

Our webservers were hit by this outage. In order to make these cross-regional, I'd need to set up VPCs properly, security groups, instances, datastores (several databases), so on and so forth. I don't store anything on the local disk, but I'm not going to run a server in Europe hitting my db servers in us-east-1. AWS doesn't offer all the databases we use. Cloudformation isn't trivial to use once you get past the tutorial examples either.

Basically, your comment is a version of "you're holding it wrong!"

movedx · on March 1, 2017

The US is made up of several regions. You don't have to leave the country to go multi-region, you only need to go west or east from your current location in the US.

Some solutions present more difficulties than others, that's for sure. From the limited information you've given me, your solution is far from being a unique situation that poses many difficulties.

CloudFormation in YAML format is pretty easy. I recommend Terraform, however, which is much nicer again for this kind of stuff. It makes it rather "trivial" to get a multi-region solution in place.

As for the database replication: I highly doubt the solutions you're using don't offer replication, and if they don't, and they're not some very esoteric, highly specialised engines, then I would replace them with something that does.

It reads to me as though your primary point of contention is your databases. Not an easy problem to solve, I'll admit, but not impossible, either.

jacquesm · on Feb 28, 2017

Two different vendors if you can afford it. It's a bit of a hassle though.

movedx · on Feb 28, 2017

I like to stick to one, but I have seen some success stories with an AWS/GCE mix :-)

HashiCorp's Terraform makes it a lot easier to go multi Cloud, and abstracting away configuration of the OS and applications/state with Ansible makes the whole process a lot easier too.

cfieber · on Feb 28, 2017

Here is one success story:

https://cloudplatform.googleblog.com/2017/02/guest-post-mult...

bischofs · on Feb 28, 2017

It shouldn't be technically possible to lose S3 in every region. How did Amazon screw this up so badly?

boulos · on Feb 28, 2017

I believe the reports here are misleading: if you try to access your other regions through the default s3.amazonaws.com it apparently routes through us-east first (and fails), but you're "supposed to" always point directly at your chosen region.

Disclosure: I work on Google Cloud (and didn't test this, but some other comment makes that clear).

twistedpair · on Feb 28, 2017

Amen. We set up our company cloud 2 years ago in US-West-2 and have never looked back. No outage to date.

jacquesm · on Feb 28, 2017

If you have a piece of unvarnished wood handy...

compuguy · on Feb 28, 2017

Is us-east-2 (Ohio) any better (minus this aws-wide S3 issue)?

mullen · on March 1, 2017

us-east-2 is brand new and us-east-1 is the oldest region. Any time there is an issue, it is almost always us-east-1. If possible, I would migrate out of us-east-1.

jchmbrln · on Feb 28, 2017

Probably valid, though in this case while us-west-1 is still serving my static websites, I can't push at all.

nola-radar · on Feb 28, 2017

The s3 outage covered all regions.