I'm really happy I don't host in the cloud. How quickly are the cost savings of ...

tomgallard · on Oct 26, 2012

Surely hosting yourself exposes you to just as much, if not more risk? Problem in the datacentre where you're co-lo'd, or one of your servers blows up?

I think people not trusting the cloud is similar to how people feel safer driving their cars than taking a plane. The stats say the plane's safer, but people prefer being in control. People like the idea of being in control of their servers, even if that means there are hundreds of extra things that can go wrong compared to a cloud provider.

We also get a lot more publicity when a cloud provider has an outage, as LOTS of sites go down at once. Hardly anyone notices when service X, which self-hosts, goes down for a few hours...

Alan01252 · on Oct 26, 2012

I agree with you. From my past experience any data center is subject to risks. I've witnessed:

  Power failures.
  Cross site links being cut due to engineering works.
  Overheating due to air conditioning failures.
  Flooding.

And I've experienced all the above from a very large, very well known, very expensive data center company based in London.

acdha · on Oct 26, 2012

This is true of every data center I've worked with. Also network providers: everyone has downtime and sometimes you learn the hard way that despite being written into your contract someone took the cheap way and ran your “redundant” fiber through the same conduit which a backhoe just tore up.

debacle · on Oct 26, 2012

We're coloed across three datacenters spanning the US (one might be in TO I think) and if a datacenter were to go down, we have a hot backup that's no more than 12 hours stale.

The only real manual maintenance that we've got is a rolling reimaging of servers based on whatever's in version control, which usually takes a few hours twice a year, but we'd probably do that if we were in the cloud anyway.

When you can script away 90% of your system administration tasks, hosting in the cloud doesn't really make a ton of sense.

tomgallard · on Oct 26, 2012

But a DNS-based failover is still going to take an hour or so to propagate, right (given that a lot of browsers/proxies/DNS servers don't respect TTL very well at all)? And then you end up with a system with stale data, and the mess of trying to reconcile it when your other system comes back up.

I'd take an hour-long App Engine outage once a year over that anytime!

0xbadcafebee · on Oct 26, 2012

Your name server or stub resolver is what respects DNS TTL, not your browser or proxy. Everyone - including people hosting on AWS - needs to be able to fail over DNS, if the AWS IP you're using is in a zone that just went down, for example.

Any time you have an outage you need to contact your service provider to get an estimate of downtime. If they can't give you one, assume it'll take forever and cut the DNS over. The worst case is some of your users will start to come back online slowly. If you don't cut over, the worst case is all your users are down until whenever the service provider fixes it, and you get to tell your users "we're waiting for someone else to deal with it", which won't make them very happy.
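The rule above can be sketched as a small decision function. This is only an illustration: the function name, the parameters, and the idea of comparing the provider's estimate against TTL plus some resolver overrun are assumptions layered on top of the comment, not anything a real DNS tool provides.

```python
from typing import Optional


def should_cut_over(provider_eta_minutes: Optional[float],
                    dns_ttl_minutes: float,
                    resolver_overrun_minutes: float = 0.0) -> bool:
    """Decide whether to point DNS at the backup site (hypothetical helper)."""
    # No estimate from the provider: assume the outage lasts forever.
    if provider_eta_minutes is None:
        return True
    # Worst case after a cutover: users come back as cached records expire.
    # The overrun term is an assumed allowance for resolvers that hold
    # records longer than the TTL.
    worst_case_cutover = dns_ttl_minutes + resolver_overrun_minutes
    return worst_case_cutover < provider_eta_minutes


if __name__ == "__main__":
    print(should_cut_over(None, dns_ttl_minutes=5))
    print(should_cut_over(120, dns_ttl_minutes=5, resolver_overrun_minutes=60))
    print(should_cut_over(10, dns_ttl_minutes=5, resolver_overrun_minutes=60))
```

With no estimate the function always says to cut over; with an estimate it only does so when the slow trickle back through DNS is still expected to beat the provider's fix.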

12 hour stale data sounds kind of long to me. 4 hours sounds more reasonable.
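A staleness limit like this is easy to check mechanically. A minimal sketch, assuming the backup job records the time of its last successful sync somewhere; the function name and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def backup_is_fresh(last_sync: datetime, now: datetime,
                    max_age: timedelta) -> bool:
    """True if the hot backup was synced within the allowed window."""
    return now - last_sync <= max_age


if __name__ == "__main__":
    # Assumed example: the last sync finished six hours ago.
    now = datetime(2012, 10, 26, 12, 0, tzinfo=timezone.utc)
    last_sync = now - timedelta(hours=6)
    print(backup_is_fresh(last_sync, now, timedelta(hours=12)))
    print(backup_is_fresh(last_sync, now, timedelta(hours=4)))
```

A six-hour-old backup passes the 12-hour limit and fails the 4-hour one, which is the whole disagreement in code form.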

codeka · on Oct 27, 2012

I've seen plenty of crappy ISP DNS servers ignore TTL values and cache DNS entries for many hours longer than they're supposed to. Unfortunately, it's all too common.

stickfigure · on Oct 26, 2012

"When you can script away 90% of your system administration tasks, hosting in the cloud doesn't really make a ton of sense."

How big is your ops team? I'm guessing it's more than 0.

debacle · on Oct 26, 2012

Ops team? We're a two man operation with occasional contractors.

stickfigure · on Oct 26, 2012

In that case, what is the ratio of "time spent doing ops-related tasks" vs "time spent developing new features" in your company? Please offer an honest evaluation. Everything has a cost; I'm genuinely curious about data points other than my own.

debacle · on Oct 26, 2012

I probably spend no more than an hour a week on ops, and most of that is reading emails from our service providers.

stickfigure · on Oct 26, 2012

Today, maybe, assuming a calm ocean and no scaling issues. But I don't believe you spent an hour a week setting up your three data centers, backups, failover procedure, etc.

debacle · on Oct 26, 2012

The backup script was written in a night, and the most complicated part about failover is remembering to sync data when the outage is over.

0xbadcafebee · on Oct 26, 2012

The "safety" of the cloud is about two things: 1. trusting your service provider, and 2. redundancy.

You have to trust your cloud provider. They control everything you do. If their security isn't bulletproof, you're screwed. If their SAN's firmware isn't upgraded properly to deal with some performance issue, you're screwed. If their developers fuck up the API and you can't modify your instances, you're screwed. You have to put complete faith in a secret infrastructure run for hundreds of thousands of clients so there's no customer relationship to speak of.

That's just the "trust" issue. Then there's the issue of actual redundancy. It's completely possible to have a network-wide outage for a cloud provider. There will be no redundancy, because their entire system is built to be in unison; one change affects everything.

Running it yourself means you know how secure it is, how robust the procedures are, and you can build in real redundancy and real disaster recovery. Do people build themselves bulletproof services like this? Usually not. But if you cared to, you could.

modoc · on Oct 26, 2012

It really depends on how many servers you have and how good your sys admins are. If you have 1 server in a cheap colo, then yes, the cloud is probably better. If you have a GOOD hosting provider, and build out a cluster that is redundant at every tier, you can easily beat the major cloud providers' uptime.

We run about 11 client clusters across ~250 servers across 3 data centers in the US and Europe. Each of our clients' uptime is very very close to 100%, and we've NEVER lost everything, even for 1 second.

bsaul · on Oct 26, 2012

It's funny, I've got the exact opposite reasoning: it's in those moments that I can really appreciate the fact that I'm using the cloud: 1/ I don't have to spend the night debugging or replacing broken hardware. 2/ It doesn't cost me any time, any additional resource, any support upgrade, any hardware. 3/ No one can blame me or anybody in my team for the fact that it's not working.

I don't feel like I'm lacking control, I feel like somebody else is taking care of that really annoying shit that happens all the time, no matter how well you design your system.

modoc · on Oct 26, 2012

If you have paying clients, they WILL blame you and your team for the fact it's not working. They don't care who/what the underlying infrastructure is.

Also a good hosting company will handle identifying/fixing/replacing bad hardware for you.

calinet6 · on Oct 26, 2012

Not to mention sufficient redundancy will ensure that you never see many effects from those hardware failures/power outages/floods/fires/anything else with any reasonable probability.
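For a rough sense of what "sufficient redundancy" buys, here is the standard back-of-the-envelope calculation. It assumes every site fails independently and that one surviving site is enough to keep the service up; the shared-conduit backhoe story earlier in the thread is exactly the case where that assumption breaks. The function name and the 99% figure are illustrative, not measured numbers.

```python
def combined_availability(site_availability: float, sites: int) -> float:
    """Availability of N redundant sites, assuming independent failures."""
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be between 0 and 1")
    if sites < 1:
        raise ValueError("sites must be at least 1")
    # The service is down only when every site is down at the same time.
    return 1.0 - (1.0 - site_availability) ** sites


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, f"{combined_availability(0.99, n):.6f}")
```

Under those assumptions, three sites that are each up 99% of the time are all down together about one millionth of the time. Correlated failures make the real number worse.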

"The cloud is great because I can blame someone else" is obviously a tenuous argument.

lurker14 · on Oct 26, 2012

Your customers will not be mollified if you tell them that the service you sold is down due to a subcontractor failure.

untog · on Oct 26, 2012

I think there is an interesting debate to be had here. When your site goes down at the same time as Tumblr/Reddit/one of the big guys, the damage might not actually be that high. In the minds of many, "the internet" is probably quite broken.

debacle · on Oct 26, 2012

Depends on the site. Most of the traffic we cater to is not referrer driven.

ceejayoz · on Oct 26, 2012

My colocation provider (Frontier, telecom in 27 states) went down for an hour and a half last night. It's hardly unique to the cloud.

hartleybrody · on Oct 26, 2012

I don't think you move to the cloud for cost savings. More like time and headache savings.