It's sad to me that this is becoming the status quo. Relying on a few massively centralized companies for compute resources is a sad future.

It's bad for privacy, it's bad for the diversity that protects against SPOFs, it's bad for general-purpose computing hardware (vendors primarily target the giants), it's bad for users via vendor lock-in, and it's bad for open source projects in the infrastructure space.

I think hackers justify it to themselves by pretending it's a commodity like electricity, but it's far from that. If my utility goes out, I can turn on a generator and get exactly the same electricity. If Amazon goes out, I have to rebuild on another cloud from a (hopefully recent) backup or just sit dead (like the recent S3 outage).

Sorry about the rant, but is there anything that would get you to stop giving the keys to the kingdom to Amazon?




Hardware is a means to an end. I've got plenty of it, but at the end of the day what you do with it should be balanced against what it costs.

For companies that have instances running long term it can very well be cost effective to own the hardware. My email server, web server and DNS server are on my own hardware with a co-location facility that I trust.

But for experimental stuff where you need to spin up a hundred machines for an hour or two, you just can't beat the cloud (and that's my only use case for the cloud, though I can see others going much further).

I don't like the monoculture any more than you do, but to see this as me having given the 'keys to the kingdom to Amazon' is several steps too far.


I misinterpreted what you meant by a "short lived experiment". I took that to basically mean any project when you're starting out. My apologies.

Whenever I'm experimenting I rarely need a burst of 100 instances, it's usually 1 or 2 instances to run things and I prefer to run them on my own hardware.


I feel you. But I think the lock-in problem can be solved if we can have some standardization of cloud services such that you can always move to another provider. That has to start with some big company developing an abstraction layer and open sourcing it, and then we can go from there. I think Netflix has a way to switch from Amazon to GCP; I hope they'll standardize it and open source it.


Chef, Puppet, etc., and other Apache projects all offer this already.

It's like leveraging Oracle-specific database features. You're a fool to do so.
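
For what it's worth, Apache Libcloud is one such project: it puts a provider-neutral compute API in front of EC2, GCE, OpenStack and many others, so the lock-in surface is the Libcloud API rather than any one vendor. A rough sketch, assuming a reasonably recent Libcloud release and placeholder credentials:

  # Minimal sketch using Apache Libcloud's provider-neutral compute API.
  # Credentials, region and the image/size choices are placeholders.
  from libcloud.compute.types import Provider
  from libcloud.compute.providers import get_driver

  cls = get_driver(Provider.EC2)        # swap in Provider.GCE, OPENSTACK, etc.
  driver = cls('ACCESS_KEY_ID', 'SECRET_KEY', region='us-east-1')

  sizes = driver.list_sizes()
  images = driver.list_images()
  node = driver.create_node(name='experiment-1', size=sizes[0], image=images[0])
  print(node.id, node.state)

The trade-off is the usual one with abstraction layers: anything provider-specific (spot pricing, IAM, managed databases) either leaks through or has to be given up.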


> That has to start with some big company developing an abstraction layer and open sourcing it, and then we can go from there.

I work for a company (Pivotal) that's had such a product -- Pivotal Cloud Foundry -- for several years. It creates an abstraction layer for apps or container images, your choice.

Deploy with BOSH to bare metal, OpenStack, vSphere, AWS, Azure or GCP. BOSH creates an abstraction layer over the IaaS.

We're also the main driving force behind Spring, Spring Boot and Spring Cloud Services; the latter is in part a generalisation and integration of Netflix OSS.

We cooperate a lot with Google and Microsoft. For example: https://cloud.google.com/solutions/cloud-foundry-on-gcp


> I think hackers justify it to themselves by pretending it's a commodity like electricity, but it's far from that. If my utility goes out, I can turn on a generator and get exactly the same electricity. If Amazon goes out, I have to rebuild on another cloud from a (hopefully recent) backup or just sit dead (like the recent S3 outage).

What's your use case?

Are you just futzing around at home? Sure, use a server in your bedroom. Who cares?

Are you delivering a service to other people? Then owning hardware is probably a bad idea. If it's in your house, your users are hosed if you lose power or your internet cuts out. Putting it in a DC just means you're handing the same keys to someone else, but your self-managed hardware is definitely going to be less reliable than Amazon's infrastructure.

Owning hardware is a bad deal for everyone involved unless you're big enough to build your own HADR infrastructure.


>but your self-managed hardware is definitely going to be less reliable than Amazon's infrastructure.

I don't buy this. I've seen many multi-datacenter self-managed deployments provide better uptime than Amazon Web Services. You are forgetting that when you own the hardware, you can actually orchestrate maintenance windows with live migrations, etc and then take down an entire datacenter with no impact. Guess when Amazon does maintenance? That's right, you don't know, and one screw-up can mean instances in "degraded status" (a.k.a. you might as well terminate it and launch a new one) or all of S3 is down during critical business hours.

Of course your own hardware in a single datacenter is going to be exposed to a high probability of failure, but that's the equivalent of using a single instance in EC2 (which I have lost two of in the last 7 years of managing 15 or so of them for a small company).

I will admit that it takes strong ops skills to maintain high uptime on your own hardware, but that's just due to a lack of good open source tooling in this area. I would rather see a movement to improve that tooling than continue to boost the stranglehold the public cloud is putting on everyone.


> I've seen many multi-datacenter self-managed deployments provide better uptime than Amazon Web Services.

Self-managed, multi-DC? Congrats on having a lot of money to blow, I guess.

Yes, with enough money you can match Amazon for uptime or scalability or whatever metric you prefer. For the same money you can probably buy triple the capacity in Amazon or your preferred cloud provider, so this is mostly a game for people with really deep pockets, really large scale, or really poor budgeting.

> You are forgetting that when you own the hardware, you can actually orchestrate maintenance windows with live migrations, etc and then take down an entire datacenter with no impact.

How many DCs are you talking about here? Are you self-managing in 4+ DCs? Or are you running in 2 DCs and your capacity is overbuilt by 100+%? In either case, deep pockets are nice to have.

Also, does your maintenance strategy seriously involve bringing down entire DCs? This is kind of absurd and makes half of me jealous of the bathtub full of cash you must bathe in. It makes the other half of me question some engineering decisions you've apparently made.

> all of S3 is down during critical business hours.

I have trouble believing people when they claim to do significantly better than Amazon (or another favorite cloud provider) for infrastructure uptime. If you stand up a fairly complex system comprised of a number of loosely-coupled services, you're going to end up experiencing some outages, because you'll face the same challenges as Amazon and those guys aren't idiots. You'll lose your message queue due to a bug, or you'll lose a network switch and realize your failover takes 30 minutes to complete instead of the 5 seconds you hoped for, or you'll accidentally DDOS a subsystem when exercising a failover or a system upgrade, or something else. Complex systems fail and when people tell me they built an "internet scale" system with better uptime than Amazon, I'm left to assume that they probably just do a bad job of tracking uptime or else that their systems are not at the scale they imagine. Everyone who builds large systems experiences outages.


> I have trouble believing people when they claim to do significantly better than Amazon (or another favorite cloud provider) for infrastructure uptime.

That needs a dollar-for-dollar qualification, or something to that effect. It's possible, but very expensive.

There are for instance long running (and I mean really long running, many years or even decades) experiments where any amount of downtime would cause a do-over.

One of my customers had something like this on the go. The amount of money they spent on their power and network redundancy was off the scale, but they definitely had better uptime than Amazon.

Their problems were more along the lines of 'this piece of equipment is nearly EOL, how do we replace it without interrupting the work it does'.


Yes, sorry. I was assuming similar expense. Enough money can buy just about anything, including a few additional nines.

If your goal is to build out scale more reliably than Amazon, at the same or lower cost, that's tough and you're unlikely to achieve it unless your scale is approaching that of Amazon (and you have really good people).


>Self-managed, multi-DC? Congrats on having a lot of money to blow, I guess.

Putting a rack in a colo is still self-managed for the purposes of what I'm talking about. It's easy to get into multiple data centers where you rent the space and electricity but still own the hardware, and you can make agreements with various ISPs for connectivity.

>How many DCs are you talking about here? Are you self-managing in 4+ DCs? Or are you running in 2 DCs and your capacity is overbuilt by 100+%? In either case, deep pockets are nice to have.

See comment above.

>Also, does your maintenance strategy seriously involve bringing down entire DCs? This is kind of absurd and makes half of me jealous of the bathtub full of cash you must bathe in. It makes the other half of me question some engineering decisions you've apparently made.

See comment above. "bringing down a DC" doesn't mean shutting everything off, it means from the perspective of your end users, your service is not available there.

> because you'll face the same challenges as Amazon and those guys aren't idiots.

No, but they have very different priorities. If all I want is static asset hosting, the loosely-coupled micro-service architecture you are referring to is complete overkill and results in the very instability you are claiming is normal.

>Complex systems fail and when people tell me they built an "internet scale" system with better uptime than Amazon, I'm left to assume that they probably just do a bad job of tracking uptime or else that their systems are not at the scale they imagine. Everyone who builds large systems experiences outages.

Nobody except Google and Microsoft is building something as complex as the entire AWS stack. The vast majority of AWS users are using a tiny percentage of the features that come with AWS and can get by on much simpler systems that are easier to reason about and maintain.

When you dump the majority of what Amazon is actually running, you have a much simpler system and architecture and can actually beat Amazon's uptime.


Amazon charges at least 15 to 20 times the going rate for bandwidth. So if you are serving large amounts of data, it could easily be the case that you can pay for enhanced uptime with just the savings on bandwidth alone.
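
To show how that kind of multiple falls out of the arithmetic, here's a back-of-the-envelope comparison. Both prices are illustrative assumptions rather than quotes from anyone's actual price list:

  # Illustrative arithmetic only: both prices are rough assumptions.
  aws_egress_per_gb = 0.09         # USD/GB, a commonly cited AWS egress tier
  transit_per_mbps_month = 1.50    # USD per Mbps/month, commodity colo IP transit

  # GB that one fully utilised Mbps can move in a 30-day month
  seconds_per_month = 30 * 24 * 3600
  gb_per_mbps_month = (1 / 8) * seconds_per_month / 1000   # ~324 GB

  transit_per_gb = transit_per_mbps_month / gb_per_mbps_month
  print("transit  $%.4f/GB" % transit_per_gb)
  print("AWS      $%.2f/GB" % aws_egress_per_gb)
  print("ratio    ~%.0fx" % (aws_egress_per_gb / transit_per_gb))   # ~19x here

With lower utilisation of the commit the ratio shrinks, and AWS volume discounts shrink it further, but the gap stays large either way.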


My Raspberry Pi at home has had less downtime over the past 3 years than AWS.

Local datacenters in the city had even less.

I'm not sure where AWS is supposed to get that famous reliability from, but it's not in uptime. (I can't comment on storage reliability, because I only write a few terabytes of data a month; but otherwise, there's RAID 5 or other RAID setups to keep data intact.)

AWS has its advantages: immense scalability within seconds, and convenience.

But its uptime isn't much better than most home connections.

Home statistics:

Power downtime since 2006 is 29 minutes.

Internet downtime since 2006 is 6 hours in 2014, plus two 30-minute outages in 2016.

This is on a 100/40 DSL line nowadays (the downtimes were, except for one, when switching ISPs), without any uninterruptible power supply, battery or generator.

For comparison, this is equivalent to an uptime of about 99.99%: the same as AWS advertises, and better than what they delivered this year or last year.
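
Roughly checking that figure against the numbers above (assuming an observation window of about 11 years and that the power and internet outages didn't overlap):

  # Back-of-the-envelope check of the uptime figure quoted above.
  hours_per_year = 365.25 * 24
  window_hours = 11 * hours_per_year            # ~96,400 observed hours

  internet_down = 6 + 0.5 + 0.5                 # one 6 h outage + two 30 min outages
  power_down = 29 / 60                          # 29 minutes of power loss
  total_down = internet_down + power_down       # assume the outages don't overlap

  availability = 1 - total_down / window_hours
  print("uptime ~ %.4f%%" % (availability * 100))   # ~99.99%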


You probably do not get how this works. Let me try to explain: when you talk about the uptime of your Raspberry Pi you are looking at a single, very simple instance of a computer. It's really easy to get an insane uptime out of a single machine.

Here's one for you:

  > uptime
    02:52:56 up 714 days, 16:53,  1 user,  load average: 0.00, 0.00, 0.00
Which is pretty average for a small, underutilized server. Essentially the uptime here is a function of how reliable the power supply is.

But that's not what AWS is offering.

They offer a far more complex solution, which by the very nature of its complexity will have more issues than your simple computer, or mine.

The utility lies in the fact that if you tried to imitate the level of complexity and flexibility that AWS offers, you'd likely not even get close to their uptimes.

So you're comparing apples and oranges, or more accurately, apples and peas.


Agreed. What I question is whether a lot of the complexity is actually needed for a lot of the systems being deployed. For example, people are building Docker clusters with job-based distributed systems for boutique B2B SaaS apps with a few thousand users. Is the complexity needed? And how much complexity needs to be added to manage the complexity?


How am I comparing apples and oranges?

The previous posters said that I should use AWS, because anything I set up myself will have more downtime than AWS.

Now. I've actually set up a few systems.

Some on rented dedicated servers, some on actual hardware at home.

Including web apps, databases backing dozens of services, etc.

As mentioned above, all of them have better uptime than AWS.

How am I comparing apples with peas if this is exactly the point made above — that even for simple services I should use AWS?


> How am I comparing apples with peas if this is exactly the point made above — that even for simple services I should use AWS?

That a single instance of something simple outperforming something complex does not mean anything when it comes to statistical reliability. In other words, if a million people did what you do, in general more of them would lose their data or have downtime than if those same people hosted their stuff on Amazon. The only reason you don't see it is because there is a good chance that you are one of the lucky ones if you do things by yourself.

And that's because your setup is extremely simple. The more complex it gets the bigger the chance you'll end up winning (or rather, losing) that particular lottery.


> The only reason you don't see it is because there is a good chance that you are one of the lucky ones if you do things by yourself.

Or maybe because I have less complexity in my stack, so it’s easier to guarantee that it works.

Getting redundant electricity and network lines, and getting redundant data storage solutions is easy.

Ensuring that at least 2 of 3 machines behind a load balancer are working is also easy.
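
To put a rough number on that, assume each of the 3 machines is independently up 99% of the time (an illustrative figure, not a measurement):

  # Illustrative only: the per-machine availability is an assumed figure.
  from math import comb

  p = 0.99                             # assumed single-machine availability
  at_least_two_up = sum(comb(3, k) * p**k * (1 - p)**(3 - k) for k in (2, 3))
  print("P(at least 2 of 3 up) = %.6f" % at_least_two_up)   # ~0.9997

So even mediocre machines behind a load balancer comfortably beat any one of them on availability, as long as their failures really are independent.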

Ensuring that in a complex system of millions of interconnected machines and services, some of which have never been rebooted or tested in a decade (see the AWS S3 post-mortem), nothing will ever fail is a lot harder.


You're right. If you run fairly low volume services that don't need significant scale, you can possibly achieve better uptime than Amazon. You'll probably spend significantly more to get it, though, since your low volume service probably could run on a cheap VM instead of a dedicated physical server.

You're also likely rolling the dice on your uptime, since a hardware failure becomes catastrophic unless you are building redundancy (in which case you're almost certainly spending far more than you would with Amazon).


Actually, I’ve calculated the costs: if you only need to build for one special case, even with redundancy you tend to come out ~3-4 times cheaper than the AWS/Google/etc. offerings for the same thing.

But then again, you have only one special case, and can’t run anything else on that.


I agree, it is sad that heavy computing/data is being centralized around corporations. I really get a lot out of being able to see and touch my hardware. To me that's worth the additional cost. I love my little 100TB Synology box and it feels weird now sitting at my desk without its soft fan hum.



