"For anyone who's not used these "managed" services before, I want to add that i...

erikpukinskis · on April 4, 2019

Yes, but I think very broadly speaking the quirky behavior is stuff you bump into, learn about, fix, and then can walk away from.

The daily/monthly maintenance cycle on a self-hosted SQL server is “generally understood,” but you still have to wake up, check your security patches, and monitor your redeployments.

You can do some of that in an automated fashion with public security updates for your containers and such. But if monitoring detects an anomaly, it’s YOU, not Heroku, who gets paged.

It’s a little like owning a house vs renting. Yes, if you rent you have to work around the existing building, and getting small things fixed is a process. But if the pipes explode, you make a phone call and it’s someone else’s problem. You didn’t eliminate your workload, but you shrank the domain you need to personally be on call for.

maxxxxx · on April 4, 2019

The problem is that if I run my own servers I can fix problems (maybe with a lot of effort but at least it can be done) but with managed services I may not be able to do so. There is a lot of value in managed services but you have to be careful not to allow them to eat up your project with their bugs/quirks.

scarface74 · on April 5, 2019

So what “problems” were you unable to fix with AWS?

holoduke · on April 4, 2019

Exactly this

diminoten · on April 4, 2019

The point is with a managed service, none of your problems will be with the service. That's what the managed service is selling.

deathanatos · on April 5, 2019

I just finished a 2+ week support ticket w/ AWS. We were unable to connect over TLS to several of our instances, because the instance's hostname was not listed on the certificate. This is a niche bug that's trivially fixable if you own the service, but with AWS, it's a lot harder: you're going to need a technical rep who understands x509 — and nobody understands x509.

I've found & reported a bug in RDS whereby spatial indexes just didn't work; merely hinting the server to not use the spatial index would return results, but hinting it to use the spatial index would get nothing. (Spatial indexes were, admittedly, brand new at the time.)

I've had bugs w/ S3: technically the service is up, but trivial GETs from the bucket take 140 seconds to complete, rendering our service effectively down.

I've found & worked w/ AWS to fix a bug in ELB's HTTP handling.

All of these were problems with the service, since in each case it's failing to correctly implement some well-understood protocol. AWS is not perfect. (Still, it is worth it, IMO. But the parent is right: you are trading one set of issues for another, and it's worth knowing that and thinking about it and what is right for you.)

diminoten · on April 5, 2019

Okay, I'm sorry you thought I said AWS was perfect and bug free. I didn't, however, say that. I said (implied, really) it's better than anything you could possibly home brew. Nothing you've said here changes that.

Further, didn't I say that it's trading one set of issues for another? Or at least, I explicitly agreed with that.

I feel like you didn't read what I wrote honestly, and kind of came in with your own agenda. All I ever said was that the issues you trade off are orchestration issues vs. operational issues, and operational issues are 10x harder than orchestration issues because you don't get to decide when to work on operational issues, you tend to have to deal with them when they happen.

frankchn · on April 5, 2019

You wrote “The point is with a managed service, none of your problems will be with the service.”

What deathanatos wrote sounds awfully like problems with the service to me.

I don’t think S3 taking 100+ seconds to respond to a GET request can be solved by orchestration alone.

diminoten · on April 5, 2019

It definitely can. Reasonable timeouts and redundant systems.
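
As a rough sketch of what "reasonable timeouts and redundant systems" could look like on the calling side: the code below tries a primary source, gives up on it after a deadline, and falls back to a replica. The names `fetch_with_fallback`, `slow_primary`, and `fast_replica` are hypothetical stand-ins, not any AWS API, and the sleep only simulates a slow GET. Note that this hides the slow response from the caller; it does not make the slow service any faster, and the abandoned request keeps running in its thread.

```python
import concurrent.futures
import time


def fetch_with_fallback(sources, timeout_s):
    """Try each (name, fetch) pair in order, abandoning any that exceed timeout_s.

    `sources` is a list of (name, zero-argument callable) pairs. Illustrative only.
    """
    # One worker per source, so a stuck call never blocks the next attempt.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(sources)))
    try:
        for name, fetch in sources:
            future = pool.submit(fetch)
            try:
                return name, future.result(timeout=timeout_s)
            except concurrent.futures.TimeoutError:
                continue  # too slow: move on to the next source
            except Exception:
                continue  # failed outright: move on to the next source
        raise RuntimeError("all sources failed or timed out")
    finally:
        # Do not wait for abandoned calls; they finish in the background.
        pool.shutdown(wait=False)


def slow_primary():
    time.sleep(0.5)  # simulated slow GET (assumption, stands in for the 140 s case)
    return "primary data"


def fast_replica():
    return "replica data"


source, body = fetch_with_fallback(
    [("primary", slow_primary), ("replica", fast_replica)],
    timeout_s=0.05,
)
print(source, body)
```

Whether a fallback like this is acceptable depends on the workload: it assumes a second copy of the data exists and that a stale or alternate read is good enough.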

indigo945 · on April 5, 2019

It's amazing the lengths some people are willing to go to in order to defend AWS marketing slogans as a source of truth. I've seen vendor lock-in before, but AWS seems to be unique in that people actually enjoy working with a vendor whose services go down randomly, to the point where they blame themselves for not being "fault-tolerant".

Guess what, if your service is not required to be up because the consuming service is super tolerant to it timing out after 140 seconds, self-hosting it becomes even more of a no-brainer. After all, you clearly need none of the redundancy AWS features.

diminoten · on April 5, 2019

If it makes you feel better, everything I'm saying about AWS can be said about GCP as well.

Sorry, but AWS/GCP is infinitely better at managing infrastructure than you or your company will ever be.

maxxxxx · on April 4, 2019

That's the promise, but in reality all software has bugs, including managed services.

diminoten · on April 4, 2019

Not really, not anything like what you're describing.

12 outages since 2011, and none of them are anything like what you're describing: https://en.wikipedia.org/wiki/Timeline_of_Amazon_Web_Service...

bsagdiyev · on April 4, 2019

We've moved from on-prem to AWS fully and we see random issues all the time while their status page shows all green, so I feel you probably have a small amount of resources in use with them or something, because what you're saying doesn't jibe with what we see daily. I see you've also copy-pasted your response to other comments, so I'll do the same with my response.

diminoten · on April 4, 2019

I don't feel like copy/pasting all of our comments to each other, so I'd appreciate it if you didn't do that, thanks.

bsagdiyev · on April 5, 2019

Then don't do it yourself. You're dead set on ignoring people whose experience is different than yours, wrapping yourself in an echo chamber of sorts and telling others they are wrong.

diminoten · on April 5, 2019

I'm not dead set on anything, I'm trying to have conversations with multiple people, not create an immutable record.

And I don't think you know what an echo chamber is if you think one person can create one alone...

maxxxxx · on April 4, 2019

This is not about outages. There are many more things that can go wrong besides outages.

diminoten · on April 4, 2019

[flagged]

maxxxxx · on April 5, 2019

How long have you been working in tech? Just curious.

You sound like someone who hasn't had much real world experience and thinks AWS or whatever is the best thing because it's the only thing you know.

copperx · on April 5, 2019

You may want to ask the OP how much time she/he has been working at Amazon instead.

diminoten · on April 5, 2019

Long enough to know that some dinosaurs refuse to learn anything new (read: AWS) and will bend over backwards to try and keep themselves relevant.

Apaec · on April 5, 2019

I guess it can't be proved that this guy is a shill for AWS.

But this kind of toxic fanaticism (yet trying to sound logical) is just harmful for the HN community.

dang: Can this kind of behavior be punished?

goostavos · on April 5, 2019

Bro, what're you so upset about in this thread? That people had different experiences than you with AWS..?

diminoten · on April 5, 2019

I'm not upset, I'm simply pointing out that AWS isn't the problem in any of these examples, it's the various commenters' lack of understanding about how to work in AWS that's caused these problems.

I don't think anyone is actually upset, do you? I certainly hope I haven't upset anyone... :/

Dylan16807 · on April 5, 2019

When your hammer snaps in half, you don't blame yourself for not using two hammers.

When the tool breaks under correct use, criticize the tool. Maybe the user should also have redundancy. The tool is still failing!

diminoten · on April 5, 2019

This analogy is what snapped in half, not the hammer. It's more like if your hammer says right on it, "YOU NEED A SECOND HAMMER" and this is true of all hammers, it's still not the hammer's fault you didn't bring a second hammer.

Dylan16807 · on April 5, 2019

And you're in other threads complaining that the people that had five hammers were still doing it wrong, that all the outages they report are fake somehow...

Even when you're supposed to have redundancy, there are still certain failure rates that are acceptable and some that are not. And redundancy doesn't solve every problem either.

diminoten · on April 5, 2019

What? No I'm not. Literally nowhere has anyone said they've built a system with redundancies as recommended by AWS and still had problems.

Of course there are unacceptable failure rates. AWS doesn't have them, and pretending like they do is simply lying to yourself to protect your own ego.

msla · on April 5, 2019

> The point is with a managed service, none of your problems will be with the service. That's what the managed service is selling.

Until the managed service simply goes away, of course, taking your data with it.

forty · on April 4, 2019

It's someone else's problem unless it prevents you from living there, in which case it's still your problem too. I think the analogy works quite well :)

grigjd3 · on April 4, 2019

So I've worked with AWS and with our internal clusters as a dev. My experience has been that I have to make work-arounds for both, but at least with AWS, I don't have to spell out commands explicitly to the junior PEs.

EDIT: I should be clear, our PEs are generally pretty good, but because their product isn't seen by upper management as the thing which makes money, they're perpetually understaffed.

Macha · on April 4, 2019

Also, Amazon documents their stuff on a nice public website; internal teams documented the n-2 iteration of the system and have change notes hidden in a Google Drive somewhere that, if you ask the right person on the other side of the world, they might be able to share a link to.

pm90 · on April 5, 2019

This. So. Much.

I can't explain just how much developing on GCP has helped me simply by having such amazing documentation. I don't think I appreciated how little I knew: at every company where we worked with on-premise/internal services, we would have to use custom services built by others. With GCP, you have complete freedom, not just to design your application architecture from scratch, but to understand how others (coworkers mostly) have designed _their_ applications too! And as a company, it allows the sharing of a common set of best practices, automatically, since it's "recommended by Google".

It's kinda like Google/Amazon are now the System/Operations engineers for our company. Which they're good at. And it's awesome.

grigjd3 · on April 4, 2019

You were able to find documentation? Where do you work?

dfee · on April 4, 2019

But you’re talking about a reduction in the number of types of specialized people to the number of specializations per type of person. That makes this more scalable.