Considering the cost of the infra it would take to replicate some of these services, not to mention bandwidth, DC, and electricity costs, you're probably still saving money. You're absolutely saving on Network Ops and Data Center Ops, not to mention the huge investment in gear one has to make. A single server can cost over $100K; network gear can cost way more. The fact that you don't have to make those investments is the allure. At a certain point you might outgrow someone else's cloud and build your own, but that divide is a fairly large threshold to cross.
if you're buying either, you're paying millions in salaries to technicians and sysadmins and DBAs^W^W^Wdevops and SREs.
i've been on both sides of the fence and it's a case of 'grass is always greener on the other side'. the truth is, running any sort of non-trivial infra is ducking expensive.
Depends on how much abstraction you have. I have seen big companies where deploying code is basically like using Heroku. As an engineer responsible for a couple of services, you don't need to know or care whether that code is running on bare metal, Mesos, or K8s, and you care even less about the data center.
I come from this old world of managing switches and servers, and today we definitely need far fewer people to run code in production. I used to work at a company with ~2000 machines in physical data centers before containerization, and this required a huge infra team - I'm sure that today I could support the same workloads with half the team.
Having worked half my career at places with their own data centers and self-run infra, and the other half with mostly cloud-based solutions, I have a theory.
Perhaps we are designing far more complicated solutions now to leverage these cloud services, whereas having the constraints of a self-operated data center and infrastructure necessitates more ingenuity to achieve similar results.
We used to do so much more with just a few pieces of infrastructure - our RDBMSes, for example. It was amazing to me how many scenarios we solved with just a couple of solid, vertically scaled database servers with active-active failover, Redis, an on-prem load balancer, and some web servers (later, self-hosted containerization software). We used to design for as few infrastructure pieces as possible; now it seems like that is rarely a constraint people have in mind anymore.
Amen, I'm becoming the old grumpy engineer on my team for constantly asking why we need yet another <insert cloud technology here>. I'm not against new technology, but I am against not considering what we have and how it may already solve the problem without widening our operational surface area. And it's every single damn year now, because cloud providers keep stringing their own cloud primitives together to form new cloud services.
How many times I've had this discussion. Let's publish a notification, and let's have the message receiver call some API. Why not just call the API from the place where you want to publish the message? Because we need this SNS message queue.
Probably because the API can be unreachable, time out, etc. With a message queue, the message can be redelivered instead of permanently dropping customer data (or whatever) with only a stack trace to remember it by.
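As a rough sketch of that redelivery argument (this assumes the SNS topic feeds an SQS queue, which is the usual pattern; the queue URL, downstream endpoint, and handler are all made up):

    # Consumer side: a message that can't be delivered to the API is simply
    # not deleted, so SQS redelivers it after the visibility timeout instead
    # of the data being lost with only a stack trace at the publish site.
    import boto3
    import requests

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # hypothetical

    def call_downstream_api(payload: str) -> None:
        # The direct call that would otherwise happen at the publish site.
        requests.post("https://internal.example.com/orders",  # hypothetical endpoint
                      data=payload, timeout=5).raise_for_status()

    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                                   MaxNumberOfMessages=10,
                                   WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            try:
                call_downstream_api(msg["Body"])
            except requests.RequestException:
                # Don't delete: the message becomes visible again and is retried.
                continue
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])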
That's a naive claim to make without any context. You have to know the source that triggers the code to publish the message, what the message is for, and the fault tolerance and availability of the API we're calling before you can even begin to decide. Which you validated perfectly by giving a snarky "what about redundancy" answer to a complicated question.
> Perhaps we are designing far more complicated solutions now to leverage these cloud services, whereas having the constraints of a self-operated data center and infrastructure necessitates more ingenuity to achieve similar results.
Nothing to do with "ingenuity"; simply having some friction in implementation makes for simpler designs.
If it costs you nothing (aside from per-request pricing, but that's not your problem right now, that's management's) to add a message queue between components, well, that's a great reason to try that message queue or event sourcing architecture you've read about.
And it works so "elegantly": just throw more stuff on the queue instead of having more localized communication.
We don't worry about scaling, the cloud worries about it (now the bill for that queue starts to ramp up, but that's just a fraction of a dev salary, and we saved like two weeks of coding thanks to that! Except that fraction adds up every month...).
Repeat for the next 10 cloud APIs and you're paying at every move, even for stuff like "having a machine behind NAT". And if something doesn't work, you can't debug any of it.
Meanwhile, if adding a bunch of queue servers would take ops a few days to sort out monitoring and backups, eh, we don't really need it: some pub/sub on the Redis or PostgreSQL we already have can handle the stuff that needs it, and the rest can just stay in the DB. This and that can talk directly since they don't really need to share anything else over a queue; we only used a queue so we didn't have to fuck with security rules every time a service needed to talk to yet another service.
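For the "pub/sub on the PostgreSQL we already have" bit, a minimal sketch with LISTEN/NOTIFY via psycopg2 (the channel name and DSN are invented; Redis pub/sub would be equally short):

    # Listener process: waits on a channel of the existing Postgres instance.
    import select
    import psycopg2

    conn = psycopg2.connect("dbname=app")       # hypothetical DSN
    conn.set_session(autocommit=True)           # NOTIFY shouldn't sit in an open txn

    cur = conn.cursor()
    cur.execute("LISTEN order_events;")         # subscribe to a channel

    # Elsewhere in the codebase, a publisher just runs:
    #   cur.execute("SELECT pg_notify('order_events', 'order 42 created');")

    while True:
        # Wait up to 5 seconds for the connection's socket to become readable.
        if select.select([conn], [], [], 5) == ([], [], []):
            continue
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            print(f"got event on {note.channel}: {note.payload}")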
It's the classic "find a problem for our solution", or the XY problem.
As an example, I have seen many times people attempt to find a reason to use k8s because the industry says they should, instead of looking at what they need to do and then determining whether k8s is the best fit for that application.
Our reason was pretty much "clients want to use it". One client migrated to it for no good reason whatsoever, aside from a senior dev (who also owned part of the company) wanting to play with new toys. The other one decided halfway through that their admins didn't really want to run a k8s cluster and just told us to deploy the resulting app (which REALLY didn't need k8s anyway) on Docker.
Maybe they're looking for an excuse to gain k8s experience to bolster their resume? If most startups fail, might as well gain some skills out of the current one? Perhaps it doesn't benefit the startup though, inflating complexity and infra spend, and slowing productivity.
I always figured it was the other way around. When you're small it's pretty easy to get by with a stupidly simple solution, but as you grow you end up needing to spend much more to build something scalable, and at that point using the cloud makes sense. The biggest success the cloud providers have had is convincing users that they need to spend $100k, and that a much simpler $5k solution built from off-the-shelf components just won't cut it.
I see the cloud as mostly for startup-ish companies hoping to grow rapidly but wanting to avoid large upfront expenses to be ready for said growth.
A stable company, where growth as a percentage isn't likely to be significant, can run things cheaper on their own in most cases. At least if you consider the cost of the inevitable departure from the cloud provider, either to switch to another or to go on-prem. And if you aren't willing to make that exit, you can guarantee your cloud provider won't stop cranking up the fees until the threat of you leaving surfaces.
I think this is a pretty key point. If a business is going through any kind of rapid change, cloud providers offer a lot of off-the-shelf help for that, be it the ability to scale, hosted infrastructure, or PoPs in new geographies. If the company is relatively static with easily predictable future requirements, you can get a lot more bang for your buck by handling things on your own and developing in-house expertise.
There is also a third approach that is, imo, the best if you have a predictable base load with occasional surges: hybrid cloud.
You basically run the base load in your own data center and the surges go to the cloud. My university is evaluating this because sometimes multiple labs need a lot of compute resources at the same time and the local compute cluster has finite capacity.
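A minimal sketch of that burst-to-cloud decision (the capacity check is a stub, and the AWS Batch queue/definition names are hypothetical; a real scheduler would obviously be more involved):

    # Hypothetical burst scheduler: run jobs locally while the cluster has room,
    # spill the overflow to a cloud batch service.
    import boto3

    batch = boto3.client("batch", region_name="eu-central-1")  # assumed region

    def local_cluster_free_slots() -> int:
        # Placeholder: in reality, query Slurm/K8s/etc. for free capacity.
        return 0

    def submit_to_local_cluster(job_name: str, command: list[str]) -> None:
        # Placeholder: in reality, sbatch/kubectl the job onto local nodes.
        print(f"running {job_name} locally: {command}")

    def submit(job_name: str, command: list[str]) -> None:
        if local_cluster_free_slots() > 0:
            submit_to_local_cluster(job_name, command)
        else:
            # Overflow goes to a pre-defined cloud job queue.
            batch.submit_job(
                jobName=job_name,
                jobQueue="burst-queue",         # hypothetical
                jobDefinition="lab-compute:3",  # hypothetical
                containerOverrides={"command": command},
            )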
It's not, though. With your own stuff you have at least one DC sitting idle, with all that private gear doing nothing, regardless of whether you use a single byte of bandwidth. With AWS, at least some of that cost isn't there.
If you're set up for HA you're paying for the idle hardware either way, and if you save on electricity that might benefit the DC option but not the cloud option. Overall not much difference there.
Bandwidth is the one thing where the cloud clearly wins with respect to idle servers... except that DC bandwidth is a hundred times cheaper than AWS bandwidth, so buying 133% or 150% or even 200% of your needed DC bandwidth still wins by a mile.
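Back-of-the-envelope, using the "hundred times cheaper" ratio above and an illustrative AWS egress price (the workload size and the DC figure are made up; the DC price just applies that ratio):

    # Illustrative numbers only: AWS internet egress is commonly quoted around
    # $0.09/GB in the first pricing tiers.
    aws_per_gb = 0.09
    dc_per_gb = aws_per_gb / 100           # ~$0.0009/GB, per the 100x claim
    monthly_gb = 100_000                   # 100 TB/month, made-up workload

    aws_cost = monthly_gb * aws_per_gb             # $9,000/month
    dc_cost_2x = (monthly_gb * 2) * dc_per_gb      # buy 200% headroom: $180/month
    print(aws_cost / dc_cost_2x)                   # still ~50x cheaper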
Whether you are paying for HA depends on your Recovery Time Objective (RTO). You can have a bunch of suspended EC2 instances, plus non-EC2 resources where you only pay per use, in another region.
You can redirect traffic to another region and have autoscaling spin up EC2 instances, etc.
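A rough sketch of waking up that cold standby with boto3 (the region, tag, and ASG name are made up; DNS failover via Route 53 or similar would still be a separate step):

    # Start stopped ("suspended") standby instances in the failover region.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-west-2")  # hypothetical region

    resp = ec2.describe_instances(Filters=[
        {"Name": "tag:role", "Values": ["cold-standby"]},       # hypothetical tag
        {"Name": "instance-state-name", "Values": ["stopped"]},
    ])
    ids = [i["InstanceId"]
           for r in resp["Reservations"]
           for i in r["Instances"]]

    if ids:
        ec2.start_instances(InstanceIds=ids)
        ec2.get_waiter("instance_running").wait(InstanceIds=ids)

    # Or, if the standby is an Auto Scaling group kept at zero, bump it up:
    asg = boto3.client("autoscaling", region_name="us-west-2")
    asg.set_desired_capacity(AutoScalingGroupName="standby-asg",  # hypothetical
                             DesiredCapacity=4)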
Sure, if you can wait for it to load from unallocated resources (and risk failure) then it's a very different scenario.
But, very notably, you can have a suspended cloud backup even if your main servers aren't cloud. And the added complexity for datacenter-to-cloud HA doesn't have to be significantly higher than the cloud-to-cloud version.
Entirely depends on the use case. If you "just" need a lot - a lot of storage, bandwidth, CPU power - going on-prem is way cheaper once you get up to "a few racks of servers".
If you've complicated your architecture enough - and the cloud makes it oh so easy to build a Rube Goldberg architecture - keeping many different services running, or even developing them in-house, can take a lot.
And it's not like the cloud costs you zero ops work either; it just needs a different set of skills.
But it's not like on-prem has stagnated - there is plenty of automation in that space too. Our team of 3 manages 7 racks of servers and a few dozen projects on them (anything from "very legacy" to a 30+ node k8s cluster with Ceph storage), and hardware management still isn't the majority of our work.
We did something very similar with a Java stack, without even really trying. Competitors using things like Ruby, who went all in on distributed messes, had hundreds of servers; we had about 15. It does require you to be aware of performance, but I wouldn't call it difficult or particularly time-consuming.
I'm gonna go out on a limb and say that, given they're talking about replication, they mean a server rack, which is definitely not $100k/mo but can pretty easily be $100k up front.