Considering the cost of the infra it would take to replicate some of these services, not to mention bandwidth, DC, and electricity costs, you're probably still saving money. You're absolutely saving on Network Ops and Data Center Ops, not to mention the huge investment in gear one has to make. A single server can cost over $100K; network gear can cost way more. The fact that you don't have to make those investments is the allure. At a certain point you might outgrow someone else's cloud and build your own, but that divide is a fairly large threshold to cross.
if you're buying either, you're paying millions in salaries to technicians and sysadmins and DBAs^W^W^Wdevops and SREs.
i've been on both sides of the fence and it's a case of 'grass is always greener on the other side'. the truth is, running any sort of non-trivial infra is ducking expensive.
Depends on how much abstraction you have. I have seen big companies where deploying code is basically like using Heroku. As an engineer responsible for a couple of services, you don't need to know or care whether that code is running on bare metal, Mesos, or K8s, and you care even less about the data center.
I come from this old world of managing switches and servers, and today we definitely need far fewer people to run code in production. I used to work at a company with ~2000 machines in physical data centers before containerization, and this required a huge infra team - I'm sure that today I could support the same workloads with half the team.
Having worked half my career at places with their own data centers and self-run infra, and the other half with mostly cloud-based solutions, I have a theory.
Perhaps we are designing far more complicated solutions now to leverage these cloud services, whereas having the constraints of a self-operated data center and infrastructure necessitates more ingenuity to achieve similar results.
We used to do so much more with just a few pieces of infrastructure - our RDBMSes, for example. It was amazing to me how many scenarios we solved with just a couple of solid, vertically scaled database servers with active-active failover, Redis, an on-prem load balancer, and some web servers (later, self-hosted containerization software). We used to design for as few infrastructure pieces as possible; now it seems like that is rarely a constraint people have in mind anymore.
Amen, I'm becoming the old grumpy engineer on my team for constantly asking why we need yet another <insert cloud technology here>. I'm not against new technology, but I am against not considering what we have and how it may already solve the problem without widening our operational surface area. And it's every single damn year now, because cloud providers keep stringing their own cloud primitives together to form new cloud services.
How many times I've had this discussion. Let's publish a notification, and let's have the message receiver call some API. Why not just call the API from the place where you want to publish the message? Because we need this SNS message queue.
Probably because the API can be unreachable, time out, etc. With a message queue, the message can be redelivered instead of permanently dropping customer data (or whatever) with only a stack trace to remember it by.
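As a rough sketch of that redelivery argument (this assumes the SNS topic feeds an SQS queue, which is the usual pattern; the queue URL, downstream endpoint, and handler are all made up):

    # Consumer side: a message that can't be delivered to the API is simply
    # not deleted, so SQS redelivers it after the visibility timeout instead
    # of the data being lost with only a stack trace at the publish site.
    import boto3
    import requests

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # hypothetical

    def call_downstream_api(payload: str) -> None:
        # The direct call that would otherwise happen at the publish site.
        requests.post("https://internal.example.com/orders",  # hypothetical endpoint
                      data=payload, timeout=5).raise_for_status()

    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                                   MaxNumberOfMessages=10,
                                   WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            try:
                call_downstream_api(msg["Body"])
            except requests.RequestException:
                # Don't delete: the message becomes visible again and is retried.
                continue
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])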
That's a naive claim to make without any context. You have to know the source that triggers the code to publish the message, what the message is for, and the fault tolerance and availability of the API we're calling before you can even begin to decide. Which you validated perfectly by giving a snarky "what about redundancy" answer to a complicated question.
> Perhaps we are designing far more complicated solutions now to leverage these cloud services, whereas having the constraints of a self-operated data center and infrastructure necessitates more ingenuity to achieve similar results.
Nothing to do with "ingenuity"; simply having some friction in implementation makes for simpler designs.
If it costs you nothing (aside from per-request pricing, but that's not your problem right now, that's management's) to add a message queue between components, well, that's a great reason to try that message queue or event sourcing architecture you've read about.
And it works so "elegantly": just throw more stuff on the queue instead of having more localized communication.
We don't worry about scaling, the cloud worries about it (now the bill for that queue starts to ramp up, but that's just a fraction of a dev salary, and we saved like two weeks of coding thanks to that! Except that fraction adds up every month...).
Repeat for the next 10 cloud APIs and you're paying at every move, even for stuff like "having a machine behind NAT". And if something doesn't work, you can't debug any of it.
Meanwhile, if adding a bunch of queue servers would take ops a few days to sort out monitoring and backups, eh, we don't really need it: some pub/sub on the Redis or PostgreSQL we already have can handle the stuff that needs it, and the rest can just stay in the DB. This and that can talk directly since they don't really need to share anything else over a queue; we only used a queue so we didn't have to fuck with security rules every time a service needed to talk to yet another service.
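For the "pub/sub on the PostgreSQL we already have" bit, a minimal sketch with LISTEN/NOTIFY via psycopg2 (the channel name and DSN are invented; Redis pub/sub would be equally short):

    # Listener process: waits on a channel of the existing Postgres instance.
    import select
    import psycopg2

    conn = psycopg2.connect("dbname=app")       # hypothetical DSN
    conn.set_session(autocommit=True)           # NOTIFY shouldn't sit in an open txn

    cur = conn.cursor()
    cur.execute("LISTEN order_events;")         # subscribe to a channel

    # Elsewhere in the codebase, a publisher just runs:
    #   cur.execute("SELECT pg_notify('order_events', 'order 42 created');")

    while True:
        # Wait up to 5 seconds for the connection's socket to become readable.
        if select.select([conn], [], [], 5) == ([], [], []):
            continue
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            print(f"got event on {note.channel}: {note.payload}")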
It's the classic "find a problem for our solution", or the XY problem.
As an example, I have seen many times people attempt to find a reason to use k8s because the industry says they should, instead of looking at what they need to do and then determining whether k8s is the best fit for that application.
Our reason was pretty much "clients want to use it". One client migrated to it for no good reason whatsoever, aside from a senior dev (who also owned part of the company) wanting to play with new toys. The other one decided halfway through that their admins didn't really want to run a k8s cluster and just told us to deploy the resulting app (which REALLY didn't need k8s anyway) on Docker.
Maybe they're looking for an excuse to gain k8s experience to bolster their resume? If most startups fail, might as well gain some skills out of the current one? Perhaps it doesn't benefit the startup though, inflating complexity and infra spend, and slowing productivity.
I always figured it was the other way around. When you're small it's pretty easy to get by with a stupidly simple solution, but as you grow you end up needing to spend much more to build something scalable, and at that point using the cloud makes sense. The biggest success the cloud providers have had is convincing users that they need to spend $100k, and that a much simpler $5k solution built from off-the-shelf components just won't cut it.
I see the cloud as mostly for startup-ish companies hoping to grow rapidly but wanting to avoid large upfront expenses to be ready for said growth.
A stable company, where growth as a percentage isn't likely to be significant, can run things cheaper on their own in most cases. At least if you consider the cost of the inevitable departure from the cloud provider, either to switch to another or to go on-prem. And if you aren't willing to make that exit, you can guarantee your cloud provider won't stop cranking up the fees until the threat of you leaving surfaces.
I think this is a pretty key point. If a business is going through any kind of rapid change, cloud providers offer a lot of off-the-shelf help for that, be it the ability to scale, hosted infrastructure, or PoPs in new geographies. If the company is relatively static with easily predictable future requirements, you can get a lot more bang for your buck by handling things on your own and developing in-house expertise.
There is also a third approach that is, imo, the best if you have a predictable base load with occasional surges: hybrid cloud.
You basically run the base load in your own data center and the surges go to the cloud. My university is evaluating this because sometimes multiple labs need a lot of compute resources at the same time and the local compute cluster has finite capacity.
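A minimal sketch of that burst-to-cloud decision (the capacity check is a stub, and the AWS Batch queue/definition names are hypothetical; a real scheduler would obviously be more involved):

    # Hypothetical burst scheduler: run jobs locally while the cluster has room,
    # spill the overflow to a cloud batch service.
    import boto3

    batch = boto3.client("batch", region_name="eu-central-1")  # assumed region

    def local_cluster_free_slots() -> int:
        # Placeholder: in reality, query Slurm/K8s/etc. for free capacity.
        return 0

    def submit_to_local_cluster(job_name: str, command: list[str]) -> None:
        # Placeholder: in reality, sbatch/kubectl the job onto local nodes.
        print(f"running {job_name} locally: {command}")

    def submit(job_name: str, command: list[str]) -> None:
        if local_cluster_free_slots() > 0:
            submit_to_local_cluster(job_name, command)
        else:
            # Overflow goes to a pre-defined cloud job queue.
            batch.submit_job(
                jobName=job_name,
                jobQueue="burst-queue",         # hypothetical
                jobDefinition="lab-compute:3",  # hypothetical
                containerOverrides={"command": command},
            )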
It's not, though. With your own stuff you have at least one DC sitting idle, with all that private gear doing nothing, regardless of whether you use a single byte of bandwidth. With AWS, at least some of that cost isn't there.
If you're set up for HA you're paying for the idle hardware either way, and if you save on electricity that might benefit the DC option but not the cloud option. Overall not much difference there.
Bandwidth is the one thing where the cloud clearly wins with respect to idle servers... except that DC bandwidth is a hundred times cheaper than AWS bandwidth, so buying 133% or 150% or even 200% of your needed DC bandwidth still wins by a mile.
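Back-of-the-envelope, using the "hundred times cheaper" ratio above and an illustrative AWS egress price (the workload size and the DC figure are made up; the DC price just applies that ratio):

    # Illustrative numbers only: AWS internet egress is commonly quoted around
    # $0.09/GB in the first pricing tiers.
    aws_per_gb = 0.09
    dc_per_gb = aws_per_gb / 100           # ~$0.0009/GB, per the 100x claim
    monthly_gb = 100_000                   # 100 TB/month, made-up workload

    aws_cost = monthly_gb * aws_per_gb             # $9,000/month
    dc_cost_2x = (monthly_gb * 2) * dc_per_gb      # buy 200% headroom: $180/month
    print(aws_cost / dc_cost_2x)                   # still ~50x cheaper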
Whether you are paying for HA depends on your Recovery Time Objective (RTO). You can have a bunch of suspended EC2 instances, plus non-EC2 resources where you only pay per use, in another region.
You can redirect traffic to another region and have autoscaling spin up EC2 instances, etc.
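A rough sketch of waking up that cold standby with boto3 (the region, tag, and ASG name are made up; DNS failover via Route 53 or similar would still be a separate step):

    # Start stopped ("suspended") standby instances in the failover region.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-west-2")  # hypothetical region

    resp = ec2.describe_instances(Filters=[
        {"Name": "tag:role", "Values": ["cold-standby"]},       # hypothetical tag
        {"Name": "instance-state-name", "Values": ["stopped"]},
    ])
    ids = [i["InstanceId"]
           for r in resp["Reservations"]
           for i in r["Instances"]]

    if ids:
        ec2.start_instances(InstanceIds=ids)
        ec2.get_waiter("instance_running").wait(InstanceIds=ids)

    # Or, if the standby is an Auto Scaling group kept at zero, bump it up:
    asg = boto3.client("autoscaling", region_name="us-west-2")
    asg.set_desired_capacity(AutoScalingGroupName="standby-asg",  # hypothetical
                             DesiredCapacity=4)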
Sure, if you can wait for it to load from unallocated resources (and risk failure) then it's a very different scenario.
But, very notably, you can have a suspended cloud backup even if your main servers aren't cloud. And the added complexity for datacenter-to-cloud HA doesn't have to be significantly higher than the cloud-to-cloud version.
Entirely depends on the use case. If you "just" need a lot - a lot of storage, bandwidth, CPU power - going on-prem is way cheaper once you get up to "a few racks of servers".
If you've complicated your architecture enough - and the cloud makes it oh so easy to build a Rube Goldberg architecture - keeping many different services running, or even developing them in-house, can take a lot.
And it's not like the cloud costs you zero ops work either; it just needs a different set of skills.
But it's not like on-prem has stagnated - there is plenty of automation in that space too. Our team of 3 manages 7 racks of servers and a few dozen projects on them (anything from "very legacy" to a 30+ node k8s cluster with Ceph storage), and hardware management still isn't the majority of our work.
We did something very similar with a Java stack, without even really trying. Competitors using things like Ruby, who went all in on distributed messes, had hundreds of servers; we had about 15. It does require you to be aware of performance, but I wouldn't call it difficult or particularly time-consuming.
I'm gonna go out on a limb and say that, given they're talking about replication, they mean a server rack, which is definitely not $100k/mo but can pretty easily be $100k up front.