The cost of cloud (ptribble.blogspot.com)
134 points by harryruhr on Dec 25, 2021 | 130 comments



I have a rule that is simple, effective, but also quite rude: if you can't deliver and maintain a 500-instance infrastructure by yourself (1 person), same uptime and all, at half the cost of AWS, in 3 months, using only open source solutions, then basically you should not have an opinion about this. You are just rationalizing your incompetence on this particular subject. Sorry to be this blunt, but I am simply tired of listening to people who can't do it explain that what I do every day can't or should not be done.


I don't consider this rude or blunt, but rather incomplete, as I'm really not sure what points are frustrating for you or what you would hope someone takes away from it. I'm an outside observer on the cloud subject: I have seen huge debates over use of the public cloud internally at my company and also with client companies.

I've seen the billing costs of the public cloud absolutely demolish an IT org's yearly budget in a month because of unexpected cost upticks, and I've seen a reduction in total cost of ownership by reducing needed licensing/staff/building costs. I get both sides on what the public cloud can do.

I've also seen what you can do on-premises; I've worked with clients who manage 7000+ machines (mostly virtual plus some physical) with a team of 4, using pretty reasonably priced on-site hardware. (Pro tip, I guess: Hitachi boxes are absurdly great servers with fantastic uptime, pockmarked only by an absolutely horrendous management UI.)

My experience from the many clients I work with is that it is less about the specific stack you settle on and more your comfort level in getting the most efficiency out of it. The deeper and more intimate you are with all levels of your infrastructure, the better you know how to eke out the most from every single $.01 you spend on it.


When I'm frustrated I'm not exactly clear in my writing.

You need to be able to do both options before having an opinion on which is appropriate in which case. I am surprised to have to state this. But in my experience, people argue one option a lot without being able to deliver the other.

People who know bare metal are a rare fraction of the available infrastructure engineers these days (call them sysadmins, devops, etc.). I guess this justifies companies looking at cloud a little bit. But if you really search, you can find engineers at sub-$100k per year who can deliver $100k per month in savings compared to AWS.

There are also engineers who stayed away from cloud and can't deliver that option. A lot rarer, though. They are the same level of wrong if they argue against cloud from ignorance.

The right choice for serious infrastructures these days is always both: have the bulk on premises for steady loads and 95% of features, and expand to public clouds for dynamic scaling and features you don't want to do yourself, at least not yet. This combination offers good costs, flexibility, coverage of possible future needs, etc.


> People who know bare metal are rare these days from the total of available infrastructure engineers

Sysadmins are not rare, they're just not the people you hear about in the Silicon Valley bubble anymore. 90+% of businesses haven't moved to the "cloud" (i.e. whoever the fuck's computer you can't get your hands on in case of problems) and even if they wanted to it would make no sense: most businesses just need a basic website and an email/accounting service. Cloud abstractions provide much complexity and zero benefit for such use cases.

> But in my experience people argue one option a lot without being to deliver the other.

I'm in this box. I can't deliver "cloud" computing and from a political perspective i refuse to "learn". Also, it makes no sense for the non-profit projects i work with: the biggest ones need at most a few servers, which is still manageable by hand and certainly easier to deal with via Ansible/Chef than via new layers of abstractions and all their new failure modes (e.g. k8s/AWS).


> most businesses just need a basic website and an email/accounting

I think those businesses should definitely go to the Cloud - but not IaaS. Use Microsoft 365 or Google Workspace for the email needs and a Website-as-a-service vendor, whether that’s wordpress.com or Webflow.


I'm not saying you're wrong, but why do you call that the cloud again? Shared hosting is what we've been doing since "forever".

The part where i disagree: don't go with Microsoft or Google, they're the worst. They've got less-than-stellar service, abysmal support, and they're capitalist assholes. Go with a local tech coop or non-profit (or even just a local tech artisan for-profit company) with friendly support.

I think it's been said in many other threads, but it's always worth repeating: by using Microsoft/Google email services, you make it impossible for others to use a solution of their choice because they will be blocked despite having perfect server configuration.


Thank you for sharing this, it clears up your concern a lot.

I deal with some big German clients fairly frequently and one of the requirements is "[they] own the entire stack top to bottom, back to front." A lot of dark site operations I work on also share a similar requirement, so really it's why I'm far more open and comfortable with an all on premises situation since I see the scaling done without any public cloud.

From what I do work with on public cloud, sure, I absolutely get why it is so easy to scale if you don't already have a good team to build and orchestrate a local setup. I also see some big-name companies I contract with just throw money onto a fire fueled by Azure, and while the expenditure hurts, sure, it's still considered acceptable.

I guess I probed because I see a lot of different sides of modern architecture and aside from a well documented and disciplined one, I'm not sure there's a right thing with modern architecture, just different comfort zones with different efficiencies.


There are few things we want to do on-premise any more. The main problem of on-premise, and benefit of cloud, is that we can add new capacity at a moment's notice. You never have to wonder if you'll need to add more capacity (with two-month lead times) to provision a database.

Now you could say that infra teams that do not anticipate such a need are less than ideal, and I’d agree with you, but I haven’t been part of them and I imagine they have their own issues to deal with.

Cloud (as a dev) makes me not worry about infra teams, since they’re not our problem (beyond the ones managing the cloud environment).


There are certainly companies like yours that are perhaps Web-product driven and need flexible scalability, but there are many out there that have little need for such scaling, at least unexpectedly, and will run perfectly fine with on-premises virtualisation.

The poster above is right, both have their purpose, but those sold on cloud as the complete solution are kidding themselves in most cases: happy to accept crazy cloud cost blow-outs rather than over-provision tin or think properly about the use cases.

It honestly sounds like you don't care about efficiency because of either good inflows or a need to move extremely fast. Such is the appeal of cloud...


You are actually right. Thinking back to other companies I've worked for, only the last two had any need for cloud; the others had a more or less stable workload that was ideal for on-premise. They were also all between 5 and 200 employees; I wonder if that matters.


Sub-$100k engineers are a fiction. Sure, you could get somebody on staff for under a $100k salary, but it's not necessarily going to be someone competent.

But even aside from that: OK, you found somebody who agrees to work for 90k. What about social security tax? Group health coverage? Workman's comp insurance? HR support? Payroll? Risk of lawsuits if someone hurts them/their feelings?


Thing is: many AWS customers have a sub-$100k/mo bill. Savings from this sub-$100k person will be correspondingly lower.

On top of that, for small/mid-sized companies, it's difficult to avoid "employee lock-in". Vendor lock-in is perceived as the lesser risk. Unfortunately, they often turn out to be right.


The cost of three person-months for a reasonably competent devops person is probably close to $50,000. Maybe a bit less for a cocky junior who will make a mess, and a bit more if you pay premium freelance rates for somebody less likely to botch the job. That pays for a lot of infrastructure. Not counting your own cost is a rookie mistake. And not realizing you really need 4-6 of these people to get to your five nines is the second mistake (you need people on call 24x7, including when they are sick, over Christmas, etc.). So the real cost is closer to $1M/year, just in staffing to babysit stuff you built manually. Or you pay Amazon, Google, etc. and worry only about your own application not crashing. That's why this is so popular.

Few companies actually need that many instances. The math for the less than 10-20 instances the vast majority of companies actually need is quite brutal. A day of your time basically pays for months/years of hosting. The thing to optimize is devops time. Not hosting cost. It's by far the most expensive thing and also the most likely thing to fail on you (by leaving, by being incompetent, negligent, lazy, sick, etc.) and also the hardest thing to source when you need more of it. Good devops people are scarce.

I've dealt with plenty of companies that had no more than two or three idling t2 instances paying for multiple devops people to babysit that "infrastructure". It's stupid and wasteful. A decent devops person costs about 0.5-1 instance year (i.e. a full year of hosting 24x7) per hour for such small instances. And scaling an instance group from 2 to 500 instances is a 1 minute job if you ever need to. Unless the savings are enormous, the time they spend on minimizing the number of instances or automating their deployment will never be worth the money. It's money down the drain. You need to think in terms of a few hours for getting stuff done to make it worth the cost. Anything more is probably too expensive.
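The "0.5-1 instance-year per devops hour" arithmetic above can be sketched directly. A minimal napkin calculation, where both the hourly devops rate and the small-instance price are illustrative guesses, not actual AWS pricing:

```python
# Napkin math for the "devops time vs. hosting cost" claim above.
# Both rates are assumptions for illustration, not authoritative prices.

HOURS_PER_YEAR = 24 * 365  # 8760

def instance_years_per_devops_hour(devops_hourly_rate, instance_hourly_rate):
    """How many full years of 24x7 hosting one hour of devops time buys."""
    instance_year_cost = instance_hourly_rate * HOURS_PER_YEAR
    return devops_hourly_rate / instance_year_cost

# Assumed: a $150/hr devops contractor vs. a small t2-class instance
# at roughly $0.023/hr on-demand.
ratio = instance_years_per_devops_hour(150, 0.023)
print(f"One devops hour buys about {ratio:.2f} instance-years of hosting")
```

With those assumed rates, one hour of engineering time costs about three quarters of a year of 24x7 hosting for such a small instance, consistent with the figure above.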


> And not realizing you really need 4-6 of these people to be able to get to your five nines is the second mistake (you need people on call 24x7 and when they are sick, over Christmas, etc).

If you need that kind of availability, you need to have people on call anyway to babysit your app. A good infrastructure (unless built to the minimal price point) will handle nearly all cases of hardware failure automatically, without someone having to wake up, so it's not likely to put additional load on those people.

I'm not necessarily disagreeing with your overall point, but if you need five nines, you're talking about an entirely different league of infrastructure compared to people who need two or three VMs that could also be handled by a NUC somewhere in the office (which will amortize itself against AWS in a few months).
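To put a rough number on the NUC amortization point, here is a sketch where both the hardware price and the cloud rate are made-up illustrative figures:

```python
# Rough break-even sketch for the office-closet NUC vs. cloud VMs above.
# Assumed: a ~$600 NUC vs. two or three modest always-on cloud VMs
# totalling ~$100/month. Both figures are guesses, not real price quotes.

NUC_PRICE = 600
CLOUD_VMS_MONTHLY = 100

months_to_break_even = NUC_PRICE / CLOUD_VMS_MONTHLY
print(f"The NUC pays for itself in about {months_to_break_even:.0f} months")
```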


I maintained a datacenter with approximately 1000 hypervisors with a very small team, and it took a few weeks to start having production workloads. The effort to maintain the hardware was quite small, and it was hugely cheaper than any cloud service.

Having said that, your requirement is pretty absurd. Billions of people choose to own and maintain houses and cars and cook their own food because it's cheaper than the alternatives. Nobody expects them to be professional mechanics or cooks.


> Nobody expects them to be professional mechanics or cooks.

To be fair, there’s also nobody that would listen to them over a professional mechanic or cook.


And they wouldn't open a commercial restaurant on top of their DIY "home food infrastructure".


Airbnb? :)

You’ve got a very wide variety of home professionals, especially looking at YouTubers, some being fantastic, while others …not so much.


Which 500 instances? 500 ec2 “large” is like half of a rack… or is the point to engage opponent in unwinnable argument?


I think I can beat that argument pretty easily: 5 FreeBSD hosts, each running 100 jails. It's cheaper; uptime is something to discuss, but the data is not on someone else's computer.

The real cost of cloud is that nearly anyone you ask thinks it's impossible to set up infra for yourself. Losing the sysadmin role in companies is probably the biggest loss.


Not "anyone", but probably most newcomers to the industry. Since they are simply not exposed to the non-cloud ways of setting up infra.

On every AWS/cloud post here, the first comment is usually "rent dedicated boxes from Hetzner (or whomever) and you can cut your costs". (And especially now with k8s it is really, really easy to have something sane on bare-ish metal.)

But at the same time what "cloud" gives to people is 20+ PoPs around the world. Basically giant hosting companies, with an endless list of bells and whistles.


What's interesting is that in the past 5-10 years the technology to run your own smaller DC (on the order of a few dozen racks) has become extremely commoditized. You can buy 100G switches for pennies now, and every piece of software has a high-quality open source version, down to the BMC level. I believe we'll see a reverse trend for established SV companies in the next 10 years.


But at the same time "established SV companies" are already paying so much for labor costs that probably they don't want to hire anyone to run a DC for them.

(Maybe they'll really start hiring remotely. Maybe not.)


Just task their existing staff to “run the dc”. Way easier to figure this out than grinding leetcodes if you ask me. Also you can rent pretty much everything in that chain these days so your existing “infra engs” can manage soft layer while you outsource all the management of hardware and below.


Assuming the 500 instances are something like m5.4xlarge, the on-demand cost would be about $3M per year. You save something on reserved instances, but have other excessive costs like egress, so that should be the right order of magnitude.

So why do you limit a project that's supposed to save $1M+ to a cost of $50k or so?
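For what it's worth, the order of magnitude checks out. A quick sanity check, assuming m5.4xlarge on-demand at roughly $0.768/hr (the approximate us-east-1 rate around 2021; actual prices vary by region and change over time):

```python
# Sanity-checking the ~$3M/year figure above. The hourly rate is an
# assumption, not current AWS pricing.

RATE_PER_HOUR = 0.768   # assumed m5.4xlarge on-demand rate
INSTANCES = 500
HOURS_PER_YEAR = 24 * 365

annual_cost = RATE_PER_HOUR * INSTANCES * HOURS_PER_YEAR
print(f"~${annual_cost / 1e6:.1f}M per year on-demand")
```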


I've learned a lot from managing school networks for public schools: thousands of users running on either re-purposed or gifted hardware from 8+ years ago.

In 2013 I was running a school with 500 students and 70 teachers on an 8GB-RAM HP server that was built in 2005, with no problems other than disk speeds for network transfers.

The same setup in the cloud would have been much more expensive but then again I had/have access to unlimited Microsoft product licenses because of the MS-ACH agreement so take that with a grain of salt. They even give every public school in the country their own unlimited KMS host key.


Hiring your own security guard is cheaper than paying an outsourcing firm.

Hiring your own janitor is cheaper than paying an outsourcing firm.

Building your own office is cheaper than renting one.

Doing your taxes with pen and paper is cheaper than paying turbo tax.

Making your own food is cheaper than eating out. Hiring a cook directly is cheaper than hiring Sodexo.

I could keep going. But sometimes it's not just cost. The two biggest values you get with AWS are 1) reducing time spent outside your business's core competencies and 2) a vast ecosystem: 3rd-party offerings, readily available devs, consulting services, and compliance services.

I'd add that, for those with compliance needs, it's not always as simple as racking and stacking infra. You have to use services that satisfy the compliance auditors.


When people talk about the cost of cloud, there’s some assumptions we need to state:

A. Does your workload fully utilize 100% of the capacity of the resource? If not, then cloud would be cheaper. Just like if you only need office space for a few people, it’s not cost effective to buy an entire office building. If you only need server with a few gigs of RAM, it’s not cost effective to buy (own) an entire physical server.

B. If you are going to fully utilize a resource and don't want to purchase/own it, then a service provider needs to provide that asset to you at around cost and make margin from the efficiency of scale they have. For example, it's actually more expensive for me to buy all of the ingredients to make a hamburger than to simply buy a fully prepared hamburger from McDonald's. McDonald's is able to provide this due to their scale.

What I've seen is that when you're in Group B, many people are finding that AWS/etc. is way more expensive. Essentially, their efficiency of scale is not being passed down to the customer as cost savings, and the sizable cost premium is not worth the value received in return.

I'll give a good example of where this does make sense, and that's Hetzner or OVH. Their scale allows them to procure and host dedicated servers at a price I'd find difficult to match doing it myself; even if I could beat their price, it would only be marginally. But folks are finding that with AWS/etc. the premium is extreme, and that's where the equation is unbalanced.


McDonald's is a pretty good example. For one cheap burger, they can provide it to you for less. As soon as you start asking for quantity or quality, though, you quickly realize you can make it yourself for cheaper, not counting your time.

At that point, the question becomes how much is your time worth, or in this analogy, are you ready to hire a professional chef to get better quality food?


There are also a lot of people who think they are in group A, move to the cloud and find out they are much closer to group B.


Not really. As far as I can tell, the core value of AWS is letting accountants shift CAPEX to OPEX.

The technical considerations are a distant second or third place.


I choose to host on Vercel because I can spin up a production-ready, globally distributed web app in about 5 minutes with marginal, predictable costs without hiring a single other person, procuring any hardware nor learning anything beyond what’s necessary for coding my application. Pair it with analogous services like Upstash and Fly.io for persistence, and you can achieve incredible scale with minimal operational burden. Obviously this depends on your workload - for mine I can imagine this would cover the majority of use cases for the lifetime of my company. And there are many companies like mine.


At a glance, Vercel's pricing looks unbelievably expensive. $550/TB traffic, and $60k/yr for a 128MB function running at 100% utilization. What's the point of scalability, if you can't afford it to scale above the size of a small vserver? I'd have nightmares about a small DDoS attack costing me millions running on infrastructure like that.

What does it offer compared to other serverless offerings (aws lambda, google cloud run) to justify this cost?
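Taking the quoted $550/TB at face value (the figure cited above; Vercel's actual current pricing may differ), the bandwidth bill scales linearly and gets scary fast:

```python
# Scaling the $550/TB bandwidth figure quoted above. The price is taken
# from this thread, not from Vercel's current price list.

PRICE_PER_TB = 550.0

def egress_bill(tb_served):
    """Bandwidth cost at a flat per-TB rate."""
    return tb_served * PRICE_PER_TB

# A modest site vs. a traffic spike (or a DDoS that slips past mitigation):
print(f"2 TB/month:  ${egress_bill(2):,.0f}")
print(f"50 TB spike: ${egress_bill(50):,.0f}")
```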


I think you're misunderstanding Vercel.

You host static, or slightly dynamic (calling APIs from the front-end) websites. Serverless functions are a bonus to use occasionally. If you're using a serverless function at 100%, you're doing something terribly wrong.

As for DDoSes, such providers are genuinely okay with waiving bills from serious mistakes or DDoSes (besides also having anti-DDoS services for "free" and transparently, so most DDoSes won't even show up on your bill).


Exactly, you're incentivized to make your website as static as possible. I attach standard HTTP cache headers to most of the server-rendered stuff so that the responses get cached in Vercel's CDN and, once again, the functions aren't invoked super often.


1. But you still pay $550/TB for traffic, right? That's 5-10x the cost of AWS, which is already very expensive.

2. What advantages does Vercel offer?


1. My workload is an early stage enterprise SaaS where traffic is not the limiting factor for our growth. If you’re planning to push a lot of bandwidth you probably want to use something else.

2. Like I said, it’s that I don’t have to spend even a minute thinking about how I’m going to deploy my app. It just listens on our git repo and runs the NPM standard build and start commands to run the app, so I don’t need to do any vendor specific configuration. We use NextJS as our web framework, so we just write pure web frontend/backend code and automatically everything’s hooked up so that it’s served with serverless infra (so I don’t have to care about scaling or machine resources ever), with a global CDN that caches the API responses we return by just attaching a Cache-Control header, which is very transparent. On top of that Vercel instruments deploys for all of our git branches so that I can see what my teammates do directly in their PRs, once again with no configuration. And if the pricing becomes an issue, all our code is just following web standards and next to no vendor-specific code exists in the app, so I can move off it any time, but really I don’t see that happening even if our SaaS 100x’d in size (which is the aim).

I really have trouble seeing how we could do less work or be less locked into a specific infra than this. I'm sure for resource-intensive workloads it's not ideal, but for ours, optimizing for resource efficiency by running our own stack of servers is a case of YAGNI; the simplicity of the DX is totally in our team's favor.

Not really sure why this argument wouldn't make sense by now: Heroku has always been expensive, and yet it has always been popular, since it's so much simpler than dealing with the choice paralysis and complexity of the full AWS system or of running your own servers.


Rent colo space, rent transit, rent equipment (which you should absolutely do below a gigantic footprint). Boom: your physical DC is now opex. Still cheaper than public cloud by an order of magnitude on certain workloads (egress-heavy, GPUs, etc.).


Could you elaborate how this is an advantage?

In my previous role, my manager argued for my work as a developer to be charged as Capex to the project instead of Opex.

Why would accountants want the opposite for AWS?


OPEX in the US is tax deductible in the current financial year, and a lot easier to calculate and maintain. It's basically the cost to run the company, subtracted from revenue.

CAPEX items are amortized over multiple fiscal cycles; they count toward, and can help raise, the value (valuation) of your company, but are trickier to calculate.

So depending what financial number goals your company has, the accounting of items can go one way or the other.

If your dev work brought permanent value to the company, then it can be capex. If you were a contractor instead, it could be either capex or opex. AWS services are basically rented and bring no permanent value to the company; i.e., if you sold everything the company owned for cash, you couldn't sell the AWS part, just the Terraform scripts.
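A toy illustration of the accounting difference described above, with made-up numbers (a $300k hardware purchase depreciated straight-line over 3 years vs. the same total spent as ongoing cloud opex):

```python
# Illustrative capex-vs-opex sketch; all figures are invented.

CAPEX_PURCHASE = 300_000      # servers bought up front
DEPRECIATION_YEARS = 3        # straight-line depreciation schedule
ANNUAL_CLOUD_OPEX = 100_000   # same total spend, paid as you go

for year in (1, 2, 3):
    capex_expense = CAPEX_PURCHASE / DEPRECIATION_YEARS  # hits P&L gradually
    print(f"Year {year}: capex depreciation ${capex_expense:,.0f}, "
          f"cloud opex ${ANNUAL_CLOUD_OPEX:,.0f}")

# The annual P&L lines end up identical; what differs is cash flow (capex
# pays $300k in year 1) and the balance sheet (capex leaves an asset behind).
```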


From a business perspective, it's a large, irreversible upfront investment (capex) vs ongoing opex. Sizing and building a data center is risky, and execs don't want to attach their names to a $xxM data center project.

Cost-wise, capex is depreciated, which gives less visibility on month-on-month costs compared with opex, which goes onto the income statement.

Developer work as capex is an intangible asset, which has a bit more 'flexibility' depending on what management wants. It can operate on the same idea: developer wages for 1 year spread over several years via amortization of the intangible asset.


Depreciation of assets also goes on the P&L, so there's no difference in periodic visibility. Also, while not capex, prepayments/reservations for cloud services are in fact assets/liabilities, so yes, opex vs capex is a good high-level distinction, but not 100% the essence.


The capex vs opex argument in cloud is more about having better transparency on your infra costs on a monthly/quarterly/yearly basis. With DCs you need to make large upfront purchases for today's needs and the next 3 years'. If you underestimate, you'll be unable to grow further. If you overestimate, you'll be stuck with a bunch of unused infra.

Now say you want to spin up a new feature/product. Can you accurately forecast the compute needs? How difficult would it be to get a large capex PO through your internal orgs for unreleased, non-revenue functionality?

Compare that to cloud, where you pay for what you need. Calculating marginal cost is much easier, and securing budget on an ongoing basis to pay for cloud opex is also much easier, as you can easily show finance the profit margins.


As the other commenter mentioned, it's much easier for management to manage costs and assign them to cost centers or buckets. It's a big difference to track consumption monthly based on actuals instead of straight-line depreciation; e.g., depreciation hits even if everything is turned off.


If the project is in the R&D phase, we call it "capitalization". Other than that, regular employee cost is generally opex and goes to profit and loss.


Couldn't you do this by leasing hardware before AWS?


It's still about cost.

(1) opportunity cost: doing your own systems administration instead of growing your business (features, marketing, etc).

(2) switching cost: once you have a working system that outgrows the free / cheap AWS tiers, and might be cheaper to run outside cloud, switching away from the cloud becomes expensive, and does not look like a good investment to many, see (1).


Opportunity cost is everything. Staffing for a small startup, after funding, is a huge bottleneck. Every second you spend managing an install of RabbitMQ, be it on a VM or on-prem, is time spent not working on the app, or anything else that's drastically more important.


And every dollar you bleed on expensive AWS (and expensive devops engineers to wrangle it) is a dollar less you can pay a new employee to deliver value quickly and get you profitable (or acquired).

So what's the balance?


Startups spend way more on staffing than cloud. Generally the cloud premium is less than the cost of a single employee for an early-stage startup.

As the startup scales, it's about velocity of product. Spending 10% of your time to save 20% on cost is a bad strategy. Instead, startups should take that 10% of time and invest it into their core product. This will lead to an accelerated timeframe for raising their next funding round, which will be much larger than any cost savings. Eventually, once growth slows down, you can focus on costs to improve profit.

*you still want to ensure your costs grow slower than top line rev.


Personally, I think: run on-prem till you outgrow a server in the office closet, Digital Ocean/Linode till you run out of features, then AWS/GCP when you start needing to scale to handle hockey-stick user growth. Then again, you'll need an expensive devops engineer to manage those re-deployments seamlessly, so have you really gained anything?

How much funding do you have? A couple hundred thousand is different from a couple million.


I pay my engineers six figures, and my cloud bills don’t top $1000/mo. Optimizing for cloud bills is stupid in my case.


I'm not even sure it's cheaper. At a previous employer we had approximately $150k of cloud expenses per annum. Bringing that in-house would have eclipsed the cloud expenses on additional staffing costs alone.


What are you guys doing to have such a low cost?

I have a small GKE cluster and a few databases and I’m well above that, I nearly hit that in a month!

Maybe you don’t have read replicas of your databases? Do you take any traffic?


How much is small, in terms of CPU cores and memory?

Cost in the cloud is all about capacity, and each cloud has very good tools to see where the money is going. It sounds like either what you consider small is not that, or money is being wasted somewhere on things you are not aware of.


I guess “small” is open for interpretation.

Let’s say 720vCPU for the cluster and about 2.3TiB of memory.

These would be quite small nodes if you bought them as machines in a datacenter. Most modern machines have 40+vCPU and 128G+ RAM each.

I have 36 machines in my GKE node pool with 20 vCPU and 64G of RAM each. So the aggregate totals sound high, but it's not many machines; in terms of real machines I could have fewer, like 18 or so.


That compute is much larger than startups I've worked for that made millions of dollars and had 50M+ unique users per month. So obviously not "corporate"-sized, but definitely not small.

Looking at the GKE price calculator: 7 N2 machines (which I have no idea if they are the cheapest per vCPU/GB) will give you 896 vCPU and 3584 GB RAM; that will cost you $21k per month for a zonal cluster.

We can do napkin calculations, but that won't help you. If you want to get your bill down, you just need to open the billing reports and start slicing data by usage.

Edit: I really hope I'm not coming off as condescending or anything. I used to work at a startup related to cloud cost optimization and currently work as a devops in a cloud environment, so I know how these costs can get out of hand.
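The napkin calculation above can be reproduced, deriving an implied unit cost from the figures in the comment (the $21k/month number comes from the thread, not from Google's price list; the per-machine specs assume n2-standard-128 nodes):

```python
# Reproducing the GKE napkin math above. The monthly cost is the figure
# quoted in the comment; per-machine specs are an assumption (n2-standard-128).

MACHINES = 7
VCPU_PER_MACHINE = 128
GB_PER_MACHINE = 512
MONTHLY_COST = 21_000  # quoted zonal-cluster figure

total_vcpu = MACHINES * VCPU_PER_MACHINE   # 896
total_gb = MACHINES * GB_PER_MACHINE       # 3584
cost_per_vcpu = MONTHLY_COST / total_vcpu

print(f"{total_vcpu} vCPU, {total_gb} GB RAM")
print(f"implied ~${cost_per_vcpu:.2f} per vCPU per month (RAM bundled in)")
```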


I have definitely looked into getting the costs down, we’re not doing anything truly special.

Making use of best practices costs a lot more than most people expect. Interzonal networking is charged, for example; a lot of people also assume that redundancies are built into things like RDS or CloudSQL, but they're not, and you should be running replicas.

And of course traffic to databases is interzonal networking.

It adds up, and not a lot can be done about it in many cases.


idk but "a small GKE cluster and a few databases" doesn't sound like it should cost a million per year.


You can outsource the staffing costs too. This also turns out to be much cheaper due to economies of scale. :)


* understanding what you're actually doing is better than outsourcing your knowledge.

Most of the issues I see, from either on-premises or cloud, generally come from not actually understanding the business/use cases/environment. Cloud becomes the 'solution' to a problem of people and process, rather than a value proposition that augments existing reality. You can't make good decisions (e.g., should I outsource my taxes?) if you don't understand what doing those things actually involves (which most people don't even bother to try).

And usually worked into this is a lot of outsourcing of expertise to the point the business relies on third parties to tell them what to do, which is never a cheaper outcome.


What you call "sometimes it's not just cost" actually is cost. It's just indirect cost that is hard to measure, but these costs definitely exist and definitely have to be taken into account somehow. A large part of the hidden costs in corporates is the things you are not doing to save costs because the infrastructure is not flexible enough. I think everyone will recognize this, but it's really hard to put a number on it.


Not to be dishonest towards the argument that you're making, but most of the examples that you give actually seem fairly straightforward.

I do my own taxes, but maybe that's just easier in Europe and is definitely easier for individuals. That said, there's no reason why a LibreOffice spreadsheet would be an insufficient solution for handling taxes and other things like that.

I also haven't eaten out in years, the closest to that was ordering some Wolt when hanging out with my friends pre-COVID, because they wanted to try some. Apart from that, it's all just home cooked meals for me and that's pretty great. It also seems to be working out great for the folks over at https://www.reddit.com/r/mealprep/top/?t=month

At work, the company that i work for have their own building and have their own support staff as well, which seems to be working out great for them.

Furthermore, there are plenty of on-prem resources in use, and despite the disadvantage of lacking self-service in many cases, there's very little difference in configuring and running software for deployments with something like Ansible and containers. Even more so when you have to support clients that have their own particular data centers and on-prem deployments, which might differ noticeably from public cloud offerings. That's not even thinking about things like compliance in regards to what data can be stored where.

Personally, I also have a homelab with some repurposed old computers with 200 GEs and value RAM, a few HDDs and WireGuard for working around NAT and exposing my sites to the world through a pretty cheap cloud VPS or two from https://www.time4vps.com/?affid=5294 (affiliate link, to make hosting cheaper if anyone else uses them). Of course, when I need 24/7 uptime, I do use their VPSes in a hybrid cloud setup, especially since my blog getting 30k views could be a bit taxing on a residential 4G LTE modem connection.

The argument about competencies, ecosystems, 3rd party offerings, outsourcing and so on is probably valid for some, but not for me and not for many companies out there. Too often you end up depending on SaaS solutions which vendor lock you and might cause you to spend unreasonable amounts of money, or will let you remain ignorant about how to actually manage the software that you're using. I think SaaSS (Service as a Software Substitute) is a relevant term here: https://www.gnu.org/philosophy/who-does-that-server-really-s...

That said, what works for me and even the company that employs me won't work for others. And what works for others won't work for me. This is all because of how different the circumstances of various people out there are: I cannot afford AWS, Azure, GCP and managed services for my own needs.

I currently pay 320 EUR for 6 cloud VPSes per year (and additional amounts for the occasional replacement HDD for my homelab), whereas others pay similar amounts for their cloud platforms of choice per month. For them, depending on their circumstances, it might be more cost effective to spend their time working and throw money at problems, whereas for me it's almost always more cost effective to learn the tech myself.

Similarly to how in Latvia you could hire a team of developers for what one developer would cost in the US. Companies have other factors to consider, of course, but this is just one example - the alternative (opportunity) costs of individuals.

Edit: Of course, some in the comments are talking about hundreds of VPSes/VMs/nodes and in my eyes, that's just an order of magnitude or two higher than what I'm talking about. I've seen plenty of companies in my country running their own data centers and there have been relatively few issues with those that I'm aware of. Something like Ansible and container clusters can scale pretty far!
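As a minimal sketch of what "the same deployment everywhere" looks like (every name here — the inventory group, image, and registry — is a hypothetical placeholder, not anything from this thread):

```yaml
# deploy.yml -- run a containerized app on your own hosts with Ansible.
# "appservers" and the image name are illustrative placeholders.
- hosts: appservers
  become: true
  tasks:
    - name: Ensure Docker is installed
      ansible.builtin.package:
        name: docker.io
        state: present

    - name: Run the application container
      community.docker.docker_container:
        name: myapp
        image: registry.example.com/myapp:1.0
        ports:
          - "8080:8080"
        restart_policy: unless-stopped
```

The same playbook runs unchanged against on-prem machines, colo boxes, or cloud VMs, which is the point about there being little difference in how deployments are configured.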

The problems were more often caused by either mismanaged environments/deployments by developers/agencies who just didn't care about shipping sustainable software but cared more about getting paid and making their software someone else's problem, or making mistakes early in the development and not considering load testing and scalability of the systems as priorities. I'd argue that you can do bad engineering anywhere, though, be it on prem or in the cloud.


Managing all of these careers is a big cost though, in time and effort.


If you run your own datacenter, there is also the opportunity cost of slowing down R&D and new development work.

Let's say the year is 2012 and Redshift is introduced, completely changing how organizations can generate insights from their data. Running your own datacenter? Good luck waiting for the ops team to install something similar! It might take you a couple of quarters, assuming they are already competent at it. On the Cloud? Press a few buttons and you're off to the races.

Velocity is a competitive advantage.


What? Vertica has existed since 2005. New products/services come and go. If that precious data truly has so much value, then it is already being processed in your in-house system.

These new services are nice for startups and for eventually outsourcing... Aaaand of course for upselling to folks who are already in the faith.

Velocity is an advantage, yes, but if you really see a brutally good deal with some new AWS service there's nothing preventing you from using it, DC or no DC.


So what you're saying is on premises you need competence to do things but in cloud incompetence is no problem!

These are arguments that reek of developers who don't see the need to worry about things like 'cost,' 'reliability' and 'efficiency' because of the need to be 'innovative' and break stuff faster to win the market.

I'm sure it's fine so long as the developers are really competent.


Correct me if I'm wrong, but Redshift's value proposition was not that it was the first DWH system, but that it was the first one that could scale up and down and be charged by the minute.


Is this how people really think?


It’s definitely the message cloud marketing has been cramming into people’s brains, and most of them are falling for it. Why am I seeing cloud ads in an airport, and never ads for any other low level technology like Ethernet or LDAP? Because they’re doing an end-run around tech people and targeting CEOs with these completely unrealistic messages.


I think the mistake is often made by comparing primitives. E.g. running my own RAID vs S3. Colo traffic vs AWS traffic.

But what about comparing the whole ecosystem?

Can you provide a self hosted granular access permission to your RAID? How hard is it to configure and maintain?

Will your colo deflect a DDOS attack?

When you run your own services, you have to reinvent so much it doesn’t seem to be worth it.


> Can you provide a self hosted granular access permission to your RAID?

This is the second-level mistake engineers commonly make:

The right questions isn't "Can you do X". Give engineers enough time and resources and they can usually come up with a solution to do X.

The real question is "How much time and resources need to be invested to accomplish X at a satisfactory level?"

And the third-level mistake is to assume that getting something to work once is the finish line. In practice, getting something to work once is just the beginning. Getting it to a maintainable, well-documented, repeatable state is a lot more work.

Cloud services make all of this effort disappear. Type a few commands and it's good to go. Now you can take all of the engineering hours that would have gone into the DIY version and allocate them to working on the company's product instead of reinventing architecture that you could have simply paid for.

Good engineers are scarce and expensive. Using them to reinvent infrastructure that can be trivially purchased for a nominal amount is a terrible move most of the time. Even when it does make sense, the right move is to build the prototypes on AWS and then consider transitioning to self-hosted later if the numbers work out.


Eh, I would argue that any advantage cloud has in ease of configuration is because of the brain drain in good server software caused by the cloud: Spend some time in Microsoft Azure and it becomes instantly sad that all this manpower was pulled off of Windows Server (which has stagnated as a product) and been invested in a proprietary service product that runs on top of Windows Server. And the former will outlive the latter more than likely.


You make a lot of claims without providing evidence.


> Can you provide a self hosted granular access permission to your RAID?

Yes.

> How hard is it to configure and maintain?

Very few things are harder to configure or maintain self-hosted than they are on a cloud service, because if they were, someone (e.g. you) would get frustrated and make them easier, and then they wouldn't be hard for anyone else.

> Will your colo deflect a DDOS attack?

Ah yes, S3 can handle serving that many requests and keep everything online. But then don't you get a bill for $72 billion?


I believe both Azure and AWS (probably GCP too) have built-in DDoS mitigations for free.

https://docs.aws.amazon.com/waf/latest/developerguide/ddos-s...

You might be on the hook for bandwidth costs from a more sophisticated attack though.


That isn't very specific about how it works ("defends against the most common, frequently occurring network and transport layer DDoS attacks" whatever that means), but it sounds like they're going to drop weird looking packets.

The problem is, one of the more common types of DDoS is that the attacker has a botnet with a million machines in it and has them all make legitimate requests to your service all day, thereby overloading it. This looks just like a large volume of legitimate requests, because it is. S3 or similar isn't going to get overloaded, but then what stops you from getting a bill the size of the moon?

To do otherwise they'd either have to be able to distinguish these from legitimate requests (how?) or give you free traffic when you claim you were under a DDoS that they can't distinguish from a large volume of legitimate traffic (unlikely).


Why is this unlikely? If you do it several times then they will start to get annoyed and say no but a service like AWS is all about the long term customer relations. I've had bills of ~$1k refunded even though I'm a ~$3 p.m. user.


Waiving a bill for a thousand dollars isn't really costing them a thousand dollars because their underlying cost is much lower than that.

Do the math on how much the S3 bill would be if a million bots each with a 100Mbps cable connection would DDoS you for a month. A thousand dollars is too low by how many orders of magnitude?

You might get them to waive that, maybe, or maybe not. Even at their cost they'd never make it back from you. Do you have any guarantee that they will? What happens if they don't? What happens if they do it once, but the attack hasn't ended?
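To make that back-of-envelope math concrete, here's a sketch of the calculation; the $0.09/GB egress rate and the 30-day month are my own assumptions, not figures from the thread:

```python
# Rough S3 egress bill if a million bots each saturate a 100 Mbit/s
# link against your bucket for a month. All rates are assumptions.
bots = 1_000_000
mbps_per_bot = 100                  # 100 Mbit/s cable connection
seconds = 30 * 24 * 3600            # one 30-day month
price_per_gb = 0.09                 # ~cheapest standard egress tier

total_bits = bots * mbps_per_bot * 1_000_000 * seconds
total_gb = total_bits / 8 / 1e9     # bits -> bytes -> GB
bill = total_gb * price_per_gb

print(f"{total_gb:,.0f} GB egress -> bill of about ${bill:,.0f}")
# Comes out on the order of billions of dollars, so a $1k refund
# is too low by more than six orders of magnitude.
```

Even if the provider's underlying cost is a small fraction of the list price, it's still vastly more than they could ever recoup from a small customer.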


Umm, why wouldn't you run Ceph yourself? It speaks S3. (It has a component called RGW - the Rados Gateway - which is completely stateless, scalable, and implements the S3 ACL policy XML.)

And yes, running it has a cost.

But it also has the advantage that devs can run it locally in Docker easily. CI can spin up endless test clusters.

And so on.

Obviously you are right that the right way to compare cloud vs non-cloud is to look at the full picture. And that also means we need the context.

Small/hobby project? Doesn't matter. You can run on your own toaster or on Oracle cloud or on AWS/GCP/Azure. Just do what you want, the costs are negligible.

Operating business with stable well predicted size? Again, do whatever you want. If IT is a big part and costs matter, optimize for cost and run it on a few dedicated boxes. If you are not cost sensitive and you want to be one of the cool kids run it in AWS or whatever. (We have a client that exists for 25+ years, reached its optimal size, does some innovations from time to time, but it is basically a new website or app. The underlying backend is the same, maybe they'll replace it eventually. Probably with a complete SaaS and then they'll only need to host a landing page.)

Large multinational company with more departments than sanity? Again do whatever you want, likely you have bigger problems than the cloud bill or the inability to run one more app in your DC.

"Unicorn" startup? Crunch the numbers, do what makes sense. Everyone knows that "Netflix went full AWS", but maybe not everyone knows that they went full on-prem for their CDN, with hundreds or thousands of local caches at ISPs/IXPs.

And so on.


These types of posts rarely measure operational cost. Anybody can buy infrastructure and stick it behind an API. But can you make it fault tolerant with high availability and low latency? Can you do all of that and _still_ beat AWS's costs? For the vast majority of customers the answer is no.

I used to work at a medium sized company, and they saved millions by moving to the cloud, and gained much better availability/performance. It wasn't even close, because that medium-sized company didn't have the expertise to operate the service efficiently. They just bought off the shelf stuff from VMWare etc.. Plus, DR meant paying double.

disclaimer, I now work for AWS.


My availability on-prem is vastly higher than AWS, especially over the past month.


AWS/Cloud seem to have created this fallacy that dedicated hardware/on-prem fails far more often than it actually does.


Well, that's called "advertising and marketing" - persuading engineers to buy their products by doctoring the truth. It's actually easier with rockstar engineers - they never admit their pet approach might be wrong so AWS just needs to sell to them once ;)


Indeed there is an odd element to that, as when it's all said and done, AWS itself is an on-prem infrastructure too.


When looking at these things, you always need to look over the long term. Sure, December wasn't great, but how was all of 2021 overall?

It's also not just cost/availability, but flexibility and scalability. Most high-growth startups would have not been able to scale quickly enough pre-cloud. Facebook is literally the unicorn.


My datacenter has better uptime than AWS across all of 2021. Pretty sure in December alone, Amazon did worse than I've done in the past couple years combined. I have also enjoyed not being affected by various Azure or Microsoft 365-related outages throughout the year.

Scalability is a niche perk of the cloud, absolutely. I work somewhere that has a customer base that does not meaningfully scale, so it's not a concern for me. But then: Once an organization has established and has a relatively predictable scale rate, they should exit the public cloud if they can.

An established company with a stable capacity need shouldn't ever be moving to a cloud provider with a generous big tech profit margin. But that also poses a risk for the scalability of the cloud: If you don't have people overpaying for stable capacity usage, can the cloud provider afford tons of extra capacity for flexible needs?


> An established company with a stable capacity need shouldn't ever be moving to a cloud provider with a generous big tech profit margin.

I’m not a fan of the cloud. But if you’re too small to have your own data center and support team, the cloud is absolutely a good option.


> when looking at these things, you always need to look over the long term. Sure, December wasn't great, but how was all of 2021 overall?

Overall, our 10+ year old infrastructure has hugely outpaced the reliability of any of the three main clouds (and brought in more business, since we got customers when the main clouds killed most of the internet).

Why do you keep behaving as if reliable infrastructure were impossible without paying rent to an American bigcorp?

(This doesn't mean cloud doesn't make sense for many people, but, seriously, cut the crap.)


No reason you can't have onsite infra but keep the cloud as backup for the 0.009% of the time your stuff is down.


But can they make bandwidth 1000x more expensive?


Ex Amazon here.

I've been in companies that owned their datacenters and it was much, much cheaper than using any cloud service.

Poorly managed datacenters exist but that's an organization problem. Remove the datacenter and you'll have poorly managed cloud instances and services costing millions.


Not sure where you get “vast majority” from, but as an anecdote I worked for two companies that operated the platform teams at 10%-15% the cost before cloud (including headcount).

However, as I’ve alluded to in other threads of this kind, people don’t like to invest in their own tech; an ideal budget would have been around 15%-20% of cloud spend, and we could have solved nearly everyone’s pain with that amount of money.

DR does mean paying more, but just so we’re clear: you have to do DR in the cloud too, meaning read replicas of database instances and off-cloud backups which auto-restore. That costs extra money there as well; it isn’t just baked into the normal price.


"My rough estimate is that the unit cost of provisioning a service on AWS is about 3 times that of a competent IT organization providing a similar service in house."

The word competent here is doing a lot of heavy lifting.

I know companies in which you have to wait _months_ for a small server to be allocated to your team. AWS does it in seconds


> I know companies in which you have to wait _months_ for a small server to be allocated to your team. AWS does it in seconds

If those companies ever migrated to AWS it would probably still take months to get a small EC2 instance allocated to them. Likely the problem is bureaucracy, not competence.


Well at least then you're only waiting on the bureaucracy and then seconds for AWS, instead of bureaucracy and then machine order, setup, provisioning, and re-provisioning time, all with their own bureaucracy time :)

When I worked in Sony games I had the choice between AWS and the IT department. The IT department was fine, except it was a 6-month lead time for the hardware, and they had a tendency to optimize servers they didn't understand.

The problem was that they wanted 6 months and a capacity plan, and 2 months to get me a test server. I knew server capacity about 2 weeks before launch. I also needed 5 server-class machines to test against (which was the end cluster number) and, the kicker, 250 load generators to prove it.

The real kicker is that for many games the load peak is the Friday after launch; we really only needed 1 machine a week later, and 0.25 going forward thereafter.

With the IT department we would have bought $160k worth of servers. And we still couldn't have actually tested things without AWS. AWS cost $30k the first year, and then we moved to new instance classes, which cost $12k, $6k, and $1k a year thereafter. It was enough of a cost drop that it wasn't worth dev time to even downgrade the cluster to a single instance.

None of that includes what IT would have cost to run the hardware. I'd make the same choice every time.


This is the use case that the cloud is really designed for, and most people who are 100% anti-cloud don't realize how peaky many workloads can actually be. However, I doubt that most cloud advocates understand this.


Hacking away at that bureaucracy is one of the main advantages of moving to cloud. Maybe it all comes back in the end, but for a while it is definitely a source of speed and agility.


I don't know why there are so many articles in this vein:

- use the cloud until you're making so much money that you can afford to hire sufficient talent to replicate it cheaper.

fin


How many organizations move either between clouds or off cloud entirely? I imagine for the former there are some import tools in order to poach customers.

But between all the advanced proprietary software solutions and exorbitant egress fees, moving off cloud entirely is rather difficult.


I'm sure people will disagree, but I've always viewed the main benefit of cloud is a company with no talent or time to do networking / infrastructure early on, so they use the cloud, which requires less expertise. Once they grow and can afford network / infrastructure talent, the switching costs are hard to justify, even with the 3x markup (according to the article).

I can see why it's compelling, but for me running my stuff on other people's servers in this day and age is concerning. Like many computing things, it really depends on the situation.


The issue is that the 3x cost doesn’t include the cost to staff the team that runs your in-house platform software on your colo setup, and the mixed incentives you get when a sizable part of your team is not shipping product but instead shipping say, Kafka queues. My previous employer has one of the largest colo deployments of the f500 companies and it was a mess technically and organizationally. At the end a few hundred infra devs were laid off and projects scrapped after ~24 months of no visible progress on platform stability and support.


Rented dedicated feels like a nice middle ground between colo or on-site and cloud hosted. It's on the cheaper end. And if you can boot into a self-configuring image, maintenance cost is quite low too.


Part of the issue I see overlooked is the cost of acquiring and losing physical assets. If you have an onsite data center and a meteor hits it, how bad off are you?

Disclaimer, I barely passed my AWS Associate cert and have 0 qualifications to weigh in on this subject with any authority. It's just a point I've seen glossed over before. Yeah, your data center is cheaper to run, but is it cheaper to replace?


Any decent IT team will backup, replicate, shard, to different locations, and not rely on a single datacenter.

As said in the blog post, with the cloud you will get the same thing that you can pay your in-house IT for, at 3 times the price.

For example, lots of people think that by just using 'the cloud', their data is safe and replicated to multiple geographically separated areas. But no, it is not automatic; it needs to be configured that way, with the associated costs. If you lose an EC2 instance, it will be lost in the same way as a node in your personal data center.


How likely is it that you have a competent IT team? If they are competent, are they also fast? I may not care about having a scalable, redundant deployment if it takes 3 years to get done.

It’s tough to recruit a tech team and keep it running. Worst case scenario, the team leaves and you're stuck with folks who don’t know how anything works.


+1. On top of this, if you need to provide SLOs on availability and disaster recovery, and respect regionalized requirements, even with a technical playbook just acquiring physical space gets hard and expensive real quick!


That's a good point. Redundancy requires more cost and more expertise, but as a counter argument, AWS has had its share of outages lately as well.


A crucial point is to ensure that you give yourself the ability to move from your cloud providers to on prem, and / or to a different cloud.

A bad example: if you are with provider X and they change their pricing structure, tripling your cost, you can move without too much hassle.

Once you embrace all the proprietary and fancy features that your cloud provider has, then you are stuck and moving will be a nightmare.

Compute/S3 are easy to move. k8s should be easy to move, but I have not tried it myself. Database hosting is easy to change as long as you are using an independent product, not a custom database from your cloud provider.

It is much more difficult when you have AWS/Azure pipelines, AWS/Azure geolocation/manipulation, AWS/Azure proprietary scaling, etc.

Terraform is supposed to help here, but in my experience with TF consultants it is not at all straightforward and not compatible with a lot of AWS/Azure offerings.

Then you are stuck and moving will be difficult and expensive. Which of course is the business plan for the cloud providers.


For those that didn't read it, I recommend the thread on bandwidth costs from 5 months ago that this article links to.

https://news.ycombinator.com/item?id=27930151


I managed a deep learning team at Capital One during the period when they dropped their own data centers and moved completely to AWS. I am 100% sure that they made the right decision. My current company also uses AWS. With modern corporate infrastructure with micro services, and buying vs. building many 3rd party services, it can make sense to go all in on Cloud.

Personally, I miss the days of monolith web applications that were relatively easy to host on a leased server. I continue to be a big fan of Hetzner and their hosted servers as well as their VPSs are very reasonably priced. Another thing that I like about Hetzner, OVH, etc. is that their bandwidth costs are also very reasonable so moving databases and monolith web applications to a similar service does not have to be a big deal.

I think that each company’s needs have to be assessed separately.


These 3x calculations for cloud almost always ignore a real TCO calculation for a real organization and real app deployment, and rarely compare capability, flexibility and recovery, which have real value but are tough to quantify.


Mid sized cloud companies like OVH, Scaleway, Hetzner or Infomaniak are the way to go for me.

But here is a point that's completely missing from the article. Many choices in organisations are driven by two wrong motives:

- Career risk awareness: like we said in the good old days, nobody was ever fired for buying Sun Microsystems. The same thing applies for Azure and AWS. You're not paying the bill, so why look for a cheaper alternative that might cost you personally a lot?

- CV-driven decisions: on your CV it's better and more valuable to have 3 years of Azure/AWS than 3 years of OVH/Hetzner/etc.

Finally, as a leader in an organisation, it's always easier to follow the trend than to try to convince your coworkers to follow you to a more "exotic" solution.


Maybe tangent to this but I am really enjoying Cloudflare Workers w/KV and Durable Objects. Currently running over 80% of my company’s infra on it and it’s a monthly cost under $50.


Interesting. What workloads are you running there? Internal in-house built line of business apps?


I wonder how the costs compare if you choose to go outside of the big three. Hetzner, OVH, Scaleway etc are so much cheaper and the network exit costs are much lower than with AWS GCP or Azure.


"...3 times that of a competent IT organization providing a similar service in house." A competent IT organization is a real stretch for most organizations.


I figure a large portion of the blame could be laid at the feet of leadership; these people want to lead people, not run IT. Moving to the cloud gets rid of all those pesky power, environmental, and raised-floor concerns, and of employing all those weird people… and if it costs more money, that makes my budget look good, too. If you save money, you'll get less next year.


"I use AWS for a lot of things, but I strongly regard the cloud as just another tool, to be used as occasion demands, rather than because the high priests say you should."

He's right. Evaluate your needs and use it if it makes sense. Not a very controversial opinion, I think!

The problem is, how do you evaluate your needs if you aren't an expert in either self-hosting or using managed services (AWS)? I think you should treat it like going to a doctor: get two opinions from two different senior professionals in two different specializations. And definitely make an assessment based on real numbers. Try to get ballpark figures from similar-sized businesses about their costs (capex/opex, infrastructure, staff) and requirements (expertise, time-to-market, FRs/NFRs, regulations, etc). Building a business is a huge thing, and how you use technology can either be a hindrance or an accelerator, but it has to fit your use case.
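As one way to structure that "real numbers" assessment, here is a toy three-year TCO comparison. Every figure below is a made-up placeholder for illustration, not data from the article; substitute your own quotes:

```python
# Toy total-cost-of-ownership comparison over a fixed horizon.
# All inputs are hypothetical placeholders, not real vendor pricing.
def tco(capex: float, opex_per_year: float, staff_per_year: float,
        years: int = 3) -> float:
    """Up-front spend plus recurring infrastructure and staff costs."""
    return capex + years * (opex_per_year + staff_per_year)

on_prem = tco(capex=200_000, opex_per_year=40_000, staff_per_year=120_000)
cloud = tco(capex=0, opex_per_year=180_000, staff_per_year=60_000)

print(f"on-prem: ${on_prem:,.0f}  cloud: ${cloud:,.0f}")
# With these particular made-up numbers, on-prem comes out cheaper,
# but small changes to staff cost or growth assumptions flip the result.
```

The point is not the specific numbers but forcing capex, opex, and staffing into the same comparison, which is exactly the ballpark data you'd want from those two independent opinions.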


I think there is room for a hybrid cloud built with open source software.

Databases and virtual machines tend to be expensive at cloud providers: steady-state workloads.

Object storage tends to be fairly cheap.

There is also the cost of vendor lock-in if you use proprietary cloud technologies such as managed databases.

You can build open source infrastructure as a hybrid solution. Probably less risk of downtime.


Haven't seen this mentioned yet, so here goes:

Choosing a large cloud provider is often much greener than running your own or using a smaller provider.

Carbon is a large cost.

GCP has been carbon neutral for over a decade, AWS and Azure have made big public commitments.

Sourcing clean energy for 1000s of servers is not always easy.


Just curious: if you are an enterprise, what do you think (besides cost) is a compelling argument in 2022 for not going to a public cloud? This is probably an Ask HN, so please do point me to it if you know an existing one.


Cost is a huge one, but availability and service priority is a big one for me: Amazon and Microsoft are going to provide service and choose to deploy updates and do maintenance on schedules that work for them, not schedules that work for you.

Your in-house IT department knows not to take a risk the week before your big product launch. Amazon and Microsoft simply do not care about your product launch at all, and probably aren't even aware of it.

You're going to pay for infrastructure and IT staff either way, so why not pay for IT staff and infrastructure that prioritizes your business needs and not their own business needs?


Having your own control over maintenance windows is a big one.

The same with being able to give your customers flexibility in terms of when maintenance happens. This is a big deal in the B2B space, especially with long-running client relationships, because it gives customers some form of control they are never going to get back in the public cloud unless they are insanely large customers.


I wouldn't underestimate the cost of internal politics...


I don't think many are actually against the "cloud" per se, but specifically AWS, Azure and GCP. Especially when you are only looking at something like EC2 and wondering why it is 3x more expensive than other managed server / cloud vendors, even on reserved instances with zero bandwidth bundled.

If you look at prices from Oracle Cloud (ignoring whatever feelings you have against it), then all of a sudden it is extremely attractive.


> In particular, traditional legacy IT infrastructures are ridiculously overpriced. (If you're using commercial virtualization platforms and/or SAN storage, then you're overpaying by as much as a factor of 10, and getting an inferior product into the bargain.)

Could someone elaborate? Is this saying something free like OpenStack would be better than something paid like VMware? Is that really common knowledge?


I agree that 99% of companies should not be buying their own hardware and setting up a datacenter in their basements. But: not all clouds are created equal.

The "cloud" can just be a blank Debian box on DigitalOcean where you have root access _or_ it could be some obscure managed AWS service where all the technical details are abstracted away behind a REST API.


I really like how these claims in the post aren't backed up by anything.


A lot of the time it's not about saving money. You can't even hire/afford people if you are a small/mid-size company.



