I so often see developers state something along the lines of, "We need to go with AWS!" (sometimes substituting AWS with GCP or Azure) and the reason they always give is "scalability". 99% of the time, what they're building not only doesn't need to scale to that degree but they somehow seem to think that it's always inherently cheaper.
Meanwhile, when I build something for a client my first go-to is something like DigitalOcean or Amazon LightSail. I know, LightSail is still AWS. But it's not going to automatically increase the bill by 300% because someone accidentally toggled something in AWS's byzantine admin console.
When ("if" is more often the case) that client starts to see the kinds of traffic that necessitates a full-scale AWS build out then I'll advise them of their options. By that point, they're actually generating real revenue and, as importantly, they didn't pay me to prematurely optimize a product that wasn't yet generating any revenue, which helped them succeed.
Scalability itself isn't the selling point. It's freedom.
It's the freedom to get things wrong, iterate and try again.
It's the freedom to move resources around.
It's the freedom of not having to wait for your procurement process and approval from a purchasing department.
It's the freedom of not having to wait for your operations team to plug in and configure bare metal. And all of these freedoms compound, like interest.
For most organizations, losing those freedoms equates to man-years of wasted work, and I think a lot of the people nitpicking about cloud being expensive have lost sight of this.
Sure, it costs more, but you're paying out of money you've already saved. You'll have loads of saved money left to spend on more developers too.
It sounds to me like you're describing a corporate environment with all the bureaucracy that comes with such an environment. That scenario likely has the traffic and revenue to justify a full AWS build out.
I'm talking about new projects at much smaller companies, or much smaller departments within the company that don't yet know if the product they're building is going to even work.
Not to mention, I have the same freedom to try things out with a $10/mo DigitalOcean droplet as I do with EC2/Lambda/ELB/RDS/S3/etc. If something doesn't work, delete the droplet and start over. To that degree, it's even easier, cheaper, and freer to just test something out in Docker containers on my laptop.
So, respectfully, I'm not buying the "freedom" argument. That's not even the selling point AWS is pitching.
If you know how to build in AWS cost-effectively (and really to cost-effectively use compute in general), I would still pick cloud over buying any hardware.
For my last gig I was hosting the web infrastructure for a $5B/yr enterprise with a low-5-digit Alexa rank on roughly $100/mo in EC2 instances. That was after converting the property to a static site built with Jekyll (you can do pretty amazing, magical-looking things with front-end JS calling out to APIs and nginx SSI while still serving static content) and a hosted WordPress ($25/mo) for the content editors (and CircleCI's free tier for builds -- woop woop). And we were insanely over-provisioned with those t2.smalls.
Procurement sucks. Having to test out new systems when you need to buy new hardware 2 years later also sucks. Having to wait behind much larger customers when Intel has a Xeon supply shortage sucks (2018 sucked if you had to buy servers).
I've mentioned this elsewhere, but I'm not arguing co-location over cloud. I'm arguing inexpensive cloud like DigitalOcean, Linode, or LightSail over a complex AWS configuration designed to scale with near-infinite flexibility. Why does a product that gets 100 visitors a day need that? $5-10/month serves them just fine.
The cynical answer is "resume-driven-development".
The slightly more charitable answer is that having your infrastructure automatically scale and adapt to the workload, and watching it take off is a really fun and exciting moment, and people want to experience that, and in the rare event that it does happen and you suddenly need to handle 10-100x more traffic, having things auto-scale up and then back down means you (in an ideal world) get to watch it excitedly, instead of furiously running around spinning things up and putting out fires that arise when you start operating at larger scales.
Is it overkill to build your business like that from the get-go? Almost certainly (precluding the situation where you have some kind of guarantee of incoming load), but people (want to) do it anyway.
> (...) in the rare event that it does happen and you suddenly need to handle 10-100x more traffic, having things auto-scale up and then back down means you (in an ideal world) get to watch it excitedly, instead of furiously running around spinning things up and putting out fires that arise when you start operating at larger scales.
Emphasis on "rare".
And also, AWS is not the only cloud provider that offers auto-scaling. I've used a European cloud provider that even offered Kubernetes with node auto-scaling, and there are even scripts in the wild that implement auto-scaling for small cloud providers such as Hetzner.
I'm starting to believe AWS only benefits from the bandwagon effect and resume-driven development. None of AWS's value propositions are ever mentioned in these discussions, which is telling.
LightSail isn't any cheaper than the EC2 instance types backing it. Everything else (VPCs, IAM roles, etc.) doesn't have a cost associated with it.
What you're missing is that, cost/complexity-wise, LightSail and EC2 are equivalent. The only difference between them is your interface to it. LightSail doesn't give you some pretty necessary knobs to kick things into a working state when the EC2 instances are misbehaving. In fact, the last time I used LightSail and had a problem with unavailable instances, I just ran the EC2 API commands against the LightSail instance IDs to solve my problem.
DigitalOcean has some networking properties that make it extremely undesirable for some use cases, and Linode is frequently the target of massive global DDoS attacks. I remember well the Christmas Eve Linode DDoS a few years ago, because I had to work 20 hours that day.
Thankfully it was only 20 hours because we were already in the process of moving off of Linode and we just decided to flip all of the switches to serve out of AWS. Most of the time was spent waiting on DNS TTLs.
Instance costs across all of the cloud providers are pretty competitive. Where AWS, Google Cloud, and Azure are more expensive is in those "extra" services where you would otherwise be paying people to run infrastructure (Elasticsearch, SQL databases, etc.). DO, Linode, etc. don't give you that option -- it's not an apples-to-apples comparison... and in most cases you shouldn't use some of these things. Definitely avoid any service where you can't just pick up and go use some other hosting tomorrow. Cloud vendor lock-in is real.
> LightSail isn't any cheaper than the EC2 instance types backing it.
Sure it is: Lightsail includes plenty of transfer in its price. At the $10 tier you get 3TB thrown in. If you're doing anything that burns even a modest amount of transfer, the price difference will be sizable.
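To put rough numbers on that, here's a back-of-the-envelope sketch. The ~$0.09/GB figure is the commonly cited first-tier EC2 egress rate (it varies by region and changes over time), so treat it as an assumption:

```python
# Back-of-the-envelope: Lightsail's bundled transfer vs. paying EC2 egress
# rates separately. The $0.09/GB figure is an assumed first-tier EC2
# data-transfer-out price; check current pricing for your region.
LIGHTSAIL_MONTHLY = 10.00        # $10 tier, includes 3 TB of transfer
BUNDLED_TRANSFER_GB = 3000       # 3 TB, counted as 3,000 GB for simplicity
EC2_EGRESS_PER_GB = 0.09         # assumed on-demand egress rate, $/GB

# Cost of buying that same transfer a la carte on top of a comparable instance
egress_if_paid_separately = BUNDLED_TRANSFER_GB * EC2_EGRESS_PER_GB
print(f"3 TB of egress at ${EC2_EGRESS_PER_GB}/GB ≈ ${egress_if_paid_separately:.0f}/mo")
# => roughly $270/mo of transfer bundled into a $10 plan, if you actually use it
```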
"Linode is frequently the target of massive global DDOS. I remember well a few years ago the Christmas Eve Linode DDOS because I had to work 20 hours that day."
Those sound like some of the weakest reasons. Is your information still up to date? Are they still experiencing major outages?
The truth is it would save you money to use a DigitalOcean instance over AWS, regardless of your edge cases where some network property is not desirable.
How much do you spend a month more to justify working around that weird networking property?
Is it really worth it for everyone else? What is this weird networking issue we need to spend extra to avoid?
I can't really get into specifics, but the particular issue that we have with DO (and it really is specific to DO/droplets) makes it a complete no-go for our new product architecture.
It doesn't save us money if we can't use it.
As for my reasons being weak, you might have read other places in this thread where I've mentioned that I:
a) am responsible for a multi-million-dollar infrastructure,
b) run it across clouds and on-prem in tier 1 datacenters, and
c) have been in this field for a long time.
Odds are that I'm not a total idiot, that I have good information and that I know what I'm doing here.
You are probably an expert in your area, but the original poster was talking about much smaller shops. That's the point being missed: for every multi-million-dollar spender there are thousands of smaller projects paying 10x the cost because they think they will be you one day and that it is cheaper to build on that stack from day one.
Yeah, I'm saying that you can use this cost effectively at smaller places too.
When I think of my core Terraform infrastructure that I can spin up for any project, the only resource with a recurring AWS/Google/Azure cost is NAT Gateways. You can get very cost-competitive with the instances themselves with reservations and/or spot, depending on your architecture. The costs of the big clouds and the small ones are basically the same here.
The way the cost balloons out of control for most companies is for things like S3 access (applications that HEAD the bucket every 60 seconds and hit the API request cash register), or managed services like EKS, Amazon ElasticSearch, etc. It's in the "oh I don't have to learn how to manage X application" where cloud gets expensive.
Smaller providers solve that problem by not even giving you that option. It's not that AWS is expensive, it's that they give you rope to hang yourself with.
Except that in smaller places time is of the essence; we don't have enough of it to learn the full extent of AWS, and I'll save my time and money by using a droplet.
Learning AWS does have a cost. Money I'd rather spend on learning and mastering standard, open and portable systems.
That's not a triviality - C++ has the same problem: if you have a single expert individual it's great, but if you have a team, and some are very green, it will be a nightmare.
Small startups use junior talent for most functions. They are not hiring the best HR team, marketing, sales, and development. They don't have an unlimited amount of money. They will pick one or two areas where they have an advantage and hire a strong person or two.
I think we have wildly different expectations of the scale of "small startups".
If you can afford to full-time hire an entire hr team, an entire marketing team, an entire sales team and an entire engineering org, then you're able to pay an AWS bill.
There are a lot of great parts of AWS that aren't in LightSail, even for small projects. S3. RDS (with or without Aurora). ECS and ECR. It takes and manages a lot of complexity for you.
(DigitalOcean, to its credit, has offerings in those spaces -- they have an object store, they have hosted databases, they have a container service. And I use DigitalOcean for some personal products. But you can actually run a Postgres DB on AWS Aurora Serverless for cheaper than you can run one on DigitalOcean, depending on your workload. It's not obvious that DigitalOcean is a better choice there.)
I am not sure how you can host something serious for just $100/mo on AWS: an m5.large w/ 25GB SSD costs $75/mo, and 1TB of traffic costs $90. DigitalOcean asks $60 for this (2 vCPU, 8GB RAM) and includes 4TB of traffic out of the box, for reference; Hetzner asks €25. Note that even a second host for a DBMS is not included in this, let alone a DBMS with replication. You will also have to host Redis (or an alternative) on its own server if you need to scale your frontend beyond one server. Either I am seriously missing something or our definitions of a serious/prod env differ quite a lot.
When you have a real finance department that vets each third party, or some governmental requirement to get three quotes for each new service (part of requirements set out to minimize grift in sourcing), having a single bill that can be expanded to include new projects and services makes things a little simpler.
This causes a problem during cost cutting, though, because unless it's a priority, your bill will spiral out of control.
And yet, AWS isn't for a really large company either. At some point the bills become so huge that running your own IT and datacenters is cheaper. Usually a lot cheaper. Even when it means building some AWS services yourself.
> These large companies rely on these established solutions, paying much bigger bills than Dropbox – surely, there must be a good reason for it. Maybe the answer is this: the cloud isn’t the answer for everyone, but it probably is for most companies.
Worker at Fortune 200 company here having multiple datacenters and the operations staff to go with that. So far, every single application we've ported over to AWS has resulted in reduced costs. I'm not even talking about total costs, i.e. datacenter power costs, real estate taxes, staffing costs - I'm only talking about server costs, networking costs, storage costs and software licensing costs. So far AWS has beat on-premise every single time. Is it cheap? Compared to running on-premise Hell Yeah! Our average savings has been 60% (and like I said we're not even looking at total cost, so the actual average savings is even higher). It's ridiculous.
Amazon itself used to be that environment. Before AWS was adopted internally, which it wasn't for some time, to say nothing of the many years before it was created, getting capacity involved a weeks or even months long political game with the infrastructure team. Even infrastructure devs had to struggle to find resources. Hiring managers would literally horse trade their spare capacity.
> Amazon itself used to be that environment. Before AWS was adopted internally, which it wasn't for some time, to say nothing of the many years before it was created, getting capacity involved a weeks or even months long political game with the infrastructure team.
It's my understanding that you're describing how Amazon worked in the 90s. It's my understanding that Amazon afterwards moved to an internal cloud provider, and afterwards spun off AWS which worked in parallel with Amazon's internal cloud provider.
> It's my understanding that you're describing how Amazon worked in the 90s. It's my understanding that Amazon afterwards moved to an internal cloud provider, and afterwards spun off AWS which worked in parallel with Amazon's internal cloud provider.
Not at all. Amazon's transformation took years. Up until last year, amazon.com was Oracle's biggest customer. AWS development far outpaced their actual use of it.
The website platform itself started migrating to AWS, in a crude but reasonably effective way, in 2009. Various other service owners migrated sooner though.
Also there had already long been a directive to build services as though they were going to be used by external customers since the early 2000s. It's not wrong to view AWS as just another step in Amazon's long services journey, although that elides many significant events.
I have to agree... I think DO is pretty good as a starter ground... found dokku because of the DO startup option, and it's pretty much my go to for spinning something up to play with and/or tear down.
Create an app with a Dockerfile in the root of the project, and it's a git push away from deployed. The LE extension adds https, and it pretty much just works. Connecting to DO hosted databases and object storage is pretty smooth as well.
It doesn't auto-scale like FaaS options, or offer a lot of the other bits one may want/need, but it does good for a lot of the one-off things I'll play with.
I didn't even mention the best part about hosting projects in cloud providers.
You can generalize the infrastructure to run 99% of businesses needs and write that Terraform code only once. You can bring that with you everywhere and take care of the basics in seconds and you can sleep at night easily without a pager going off.
We used this with Azure. We contracted a firm to write a stack for eCommerce sites we were spinning up left and right. It passed through ITIL and was approved all over the chain. After that - security was happy, ops was happy, everyone was happy. Everything was just deploying an application - be it an application making millions of dollars per instance across many brands.
It enabled the branding/marketing teams to launch sites without waiting for months of tech backlog. It was beautiful.
> You can generalize the infrastructure to run 99% of businesses needs and write that Terraform code only once. You can bring that with you everywhere and take care of the basics in seconds and you can sleep at night easily without a pager going off.
To quote my parent. That's what we did and it worked for us.
It was in an almost identical situation to yours that I learned to do this, and I've been able to scale it up to companies with 1000x more demanding infrastructure.
The funnier thing in my case was that three years after I left, the infrastructure was humming along fine with nobody actively looking at it. They never hired anyone to look after it because it never gave them any problems and they would have been fine to keep paying me to be there and do little/nothing.
I've staked my whole career on this way of doing things and I'm never looking back.
Funnily enough, I'm in a different position at a different company, and in-house is a huge win for us over AWS/Azure due to very specific hardware requirements. There's a time and a place for everything - but the days of colocation are gone for people who don't actually need it.
Understandable... and as I've mentioned elsewhere, we do also use physical hardware in datacenters. For us, it's a case of using it only for the things that need it.
AWS or any cloud service is anything but freedom. So please use the word freedom judiciously. If you are working with AWS, your application and your whole organization are at the whims of Amazon. You will not even be able to migrate if you are tied into those APIs and services because they are not the same across cloud providers. You don't even own the data or compute resources hosted on such cloud services because just by a single notice from a government agency your account will be blocked without access to your own data or compute resources. It has happened and will continue to happen because, for a government, approaching a single cloud provider is much easier than going through the process against each company or individual (more costly and difficult given the different jurisdictions across country and state lines).
In terms of technical freedom you will have less of it and will be tied into proprietary extensions, APIs and services offered by those cloud providers.
On the other hand, for self-hosted infrastructure a government notice will not block access to resources owned by you and the data in those resources. You can respond to the notice and continue using those resources until a court rules. So it is more freedom than a cloud can offer.
In our organization we need to go the extra mile and put in extra engineering effort to make sure our apps can work on different clouds with different services (e.g. using Apache Libcloud, avoiding proprietary APIs).
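For what it's worth, the libcloud route looks roughly like this; a minimal sketch assuming credentials live in environment variables (the constructor arguments differ a bit per provider, so treat the details as illustrative rather than our exact setup):

```python
# Minimal sketch of provider-agnostic provisioning with Apache Libcloud.
# Credentials, region and provider names are placeholders; each driver has
# its own constructor quirks, so adapt as needed.
import os

from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver


def get_compute_driver(provider: str):
    """Return a compute driver for the configured provider."""
    if provider == "ec2":
        cls = get_driver(Provider.EC2)
        return cls(os.environ["AWS_ACCESS_KEY_ID"],
                   os.environ["AWS_SECRET_ACCESS_KEY"],
                   region="us-east-1")
    if provider == "digitalocean":
        cls = get_driver(Provider.DIGITAL_OCEAN)
        return cls(os.environ["DO_TOKEN"], api_version="v2")
    raise ValueError(f"unknown provider: {provider}")


driver = get_compute_driver(os.environ.get("CLOUD_PROVIDER", "digitalocean"))

# The same calls work regardless of which provider sits behind the driver.
for node in driver.list_nodes():
    print(node.name, node.state, node.public_ips)
```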
> If you are working with AWS, your application and your whole organization are at the whims of Amazon.
Part of freedom is the freedom to do stupid things.
> You will not even be able to migrate if you are tied into those APIs and services because they are not the same across cloud providers.
VPCs, Instances, Load Balancers and Storage are roughly-equivalent everywhere, and the differences between them are mostly trivial. For the rest of the "shinies" they offer, well, see above.
> You don't even own the data or compute resources hosted on such cloud services because just by a single notice from a government agency your account will be blocked without access to your own data or compute resources.
If you think this doesn't also apply to your physical racks in datacenters, you've got another thing coming. Ask Lavabit.
Design for this in your strategy. You can still "own" your data and have it hosted in the cloud too. There's no reason to have to be exclusive.
> In terms of technical freedom you will have less of it and will be tied into proprietary extensions, APIs and services offered by those cloud providers.
As mentioned before, stick to running in instances and you're no differently off than anywhere else you can operate.
> On the other hand, for self-hosted infrastructure a government notice will not block access to resources owned by you and the data in those resources. You can respond to the notice and continue using those resources until a court rules.
Absolutely not. With a warrant, the authorities will come and take your shit. They have guns. You don't. Ask Lavabit. Ask The Pirate Bay. Ask any blockchain drug pusher.
> VPCs, Instances, Load Balancers and Storage are roughly-equivalent
I've worked enough as a preferred partner with all 3 major cloud providers to know that roughly equivalent does not mean equal and that the effort is significant. E.g., a self-hosted HAProxy or nginx is much better than those load balancers, as each provider has its own set of APIs. So please do not underestimate the effort.
> If you think this doesn't also apply to your physical racks in datacenters
Self-hosted means more than just datacenters; it can be within an office or a home, so it's not easy to shut down. Also, there are numerous examples of AWS accounts suspended, with all services blocked, based only on one-sided terms of service, without even a legal-notice requirement.
As I said earlier, most people trade freedom for convenience, and that's what a cloud service is. It's not freedom but slavery to convenience.
> stick to running instances
What's running is not the question here, but freedom and ownership.
Here it's still an issue of who owns the resources and the data within those resources. Buried within the terms of service and agreement, it's the cloud provider who is judge, jury and executioner.
> With a warrant, the authorities will come and take your shit.
It's very hard to get a court warrant to confiscate a legitimate business's resources (the bar is just too high) vs. a notice via an email to a cloud provider for perceived infringement (indeed most cloud providers have dedicated contacts for government notices and implementation, given they also provide services to government agencies). Also, targeting individual self-hosted infrastructure is very costly. The examples you are presenting just create FUD; the majority of the self-hosted infrastructure in the world is for legitimate purposes and true freedom.
> You can own your data and still use public cloud.
First, it would be good if you read through the terms of service, privacy, data retention, and service agreement contract. You will be astonished to learn that, buried deep within the legal language, the data is owned by the cloud provider (as they own access as well as the rights to the physical resources holding the data) and they provide you an option to export it (but this can be completely blocked by a notice from the government). You can mitigate the risk by encrypting the data with your own key which is not shared with the cloud provider (but indeed this often doesn't hold either, given the key used to encrypt the data is hosted with the cloud provider as well).
For your reference, AWS still does not support ed25519 keys for SSH access to its instances and needs an RSA public key (although it's less secure). We need to go the extra mile to customize the AMI images to support ed25519 keys and remove the account and keys created by AWS.
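As a concrete illustration of that kind of workaround (not our exact setup), one lighter-weight alternative to baking a custom AMI is injecting the ed25519 key at launch via cloud-init user data; a hedged sketch with a placeholder AMI ID, key material, and user name:

```python
# Hedged sketch: launch an instance whose first boot installs an ed25519 key
# via cloud-init user data, sidestepping the RSA-only EC2 key-pair mechanism
# described above. All identifiers below are placeholders.
import boto3

ED25519_PUBKEY = "ssh-ed25519 AAAA... ops@example.com"  # placeholder public key

USER_DATA = f"""#cloud-config
users:
  - name: ops
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - {ED25519_PUBKEY}
"""

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    UserData=USER_DATA,                # cloud-init applies this on first boot
)
```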
Please let me know how you will recover the data in case your multi-site deployment on AWS with DR is blocked by aws as they received a notice from government agency with a gag order.
> You are familiar with using multi-site DR strategies, yes?
Most companies using AWS or GCP or Azure use multi-region deployment (which means using different regions). But all of it will be shut down with one single notice from a government agency with a gag order (you won't even know why, given the gag order).
So your multi-site DR will only work if the company keeps a self-hosted backup of the data to prevent the cloud provider from shutting down access to it. This is precisely my point: self-hosting guarantees freedom, not the cloud, which is the opposite of freedom (traded for convenience).
Being locked in to AWS (or GCP, or..) when costs are rising out of control is what I consider the opposite of freedom.
I was with one startup that went full-in on AWS features, so it was very difficult to migrate. Sure, we got started real quick, that was nice. I argued early on that we need to be careful of not getting locked in to AWS in case we need to jump away once we have some customer growth, but everyone else thought that was silly.
Fast forward two years and our AWS bill was about $2500 per year per customer. Revenue per customer, certainly nowhere near that. Burning VC money gets you far but only so far. Yes we ran out of money.
The reps from the major clouds constantly pester us to use their vendor-only features and we always politely have to say "no thanks".
We're up front about our multi-cloud strategy and only use the basic resources that are roughly equivalent/consistent between providers. We have the expertise to run whatever managed application they have ourselves and the engineers to build our own platforms.
I know corps and teams where they need approvals for each cloud resource they instantiate. That is a culture problem.
On the other hand, when we were using simple VPSes at various companies, at some companies I could just send an email saying I'd provisioned a new server, and at others they wanted a ticket with a justification for it. But we always had enough test servers to try things out and experiment. Cloud is great in many ways, but freedom has more to do with the organization.
> I know corps and teams where they need approvals for each cloud resource they instantiate. That is a culture problem.
I'm on the software side, so I want my compute resources in the next 30 seconds like anyone, but at a larger company there need to be controls in place or things can go bad quickly. At one company I'm aware of, finance really started banging on doors when the cloud bill went over $1M/month and nobody really knew very clearly what any of it was being spent on or which products or customers each expense was associated with.
It's not a culture problem, it's a reaction to what happens if there aren't any controls on spending other people's money: people just waste resources like there's no tomorrow.
I've seen this at my current employer. Like so many, it got hooked via first-hit-is-free Azure credits. It doesn't even run end-user-facing services, it just does software development, yet somehow after a few years of being a startup the cloud bill was millions of dollars a year. Why? Because engineers would create a fresh VM from scratch rather than just logging in and using their accounts on pre-existing VMs. They'd forget to delete resources. They'd use expensive services like managed Kubernetes because why not.
There was also a more subtle cost harder to account for: Azure machines are dog slow so significant engineering time was wasted trying to work around that problem.
There has to be some friction to push people into using pre-existing hardware instead of acquiring more. If there isn't you end up with an absurd IT footprint well beyond anything justifiable in engineering terms.
Fair. I've seen shops like that and I feel bad for them, but that's an organizational limitation, not one imposed by the cloud provider.
If you want to buy server hardware, there is always the spec, testing and sales process no matter what. Hardware buying eats so much time...but at least the fancy dinners are kinda nice.
And there's a proper way to do the accounting for it as a depreciating asset vs buying cloud (and most public orgs actually prefer the former).
Yeah, the freedom to spend corporate money is nice, but for a smaller org the experience is exactly the opposite. We have ca. 30 servers and I can move things around without any problem. Most of them are already configured and ready for operation; reconfiguring is easy. With AWS, first of all you need to learn their whole ecosystem, and then we had constant problems with cost allocation. Yes, you can book all these charges under "IT", but that's too broad; you need to assign them to projects. And then they constantly grow.
With bare metal it's 100% predictable. Where to book the sudden spike in costs in April? Why did it happen? We had €3000 booked for this project in April, why do we have to pay €5000? And so on.
Sure, if you have too much money to burn and the financial department is lax with controlling the costs, AWS feels great, but not everybody has this luxury.
It's very easy to implement strict controls over who creates infrastructure and how it gets created. Public companies have to use this too, and we have all of the compliance requirements I'm sure you're aware of.
Automating this and automating the guardrails makes your job _easier_, not harder.
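To make "automating the guardrails" concrete, here's a minimal sketch of one such control; the required tag names and what you do with offenders are assumptions to adapt to your own policy:

```python
# Sketch of one automated guardrail: flag EC2 instances missing an "owner"
# or "cost-center" tag so finance can attribute spend. Tag names and the
# follow-up action are assumptions, not a prescribed policy.
import boto3

REQUIRED_TAGS = {"owner", "cost-center"}

ec2 = boto3.client("ec2", region_name="us-east-1")
untagged = []

paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            tags = {t["Key"] for t in instance.get("Tags", [])}
            if not REQUIRED_TAGS <= tags:
                untagged.append(instance["InstanceId"])

# In a real setup this might post to Slack, open a ticket, or stop the
# instance after a grace period; printing keeps the sketch self-contained.
for instance_id in untagged:
    print(f"{instance_id} is missing required cost-allocation tags")
```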
> It's the freedom to get things wrong, iterate and try again. It's the freedom to move resources around. It's the freedom of not having to wait for your procurement process and approval from a purchasing department. It's the freedom of not having to wait for your operations team to plug in and configure bare metal.
I don't understand your point. How does any random cloud provider, or even hosting company, block you from moving resources around, force you to wait for procurement, or keep you from iterating?
I'm looking at the Hetzner cloud dashboard and I'm free to launch as many instances as I need at the drop of a hat.
In the end there's an itemized bill which, unlike AWS, covers only what you ordered up front and which you can reason about without having to endure advanced courses or freaking machine-learning tools.
Where exactly are you seeing any freedom at all on AWS?
The whole point of my statement was that it's NOT a technical argument.
Anyone with significant experience and maturity knows this already. Being technically correct doesn't matter if you've created more work for yourself than you can handle.
"Scalability" is much more often a people management problem rather than a technical one. Please come back and tell me how you feel about this when you have 10^4+ servers in your care.
The nice thing about well-orchestrated cloud is that you can run your infrastructure the same way whether you have 10^1 or 10^5.
The system I inherited at my current job was built with "scalability" in mind. That meant use of the Firebase databases and splitting everything out into lots of microservices. Ironically this actually caused scaling issues (even with the very small scale we have at the moment) because the lack of query flexibility in the database meant downloading entire collections in order to provide the necessary functionality.
We've since moved (almost) everything over to Postgres and merged a lot of the microservices together. And while we're still running on Heroku, we could probably run on a Raspberry Pi if we needed to.
It's incredible that, with the increases in processing power and storage technology, and the continuous performance enhancements of Postgres in particular over the last few years, any "scalable by design" solution of 10 or even 5 years ago can easily be implemented in Postgres these days. I guess the 80/20 rule applies more than ever in these cases.
I am far from a database expert, but the fact that I have been able to implement, without much effort, efficient systems with high bandwidth, reliability, load-balancing, and low-latency requirements speaks volumes about the current state of affairs.
Sadly, I’ve seen this at more misguided startups. Microservices solve a problem no startup has until they go series B and grow like bananas. When they actually scale.
Sounds like it has nothing to do with microservices and more that they just blew everything up instead of fixing their firebase schema. But hey, sometimes that's way simpler.
Microservices didn't particularly cause technical issues, just unnecessary operational overhead. While I'd probably go with a monolith if I was building from scratch, in this case we've stuck with ~5 microservices (we did have over 10) as it works well enough.
Firebase is fundamentally limited though. You simply cannot do things like joins and even filtering and updating is very limited compared to a SQL database. Of course you can implement these things in code, and precompute them and store them denormalised. But at that point you're implementing your own database and it's much less work to move the data to a different database platform. We did a gradual migration which made it relatively painless, although it was still a lot of work. And we're able to move much faster on new features now we have the flexibility of a fully-featured database.
Firebase is probably the most hyped and overused tech I've seen. I've run into far too many startups that went to Firebase because they wanted to scale fast and didn't realize how limited Firebase was for queries.
If you're starting a startup, use Postgres for your database. It will handle damn near anything, and you can spawn off functionality into microservices as they become bottlenecks as you scale.
Hell, Postgres lets you write database extensions that perform novel behavior. I saw one the other day that automatically syncs data to Elasticsearch. It wouldn't be that hard to envision having writes to certain tables sync to Firebase if you need subscriptions on records for certain parts of your app.
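For illustration, a hedged sketch of what that Postgres-to-Firebase sync could look like, using LISTEN/NOTIFY plus the Firebase Admin SDK rather than an actual extension; the table name, channel, DSN, and credential paths are placeholders:

```python
# Hedged sketch: a trigger NOTIFYs on writes, and a small worker LISTENs and
# mirrors the row into the Firebase Realtime Database.
import json
import select

import psycopg2
import psycopg2.extensions
import firebase_admin
from firebase_admin import credentials, db

# Assumed trigger, created once in a migration (illustrative SQL):
#   CREATE FUNCTION notify_order_change() RETURNS trigger AS $$
#   BEGIN
#     PERFORM pg_notify('orders_changed', row_to_json(NEW)::text);
#     RETURN NEW;
#   END $$ LANGUAGE plpgsql;
#   CREATE TRIGGER orders_sync AFTER INSERT OR UPDATE ON orders
#     FOR EACH ROW EXECUTE FUNCTION notify_order_change();

firebase_admin.initialize_app(
    credentials.Certificate("service-account.json"),             # placeholder
    {"databaseURL": "https://example-project.firebaseio.com"},   # placeholder
)

conn = psycopg2.connect("dbname=app user=app")                   # placeholder DSN
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
cur.execute("LISTEN orders_changed;")

while True:
    # Block until Postgres has a notification for us (or 60s passes).
    if select.select([conn], [], [], 60) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        notification = conn.notifies.pop(0)
        row = json.loads(notification.payload)
        # Mirror the row under /orders/<id> so Firebase clients can subscribe.
        db.reference(f"orders/{row['id']}").set(row)
```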
Two reasons I use Firebase for my app (although with an admittedly trivial schema/data):
a) I have a thousand things to do that are higher priority than configuring/hosting a Postgres server.
b) Integration with Firebase Auth is a godsend.
I doubt people choose Firebase for technical or scaling reasons. More likely they choose it because it's trivial to integrate, so you can spend the limited hours you have in each day on more important (customer-facing) work.
With all due respect, "they choose it because it's trivial to integrate" is a terrible reason for choosing a technology. Anyone choosing a database should be doing so based on the following:
1. does the insertion and querying mechanisms support the requirements of my app?
2. is it scalable? (i.e., for the first few thousand users; beyond that, any tech you pick is going to run into issues without a dedicated strategy)
3. can I afford it? (in terms of maintenance as well as licensing)
Postgres (and MySQL) has plenty of standard tooling for integrating into your app and is nowhere near as limited. I would not consider an engineer who proposes Firebase without making a STRONG logical case for why it's the best choice to have graduated out of junior status.
Then again, fixing messes caused by such engineers for several startups that started with Firebase because it's the easiest to get started with PAID for my gallivanting around Southeast Asia last year, so I won't wag my finger too hard.
Postgres is also trivially integrated with firebase auth. There's a method in the client SDK which gives you a token you can send with API requests. And a method in the admin SDK that you can use to validate the token and resolve it to a user id.
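A minimal sketch of that admin-SDK flow, assuming a hypothetical notes table keyed by the Firebase uid (the function name and schema are made up for illustration):

```python
# Minimal sketch of the admin-SDK side: the client sends the ID token it got
# from Firebase Auth, the API verifies it and resolves a uid to use in normal
# Postgres queries. Table and DSN are placeholders.
import firebase_admin
from firebase_admin import auth
import psycopg2

firebase_admin.initialize_app()  # uses GOOGLE_APPLICATION_CREDENTIALS if set

conn = psycopg2.connect("dbname=app user=app")  # placeholder DSN


def notes_for_request(id_token: str):
    """Validate a Firebase ID token and return the caller's rows."""
    decoded = auth.verify_id_token(id_token)   # raises if invalid/expired
    uid = decoded["uid"]
    with conn.cursor() as cur:
        cur.execute("SELECT id, body FROM notes WHERE owner_uid = %s", (uid,))
        return cur.fetchall()
```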
I think your sort of thinking requires a certain level of maturity and experience. I see the opposite with some engineers who always have this illusion that the first or even second version of their project will be the final one for all time, so they build it to scale (which actually doesn't even really scale, because they don't know where the real bottlenecks are) and to be highly configurable. I've seen way, way more projects bite the dust because of poor product design or product-market fit than because of an inability to scale. I've definitely seen projects that can't scale, but at least from an engineering point of view, those problems are way more tractable than people not wanting or needing what you've created. As I've gotten more experienced, I've realized one of the great skills a good engineer has is the ability to change/fix/scale a project once it is up and running and while people are using it.
> I've realized one of the great skills a good engineer has is the ability to change/fix/scale a project once it is up and running and while people are using it.
This is by far the most important lesson I have learned this decade. Anyone can pull the rug out on iteration 1 and start a completely new attempt in a 2nd iteration bucket.
It takes determined engineering efforts and talent to iterate on top of the existing code base while not impacting the ability to deliver ongoing feature support to production.
We have historically tried to do complete rewrites with bombastic claims like "We can do it right this time". Well, we did it "right" 3 times until we decided that restarting from zero every time is a bad way to make forward progress. I will advocate for a clean restart if you are just beginning or it is clear you have totally missed the domain model target. If you have been in production for 3 years, maybe consider a more conservative iteration approach.
The best analogy I can come up with is the Ship of Theseus thought experiment. Rebuilding the ship from the inside while you are sailing it to the new world will get you there much faster than if you have to restart in Spain every time someone is in a bad mood or doesn't like how the sails on the boat are arranged.
I work with small businesses, and 90% of the time, they are better served by running a few VMs in a 1U Xeon-powered rackmount Synology or the like.
Does it have a lot of compute power? No.
Is it dirt-simple to admin, and have enough horsepower to do the simple things like run a QuickBooks server on a Windows VM, Fileshare on another, test on a third, etc that is all they usually need? Absolutely!
It isn't uncommon at all that I make my initial IT inventory at a new client's business and find that they are spending hundreds to thousands per month unnecessarily in overblown costs, and AWS is a primary driver of that. A few kilobucks into owned hardware, and the problem is essentially solved for a few years, and paid for in a few months. Everybody wins.
It's a pain to admin for someone who knows nothing about it, and it's also a pain to have to do anything at all!
The power of the cloud for smaller shops is that they don't have to know what Synology is, or even give it a thought.
That you can point-and-click to set up an EC2 instance with some S3 and get rid of most admin knowledge/know-how is like magic.
That's the best reason to use it, even if it costs 1-3x as much.
Once there is A) stability and predictability in requirements and B) sufficient scale - then - we can hire smart people such as yourself who know better, to set up something relatively simple, that will be relatively low maintenance (i.e. whatever you bill) and save some $.
Think Development vs. Operations cost.
If in dev, we have to wait for, mess with, configure things, it's very expensive, delays etc. and we don't want that.
Once there's enough predictability to determine 'unit cost' on the services side ... then you can 'cut costs' by moving to a slow-changing, custom solution, that is hopefully 'simple', as you say.
The (vast) majority of small businesses are not out to grow as fast as possible, seeking that big money valley exit. Most are family operations that seek to stay at whatever size that might encompass. FWIW.
"The (vast) majority of small businesses are not out to grow as fast as possible, seeking that big money valley exit. "
This is nothing to do with what I said though.
A 'mom and pop shop' that isn't doing product, and only needs IT, may still be very well served by the cloud because they probably only need a few EC2 instances and some minor services, and frankly that's very cheap relatively speaking.
'Expertise' is quite expensive, and doesn't pay off unless there is some scale in place - I don't mean 'millions of users', but a company that has <50 staffers ... for IT there's no reason they can't be fully cloud.
'Mom and Pop Shops' want to know what a 'rack' is about as much as they want to know what a 'filter' is for their engine. The idea is to get rid of all of that overhead.
>for IT there's no reason they can't be fully cloud.
Without even taking other aspects in to account, I think you overestimate the availability, reliability, and speed of internet connections for much of the country.
Even for that kind of small-scale business: what happens when that server has hardware trouble? Are they testing their backups regularly? Something like Heroku might cost them much more in an average month, but it eliminates a huge range of worst case scenarios.
They have a contract with an IT professional to handle that sort of thing for them, same as they would have a contract with someone who uses Heroku or Amazon to fill their IT needs if they were using those servers instead.
If they've got an in-house sysadmin, you can get a lot of PaaS time for that price. If they've got an agency tasked with looking after their servers, in my experience that's not a great situation; you might get a little more personal attention than you would from a PaaS vendor, but you're always a client rather than a colleague, and you're pretty much paying for the same product that a PaaS would sell you; they might be cheaper, but the worst-case downtime is much longer. If you're outsourcing the whole package of software development and deployment then it's not really your problem any more. But in my experience a surprisingly large number of companies - especially those "family business" sized ones - fit into that window where a handful of in-house developers make sense, but an in-house sysadmin doesn't; sometimes they'll get lucky and one of those developers is happy being a part-time sysadmin, but if not then a PaaS can be a good tradeoff, even if on a per-unit basis it looks expensive.
Helping companies deal with that gap is what I do in a nutshell. It's rather enjoyable- I work directly with the owners to help develop a good IT/Data management plan, then offer monthly service/retainer contracts for the rest of it. No overhead of an agency to account for, so it's cheaper for them, good income for me.
> my first go-to is something like DigitalOcean or Amazon LightSail
I look at compute instance providers by how they charge and for what project.
Sometimes I need more RAM, sometimes I need a faster CPU, sometimes I need more storage space accessed as a normal filesystem. Sometimes I need all of the above.
So it might be DigitalOcean, it might be Linode, it might be some fly-by-night bare metal provider.
Yes, this is the right way to approach it in my opinion. Use the right tool for the job. So many people in this industry see everything as a nail and the hammer they know is all that is needed for it.
>When ("if" is more often the case) that client starts to see the kinds of traffic that necessitates a full-scale AWS
Bang on. The capex costs of a BYO IT setup were the bugbear (a real one, FWIW) that AWS and other cloud providers were trying to avoid. But if a lot of folks are paying lots of money upfront preparing for "scale" that might never happen, they aren't really saving capex, are they?
'Cause when I think AWS, I think of using Lambda, Fargate, DynamoDB, RDS, SQS, SNS, S3, Kinesis, Redshift, Elasticsearch, etc.
All these managed services aren't just about cost or scale; a lot of it is about convenience, development speed, and flexibility, as well as reduced maintenance and operational burden.
"maintenance and operational burden" in the cloud is way higher than renting a box somewhere and forgetting about it.
I can't understand why people keep using this as an argument against anything else other than cloud while playing catch-up with the rain of changes and suffering from fomo of last week's new stuff from cloud.
Heck, when did infrastructure became like js landscape?
Renting a box and forgetting about it is how you get hacked. There is more maintenance and operational burden with self hosting vs cloud. But going cloud doesn't mean there's no longer any burden.
> Renting a box and forgetting about it is how you get hacked.
You get hacked in any compute model if you don't know what you're doing.
> There is more maintenance and operational burden with self hosting vs cloud.
Renting a box is closer to dedicated hosting than to self-hosting, which are two very different things.
> But going cloud doesn't mean there's no longer any burden.
Exactly, and it starts with the complexity and confusion that the cloud became. Then with vendor-specific stuff that you don't need, such as IAM, and then, when you think everything is alright, you notice your bandwidth is capped after 1 hour of running your EC2 instance, that your port 25 is magically broken (docs were updated years later after many complaints); everything is a slow black box.
Cloud is like chopping off both your legs and being happy about it.
Doing OS updates, configuring load balancers, DB backups, setting up firewall rules and OS user accounts, ssh access, archiving logs, configuring monitoring and alarms, assigning DNS, handling deployments, etc.
I don't know, as an application developer, I'd rather not waste my time with stuff like that if I can avoid it.
> I don't know, as an application developer, I'd rather not waste my time with stuff like that if I can avoid it.
Believe it or not, there are other roles besides "developer" within technology-related jobs.
The problem with cloud is that the heavy majority of supporters just echo the usual "scale" and "maintenance" arguments without even being willing to learn what those things are at a fundamental level.
"You have to scale", said the developer who switches his EC2 instance from one type to the next, more expensive one: "now I have two vCPUs, great!"
And the people who know how the real world of infrastructure works don't buy these cheap arguments, but they are outnumbered by the louder majority.
> Believe it or not, there are other roles besides "developer" within technology-related jobs
I would assume much of AWS, at least the services I mentioned, is targeted at application developers though. So if you say running your own box is easier or better for running your application, you'd need to convince an application developer that it is.
And as an application developer myself, like I said, I don't find it so. Targeting Lambda, for example, is much simpler. If I need a DB, using RDS or DynamoDB or S3 is much simpler than spinning up and maintaining my own instance on my own box. If I need to publish notifications, well, I don't even want to think about what I'd need to do to replace SNS or SQS. If I want to have application logs, have them archived for some longer period of time, and make sure they don't fill up my box's hard drive, I can just use CloudWatch Logs. Etc.
So again, you'd need to convince me it can be more convenient and less maintenance for me to use my own box instead of using those AWS services. That's even before we talk about availability and scale.
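To illustrate the convenience claim for SNS/SQS specifically, here's a minimal boto3 sketch; the topic ARN and queue URL are placeholders for resources created elsewhere (console, Terraform, etc.):

```python
# Sketch of why the managed path is attractive: publishing a notification and
# draining a queue is a few lines of boto3, with no broker to run or monitor.
import json

import boto3

sns = boto3.client("sns", region_name="us-east-1")
sqs = boto3.client("sqs", region_name="us-east-1")

TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:order-events"          # placeholder
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # placeholder

# Producer side: one call.
sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps({"order_id": 42}))

# Consumer side: long-poll the queue and delete what we've handled.
resp = sqs.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=20,
                           MaxNumberOfMessages=10)
for message in resp.get("Messages", []):
    print("got:", message["Body"])
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
```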
> 99% of the time, what they're building not only doesn't need to scale to that degree but they somehow seem to think that it's always inherently cheaper.
Here's the thing though. For most startups, 99% of the value comes from the long right-tail of that 1% chance of explosive viral growth out of left-field.
You're right, most companies aren't ever going to need that kind of overnight scalability. But most of those startups will wind up failing anyway. At the end of the day, everyone's betting that they'll be the exception that makes it big. Without that possibility it's not worth showing up at the table.
And if it does come, you need to be ready to move fast. Like, "we hit the front-page of Reddit and we have 5 minutes to scale up our traffic by 5000%" fast. If you take your time to collect some revenue and circle back, the ship has likely sailed.
I'm not saying thinking this way is sensible for every org. But it's definitely sensible for at least some business models. In particular within the hyper-growth SV startup scene. Foregoing scalability because most startups don't go viral, would be like an aspiring actor foregoing headshots because most auditions don't get callbacks.
> Without that possibility it's not worth showing up at the table.
I'd argue this mentality is exactly why so many startups fail, and fail hard. They over-optimize for being one of the "lucky few" and under-optimize for actual sustainability.
I agree with this, but not every startup's business model has the luxury of operating under that calculus. Particularly in markets with network effects or other winner-take-all dynamics. You're either first or you're dead. And at least one of your competitors will be investing big on VC-powered hyper growth. Slow and sustainable growth may increase short-term survival chances, but guarantees that you lose in the long-term.
If Instagram or AirBnb didn't plan for rapid scalability, and focused on short-term sustainability instead, they'd almost certainly be footnotes. It doesn't mean that a startup pursuing an expensive hyper-growth scalability strategy is a good investment. But a startup building a social network or B2C marketplace will never conquer the network effects without that ingredient.
> Slow and sustainable growth may increase short-term survival chances, but guarantees that you lose in the long-term.
I'm arguing the exact opposite: that slow and sustainable growth is exactly how you maximize long-term success.
And there are very, very few markets that are truly "winner-take-all". They only seem that way because everybody's doing the exact thing against which I'm warning: trying to emulate and catch up to already-big players instead of carving out their niche and playing the long game.
> If Instagram or AirBnb didn't plan for rapid scalability, and focused on short-term sustainability instead, they'd almost certainly be footnotes.
Statistically speaking, they'd almost certainly have been footnotes anyway. That's the nature of startups that pursue VC-powered hypergrowth: you're counting on a heck of a lot of luck.
(And not that I'd be complaining much if those companies were footnotes, either; AirBnb in particular has done a great job of exacerbating housing crises around the world, and Instagram - like most social networks - seems to rely entirely on throwing every dark pattern imaginable at its users, even before Facebook bought it. Good riddance to the both of 'em.)
I agree that scalability is rarely an appropriate reason to go in on AWS; however, “managed services” is a really good reason. You don’t have to operate your own equivalent services, but rather you pay for Amazon engineers to operate them on your behalf. Of course, there are some people for whom running their own services is the more appropriate scenario, but I think this is not a good default for anything more complex than a Rails app.
Someone will snarkily mention that Amazon services go down from time to time across an availability zone or even a whole region, and presumably this never happens on prem (I guess the idea is that Amazon engineers are well-below average?) simply because you rarely hear about it when $MomAndPopCRUDShop goes down. On that note, in my experience, customers are really sympathetic when an Amazon outage brings you down (because half of the Internet is also down) but not very sympathetic when it’s just your site that’s down.
It's not that Amazon has worse engineers: They have great engineers. The difference is they are Amazon's engineers, not your engineers.
My IT team is not going to break something on the most important day of the month for my business. Sometimes even when I want to do something mundane on the network, it's "eh, let's wait until not (insert key date here), just in case". Amazon does not care what day is important to my business, they're Amazon and they do what they want.
You are going to pay IT staff either way, if it's your own hires (or an MSP), or you are paying via AWS fees. If you're going to pay for IT, why wouldn't you pay for IT that takes orders from you and cares about your business?
The other thing is, Amazon's maintenance and upgrades is based around Amazon's need to remain competitive and turn a profit, and to support businesses other than yours which may have larger needs. (The rollout of a new feature that causes an outage for you might not have any benefit to you anyways.)
Similarly, a single patched Exchange server running on a single VM on a VM host with a UPS backup on it is generally speaking, more reliable and has better uptime than Office 365. Hilariously, Office 365 also costs a lot more.
> You are going to pay IT staff either way, if it's your own hires (or an MSP), or you are paying via AWS fees. If you're going to pay for IT, why wouldn't you pay for IT that takes orders from you and cares about your business?
Because my IT staff budget would go up if they have all of the additional responsibilities that we currently outsource to Amazon by considerably more than what we would save on our AWS bill by doing it in house. This makes sense because Amazon has a lot more customers besides just me paying for their engineers to manage a given service. So I would be paying a premium "for IT that takes orders from me and cares about my business" and that premium is rarely worthwhile. In the worst case, I have an important customer demo and an Amazon outage occurs--I just explain to my customer that we're affected by the same Amazon outage that's affecting all of their other vendors and half of the rest of the Internet and I forward them the Amazon post-mortem. This has never been a problem for me to date, and I doubt there are many people who have lost deals on account of Amazon in excess of their savings for using Amazon (no doubt there are a few companies for whom this doesn't hold and they should seriously consider on prem).
> Similarly, a single patched Exchange server running on a single VM on a VM host with a UPS backup on it is generally speaking, more reliable and has better uptime than Office 365. Hilariously, Office 365 also costs a lot more.
If I had claimed that every managed service was worth its price for every user, then you'd have successfully refuted me. :) No doubt there are some managed services that aren't worth the price, and even an individual managed AWS service probably doesn't justify the cost of integrating with AWS; however, if you're going to be needing analogs for IAM, Lambda, EKS, SQS, CloudWatch, etc, you're probably going to be spending more on balance (and incurring more downtime) by doing it in house (consider also the difficulty in finding engineers with experience in your exact matrix of tech choices versus general AWS experience).
O365 had a couple of bad years, earning the nickname "Office 360" for a while.
This year, I can only recall a single outage and we've been overall happy with it. O365 costs more than an Exchange server only in part because it comes bundled with all of MS-Office.
If you bundle Exchange + CALs + Office Professional Plus, over a three-year version upgrade cycle, versus three years of Office 365, Office 365 costs about twice as much, last I checked.
And, since Microsoft supports its software for ten years, you can skip one or two versions of Exchange/Office and still get security updates, and pay a quarter or a sixth of what it costs to go with Office 365.
Sure, but the problem with this is often overblown. If I was going to send mass mail, I might prefer to use an outside service like Sendgrid or Constant Contact to avoid that risk, but for general email, it isn't a big deal to operate your own on-premise server.
I run my own email domain for my family. We have DKIM and SPF set up. We've been on a clean, constant IP for 5+ years. We still have deliverability problems that I don't have with O365 (work) or Gmail. We probably have even more than we detect, because these are typically invisible until you can "discover the absence of a signal".
I’m not sure what point you’re making exactly, but Stripe and AirBnB (not sure about Shopify) are all on AWS.
> A year after Airbnb launched, the company decided to migrate nearly all of its cloud computing functions to Amazon Web Services (AWS) because of service administration challenges experienced with its original provider. Nathan Blecharczyk, Co-founder & CTO of Airbnb says, “Initially, the appeal of AWS was the ease of managing and customizing the stack. It was great to be able to ramp up more servers without having to contact anyone and without having minimum usage commitments. As our company continued to grow, so did our reliance on the AWS cloud and now, we’ve adopted almost all of the features AWS provides. AWS is the easy answer for any Internet business that wants to scale to the next level.”
> Since 2011, Stripe has delivered its PCI-compliant payment platform entirely on AWS, relying on the security best practices as well as easy auditability of the AWS platform. Stripe wants to make it easier than ever for developers to process payments on their web and mobile applications. Using AWS provides Stripe with access to a world-class infrastructure that helps it scale seamlessly and increase developer productivity.
I wasn’t suggesting they were small potatoes, but if your whole infrastructure is just a single rails app and you don’t need S3 or SQS or Lambda or any of the other AWS offerings then you probably aren’t benefiting much from AWS.
Well for me it's more like... why should I pay $5/month for the cheapest DO droplet for a little utility web app that gets less than one hit per day?
Seems like a waste to pay for a month of VM time when only a few milliseconds are actually used. I like the idea of FaaS where I'm not being charged when my code isn't running.
I mean, for me, I host a lot of small utilities on my small DO droplet, as well as a few websites that I do want up all the time. One can do more than one thing on a droplet.
For me it's because I don't want to spend more time configuring a hosting environment than it took to write the utility web app. I can spin up a DigitalOcean droplet in two minutes and spend the other 58 minutes writing the code.
Classical web hosting still exists. Probably even has less vendor lock-in and is less likely to bankrupt you if someone posts the small utility web app on HN.
Really depends on how long the page rendering takes. And you also didn't account for any traffic yet - that's another $0.09/GB on top.
Taking the statistics from some old Ask HN thread [0] (which is probably an underestimate?) you would need to render your site in less than 2ms to stay below $1 for that day. And serve less than 51kB of data per page view to stay in the free 1GB of traffic region (this part should be pretty doable if you don't include any large images).
Of course assuming you don't have any existing VM or web hosting you could reuse for that.
Edit: Oops.. I used per-second billing instead of per-ms by accident. So even at >1s per request it stays below $1 for that day. That only leaves significantly more traffic and heavy bandwidth usage as possible sore points for tiny web services.
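To make that back-of-envelope concrete, here's a rough sketch in Python. The rates are assumptions (roughly the published pay-per-use pricing: about $0.20 per million requests, about $0.0000166667 per GB-second, and about $0.09/GB egress after the free tier) and vary by region and over time:

    # Back-of-envelope Lambda cost estimate for a tiny utility app.
    # The rates are assumptions and differ by region and over time.
    PRICE_PER_MILLION_REQUESTS = 0.20      # USD
    PRICE_PER_GB_SECOND = 0.0000166667     # USD
    EGRESS_PER_GB = 0.09                   # USD, after the free tier

    def monthly_cost(requests, avg_ms, memory_mb, avg_kb_out, free_egress_gb=1):
        gb_seconds = requests * (avg_ms / 1000.0) * (memory_mb / 1024.0)
        compute = gb_seconds * PRICE_PER_GB_SECOND
        invocations = requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
        egress_gb = max(requests * avg_kb_out / 1_048_576 - free_egress_gb, 0)
        return compute + invocations + egress_gb * EGRESS_PER_GB

    # ~30 hits a month, 200 ms each, 128 MB, 50 kB responses: fractions of a cent.
    print(round(monthly_cost(30, 200, 128, 50), 6))

At a handful of requests per day the bill rounds to effectively zero, which is the whole appeal for tiny utilities.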
Also, when it comes time to scale, it's a lot easier to bring up a relatively simple (but scalable) infrastructure on AWS than deal with: colo location, buying servers, buying spares, replacing failing hardware, imaging, setting up backups, etc etc. Those typically become concerns once you're at big scale where the cost savings are significant.
But for most companies it never "comes time to scale". Especially if you have 100% dedicated hardware and you can start out with a 64 core db server with 120GB of memory and a matching web server pretty much for free.
You need to "scale" early in a cloud environment because you are allocated a fraction of the hardware and someone else on that same hardware is using all the resources. AWS/DO etc boost their margins by packing as many vps's onto one physical box as they can. That's why on a VPS your 95th percentile response time is 1000x your 50% response time. All you can do is "scale" and pray enough of your connections are hitting the boxes that nobody else is using right now.
For what it's worth; unlike OP, I'm not suggesting Co-Lo over cloud. Co-Lo still has its place, but for most projects that I'm involved with, low-cost services like DigitalOcean, Linode, or LightSail are the best option.
I love how AWS fanboys make it a black-or-white issue. OVH, Hetzner, and others provide bare metal servers that are provisionable by API. The choice is not full-AWS vs full-on-premises-by-hand.
It all depends on how your dev's compensation is structured.
AWS has great reliability and a complete ecosystem of plug-and-play offerings that just work. And it's well documented. It sure is more expensive, but it's easier to achieve the same result. And faster. And keep in mind, the dev isn't footing the AWS bill. And isn't getting paid a penny extra for trimming down said AWS bill.
Of course, if you add stocks to the compensation structure or a bonus based on cloud spending, that changes the incentives.
Self hosting being cheaper is obvious. It's just a nightmare to manage reliability. With AWS or any other IaaS provider you're paying to outsource shifts of people around the clock that will work to keep servers and network connections up and running.
Colocation isn't that much better either because there are still situations (like corrupt filesystems or malfunctioning power supplies) that would require physical access to the machine.
With an IaaS provider you're free to go live wherever you want and focus on your business and not need to schedule shifts of people within driving distance of the colocation facility on-call.
Ultimately you're paying to reclaim time, which in almost all cases for startups is a worthwhile trade. Outsource everything that isn't your core business.
>With AWS or any other IaaS provider you're paying to outsource shifts of people around the clock that will work to keep servers and network connections up and running.
This is so often repeated, but at my previous company we had fewer internal outages than AWS or GCP did. We then migrated to AWS for some stupid reason (we didn't need scalability, that's for sure) and we had a problem with Redis nodes: the instances they were running on were too small and an upgrade went wrong. We waited until the next shift for the fix. The irony.
> Colocation isn't that much better either because there are still situations (like corrupt filesystems or malfunctioning power supplies) that would require physical access to the machine.
I disagree.
You still get all those problems with a shared provider - recall Gandi's outage? DigitalOcean's, Linode's, and AWS's a week ago. You're also stuck in a tighter situation when it does occur.
With years of colocation I have never encountered a power-out event within the datacentre. That's a potential seven years of uptime if I hadn't changed providers in-between.
Nor have I ever suffered a corrupt filesystem, and that includes performing DR monthly via sudden power-loss cold-boots. Sure, I accidentally deleted /etc/ once.
The only outstanding issue is a failed RAM stick, however it's a poorly server which I'm retiring in the new year. Poor thing.
Main pro for me was free network gear. Servers are more expensive than self hosting - but I don't need to buy routers and switches or some expensive HP blade server that has internalized switches.
Is it safe to assume these are nascent startups that need scale to succeed?
If that's the case, then you are probably right that most will never need to scale. You can build them a solution that can't scale and it'll be the correct solution for the 90% that fail. And it'll be the wrong solution for the 10% that succeed.
The 10% that succeed won't need that kind of scale on the day they launch though. Probably not even the first 12-24 months after they launch. There will be plenty of time to see what kinds of traffic they really can expect and then plan for that.
In other words; I can build them something for $10,000 that gets them to market in a month that may not be what they need in five years or I can build them something for $100,000 that gets them to market in a year, meanwhile, their scrappy competitor launched 11 months before them.
You go for AWS because in half a day you can whip up a secure, managed and auto scaling app using their proprietary services like SQS, S3, Lambda, Kinesis, Dynamo etc.
Right, but my argument is that most products--especially when the company is a startup--doesn't need that kind of scalability and won't for quite some time, if ever.
They all start out starry-eyed thinking they're the next unicorn and oh-my-gawd what if we start getting the kind of traffic Facebook gets tomorrow?! That's just not a reality. Facebook themselves didn't get that traffic for several years.
I think you're missing the benefits outside of scalability. There are a lot of native AWS services that help you to quickly prototype and build reliable solutions. A startup saves a lot of money (and time) by not needing to hire a whole devops team to manage various service clusters.
The biggest thing in the cloud is storage and managed databases. Replacing a dead server with S3 is super easy. Managing a database server can be challenging as well.
If you have one node you still need to do backups, and S3 is just way easier. Also, when you abstract your storage, your application has a much better structure. I already refactored tons of applications where System.IO or java.io were all over the place. With object storage from the start, you barely run into that problem.
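As a hedged illustration of "abstract your storage from the start", here's a minimal sketch. The class names are made up for the example, and the S3 variant assumes boto3 is installed and credentials are configured:

    # One storage interface, two backends, so application code never calls
    # System.IO / java.io / open() directly. Names here are illustrative.
    from abc import ABC, abstractmethod
    from pathlib import Path

    class ObjectStore(ABC):
        @abstractmethod
        def put(self, key: str, data: bytes) -> None: ...
        @abstractmethod
        def get(self, key: str) -> bytes: ...

    class LocalStore(ObjectStore):
        def __init__(self, root):
            self.root = Path(root)
        def put(self, key, data):
            path = self.root / key
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
        def get(self, key):
            return (self.root / key).read_bytes()

    class S3Store(ObjectStore):
        def __init__(self, bucket):
            import boto3
            self.s3, self.bucket = boto3.client("s3"), bucket
        def put(self, key, data):
            self.s3.put_object(Bucket=self.bucket, Key=key, Body=data)
        def get(self, key):
            return self.s3.get_object(Bucket=self.bucket, Key=key)["Body"].read()

Swapping LocalStore for S3Store (or any other backend) then never touches the application code.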
TIL: AWS is only expensive if your time has no value.
The article neglected to add the cost of labour for setup (edit: aside from racking), maintenance (security updates on infrastructure, etc.), and disaster response.
It also missed the cost of downtime based on projected MTTR for a failure. The word "fail" only appears once in the page, and not in regards to recovery, only preventative replacement.
It's absolutely true that colo is cheaper than AWS for projects that can absorb the labour and failure related costs, but that usually is only true for hobby or very small business services.
Do you actually use AWS or are you just parroting something you've read? Do you understand that with AWS you still have to setup instances, manage security updates, permission and deal with disaster response (US-East-1 outage?) If you've run anything at any scale for any length of time you'll know that EC2 hosts frequently get rebooted with short notice.
I have extensive infrastructure running across 3 separate org accounts for blast radius, with multiple separate VPCs per org, RDS databases across multiple AZs, and a large variety of instances.
Everything is built up and torn down via Terraform, and ZERO maintenance tasks are done via the console. Changes to infrastructure configuration are managed via CAB which approves PRs on Github which are automatically deployed via TF cloud.
We do monthly disaster simulations where we replace a _production_ environment.
I'll agree that AWS paints a very rosy picture of their own services, but there is substance underneath that PR.
We see just as much downtime with AWS with DR/HA/MTTR as with colo. You still have to set it all up and maintain it, it's just different setup and maintenance than if you actually have servers in DCs. Most people still have the learning curve with AWS and usually do it incorrectly, or have to wait for Amazon to admit degradation of a service is occurring.
It's really a matter of who within the company wants to know what's behind the curtain. I'd almost always vote for self hosting, but it's a hard argument internally when everyone thinks the cloud eliminates these issues magically.
> or have to wait for Amazon to admit degradation of a service is occurring.
Please compare apples to apples. If you are comparing to colo, I don't know of any colocation facilities which provide higher level services (say, Kinesis). They might provide some features like CDNs, but nothing very fancy.
You are left with a EC2 vs servers racked in some colocation facility comparison.
At that point, it's not even fair. I can run a terraform script and spin up a copy of the production environment in minutes. I don't have to create purchase orders and wait for hardware. I don't have to issue tickets. If any machine goes down it can be recovered in minutes, and it will spin up in another hypervisor, maybe even a different datacenter.
You don't have to spend time and resources with discussions on how the 'network topology' is going to look like, and then send people off to implement and wire up stuff.
You don't have to spend cycles diagnosing issues just to find out that there's a bad transceiver in a NIC somewhere. YOU DON'T CARE. AWS cares and, even if they can't find an issue, stop/start, new hardware in minutes.
You cannot compare the two in good faith. Sure, it might make sense to host some workloads in a colo, but you are giving up a lot.
There is a lot there, and some of it is apples to apples and some is definitely apples to oranges. I'm just saying overall you have the same number of headaches, they are just different headaches. Just last week there was significant degradation that took quite some time to be admitted (admitted might be the wrong word, but everyone seemed to know there was an issue with AWS before AWS said there was an issue).
> I don't have to create purchase orders and wait for hardware.
Why would this need to be done if initial implementation was planned and done correctly? Barring catastrophic hardware failures you shouldn't need to do this. I'm nitpicking your comment, and I think it's tangential to my original point (and I don't want to get into a planning/design/implementation discussion either).
> You don't have to spend time and resources with discussions on how the 'network topology' is going to look like, and then send people off to implement and wire up stuff.
This isn't that hard if you still have the expertise in house. Yes it is work and knowhow, but so is AWS networking/wiring, service interop.
> AWS cares
I think this is arguable and YMMV. Sure it depends on who you are and who you talk to.
I'm not saying AWS isn't viable tech, and there are 30 ways to skin a cat; at the end of the day, if you end up with a skinned cat you like, you're good to go. I believe most people don't need AWS to accomplish what they are looking for (subjective for sure). I also find that it's not the savings or experience that is in the marketing materials or sales pitches. This isn't a binary right or wrong decision; all we need is a skinned cat at the end of the day.
>Why would this need to be done if initial implementation was planned and done correctly?
Anyone who thinks they planned everything correctly is just deluding themselves. Admitting you've fucked up, don't know how you fucked up yet but will one day need to fix the fuckup is very important in engineering.
It's possible, but not thinking the process through enough to realize you'd need purchase orders large enough to require approvals? And if you don't have a basic understanding of the hardware you'll need, you probably shouldn't be working on/in charge of this project anyhow. If you fked up this badly, you're in for a bad time regardless of where you've decided to host.
That's the thing about this argument- it gets sold as not having to worry about something (racking etc) except the reality is it's just a different worry.
Setting up and maintaining Terraform scripts to manage your AWS infrastructure takes time and effort; besides, there's this nagging feeling that should something happen to Terraform, you'd suddenly be left trying to keep track of hundreds of resources (which cost you money each hour, or millisecond in the case of Lambda) and their access policies "by hand".
On the other hand, it could be a very powerful combo.
> If you've run anything at any scale for any length of time you'll know that EC2 hosts frequently get rebooted with short notice.
They do not. We run many thousands of instances on AWS. Sure, given our scale, every week we get a couple of instance retirement emails. Usually they are issued many days in advance (sometimes weeks). Per year, we may get a handful of instances that are suddenly unresponsive.
And we don't care. You know why? Because it's just a matter of issuing stop/start. Done! Server is back up, potentially even in a different datacenter, but it is none the wiser. It looks like a reboot. Even better, add an auto-recovery alert and AWS will do this for you, automatically. If part of an ASG, add health checks.
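For reference, the auto-recovery alert just mentioned is a CloudWatch alarm on the EC2 system status check with the built-in recover action. A minimal sketch with boto3 (the instance ID and region are placeholders):

    # CloudWatch alarm that triggers EC2 auto-recovery when the system
    # status check fails. Instance ID and region are placeholders.
    import boto3

    region = "us-east-1"
    instance_id = "i-0123456789abcdef0"

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(
        AlarmName=f"auto-recover-{instance_id}",
        Namespace="AWS/EC2",
        MetricName="StatusCheckFailed_System",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        Statistic="Minimum",
        Period=60,
        EvaluationPeriods=2,
        Threshold=0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[f"arn:aws:automate:{region}:ec2:recover"],
    )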
For the most part, we don't even notice when instances go down. Our workloads are engineered to be fault-tolerant. If a meteor destroys one AWS datacenter, it might temporarily take out some instances. So what? New ones will be back very shortly, all the while databases will fail-over, etc.
If these were physical instances, someone would have to do the maintenance work, purchase orders, wait for hardware to arrive, and so on and so forth. And, for most "co-location" scenarios, if your datacenter has issues, everything will go down. A single AZ in AWS has multiple datacenters, you might not even be affected if one goes up in flames.
But let's say you run a massive pet server farm and none of them can go down for any period of time. You have given them names and everything, and you celebrate their birthdays. Cool. Run that on GCP then. They do auto-migration. I've never seen an instance go down.
> Do you understand that with AWS you still have to setup instances, manage security updates, permission and deal with disaster response
This is true. However, if you are running your own hardware, you have to do that IN ADDITION TO dealing with hardware and datacenter shenanigans, with either a specialized (and expensive) workforce, or a barely capable one that's shoehorned and doing double duty, with zero economies of scale, probably in a single data-center.
Anyone putting anything in "the cloud" and not designing for failure as a first principle is shooting themselves in the foot. I've encountered a lot of folks trying to do this, building systems that can't tolerate downtime, and it always starts long, difficult conversations.
But I've never once needed to actually pay attention to those instance retirement emails or GCP auto-migrations. We've built to expect it and we let the provider handle that for us.
If you're at the scale where a data center going down is a problem and thus you need to run in multiple coasts for disaster recovery, you're going to have to be able to handle the kinds of distributed computing issues that'll crop up in both cases.
The advantage of AWS is that in theory you can script your infrastructure. Deploy to multiple zones, add hosts, remove hosts, deploy artifact, etc. Makes all kinds of things very easy. Dedicated SQL / Cassandra / DynamoDB offerings make that a few clicks to get going.
This saves people costs, but just so happens to be more expensive in terms of hardware costs.
You can replicate all of this in your own DC or whatever. You do need to invest in tooling, hardware, etc. which is worth it at the highest scales. Or worth it if you don't need everything and thus don't need to pay for it.
Personally, I think many large organizations can optimize by running with a mix of on-prem and cloud services.
Demand may be elastic but not totally elastic, so your base load could be cheaper in house. Storage is cheaper on premise for huge datasets.
You can afford enough staff to rack servers, but you aren't dominated by the lack of servers.
You buy more EC2 instances to scale your business, letting the product team deliver the product, and deploy on-prem to squeeze out more efficiency behind them. Yes, you probably don't use the full offerings of a cloud provider, but that also means less lock in at a cost of a little lower convenience and more staff.
I have several million dollars in infrastructure that I'm responsible for in multiple cloud providers and on our own metal in multiple physical datacenters.
In terms of management burden, I will take the cloud infrastructure a thousand times out of ten.
Automation work is easily an order of magnitude less effort in the cloud than on-prem. Labor dollars spent go exponentially further.
They don't get force-rebooted that often; I have EC2 instances that have been up for over 365 days with no issues.
Setting up instances, managing updates, adding IAM perms is orders of magnitude faster/easier than dealing with rack-and-stack data centers.
Full downtime isn't that frequent, either. The Kinesis outage on Wednesday didn't affect any of our USE1 functionality, with the exception of maybe Cloudfront propagation.
Things got rebooted a lot due to the Spectre and Meltdown mitigations a few years ago, but rarely now.
We've had more issues with the colocation data center, like someone removing one of the good drives in a degraded RAID array, and extremely expensive bandwidth.
EC2 is hardly the main service that people decide over. EKS, ECS, Fargate and Lambda, Aurora and other managed DBs, and S3 eliminate a lot of the skills and time required to set up something even remotely as reliable and on par in features. Sure, if all you need is a reliable VM, there are perhaps cheaper options.
Clearly you've never had to deal with physical servers. A dead mouse once caused an outage (fire) that took an entire dev team days to fix. I'd rather not deal with that in my lifetime.
Running my site on dedicated servers costs me $500 a month, running it on AWS would cost me about $20,000 a month (probably just for the egress traffic alone).
I could hire four system administrators for that money to look after each of my four dedicated servers full time.
On top of that you first need to learn AWS. That thing is complicated - but I can set up a dedicated server in minutes because it's indistinguishable from my development environment.
I joke that dedicated servers are only expensive if your time and money are worthless. Because AWS is going to eat both.
Can you please explain how would this reduce cost? 1TB of egress on EC2 is $90 and on CloudFront is $85. Or are you suggesting to use something like BunnyCDN with $5/TB instead of AWS services?
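For context, rough arithmetic with those per-TB figures (the prices are assumptions, and real egress/CDN pricing is tiered and changes over time) shows what a ~$20k/month egress bill implies:

    # Rough arithmetic with the per-TB figures above. Prices are assumptions.
    PRICE_PER_TB = {"ec2_egress": 90.0, "cloudfront": 85.0, "budget_cdn": 5.0}

    def egress_bill(tb_per_month):
        return {name: round(tb_per_month * usd, 2) for name, usd in PRICE_PER_TB.items()}

    # A ~$20k/month egress bill at ~$90/TB implies on the order of 220 TB/month:
    tb = 20_000 / PRICE_PER_TB["ec2_egress"]
    print(round(tb))          # ~222 TB
    print(egress_bill(tb))    # EC2 ~$20k, CloudFront ~$18.9k, budget CDN ~$1.1k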
We are going to say we used four hours of labor. This includes drive time to the primary data center. Since it is far away (18-20 minute drive), we actually did not go there for several quarters. So over 32 months, we had budgeted $640 for remote hands. We effectively either paid $160/ hr or paid less than that even rounding up to four hours.
100%. As an example, we had 11 months where nobody touched the racks, and the visit 11 months later was the one to physically retrieve a tool I had left there. I mentioned I even track drive time.
How much time did you spend deciding which components to upgrade and purchasing parts to later install? How much would you charge a client for that time?
Very little, but that is probably because we cover the server industry in-depth with reviews and such. I mentioned that that knowledge is key. This is our cost analysis and it's not going to be the same for everyone.
For a business with growing needs (most people at the point of making the decision), won't they need to go to install more hardware frequently, if nothing else?
That may be true. We grow 20-30% Y/Y which is not as big as many other places (but is more than most others in our space.) 20-30% growth we handle with refresh cycles since we have extra capacity.
A lot of businesses aren't growing at breakneck speeds, and with modern CPU/RAM setups they are often massively underprovisioned. A very big car rental company you'd know the name of was being run on four servers, and they were only upgraded at EOL (if something broke, Dell had same-day or next-day service).
> They are perfect world actual hours that don't include any unexpected downtime
It isn't clear from this particular post, but considering Patrick refers to "overprovisioning" and having a "hot spare" I get the impression that they have designed their DC setup to minimize unexpected downtime. Sure, it can still happen, but with proper up-front planning what you call "perfect world" can become typical.
> or any ancillary costs outside of racking
As I see it from Patrick's post and my own colo experience, there are three groups of costs for colo that are baked in to the cost of AWS compute and storage: capital cost of hardware (Patrick has a specific line item for this); facility cost; and labor for maintaining the hardware (which is the cost under discussion). What do you think is significant, missing, and unique to colo?
You think bad things don't happen on AWS? You think that an experienced admin isn't going to spend significantly more time trying to communicate with an actual human at Amazon than fixing a problem on a physically controlled server?
We moved to AWS because bad things happen there less often than they were happening to us with various DC providers we used to use.
I've never had AWS tell us that they can't find the server we're renting that suffered HDD failure. AWS has never cut us off from the Internet because a sysop typed a command in the wrong session.
And we find AWS support to be far quicker to respond, and far more effective, than our previous providers.
So, YMMV, but AWS is pretty damn good for our requirements.
I agree with you, but I can't find a good way to communicate this to the other camp. I wonder if you can't clearly see this without having been a "sysadmin" long before "devops" was a title.
It's both myopic and naive to assume there's a binary set of circumstances that lend an upper hand to self-hosting or cloud hosting.
It's a sliding scale. What makes you money? What's your expected growth? What are your security needs? What experience does your team have?
In this thread, people at each end of the scale are 1. conflating different use cases (wordpress is not the same as a massive, scaled data ingest pipeline) and 2. are ignoring the hidden costs of both.
> TIL: AWS is only expensive if your time has no value.
Is it only expensive if developer time is free, or also if it is, let's say, a quarter of a typical Bay Area senior engineer's rate?
Salary levels vary quite a lot across the world, even looking only at developed countries. At what $/hr level does managing your own infrastructure become more cost-effective?
Which accounts for almost all hosting and startups. Most startups will never see more than a handful of users, and yet microservices, Lambda, NoSQL, etc. must be deployed because of 'scaling and failover', wasting their own and investors' money for nothing. Only a small % will ever need any scaling and failover.
If AWS were truly cheaper, we would be using it for everything; currently we mix, which is far cheaper. Anything not fail-critical that benefits from fixed-price CPU/GPU, bandwidth, and storage we host traditionally. We and our clients have saved fortunes over the years. And for clients or projects that do not need such heavy lifting as AWS, we simply host everything traditionally.
It also depends on how computationally intensive your application is, AWS premium can be higher than labor cost if you need large scale data crunching.
Running something on the open Internet is super annoying. If you properly subnet, or restrict to VPN, I can see it as useful but then you have a whole other host of problems to conquer.
If you're doing a static page or such, no issue, but the second you're hosting a webapp or need to have a database, yeah.
When you're uninformed and don't know what you're facing, things seem complicated and cloud solutions tend to look very attractive, but what is so hard about running a web app?
For sake, just spin up a single server somewhere cheaper, run it all there, and vertically scale up after you've put the appropriate monitoring in place to know when you need to scale up.
I agree with you. As a counterpoint, I was embarrassed for our profession when something like 70% of people in the industry said they were confused by the syntax of httpd.conf.
I agree. I’m a contractor and my client is running Azure. Having the possibility to snap my fingers and have a Redis cache appear ready to go is a no brainer - it costs per month what I cost per hour.
Despite the naysayers, you're completely correct. The effort required to build and maintain AWS is so much lower than on-prem. The only time this isn't true is when you have highly optimized, calculated needs where investing in that specialized infrastructure will pay off.
I want my team focused on building stuff that makes us money.
If I'm reading the source html pages of "servethehome.com" correctly, it's a Wordpress site and that's what Patrick is basing hosting cost comparisons on.
I've mentioned before[0] that AWS is too expensive and overkill for Wordpress sites (especially simpler non-ecommerce ones). I don't think this conclusion is controversial. Simpler hosting requirements are why Patrick only has to spend 4 hours of labor in 1 year to upgrade some hardware.
It's when you need the higher-level value-added services of AWS services (Dynamo, Redshift, region failover, etc) that the comparison becomes more complicated.
E.g. Companies with mission-critical transactional websites or mobile backends are more complex and they need agility to add/change the infrastructure landscape in response to unknown workloads. They don't have the money (or expertise) to code an in-house version of AWS services portfolio. E.g.[1]
Totally correct. I mention that a bit in the video as well. Also - I do not view this as an AWS v. Colo. We use AWS for some services as well so it is a specific part of the workload we run (not just WP) that is in our hosting cluster.
And again, this is a fraction of what we have in data centers due to the labs and such.
Wordpress hosting is expensive and overkill to do on AWS if you do it traditionally.
Add Simply Static or a similar plugin, whip up a script to upload the static HTML to S3 (see the sketch below), set up CloudFront with good caching, and you're done. Your site is faster, more secure, cheaper, and can easily run on a t3.micro for peanuts.
Only gotcha is comments, but that's a solved problem (Disqus or any of the alternatives).
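A minimal sketch of that upload script (the bucket name and export directory are placeholders; it assumes boto3 and a bucket already fronted by CloudFront):

    # Upload a directory of exported static HTML to S3 with sensible
    # content types and cache headers. Bucket and directory are placeholders.
    import mimetypes
    from pathlib import Path
    import boto3

    BUCKET = "example-static-site"   # placeholder
    SITE_DIR = Path("public")        # wherever the static export landed

    s3 = boto3.client("s3")
    for path in SITE_DIR.rglob("*"):
        if path.is_file():
            key = str(path.relative_to(SITE_DIR))
            content_type = mimetypes.guess_type(str(path))[0] or "application/octet-stream"
            s3.upload_file(
                str(path), BUCKET, key,
                ExtraArgs={"ContentType": content_type,
                           "CacheControl": "public, max-age=3600"},
            )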
I find it interesting that these comparisons always pit AWS against colocation.
It would be much more interesting if the comparison was between AWS and dedicated servers. Dedicated servers have all the benefits of colocation, only you don't need to deal with the hardware. The DC takes care of the hardware; all you have to do is administer the server.
I find dedicated servers to be very cost effective. You can get off the shelf servers in minutes and if you need custom hardware that's only a quote away. Most providers will offer month-to-month billing, so there is no lock-in.
Often dedicated servers are even cheaper than colocation when you add up all the costs of hardware, racking, sparing, financing, etc.
Yeah, I'm so tired of the idea that AWS doesn't involve the same management overhead as any other server. It's so far removed from the truth that I wonder if they are shills.
Same. I've been in places that self host, and the extra overhead comes nowhere near what AWS (in my case Azure) charges. A 99.5% uptime commitment in contracts realistically lets you have a lot of downtime to fix your problems - over 40 hours a year of acceptable downtime. We're in the process of making the switch and we project to pay 30% of what we're currently paying.
You don't even need to buy a new EPYC or Xeon Scalable system. A 20-core Xeon v3/v4 system with 128GB of RAM can be had off-lease for $1000. Psychz.net provides 1U colocation with a 1Gb/s link (10Gb/s burst) for $80/mo.
Equivalent EC2 is $300-600/month with no better reliability. Plus you have to deal with noisy neighbors and bandwidth overages. Build your baseline usage in colo and burst into AWS unless you like buying Bezos more houses.
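Roughly, the amortization math behind that claim looks like this (the 36-month hardware lifespan and all prices are assumptions taken from the numbers above):

    # Rough amortization: ~$1000 off-lease box plus ~$80/mo colocation
    # versus a ~$300-600/mo EC2 equivalent. Lifespan and prices are assumptions.
    server_cost = 1000.0     # USD, off-lease 20-core Xeon
    colo_per_month = 80.0    # USD, 1U with 1Gb/s link
    lifespan_months = 36

    colo_monthly = server_cost / lifespan_months + colo_per_month
    print(round(colo_monthly, 2))          # ~107.78 USD/month
    print(round(300 / colo_monthly, 1))    # EC2 low end: ~2.8x
    print(round(600 / colo_monthly, 1))    # EC2 high end: ~5.6x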
Pretty sure EC2 instances and EBS volumes have a lot more redundancy than a single server. You really need two colocated servers to replace a single EC2 instance. Still probably cheaper, but also a larger time investment.
If the difference between AWS vs colocation is an additional FTE then AWS is cheaper.
I'm always nonplussed by the "additional FTE" argument; it's obscenely overestimated -
"""We are going to say we used four hours of labor. This includes drive time to the primary data center. Since it is far away (18-20 minute drive), we actually did not go there for several quarters. So over 32 months, we had budgeted $640 for remote hands. We effectively either paid $160/ hr or paid less than that even rounding up to four hours.
"""
Software- and systems-configuration-wise, you aren't really going to spend much more time than you would doing loop-de-loops with AWS configuration anyway. TFTP etc. is just not that tough.
About 7 years ago we brought in the ELK stack for security log ingestion and basic analytics. It took at least three FTEs to maintain the cluster, along with constant issues of queries crashing clusters, or provisioning/building/repairing the dozens of servers and storage units to handle the several terabytes per day of ingest.
To be able to throw that behind an infinite, horizontally scalable mechanism would have saved us a lot of pain and troubleshooting.
How many of those 3 FTEs would you have needed anyway, though? That's the point -- to compare apples to apples.
Also, dozens of servers is past the point we're talking about, I think.
Bare metal and your own infrastructure makes a lot of sense really small (a few servers); and it may start to make sense again at some point really large.
You have a centralized PXEBoot server (basically a DNS + TFTP Box). All servers you manage auto-boot to PXEBoot, then download their image from the central server, then automatically provision themselves through Puppet or Kubernetes or whatever the heck you're doing.
Throw down a few network power-adapters (to turn on/off in times of emergency), IPMI for network remote KVM, a VPN to provide security, and you're set to run your own small cluster.
-------
If you need more servers, you buy them up, give them a PXEBoot thumb-drive, and add them to your network.
The centralized PXEBoot server is a single point of failure, at least how you presented it.
You also need expertise in DNS/(T)FTP/PXEBoot/hardware/Kubernetes/Puppet/IPMI/KVM/VPN/etc.
It's like that joke about the expert called in to repair expensive factory equipment. He comes in, takes a piece of chalk out of his pocket, marks the piece that should be replaced with an X. A month later they get the bill: $100000. The factory manager wants to see a detailed invoice for that huge expense. They get it 1 week later:
$1 for the piece of chalk
$99999 for the expertise needed to know where to put the X
Shockingly, eBay. Search for Dell R730/R630 or Supermicro v3. You won't reasonably get more than 36 cores/72 threads in a single box. I just looked and you could easily spec out the system you describe for under $1000. Possibly even less if you want to use mechanical disks.
For us to get the same functionality as all of these services in colocated data centers would be an INSANE amount of work; we'd have to hire so many IT/hardware specialists for the networking, let alone data storage specialists for stuff like Spark/Hadoop.
Looking only at costs for instances is only 10% of the story here.
If you hadn't tethered your software to proprietary AWS services in the first place, you wouldn't have the problem to "replicate" all of that on your own infrastructure. You would simply "have" that stuff - or better said: you'd have the parts that you actually need (having a giant grab bag of ready-to-use cloud services at your disposal tends to entice developers to use them, regardless of actual need, especially in "enterprisey" environments where just another service doesn't really ring a bell with anyone, corporate pays for everything anyway and every developer is constantly engaged in putting more tech buzz words on their CVs).
Queueing and messaging systems, databases, key-value stores, backup solutions etc. have all been invented pre-AWS, and there are battle-tested solutions for all of that out there, actively being used by companies which did not choose to depend on AWS.
I run a business on AWS. I have a few terabytes of data on S3. It costs like $30/mo. I have about a half terabyte of data in RDS. That's about $40/mo, plus a couple thousand a year for the RIs. I run the company myself. I run SES for less than a dollar a month to handle all my transactional emails.
If I move this out of AWS, I need that data replicated for redundancy. I need some meaty servers set up, and I need to maintain the boxes that run the storage and database software. I need to build out the observability software that I get for almost free with Cloudwatch. My database instances need to failover appropriately, which doesn't happen automatically. I need to monitor disk usage and add capacity as the databases grow. I need to set up and manage whatever firewall setup I otherwise get for free with VPCs and security groups.
Managing my own email infrastructure is almost laughable. Spending more than about two and a half minutes on it each month outweighs the benefit of just running it on AWS.
My time isn't free. I either need to hire someone to build this, I need to learn it and apply it correctly, or find some magic docker stuff that does it for me.
Where is the cost savings of moving out of the cloud for me? If I'm spending all of my time in a SSH session installing kernel updates or diagnosing why my storage cluster is misbehaving, when am I supposed to actually build my product?
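For scale, the transactional-email piece mentioned above boils down to roughly a call like this (the addresses and region are placeholders; it assumes boto3 and an already-verified SES sender identity):

    # Minimal SES send, assuming boto3 and a verified sender identity.
    # Addresses and region are placeholders for the example.
    import boto3

    ses = boto3.client("ses", region_name="us-east-1")
    ses.send_email(
        Source="noreply@example.com",
        Destination={"ToAddresses": ["customer@example.com"]},
        Message={
            "Subject": {"Data": "Your receipt"},
            "Body": {"Text": {"Data": "Thanks for your order."}},
        },
    )

Compare that with running Postfix, DKIM/SPF, IP reputation, and bounce handling yourself.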
My company has all of these things on-prem, but with teams of at minimum 5 FTEs backing each one. That's the point here: a small shop cannot necessarily afford to fill an oncall roster with Kafka experts or Postgres experts or whatever else is needed.
100% agreed. Every one of those AWS services can be reproduced in a colo, but it takes time and effort to set up a queueing solution which just exists already in AWS. It takes time and effort to set up notifications for storage and database events which are easy-as-anything in AWS.
Even without getting into multiple availability zones, things like SQS/SNS, Dynamo, S3, and Lambda combine to make things very easy on day one with AWS.
Could it be cheaper? Well, yeah, I'm sure. And if all you're doing is running EC2 instances, go somewhere else. But also, maybe look into some of the other stuff AWS provides!
I agree with the statement, but one thing to note is that if you were building for colocation, you would probably build the whole thing in a different way. Not suggesting that it would be trivial or anything, but you probably would not replicate the mentioned services; more likely you would use other/different types of software packages (for example RabbitMQ for queues). It may not be an exact replica, but it covers the use cases needed in your application (now no longer designed for an "infinitely" scalable cloud).
While one of the benefits of things like AWS and GCP is the availability of all kinds of services, at the same time we are to some extent losing control. I have personally had several cases where I was thinking "this thing would be so useful", but AWS/GCP/pick-your-cloud did not have it and it would have been a hassle to jam that one piece of supporting software together with the core services, so it never got adopted.
Yeah, at least for us the value of AWS is not in "is it cheaper to run a boring old server here".
It's in "how easily can we script and template all of our services and associated resources". It's in "when we need a highly durable and available queue, how much work/infra do we need".
When you factor all that stuff in, no way it comes out cheaper to self host here. Just the couple full time employees we need to setup and maintain all the ancillary stuff would cost us more than our AWS bill.
Someone who is doing a cost comparison of running Wordpress and forums has absolutely no idea about any of those mentioned services and the power they bring.
SQS is just such an amazing tool in a developer's toolkit; it's one of those transformational pieces in how you write software. Combine that with Lambda to consume messages, S3 as your object storage, and Dynamo as your state management, and you unlock the capability of a single engineer to write applications that would have taken teams and teams of people only a few years ago.
Again, does this mean anything to the Wordpress crowd? Probably not. But the ability to bring a new project to market quickly and grow that project is no longer bottlenecked by how fast you can get on your phone to your colo provider on Saturday night.
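As a hedged sketch of that combination - a producer dropping work onto SQS and a Lambda-style handler recording state in Dynamo (the queue URL and table name are placeholders; it assumes boto3 and an SQS trigger wired to the function):

    # Producer side: enqueue a job onto SQS.
    import json
    import boto3

    sqs = boto3.client("sqs")
    dynamodb = boto3.resource("dynamodb")

    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-jobs"  # placeholder
    TABLE = dynamodb.Table("example-job-state")                                  # placeholder

    def enqueue(job_id, payload):
        sqs.send_message(QueueUrl=QUEUE_URL,
                         MessageBody=json.dumps({"job_id": job_id, **payload}))

    # Consumer side: Lambda entry point for the SQS trigger.
    def handler(event, context):
        for record in event["Records"]:
            job = json.loads(record["body"])
            # ... do the actual work, e.g. fetch/put objects in S3 ...
            TABLE.put_item(Item={"job_id": job["job_id"], "status": "done"})

The queue, retries, and scaling of the consumers are all handled by the services themselves; the sketch is the entire moving part you own.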
For real. I think people don't understand that the software cloud providers bring us is just a _tremendous_ force multiplier.
How many different vendors and licensing schemes would you need to bring to your own data center to get an equivalent amount of functionality going? (I ask rhetorically; it's the equivalent of a few FTEs at the least.)
I think the moral of the story is: if you have a predictable workload that fits within a smallish, stable infrastructure and you can make on-prem work, that's great, more power to you.
If you are treating cloud services as purely an apples-to-apples cost comparison then you've missed the point of deploying to (insert your favorite cloud provider here). What you get with cloud services is flexibility and speed. If the OP wants to undertake a new project whose workload or popularity they have no idea of, they need to guess about the required underlying infrastructure. If you miss on your guess, you can either kill momentum for that new project, or you can wind up overprovisioning and paying way, way too much for hardware.
I'm not sure how quickly their colo provider could spin up new hardware for this hypothetical new project, but you can assume it's not the dynamism that you get from deploying to an elastic cloud environment. Again, you're paying a premium on day-to-day costs in order to have this freedom to create and deploy. That's what cloud infrastructure is about.
To give an example -- I have a very-CPU-heavy workload that started its life out on prem. As more and more customers signed up, I would go through tiers of adding more hardware to a rack, where my overall margins looked like a sawtooth waveform when I would provision more hardware. I started out moving to EC2 and then finally to ECS/Fargate. If I want to spin up a new piece of functionality for my users, I don't need to provision any new hardware, set out any real new infrastructure, or do anything that you would consider prework in order to get that new functionality deployed. My margins are much much more predictable and I get faster time-to-market on feature development. That's what you're paying the premium for, that ability to just move faster.
Our small team spends roughly $40-50k/mo at AWS. If you're just hosting a website with a SQL database (glorified LAMP stack), you're probably in it for the wrong reasons. Stick to a dedicated box or DigitalOcean etc.
AWS is like co-locating your hardware, and then the data center having 1000s of employees offering highly reliable services you can access from your infrastructure that lets you move WAY faster for things that are hard to do.
e.g. Discussion today was increasing log retention. 10 years ago I would have run the numbers, extended some SAN volumes, considered procuring more NetApp shelves, etc.
Today it's simply a cost question: is it worth it to us to store those logs for 10x longer? Sure? Ok, done.
I think the point that you're making here is subtle and that people are missing it.
Your small team can _comfortably manage_ 50k/mo worth of AWS resources. That's _INCREDIBLE_.
I think people mistakenly think that cloud costs only go up exponentially and that costs are an unmanageable mess.
I've worked with teams with hundreds of engineers serving a major enterprise and only an AWS bill 3-5x yours. And they only had a small team managing it all. Comfortably.
To do similar with physical servers requires a massive stack of people and salaries. The ongoing recruiting costs to maintain staffing would dwarf what the AWS bill is.
> To do similar with physical servers requires a massive stack of people and salaries. The ongoing recruiting costs to maintain staffing would dwarf what the AWS bill is.
No no, they wouldn't. The number of people is equal for both; it's just different tasks they have to do.
I have a feeling you could host that site on a PC at home for even less. It looks like their forum gets < 100 posts per day. I can't imagine they're hitting much more than 1 QPS.
Wait - this article appears to compare hardware costs of self-hosting against the total cost of using AWS. It's missing the most expensive component (labour) which just so happens to be extremely expensive to scale.
Are you including having someone on call who can drive to the colo on a moment's notice? Unless your hardware is redundant and remotely swappable then you need to be ready for physical disaster response. Cloud providers abstract this away for a price.
Not true. In the article remote hands refers to actual hours of maintenance performed ($640 for 4 hours). The cost of having a human on call to solve hardware problems is far higher than that.
There is no need to have anyone on call yourself to drive to the colo, since any decent colocation provider will have staff on-site 24/7 to perform any physical intervention required.
Part of that you pay through your colocation bill and part you pay by the hour (depending on your contract/provider).
Ideally one always has OOB (out of band) access to all devices to make sure one can always recover from any error, such as messing up firewall config or similar. But there are transactional network management tools that can automatically roll back changes in case one messes up.
But the colo techs can do pretty much anything you would do yourself. Plenty of people run racks in locations they have never set foot in themselves.
That's false equivalence. If your instance dies, you don't need to drive over to the colo, plug into a KVM, diagnose, and then pull and re-rack a new server. They do all that stuff, and it's worth the $$$.
Also, you'll never get into a situation on AWS where you have a 3-day lead time for a replacement GBIC, or NVMe drive, or something, while your customers scream and bail out to a competitor that isn't down.
Yes, you need to manage the infrastructure, but now it's a software configuration task instead of a physical maintenance one.
I manage a rack of 12 servers, I have 110 hard drives in that rack. I've been managing this rack for a decade. I occasionally go to the datacenter to swap out a machine that is end of life, and I go every 3-6 months to swap out a failing drive or two. I'd estimate I spend < 20 hours a year at the datacenter. Everything else is almost 1:1 with what it would take to manage that using AWS.
Obvious anecdata here, but it's not hard to run a redundant physical infrastructure that is low maintenance. Obviously bad luck is a thing, but with the correct planning and setup, physical infrastructure isn't that big of a deal for most web businesses.
You and people posting similar answers miss a very important point. You've mentioned you have over 10 years of experience managing rack servers. That's super great!
This means a company that hires you does not need to use AWS for some of the services, because you've got the necessary expertise to do it yourself. Other companies buy this expertise via fully managed services, like S3 or Aurora.
Once you have expertise doing X, it might seem wrong to pay a premium to cloud providers to do the same thing for you. But other companies don't have you on board. They save themselves the time of recruiting two sysops with a particular skillset, and spin up a working HA RDBMS in a day.
Comparing EC2 and VMs on XenServer doesn't sound convincing because EC2 itself is simple. But comparing an HA Aurora or Dynamo cluster to an on-prem solution is a different beast. As a dev, I don't really want to know all the gory details of managing HA RabbitMQ, and might opt for SQS instead.
This is a startup/business/tech forum, I am a founder of a 20 person business (hiring 7 more) that does this part-time because it's very low maintenance. I am finally hiring a devops person to handle the command line side of things, though I am still pretty happy to dink around at the datacenter every once in a while.
I taught myself as I went. I save about 10x over AWS. I would likely make the same decision today if I was starting from scratch, as I did 12+ years ago when I first started doing infra for our small website.
Infra is not magical, it's probably the easiest part of my job. Way easier than messing around with JS dependencies, way easier than troubleshooting a segfault in a C extension for our Rails app, and way easier than marketing, business management, accounting, interacting with legal, etc. And takes less time too.
I think the trouble is a lot of developers have little to no insight into the business side of things and treat that initial AWS credit + follow-on VC funding as this magic money fountain. If you're a founder you understand a dollar saved is a dollar that keeps your doors open longer or a dollar into your pocket.
I'm glad you've taught yourself infra things as you needed. But, you see, this is an anecdotal proof (like the whole article, anyway).
Learning infra is not the case all the time. Some companies choose AWS (or any other) for speed.
We can, again, argue, but the argument is my PoV vs yours, or any other new commenter's. Although I know how to maintain small infra - a bunch of VMs, a reverse proxy, and an RDBMS - I'm not really into hosting HA RabbitMQ or Postgres myself. There are a ton of problems and configuration options coming from replication and consistency, and I would rather pay a premium to people who have done it in the past.
I want my team to build solutions solving customers problems instead of fixing bugs in our replication code. It's all about perspective, I guess we can't reach a definitive answer for one or the other.
You would rather have a team of specialists for every piece of technology you have. I'd rather have my team members not focus on the nuisances of database replication, and instead solve our customers' problems.
Looks like both of us are successful. Why does it have to be that there is only one way to success?
There isn't just one way, but from what I can tell AWS works great for the really small and the really big, and is a massive waste of money for the in-between. They have run an amazingly effective marketing campaign to make everyone feel like they are missing out if they aren't using it.
The learning curve for working with AWS is similar to the learning curve of working on bare metal. It honestly does not take a specialist to use MySQL, Postgres, etc. They are really quite easy to turn into an HA turnkey solution - about as easy as learning how to effectively use an Amazon service. We aren't talking about the days where you had to shard out, which was a big pain in the ass, because machines are cheap now with 128 physical cores, 1TB of memory, massive fast storage, etc. You can run the nastiest, largest database on a single machine and achieve a pretty large scale doing it.
One of these paths costs about $2,500 a month with server purchase amortization, and the other about $20,000 per month. Either way I need someone to manage it (unless you are talking Heroku levels of turn-key; then up that $20k a month to $50k a month).
Why pay premium when there are cheaper cloud alternatives? Just get your team to learn a thing or two than being sold by AWS.
I take it most people just feel enthusiastic about technologies they don't even need, and being able to run stuff with mouse clicks makes them feel accomplished.
To me, AWS is just an expensive provider that only makes sense to put backup files because of cheap storage and no incoming bandwidth cost.
Your post contains two distinctive arguments: paying premium for managed services instead of learning a thing or two, and paying AWS for their services.
Re first argument: I don't know how to implement DDoS protection, neither is my company a DDoS protection shop. Should I learn how to do it myself, or pay CloudFlare to do it for me? What about S3? What about Aurora/Dynamo/SQS? Where is the line between "your company TOTALLY has to do it on its own" vs "pay the people who know how to run X to run it for you"?
Regarding AWS prices: it's not like you are forced to use AWS. Use the cloud your team is most familiar with. I'm not an expert on cloud offerings, yet if your problem is the difference between $20k/month and $15k/month in infrastructure costs, you might be solving the wrong problem.
That's true - I was comparing managing a server rack in the office building against AWS - not comparing the relatively light hardware interaction needs when it comes to co-location.
99.9% of colo facilities offer 24/7 remote hands. With the extreme cost savings of colo you could build in very significant redundancy, which is to say nothing of how infrequently modern server hardware fails.
The whole point of using the AWS ecosystem is generally to use as many of the managed services as possible so as to minimize the latter. If you're using it to be multi-cloud portable, then yes, your point stands -- but then it brings into question whether you're exploiting its capabilities to their full potential.
Of course. However, you don't need someone to provision and monitor power, connectivity, datacenter security, and all the responsibilities AWS absolutely prices in for you. You also substantially reduce your effort and cost toward capacity planning and managing depreciation. You pay for consumption of _all the things_ required to operate compute infrastructure, plus their margin.
There are certainly companies and problems for which colocating still makes more sense than the cloud but 9 out of 10 of these articles completely gloss over most of the costs. This article is one of them.
I wondered whether this comparison (like many others like it) would /finally/ bake cluster operational costs into the calculus. Not this time, it looks like.
To raise a more constructive point. Looking at [1] (an earlier post in the series), it looks like their workload is WordPress + VBulletin forums, and "pets" not "cattle" -- I wonder how much more they'd be able to get from separating their stateful and stateless layers more cleanly, and using some of the more powerful (and cheaper) AWS primitives for serving traffic that involves the latter. Why do they need such beefy instances for essentially serving content?
The truth is, it's because the platforms they're using (vBulletin + WP) exist, they're very powerful and they get the job done. This leads me to believe that the operational UX for running these kinds of common applications (which compose most of the internet) on AWS without highly technical supervision is not great.
The biggest advantage of using self-hosted infrastructure is real freedom from the whims of large cloud providers and the governments who indirectly exercise control over which companies, countries, and individuals such cloud providers can do business with.
On a large cloud provider, a government notice is enough to remove access to the compute resources and the data hosted on them. In essence, this means the company or individual using these cloud services owns neither the compute resources nor the data hosted on them.
With self-hosted infrastructure, on the other hand, a government notice will not block access to the resources you own and the data on them. You can respond to the notice and continue using those resources until a court rules. So it offers more freedom than a cloud can.
Obviously, given their large marketing budgets and convenience, cloud providers flourish at the expense of freedom.
Self-hosted infrastructure needs a new renaissance, given what's happening around the world and how governments and large corporations are taking away freedom one bite at a time. It is essential for human development and freedom to have self-hosted compute, networking, and data infrastructure.
I have a hard time believing the setup pictured offers equivalent reliability/disaster recovery to "cloud" hosting.
As far as I can tell, all traffic is being routed through two servers. These servers are running Linux? What happens when a routine software update bricks your routing (it could happen to anyone)? Without remote access you need someone to go in and fix it. Do you have after-hours access? Is someone nearby on call to respond to complex issues in person? Not all physical problems can be solved by a random colo tech.
Colocation (as pictured) lacks management of a lot of variables that could lead to big problems. When your downtime targets are in minutes per year any incident requiring physical response is unacceptable.
A simple Google search reveals a service philosophy compatible with this hosting[1]:
> As with all web systems, at some point, downtime is required.
I'm pretty sure this is a response to the AWS re:Invent announcements on the front page. Not that I work on AWS, but we heavily use and rely on it.
Comparing AWS to VPSes or even colocation is just short-sighted. If VMs and bandwidth are your only use case, yes, it is expensive. Skipping the marketing talk, AWS has great services like SQS, S3, Dynamo, SNS, Kinesis, and Firehose, and hosting open-source or paid alternatives to them requires a lot of engineering power, effort, extra monitoring, and restless nights. I prefer our engineers to work on features, not infrastructure, and only deal with it when it is financially required.
Most organizations are fine with vendor lock-in when vendor is stable (not deprecating, not increasing costs). Money is always an issue but its relative, there are many factors in real business. Some can afford cloud, some cannot.
I also agree that you would be fine if you did not use brand new X service that does AutoML, any shiny new over marketed features for you etc. but comparing AWS to Colocation is just one dimensional thought.
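To make the managed-services point concrete, here's roughly what using SQS looks like from application code -- a minimal sketch assuming boto3, a hypothetical queue name and default credentials; the self-hosted RabbitMQ/Kafka equivalent is where the engineering effort and restless nights go:

    # Minimal SQS producer/consumer sketch using boto3 (queue name is hypothetical).
    # The queue itself is AWS's problem: no brokers to patch, monitor, or get paged for.
    import boto3

    sqs = boto3.client("sqs", region_name="us-east-1")

    # Create (or look up) the queue.
    queue_url = sqs.create_queue(QueueName="work-queue")["QueueUrl"]

    # Producer: enqueue a job.
    sqs.send_message(QueueUrl=queue_url, MessageBody='{"job": "resize-image", "id": 42}')

    # Consumer: long-poll for work, process it, then delete the message.
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10)
    for msg in resp.get("Messages", []):
        print("processing:", msg["Body"])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])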
Some of the big benefits of major cloud providers (AWS, Azure, GCP):
- the sheer number of services available
- managed services
- high availability, within and/or across regions
- APIs/SDKs for many of their services, for many languages
- monitoring and alerting built-in
- RBAC
- means of grouping and organising resources and subscriptions
I work as an architect in the enterprise space, where those last 2 are pretty important because a lot of different teams are building/using a lot of different systems, and often multiple organisations are involved. High availability of some services is vital, because downtime can affect the bottom line, or mean an entire workforce has to down their tools.
Considering only the first 3 items, most of the non-huge systems I design - and, I would imagine, most non-enterprise/FAANG-scale systems - have only modest availability requirements, very rarely needing cross-region availability. And most of those systems only really need web apps and APIs, proxies/gateways, a database, a message queue, blob storage, and often some form of background processing (VMs, containers, serverless). OK, that's still a fairly long list, but it's a tiny fraction of what the big cloud providers have on offer, and the database is probably the only tricky thing on the list.
Makes sense. Good reasoning for the use case you have in mind.
Cloud stuff makes sense when you have some combo of the following imho:
* Need for scalability - at one place I worked, high load was 3x low load, so scaling up and down saved a lot of time
* Rare need for powerful resources (sort of the same as above) - I provisioned a real fatboi on AWS the other day to do some file processing. Worked like a charm and was gone within the hour
* Ability to use higher-level primitives - Lambda vs. EC2 instances, Databricks vs. EMR, IAM/Kinesis, all that
* Need for exploration - I could not have afforded to buy up front the peak Elasticsearch + Redis + PostgreSQL capacity that I had, but I can trial it
* Need for resilience - if you need to preserve the data/compute for whatever reason, AWS is way easier to get to a good place
* Lack of knowledge of top-tier ops - I've run long-term dedicated servers for almost two decades now, with top uptime for a decade. I have fuck-all knowledge of doing smart colo ops. If you want to keep your team lean and you don't have this to lean on, it's going to sink you.
If you're using hardware as hardware and it isn't changing that much and you have the skillset, you can get a long way with a colo even today.
These calculations never account for capital overhead of having spares on hand. That cost is part of what you pay for on Amazon. If the hardware breaks on Amazon, you just switch hardware.
If hardware breaks in your datacenter, it's an emergency and you better have spare capacity or extra hardware laying around.
I tried to back into the estimates here, and I can't quite get my head around them.
$737.19 of transfer out is 8TB per month at standard EC2->internet prices, which is fine; something like 25 Mb/s on average.
$3558.56 for EC2 is something like 20 x m5a.2xlarge ... and that makes no sense to me. That's 160 vCPUs and 640GB of RAM.
How does a WP site like that possibly need that much compute? The whole thing feels like it ought to be a pair of instances plus $20 a month for Cloudflare.
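For what it's worth, the transfer line does back out cleanly; the EC2 line is the puzzling one. A quick back-of-envelope sketch, assuming the standard ~$0.09/GB EC2-to-internet rate (first GB free) and a ~$0.344/hr on-demand price for m5a.2xlarge -- both assumptions, not figures from the article:

    # Back-of-envelope check of the quoted bill lines (all rates are assumptions).
    EGRESS_RATE = 0.09   # $/GB, typical EC2 -> internet price for the first 10 TB tier
    FREE_GB = 1          # first GB per month is free

    transfer_bill = 737.19
    total_gb = transfer_bill / EGRESS_RATE + FREE_GB     # ~8192 GB, i.e. ~8 TB/month
    avg_mbps = total_gb * 8 * 1000 / (30 * 24 * 3600)    # spread over a 30-day month
    print(f"{total_gb:.0f} GB/month ~ {avg_mbps:.1f} Mb/s average")   # ~25 Mb/s

    ec2_bill = 3558.56
    hourly_spend = ec2_bill / (30 * 24)                  # ~$4.94/hour of instance spend
    instances_on_demand = hourly_spend / 0.344           # ~14 m5a.2xlarge at on-demand,
    print(f"~{instances_on_demand:.0f} m5a.2xlarge equivalents")      # ~20 at reserved rates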
Yeah, I'm rather confused too. I run a site with over 10TB transfer per month on a $15 / month Digital Ocean virtual machine. It's fast enough and Cloudflare caches over 80% so I seldom go over the included 2TB of monthly bandwidth.
Even ignoring labor, none of these analyses ever consider BC/DR or backup costs.
S3 for backups is the single biggest ease-of-mind feature AWS provides, IMO. I do not want all my data sitting in a single colocation facility without backups.
If I run a database on RDS, maybe it costs more than a bare-metal server, but I can replicate the automatic hourly backups to as many regions as I want. Incredible peace of mind over a self-managed facility.
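To make that concrete: pushing an RDS snapshot into a second region is one API call. A minimal boto3 sketch -- the account ID, snapshot name and regions are all placeholders:

    # Copy an automated RDS snapshot into another region for DR (identifiers are placeholders).
    import boto3

    # Cross-region copies are initiated from the *destination* region.
    rds_west = boto3.client("rds", region_name="us-west-2")

    rds_west.copy_db_snapshot(
        SourceDBSnapshotIdentifier="arn:aws:rds:us-east-1:123456789012:snapshot:rds:mydb-2020-12-01",
        TargetDBSnapshotIdentifier="mydb-2020-12-01-dr-copy",
        SourceRegion="us-east-1",  # lets boto3 build the pre-signed URL for the cross-region copy
        CopyTags=True,
    )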
"S3 for backups is the single biggest ease-of-mind feature AWS provides, IMO. I do not want all my data sitting in a single colocation facility without backups."
I'm biased, but in my opinion if your infrastructure is on AWS and your backups are on AWS you're doing it wrong. Even accounting for zones and geo-disparity, etc.
... which is made possible by rclone[1], which is excellent, and the fact that rsync.net has rclone built into the remote environment[2] such that you can execute it over SSH.
It never fails to amaze me how many HN contributors don't seem to read the linked articles yet seem compelled to tell you off for things you've already accounted for, explained and justified.
There's absolutely nothing in the article (which I did read) about it. If you yourself had read the article, you would know that it was not explained, accounted for, or justified.
There is a single vague reference to a "primary DC". There is no discussion at all of backups or disaster recovery.
It's because the comments are usually far more intelligent and interesting to read than the story, and I set my app to open the comments rather than the story by default. I don't blame others for doing the same.
So how can you comment on, or have a critical thought about, an article if you're just reading a bunch of other comments left by people who likely didn't read the article either?
I find that genuinely baffling, and somewhat worrying, especially on more serious topics.
I have my own 1/2 cab, AS number, and ARIN-allocated IPv4 and IPv6 space. It costs about $500/mo. It took maybe 100 hours to set up the basics and $6000 for the way I built the hardware (which is similar to ServeTheHome's rack). I run everything on FreeBSD. I get to talk BGP directly and get bandwidth at .12c/gbit. Nothing else comes close to this level of low cost, control, learning, and autonomy.
My advice as a CTO is to avoid AWS or GCS at every stage of a healthy company. The danger of getting locked in. The aggressive sales team bugging you every week. The impossible-to-comprehend pricing model.
Hire yourself a decent set of systems engineers and go with VMs in the early stages of the company. A switch to bare metal can come at a later stage. It isn't always needed.
I just had a simple PHP application moved from plain VPS hosting to full-blown AWS with load balancers, RDS databases and EFS storage.
It's still the same app: the load balancer can only send traffic to one machine, RDS means DB connections are now TCP instead of local pipes, and EFS is slower than an SSD. So it's slower and costs more.
Why did this happen? Because an external consultant said AWS is what the big players use. I argued that our app would need to be re-engineered to take advantage of AWS or this would be an expensive waste. But what would I know? We went with the consultants and paid them tens of thousands to set up a bunch of AWS infrastructure I could have run on a $10 a month VPS...
Maybe this is my old-school sysadmin showing, but how is managing a rack server more difficult than managing an EC2 instance?
The hardware part is not difficult: if you've built a PC you can set up a rack server (it's arguably easier, because they're built to be assembled without tools), and the software is all Linux on both, no? Or do you mean hiring on-site engineers within driving distance of your rack servers is difficult?
The pieces aren't difficult in isolation, but the combined logistical total cost of operational ownership is high and includes huge tail risks. When your metal fails, it stays down until physically fixed. Depending on your business, that's either acceptable or not.
We mitigate that risk by having our backup servers and database replicas in the cloud, but the big beefy primary machines are bare metal run on cheap Quebec power w/ unlimited 1Gbps residential bandwidth.
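For what it's worth, the glue for that kind of hybrid setup can stay small. A sketch of the sort of replica-lag check involved, assuming PostgreSQL streaming replication and psycopg2 -- the DSN and the alert threshold are made up:

    # Check streaming-replication lag from the bare-metal primary (PostgreSQL assumed; DSN is made up).
    import psycopg2

    ALERT_SECONDS = 60  # arbitrary paging threshold

    conn = psycopg2.connect("host=primary.example.internal dbname=postgres user=monitor")
    with conn, conn.cursor() as cur:
        # pg_stat_replication lists every standby attached to this primary, including cloud replicas.
        cur.execute("""
            SELECT application_name, state,
                   COALESCE(EXTRACT(EPOCH FROM replay_lag), 0) AS lag_seconds
            FROM pg_stat_replication
        """)
        for name, state, lag in cur.fetchall():
            status = "OK" if lag < ALERT_SECONDS else "LAGGING"
            print(f"{name}: {state}, {lag:.1f}s behind [{status}]")
    conn.close()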
For the generations who grew up being able to spin up servers with a few mouse clicks, doing it manually seems like a whole other level of professional skill.
I feel almost the opposite. The AWS console seems so chaotic and needlessly complex to me compared to the simple familiarity of plugging in a hardware server. Maybe I'm an old geezer already but I'm not even that old!
When you can effectively spawn a new server with a command line you've got saved in your notes... it kinda feels a bit inefficient to be physically building and deploying them yourself.
It's also really pretty amazing to be able to spin up instances or services to test with for however many hours you need and then simply release them again.
Once you've got used to that level of freedom and convenience it's hard to go back, everything just moves faster.
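And that "command line saved in your notes" really can be a handful of lines. A hedged sketch with boto3 -- the AMI ID, key pair and instance type are placeholders, not a recommendation:

    # Spin up a throwaway instance, experiment, then release it (identifiers are placeholders).
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Launch a single small instance from a stock AMI.
    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
        InstanceType="t3.micro",
        KeyName="my-key",                  # placeholder key pair
        MinCount=1,
        MaxCount=1,
    )
    instance_id = resp["Instances"][0]["InstanceId"]
    print("launched", instance_id)

    # ... test for however many hours you need ...

    # Tear it down; billing stops once the instance terminates.
    ec2.terminate_instances(InstanceIds=[instance_id])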
But I have a beefy server with capacity for many extra VMs for a fraction of the price, so I can still spin them up on a whim. The power use is proportional to system load so I'm not paying much continuous overhead, just the up-front cost of having a larger server.
The gain from AWS is like a bell curve: the value is highest in the middle. If you're very small, you probably don't need it, and DigitalOcean, without the surprise costs, should do. When you get large enough, AWS starts paying off; as you get really huge, it might be cheaper to colocate and hire your own admins. However, I think Netflix shows you can't be too big for AWS. Their hybrid approach is probably best: run the things you just can't afford to have fail on AWS/GCP, then move your non-critical workloads to colocation.
I don't buy that. Liability to whom? If Netflix goes down, I'm going to cry and tweet about it; I'm not going to cancel my account. Their catalog is unmatched. Colocating across the world is not easy, and their load rises and falls with the time of day, plus holidays where they really need to scale up. Cloud is great for that; the elastic part is the point. Put the core there and scale up and down as needed. What they did with active-active resiliency and being able to shift traffic across regions is the entire point of the cloud: a bunch of services went down last week when AWS was having issues, but Netflix never skipped a beat.
I don't mean they'd make excuses to customers; I mean they can simply point content providers at AWS when something happens to the infrastructure, instead of begging them to wait while they prepare a damage report on their custom infrastructure.
I host a web app that has 200KB pages and reaches 3.5Gbps peak bandwidth in two racks in an expensive datacenter. The lease plus bandwidth is about 1/5 to 1/7 the cost of AWS.
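To put a rough number on that: even a conservative traffic profile derived from a 3.5 Gbps peak puts AWS egress alone well into five figures a month. A back-of-envelope sketch -- the average-to-peak ratio and the flat per-GB rate are assumptions, not the parent's figures:

    # Rough AWS egress estimate for a site peaking at 3.5 Gbps (utilisation and $/GB are assumptions).
    PEAK_GBPS = 3.5
    AVG_TO_PEAK = 0.3          # assume the 24h average is ~30% of peak
    EGRESS_USD_PER_GB = 0.07   # assumed blended internet-egress rate at this volume

    seconds_per_month = 30 * 24 * 3600
    gb_per_month = PEAK_GBPS * AVG_TO_PEAK / 8 * seconds_per_month   # gigabits -> gigabytes
    monthly_egress = gb_per_month * EGRESS_USD_PER_GB

    print(f"~{gb_per_month/1000:.0f} TB/month -> ~${monthly_egress:,.0f}/month in egress alone")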
Always love Patrick's posts. Lambda has built a business on moving people from the cloud to on-prem and reducing the cost of cloud AI infrastructure. As some of the other comments have pointed out, for machine learning applications with GPUs it almost always makes sense to run your workloads on-prem; STH is showing that the same can be true for traditional CPU workloads.
For example, we're able to provide GPU instances (https://lambdalabs.com/service/gpu-cloud) that are half the hourly cost of AWS hourly on-demand pricing. How? Because there are huge markups on clouds services.
We've done extensive benchmarking and TCO analysis and the verdict is in: it's simply less expensive to run on-prem. You're just paying for convenience when using a cloud, GPU or otherwise.
We used to use an independent host (essentially self-hosting), but they went under due to COVID. We moved nearly 1000 customers over to Azure and in the end we are saving money thanks to simple things like reservations and being cost-conscious. We also took over a company that used AWS, and they were overspending on their hosting by 10-20x what they needed -- not because AWS is inherently expensive but because it was so mismanaged. I don't think self-hosting is worth it when I can freely scale machines up or down in seconds on Azure and my price goes down; we have no fixed costs or maintenance worries. I think it's a fair trade-off.
It all depends on the scale of not only the app, but the organization too. We have enough people to click buttons in RDS. We don't have enough to mess around with replication, backups, etc. The cost isn't just for the hardware, but the services that we'd otherwise have to hire someone to do.
I used to work at a place that would borrow hardware from one client in order to handle the load on another. It was for seasonal stuff, a few days a year when we knew someone would melt down if they didn't have double or triple the capacity, but they didn't need it the rest of the year.
Now I deal in auto-scaling groups and don't care how many servers we use or need as long as it isn't wasteful. "Hardware" that isn't ours is so easy to swap out, or to try at a different size or scale, that I'd hate to go back to metal.
That is great; not to mention you get real CPUs, no throttling inside your LAN, and real storage.
The cloud gets you hooked on the "but you have to scale" pitch by selling 2 to 4 vCPUs plus 8 to 16GB of memory for the same price as a 12-vCPU machine with 64 or 128GB somewhere else.
Having previously performed an extensive TCO calculation (2010), I found on-prem substantially cheaper than AWS. We had <10000 machines, mostly cheap/Supermicro. We had our own parts inventory and staff dedicated to maintaining equipment. I also ran facility operations for things like the generator, fuel testing, UPS PMs, air filters, professional cleaning, etc.
You have to know your business and you need people with facilities skills. Investing in cold-aisle containment reduced our PUE quite a bit. You also have to become an expert in logistics to run your own facility, whereas almost anyone can click away in AWS and spend money.
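For anyone unfamiliar with the metric: PUE is just total facility power divided by IT load, so shaving it down feeds straight into the power bill. A toy calculation -- every figure here is made up for illustration:

    # Toy PUE / power-cost calculation (all figures are made up).
    def pue(total_facility_kw: float, it_load_kw: float) -> float:
        """Power Usage Effectiveness = total facility power / IT equipment power."""
        return total_facility_kw / it_load_kw

    IT_LOAD_KW = 400           # hypothetical IT load
    USD_PER_KWH = 0.08         # hypothetical power rate
    HOURS_PER_YEAR = 24 * 365

    for label, facility_kw in [("before containment", 760), ("after containment", 600)]:
        annual_cost = facility_kw * HOURS_PER_YEAR * USD_PER_KWH
        print(f"{label}: PUE {pue(facility_kw, IT_LOAD_KW):.2f}, ~${annual_cost:,.0f}/yr in power")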
For those of you who have a modern serverless architecture (Lambda, Dynamo, Firebase, etc.), how do the costs compare there? I've always thought serverless is cheaper, but I've never run anything at scale.
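I can't speak to every workload, but the request-based arithmetic is easy to sketch. A rough Lambda-only estimate, assuming the commonly quoted ~$0.20 per million requests and ~$0.0000167 per GB-second (check current pricing; free tier ignored; the workload numbers are made up):

    # Rough Lambda cost sketch for a small API (rates and workload numbers are assumptions).
    REQ_PRICE_PER_M = 0.20          # $ per 1M requests
    GB_SECOND_PRICE = 0.0000167     # $ per GB-second of compute

    requests_per_month = 5_000_000
    avg_duration_s = 0.2
    memory_gb = 0.5

    request_cost = requests_per_month / 1_000_000 * REQ_PRICE_PER_M
    compute_cost = requests_per_month * avg_duration_s * memory_gb * GB_SECOND_PRICE
    print(f"~${request_cost + compute_cost:.2f}/month")   # ~$9.35 for this toy workload

Which is roughly VPS money -- until you add API Gateway, data transfer, a database and everything else around it.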
Yes, AWS is expensive. And depending on your workload you can certainly save a fair wedge of cash by colo-ing things.
For WordPress (the platform serving servethehome), it's perfectly possible to run it in a Lambda with aggressive caching and save a whole bunch of cash. Depending on plugins, load and a number of other things, the potential saving could be around 90% (plus a massive reduction in attack surface).
Before you ask: yes, I do know from bitter experience. Parts of the Financial Times had WP wedged into them, and making them fast and secure was an interesting experience.
AWS is only cost-effective when you are using EC2 with a duty cycle of less than 50% -- that is, when you are not using AWS to host 24/7 compute (a quick break-even sketch follows below).
In terms of storage, there are two compelling offerings: S3 and EFS/Lustre. However, if you're a large-scale EFS user, it's better to run your own GPFS system.
The hardest part of "scaling" is orchestration. If you're using K8s, there really isn't much difference between running it on virtual vs. real steel, barring bring-up scripts (don't get me started on networking in K8s, it's totally warped).
TLDR:
If you need 100+ machines on 24/7, AWS is going to be more expensive.
If you have transient loads, and the average time a machine is on is 4 hours or less, then AWS is for you. However, you'd better use all the other bits that come with AWS to make it cheaper, like fronting it with Fastly or another CDN.
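That duty-cycle rule of thumb is easy to sanity-check against your own numbers. A sketch with placeholder prices -- the on-demand rate and the dedicated-box cost are assumptions; substitute your own quotes:

    # Break-even duty cycle: on-demand cloud instance vs. an always-on dedicated box.
    # All prices are placeholder assumptions.
    ON_DEMAND_USD_PER_HOUR = 0.34      # e.g. a mid-size general-purpose instance
    DEDICATED_USD_PER_MONTH = 120.0    # e.g. a comparable leased/colocated machine, amortised
    HOURS_PER_MONTH = 730

    breakeven_hours = DEDICATED_USD_PER_MONTH / ON_DEMAND_USD_PER_HOUR
    duty_cycle = breakeven_hours / HOURS_PER_MONTH

    print(f"break-even at ~{breakeven_hours:.0f} instance-hours/month ({duty_cycle:.0%} duty cycle)")
    # Below that duty cycle on-demand wins; above it, the always-on machine does.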
Yeah, the last place I worked had transient loads and we pretty much did everything with just EC2 t3a.medium spot instances (aside from S3 and RDS -- which kind of sucks with PostgreSQL if you want to resize down or load from a pg_dump, since you can't copy the SQL onto the RDS instance and are forced to restore over the network; I wanted to move that onto EC2 instances, and to move from S3 to B2 since it's cheaper for bandwidth -- we had CDN77 in front of CloudFront to lower costs outside the US/EU).
Using Docker or K8s would just have doubled the costs compared with plain disk images and a custom load-balancing daemon built on nginx (with third-party modules for changing upstreams without restarts/reloads) and Python, on top of adding more networking complexity (e.g. to the caching/search/RabbitMQ instances). Probably around 20+ machines at peak when I left, but we kept them all below 20% CPU. Sad to see so many places go in the Docker/K8s direction.
The fact that static, old-school servers in the cloud are more expensive than hosting them yourself is news to precisely nobody. That's not the business case for the cloud. In 2020, if your tech stack is still a monolith of giant servers on a rack somewhere, you've got far bigger problems in your future.