Ask HN: Have you ever switched cloud?
274 points by dustinmoris on April 7, 2022 | 259 comments
Has anyone ever switched clouds from one service provider to another (e.g. AWS to Azure or vice versa), partially or entirely?

If so why? They all offer almost identical services. Do small (but maybe significant?) differences or unique products (e.g. Spanner) make such a big difference that it has swayed someone to switch their cloud infrastructure?

I wonder how much these little things matter, how such a transition (partial or complete) went, and how key stakeholders (who were possibly heavily invested in one cloud or felt responsible for the initial choice) were convinced to make the switch?

I'd love to hear some stories from real world experiences and crucially what it was that pushed the last domino to make the move.




Yes. I once did a zero-downtime migration for a client, first from AWS to Google, then from Google to Hetzner. Mostly for cost reasons: they had a lot of free credits, and moved to Hetzner when they ran out.

Their savings from using the credits were at least 20x what the migrations cost.

We did the migration by having reverse proxies in each environment that could proxy to backends in either place, setting up a VPN between them, and switching DNS. The trickiest part was the database failover and ensuring updates would be retried transparently after switching master.

The upside was that afterwards they had a setup that was provider-agnostic and ready to do transparent failover of every part of the service, all effectively paid for through the free credits they got.


Would you have a more detailed write-up of what you did, even at a high level? Seems like a cool thing to do.


Unfortunately not, but it's surprisingly straightforward apart from the database bit, so here's a bit more detail from memory. There are many ways of doing this, and some will depend strongly on which tools you're comfortable with (e.g. nginx vs. haproxy vs. some other reverse proxy is largely down to which one you know best and/or already have in the mix). [Today I might have considered K8s, but this was before that was even a realistic option, and frankly even with K8s I'm not sure; the setup in question was very simple to maintain.]

* Set up haproxy, nginx or similar as a reverse proxy and carefully decide if you can handle retries on failed queries. If you want true zero-downtime migration there's a challenge here in making sure you have a setup that lets you add and remove backends transparently. There are many ways of doing this of varying complexity; see the nginx sketch after this list for one of them. I've tended to favour using dynamic DNS updates for this; in this specific instance we used HashiCorp's Consul to keep DNS updated with services. I've also used ngx_mruby for instances where I needed more complex backend selection (it allows writing Ruby code that executes within nginx).

* Set up a VPN (or more, depending on your networking setup) between the locations so that the reverse proxy can reach backends in both/all locations, and so that the backends can reach databases in both places.

* Replicate the database to the new location.

* Ensure your app has a mechanism for determining which database to use as the master. Just as for the reverse proxy, we used Consul for the selection. All backends would switch on promotion of a replica to master.

* Ensure you have a fast method to promote a database replica to a master. You don't want to be in a situation of having to fiddle with this. We had fully automated scripts to do the failover.

* Ensure your app gracefully handles database failure of whatever it thinks the current master is. This is the trickiest bit in some cases, as you either need to make sure updates are idempotent, or you need to make sure updates during the switchover either reliably fail or reliably succeed. In the case I mentioned we were able to safely retry requests, but in many cases it'll be safer to just punt on true zero downtime migration assuming your setup can handle promotion of the new master fast enough (in our case the promotion of the new Postgres master took literally a couple of seconds, during which any failing updates would just translate to some page loads being slow as they retried, but if we hadn't been able to retry it'd have meant a few seconds downtime).
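
For illustration, the dynamic-backend piece can be as small as the nginx fragment below. This is only a sketch: the Consul service name, port and listen address are made up, and a real setup will have more moving parts.

    # Resolve backends via the local Consul agent's DNS interface, so hosts in
    # either environment can be added or removed without reloading nginx.
    resolver 127.0.0.1:8600 valid=10s;           # Consul agent answers DNS on 8600

    server {
        listen 80;
        location / {
            set $backend "app.service.consul";   # hypothetical Consul service name
            proxy_pass http://$backend:8080;     # a variable here forces re-resolution
        }
    }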

Once you have the new environment running and capable of handling requests (but using the database in the old environment):

* Reduce DNS record TTL.

* Ensure the new backends are added to the reverse proxy. You should start seeing requests flow through the new backends and can verify error rates aren't increasing. This should be quick to undo if you see errors.

* Update DNS to add the new environment reverse proxy. You should start seeing requests hit the new reverse proxy, and some of it should flow through the new backends. Wait to see if any issues.

* Promote the replica in the new location to master and verify everything still works. Ensure whatever replication you need from the new master works. You should now see all database requests hitting the new master.

* Drain connections from the old backends (remove them from the pool, but leave them running until they're not handling any requests). You should now have all traffic past the reverse proxy going via the new environment.

* Update DNS to remove the old environment reverse proxy. Wait for all traffic to stop hitting the old reverse proxy.

* When you're confident everything is fine, you can disable the old environment and bring DNS TTL back up.

The precise sequencing is very much a question of preference - the point is you're just switching over and testing change by change, and through most of them you can go a step back without too much trouble. I tend to prefer doing the changes that are low-effort to reverse first. Keep in mind that some changes (like DNS) can take some time to propagate.
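
A quick way to sanity-check what resolvers are actually handing out during the switchover (the hostname and resolver below are placeholders):

    dig +noall +answer app.example.com            # records plus the remaining TTL
    dig +noall +answer app.example.com @1.1.1.1   # ask a specific public resolver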

EDIT: You'll note most of this is basically to treat both sites as one large environment using a VPN to tie them together and ensure you have proper high availability. Once you do, the rest of the migration is basically just failing over.


People get paid hard cash for lower quality plans than you’ve just provided, thanks a lot! :)


> If you want true zero-downtime migration there's a challenge

It is astounding how many people require 24/7 ops... while working 8/5.

Otherwise this comment is an exemplar of how things should be done. My take on this is that OP is a sysadmin, not a dev. *smug smile*


> It is astounding how many people require 24/7 ops... while working 8/5.

In this case the client had an actually global audience. They could have afforded downtime for the actual transition, but it was a useful test of the high-availability features that mattered to them.

I do agree with the overall principle, though - a whole lot of people think they need 24/7 and can't afford downtime, yet almost all of them are a lot less important than e.g. my bank, which does not hesitate to shut down its online banking for maintenance now and again. As it turns out, most people can afford downtime as long as it's planned and announced. Convincing management of that is a whole other issue.

> My take on this is what OP is a sysadmin, not a dev. smug smile

Hah. I'd say I was devops before devops was a thing. I started out writing code, but my first startup was an ISP where I was thrown head-first into learning networks (we couldn't afford to pay to have our upstream provider help set up our connection, so I learnt to configure cisco routers while having our provider on the phone and feigning troubleshooting with a lot of "so what do you have on your side?") and sysadmin stuff, and I've oscillated back and forth between operations and development ever since. Way too few developers have experienced the sysadmin side, and it's costing a lot of companies a lot of money to have devs that are increasingly oblivious to hardware and networks.


> It is astounding how many people require 24/7 ops

Yet when us-east-1 goes offline, it's mostly just shrug and wait for it to come back, because it's not our fault...


Really well done keeping this simple!

It's also another one of those situations where good design principles and coding practices pay off. If the app is a tangled mess of interconnected services, scripts, and cron jobs this kind of transition won't be possible.


Damn, this is why I come to HN. This is awesome, thank you so much for taking the time to write it up.


This was really nice to read. Thanks!


Bump! This sounds very interesting.


Highly recommend WireGuard for this (see Kilo for a k8s-specific option that works with whatever network you have set up). Setting up a VPN that just works is super simple.
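
For anyone who hasn't used it: a minimal point-to-point tunnel is roughly the config below (keys, IPs and subnets are made up), brought up with "wg-quick up wg0" on each side.

    # Hypothetical /etc/wireguard/wg0.conf on one side of the tunnel
    [Interface]
    Address = 10.8.0.1/24
    PrivateKey = <this side's private key>
    ListenPort = 51820

    [Peer]
    PublicKey = <other side's public key>
    Endpoint = 203.0.113.50:51820            # the other environment's gateway
    AllowedIPs = 10.8.0.2/32, 10.9.0.0/24    # peer tunnel IP plus its internal subnet
    PersistentKeepalive = 25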


yep, wireguard is the secret for intercloud, for sure.


Same. Bump!


We haven't done a cloud migration, but I know from zero-downtime DATABASE migrations that if you do this:

- don't use any cloud service that isn't a packaged version of an installable/usable OSS project

- architect your services to be able to double-write and switchover the read source with A/B deployments

If you can migrate your database without downtime this way, then you are much more flexible than if not.


> architect your services to be able to double-write

Can you share any details on how to achieve this?

For instance, if the first database accepts the write but the second is temporarily inaccessible or throws an error, do you roll back the transaction on the first and throw an error, or <insert_clever_thing_here> ... ?


You're right, double-writes are flexible, and it's great when it works. With schema migrations I'm fine with it, because you can usually enforce consistency.

But with migrations from one database to another at different locations, I'm lukewarm to it because it means having to handle split-brain scenarios, and often that ends up resulting in a level of complexity that costs far more than it's worth. Of course your mileage may vary - there are situations where it's easy enough and/or where the value in doing it is high enough.
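
For what it's worth, the naive version of the pattern looks something like the sketch below (all names are hypothetical); the hard part is deciding what happens when the second write fails, which is exactly the consistency/split-brain question above.

    # Naive double-write sketch: the current primary stays the source of truth,
    # the migration target gets a best-effort copy, and failed copies are
    # journalled for later replay/reconciliation instead of failing the request.
    def dual_write(primary, target, journal, statement, params):
        primary.execute(statement, params)      # source of truth; let errors propagate
        try:
            target.execute(statement, params)   # best-effort write to the new database
        except Exception as exc:
            journal.append((statement, params, str(exc)))  # replay/reconcile later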


At my previous job we ran a bare metal cluster at Hetzner, and monitoring the hardware was quite an intensive task: always monitoring hard drives, network bottlenecks, CPU usage, etc. This was before K8s, so it might not be comparable to today.

Would you say bare metal cost a lot of extra monitoring/maintenance, or is this something you did on the cloud hardware as well anyway? Do you run virtualization on the Hetzner machines?


I would say cloud costs a lot of extra maintenance. When I did contracting, those of my clients who insisted on AWS tended to be far more profitable clients for me, because they needed help with so much more stuff.

In terms of monitoring, it boils down to picking a solution and building the appropriate monitoring agent into your deployment.

I've run basically anything I run in some virtualized env. or other since ~2006 at least, be it OpenVZ (ages ago), KVM, or Docker. And that goes for Hetzner too. It makes it easy to ensure that the app environment is identical no matter where you move things. I managed one environment where we had stuff running on-prem, in several colos, on dedicated servers at Hetzner, and in VMs, and even on the VMs we still containerised everything - deployment of new containers was identical no matter where you deployed. All of the environment-specific details were hidden in the initial host setup.


Just a note, there are some companies that have been popping up recently to try to bridge the gap for services on clouds like Hetzner.

https://elest.io

https://nimbusws.com (I'm building this one so I'm biased for it).

> Would you say bare metal cost a lot of extra monitoring/maintenance, or is this something you did on the cloud hardware as well anyway? Do you run virtualization on the Hetzner machines?

It cost a lot of monitoring/maintenance up front, but once things are purring the costs amortize really well. Hetzner has the occasional hard drive failure[0], but you should be running in a RAIDed setup (that's the default for Hetzner-installed OSes), so you do have some time. They also replace drives very quickly.

If you really want to remove this headache, you run something like Ceph and make sure data copies are properly replicated to multiple hosts, and you'll be fine if two drives on a single host die at the same time. Obviously nothing is ever that easy, but I know that I spend pretty much no time thinking about it these days.

I run a Kubernetes cluster and most of my downtime/outages have been self-inflicted, I'm wonderfully happy with my cluster now. Also another thing to note is that control plane downtime != workload downtime, which is another nice thing -- and you can hook up grafana etc to monitor it all.

[0]: https://vadosware.io/post/handling-your-first-dead-hetzner-h...


My estimate: the company had a combined total of $200,000 in credits across both clouds, and the OP charged $10,000 for the migrations.


You're reasonably in the ballpark. It was complicated a bit because hosting on Hetzner was so much cheaper for them that $1 in credit was not worth $1 to them, as if/when they had to spend cash they spent substantially less than that at Hetzner.


Thanks. Yeah, I totally understand that dynamic. We have now run out of credits on AWS and are shifting some of our workloads into a datacenter ourselves. :)


Several years ago I had a string of clients who also came into enough Google Cloud credits to make a switch worthwhile.

For these companies it wasn't a problem to have a few minutes of downtime, so the task was simply recreating their (usually AWS) production environment in Google Cloud.


If it had been a one-off migration we might have done the same, but the end goal was Hetzner from the start, which meant their architecture needed to handle the HA piece anyway; doing it this way also served as nice validation that we really could fail over without anything going down.


I hate to be the bean counter, but what was the true cost ultimately, counting you and your team as resources?

It's nice that you ended up with a provider agnostic capability to deploy anywhere, but none of that was free in terms of ownership costs to get there.


I was the only person doing the migration work and setting up the HA setup and documenting it for them to take over. My fee for setting it up and doing it accounted for that 20x difference. Hetzner in the end was far cheaper for them to run including the devops work they contracted to me. I effectively got paid to reduce my future earnings from them. But that was fine; when I was doing contracting, that was a big part of my pitch - that I'd help drive down their costs and pay for myself many times over - if people were in doubt I'd offer to take payment as a percent of what they saved.

So, no, it wasn't free, but it saved them far more money than it cost them, both the initial transition and in ongoing total cost of operation.

In fact, my first project for them was to do extensive cost-modelling of their development and operations.


Ah, now I understand. Sorry, it hadn't joined up in my brain that you were providing this as a service as a third party. Thanks for the explanation.


See this older sibling comment https://news.ycombinator.com/item?id=30950842


At GitLab we went from AWS to Azure, then to Google Cloud (this was a few years ago). AWS was what we started with, and I think like most companies very little attention was paid to the costs, setup, etc. The result was that we were basically setting money on fire.

I think at some point Azure announced $X in free credits for YC members, and GitLab determined this would save us something like a year's worth in bills (quite a bit of money at the time). Moving over was rather painful, and I think in the months that we used Azure literally nobody was happy with it. In addition, I recall we burned through the free credits _very_ fast.

I don't recall the exact reasoning for switching to GCP, but I do recall it being quite a challenging process that took quite some time. Our experiences with GCP were much better, but I wouldn't call it perfect. In particular, GCP had/has a tendency to just randomly terminate VMs for no clear reason whatsoever. Sometimes they would terminate cleanly, other times they would end up in a sort of purgatory/in between state, resulting in other systems still trying to connect to them but eventually timing out, instead of just erroring right away. IIRC over time we got better at handling it, but it felt very Google-like to me to just go "Something broke, but we'll never tell you why".

Looking back, if I were to start a company I'd probably stick with something like Hetzner or another affordable bare metal provider. Cloud services are great _if_ you use their services to the fullest extent possible, but I suspect for 90% of the cases it just ends up being a huge cost factor, without the benefits making it worth it.


I love Hetzner, and host most of my own stuff on it. And it takes so little to be prepared to move. E.g. a basic service discovery mechanism, a reverse proxy and putting things in containers and you can migrate anywhere. Now that Hetzner has some cloud features too I see even less reason to go elsewhere (though in my current job we use AWS, but we use AWS with the explicit understanding that we're low volume - currently mostly running internal tools - and can afford the premium; if we needed to scale I'd push for putting our base load somewhere cheaper, like Hetzner)

One additional suggestion to people considering bare metal: consider baking in a VPN setup from the start, and pick a service discovery mechanism (such as Consul) that is reasonably easy to operate across data centres. Now you have what you need to do a migration if you need to, but you also have the backbone to turn your setup into a hybrid setup that can extend into whichever cloud provider you want too.

A reason for wanting that is that one of the best ways I've found of cutting the cost of using bare metal even further is to have the ability to handle bursts by spinning up cloud instances in a pinch. It allows you to safely increase the utilisation levels of your bare metal setup substantially, with corresponding cost savings, even if you in practice rarely end up needing the burst capability. It doesn't even need to be fully automated, as long as your infra setup is flexible enough to accommodate it reasonably rapidly. E.g. just having an AMI ready to go with whatever you need to have it connect to a VPN endpoint and hook into your service discovery/orchestration on startup can be enough.
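
To make that concrete: the user-data for such a burst instance can be tiny if the image is prepared ahead of time. A sketch, assuming the AMI already ships a WireGuard config at /etc/wireguard/wg0.conf and a Consul agent with its service definitions (unit names will vary with your packaging):

    #!/bin/bash
    set -euo pipefail

    # Join the private network that spans the bare-metal environment.
    systemctl enable --now wg-quick@wg0

    # Start the Consul agent so the instance registers itself and the reverse
    # proxy / orchestration can start sending work its way.
    systemctl enable --now consul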


Azure to GCP was right after GV became an investor. Again, credits were the reason to change over, not the poor Azure performance.


One of the things I'm immensely curious about is how you handle security/networking/firewalls when working with Hetzner or other bare metal providers? It seems they don't provide network gear for firewalls and to protect against DDOS attacks?

Do you just use iptables? Or do you build out more complex solutions like software routers running on Linux/BSD?

I work in online gaming, and we're constantly seeing attacks on our infra.


Hey! I also make games and we solved the issue by doing stateless ACLs on the border switches and having a really fat pipe.

You can add a magic header to traffic and drop anything that doesn’t contain the header.

Since this is done in hardware it operates at line speed. In our case 100GBit/s.


This seems like such a good trick. We could even do it at the Cloudflare rule level, I guess.


If you have zero-cost internal networking, I'd consider adding another server in front of the primary servers to act as a reverse proxy and/or firewall. Basically, you'd use that server as a firewall and then pass only the good traffic onwards to your game servers, which are probably bigger and more expensive.

If free internal networking isn't a possibility, then I'd probably use the included iptables as a firewall on each machine. You should honestly have this running on the game servers anyway, if only to restrict communication to between the reverse proxy and the game server.
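
As a rough sketch (addresses and ports below are made up), the per-machine rules really can be that small:

    # On a game server: only the internal reverse proxy may reach the game port,
    # SSH only from a bastion host, everything else dropped.
    iptables -A INPUT -i lo -j ACCEPT
    iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    iptables -A INPUT -p tcp -s 10.0.0.2 --dport 7777 -j ACCEPT    # proxy -> game port
    iptables -A INPUT -p tcp -s 203.0.113.10 --dport 22 -j ACCEPT  # bastion -> SSH
    iptables -P INPUT DROP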


Hetzner has some built-in DDoS protection, but you should add your own:

https://www.hetzner.com/unternehmen/ddos-schutz


I wish Hetzner provided RDS


So it's a bit in the future, but managed Postgres is something I want to offer through nimbus[0]. In the meantime, if you're interested in other managed services, please sign up!

[0]: https://nimbusws.com


Wow, never heard of Nimbus but the kickbacks for open source projects are an awesome idea - fair play.


Thanks! It's not out yet :) I've been working on it for a bit and I really think this is what's missing from the hyperscaler/cloud model.

If we just put a LITTLE bit back into (like 5% of revenue for any of the big companies honestly -- but I'm not big yet so more for me) the F/OSS ecosystem, can you imagine what kind of world we'd be in??

I want to live in that world, so I'm trying to make it happen.


I think your website is broken on mobile. The alignment is off and the contact form is cut off.


oh thanks -- I think I've got some image-jumping-border issues, looking into it right now!


It's still a bit early for the company but if you like RDS, PlanetScale might be worth a look too: https://planetscale.com/


I switched from Azure to DigitalOcean to Hetzner. Reasons were as you stated, simpler cost model, simpler technology.


Does Hetzner have servers in the US? If not, is your app OK with being hosted in the EU?


I was also wondering this so I just looked it up -- looks like they recently (well, Nov. 2021) expanded to Virginia.


My business is based in the UK which is compatible with EU’s GDPR, so it’s fine to be hosted there.


> I think at some point Azure announced $X in free credits for YC members, and GitLab determined this would save us something like a year's worth in bills (quite a bit of money at the time). Moving over was rather painful, and I think in the months that we used Azure literally nobody was happy with it. In addition, I recall we burned through the free credits _very_ fast.

That sounds like the worst reason to change cloud providers: "because that provider bribed me to"


Moved from Google Cloud -> Digital Ocean -> OVH.

Running our own stuff on high-powered servers is very easy and less trouble than you think. Sorting out the deploy with a "git push" and build container(s) meant we could just "set it and forget it".

We have a bit under a terabyte of PostgreSQL data. Any cloud is prohibitively expensive.

I think some people think that the cloud is as good as sliced bread. It does not really save any developer time.

But it's slower and more expensive than your own server by a huge margin. And I can always do my own stuff on my own iron. Really, I can't see a compelling reason to be in the cloud for the majority of mid-level workloads like ours.


> Really, I can't see a compelling reason to be in the cloud for the majority of mid-level workloads like ours.

I work on a very small team. We have a few developers who double as ops. None of us are or want to be sysadmins.

For our case, Amazon's ECS is a massive time and money saver. I spent a week or two a few years ago getting all of our services containerized. Since then we have not had a single full production outage (they were distressingly common before), and our sysadmin work has consisted exclusively of changing versions in Dockerfiles from time to time.

Yes, most of the problems we had before could have been solved by a competent sysadmin, but that's precisely the point—hiring a good sysadmin is way more expensive for us than paying a bit extra to Amazon and just telling them "please run these containers with this config."
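
And that config really is small. A rough sketch of a Fargate task definition (names, sizes and the image are placeholders, and a real one also needs an execution role for pulling images and shipping logs):

    {
      "family": "web-app",
      "requiresCompatibilities": ["FARGATE"],
      "networkMode": "awsvpc",
      "cpu": "256",
      "memory": "512",
      "containerDefinitions": [
        {
          "name": "web",
          "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web:1.4.2",
          "portMappings": [{ "containerPort": 8080 }],
          "essential": true
        }
      ]
    }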


> None of us are or want to be sysadmins.

It's such a huge misconception that by using a cloud provider you can avoid having "sysadmins" or don't need those skills. You still need them, no matter which cloud and which service you use.


Which skills specifically do you think we might be missing that we would need to run an app on a managed container service and managed database?

I know how to configure firewalls, set up a (managed) load balancer, manage DNS, and similar tasks directly related to getting traffic to my app.

What I no longer have to know how to do: keep track of drive space, manage database backups, install security updates on a production server without downtime, rotate SSH keys, and a whole bunch of other tasks adjacent to the app but not actually visible to incoming traffic at all.


You still need to do backups; a database backup is just one part of that. If you are not following the 3-2-1 rule and don't test your restore mechanism, you don't have reliable backups.

Those things you listed are still sysadmin tasks in my eyes, and you are doing them, validating my point.

You still have to track storage space, either because you are paying for it and need to expand when necessary, or because you have to manage costs at some point; that's not completely out of the picture. It can be easier, for sure, than building your own storage hardware.

You still need to keep systems up to date: either you are using Docker, so you are doing it at the "application level", or you are using Linux VMs and you need to upgrade those systems/images. Even if you are using something like Functions or Lambda, those have their own environment which you need to be aware of, and they usually support specific versions of programming languages, so you need to upgrade your own stack when they don't support older versions anymore.


I tell you that ECS has eliminated a ton of extra work for my team for a bargain price, and your response is "but you still have to do x, y, and z!" It's like saying that I shouldn't buy a dishwasher because I'd still have to wash the pots.

Yes, we still need to do some sysadmin-y tasks. But ECS handles so many of them that we actually have the time, energy, and knowledge to take care of the few that remain.

(As an aside, keeping language and OS versions up to date becomes a development task rather than an ops task when running Docker + ECS. We increment a version number in the repository and test everything, the same as we do for any library or framework that we depend on.)
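
Concretely, the thing being bumped is often nothing more than a pinned base image tag; a placeholder Dockerfile sketch:

    # Bumping the pinned tag below and re-running the test suite is essentially
    # the whole OS/language upgrade workflow.
    FROM python:3.12-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    CMD ["python", "app.py"]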


> As an aside, keeping language and OS versions up to date becomes a development task rather than an ops task when running Docker + ECS.

It's a development task with a proper bare metal setup too.


If you use purely PaaS offerings (or FaaS as well), then you also don't really need sysadmins.

That's not to say that you can get away with knowing _no_ sysadmin skills in these scenarios, but you don't need to have someone on staff who knows the ins and outs of Cassandra or Mongo or whatever you're using. In awful workplaces with high turnover, it's worth it for management to opt for these managed services so that when the overworked tech lead decides to rightfully bail on them, she/he doesn't leave them in the lurch. (Note: I'm not defending these workplaces, but just explaining that when they can't keep adequate in-house talent to manage their own services, it makes financial sense to outsource it, and pay the "cloud tax").


I think the problem with cloud environments is you do not "need" sysadmins - it is not obvious you need them, so what you end up with is a bunch of systems glued together without much thought, and then crazy things like HTTP logs not being turned on for your various services, insane service costs b/c of not understanding pricing tiers, etc..


The difference in ops between setting up a couple of Lambdas or Fargate containers and provisioning your own servers is substantial.


In fact, if you're using Linux on your workstation you'll use the same skills locally as you do on the VPS/bare metal (depending on your scale.) Arguably "cloud" services need more sysadmin skills, not less.


That's a very big if.

I have yet to work with a $corp that uses Linux for workstations.

Overwhelming majority uses Windows. Some use macOS.

The occasional developer that uses Linux will usually be in a VM or, if IT policies allow, WSL.

So yeah, running cloud services doesn't require sysadmin skills, unless you count copy-pasting from official documentation as "sysadmin skills".


That's funny... every team I've been on in the last 10 years has used Linux workstations almost exclusively, with a few Macs here and there.


In 27 years, I've had exactly two jobs where I didn't have Linux on my desktop, for a total of 5 out of those 27 years. In both cases, I still did all of my dev work on Linux.

It boils down to what kind of jobs you look for.

> So yeah, running cloud services doesn't require sysadmin skills, unless you count copy-pasting from official documentation as "sysadmin skills".

If that's the extent of how you're managing your cloud setup, then I could equally argue running bare metal servers doesn't require sysadmin skills either. When I did contracting, a large part of my income was to come in and clean up after people had relied on "copy pasting from official documentation" as a substitute for actual ops.


It's far easier to maintain my own Linux workstation than an internet-facing server used daily by customers.


Absolutely but most of the knowledge translates, it's the procedures that differ.


It's those different procedures that I'm trying to avoid. It's not that I couldn't do those things or learn to do them, it's that my time is best spent building and improving our applications, not keeping servers running, secured, and up to date.

At some point we hope to get to the scale where it makes sense to pay a human to do that, but at this point the additional cost incurred by an ECS instance over an equivalent server is negligible.


Very similar experience here. I work on a two person "DevOps" team. Without AWS ECS we would have to have a much higher headcount. I get to spend most of my time solving real problems for the engineers on the product team rather than sysadmin work.


What are “real problems” for the engineers or product team?


Things like automating manual workflows, building small infrastructure debugging tools, or providing infrastructure consultation to an engineer trying to decouple two parts of a legacy code base.


Managed container services (like Amazon ECS) are a sweet spot for me across complexity, cost, and performance. Mid size companies gain much of the modern development workflow (reproducibility, infrastructure as cattle, managed cloud, etc.) using one of these services without the need to go full blown Kubernetes.

It's lower level than functions as a service, but much cheaper, more performant, matches local developer setups more closely (allowing for local development vs. trying to debug AWS Lambda or Cloudflare FaaS using an approximation of how the platform would work).


Very much agree - due to a coworker leaving recently, I'm looking after two systems. They're both running on ECS and using Aurora Serverless.

My company takes security very seriously so if these two systems were running on bare-metal I'd probably be spending one day a week patching servers rather than trying to implement new functionality across two products.


I can bet our team is smaller than yours.

And yet… Sysadmin tasks take up maybe 2 hours a month.

Your theory is right though if no one on your team knows how to set up servers. In your case the cloud makes sense.


To the peeps running ECS? Why not just straight up AKS or GKE? Have you compared ECS to Cloud Run on GCP?


In my case, mostly because it was easier to get buy-in from the rest of the team on ECS than Kubernetes.


"Infrastructure is cheaper than developers(sysadmins)" all over again.


I also found that running a PostgreSQL database is really simple. Especially if most of your workload is read only, a few dedicated servers at several providers with a PostgreSQL cluster can deliver 100% uptime and more than enough performance for pretty much any use case. At the same time, this will still be cheaper than one managed database at any cloud provider.

I've been running a PostgreSQL cluster with significant usage for a few years now, never had more than a few seconds downtime and I spend next to no time maintaining the database servers (apart from patching). If most requests are read only, clusters are so easy to do in Postgres. And even if one of the providers banned my account, I'd just promote a server at another provider to master and could still continue without downtime.

I recently calculated what a switch to a cloud provider would cost, and it was at least 10x of what I pay now, for less performance and with lock-in effects.

But I understand that there are business cases and industries where outsourcing makes sense.


Can you share more details? Because I'm in the process of doing the same, since having a few terabytes of PostgreSQL / DynamoDB is stupidly expensive.

For a lot of big organizations it's a matter of accountability. If they can say "AWS went down" vs. "our dedicated servers went down", it matters a lot for insurance and clients.

What I don't get are 4-man startups paying thousands to AWS... because everybody does it.


As I said, if most queries are read-only it's really simple. Streaming replication works very well out of the box, just make sure you keep enough WAL segments on master so that slaves can catch up after some downtime.

I have a 1-1 relationship between application servers and databases. The application queries replication delay and marks itself as unhealthy and reports an error if the delay is too high. You can also do that via postgres (max_replication_delay), but I found this way to allow for more graceful failovers.

With streaming replication, servers are completely identical, so you can easily provision a new server. Failover is done by just one command on a slave. I don't have automatic failover as I only needed to use that once in several years (and that was on purpose), I'd rather accept downtime than having an unwanted failover.
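
For reference, both pieces can be one-liners in stock Postgres (the data directory path below is a placeholder):

    -- On a replica: how far behind replay is (what the health check can look at)
    SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;

    -- Promote the replica to primary (PostgreSQL 12+) ...
    SELECT pg_promote();
    -- ... or from the shell on the replica host:
    --   pg_ctl promote -D /var/lib/postgresql/data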

With that setup you can always failover and can scale read operations really well. There are solutions for postgres if you need more complicated setups, but I never looked into them.

If you're in Europe, it's really cheap to get a dedicated machine from Hetzner with a few TB of NVMe. Just pay the extra money for 10gbit link, otherwise replication will take forever. But there are also some decent providers in the US, it's just more expensive. But with Hetzner, a two machine setup will be <$500 per month for really beefy servers.

I'd just be careful with using block storage, I often found that to be a bottleneck with database servers. Local storage is almost always much faster.

But in the end it depends on your use case. In the end, your database will usually go down because of a bug in the application or some misconfiguration. Both can happen on any service. It's really so rare these days to lose a server without notice. And Postgres is really stable, I've never seen it crash.


Maintainability is much easier on a well-working cloud setup for people who potentially have less knowledge.

One company had 6 servers and used AWS snapshot for backup + managed MySQL.

Backup and recovery of that DB is possible for more people on the team than if it ran as a non-managed service.


In my company, we were aware of the potential honeypots in each cloud and we developed our product from the first commit to be deployed on 3 (!) clouds: AWS, Azure, IBM.

And while we made it work by sticking to the least common denominator, which was FaaS/IaaS (Lambda, S3, API GW, K8s), it was certainly not easy. We also ignored tools that could've helped us greatly against only a single cloud in order to stay multi-cloud.

The conclusion after 2 years for us is kind of not that exciting.

[1] AWS is the most mature one, Azure is best suited for Microsoft products and Old Enterprise features. And IBM is best if you use only K8s.

[2] Each cloud has a lot of unique closed-code features that are amazing for certain use cases (such as Athena for S3 in AWS or Cloud Run in GCP). But relying on them means you are trapped in that cloud. Looking back, Athena could have simplified our specific solution if we were only on AWS.

[3] Moving between clouds, given shared features, is possible, but is definitely not a couple of clicks or a couple of Jenkins jobs away. Moving between clouds is a full-time job. Finding how to do that little VM thing you did in AWS, now in Azure, will take time and learning. And moving between AWS IAM and Azure AD permissions? Time, time and more time.

[4] Could we have known which cloud was best for us beforehand? No. Only after developing our product did we know exactly which cloud would offer us the most suited set of features. Not to mention the different credits and discounts we got as a startup.

Hope this helps.


Why did you feel IBM is best when you only use K8s? Our platform is fully K8s and I'm looking for somewhere we get more performance per buck, probably not IBM, but I'm surprised it's even in the list. Do they have an extra nice K8s dashboard or something?


Not sure where you're currently at and considering moving to, but GKE has been much simpler than EKS. Not sure about cost but it'll likely save some operations time (auto scaling is a single check box, no IAM, scaling controllers, etc)


IME GKE on GCP is top-tier if you want a painless managed k8s offering. I've been running my business on it (solo founder) since ~2017 with minimal fuss. Hosting a couple dozen WP sites and some bespoke webapps/apis.


Second this question. Never heard of IBM being exceptional for k8s. Curious to know what makes OP say it’s so good.


Hi, I was in an IBM Startup Accelerator. I just got the feeling that they were pushing for it hardcore. Gave us startup credit, free training and free premium support for K8s. So If you want to go with K8s and you are a Startup, that is the best offering I experienced.


Maybe they're using OpenShift (OKD). OpenShift has a bunch of value-add features: they add a template marketplace and turn it into a hybrid PaaS with some other multitenancy management bits, AFAIK.


OKD is slightly different than the OpenShift (OCP) provided on IBM Cloud. OpenShift is on all clouds but IIRC OpenShift on IBM Cloud receives OCP updates first.


I'd throw in stability, IAM, storage and the management plane.


Cloud Run works on all three major clouds, and VMWare, and Bare Metal. No lock-in here.


Do you mean GCP Cloud Run [1] ? I would love to have it on AWS and Azure if you have a link to share. Or do you mean it's possible, but through different services on each cloud?

[1] https://cloud.google.com/run



I assume the OP is referring to Knative[0] which is the framework powering Cloud Run behind the scenes.

[0] https://knative.dev/


No, Knative Serving, via Anthos fleets.


Athena isn't fully closed source. It's a customized, hosted version of Presto originally built by Facebook.

Apache Drill is in a similar space to Athena and can query unstructured or semi-structured data in object storage/S3.


You are correct, but as I understand it, they get "raw" file access to the S3 hard disk (or the equivalent) in their solution. So no matter what solution I might spawn, it will always be slower and more expensive.

Maybe I should have said "closed internal access features"


Doubtful. I've personally had projects pull many hundreds of gigabits per second of s3 throughput. How you architect and design has a large influence to your analytics performance.


“Many hundreds”

Last time I talked to a technical person at AWS the limit was 5GBits. Wonder what you’re doing differently.

Perhaps that changed.


There's another benchmark somewhere showing S3 can max out a 100Gbps instance.

https://github.com/dvassallo/s3-benchmark

Another potential issue is ListBucket rate limiting. If you have lots of small objects, you'll spend more time waiting to discover the names than transferring data.


you were quoted a per-object rate.


Was it a business or technical decision to do multi-cloud?

Did you run simultaneously in 3 clouds? Can you explain the setup?

If not, did you just run on each for a while to test, or have a reason to switch?

This is probably an impossible question to answer, but: were the savings/benefits of doing this actually worth the engineering costs involved in the end? Eg even if you chose what turned out to be the most expensive, worst option, would the company ultimately have been in a better place by having engineering focused on building things to increase customer value instead?


> Was it a business or technical decision to do multi-cloud?

> Did you run simultaneously in 3 clouds? Can you explain the setup?

The solution itself could be running on a single cloud. But we work in the finance sector and targeting highly regulated clients. And we got a tip very early on, that each client could ask for deployment on their cloud account that is monitored by them. Which will probably be AWS or Azure. Today we know only some require that. So it helped somewhat.

> were the savings/benefits of doing this actually worth the engineering costs involved in the end?

Like you said, very hard to know. In our case, we had a DevOps/cloud guy working it as a full-time job, so it was not noticeable. The reason being, probably:

[1] Because although he had problems to solve on all clouds, cloud deployments eventually get stable enough, so the pressure was spread.

[2] Although all clouds still need constant maintenance, it's asynchronous (you can't plan ahead for when AWS EKS will force a new K8s version), so the pressure was spread out and it never stopped client feature building.

But who knows, maybe for other architectures or a bigger company, it would have become noticeable.


Also cold start times for serverless differ greatly between those cloud providers. AWS is < 1 second, whereas Google Cloud is 5-10 seconds


This is pretty misinformed. Each provider has multiple “serverless” offerings and cold start time has much to do with your specific application and what it’s doing on start up.


None of my workloads (5x Cloud Run services, 10's of Functions) have anywhere near 5s cold starts. More like 2s with network latency.


Thanks for acknowledging how much harder this is when you use a cloud-specific feature. Modifying your codebase to migrate off some cloud specific service seems like it would be by far the hardest part of switching clouds.


Thank you for this very useful answer!


Yeah, but it was only for a side-project, so I only had a single VM to migrate.

I went from AWS (cost ~£25/mo) to Microsoft Azure DE because I didn't want any user data to be subject to search by US law enforcement/national security. I thought the bill would be about the same, but it more than quadrupled almost overnight even though traffic levels, etc., were the same (i.e., practically non-existent).

What was happening was Azure was absolutely shafting me on bandwidth, even with Cloudflare in the mix (CF really doesn't help unless you're getting a decent amount of traffic).

In the end I moved to a dedicated server with OVH that was far more powerful and doesn't have the concerns around bandwidth charges (unless I start using a truly silly amount of bandwidth, of course).


+1 on dedicated server. It's so much simpler & more cost-effective than the cloud.

10 big dedicated servers can probably handle the same load as 100s to 1000s of cloud nodes for a fraction of the cost. Configuration and general complexity might even be simpler without the cloud.

It's not as hard as people make it out to be to set up backup and redundancy/failover.


Absolutely. And dedicated servers at many providers (e.g. Hetzner, OVH) can be provisioned much like vms. So the only real difference is that usually there's a minimum contract of 1 month, but often at a price where it's cheaper than running a VM at a cloud provider for 30% of the time.


There are two kinds of cloud users - those who treat their cloud as a VM to use, and those who actually use all the fancy API features.

The first group are almost always better served by dedicated VMs or hardware from a provider specializing in them, if the VM is long-lived.


I'm still not a fan of the second way. If you develop all your software to tightly integrate with AWS, you might save time developing the software but create a huge amount of technical debt.

Managing your own infrastructure (with dedicated servers, so no hardware management) isn't too hard, even if you're a small shop. And managing a fleet of AWS services isn't necessarily less work.

Maybe there's a reason all ads for cloud tend to compare it to running your own data center. Because once you get rid of hardware management, it's not really that much easier being in the cloud, at the risk of lock-in and huge surprise bills.


It’s not technical debt if it’s making you money. I would much rather solve the business problems than managing services myself.


It's technical debt if you may need to change your product because a third party makes changes. The beauty of software with few dependencies is that you can run decades-old software on a system just fine with no need to regularly refactor.

I know how tedious it is to maintain decades old enterprise Java software, but from a cost perspective, it makes much more sense to keep those rather than constantly refactor to chase the newest trend.

As an example, if you had a software that was written 20 years ago to store data in a relational DB, updating it to work with current versions of that database system won't be much work (if any). If you rely on managed services, I wouldn't be too sure that you get away that easy.


> As an example, if you had a software that was written 20 years ago to store data in a relational DB, updating it to work with current versions of that database system won't be much work (if any). If you rely on managed services, I wouldn't be too sure that you get away that easy.

This very much depends. In the example you gave, nothing changes if you used managed services or not.

But you could argue that the 20-year-old software is technical debt preventing the upgrade of the database, due to the source code being lost, or the library used to connect to the latest version of the database no longer existing, requiring a rewrite in a modern language or framework. Etc.

Technical debt really is about code that cannot easily be modified to adapt to the requirements of a business.

If you wrote some code, and it was trash, made no sense, in an obscure language that few people know, with no comments. Yet it ran for 10 years flawlessly with everyone too scared to look at it, but made the business money. It's not technical debt until it needs to be changed/modified.


> Managing your own infrastructure (with dedicated servers, so no hardware management) isn't too hard.

If you have infra skills then absolutely, it's way simpler to manage. But infra people like us don't really fit in "small shops" because the price tag for one of us is (depending on the cost of living) anywhere between a quarter and half a mil total comp. And if you ever want them to take a vacation not on-call, you'll need at least three. I say this against my own interest: just go with the managed services and consider it "good problems to have" when you feel the itch to hire some infra person to clean up the debt.


Maybe in SV the comp is that high. I only have insights into European markets, but here the comp doesn't differ too much from programmers. And an AWS expert won't be cheaper than someone with knowledge how to manage infrastructure.

And yes, you'll need three people, but not full time. From experience I can say that even with a hundred servers, it tends to be a small chunk of each of their time. And if you have a redundant system and don't deploy on Fridays, the chance that someone has to respond to a call on the weekend is pretty much 0.


That's something I find so fascinating. The "cloud" will almost always be more expensive and "not worth it" if you are only using the IaaS services. I mean, look at the numbers, everyone sees that.

Cloud only ever is worth it when one uses the higher-tier services, like AWS Lambda and the likes. Even running Kubernetes in the cloud is only semi worth it, because it's not high enough in the stack and still touches too many low-level IaaS services.

Of course, higher tier means more vendor lock-in and more knowledge required and all that. But if you are not willing to pay that price, then OVH, Hetzner and the likes will have better offerings for you.


The problem, as many have already pointed out around this thread, is that in an enterprise env. you can't really do that too much anymore. And as a result that starts being felt by non-enterprise shops too.

And you can't really do that because people don't really wanna deal with on-prem shit and server hosting.

Technically speaking, I am RHCSA certified; I know how to do all of this on-prem, hybrid stuff. I don't even bother looking at job offers from companies that aren't cloud-based (even if I would get a 10-15% increase, or more if coming from the financial sector), because I genuinely can't be arsed to deal with all that bullshit again.

I'm done with caring about disk space, and hw firewalls, and configuring bs in Linux. Fuck iptables, let me manage everything from a (network) security group. Fuck Traefik and F5 and all this bs, let me just plop an Application Gateway in Azure or API Gateway in AWS. Fuck database clusters. At this point, I haven't even configured an apache/nginx server in a couple of years. Web Apps in Azure are more than fine; and for the rest, K8s.

As a result, good "classic" sysadmins are a dying breed even at the enterprise level. So they're even more rare and less accessible for small/medium-sized businesses. If I go to my IT dept. right now, I can guarantee 80% of them would be completely lost trying to set up and use AD; AAD is just too convenient.

That basically leaves you with: move to cloud, or learn how to do all of these things by yourself. And those things take time (to learn and to manage)

It's like deciding to make apps with Perl. Can you do it? sure. But you'll probably have to do it on your own.


Oh god, quadrupling from £25/month for a small low-traffic project as you have described sounds like daylight robbery!


Sure, but think about this: that little side project now has about 100x the traffic it did when my bill jumped.

That still isn't much traffic (at all) in the grand scheme of things, and Cloudflare's lowest paid tier deals with around 80% of the bandwidth. Still, it's not hard to imagine that bill blowing up to several hundred pounds per month had I chosen not to act. That would translate to several thousand pounds over the course of a year. I don't know very many people for whom such an expenditure, particularly if it's unnecessary and avoidable, would be something they'd regard as insignificant.

Putting it into everyday terms: it could have grown into my second largest outgoing after my mortgage. That doesn't really seem proportionate or sensible, so why wouldn't I look for a better deal?


It might be different if you're a developer in SV, but I'd say for most people, that's not insignificant. That's more expensive than most hobbies, I wouldn't spend £100 on a side project if there's an alternative.


Switched from GCP to my own server hardware. After doing the math it came out that it would pay for itself in less than a year. Depending on individual usecase, cloud can be prohibitively expensive and running a server for a small business really isn't nearly as hard as they'd have you believe


Cloud isn't about cost though. Cloud is about value. You can scale super easy when you inevitably need it (assuming you've "made it" anyway - whatever that means). You get a burst of new users, it's trivial to add additional nodes (again, assuming you've set up your infra to be easily scalable).

With own hardware, scaling is not as easy. You'll have to do a lot more around plumbing too. Networking, security, many other things that you'll have to address. Stuff that has already been solved for you.


That "inevitably" is doing some really heavy lifting in your post. It practically never comes for most companies.


But most companies' dream is to reach that ""inevitable"" point where the cloud saves it; it becomes a bet where they either reach that level of scale, or die trying.


> But most companies' dream is to reach that ""inevitable"" point where the cloud saves it; it becomes a bet where they either reach that level of scale, or die trying.

And how many die trying due to bleeding all the funding to AWS, instead of running everything off a couple cheap boxes underneath the CTOs desk? I've been in at least one which ran out of money that way.

Don't pay for the future pipedream now, pay for what you need. That "inevitable" dream of near-infinite scale up usually never arrives for most companies. If it does, worry about it then.


AWS gives away literally tens of thousands of pounds to start ups. Granted, it's to lock them in down the line but getting free credits gets you further than not having credits.


AWS credits are a toxic trap. BTDT. They'll get you far enough that you are hopefully (from AWS perspective) locked in pretty tight. And then the high monthly costs start to hit you hard.


Sure, that's true. But I've been in companies where they've had major investments and a pretty significant number of users, and they still had stupid amounts of credits. One of my previous companies was running a third of their infra on AWS credits, and they still had a long runway in terms of free credits.

Meanwhile, they had to shake off a £1 million a year contract for the next 5 years for 2 DCs. With AWS they were using less than half of that per year (this includes the credits they had). But even if it wasn't cheaper, requesting a new server took days, not minutes. Scaling was not possible. I'll take the credits and potentially get trapped rather than having to deal with the inflexible mess that is in-house managed infrastructure.

At least bigger orgs are able to afford it by (hopefully) building a cloud on top of their infrastructure, but outside that, the majority of companies should be looking into the cloud. Whether that be AWS, GCP, or smaller clouds like DO, it doesn't matter.


> But I've been in companies where they've had major investments and have a pretty significant number of users, and they still had stupid amounts of credits.

Your companies have been wiser and more frugal than mine!

In every case, I've seen the credits run out before there was a penny of revenue coming in.


Surely if they are that successful, they also have the resources to redesign/upgrade their backend?


Yes, but what about the million new customers you're going to get next week?


Having plans for what to do in case of a success disaster is good. Spending in expectation of having a success disaster can be disastrous.


Funny, but honestly, a single dedicated server should easily be able to handle a million users, for most CRUD apps.


The cloud has normalized terrible underpowered VMs, so many new developers may just not be aware of how much performance a real dedicated machine has - even a relatively "mid range" one (i7 & 16 GB RAM).


> The cloud has normalized terrible underpowered VMs, so many new developers may just not be aware of how much performance a real dedicated machine has

This. I'm now seeing many younger developers with no exposure at all to hardware servers, only underpowered cloud VMs.

Not sure how to solve this, but I certainly encourage everyone to spend time now and then benchmarking their cloud setup vs. local hardware, to at least understand the performance spread and cost tradeoffs you're making.

I've seen too many setups paying 4 and even 5 digit US$ monthly bills to AWS, for workloads that could have been served off a single $1000 box without it breaking a sweat.
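
Even a crude comparison makes the gap obvious. For example, run something like this on both a cloud VM and a dedicated box (sysbench and fio need to be installed; treat the numbers as rough indicators only):

    sysbench cpu --threads="$(nproc)" --time=30 run
    sysbench memory --memory-total-size=10G run
    fio --name=randread --rw=randread --bs=4k --size=4G --numjobs=4 --direct=1 --group_reporting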


This! I benchmarked a whole bunch of different cloud providers for fun (& my bachelor thesis) and was impressed by how badly some cloud VPSes perform. Considering the really steep price to get any kind of significant memory/CPU resources with the major cloud providers, as well as the steep bandwidth charges, this little experiment was eye-opening.


Would be very interested in reading this thesis, my email is in my profile


Could you please share your thesis? Maybe over email?


I appreciate the interest; unfortunately I can't share, as it is under a non-disclosure agreement. But since there seems to be so much interest in the benchmark part, I will see if I find the time to recompile the data and publish it as a blog post.


I'd be interested as well.


Funnily enough, that was a use case for dedicated servers I encountered. Usage-based pricing is great if you have an unlimited budget, but most companies would rather have predictable costs and not serve every traffic spike.

Especially for internal use (CI/CD, analytics), you'd rather queue a few things up than always have to consider your budget when you want to run something.


Doing it right now. Entire company is migrating from AWS to Azure for reasons I can't discuss, and I'm currently tasked with this migration in the team I am in.

Honestly? It's quite fun. Despite considering myself more of a programmer than devops, I really like the devops stuff - as long as I'm doing it as part of a team and I know the domain and the code - and not being that general devops guy who gets dropped into a project, does devops for them and gets pulled into another one to do the same.

Working out all those little details of switching from AWS to Azure is fun and enjoyable, and I also feel like I'm doing something meaningful and impactful. Luckily there's not much vendor lock-in, as most of the stuff lives in k8s - otherwise it would be much trickier.


AWS to Google Cloud. Already mature product (public company). Many potential customers are strongly Amazon-averse. Switching to GCP won some deals that were being lost otherwise.

Anybody's cloud strategy should try and stick to the most basic services/building-blocks possible (containers/vms, storage, databases, queues, load balancers, dns, etc) to facilitate multi-cloud and/or easy switching.

Not that each cloud doesn't have its quirks that you'll have to wrap your head around, but if you go all in with the managed services you're eventually going to have a bad time.


I concur; in my experience the biggest driver of growth for Azure and GCP is that customers of SaaS companies and consulting companies make it a requirement to choose anyone but AWS. Legacy companies are terrified of Amazon.

Google does have some innovative big data products like BigQuery and Dataflow. In general, choosing GCP over AWS shouldn't hinder a company's growth at this point IMO.


>customers of SaaS companies and consulting companies make it a requirement to choose anyone but AWS. Legacy companies are terrified of Amazon.

Is there a particular reason for this?


Amazon sell a lot of stuff and consequently compete with a lot of legacy companies: brick-and-mortar retail in nearly all fields (fashion, food, tools, appliances, books, etc.), other online marketplaces (often with a mail-order past) and even some tech companies.

Generally, companies are unwilling to give money to their competitor, which reinforces this.

They also might want to avoid some PR issues, as using your competitors product can lead to juicy stories depending on the situation/field.

If they are somewhat paranoid, they can also be reluctant to have their data accessible by their competitor (and I'm not talking DB access there; even something like ELB logs can give valuable information - IIRC such a story did pop up a few years ago).

Working for a marketing SaaS solution, requests from our clients to not be hosted on AWS are still quite common.


Apparently Walmart and Target stay away from AWS, but most of it seems to be FUD about, I guess, Amazon potentially targeting their competitors and suppliers by stealing data stored in AWS or using it for strategic business moves. It would be a swift death for AWS if they were found to be doing this, though, so IMO the fear is unfounded.

https://www.forbes.com/sites/andriacheng/2019/07/14/amazon-a...


Netflix is hosted on AWS to name just one such instance. I have heard that internally the customer names are obfuscated, including other Amazon services. Indeed it would be a death sentence to prioritize themselves.


Moved a project with around 600k monthly users from Heroku / Google (split setup) to full AWS setup.

The whole process took around 3 months, from creating the AWS account to the point when all production environments were running on AWS and Heroku was "shut down". There was some planning ahead of this as well, so the actual time varies.

Heroku was a heavily limiting platform (for example, they didn't and still don't support HTTP/2) and we needed more control over our infrastructure to support further growth without paying enormous costs (for example, Redis prices on Heroku are just mind-blowing).

Also, as we were about to open a few new markets, Heroku would have required a lot of manual work to get everything working, something which is really, really simple with Kubernetes.

Our monthly costs did go up vs. what we had at Heroku at that time, but we're getting a lot more control and bang for the buck.

Regarding convincing stakeholders, you really need to have good reasons to do it. These kinds of switches are not cheap nor easy and come with a bunch of risks. The easiest thing to sell is always pricing, but in that case you have to show calculations (big guys like AWS and Google have pretty decent calculators you could use) which show the switch is worth it.
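To illustrate the kind of calculation I mean (the numbers below are made up, not from any real migration), the pitch usually boils down to a simple break-even:

    # break-even for a migration, illustrative numbers only
    current_monthly = 12_000   # current provider bill, USD
    target_monthly = 8_500     # estimated bill after the move, USD
    migration_cost = 40_000    # one-off engineering + overlap costs, USD

    monthly_savings = current_monthly - target_monthly
    breakeven_months = migration_cost / monthly_savings
    print(f"saves {monthly_savings}/month, pays for itself in {breakeven_months:.1f} months")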

As I was moving from a small player (Heroku) to a big player (AWS), I also had other good reasons (better CI, better logging, better performance overview, more control in general). So it really improved a lot of things for the developers, devops and users.


What are you using for compute, EKS or Fargate? We just helped someone switch from Heroku to AWS and they dropped their cloud costs by 67%. This is using ECS + EC2. Fargate is typically 2x more expensive.


Spot/Reserved instances are used.

I probably should have clarified that the extra cost was expected, as we did a lot more in AWS than we could in Heroku; we used the switch to start using a bunch of stuff AWS offers like Lambda functions, CloudFront, RDS etc. Stuff that we just didn't (and couldn't) use on Heroku, and thus didn't pay for.

As the purpose of the switch was to get more control, more features and out of Heroku's "black box", higher costs were expected and perfectly normal.


Is that taking spot/reserved instances into account (for both)? AFAIK Fargate supports both of those.


Without Spot/Reserved incentives. Could get even more if that’s done.


yes.

I've done AWS, Azure, Google.

My basic impression, as a software engineer/site reliability engineer, is that Google >> AWS >> Azure.

This relates to sophistication of offering and design of cloud.

The dominant questions look like this:

- how familiar is it to the infra people?

- can we implement appropriate governance concerns?

- how tightly bound is your important code to the specific cloud?

I have generally focused on Kubernetes in the last 5 years, to allow the service layer to be relatively portable. This is very useful in the switching/migrating question.

My general thought process is not to use cloud services unless it's very obvious (EC2, S3, etc.); I prefer to have k8s services provide that capability and use the cloud provider as portable COTS.


DigitalOcean -> Hetzner Cloud. Simply realized it's much cheaper for my use case (single instance running everything), even doubled my ram from 2GB to 4GB and it was still cheaper. It's also a European company which helps (more trustworthy IMO). Also took the time to simplify my deployment, which was nice.


I was a very early AWS client (first year of EC2). I moved to Azure for better AI/cognitive features, SSO, AD, blabla; also MS being a better open source steward, better docs, better VS Code plugins. I also felt AWS's stance re: the Kibana/Elastic license is/was not aligned with my desire to not live in a dystopian version of the future.

It was painful but Az has improved a lot of the sharp edges I encountered.


I've helped around 900 companies make the switch (to GCP, in my case), and I can confirm based on their results over time that the differences are not as small as they might seem. For so many, it's just ease of use and efficiency to get things done; for others it's attention and partnership; for a third cohort it's absurd cost advantages; and for a fourth it's performance and reliability. Our customers see gaps in one or many of those four areas? They move.


I migrated a previous company's infrastructure from AWS to Fly.io

Our AWS bill was the main reason. It was far higher than it should have been for the traffic we were serving. Even after we'd halved our AWS bill (the original bills had been crazy), it was still kinda high

Fly was a pretty clear choice when we looked at the lower costs and ease of transitioning from single-region to multi-region infrastructure

I'd been nudging the CEO about doing a migration for about a year before we decided to make the move. When I found that I couldn't really get our AWS costs any lower and did a full cost estimate of Fly vs AWS, the wheels moved reasonably quickly

The CEO primarily cared about lowering our monthly costs and being able to do the migration reasonably quickly (~1 month)


I know of several retailers and companies serving retailers that switched away from AWS around the time they bought Whole Foods. Before our company switched away, we had multiple retailers say that they would not use our services hosted by one of their biggest competitors.


Switched from Amazon to Google because I hate Amazon more than Google.

Feature-wise, I'm just as happy. However, I trust Google more, but that probably boils down to my hatred again. :)


Interesting that you trust Google more in this regard. Given Google's terrible history of deprecating products, I would not trust any "real world" business to any of their services.

Also, at some point I was playing with Serverless in both Google and AWS. Google's Serverless examples were broken (Google cloud was returning 500 errors) while the same stuff in AWS worked smoothly. That left me with a bad taste.


> Interesting that you trust Google more in this regard. Given Google's terrible history of deprecating products, I would not trust any "real world" business to any of their services.

That doesn't apply to me, because I have never used any of Google's deprecated products. I assume that's because they haven't (or have they?) deprecated any of their cloud services.

> Also, at some point I was playing with Serverless in both Google and AWS. Google's Serverless examples were broken (Google cloud was returning 500 errors) while the same stuff in AWS worked smoothly. That left me with a bad taste.

Good for you!

I have used Google Cloud Run for more than a year now, and can't be more happy. Never had problems with AWS either, which means that there are at least two cloud providers providing the same-ish service that lots of people can enjoy.


>Dozens of decade+, billion+ customer contracts at this point make shutting GCP down a silly thought.

Really crappy that the demos sucked tho, sorry :(


https://www.aljazeera.com/economy/2020/7/8/google-shut-down-...

https://cloud.google.com/support/docs/shutdown

Also, there have been plenty of Google products with paying customers that Google has shut down.

At this point, Google has zero credibility with me.


> Also, there have been plenty of Google products with paying customers that Google has shut down.

I highly doubt they will shut down a $5.5B business. Either way, because my applications are cloud-ready, it's easy to switch to another cloud provider again.


I've done the same just with a different provider. And I see a trend, people go for the lesser evil out of the ones available.


In startup mode, we switched multiple times, from Azure to Google Cloud to AWS, chasing those startup credits. Things that were easy to move tended to move quickly, but systems that lost owners or priorities didn't tend to get moved all that quickly, which left a fun legacy of a system or two on an old cloud account.

Growing out of that mode has the team mostly focused on a single cloud provider, with a few things that'll remain on alternatives because they're better suited, and projects will clean up the rest in a couple of years.


There have been two circumstances where we’ve seriously thought about it, but as of yet haven’t changed:

1) Under some circumstances we might want to give very stringent uptime guarantees for some systems, and I do not trust providers to have zero global (cross AZ) outages. Having a hot standby or even load balancing across clouds could be tempting.

2) One cloud provider is very keen to get into our sector and has made extremely generous overtures which we’d be silly to completely ignore.

As I say, not something we’ve followed through as yet but both are serious considerations.


My team was migrating 1000+ VMs from AWS to GCP, mostly for cost efficiency (and adoption of k8s for even better cost).

We used Kafka MirrorMaker 1 with two-way sync (the new cluster had separate topics for writes that were synced back to the old cluster, and all topics from the old cluster were synced to the new cluster). For Postgres, the failover switch to the new master required about 1-2 minutes of downtime.

We migrated ~80 microservices within 8 months and now our infra costs about 1/4 of what we paid to AWS; completely worth the effort!
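The Postgres side is conceptually just "wait for the replica to catch up, fence writes, promote" - roughly the sketch below (not our exact scripts, just the core idea; assumes PostgreSQL 12+, active streaming replication and psycopg2, with placeholder hostnames/credentials):

    # minimal sketch of the Postgres promotion step (PostgreSQL 12+, psycopg2)
    # hostnames and credentials are placeholders
    import time
    import psycopg2

    replica = psycopg2.connect(host="replica.newcloud.internal", dbname="app",
                               user="admin", password="...")
    replica.autocommit = True
    cur = replica.cursor()

    # 1. wait until the replica has replayed everything it has received
    #    (assumes streaming replication is active, otherwise receive_lsn is NULL)
    while True:
        cur.execute("SELECT pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()")
        if cur.fetchone()[0]:
            break
        time.sleep(0.5)

    # 2. writes to the old master are paused/fenced here, outside this script

    # 3. promote the replica to be the new primary
    cur.execute("SELECT pg_promote(wait => true)")
    print("replica promoted:", cur.fetchone()[0])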


I switched from Heroku to AWS, then eventually back to Heroku. Heroku to AWS was for cost reasons (cut monthly costs by roughly 35%) but wasn't enough savings to justify hiring a devops person. As soon as there were too many issues I didn't know how to fix (setting up everything Heroku offers was hard and likely done wrong, which made ongoing maintenance some level of hell), I switched back to Heroku where the lack of devops needs basically paid for itself.


This is very, very true. One of our customers described it really well “The nightmare of DevOps kept us from managing AWS directly, but with TinyStacks we can scale a billion+ requests a day in audio advertising leveraging the full power of AWS.”


I would check out Render and DO App Platform. Heroku has actual competition now.


Moved from AWS to vultr. Primary reason is unsafe billing practices at AWS. I'm a small shop and felt very uncomfortable about the potential of getting a surprise $xx,000 bill from AWS. The hack in December tripled my anxiety. Changed my password and migrated in January. Closed out my AWS account once every last service was migrated.

I mostly use only basic services so pretty much any cloud provider can fit my use case. It took some time but I have peace of mind now.


> The hack in December

What are you referring to?


Went from AWS to GCP. Everything we deploy is on Kubernetes, so with a combination of Terraform and ArgoCD it was pretty easy to move. It was pretty much push-button.


Definitely a space where Kube shines!


Yes. We migrated from Rackspace to GCP. When I joined the company, it was clear Rackspace was losing the cloud race, but back in 2012/2013 it was a strong contender.

Sometimes one adopts a technology too early...

We had a longer selection process between AWS, GCP and Azure. AWS was difficult because some of our customers see Amazon as a competitor. However today we also offer the option to run on AWS. GCP won over Azure.


Yes, switched from AWS to Google to Azure. Don't switch to Azure unless your employer forces you, you will regret it. Google is great on a technical level though, especially if you do things with Kubernetes.


> Don't switch to Azure unless your employer forces you, you will regret it.

Can you give any details? Pricing, reliability, weird quirks you have to program around, ...?


Many operations are INCREDIBLY slow. Creating a VM. Mounting a disk to a VM. All take easily 10-30x longer than I'm used to with other clouds. This is especially annoying with Kubernetes as it likes to move disks between VMs which... takes a while on Azure.

Many, many backwards-incompatible changes in their Kubernetes platform. I've had to recreate clusters about twice a year so far. Lately it's been better since they finally got node pools working (about 2 years after all of their competitors).

They can't get their network stable. Things like `kubectl port-forward` or `kubectl logs` hang after 4-30 minutes of inactivity (i.e. the tunnel is open but no packets actively being sent) which, according to Azure, is "working as expected". This makes the tooling utterly unusable. It has to do with the way Azure's load balancers deal with idle TCP connections.
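For anyone hitting the same thing: the generic workaround for load balancers that silently drop idle TCP connections is to enable keepalives on the client side (or raise the LB idle timeout). Just to show which knobs are involved, here is a minimal Python sketch at the raw socket level on Linux - kubectl itself obviously needs its own configuration, and the hostname below is a placeholder:

    import socket

    # open a long-lived connection and ask the kernel to send keepalive probes,
    # so an idle-timeout load balancer sees periodic traffic
    # (TCP_KEEP* constants are Linux-specific)
    sock = socket.create_connection(("api.example.internal", 443))
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # idle seconds before first probe
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30)  # seconds between probes
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # probes before giving up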

Also, their support engineers are unwilling to help you unless you run Windows. They always insist on remoting into your machine using some Windows utility, even though the issue is with their cloud instead of my machine.


> Also, their support engineers are unwilling to help you unless you run Windows. They always insist on remoting into your machine using some Windows utility, even though the issue is with their cloud instead of my machine.

That last point sounds like enough of a reason to never touch them with a barge pole.


In my experience a lot of difficulty with Azure arises because a business is probably already using Azure to manage its IT infrastructure.

You then get into sticky situations where IT are unhappy handing over admin access to the Ops teams: for example refusing access to AAD because it is integrated with the _corporate_ domain.

This might seem fine on the surface, it's just a people issue, but it can become very tiresome when dealing with Azure resources and their permissions.

Worse still, some Azure resources use AAD almost like a data store, such as B2B and B2C. If you have write-back enabled on your AAD (as most companies would - otherwise users wouldn't be able to self-service forgotten passwords) you will _apparently_ clog up the on-premise domain with foreign objects from your B2B configuration.

Of course you can get around all of this by having a separate tenant in Azure for SaaS teams/Ops only. But you introduce the headache of management (should both tenants be under a super enterprise tenant?) and single sign on (security might say there is only one user login for everything with RBAC & MFA managed from one place... now you have to join the tenants somehow...)


I'm pretty sure that B2C is effectively a stand-alone instance of AAD that isn't integrated automatically with other applications and on-premise WSAD.


Heh, this reminds me of the mess with separate Microsoft Windows, Outlook and Skype personal accounts...


The biggest problems: things don't work, things don't compose, and support is unknowledgeable & ineffective.

Things don't work: outages, bugs. So many I'm not sure where to start.

ACR: slow as molasses. I don't know why they didn't build it on blobstore. Many queries are clearly O(images) even when they shouldn't be. Throughput is terrible and made worse by inane pagination; listing images, for example, has a throughput of something like 60kbps. And "b", as in bits. Minutes for 3MB of data. It's absurd; they think that's "working as expected".

AKS: they manage the API server in AKS, and we find it is routinely non-responsive. We went through a quarter-long support ticket, which went back and forth between "you're putting too much load on the API server" -> "we don't think we are, what load is there?" -> "here's the top queries" (and they're all queries from, like, the cluster controllers - which are also managed by AKS/Azure) -> "well there's too much load!".

App Gateway: normally stable, but had an outage when Let's Encrypt's old root expired. (We were using a cross-signed cert - i.e., our cert was valid, but App Gateway failed it, i.e., a false negative validation.) They never acknowledged the outage, and the support ticket we filed didn't get a response until days later - missing support's SLA - meanwhile, some engineer somewhere clearly fixed the service, as it started working all on its own.

ACR again: we used to get 500s. IDK if these still happen or not, as we retry-looped most of the spots that were hitting them. Support's response was ridiculous: "what's your ISP?" "… this is inside your network, Azure, and my ISP wouldn't cause you to serve a 500." "Maybe the image is too big" "The image is ~100 KiB" …

Plus global AAD outages, the status page going down while the Twitter account is like "check the status page", the Portal has issues (a few days ago listing subscriptions just returned 0 rows; like, okay, I guess I'm not looking at those today), and the activity log will occasionally just return errors, or, like today, return 0 entries where I knew there to be entries.

Composability: a cloud provider's job is to offer bricks from which I can build whatever infra my company's needs demand. But Azure constantly says "well, no, you can't put those bricks together that way." IPv6 anywhere in your vnet? You can't add a managed PostgreSQL server to that vnet, because not only does it not support IPv6, you can't even add it to an IPv4-only subnet in a dual-stack vnet. Like, the entire point of dual-stack… Also, when we attempted this, the API request took 2 hours to fail, and failed with "Internal error, please retry", which we then did, like chumps. 4 hours later, support ticket.

AKS will add new features (that ought to have been there from day 1) like nodepools… but only to newly created clusters. You want to take advantage of that? Too bad! Recreate that cluster from scratch!

Support: Azure has no meaningful tooling for handling bugs in their services. The only hammer they have is support, and by god it's going to hit that nail. Support might (if you can get them to admit that yes, shit's broken) field a bug report, but then the support ticket is closed. Is the bug fixed? How will you know? IDK.

Also, AFAICT, certain products you just can't open a support ticket for; notably, the Portal. It isn't an "Azure service", so it isn't in the list of things to select from. Also, they override the mouse wheel on the list of services, so scrolling ~1 "detent" on a trackpad results in the list scrolling at Mach 4. Support tickets lack URLs, so they're unlinkable. Occasionally whatever the agents use to view info on tickets gets desynced from the ticket, and new replies in the portal are black-holed (but email replies still work). You can't put ">" in a support ticket, it's not allowed. You can't upload certain file formats, it's not allowed. (E.g., want to send video of a bug? Not today!)

SLAs are regularly missed, and the responses often ask for information included in the opening ticket. And… the agents' grasp of English is frankly terrible. (I'd accuse them of out-sourcing it, but we once had an agent on a ticket go dark on us… because he was in Texas when Texas lost its power.)

Honorable mention: My God, AAD is trying to be as complex as can be. Apps, Service Principals (which are needlessly and confusingly just called "Enterprise Apps" in the UI! — oh and the search box for that page doesn't work), Roles, Permissions, Role Assignments, Tenant, Subscriptions … oh my God. Like, AWS IAM is frankly a terrible implementation, but Azure AAD makes AWS IAM look amazing.

My entire 2+ year experience with Azure has made me an ardent believer in AWS, and willing to try GCP.

I need to write this all in a blog post.


> I need to write this all in a blog post.

Please do. It would be very valuable to find these things when searching about Azure, because this absolutely matches my experience.


I had a company that wanted to move from AWS to GCP. This was a top-down decision, made with an incompetent "tech lead" stating that we could move everything and save money with the 10-year, hundred-million-dollar commit contract with Google.

It failed, horrendously. Even though multiple people in the organization were calling out how bad an idea it was, they still moved forward. Google has some niceties with how things connect, setup, etc., but at the end of the day they are a cloud provider and not everything they provide is a silver bullet.

The project was canceled after 3 years, after spending millions on migrating, since the migration was not a drop-in replacement (no one thought it was except the "tech lead").

There are a lot of things besides tech that can affect these projects. If you hired AWS experts and they need to be AWS experts, expect to have to hire GCP experts too.


I once worked with a CTO who decided to move from AWS to GCP (and also move from Spark to Hive and python to scala, at the same time). That guy was an idiot.

(At one point somebody accidentally spent £30,000 in data transfer costs with one key press.)

The project was never completed and the CTO just moved on to another fancy CTO position.


Why on earth would you go from spark to hive?


At work we moved from AWS to GCP for pricing reasons. We are still paying loads more than before moving to the cloud, but it's hard to find good sysadmins nowadays who don't want to do everything in the cloud. For personal projects I've moved things to Linode and Digital Ocean, as they provide quite decent value. For a comparison of AWS/GCP/Azure/Linode/DO/Tencent/Ali value & performance, check out the extensive comparison I ran recently: http://blogs.perl.org/users/dimitrios_kechagias/2022/03/clou...


What is the difference in expense between the "loads more" you are spending and hiring and managing a sysadmin? Ballpark is fine, interested to know if it is 1x,10x, 50x?


So, we went from a cost of a bit over $3k/month for infrastructure to almost $15k/month (it would be more with AWS). Yes, with your own hardware you have to buy the servers yourself at extra cost, but on Google most people run instances on Haswell/Skylake, which is >5-year-old stuff, so a quick calculation for buying top-of-the-line hardware every 5 years (which for the first couple of years at least gives you faster hardware than on GCP) comes to about an extra $2-3k/month.

Also, we went from 1 (very good) sysadmin plus 1 other developer who assisted with part-time sysadmining, to a cloud that has some extra bells and whistles but requires 2 full-time sysadmins (we are temporarily left with 1 and he can't keep up). Part of the problem is that the whole architecture was very fine-tuned to run on our customized rack servers (it had been doing so happily for almost 15 years), so a lot of things did not translate that easily to the cloud. I guess if you have a small system, or one designed from scratch for the cloud, a single cloud platform person might be enough.

Overall, there are some extra disadvantages beyond cost - e.g. Cloud SQL manages some things for you, but there is more lag compared to when our super fast DB server was on the same rack as the application servers, and other such little performance things that we could fine-tune when we had control of the hardware.


Yes. Huge AWS -> GCP migration.

Why? Incoming CTO signed a massive GCP deal probably because it was marginally cheaper than AWS (while probably ignoring the migration costs).


Same thing happened to my old company (a large insurance company). Moved from Azure to AWS + Terraform/Kubernetes because of cost and the "cloud independent" nature of Terraform. The whole IT department spent 2+ years moving hundreds of (relatively modern + legacy) services and applications. Some services couldn't move because they were managed by a third party. I am pretty sure they didn't factor in the cost of migration.


Always look for kickbacks if there are cases where the company ends up being out more money all-in. These decisions should not have been made in this way, and it isn't always incompetence that drives this.


Yes, moved away from rackspace to IBM Cloud. Since then use several cloud providers for redundancy.

Every provider has severe downtime (when even phone lines are not operational), so we do failover across several providers. It has saved a lot of uptime for us.

Also, we (almost) do not use vendor-specific solutions. Almost everything your cloud provider upsells to you can be achieved without the provider lock-in. It will save time later when provider quality goes to sh*t (it eventually happens) and you have to migrate your infra somewhere else.


Consolidated all of my small cloud VMs from DigitalOcean, GCP, and AWS into a single dedicated server on Hetzner. It costs way less and I don't have to pay for egress anymore.


> If so why?

Company policy change. We went from each office (or even department) basically doing their own thing and having their own billing accounts and negotiating (or not) their own deals, to one central cloud deal with central billing and administration.

The change from everybody doing their own thing to having a central devops strategy we all had to work within was a much bigger change than the actual changing of cloud providers.


We decided on a multi-cloud infrastructure. Now the team is thinking about choosing a tool that would allow you to migrate and do multi-cloud in two clicks. We try to go with not-very-large clouds because, first, we want to support small businesses, and second, sometimes it is more convenient for regional legislation. The main clouds are Scaleway, DigitalOcean and our own servers.


I'd be surprised if more small businesses hadn't - the credits sales people are willing to give you are massive.


Yea, that is also my experience working mostly in startups. I wonder if they will keep up this credit strategy.


I've bounced between AWS, Azure, and GCP. Once they get entrenched enough the credits stop but right now it's GCP with the best incentives.

Also, don't sleep on Oracle. Their cloud platform is stupidly competitive price-wise but limited feature-wise. If you're just looking for basic compute and storage they can't be beat, sans credits.


Good point with Oracle. As much as I don't like their company, the pricing for compute is significantly lower than AWS's (didn't compare others).


Amazon EC2 to OVH AI Docker. Price for GPU instances went down 80%.


> to OVH AI Docker.

Can't find an OVH product by that name (would have surprised me), is this a buzzword bingo joke?


I searched for exactly those 3 words on Duckduckgo.com and got https://docs.ovh.com/us/en/publiccloud/ai/training/build-use... which is a tutorial for what I use.


Docker and AI are technologies used on the OVH host, it's not part of the product itself...


Yes and no. OVH has dedicated instances for AI which are different from the regular cloud instances and you don't get a root VM but only root inside docker.


Probably OVHcloud AI tools. They're docker based.

https://docs.ovh.com/us/en/publiccloud/ai/


Moved a SaaS tool from Linode to DigitalOcean. The move was security-driven: DO's K8s volumes are LUKS-encrypted at rest by default; one less thing in your security controls to worry about. Prices are higher than Linode's, and we have had some occasional reliability issues with the LoadBalancer, but for the most part it works really well.

The SaaS tool was mostly cloud-agnostic, so the changeover was not terrible: change the deployments to use DO's CSI storage, set up secrets, deploy services. I stood up the entire infra on DO, then moved the DNS over one subdomain at a time. Took about 4 days to make the move, including validating everything and finally cutting off Linode totally.
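For the DNS part, the usual trick is to lower the TTL ahead of time, repoint one record, and wait until resolvers return the new address before moving on to the next subdomain. A rough sketch of the kind of check involved (uses dnspython; the hostname and IP are made-up examples, not the real ones):

    # sketch: poll until a subdomain resolves to the new load balancer IP
    import time
    import dns.resolver

    NEW_LB_IP = "203.0.113.10"  # placeholder for the new provider's LB address

    def cut_over_done(name: str) -> bool:
        answers = dns.resolver.resolve(name, "A")
        return all(rr.address == NEW_LB_IP for rr in answers)

    while not cut_over_done("api.example.com"):
        time.sleep(60)
    print("api.example.com now points at the new provider")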


Interesting to hear. A friend also had a lot of trouble with Linode reliability and moved to a bigcorp after that. Running a multiplayer game, even turn-based, had huge hiccups and the problem simply didn't reproduce on the new systems. Iirc support couldn't do anything but I'd have to ask for the details.

He spent a lot of time debugging it with a minimal example to rule out other causes, iirc a websocket pinging every few seconds on their kubernetes offering (again, I'd have to ask for the details), and it reproduced on Linode but not on the platform they were considering moving to (with a similar hosted kubernetes offering).


Yes from Linode to Hetzner [1]

Mainly because of price: more CPU/RAM/Storage for a lower price.

I think my previous server was underpowered, because it kept swapping. Now it runs as smooth as it can (it never swaps). Migrating is a bit of a hassle though and things might not work as you expect [2]

[1] https://j11g.com/2021/12/28/migrating-a-lamp-vps/

[2] https://j11g.com/2022/01/03/bypassing-hetzner-mail-port-bloc...


I've done so at two companies (without naming names). Not entirely, but enough to have a bargaining chip with AWS. At the largest company, our monthly bill was at least $1M at AWS, so the savings more than made up for the engineering time.

Also, in both cases, it was moving from AWS to GCP, and in both cases we were using Kubernetes and not really using much of the provided services of the platforms. I suspect this is the biggest reason for Google to push Kubernetes; abstract away compute so it's easier to switch.


New CTO came into my previous place and forced a switch from AWS to GCP. It took 3 years and an enormous amount of effort just to help the bottom line a small amount. Totally ridiculous.


Yes, I changed from AWS to Oracle, and from CapRover to k8s, because of the Oracle Cloud Always Free resources; it saves me $20/month and gives me 20 times more RAM and 6 times more storage.

I wrote a blog post before the implementation: [move to k3s cluster](https://tim.bai.uno/how-to-deploy-a-django-application-to-a-...)


From OVH to AWS. 30% TCO reduction. This was before the fire.


I’m really curious how this math adds up given how inexpensive OVH is.

Other people in this exact thread are saying they experienced an 80% cost reduction by moving from AWS to OVH.

https://news.ycombinator.com/item?id=30943058


One phrase: right-sizing. The original infra on OVH used an instance type that was suboptimal for the workload. Because re-provisioning was not an option due to the size of the cluster (especially without moving hard drives), we had only one option: move the data to S3, find the right instance type and switch over. The ability to right-size your infra is grossly underrated. Decoupling data from compute too. We are talking about hundreds of nodes.


Would you mind sharing what size instance types you were on at OVH and are on now at AWS?

I ask because I've found that even radically larger sized OVH hardware is still way less expensive than AWS.


At the time (many years ago) the largest instance type of OVH was used because it had the disk capacity that the customer needed. On AWS, S3 has a vastly different cost structure, and the fact that you can use whatever instance type fits your workload enabled us to save this much. We could use some low-spec m5.* instance type running at 80% CPU utilization for the workload, and we could also use fewer instances.

To re-iterate:

- the most cost saving comes from the fact that we could de-couple storage from compute

- the second part of the cost saving came from the fact that we could use fewer instances with lower spec


So you saved money because you optimized the structure, not because of the provider? I guess you would've saved more rebuilding it on OVH?


Reread what I wrote. It could not have been rebuilt, because for the rebuild you would need to duplicate the stored data, which was 1PB+.


We moved a load of stuff from an acquisition from Azure to AWS only because it's then under one cost centre, so we don't have to pay or justify as many invoices.

After futzing with this stuff for years though I really would only use the IaaS options in clouds if you want to consider portability. Network, storage, compute and nothing else. The neutral abstraction is Linux for me these days, not a specific vendor!


We switched 3 times with 0 downtime. Linode > Google > Azure > AWS. Chasing that sweet startup package at each place. We stuck with AWS.


Didn't, but went multi-cloud for performance reasons (we have to be close to the people we're talking to). The most annoying part was the network topology. Used DirectConnect with a provider so that x-cloud latency was as low as it could be. Overall, not too upset. Used terraform to configure things across clouds. Without that, would be too hard to keep track of everything.


Yes. I built an entire new platform on Google Cloud using Kubernetes (GKE). We moved from AWS.

It was a chance to re-architect the platform, make things simpler and cost effective. Huge success on all of those. Costs were reduced by millions a year.

Other than GKE, there was not a significant technology reason to move to Google Cloud. AWS didn't even have a managed version of Kubernetes at the time.


I have a VM running on GCP. I have the world's simplest use case: how do you export the instance as a snapshot so that you can run it somewhere else, say a VirtualBox environment or another cloud provider? How do you export a snapshot of your disk and VM?

Good luck searching for a solution. I just spent 2 hours trying to figure this out and it seems impossible :(


The general answer nowadays is "you don't": you spin up a new VM and provision it using whatever IaC/scripting solution works for you.

Edit to add: even if this were possible, it would be made extra challenging by the cloud-provider-specific agents that are installed in VMs.


As long as you keep on using common services that are available everywhere-ish (=virtual machines, NFS file share, managed mysql/pgsql, managed kubernetes, DNS) it's relatively painless to switch if you're using Terraform.

The worst problem in my experience is all the stuff that creeps up on you over time and assumes hardcoded IPs and service names.
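One cheap mitigation is to scan for hardcoded addresses before (and periodically after) a migration. Even something as crude as the sketch below (Python, naive IPv4 regex, plenty of false positives) catches most of it:

    # crude scan for hardcoded IPv4 addresses in a source tree
    import re
    from pathlib import Path

    IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
    EXTS = {".py", ".tf", ".yaml", ".yml", ".json", ".conf"}

    for path in Path(".").rglob("*"):
        if path.is_file() and path.suffix in EXTS:
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            for n, line in enumerate(text.splitlines(), 1):
                if IPV4.search(line):
                    print(f"{path}:{n}: {line.strip()}")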


In OpenNebula we're working on trying to ease these use cases by adding an extra layer of abstraction, so you can migrate load as easily as this: https://www.youtube.com/watch?v=IopA_deQK4M&t=1s


Yes, I moved a decent sized project from AWS to Digital Ocean. The biggest hurdles I had to deal with were the things Digital Ocean did NOT support. It was worth it in the end, because our bill went from millions of dollars a month to a few hundred thousand a month. Estimated labor cost was about a million.


Would love to learn more



We did, a few times. Only because many of them had generous credits for early stage startups. GCP, Azure, SoftLayer (now IBM Cloud), DigitalOcean. We weren't relying on any particular service except managed Kubernetes, but we were happy to set it up ourselves if it wasn't on offer.


I ported a client from OVH to AWS with zero downtime. Built new infrastructure with terraform, flipped dns. Was easier than wading through the existing architecture fixing problems and security vulnerabilities.

It was a single web app though. Not too complicated.


I've done a few zero downtime migrations for clients, mostly due to the common story of the credits they started out on coming to an end.

More often than not they realised a super dynamic cloud infrastructure was completely overkill for their business needs.


2016. AWS to GCP. Mostly cost reasons and AWS had some annoyances at the time.

We also replaced our ~4y old very manual setup with a ground-up rewrite with terraform.

Zero regrets, but we didn't have a lot of vendor-lock in with <100 VMs and S3 + a couple of SSL things.


AWS to GCP.

Google salespeople made it worth the company's while, and it was a good time for us to refactor the orchestration stack. BigQuery is pretty sweet.

Services were done one at a time with a VPN holding the two together till everything was migrated over.


Yup moved from AWS --> GCP --> Oracle

Mostly because of cost mitigation.


Are you happy with Oracle? I wanted to try it, but the onboarding process (until you're able to actually order some beefy servers) was so problematic (delays, weird error messages, conflicting messages from support) that I didn't dare put production servers there.


Yup. Pretty much no complaints. We do have a very boring setup though YMMV


Yes, I moved from AWS to GCP while a product was in a late release cycle.

I talk about it in detail in a google cloud podcast: https://www.gcppodcast.com/post/episode-265-sharkmob-games-w...

The primary reasons were: Ease of Use, Support and Cost (in that order).

I had a bunch of what I call "3am topics" which inhibited our ability to perform stressful operations in the middle of the night, meaning we minimised our chances of a successful outcome when on-call... I'm not a fan of that.

I also argued (quite successfully) the case that AWS was not saving us money or much time when the alternative was renting a handful of servers from a provider.

There were attempts by AWS staff to lock us into the platform but those services (cloudfront and ECS being large ones) worked so poorly that caching/reconciliation layers were added into the product to build resiliency: all that was needed was to replace what populated the cache with something else (eventually we moved to Kubernetes which worked much better).

Cloudformation was so hard to work with (at least our implementation) that replacing it with terraform was easy: the hard part was understanding what was needed and what was fluff.

We also had to care a _lot_ about how the network was setup in AWS, there were issues with MTUs not being aligned by default in some cases for example so we had to write workarounds, and the VPCs being zonal by default meant we had much more complex setups.

Another ease-of-use topic was the sizing of instances: instead of specifically saying what shape of machine best matches your workload, in GCP you allocate a number of cores and you're kinda done. Another was that discounts are provided retroactively via sustained use (though they do have commitments too), whereas in AWS you needed to very carefully carve up your requirements to get significant discounts, or write your application to be as stateless as possible (which you can do in GCP too). That is not a lot better than physical machines, because the upfront work of trying to capacity plan is still there... at least if you care about cost.

Regardless, our dev cycles are much more streamlined now, it's rather easy to deploy an entirely new environment, the operations can be handled by a single individual on a part-time basis, which in my opinion is the point of a cloud provider: to save you time.

I can go into much more detail if you have any specific questions.


> Cloudformation was so hard to work with (at least our implementation) that replacing it with terraform was easy: the hard part was understanding what was needed and what was fluff.

I hated CloudFormation when I first started using it; the documentation sucks for getting started. Once you get the hang of it, the docs are great and it's actually really simple. I now quite like it.


Switched from heroku to a vultr VPS, and AWS S3/Cloudfront -> Backblaze B2 + Bunny CDN.

Saved myself a boatload and don’t regret it for a second.


I once had to move 30 PB of satellite imagery from AWS to GCP. Google gave us all these tools to do it; none of them worked. It was rough.


I'd love to hear any tricks for avoiding punitive egress bandwidth charges needed to get your data out of AWS.


I have done a few zero downtime migrations. It is not rocket science if you do not use cloud-specific products.


For kubernetes, we prefer GCP but have to also support Azure because Microsoft is also our business partner.


I went from Digital Ocean to Amazon Web Services, mainly to get familiar and train on the platform.


We have to follow GDPR-like laws, and in our country providers aren't very reliable ("hey, we have a maintenance tomorrow with a 30 minute downtime, sorry"), so we migrate between providers a few times per year. There are always at least two providers we use at any given time (the exact configuration changes every year), one in standby mode with replication between the two, and various homegrown tools to make the switch instant. Sometimes our application can enter a critical state (hardware errors or a bug in the application), so we switch to a different instance and investigate later without hurry. This complex setup was dictated by the stakeholders after we had a series of painful downtimes in 2019. There have also been cases where we migrated due to pricing changes. For the US region we use AWS and it is the most stable of all; we have zero switches there. Our platform is therefore mostly provider-agnostic.


Yes. I got a new job at a new company.


credits

and sometimes service reliability


I have. I was working for a company that had its own data centers, but we were running out of space and time to add more. We were already using Rackspace as a managed hosting provider (we didn't want to manage our own infrastructure), so we decided to move from our data center to Rackspace's data center.

The difference in terms of services was negligible, because they all offered almost identical services. But we did it because:

* We were looking for someone to manage the hardware and infrastructure for us.

* Rackspace's managed hardware offered higher availability than what we were able to achieve on our own.

* We had a relationship with Rackspace and they understood our needs, so we felt comfortable switching over entirely.


Sounds really interesting. I don't understand why so many companies think the alternative to owning your DC must be the cloud. I think the step you describe is where companies can get rid of a lot of overhead and save massive amounts of money. Migrating to the cloud tends to do the opposite.


Interesting - we moved the other direction (to Linode) from Rackspace because their VM/dedicated pricing was just out of whack. Our new box was nominally the same as the old at half the price, but the "4 CPU" was so much newer that it was nearly 2x.


Whenever I've interacted with Rackspace they've seemed almost proud to be expensive.


At the same scale? I guess if you move a whole data center, they'll quote very different prices from their public ones.


Yeah, I suspect Rackspace is aimed at "we'll datacenter for you" and not toward small businesses with small needs. And they price themselves accordingly to discourage those customers.



