Here is my golden setup: Cloudflare Tunnels + all ports closed (except SSH) + a bare metal server. You can scale to the moon, with something like a million active users on a couple of 16-core servers (1 primary, 1 hot failover). You don't need AWS. You don't need Terraform. You don't need Kubernetes. Hell, you don't need Docker, because you know a priori what the deployment environment is. Just run systemd services.
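For the "just run systemd services" part, a minimal sketch of what that looks like; the unit name, user, and binary path (`myapp`, `/usr/local/bin/myapp`) are made-up placeholders, not anything from the setup above:

```bash
# Hypothetical unit for an app binary, managed entirely by systemd.
sudo tee /etc/systemd/system/myapp.service > /dev/null <<'EOF'
[Unit]
Description=My app
After=network-online.target
Wants=network-online.target

[Service]
User=myapp
ExecStart=/usr/local/bin/myapp
Restart=always
RestartSec=2

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now myapp
```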
99% of startups will never need anything more. They'll fail before that. The ones that succeed will have a good problem on their hands: actually needing to Scale™.
What we're seeing is premature-optimi...errr scaling.
Edit: more context:
For Postgres, set up streaming replication between the primary and a hot standby. You need a remote server somewhere to check the health of your primary and promote your hot standby if it fails. It is not that difficult. In addition, have cron jobs back up your database with pg_dumpall to somewhere like Backblaze or S3. Use your hot standby to run the Grafana/Prometheus/Loki stack. For extra safety, run both servers on ZFS RAID (mirror or raidz2) on NVMe drives. You'll get something like 100k IOPS, which would be roughly 300x a base RDS instance on AWS. Ridiculous savings, and the performance is just astonishing. Have your app call Postgres on localhost; it will be the fastest web experience your customers ever get, edge or not.
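A rough sketch of those moving parts, assuming Postgres 16 on Debian/Ubuntu paths, a replication user named `replicator`, and an rclone remote named `b2`; all of those are placeholders, not a tested recipe:

```bash
# On the primary: create a replication role and allow the standby in pg_hba.conf.
sudo -u postgres psql -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'changeme';"
echo "host replication replicator 10.0.0.2/32 scram-sha-256" | sudo tee -a /etc/postgresql/16/main/pg_hba.conf
sudo systemctl reload postgresql

# On the standby (empty data dir): clone the primary; -R writes standby.signal
# plus primary_conninfo so streaming replication starts automatically.
sudo -u postgres pg_basebackup -h 10.0.0.1 -U replicator -D /var/lib/postgresql/16/main -P -R

# Nightly logical dump shipped off-site (crontab entry; % escaped for cron):
# 0 3 * * * pg_dumpall -U postgres | gzip | rclone rcat b2:backups/pg/$(date +\%F).sql.gz
```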
If you use Tailscale, make sure to keep spare VGA cables around or hook an Ethernet cable up to the IPMI port. Tailscale has not been reliable in my experience. There is nothing like a straight SSH connection, and you can set up `ufw limit OpenSSH` to rate-limit attempts on the SSH port.
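For reference, a minimal firewall baseline around that one rule (`ufw limit` blocks an IP that opens 6 or more connections within 30 seconds):

```bash
# Deny everything inbound by default, allow outbound, rate-limit SSH.
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw limit OpenSSH
sudo ufw enable
```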
You still pay the latency cost, even though your data travels mostly inside CF's network. It's very noticeable when you're far from the server, which you will be for most of the world if you sell to everyone. Perfectly fine if, e.g., you're only targeting users in your own region.
This setup is good if you are only serving people on a single continent. The TCP handshake with someone halfway across the world is still going to eat your entire latency budget. You can't beat the speed of light.
Most startups are going to be operating in one country. And most requests will be handled by the Cloudflare edge, except for dynamic requests that have to hit the origin.
You might be surprised how fast it can be. And most companies blow their latency budget on 7-second redirects and on touching 28 different microservices before returning a response.
All I am saying is don't get fixated on geo-latency issues. There are bigger fish to fry.
But after all fish have been fried, you’re right. Servers on the edge would help.
What happens if your remote server thinks the primary is down when it isn't really, and you end up with two hot primaries? Is this just not an issue in practice?
In that case, the new primary just gets detached: after promotion it will not stream/pull any new data from the old primary. You should also make sure all traffic is diverted to the new primary.
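If you're ever unsure which box is actually the primary at a given moment, `pg_is_in_recovery()` tells you (true on a standby, false on a primary), and the blunt fence is simply stopping Postgres on the old primary. The hostnames here are placeholders:

```bash
# true  -> the node is a standby (still in recovery)
# false -> the node is a primary (accepting writes)
ssh old-primary "sudo -u postgres psql -tAc 'SELECT pg_is_in_recovery();'"
ssh new-primary "sudo -u postgres psql -tAc 'SELECT pg_is_in_recovery();'"

# Crude fencing: make absolutely sure the old primary can't take writes.
ssh old-primary "sudo systemctl stop postgresql"
```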
Actually, this might be much simpler with Cloudflare Tunnels. The failover scenario would be something like this:
1. The primary and the hot standby are active. CF Tunnels route all traffic to the primary.
2. The primary's health check fails; use CF health alerts to trigger promotion of the hot standby (we'll call it the new-primary). A sketch of that promotion glue follows this list.
3. Postgres promotion completes and the new-primary starts receiving traffic over CF Tunnels automatically.
4. No traffic goes to the old-primary. Back up its data and reuse the old-primary as the new hot standby, configuring replication from the new-primary.
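Step 2 is the only part that needs glue. As a stand-in for the CF health alert, here is a dumb polling watchdog running on the remote box; the health URL, the `standby` SSH alias, and the three-strike threshold are all assumptions:

```bash
#!/usr/bin/env bash
# Poll the primary's health endpoint; after 3 consecutive failures,
# promote the standby and stop watching.
HEALTH_URL="https://app.example.com/healthz"
STANDBY="standby"
FAILS=0

while true; do
  if curl -fsS --max-time 5 "$HEALTH_URL" > /dev/null; then
    FAILS=0
  else
    FAILS=$((FAILS + 1))
  fi

  if [ "$FAILS" -ge 3 ]; then
    # pg_promote() ends recovery on the standby and makes it a writable primary.
    ssh "$STANDBY" "sudo -u postgres psql -c 'SELECT pg_promote();'"
    break
  fi
  sleep 10
done
```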
An even better strategy is to use the hot standby as a read-only replica, so the app splits traffic dynamically depending on whether it needs reads or writes. Take a look at Stack Overflow's infrastructure: https://stackexchange.com/performance
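One low-effort way to get that split without a proxy, assuming libpq-based clients and Postgres 14+ (for `prefer-standby`); the IPs and database name are placeholders:

```bash
# Writes: libpq tries the hosts in order and settles on the one that is
# read-write, i.e. whichever of the two is currently the primary.
psql "host=10.0.0.1,10.0.0.2 port=5432 dbname=app target_session_attrs=read-write"

# Reads: prefer a standby, fall back to the primary if the standby is down.
psql "host=10.0.0.2,10.0.0.1 port=5432 dbname=app target_session_attrs=prefer-standby"
```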
I wish the thinking that leads to articles like the one in the OP would take physical form so I could take a baseball bat and release my frustrations by beating the hell out of it. So many of my current pains in the ass at work stem from adding complexity to get some benefit that does nothing to make our product better.
The point is that the developer audience just coming out of college... will not be familiar with Postgres.
They would build most of their applications saving data via "Durable Objects" on the edge. Someone just built Kafka using Durable Objects.
I'm not arguing that DO is better than PostgreSQL. I'm arguing that a lot of developers won't realise that, because the DX of Durable Objects is superior.
C'mon, let's not wrap it up in all that, be real. It's a JSON object and a few JS functions passed as arguments to a class constructor.
I'm not saying it isn't an elegant design, but can we please not talk about proprietary implementations of a particular design pattern as if they're some kind of industry standard?
You're getting the data format confused with the database engine. Yes, a database might just be storing JSON, but how it's stored and replicated matters.