Instead of "edge", a lot of websites should just have 3 locations (us,eu,apac) with a non geo replicated Serverless database in each region. At least that's what we're building at WunderGraph (https://wundergraph.com/). Edge sounds super cool, but if you take state and consistency into consideration, you just can't have servers across the globe that also replicate their state consistently with low latency. TTFB doesn't matter as much as correctness. And if stale content is acceptable, then we can also just push it to a CDN. Most importantly, you'd want to have low latency between server and storage. So if your servers are on the "edge", they are close to the user, but (randomly) further away from the database. Durable objcets might solve this, but they are nowhere near a postgres database. I think the "edge" is good for some stateless use cases, like validating auth and inputs, etc., but it won't make "boring" services, even serverless in "non-edge" Locations obsolete. You can see this on Vercel. Serverless for functions, server side rendering, etc. and cloudflare workers for edge middleware. But they explicitly say that your serverless functions should be close to a database if you're using one.
Here is my golden setup: Cloudflare Tunnels + all ports closed (except SSH) + a bare-metal server. You can scale to the moon with like a million active users on a couple of 16-core servers (1 primary, 1 hot failover). You don't need AWS. You don't need Terraform. You don't need Kubernetes. Hell, you don't even need Docker, because you know a priori what the deployment environment is. Just run systemd services.
99% of startups will never need anything more. They'll fail before that. The ones that succeed have a good problem on their hands to actually Scale™.
What we're seeing is premature-optimi...errr scaling.
Edit: more context below.
For Postgres, set up streaming replication between the primary and a hot standby. You need a remote server somewhere to check the health of your primary and promote your hot standby if it fails. It is not that difficult. In addition, have cron jobs that back up your database with pg_dumpall somewhere like Backblaze or S3. Use your hot standby to run a Grafana/Prometheus/Loki stack. For extra safety, run both servers on ZFS raid (mirror or raidz2) on NVMe drives. You'll get something like 100k IOPS, roughly 300x what a base RDS instance on AWS gives you. Ridiculous savings, and the performance will be astonishing. Have your app call Postgres on localhost and it will be the fastest web experience your customers ever get, edge or not.
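To make the localhost point concrete, here's a minimal sketch, assuming the node-postgres (`pg`) driver; the table, credentials, and pool size are made up for illustration:

```typescript
// Hypothetical sketch: the app talks to Postgres on the same machine, so each
// query round trip is sub-millisecond. Assumes node-postgres (`pg`); the
// table, credentials, and pool size are illustrative.
import { Pool } from "pg";

const db = new Pool({
  host: "localhost",           // same box as the app: no network hop to the DB
  database: "app",
  user: "app",
  password: process.env.PGPASSWORD,
  max: 20,
});

export async function getUser(id: string) {
  // Even several sequential queries per request stay cheap when the database
  // is a loopback connection away, which is the whole point of this setup.
  const { rows } = await db.query("SELECT id, name FROM users WHERE id = $1", [id]);
  return rows[0] ?? null;
}
```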
If you use Tailscale, make sure to keep spare VGA cables around or hook up an ethernet cable to the IPMI port. Tailscale has not been reliable in my experience. There is nothing like a straight SSH connection, and you can set up `ufw limit OpenSSH` to rate-limit attempts against the SSH port.
You still pay the latency cost, even though your data travels mostly inside CF's network. It's very noticeable when you're far away from the server, which you will be for most of the world if you sell to everyone. Perfectly fine if you're only targeting users in your own region, though.
This setup is good if you are only serving people on a single continent. The TCP handshake with someone halfway across the world is still going to eat your entire latency budget. You can't beat the speed of light.
Most startups are going to be operating in one country. And most requests would be handled by Cloudflare's edge anyway, except for dynamic requests to the origin.
You might be surprised how fast it would be. And most companies blow their latency budget with 7 second redirects and touching 28 different microservices before returning a response.
All I am saying is: don't get fixated on geo-latency issues. There are bigger fish to fry.
But after all fish have been fried, you’re right. Servers on the edge would help.
What happens if your remote server thinks the primary is down when it isn't really, and you end up with two hot primaries? Is this just not an issue in practice?
In that case, the newly promoted primary is simply detached: after promotion it no longer streams or pulls new data from the old primary. You should also make sure to divert all traffic to the new primary.
Actually, this might be much simpler with Cloudflare Tunnels. The failover scenario would look something like this (a rough watchdog sketch follows the list):
1. The primary and hot standby are active. CF Tunnels route all traffic to the primary.
2. The primary's health check fails; use CF health alerts to trigger promotion of the hot standby (we'll call this the new primary).
3. Postgres promotion completes and the new primary starts receiving traffic over CF Tunnels automatically.
4. No traffic goes to the old primary. Back up its data and use the old primary as the new hot standby, configuring replication again from the new primary.
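Here's that watchdog idea as a minimal TypeScript sketch, assuming node-postgres for the health check and SSH access to the standby; the hostnames, data directory, and thresholds are placeholders, not a production-ready failover system:

```typescript
// Hypothetical watchdog: runs on a third, independent machine. It
// health-checks the primary and, after repeated failures, promotes the hot
// standby over SSH. Hostnames, paths, and thresholds are placeholders.
import { Client } from "pg";
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const exec = promisify(execFile);

const PRIMARY = { host: "primary.internal", user: "monitor", database: "postgres" };
const FAILURES_BEFORE_PROMOTE = 3;
let failures = 0;
let promoted = false;

async function primaryIsHealthy(): Promise<boolean> {
  const client = new Client({ ...PRIMARY, connectionTimeoutMillis: 3000 });
  try {
    await client.connect();
    await client.query("SELECT 1");
    return true;
  } catch {
    return false;
  } finally {
    await client.end().catch(() => {});
  }
}

async function promoteStandby(): Promise<void> {
  // `pg_ctl promote` (or `SELECT pg_promote()`) turns the standby into a
  // read-write primary; Cloudflare Tunnel routing then shifts traffic to it.
  await exec("ssh", ["standby.internal", "pg_ctl", "promote", "-D", "/var/lib/postgresql/data"]);
}

setInterval(async () => {
  if (promoted) return;
  if (await primaryIsHealthy()) {
    failures = 0;
    return;
  }
  failures += 1;
  if (failures >= FAILURES_BEFORE_PROMOTE) {
    console.log("Primary unreachable, promoting hot standby");
    await promoteStandby();
    promoted = true; // never let the old primary take writes again (step 4)
  }
}, 10_000);
```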
An even better strategy would be to use the hot standby as a read-only replica, so traffic is split dynamically depending on whether the app needs to read or write. Take a look at Stack Overflow's infrastructure: https://stackexchange.com/performance
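Roughly, a read/write split can be as simple as two connection pools, sketched here with node-postgres; the hostnames are placeholders, and a real app would also need to think about replication lag for read-your-own-writes:

```typescript
// Hypothetical read/write split: writes go to the primary, reads to the hot
// standby. Hostnames are placeholders; assumes node-postgres (`pg`).
import { Pool } from "pg";

const primary = new Pool({ host: "primary.internal", database: "app" });
const standby = new Pool({ host: "standby.internal", database: "app" });

export function query(sql: string, params: unknown[] = [], opts: { write?: boolean } = {}) {
  // Route by intent: anything that mutates state goes to the primary,
  // everything else can be served by the standby.
  const pool = opts.write ? primary : standby;
  return pool.query(sql, params);
}

// Usage:
//   await query("INSERT INTO orders (user_id) VALUES ($1)", [userId], { write: true });
//   const { rows } = await query("SELECT * FROM orders WHERE user_id = $1", [userId]);
```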
I wish the thinking that leads to articles like the one in the OP would take physical form so I could take a baseball bat and release my frustrations by beating the hell out of it. So many of my current pains in the ass at work stem from adding complexity to get some benefit that does nothing to make our product better.
The point is that the developer audience just coming out of college...will not be familiar with Postgres.
They'll build most of their applications saving data via "Durable Objects" on the edge. Someone just built Kafka using Durable Objects.
Not arguing that DO is better than PostgreSQL. I'm arguing that a lot of developers won't realise that, because the DX of Durable Objects is superior.
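For reference, a minimal Durable Object is roughly this much code (a sketch against Cloudflare's Workers types; the class name and storage key are made up):

```typescript
// Rough sketch of a minimal Durable Object (assumes @cloudflare/workers-types).
// The class name and stored key are made up; the point is how little ceremony
// the "one stateful object with transactional storage" model needs.
export class Counter {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    // storage.get/put are strongly consistent, but only within this single
    // object, which lives in exactly one location at a time.
    let count = (await this.state.storage.get<number>("count")) ?? 0;
    if (new URL(request.url).pathname === "/increment") {
      count += 1;
      await this.state.storage.put("count", count);
    }
    return new Response(String(count));
  }
}
```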
C'mon, let's not wrap it up in all that; be real. It's a JSON object and a few JS functions passed as arguments to a class constructor.
I'm not saying it isn't an elegant design, but can we please not talk about proprietary implementations of a particular design pattern as if they're some kind of industry standard?
You're getting the data format confused with the database engine. Yes, a database might just be storing JSON, but how it's stored and replicated matters.
The architecture I eventually ended up with for my product (https://reflame.app) involves:
1. A strongly consistent globally replicated DB for most data that needs fast reads (<100ms) but not necessarily fast writes (>200ms). I've been using Fauna, but there are other options too such as CockroachDB and Spanner, and more in the works.
2. An eventually consistent globally replicated DB for the subset of data that does also need fast writes. I eventually settled on Dynamo for this, but there are even more options here.
I think for all but the most latency-sensitive products, 1. will be all they need. IMHO the strongly consistently replicated database is a strictly superior product compared to databases that are single-region by default and only support replication through read-replicas.
In a read-replica system, we have to account for stale reads due to replication delays, and redirect writes to the primary, resulting in inconsistent latencies across regions. This is an extremely expensive complexity tax that will significantly increase the cognitive load on every engineer, lead to a ton of bugs around stale reads, and cause edge case handling code to seep into every corner of our codebase.
Strongly consistently replicated databases, on the other hand, offer the exact same mental model as a database that lives in a single region with a single source of truth, while offering consistent, fast, up-to-date reads everywhere, at the cost of consistently slower writes everywhere. I actually consider the consistently slower writes a benefit: they keep us from fooling ourselves into thinking our app is fast for everybody when it's only fast for us because we placed the primary DB right next to us, and they force us to actually solve for the higher write latency with other technologies if our use case truly requires it (see 2.).
In the super long term, I don't think the future is on what's currently referred to as "the edge", as this "edge" doesn't extend nearly far enough. The true edge is client devices: reading from and writing to client devices is the only way to truly eliminate speed-of-light induced latency.
For a long time, most truly client-first apps have been relegated to single-user experiences, because the most popular client-first architectures had no answer for collaboration and authorization. But with this new wave of client-first architectures solving for both via client-side reads and optimistic client-side writes with server-side validation (see Replicache), I've never been more optimistic about the future. (An open source alternative to Replicache would do wonders to accelerate us toward this future; clientdb looks promising.)
Good post. Do you have any resources describing the 'new wave of client-first architectures' you mention? I'm struggling to understand how you can do client-side authorization securely.
Replicache (https://replicache.dev/) and clientdb (https://clientdb.dev/) are the only productized versions of this architecture I'm aware of (please do let me know if anyone is aware of others!).
But the architecture itself has been used successfully in a bunch of apps, most notable of which is probably Linear (https://linear.app/docs/offline-mode, I remember watching an early video of their founder explaining the architecture in more detail but I can't seem to find it anymore (edit: found it! https://youtu.be/WxK11RsLqp4?t=2175)).
Basically, the way authorization works is that you define specific mutations that are supported (no arbitrary writes to client state, so write semantics are constrained for ease of authorization and conflict handling), with a client-side and a server-side implementation for each mutation. The client-side version gets applied optimistically and is eventually synced and re-run on the server, which applies authorization rules and detects and handles conflicts; this can result in client state getting rolled back if authorization rules are violated or unresolvable conflicts exist. Replicache has a good writeup here: https://doc.replicache.dev/how-it-works#the-big-picture
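A rough sketch of what that mutation pattern looks like on the client, loosely based on Replicache's documented API; the mutator name, key layout, and endpoints are made up, and the server-side re-execution with auth checks is only indicated in comments:

```typescript
// Client-side sketch of named mutators (loosely based on Replicache's API;
// older versions use tx.put, newer ones call it tx.set). The mutator name,
// key layout, and push/pull endpoints are illustrative only.
import { Replicache, WriteTransaction } from "replicache";

const rep = new Replicache({
  name: "user-42",
  licenseKey: "YOUR_LICENSE_KEY",
  pushURL: "/api/replicache-push",
  pullURL: "/api/replicache-pull",
  mutators: {
    // Applied optimistically on the client; the same named mutation is later
    // re-run on the server against the authoritative DB, where authorization
    // is enforced and forbidden or conflicting writes are rejected (which
    // rolls the client back).
    async createTodo(tx: WriteTransaction, todo: { id: string; text: string }) {
      await tx.put(`todo/${todo.id}`, { ...todo, done: false });
    },
  },
});

// Usage (e.g. from an event handler): applied locally right away, synced later.
// await rep.mutate.createTodo({ id: crypto.randomUUID(), text: "Ship it" });
```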
Replicache has a commercial license but is source-available. Clientdb is open-source, but doesn't seem as mature yet. I'd love to see more open source solutions in this space too.
Right there with you. We chose CockroachDB and Fly to meet our global needs in the simplest manner. But their multi-region cloud offering costs ~$3,400/mo at the recommended specs across 3 regions. Pricey, depending on how you see it.
We hope their serverless tier meets feature parity soon.
Yep, Cockroach's dedicated offering was pretty cost prohibitive when I last looked too, and I really didn't want to have to operate my own globally replicated database, so Fauna seemed like the best option at the time.
Really looking forward to Cockroach's serverless options too. More competition in this space is very welcome.
Anyone used Yugabyte before for a globally replicated DB? I use Hasura a lot, which depends on Postgres, and Yugabyte seems like a possible drop-in candidate for Postgres, but I'm wondering if others are using it in prod.
How are you finding working with distributed databases? I've worked with Fauna and Dynamo and they are a nightmare from the DX and iteration speed point of view compared to Postgres.
Dynamo has definitely been a bit of a nightmare to work with, but I actually find Fauna reasonably pleasant. You're not going to get the vast ecosystem of SQL-based tooling, but the query language itself is designed to be composable, which reduces the need for ORMs (though I do still miss the type generation from Prisma sometimes), and there's a nice declarative schema migration tool that handles that aspect well: https://github.com/fauna-labs/fauna-schema-migrate
Yeah, I mean, apart from missing IntelliSense, Fauna forces you to set everything in stone, which is not good for fast iteration and prototyping. I understand it's for performance, but things change continuously, especially in startups.
Let's say the user is in FRA and the database is in SF. If the "server" is on the edge, you'll end up with 100-200ms between server and database, while the user has less than 10ms latency to the "edge". If the server does multiple round trips to the database, it can take seconds until the first byte. If the server and database are both in SF, TTFB will probably stay well under a second, as round trips between database and server are almost free. One thing to mention is that it would be beneficial if the TLS handshake could be done on the edge, as it's a multi-round-trip exchange. Ideally, we could combine a server close to the DB with a stateless edge service.
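As a back-of-the-envelope illustration (my numbers are assumptions, not measurements): take ~150ms for an FRA-to-SF round trip, ~5ms from the user to a nearby edge location, and a handler that needs 3 sequential database queries:

```typescript
// Illustrative TTFB arithmetic only; latencies are assumed, not measured.
const RTT_FRA_SF = 150;   // ms, FRA <-> SF round trip
const RTT_USER_EDGE = 5;  // ms, user <-> nearby edge location
const DB_QUERIES = 3;     // sequential DB round trips the handler makes

// Server on the edge (next to the user, far from the DB):
const ttfbEdgeServer = RTT_USER_EDGE + DB_QUERIES * RTT_FRA_SF;  // ~455 ms

// Server next to the DB in SF (far from the user, DB round trips ~1 ms):
const ttfbServerNearDb = RTT_FRA_SF + DB_QUERIES * 1;            // ~153 ms

console.log({ ttfbEdgeServer, ttfbServerNearDb });
```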
So the future of the web might not be on the edge. Rather: the future of the web will leverage the edge.