How We Built Fly Postgres (fly.io)
260 points by pw on Nov 30, 2022 | 96 comments



A database is the one thing I never want to deal with at such a low level. That approach will be brittle, very brittle. Managed data storage with SLA guarantees is not easy, and there are quite a few companies specializing in it for a reason.

As proof, check out the fly.io forums. They are full of posts about suddenly broken Postgres instances.


You should definitely use a managed Postgres.

So should many of the people in our forum. This is part of why we wrote this! That said, the vast majority of PG users on Fly have no issues. Most issues are the result of people trying to spend as little money as possible on their underlying infrastructure. When you have a 1GB disk, you'll run it out of space pretty quickly. You may not pay us for it, though, which is what they value most.

We tell people to use CrunchyBridge _all_ the time. It's better than our Postgres for many people.


Right; managed Postgres makes sense for businesses. For my hobby projects, the fact that I can't even get pricing without first creating an account is a signal to look somewhere else.


Pricing is available at https://www.crunchydata.com/pricing/calculator. To add on to that, we really liked the Fly pricing model where, if you consumed less than X, we wouldn't charge you. Recently we introduced a $10-a-month plan, and if you suspend the database while developing and your bill is under $5, you won't get charged.


Thanks, I did find it after I commented; I had to follow two buttons from the front page that were the same color as the background, though ("Learn More" -> "Check out pricing" [which was a contrasting color] -> "View Pricing Breakdown").

[edit]

Also other managed DB hosts have similar anti-funnels for pricing, so I'm not trying to pick on Crunchy Data.


Thanks for the feedback, we can do better. Will see if we can get this fixed this week.


Another thing you can improve: communicate clearly how many databases a selected tier can contain. Another frequent question of mine: can a cluster size be changed later?


? If you look under the "Cloud" menu on the main nav, it's the 3rd & 4th items down.


1. I didn't check the "Cloud" menu the first time.

2. The 3rd item on there goes to the "pricing" page that doesn't actually tell you the pricing (it says "Starting from $10/mo" but you need to scroll down and click through to get to the actual pricing table) :/

3. The 4th item "pricing calculator" is great; I just didn't find it.


I don't even see pricing for this when logged in, FWIW.


Second this. Instant leave-the-site.


I don't know about all managed postgres offerings, but quite a few of them have a setting I consider to be insane: if you delete the instance, it deletes all your automated backups as well (and possibly your ad-hoc snapshots too). This was true for a few managed offerings when I last checked.


FWIW, managed providers can fully plug into fly just fine. Here's actually a profile of performance times [1] of Fly with various providers and configurations, along with a repo [2] to reproduce/create the same setup yourself.

1. https://webstack.dancroak.com/

2. https://github.com/croaky/webstack

*Disclaimer: I work at one of those fully managed database providers.


I would love it if Crunchy Bridge was available on Fly. Have you considered offering it alongside the other clouds you run on?

I could see a Crunchy Data-managed version of Fly Postgres doing super well, combining your experience managing HA Postgres and backups with the distributed read replicas from Fly.


Why are the response times (api checks) so high?


Yeah those response times are in the realm of “why would I even consider that”. If I am trying to tune my db queries to have p95 latency of 1ms (for example) there’s no way I would choose an architecture that then threw that all out the window with ~100ms network latency. Hopefully I am misunderstanding those numbers somehow.


I saw this earlier and thoroughly didn't understand what is going on with that. I can't make any sense of why that would be so high.

Feels like a better test would be for the healthcheck to time the query RTT and report that back. Completely remove the web request from the equation.


Would be interesting to see the request-response round trip time between the Go service and the SQL database, for each combination.
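A minimal sketch of what the parent suggests, assuming Python: time the raw round trip with one shared harness, so a bare DB query and the full web check can be compared directly (the psycopg2 usage mentioned in the docstring is illustrative, not part of the thread):

```python
import time
from statistics import median, quantiles

def time_round_trips(run_query, n=50):
    """Time n invocations of run_query() and return (median_ms, p95_ms).

    run_query is any zero-arg callable that performs one request/response
    round trip -- e.g. lambda: cur.execute("SELECT 1") with a psycopg2
    cursor, or an HTTP call to the health-check endpoint -- so the DB hop
    and the whole web request can be measured with the same code.
    """
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - t0) * 1000.0)
    p95 = quantiles(samples, n=20)[-1]  # 19 cut points; last one is p95
    return median(samples), p95
```

Running it once against `SELECT 1` and once against the health-check URL would show how much of the ~100ms is network versus web stack.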


I was recently one of those affected by a random issue! I was made aware in the forums that my VM had maybe run out of space. It would've helped if an error message had immediately alerted me to the exhausted disk space. I realized that what I actually wanted was a managed provider, and moved (back) to Heroku just for Postgres. I had recently moved both my backend hosting and Postgres from there to fly.io; my backend hosting stays on fly.io, but I'll be putting my Postgres on a managed platform.


Just to be clear, your code makes PostgreSQL connections from Fly's servers to Heroku's servers? I'm a little surprised. I'm not a web programmer, more of a traditional systems programmer, so I have some questions.

I assume you use SSL, and these are not made on demand (connection pooling?). On "cold start" do you get massive latency?

What kind of latency is there between client and server ?

What made you choose Heroku and not a specific managed postgresql service ?

Did you try fly's postgres service?


> Did you try fly's postgres service?

Yes. I wrote: "I was recently just one of those affected by a random issue!"

I don't notice any measurable latency. Heroku exposes their postgres instances directly on the public web (secured with a strong user/pass combo of course), so there's no difference, networking-wise. There's no additional reverse proxy or some tunnel to route through.

Heroku was the cheapest for the very small instance I needed.

The connections are made on-demand and are insanely fast.


Thanks for the answers. I'm going to look into this method.


> You can spin up a Postgres database, or a whole cluster, with just a couple of commands. Sign up for Fly.io and launch a full-stack app in minutes!

What is the HackerNews opinion on: who is their actual customer?

Obviously lots of companies pay for managed databases. It's not an uncrowded market for a reason.

But like... pricing-wise... it seems so expensive? What is the HackerNews take on the value proposition specifically for hosted databases? Is the answer basically 1:1 with anything cloud-hosting related?


Our pricing is a Rorschach test.

If you come from Heroku, we seem too cheap: https://community.fly.io/t/comparison-of-prices-to-heroku/89...

If you are most familiar with AWS, we seem pretty close to what you're used to. But very cheap for egress.

If you are a DigitalOcean user, VMs seem pretty ok, but bandwidth seems more expensive.

If you are a CloudFlare user, you think everything else is too expensive until they try to sell you an Enterprise plan.

If you're a happy user of Hetzner or OVH, you pay something close to the same price as we do for servers. And might be surprised at our prices. Because we also want to have reasonable margins.

People really like us when they have an app that seems valuable to them, they don't want to think hard about running servers, but they do know they can "eject" and run the rest of their infra too: https://community.fly.io/t/startup-credit-program/6709/3?u=k...


Fly.io's pricing seems fair. It's not amazingly cheap, but there aren't a lot of PaaS offerings out there, and most are very expensive with complicated pricing compared to Fly.io; even DigitalOcean's App Platform is more expensive.

I am curious about the freemium model for PaaS systems. I've always wondered what percent of compute ends up being free and if the paid prices have to be higher to subsidize the free tier. Would it be better for the paying customers if the service was 30% cheaper and there was no free tier? Of course, I might be incredibly far off on how much the free tier customers cost.

I think for people that think Fly.io is expensive, it just feels like what Fly.io does should be table stakes rather than a premium service in 2022 - and yet it's so hard to find! Heroku is 15 years old and Fly.io feels like the first platform I've used since that just gets it.

I would say that a collaboration between you and Neon (https://neon.tech/) would be pretty cool. While your site does link to Neon as a recommendation, Neon's datacenters often aren't that proximal to Fly.io's - Ohio isn't that close to Chicago, Virginia, or New Jersey. Maybe that'll get better in the future.

I'd always love it if Fly.io were cheaper, but more than that I'm glad that Fly.io seems to really get what customers need.


> If you are a CloudFlare user, you think everything else is too expensive until they try to sell you an Enterprise plan.

tbf, Cloudflare's Enterprise plan includes usage, too. Just like Fly's own plans: https://fly.io/plans

Besides, the Cloudflare platform is way more reliable and their products actually make it to GA :)


> If you are most familiar with AWS, we seem pretty close to what you're used to. But very cheap for egress.

But then why not just use AWS App Runner and keep everything in AWS? No egress costs and you get the best support in the industry on legitimately enterprise-grade infra. Don't like AWS? Ok, GCP has Google Cloud Run.

Why would I ever choose Fly when the competition already has a Fly-alike with global infra spend 1000x yours?


Last I checked, a 4 vCPU/16GB RAM/1TB storage configuration costs around $80 USD per month at VPS hosts like Hetzner. It's $762 USD per month on RDS.

There are trade-offs of course (https://onlineornot.com/self-hosting-vs-managed-services-dec...) between the two options.

I've been hoping fly.io builds a strong automated middle ground for a while now!


A few years ago I managed a 10TB Postgres cluster backing a moderately high traffic site. I no longer even give the briefest of blinks at any of the fully managed database prices that used to make my eyes water.


Because you feel like you got good value for your 10TB DB or because you feel like you got fleeced? Not sure I understand your undertone.

Cheers


I think the implication is, “I used to manage my own DB instance and I’m now willing to pay anything even vaguely reasonable to not have to do that again.”


Correct!


Am I reading https://instances.vantage.sh/rds/?min_vcpus=4&cost_duration=... wrong? It seems you can get that as a db.t4g.xlarge for ~$188/month.

Still more than 2x the cost, but not nearly $760.


Yeah, tack on another $115 or so for the 1TB of storage. And as a sibling comment mentions, on-demand pricing means it isn't really an apples-to-apples comparison. In almost every case other than a hobby project it feels like the convenience of RDS vastly outweighs the savings of ~$250 you'd get by self-hosting.
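Putting the thread's numbers together (a rough sketch: the gp3 storage rate is my assumption, and none of these are live AWS prices):

```python
# Figures quoted in the comments above; illustrative, not current pricing.
instance_monthly = 188.0              # db.t4g.xlarge on-demand, per the comment
storage_gb, gp3_per_gb = 1000, 0.115  # ~$115/mo for 1 TB gp3 (assumed rate)
rds_monthly = instance_monthly + storage_gb * gp3_per_gb

hetzner_monthly = 80.0                # 4 vCPU / 16 GB / 1 TB VPS, per the thread
print(round(rds_monthly), round(rds_monthly - hetzner_monthly))  # 303 223
```

So roughly $300/month on-demand before bandwidth, and a bit over $200/month more than the VPS, in the same ballpark as the ~$250 above.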


> for ~$188/month.

Note that this is still the on-demand price.

Although it's still not apples to apples, no one (sane) would use RDS 24x7 without RIs. That probably brings you down another 50 bucks or so.


$188 without storage and bandwidth.


The t4g instances are also burstable so running your database with 4 cores under load would either get you throttled or produce more charges.


and the price excludes the 1.5 TB EBS volume


> who is their actual customer?

Anyone who wants a managed database + to run Docker containers on VMs without having to do much ops work or deal with the complexities of the big clouds.

> But like... pricing wise... it seems so expensive?

It's quite a bit cheaper than Heroku, who run a similar service and have boatloads of customers.


> Anyone who wants a managed database + to run Docker containers on VMs without having to do much ops work or deal with the complexities of the big clouds.

At what cost, though? Why not run a VM that runs Postgres? It might take a week to set up automated backups, clustering, failover, etc. (OK, maybe a few more weeks), but a hosted DB costing 5x the underlying VPS is insane?


By all means, if you have a database stack you're comfortable managing yourself, use it! We did Fly Postgres because we more or less had to, not because we want to own the storage stack on Fly.io. We want, in fact, the exact opposite thing. If you've got a database stack you enjoy managing, and extra time on your hands, spin it up for other Fly.io customers as a product! We'll love you for it.


Many companies have nobody who can do that work competently, and would rather focus on development. If you've ever had a random IT person "set up a database server", only to discover it barely has any memory or CPU allocated, it's configured with the slowest storage options possible, and it has no backups or monitoring, well... that's what you're paying for.


In actuality I think the number of people I have worked with in ~20 years that I would trust to set up a production database to the same level to which I trust RDS… is maybe like 2 people. Those skills are RARE and hilariously underestimated. “Just run a Postgres VM” is so far from the mark I can’t even explain.


You know a week of an engineer is multiple thousands of dollars right?


I can understand the flawed line of thinking that leads to not taking salaries and time into consideration when making comments like "why not just do it yourself?", but it's interesting how rampant it is. I've worked with people who are like, "To avoid paying some company $999/year, I'm going to invest $1,500 of my time setting up a free version running on hardware that has a smaller recurring cost (but still does have a cost). And let's just not worry about how much of my time might be required to support this once it's up and running."


The opinions scale.

My last gig had an MSSQL database component that pushed $700,000 a year.

It wasn't _that_ big, but it was pushing the upper published limits of what you could do with MSSQL on RDS. One day replication stopped working, and Amazon business support or whatever couldn't resolve it.

I've been around the block and have continuously come to the conclusion that it's usually better to have the skills to run them yourself, and then it's just a balance of whether or not you're time or cash poor.


It also depends a lot on your scale. My last company was paying $50/month for a hosted database (4gb RAM), we upgraded to one with 8GB RAM for $200/month that was likely to be sufficient for the foreseeable future. That was expensive for what we were getting, but it certainly wasn’t worth our time or effort to build out our own.


For sure. If the product makes money and that renders the amount insignificant, I make the same choice.

Opportunity cost and all that.


Why spend a few hundred dollars when you can spend tens of thousands!


Any company that has an engineer is going to most likely have them full-time. They either work on standing up a database and keeping it alive some of the time, or they watch their business go from $100/mo -> $1k/mo -> $10k/mo for a PaaS database, no?


That is assuming you're successful (or at least have a busy storage tier.)

I think the people who are doing this calculation wrong are mostly confusing and conflating those two outcomes. Just because your database servers are very busy does not mean your product achieved some commercial success. You can't just pay extra for the full-service treatment and expect to receive a commensurate output in value.

(and the people who won't pay for a real person to own their databases full-time are likely in a circular Venn diagram with people who won't pay attention to optimizing their database queries, so your point can probably stand unmodified... except there is an important case where the PaaS provides value, and it's nearest to the starting point of practically everyone.)

If you look at every dollar spent on IT/databases as a sunk cost which cannot be recovered from future recurring expenses, it's very bleak indeed to put pennies on top of pennies and pile them higher every day... or you could pay attention to the signal that SaaS is providing every month (the bill for usage), and ramp up the attention paid to optimizing queries before it gets too late in terms of dollars and cents. The dollar impact of slow queries cannot be overstated.

But some business ideas will also not ever make it that far. You could spend the money on DBA salary or DIY and never know how much it was really being wasted, comparatively, if you didn't ever get a handle on your own usage metrics.


Why not wait until you hit around $1k/month or more, and then hire an engineer to manage the database for you?


Unlike your code which you can redeploy after a bit of downtime, you might not be able to un-f^ck your database. I think that's ultimately the selling point. No one wants to be responsible for keeping the data safe when it's not their job. Do you work at a company that is going to applaud you for testing your backups? If not, you're wasting your time doing that. Do you work at a company that is going to promote you for getting high-availability right before an outage happens?

Some people certainly do work for companies like that. I'm sure big places like Facebook or Apple or Netflix applaud and promote people for this - and have the scale at which having people working on these problems makes sense. At your startup with a few dozen people? Probably not. If you're not building the product, you're not helping the company succeed. Ok, you're saving the company a bit of money, but at the cost of your time and the cost of the company actually getting product out the door, finding product-market-fit, etc.

Do you want to use your employee time setting up a HA database cluster and saving the company $1,000 per month or developing your app?

That's a key question: pay Linode $1,560/mo for a 3-node 32GB RAM cluster or launch 3 32GB boxes for $720, figure out PostgreSQL replication, make sure you set up the replication users, make sure you don't open any security holes, make sure you have Patroni or Stolon so that you can switch over when the primary fails, make sure you have etcd or Consul or something to handle that coordination, make sure that you have Barman or pgBackRest set up to take your WAL and persist it to S3, set up your S3 buckets, set up a full backup schedule so that you can restore easily, make sure that you're regularly testing your restores, make sure that you're testing your failover (do you even know if Patroni is actually working?), figure out how your app is going to cut over to the new primary when that happens (is there a shared IP that you need to move, are you using a proxy like HAProxy that's checking health-checks to see which it should proxy to, etc). Or would you rather just pay someone $1,000/mo so that you don't have to deal with that?

I hate paying up for something I feel like I should be able to do myself, but it does make some sense. If I decide I don't want to pay for Google Cloud Run and my servers all die, I can boot up some new boxes and get my app running again with some downtime. That's not great, but recoverable. If I don't want to pay for Google Cloud SQL and my servers die, now I'm hoping that my backups were working, that I can bring a much more complicated deployment back online than just some random process or container, etc. One of those two just carries more risk. Yes, backups should work and should be tested and you should even test backups in a managed service, but if you're a startup trying to move fast and find product market fit, I'm guessing that the premium is worth saving your engineers that time. I hate saying that because cloud providers are pushing such high margins, but it's probably true.

As a curiosity, do you run your own databases? If so, which? Do you find that everything Just Works or that it's a pain to get everything running, debugging things, testing backups, etc.? Is this in a high-traffic, commercial situation or just as a hobby? I think hosting your own database is relatively easy if you're just going to have one server and pg_dump -> rsync a backup nightly. In the rare event that you lose a server, maybe have an hour of downtime. If you're able to recover within 50 minutes and you lose a server every week, you're still at 99.5% uptime. If you can recover in 40 minutes and you lose a server every month, you're at 99.9% uptime. Do we need more? How often does a VM go down (note, don't say "I have 1,000 servers and I lose one every other day" - that would mean the average instance is lasting several years)? Won't Google's live-migration of VMs handle a lot of that?
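The uptime figures above work out as claimed; a quick back-of-the-envelope check (Python, numbers taken from the comment):

```python
# Availability from mean time-to-recover and failure frequency.
def uptime_pct(downtime_min, period_days):
    """Percent uptime if you lose downtime_min once per period_days."""
    period_min = period_days * 24 * 60
    return 100.0 * (1 - downtime_min / period_min)

print(round(uptime_pct(50, 7), 2))   # lose a server weekly, 50 min to recover: ~99.5
print(round(uptime_pct(40, 30), 2))  # lose one monthly, 40 min to recover: ~99.91
```

Even the pessimistic weekly-failure case clears three nines only if recovery takes about ten minutes, which is the real argument for automating failover one way or another.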

So, it's trade-offs, but I think we're not living in a world where companies say "we're not going to architect for HA and we'll suffer an hour or two of downtime every year or two when we lose a box." Sometimes we certainly get ourselves into situations where we've made complicated systems that end up having complicated failure scenarios too.

Still, I think managed databases are an easy sell, even at their premium price point.


> That's a key question: pay Linode $1,560/mo for a 3-node 32GB RAM cluster or launch 3 32GB boxes for $720, figure out PostgreSQL replication, make sure you set up the replication users, make sure you don't open any security holes, make sure you have Patroni or Stolon so that you can switch over when the primary fails, make sure you have etcd or Consul or something to handle that coordination, make sure that you have Barman or pgBackRest set up to take your WAL and persist it to S3, set up your S3 buckets, set up a full backup schedule so that you can restore easily, make sure that you're regularly testing your restores, make sure that you're testing your failover (do you even know if Patroni is actually working?), figure out how your app is going to cut over to the new primary when that happens (is there a shared IP that you need to move, are you using a proxy like HAProxy that's checking health-checks to see which it should proxy to, etc).

Little bit off-topic: it's funny that all the time you see comments on HN saying something along the lines of "why use Kubernetes, it's just overly complex", but then the same people are, most likely, returning to their self-managed DB clusters that need all the maintenance and setup just listed. Whereas, if they were using Kubernetes, they would have Operators or Helm charts that do 90% of the stuff.

(Now back on topic.) Don't get me wrong, I'll choose a managed DB all day, every day, over something self-managed, but sometimes (e.g. a startup without an indefinite amount of VC money) self-managing your infrastructure is the fastest and cheapest way to go.


> Still, I think managed databases are an easy sell, even at their premium price point.

Except Fly Postgres isn't a managed offering unlike, say, CrunchyBridge or PlanetScale or Alloy or Aurora. It is pretty much a "Fly app" you'd have to tend to yourself.


But fly.io also isn’t charging a premium over the base infra cost (are they? I don’t see fly Postgres on their pricing page)


Fly Postgres is just a Fly app. It's open source; you can grab it off GitHub and deploy it yourself, and we'll never know. No, we're not upcharging for Postgres.


The way I like to think of this is risks.

Different companies, or even teams within the same company, will have different risk acceptance. The thing with managed services is that part of the premium is getting all the bells and whistles... but what the managed premium buys may not be aligned with what customers actually need.

So I suspect the important thing here is that customers realize what they're getting and have proper expectations set. In this case, unless I'm missing something, you're not getting a managed database service from fly.io; you're getting an OSS tool that makes running Postgres on fly.io a bit easier. Kind of like a database controller for Kubernetes: it helps you automate some things, but it's still just software running on your cluster.

And then it's up to those customers to decide whether that's an acceptable risk to them or not. Maybe a bunch of customers get to vote with their wallets on whether this works for them. Maybe there will be enough demand that some partner specializing in database tech, like Neon, Crunchy, or Cockroach, will offer a service targeting fly.io specifically, or maybe fly.io will get stuck building it themselves if customers demand it.

Lots of maybes, so at least I'll be interested to follow this and see how it develops.


Not only that, you have to figure out how to fix the thing if the HA contraption breaks.

There's also security to think about (do you run a CA? How do you handle Postgres certificates, etcd certificates, rotation, revocation, etc.?).

Some hosted providers also have nice value-adds like query-level performance monitoring and DBA services.


Let us put 3 TB, let alone 4 PB, through this.

Can anyone tell me whether I should or shouldn't trust that I'll be OK?


Using Stolon for PG is a poor choice. Until just very recently, it hadn't had any significant updates in a year. We've abandoned our use of it in favor of EnterpriseDB.


There's a bunch of our own engineering going into this. But: the good news about this whole situation is, if you have a clustering solution you like better: you can just use it. We "automate" Fly Postgres, but we don't "manage" it. Fly Postgres is using features of Fly.io that are available to anybody's application, not just ours.


> we’re good at consul

Thank god someone is. I've lost more of my life to Consul partition failures than any other part of the nutech stack.


Yup, our experience with the brittleness of Consul and Nomad soured me on the HashiCorp stack, even though I was very excited about it. We went from buying enterprise licenses to throwing it all out, without ever pushing to production, in the span of a year.

This stuff, for all the hype about raft and things, is brittle enough and requires enough special attention.

Which is why I’d gladly rather have it be fly.io’s problem.


For what it's worth, if I was deploying a bounded set of applications or services in a small number of geographies and couldn't use Fly.io for some reason, I wouldn't hesitate to use Nomad. Nomad is pretty great; it's like Flask to K8s's Django.


I did love the simplicity of nomad.

And in general, nomad worked pretty well for us but our consul cluster kept mysteriously failing. I think that caused our nomad cluster to fail because it was backed by Consul.

The one complaint I did have about Nomad (same as Consul) was that the recovery process was manual: you had to hand-generate a peers.json.

I was shocked when I saw that. Truly one of those "finding out how the sausage is made" moments, even though I've managed Linux servers for two decades – I always assumed it would use zeroconf/bonjour/multicast DNS (remember cloud auto-join?) or something similarly elegant to auto-discover other nodes in the network and just reconnect and rebuild the cluster. I mean, what's the point of all this stuff if it can't be used to recover a cluster and just Do The Right Thing™? The shiny new experience is stellar (like sales, or setting up a new cluster), but the flip side (when things go wrong) is a mess. That's why we eventually said "nope!" to all the custom stuff and went with boring, plain vanilla ECS, which is itself too much now that we've started using Fly.

Don't ever want to even think about having to hand-write a peers.json file to recover a cluster, boot things up, and pray to the ancient gods that it works.
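For anyone who hasn't hit this: the hand-written recovery file in question looks roughly like the sketch below. The format follows Consul's documented outage-recovery procedure for raft protocol v3; the node IDs and addresses here are made up, and the exact format should be double-checked against your Consul version.

```python
import json

# Hypothetical surviving servers: (consul node-id, ip).
# 8300 is Consul's default server RPC port.
servers = [
    ("00000000-0000-0000-0000-000000000001", "10.0.0.11"),
    ("00000000-0000-0000-0000-000000000002", "10.0.0.12"),
]
peers = [{"id": nid, "address": f"{ip}:8300", "non_voter": False}
         for nid, ip in servers]
print(json.dumps(peers, indent=2))
```

Per the docs, the generated peers.json goes into each surviving server's raft data directory while the servers are stopped; they read it on restart to rebuild the quorum, which is exactly the manual step being complained about.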

We don't have time for that nonsense. Please, take my money, Fly/Render/everyone else. Your costs are a margin of error compared to what I had to pay a devops person to build our own stack. (I'm not even exaggerating. It was six figures. DevOps people are worth every penny but they cost many, many pennies.) Ultimately, we never used the infra.

I want to focus on building solutions for my customers and not fiddling with weird server stuff.


how long ago did you use nomad? nomad integrates with consul but isn't backed by it. it's also pretty trivial to run consul in quite a shitty network environment by bumping up some of the settings (they should probably change their 'production' suggestions).


We decided to retire the whole infra about 8 months ago. A lot of our consul complications happened about 18 months ago.

The consul clusters would keep failing in QA (they were running on t2.nanos – but that should be plenty of bandwidth for raft not to blow up every couple weeks, same happened with t2.micros too).

Before we pulled the plug we had started seeing something about ec2 health checks failing and the autoscaling groups yanking servers out and replacing them with new servers. but this is exactly the kind of case where consul should've just added the new machine right in. Instead, the 3 node cluster (now 2 nodes) would just sit there saying "hey I can't find a leader... aaaah. I can't find a leader" – well, to paraphrase Mike Myers on SNL, TALK AMONGST YUHSELVES and figure it out, there are two of you remaining.


They've had their fair share of pains with it, but they still seem pretty happy with it https://fly.io/blog/a-foolish-consistency/


Happy is a word. There are lots of words. I like words! You could be creative about what word could take the place of "happy" in that sentence, and probably still be correct. Get weird with it! Maybe "sanguine" would work. "Engaged".

The reality is: we've got a fair bit of experience with Consul at this point, we respect the hell out of it for the problems it was designed to solve, and we're unlikely to stretch it any further than we've already stretched it. Distributed lock service for Postgres clusters? Sure. Source of truth for all our app state? We've built our own thingy ("corrosion", a Rust distributed state system) to phase Consul out with. I'll get Jerome to say things about it.


>Sure. Source of truth for all our app state? We've built our own thingy ("corrosion", a Rust distributed state system) to phase Consul out with. I'll get Jerome to say things about it.

Looking forward to the blog post!


Sorry yes. 'Invested' was more along what I was thinking :)


What else would you include in the nutech stack? HashiCorp products in general?


I always enjoy Fly.io's blog posts; the friendly, casual tone coupled with the real-world experience and strong technical chops of the staff add up to HN gold in my book. I'll def revisit their "treat pg as an app" approach next time I engage w/ stuff in that realm.


I do have a question about this – (as I've said in another comment here) after investing a lot in building our own Terraform/AWS stack, we've started moving all our smaller apps to Fly, and it's downright delightful, to the point where we're considering moving even our HIPAA-compliant software to Fly now that I've seen the BAA option in your pricing (the only hesitation is that there's no Vanta integration, so we'd probably have to fill out a bunch of stuff manually to make auditors happy).

But to my actual question: I remember seeing somewhere (in the docs, I think) that Fly Postgres is not a managed database and isn't as fire-and-forget, and that honestly scares me a bit. It shouldn't, because at the end of the day, RDS is probably downright hairy and ugly under the hood; it's just hidden from us.

I've also seen a handful of reliability issues on the forums around postgres. So what is Fly's position on actually running these clusters? Are you guys feeling pretty good about its stability for critical workloads?

On a related note, these folks that are continuing to use RDS via fly... what are the latencies like with a us-east-1 RDS? I know you're in Ashburn so it must be sub 5ms?

That itself would be worth keeping our RDS and moving everything else to Fly to be honest.


I recently set up fly -> ec2 -> rds using their WireGuard + pgbouncer template.

I'm getting 7ms RTT on us-east-1. Of course I messed up my setup the first time and had RDS in Ohio... That was 30ms.

You would probably get sub-5ms if you used a public RDS connection. I don't know if that would ever pass an audit, though.


Yeah we’d never even consider adding a public ip to our databases (or web servers)


Improving Postgres is solid and boring.

Solid and boring is often a good choice. I'm glad to see startups in this space.

What's the latest on adoption of Spanner-like databases?


This was on HN a few months back: https://github.com/losfair/mvsqlite

While not Spanner, it is essentially an open source DB like AlloyDB or Aurora, pushing replication and scale-out to the storage layer (in this case via FoundationDB). The most interesting bit of mvsqlite is its multi-writer capability, using FoundationDB to perform page-level locks.

I'm neither the creator nor using it in production, but I'd love to see more DBs using FoundationDB as storage. It's a pretty cool solution.


> This was on HN a few months back: https://github.com/losfair/mvsqlite

The lead developer on mvsqlite has since joined Deno...


Never used Stolon, but I've had good experience using Patroni to auto-manage the failover of multiple PostgreSQL clusters.


This is tangential - anybody have a reasonably clean way of running JVM apps on fly?

Given that the deploy model of a jettified jar is so clean in comparison to the various hipster stacks ;) that they do have primary support for, I'm not sure why they don't have much in the way of documentation for it. I did find a support thread that refers to https://archive.is/o9YE1, which seems like a fairly gross and opaque sequence of steps.

@tptacek, what are the chances that somebody at Fly could produce a more streamlined recipe and/or documentation for JVM apps?


If you avoid buildpacks altogether and just go straight to Dockerfiles then life suddenly becomes good. Speaking from my own experience.
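To illustrate that route, a minimal Dockerfile for a fat jar might look something like this. The base image, jar path, and port are all assumptions for the sketch, not anything Fly-specific; adjust them to match your build tool and your `fly.toml`:

```dockerfile
# Sketch only: assumes a fat/uber jar produced by your build (path is a placeholder)
FROM eclipse-temurin:17-jre
WORKDIR /app
COPY target/app.jar app.jar
# Match this to the internal_port in your fly.toml (8080 is an assumption)
EXPOSE 8080
CMD ["java", "-jar", "app.jar"]
```

Then `fly deploy` builds and ships the image like any other Docker-based app, no buildpack involved.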


They support Docker as a second-class citizen.


Docker (or rather OCI) is our first-class citizen. The launchers for phoenix/rails/etc just generate Dockerfiles and config by inspecting source code.


Any way to run Fly with multiple locations at the same time? For context, I want to build a simple uptime service which tells me when my website is down, but for that I don't want to use a single VPS, I want to load balance between several locations and servers in case any one of them goes down.

Am I supposed to be looking for serverless deployment? Or deploy a master/parent version of the app on one main VPS, then several sub-versions on other VPSes distributed around?


You can scale a Fly app to an essentially arbitrary number of instances (`scale count x`), and deploy in any number of regions (`regions set nrt syd sin fra`). We'll load balance between instances in nearby regions.


Thanks Thomas, glad to see you here. For the regions, will I be able to know programmatically in which region a particular instance of the app is running? For reference, I want the users of my service to know that we tested their site in regions A, B and C, and that it's currently up in A and B but down in C and show them that in the web dashboard.


Yep, it's in the environment: `FLY_REGION`.
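As a sketch of how you'd use that for the uptime dashboard (Python purely for illustration; the `"local"` fallback and result shape are my own assumptions, not part of Fly's API):

```python
import os

# FLY_REGION is set by the Fly runtime inside each instance;
# fall back to "local" so the same code runs on a dev machine.
region = os.environ.get("FLY_REGION", "local")

# Tag each uptime check result with the region that performed it,
# so the dashboard can show "up in A and B, down in C".
result = {"region": region, "target": "https://example.com", "up": True}
print(result)
```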


Thanks, appreciate it! By the way, did you start at Fly recently? I seem to recall you had your own company or something like that before.


I've been at Fly.io since early summer 2020. I've had a couple of companies before that. None of them as fun as this one! :)


I have a somewhat unusual use case: A publicly available Postgres server with PostGIS for (infrequent) use with QGis.

I understand why Fly doesn't want customers to expose the database to the rest of the internet (ecosystem and such), but am very happy that Railway allows it.



Fly is pretty cool whilst free.

Very easy to set up stuff. I'd recommend using prepaid credits because there's no billing cap.

Getting closer and closer to Heroku, where deploy and auto-config handle everything.


We are actually migrating from Fly because of bad performance and unreliability on the pg services.

Still love the company though, but can't support it anymore :/


the site is down now

