Show HN: PgCat, Postgres pooler with sharding, load balancing and failover (github.com/levkk)
233 points by levkk on Feb 9, 2022 | 54 comments



So I thought it would be kind of cool to take Pgbouncer to the next level. I've been using it with large database clusters for a while and it's been spectacular, but it does have some limitations that always seemed arbitrary.

This is my take on what a modern Postgres pooler can be. Besides supporting the same session and transaction pooling modes as Pgbouncer, it also adds support for load balancing between replicas (round robin at the moment), failover in case a replica fails a health check, and the coolest thing yet, I think: sharding at the pooler level.
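To make the load balancing and failover part a bit more concrete, here's a rough sketch of the idea (illustrative only, not the actual code in the repo; the types and addresses here are made up): pick replicas round robin and skip any that failed their last health check.

    use std::sync::atomic::{AtomicUsize, Ordering};

    struct Replica {
        addr: &'static str,
        healthy: bool, // in reality, updated by a periodic health check
    }

    struct ReplicaSet {
        replicas: Vec<Replica>,
        next: AtomicUsize,
    }

    impl ReplicaSet {
        // Round robin over the replicas, skipping any that failed a health check.
        fn next_replica(&self) -> Option<&Replica> {
            for _ in 0..self.replicas.len() {
                let i = self.next.fetch_add(1, Ordering::Relaxed) % self.replicas.len();
                if self.replicas[i].healthy {
                    return Some(&self.replicas[i]);
                }
            }
            None // every replica is down
        }
    }

    fn main() {
        let set = ReplicaSet {
            replicas: vec![
                Replica { addr: "10.0.0.1:5432", healthy: true },
                Replica { addr: "10.0.0.2:5432", healthy: false }, // failed its health check
                Replica { addr: "10.0.0.3:5432", healthy: true },
            ],
            next: AtomicUsize::new(0),
        };

        for _ in 0..4 {
            println!("routing to {}", set.next_replica().unwrap().addr);
        }
    }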

It's written in Rust, which I think makes it much easier to iterate on and improve.

Note in case it wasn't obvious: this is super experimental, please don't use this in your production environment unless you're adventurous. In which case, please do and let me know how it goes!

- Lev


You might want to consider a different name, as there's already a pgcat project which appears to be a logical replication extension to augment native logical replication.

https://github.com/kingluo/pgcat


I am really excited to see this project. I can't tell you how many times my team at work has hit a limitation with pgbouncer and swore up and down we should just re-write it in rust.

Looking forward to where the project goes!


Sharding and failover support is really cool indeed!

But I've always struggled to understand the use cases of pgbouncer. In most cases, you want to front your database with a service and don't want many clients connecting to it directly. And I've seen fairly large workloads easily managed over a small number of connections, so a good client-side pool implementation should easily suffice. In what sort of situations does pgbouncer come into play?


It’s pretty easy to exhaust a DB if you have many replicas of a service connecting, each at high concurrency. 20 replicas at 20 connections each is enough to hit limits (like on Cloud SQL for GCP)

As you say, you can write a service to stand in front of the DB and mediate access, but PgBouncer is already written AND production ready


Another reason you would want a service in front is for schema management. I suppose it depends on the use case at hand.

Also, 20 connections per service process seems pretty high, unless there are many long-running queries and clients are willing to wait a long time for responses.


Schema management is not too hard at small numbers of devs. We have a monorepo and a well defined data access layer in the monolith, so simple migrations, grep, unit tests, and frequent deploys are enough to handle 99% of needs

The harder part is ensuring people don’t hold onto connections during long-running operations. In some client libraries, it’s easy to accidentally hold one open. That plus autoscaling results in ticking bombs that are hard to catch with automated tests


Our backend can open hundreds/thousands of DB connections per service process. Those queries take ~100ms to ~3s to run. Is that "long-running?" It seems average to me — complex OLAP queries are the comparative advantage of RDBMSes like Postgres, after all. (These queries are not CRUD operations, certainly.)


I have run a large distributed app that was somewhat microservice based, but also had a large central sharded set of postgres tables. About 10% of the total CPU usage was in the postgres servers, the rest was in the frontends, and they handled many hundreds of requests per second per instance, maintaining a pool of open connections to Postgres. The multiplicative effect meant that each postgres server would have had thousands of connections open to it - or we could use pgpooler and have only dozens of actual connections, with pgpooler doing lightweight pooling between the many frontend instances that might be using just one or might have a few dozen requests in flight. Load on each frontend to each database was pretty random - bell curved, but the highs and lows were enough that the N frontends we needed times the M max concurrent requests that a frontend was likely to have was way outside of the budget for max concurrent connections for postgres, even having tuned the per-connection memory buffer.


> so a good client-side pool implementation should easily suffice.

There are some architectures that tear down all resources at the end of every request, e.g. PHP or lambda-like things, so you can't use a connection pool, because the "pool" would only exist for that request.

pgbouncer can be used locally too, however.


I've got a PG setup that has a bunch of clients in different languages (shell, Perl, PHP, Python, JS and Go (don't ask)). So, rather than pool in each language, pgBouncer was a quick and easy setup in front - and everyone gets it - even software we're not allowed to modify.

E: Oh, and how could I forget all the mod_php based scripts for these web-app front-ends.


Sorry, I must ask. shell?


Shell scripts that invoke `psql` or other `pg_*` tools.


Bash scripts most likely


We use Pgbouncer in our Apache Airflow cluster.

During busy times, Apache Airflow can launch hundreds of workers, each of which needs at least one connection to PostgreSQL to save its result. Because each worker runs in its own process due to Python's lack of support for threading, a client-side connection pool cannot be used effectively.


Some shops don't design like that :) Some folks build giant monoliths and then have to scale/deploy that app broadly along with its database connections.


I believe Django used to be rough with Postgres, so pgbouncer was an easy way to mediate between the two.


The performance is currently behind pgbouncer, though of course the project is at a much earlier stage; just want your thoughts.

Where should new people contribute or help the most? (I do Golang and am good with k8s stuff.) All the best!


As a fan of Rust, this seems like the perfect application domain for "fearless concurrency". Having gone through the bowels of a few different connection pool implementations in various languages (because they had bugs in production), I can tell you the concurrency management becomes extremely difficult to reason about as the feature complexity increases. I think you're making a good bet that offloading a good chunk of that to the compiler will improve your ability to iterate in the future.

GoLang seems like the language du jour, but I think this post [1] really illustrates how powerful it can be when you have a compiler that handles so much more for you. It's not just about performance. Deadlock detection and a robust compile time set of error checks are huge wins over looser languages.

1: https://fasterthanli.me/articles/some-mistakes-rust-doesnt-c...


This is actually pretty amazing. I wouldn't normally trust a single author implemented project to sit between my application and my database, since that is a rather critical single point of failure. However, a grep that fails to find `unsafe` in a rust code base lets me know that this code is in fact pretty safe[1]. I don't need to trust the author, as much as I trust the Rust core team, and LLVM.

Since I can see basic functionality exercised in the test suites, and trust that Rust has validated its more general correctness, this gives me more confidence in your project than I would historically expect to be able to gather in such a short amount of time.

1: https://github.com/levkk/pgcat/search?q=unsafe


Rust or not, the code has zero tests, writes arbitrary bytes, hardcodes values, etc. It looks more like a PoC project than something that's going to replace pgbouncer any time soon.


It's definitely a PoC. I started it two weeks ago. Overall I've been super happy with Rust because it was a great safety harness and it gave me a fast concurrent program. PgBouncer implements this in about 30k lines of C, while I managed to get 80% of the way and then some in less than 1k.

As for tests, it's covered with pgbench, python (psycopg2), ruby (pg / active record gems), psql and a couple inline tests for trivial things like config.

As for bytes and hardcoded ints...give pgbouncer's and postgres' source code a look :).


Yep the readme is pretty clear it's a POC "Alpha: don't use in production just yet."

That said I've struggled with more than my fair share of pgbouncer misconfigurations and outright bugs. I personally would love a simpler (harder to misconfigure) and less buggy alternative.


> a grep that fails to find `unsafe` in a rust code base lets me know that this code is in fact pretty safe

This statement does not make any sense. That, or we have vastly different definitions of what “safe” means.


When you remove the context around statements (thread about rust concurrency), it's easy to make a statement into overblown nonsense. You may be interested in the broader analysis contained in the link from parent comment, about the type of safety being discussed in this thread, or you may not.


yeah maybe it safely deletes all your tables at midnight


The “unsafe” keyword in Rust is a source of perpetual oblivious overconfidence. It should’ve been called something that befits its actual narrow scope (`disableStrongTypeChecks` or some such) but that ship has sailed.


"unsafe" is pretty clear: the block/function is not safe, that doesn't mean the rest is safe. I suppose it might have been better named as "asserted", "assumed_safe", or "manually_checked", but that's much more verbose and not necessarily clearer.


These blocks are supposed to be rare, so a more verbose keyword isn't a problem. It would be better to be precise here and avoid a concept so general as safety.


This is really interesting and exciting to see, though it probably has a ways to go before being production ready. To date we've seen a few other efforts at connection poolers and balancers for Postgres, and every time we come back to pgbouncer as the one that works best. While it has its issues, pgbouncer does indeed work. Others we've explored tend to break in unclear ways.

That said some of the things this one looks to solve are indeed very interesting.

I've seen issues at times with pgbouncer being single threaded. Since it only runs against a single core, it's possible to peg that core and hit a max throughput. While not common, we have seen this for customers before, both at Crunchy Data[1] and back at Citus.

The other interesting piece is sharding. We considered this approach for years at Citus and never managed to pull it off. If you can specify an identifier and it can route accordingly, it gives you a lightweight option for sharding really quite easily.
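Conceptually the routing piece is small: hash the identifier and take it modulo the number of shards, roughly like the sketch below (illustrative only; this is not how pgcat or Postgres actually hash partition keys).

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Map a sharding key (say, a customer id) to one of `num_shards` servers.
    fn shard_for(key: i64, num_shards: u64) -> u64 {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        hasher.finish() % num_shards
    }

    fn main() {
        // e.g. route customer 12345 to one of 4 shards
        println!("shard {}", shard_for(12345, 4));
    }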

Again, it probably has a ways to go; everything we've tried over the last 10 years has failed to live up to being a replacement for pgbouncer. But what pgcat is attempting to do, I'm fully supportive of if it can accomplish it.

[1] On Crunchy Bridge (www.crunchybridge.com) we have pgbouncer built in, with multiple-pgbouncer support to alleviate the single threaded issue.


Alternative: yandex/odyssey

https://github.com/yandex/odyssey


I spent some time playing around with odyssey and it seemed like every other situation not explicitly experienced or anticipated by the author would result in a segfault just like that.


it doesn't have sharding and failover..


This is great. Let me know how I can help


1. Rename the project; there's already a pgcat.

2. Bring it to production level; I would love it :D

3. Are there things like GIS or jsonb which would break with it, or does it just work?


I'm really impressed by how lean this code base is. Is PgBouncer also this lean? Looks like auth is the only thing left here then, aside from maybe some more testing? Very cool to see!


Judging by Cargo.lock, it has about 60 dependencies.


You got a good laugh out of me with the meow in your description.


Anyone have a good way to understand Postgres a bit better?

So I can better understand sharding/load balancing/failover/indexing, etc.


This is so awesome.

For those wondering why a pooler is needed, TL;DR: containers. It's now much more normal to have hundreds of containers running an application, each configured with tens of connections. Now you have thousands. Each connection costs Postgres something on the order of ~2.5MB of RAM, so once you get into the thousands you're starting to talk real numbers (1,000 connections ≈ 2.5GB).

Problems I've had with pgbouncer in the past:

- stats. Any modern tool should natively emit its own statistics in something like statsd format.

- pgbouncer is a single process - I can't throw more CPU cores at it to make it faster.

Problems that I'm still struggling to solve...

We use RDS/Aurora, which uses DNS to fail over. Yes, applications should handle this. Yes, they should retry. Yes, devs should have tested this. But in any large company this is an uphill battle involving many teams and apps. It's much easier to introduce a proxy layer to handle fast failover. Even better if it can retry failed queries transparently to the application.


Pgbouncer does have stats, you can get them using the `SHOW STATS` family of commands. You can then either manually scrape them or use something like pgbouncer_exporter[1] for Prometheus.

[1] https://github.com/prometheus-community/pgbouncer_exporter


Thank you! Retries are on the road map.


Postgres is the greatest.


Agree. I've been hoping for a PostgreSQL extension with built-in sharding (think Netezza or Teradata). I know this is ambiguous so a low-effort definition is in order: a tightly bound cluster (nodes are aware of each other and share data to fulfill queries) where you specify distribution for a table but there is no explicit rebalancing command. Admins can add nodes and the user is none the wiser (except for improved performance, of course). Cross-node joins work (reasonably) well. I've been watching Citus for a while but - unless I'm misunderstanding - the sharding is a bit more explicit and sometimes manual.


(Ozgun from Citus / Microsoft)

Hi there, thanks for mentioning Citus. Could you share a bit more about the user experience you're looking for with sharding?

With Citus, you create your Postgres table as-is. If you'd like for the table to be distributed, yes, you'd need to pick a distribution key. You'd do this by calling:

    SELECT create_distributed_table('postgres-table-name', 'distribution-column');

We also thought about picking a distribution key on behalf of the user. This however has performance implications, particularly as you add more nodes to the cluster.


Not only is it not the greatest, it's literally the worst database technology in use for the things PgCat is trying to solve. And that's not a knock on PgCat at all (or on Postgres!), it's a knock on the "golden hammer" type worship of Postgres that's been the zeitgeist for the past few years.

The state of the art for sharding, connection scaling, and failover for Postgres is far behind everything else. MySql has Orchestrator and Vitess, the NewSQL systems are doing lots of interesting stuff with replication and sharding, etc. etc.

edit: Look at the work that Notion had to do as of just 3 months ago to shard Postgres: https://www.notion.so/blog/sharding-postgres-at-notion. Maybe they would make the same choice over again, and that's fine, but doing stone age level work to shard a database in 2021 doesn't jibe with the whole "just use Postgres" idea to me.


Most applications never reach the scale that requires sharding. I'll just use the quotes from the Notion blog as arguments against automated sharding solutions like Vitess.

>> Besides introducing needless complexity, an underrated danger of premature sharding is that it can constrain the product model before it has been well-defined on the business side. For example, if a team shards by user and subsequently pivots to a team-focused product strategy, the architectural impedance mismatch can cause significant technical pain and even constrain certain features.

>> During our initial research, we also considered packaged sharding/clustering solutions such as Citus for Postgres or Vitess for MySQL. While these solutions appeal in their simplicity and provide cross-shard tooling out of the box, the actual clustering logic is opaque, and we wanted control over the distribution of our data.²



When you’re at the point that sharding is an important consideration, "just use X" doesn’t apply for any X anymore (except "the expertise in your own team" etc).

The advice is clearly targeted at those with a scale orders of magnitude smaller than Notion's.

I don’t think anyone is claiming "you should just use Postgres" without knowing about your constraints. And if they do, I guess that’s a useful heuristic to know when someone doesn’t know what they’re talking about.

It’s still good advice for the 95%, even if you yourself fall outside of that. Part of the reason you get paid well is to know when advice applies and when it does not, and to not waste energy arguing in either case.


Postgres has focused on being a stellar database, not on being a stellar exascale application platform. It deserves the love it's getting on the database side, and for most people the weak exascale story isn't a problem.


Sure, but this post is specifically about a tool that relates to that weak story. In the context of the use cases PgCat is trying to address, Postgres is most definitely not "the greatest". Apparently, that's a controversial statement :)


I don't think calling out the far end of the postgres scaling story is controversial. On the low end, you can add more hardware or improve queries/indexes, and the mid level works fine with a write master and read replicas, but once you get into multi-master/sharding it's definitely more involved than other systems, and if you just use postgres as a dumb data dump, maybe the value isn't there. It's all about the right tool for the job.

Of course, there will always be fanboys who take things too far.


Postgres deficiencies you describe can be solved with elbow grease

MySQL's cannot


That's backwards IME. MySQL has bad defaults and bad error reporting, but you can work around these things by being careful about the configuration of every instance and making sure you always use strict modes and check for errors after each statement. Postgres has generally better defaults and better behaviour upfront, but things like true (master-master) HA or particular secondary index behaviour that's needed for some high-performance workloads are just impossible.



