Sharding and failover support is really cool indeed!

But I've always struggled to understand the use cases of pgbouncer. In most cases, you want to front your database with a service and don't want many clients connecting to it directly. And I've seen fairly large workloads easily managed over a small number of connections, so a good client-side pool implementation should easily suffice. In what sort of situations does pgbouncer come into play?


It’s pretty easy to exhaust a DB if you have many replicas of a service connecting, each at high concurrency. 20 replicas at 20 connections each is enough to hit limits (like on Cloud SQL for GCP)

As you say, you can write a service to stand in front of the DB and mediate access, but PgBouncer is already written AND production ready
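
For a sense of what that looks like, here's a minimal pgbouncer.ini sketch for transaction pooling; the database name, addresses, credentials, and pool sizes are placeholders, not recommendations:

    ; pgbouncer.ini - minimal sketch, all values illustrative
    [databases]
    ; clients connect to "appdb" on pgbouncer, which forwards to the real server
    appdb = host=10.0.0.5 port=5432 dbname=appdb

    [pgbouncer]
    listen_addr = 0.0.0.0
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    ; transaction pooling: a server connection is held only for the duration
    ; of a transaction, so many client connections share a few server ones
    pool_mode = transaction
    ; room for e.g. 20 replicas x 20 client connections each, with headroom
    max_client_conn = 1000
    ; real connections opened to Postgres per database/user pair
    default_pool_size = 20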


Another reason you would want a service in front is for schema management. I suppose it depends on the use case at hand.

Also, 20 connections per service process seems pretty high, unless there are many long-running queries and clients are willing to wait a long time for responses.


Schema management is not too hard with a small number of devs. We have a monorepo and a well-defined data access layer in the monolith, so simple migrations, grep, unit tests, and frequent deploys are enough to handle 99% of needs.

The harder part is ensuring people don’t hold onto connections during long-running operations. In some client libraries, it’s easy to accidentally hold one open. That plus autoscaling results in ticking bombs that are hard to catch with automated tests
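
As a concrete (hypothetical) sketch of that failure mode in Python with psycopg2; the pool, the query, and the slow notify call are all invented for illustration:

    import psycopg2.pool
    import requests  # stand-in for any slow external call

    pool = psycopg2.pool.SimpleConnectionPool(1, 5, dsn="dbname=app")

    def bad_handler(user_id):
        conn = pool.getconn()
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT email FROM users WHERE id = %s", (user_id,))
                email = cur.fetchone()[0]
            # connection is still checked out while we wait on a slow API;
            # a few of these in flight and the pool (or the DB) is exhausted
            requests.post("https://example.invalid/notify", json={"email": email})
        finally:
            pool.putconn(conn)

    def better_handler(user_id):
        conn = pool.getconn()
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT email FROM users WHERE id = %s", (user_id,))
                email = cur.fetchone()[0]
        finally:
            pool.putconn(conn)  # return the connection before the slow part
        requests.post("https://example.invalid/notify", json={"email": email})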


Our backend can open hundreds/thousands of DB connections per service process. Those queries take ~100ms to ~3s to run. Is that "long-running?" It seems average to me — complex OLAP queries are the comparative advantage of RDBMSes like Postgres, after all. (These queries are not CRUD operations, certainly.)


I have run a large distributed app that was somewhat microservice-based, but also had a large central sharded set of Postgres tables. About 10% of the total CPU usage was in the Postgres servers; the rest was in the frontends, which handled many hundreds of requests per second per instance while maintaining a pool of open connections to Postgres. The multiplicative effect meant that each Postgres server would have had thousands of connections open to it. Or we could use pgpooler and have only dozens of actual connections, with pgpooler doing lightweight pooling between the many frontend instances, each of which might be using just one connection or might have a few dozen requests in flight.

Load from each frontend to each database was pretty random (roughly bell-curved), but the highs and lows were big enough that the N frontends we needed, times the M max concurrent requests a frontend was likely to have, was way outside the budget for max concurrent connections on Postgres, even after tuning the per-connection memory buffers.
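
To put invented numbers on that multiplication, purely for illustration:

    50 frontends x 40 peak in-flight queries each = 2,000 potential connections per shard
    typical tuned Postgres max_connections        = a few hundred
    with a pooler in front                        = a few dozen real server connections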


> so a good client-side pool implementation should easily suffice.

There are some architectures that tear down all resources at the end of every request, e.g. PHP or lambda-like things, so you can't use a connection pool, because the "pool" would only exist for that one request.

pgbouncer can be used locally too, however.
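
For example (Python here just for illustration, but the same applies to PHP or a lambda): each request opens a "connection" that only reaches the local pgbouncer, and pgbouncer keeps a small set of long-lived server connections to Postgres across requests. The host, port, and credentials below are assumptions about a local setup:

    import psycopg2

    # connect to the local pgbouncer (default port 6432), not Postgres (5432)
    conn = psycopg2.connect(host="127.0.0.1", port=6432,
                            dbname="appdb", user="app", password="secret")
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())
    # cheap teardown: only the client-to-pgbouncer link is closed; the real
    # server connection goes back into pgbouncer's pool
    conn.close()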


I've got a PG setup that has a bunch of clients in different languages (shell, Perl, PHP, Python, JS and Go (don't ask)). So, rather than implement pooling in each language, pgBouncer was a quick and easy thing to set up in front, and every client gets it, even software we're not allowed to modify.

Edit: Oh, and how could I forget all the mod_php-based scripts for these web-app front-ends.


Sorry, I must ask: shell?


Shell scripts that invoke `psql` or other `pg_*` tools.


Bash scripts most likely


We use Pgbouncer in our Apache Airflow cluster.

During busy times, Apache Airflow can launch hundreds of workers, each of which needs at least one connection to PostgreSQL to save its results. Because each worker runs in its own process, due to Python's limited support for threading, a client-side connection pool can't be used effectively.
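
For what it's worth, the only change needed on the Airflow side is pointing the SQLAlchemy connection string at pgbouncer instead of Postgres directly. A sketch with placeholder host and credentials (in older Airflow versions this setting lives under [core] rather than [database]):

    # airflow.cfg - illustrative values only
    [database]
    # workers connect to pgbouncer on 6432; pgbouncer multiplexes them onto
    # a much smaller number of real Postgres connections
    sql_alchemy_conn = postgresql+psycopg2://airflow:secret@pgbouncer-host:6432/airflow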


Some shops don't design like that :) Some folks build giant monoliths and then have to scale and deploy that app broadly, along with its database connections.


I believe Django used to be rough on Postgres connections, so pgbouncer was an easy way to mediate between the two.
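
As far as I know, the usual recipe now is to just point Django's DATABASES at pgbouncer; a sketch with placeholder names (DISABLE_SERVER_SIDE_CURSORS matters if pgbouncer runs in transaction pooling mode):

    # settings.py - illustrative; host/port assume a local pgbouncer
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": "appdb",
            "USER": "app",
            "PASSWORD": "secret",
            "HOST": "127.0.0.1",
            "PORT": "6432",
            # needed with transaction pooling: server-side cursors assume
            # the same server connection across statements
            "DISABLE_SERVER_SIDE_CURSORS": True,
        }
    }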



