Could you elaborate on the load balancing? Is this a replacement for PG bouncer ...

dalyons · on May 25, 2023

IMO the biggest reason folks use pgbouncer is not for load balancing (which it can do, -ish) but instead for connection pooling. Postgres connections are expensive for the db server (one process per connection not one thread) so if you have say thousands of web application pods you need to use pgbouncer or similar as a proxy to multiplex those thousands of connections down onto a more manageable number (~200). So no, not really.

(EDIT: if you don't know this already - the _establishment_ of connections is also super expensive. so another reason to pgbounce is to keep connections persistent if you have app servers that are constantly opening and closing conns, or burst open conns, or such like. Even if the total conns to pg doesnt go super high, the cost of constantly churning them can really hurt your db)

Nextgrid · on May 25, 2023

Out of curiosity, if the problem of connections being expensive is solvable by PGBouncer-style connection multiplexing, why doesn't Postgres just do that by itself?

anarazel · on May 25, 2023

If it were easy to do well in-core, I think we'd do it immediately. Unfortunately the interesting pooling modes in pgbouncer also break a few things - which would likely not be acceptable in core postgres. Avoiding such breakage requires non-trivial architectural adjustments.

Historically connection state and "process state" have been tightly coupled, for good server-side pooling they have to be divorced. While good pooling is doable with the current process model (passing the client file descriptor between processes using SCM_RIGHTS), it's much harder with processes than with threads - this is one of the reasons I think we will eventually need to migrate to threads.

Eventually I want to get to a point where we have a limited number of "query execution workers" that handle query execution, utilized by a much larger number of client connections (which do not have dedicated threads each). Obviously it's a long way to go to that. Ah, the fun working on an complicated application with a ~35 year history.

There also are use cases for pgbouncer that cannot be addressed on the server-side - one important one is to run pgbouncer on "application servers", to reduce the TCP+TLS connection establishment overhead and to share connections between application processes / threads. That can yield very substantial performance gains - completely independent of server side pooling support.

dingledork69 · on May 25, 2023

Because pgbouncer's transaction-based pooling, which is what the previous poster was referring to, breaks a few postgres features. This is fine for most applications, but not all. See the table on https://www.pgbouncer.org/features.html

Tostino · on May 30, 2023

I really wish there was a way to handle transaction pooling without losing session pooling features.

Essentially, a "please serialize everything (temp tables, SET GUC values, etc) from this session to disk and load it back when necessary".

dalyons · on May 25, 2023

yeah good callout. I'd would be totally fine with a server mode, or connection option, that opts you into pooling but then disables the incompatible features. You choose pooled mode if you need it, and the tradeoffs are documented.

dalyons · on May 25, 2023

I believe there has been talk and perhaps progress towards building it into PG, but i cant find the dev thread right now, and i dont know what the status of that is.

interestingly enough this is what Oracle does AFAIK. They are also process-per-conn & have an optional sidecar proxy thingy that you can run on your oracle host to do the pooling. I would rather it be built more tightly into the rdbms but thats not a terrible solution.

danmcs · on May 25, 2023

There were changes made to idle sessions in postgres 14.0 that were supposed to reduce the resource usage of open but idle connections.

Crunchydata mentioned it on their blog a while back (https://www.crunchydata.com/blog/five-tips-for-a-healthier-p...) and the pg 14 release notes mention a few changes to idle sessions (https://www.postgresql.org/docs/release/14.0/)

I don't know if they were sufficient that pgbouncer is no longer necessary, haven't had a need to try it.

jake_morrison · on May 25, 2023

Postgres forks an OS process for each connection, which is relatively heavy weight. Oracle has a similar architecture to Postgres, and they solved it with a "multi-threaded listener". MySQL, in contrast, uses threads, which makes connections lighter weight.

It's fundamentally a question of how the connection listener communicates with the rest of the database, e.g., using shared memory or some other IPC mechanism, work queues, etc. Having too many connections results in problems with concurrent access and lock contention independent of how heavyweight the actual listening process is.

tracker1 · on May 25, 2023

That's pretty much been my take as well. Been looking at CockroachLabs (CockroachDB hosted cloud), and been kind of hard to get any detail if pgbouncer is beneficial for this, since I'm planning to connect via node_compat via Cloudflare Workers, and have kind of decided to punt the issue and connect directly for now in development, and closer to prod release do some tests with/without pgbouncer at each cloud location for the larger prod cluster instances for release.

Not the same service(s) as PG even if it's the same protocol, so I know it's really beneficial for connection queueing WRT my scenario for PG, but no idea on the CDB side.

ttfkam · on May 26, 2023

Postgres 14 already made huge improvements on this front.

https://techcommunity.microsoft.com/t5/azure-database-for-po...