
I always see people recommending external pools like this here on HN. I'm wondering a bit about that as many frameworks/libraries that use Postgres already implement an internal connection pool. Is this recommendation generally assuming that multiple applications will access the server? Or are there other reasons to prefer an external pooler over internal pooling in your application?



They serve somewhat different purposes. For example, application-internal connection pools won't shield the database server from a rapid scale-out of application instances: with a pool of n connections per instance, the total is still n * instance count connections against the database.

I've always considered the #1 purpose of internal pools to be minimising latency, by moving the database connection handshakes outwith the critical path.
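
As a concrete illustration, here's a minimal sketch of such an internal pool using psycopg_pool (psycopg 3) in Python; the DSN and pool sizes are made-up placeholders, not recommendations:

    # Connections are opened up front (min_size), so a request never pays
    # the TCP/TLS/auth handshake on the critical path.
    from psycopg_pool import ConnectionPool

    pool = ConnectionPool(
        "postgresql://app@db.example.internal/appdb",  # placeholder DSN
        min_size=4,    # kept warm at all times
        max_size=16,   # hard cap for this single process
    )

    def fetch_user(user_id):
        # Borrow an already-open connection; it returns to the pool on exit.
        with pool.connection() as conn:
            return conn.execute(
                "SELECT id, name FROM users WHERE id = %s", (user_id,)
            ).fetchone()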


> I always see people recommending external pools like this here on HN. I'm wondering a bit about that as many frameworks/libraries that use Postgres already implement an internal connection pool. Is this recommendation generally assuming that multiple applications will access the server? Or are there other reasons to prefer an external pooler over internal pooling in your application?

It's pretty common to want both, particularly for larger installations. If you have more than one or two application servers, the number of connections held by the "internal" pools can become pretty large. With naive configuration you often end up with each of the internal pools substantially over-sized, because each has to absorb some momentary imbalance between the application servers.

In particular, if your workload allows transaction-level (or even statement-level) pooling, you can get very large benefits from having those two levels of pooling: the application pools ensure the connection-establishment overhead isn't incurred, while the external pooler keeps the total number of connections manageable.
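
To make the two levels concrete, here's a minimal PgBouncer sketch of the external side; the host name, port and sizes are placeholders for illustration:

    ; pgbouncer.ini (illustrative values only)
    [databases]
    appdb = host=db.example.internal port=5432 dbname=appdb

    [pgbouncer]
    listen_addr = 0.0.0.0
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction    ; a server connection is held only for one transaction
    max_client_conn = 2000     ; many client/app connections may attach
    default_pool_size = 40     ; but only this many real Postgres connections per db/user

The application nodes (and their internal pools) then point their connection strings at port 6432 rather than at Postgres directly; transaction pooling does mean you have to avoid relying on session state such as SET, advisory locks, or session-level prepared statements.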


> Is this recommendation generally assuming that multiple applications will access the server

That's very often a reason to use external pools, yes. Or just think of a single web application that runs on multiple nodes to distribute load across the application servers. You either carefully configure each node's connection pool, or you use an external pooler to which each node connects (rather than connecting directly to the database).


Even if you configure it carefully, you're still bound by the limit that app servers * max internal pool size has to stay below your db's connection capacity.

Every time you add another instance, you're not adding a "happy path number" of connections. You're adding up to the internal max, because at some point you will get spammed with real or bot traffic.

It's easier to aggregate the connections than carefully balance that setup every time you make infra changes or scale up. And you need some headroom for "oh, this point release of the library spawns a thread now and gets its own connection pool... oops".


>>> I'm wondering a bit about that as many frameworks/libraries that use Postgres already implement an internal connection pool.

That's precisely the problem.

The usual misconfigured python/ruby application will throw 16 connections at the database.

The model for these languages is to deploy one process per core, so on an 8-core server there will be 8 concurrent processes running: 8 processes * 16 connections per pool = 128 connections per server.

Multiply by N parallel servers and that's even more connections.
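
Back-of-the-envelope (the fleet size here is hypothetical, but PostgreSQL's default max_connections really is 100):

    processes_per_server = 8      # one process per core on an 8-core box
    pool_size_per_process = 16    # the framework's default pool size
    servers = 10                  # hypothetical fleet size

    total = processes_per_server * pool_size_per_process * servers
    print(total)        # 1280 potential connections

    print(total > 100)  # True: far past the default max_connections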

PostgreSQL has issues with a large number of connections, hence people put pgbouncer in front of the database to mitigate that.


IMO it's for crap languages that don't have proper threading support. Every language with a decent native threading implementation has its own connection pool in the client or as an add-on library.



