Interesting that the graph goes up to 10k connections without any cliff in throughput; I wonder what use cases people have beyond that. Or maybe the issue with idle connections is memory usage?
Yeah, the absence of a sudden cliff is very nice. I think we fixed the main causes back in ~9.5. But the gradient at the end is pretty clear, and even at 10k the throughput is already less than 50% of the max. And it's dropping faster and faster.
It's also important to keep in mind this is an extremely simple workload, essentially just read-only point queries, on about 3GB of data.
Postgres is improving rapidly. I started developing my own DB a few years ago, but decided to do something else. One of the main reasons was the progress this DB is making on all fronts.
When does a database need to maintain a large number of open connections? You really don't need a lot of connections to serve high throughput. The database system typically becomes a bottleneck before the connections do. A good client-side pooling implementation will manage and limit connection usage.
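To make that concrete - a minimal sketch with SQLAlchemy (the DSN and pool sizes are made up, not a recommendation):

    # pip install sqlalchemy psycopg2-binary
    from sqlalchemy import create_engine, text

    # One pool per process; a handful of connections plus a small overflow
    # is plenty for most services, even at fairly high throughput.
    engine = create_engine(
        "postgresql+psycopg2://app:secret@db.example.com/appdb",
        pool_size=5,        # steady-state connections kept open
        max_overflow=5,     # short bursts can borrow a few more
        pool_timeout=30,    # seconds to wait for a free connection
    )

    with engine.connect() as conn:
        print(conn.execute(text("SELECT 1")).scalar())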
Say you have something really simple, web -> db. But at a minimum, you have 2 web servers for HA. So you're already at 2 client-side pools.
Now make it a tiny bit more complex and split this into 2 services, so you're now at 4 client-side pools. Should we make it 4 services instead? Or do you see where I'm going with this? (then add ad-hoc scripts / crons, ...). What if we do make it 4 services and some of them are being deployed dynamically, and at peak, you might have 8 instances of some of them.
Now, say you're at the point where you want to run some on-demand reporting queries on your DB, but you're only talking about tables with 10-100 million rows - not quite the point where you want to manage a reporting-specific database. So maybe you need a slightly larger pool because some of the queries can take over a minute.
Now, the default work_mem is a global setting, but it's super important and you really want to set it to 16MB (though 32MB would be ideal for a few of your cases). Drat, now you're getting squeezed on the most expensive hardware component: memory.
Well, 32MB * 400 clients ~= 12.5GB. It's not an irrelevant amount, but if you have that much load on your DB you should have a beefy machine anyway, and that isn't extravagant for one of the two main memory areas (and the shared memory is much smaller). In fact, I wouldn't even try to run SQL Server on that amount (let alone Oracle).
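The back-of-the-envelope version (strictly speaking work_mem is per sort/hash operation, not per connection, so this is more of a lower bound):

    WORK_MEM_MB = 32
    MAX_CLIENTS = 400

    # Worst case: every client runs a query using its full work_mem at once.
    print(WORK_MEM_MB * MAX_CLIENTS / 1024, "GB")   # 12.5 GB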
But yeah, if you go full microservices automatically deployed over Kubernetes, all connecting to the same place, you will need something in between so your DB server doesn't go crazy. This setup would break any central DB anyway, it's just that different DBMSs would break for different reasons.
But no one says you need a separate connection pool for each client application. There are cases when it's the right thing, but you can just as well have a single connection pool for each PostgreSQL instance (and all apps will go through that). So you absolutely don't need "8 instances" at peak or whatever.
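From the application side that just means everything points at the pooler instead of at Postgres directly - a sketch with psycopg2 (the host name is a placeholder, 6432 is merely pgbouncer's default port):

    import psycopg2

    # All services use the same DSN, aimed at the pooler; the pooler
    # multiplexes them onto a small number of real server connections.
    conn = psycopg2.connect("host=pgbouncer.internal port=6432 dbname=appdb user=app")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())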
The problem with Postgres vs. many connections is a combination of two simple facts.
Firstly, Postgres has a dedicated process managing each connection - which significantly simplifies some of the programming because most resources are "private" etc. This made perfect sense back when machines had very few cores, threading was much more expensive, etc. And development time is always precious, of course.
Secondly, people often don't distinguish between active and idle connections. But a system with X cores reaches top throughput at 2X-3X active connections, and at some point the throughput tanks because the active backends have to share resources (CPU, work_mem, etc.).
And the database and/or DBA has to manage that somehow - if you have 128GB of RAM available for query processing, it matters if you allow 100 or 10000 connections. With 100x more connections you can't allow the backends to use as much memory, which probably means less efficient sorts etc. You may assume most connections will be idle at any given time, but that may easily change (a bug in new app version or whatever), storming the DB, exhausting resources, etc.
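A rough sizing sketch of that trade-off (numbers purely illustrative):

    CORES = 32
    QUERY_RAM_GB = 128

    # Throughput typically peaks somewhere around 2-3x the core count
    # of concurrently *active* backends.
    active_target = 2 * CORES   # ~64

    # The same RAM has to be divided by the number of connections you
    # *allow*, because any of them might become active at the same time.
    for max_conn in (100, 10_000):
        per_conn_mb = QUERY_RAM_GB * 1024 / max_conn
        print(f"max_connections={max_conn:>6}: ~{per_conn_mb:.0f} MB per connection")
    # max_connections=   100: ~1311 MB per connection
    # max_connections= 10000: ~13 MB per connection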
Moreover, some (fairly hot) parts of the transaction management code need to walk active connections to check visibility etc., and the more connections you have the more expensive this is. And there are probably more places with similar behavior. We've fixed / optimized a lot of them, because the number of cores is growing and it was hurting even "reasonably" configured instances, but that gets you maybe to 500 - 1000 connections (?), not 100k.
And I don't think that'll change anytime soon, so your best bet is still a connection pool.
Now, I'm not claiming it's ideal - but there simply are historical and practical reasons why things are designed the way they are, and it's not a matter of swooping in and optimizing one or two places.
At that point you really want to front all database access with a service, otherwise you're asking for other problems with schema management, security, resource quotas, etc.
You can even have separate pools for different use cases like reporting if you don't want a separate server. But each pool really doesn't need more than ~5 connections; even with multiple pools and multiple replicas of your service, it will rarely add up to hundreds.
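Roughly like this (a sketch with SQLAlchemy; the names, sizes and timeout are made up for illustration):

    from sqlalchemy import create_engine

    # Small per-use-case pools inside the single data-access service.
    oltp = create_engine(
        "postgresql+psycopg2://app@db/appdb",
        pool_size=5, max_overflow=5,
    )
    reporting = create_engine(
        "postgresql+psycopg2://app@db/appdb",
        pool_size=2, max_overflow=0,
        # cap runaway reports at 5 minutes (server-side statement_timeout)
        connect_args={"options": "-c statement_timeout=300000"},
    )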
Modern web frameworks generate a lot of connections.
Let's say a typical application runs on 4 instances, each with 16 cores. That means a total of 64 processes, because one process per core.
Each process opens 16 connections to the database, either because of a bad configuration or because the pool defaults to the number of cores. We're now facing a total of 1024 SQL connections out of the box, for nothing.
That's the reason databases are commonly subjected to insane numbers of connections.
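Spelled out (same numbers as above; the second figure just assumes you explicitly cap each process at a 2-connection pool instead):

    INSTANCES = 4
    PROCESSES_PER_INSTANCE = 16   # one per core
    CONNS_PER_PROCESS = 16        # the bad default: pool size = core count

    print(INSTANCES * PROCESSES_PER_INSTANCE * CONNS_PER_PROCESS)  # 1024
    print(INSTANCES * PROCESSES_PER_INSTANCE * 2)                  # 128 with an explicit cap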
The fact that some other part of the software stack does something silly does not mean the database has to cater for that. If it's a misconfiguration, fix the misconfiguration.
Is this really such a problem? I feel like you'd want pgbouncer anyway in order to mitigate connection establishment latency. Like, in my infrastructure I generally have not one but two layers of pgbouncer, one on either side of the network, so the topology of underlying network connections could be reasonably static.
Well, I'd argue that's more an issue of the web application. If you're on a system with limited resources, and the webapp insists on opening hundreds of connections, assuming they're free of charge, it's a bit silly.
Anyway, this is unlikely to change anytime soon, given the Postgres connection/process model. We're continuously improving things, but each connection has some non-negligible costs (CPU, memory), so in those cases a connection pool is still a good idea.
Sometimes this is a symptom of an oversized connection pool at the application level. It might be worth monitoring how many of your connections are actively used vs. idle - you might be surprised.
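Something as simple as grouping pg_stat_activity by state will show you - a sketch with psycopg2 (the DSN is a placeholder):

    import psycopg2

    conn = psycopg2.connect("dbname=appdb user=monitor")  # placeholder DSN
    with conn, conn.cursor() as cur:
        # 'active' vs 'idle' vs 'idle in transaction', per pg_stat_activity
        cur.execute("""
            SELECT state, count(*)
            FROM pg_stat_activity
            WHERE state IS NOT NULL
            GROUP BY state
            ORDER BY count(*) DESC
        """)
        for state, n in cur.fetchall():
            print(state, n)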
Otherwise I find pgbouncer to be a good solution when it's needed
PostgreSQL made a poor decision to use processes instead of threads for connections. Processes are much more expensive, so you can't create as many connections as in other DBMSes.
Sorry, but that ignores about 99% of the context in which that decision was made.
Postgres started in the early 90s (1996 is the first open source release). We may have fast threading libraries now, but that was not the case when the decision was made. Moreover, Postgres aims to support a wide range of Unix-like platforms, and the maturity of threading varies quite a lot between them.
Plus the processes are often easier to develop with, exactly because they share less state, are easier to debug, etc. They may be more expensive to create, but that's mostly irrelevant for long-running connections. Threads may allow sharing more state, but that requires more locking which is not great either. And debugging with threads ... ewwww.
And it's not like switching to threads magically makes things cheaper. There are benchmarks comparing Postgres with databases using threads, and it's not like Postgres loses. See for example https://www.percona.com/blog/2017/01/06/millions-queries-per...
Now, I'm certainly not claiming processes are perfect, but presenting them as obviously "poor decision" is just wrong.
There's a bunch of reasons why processes vs. threads is ultimately not the main issue. For a general purpose database (like Postgres), the big challenge of supporting a large number of connections is pretty much resource management - how do you distribute memory/CPU/IO between the connections? You only have X cores, and whether you have Y processes or Y threads won't make a huge difference. Similarly, if you have 100GB of RAM, it'll make a huge difference whether you have to divide that between 100 or 10000 connections, but not whether those are threads or processes.