Interesting that the graph goes up to 10k connections without any cliff in throughput; I wonder what use cases people have beyond that. Or maybe the issue with idle connections is memory usage?
Yeah, the absence of a sudden cliff is very nice. I think we fixed the main causes back in ~9.5. But the gradient at the end is pretty clear, and even at 10k the throughput is already less than 50% of the max. And it's dropping faster and faster.
It's also important to keep in mind this is an extremely simple workload, essentially just read-only point queries, on about 3GB of data.
Postgres is improving rapidly. I started developing my own DB a few years ago, but decided to do something else. One of the main reasons was the progress this DB is making on all fronts.
When does a database need to maintain a large number of open connections? You really don't need a lot of connections to serve high throughput. The database system typically becomes a bottleneck before the connections do. A good client-side pooling implementation will manage and limit connection usage.
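To make that concrete - a minimal sketch with SQLAlchemy (the DSN and pool sizes are made up, not a recommendation):

    # pip install sqlalchemy psycopg2-binary
    from sqlalchemy import create_engine, text

    # One pool per process; a handful of connections plus a small overflow
    # is plenty for most services, even at fairly high throughput.
    engine = create_engine(
        "postgresql+psycopg2://app:secret@db.example.com/appdb",
        pool_size=5,        # steady-state connections kept open
        max_overflow=5,     # short bursts can borrow a few more
        pool_timeout=30,    # seconds to wait for a free connection
    )

    with engine.connect() as conn:
        print(conn.execute(text("SELECT 1")).scalar())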
Say you have something really simple, web -> db. But at a minimum, you have 2 web servers for HA. So you're already at 2 client-side pools.
Now make it a tiny bit more complex and split this into 2 services, so you're now at 4 client-side pools. Should we make it 4 services instead? Or do you see where I'm going with this? (then add ad-hoc scripts / crons, ...). What if we do make it 4 services and some of them are being deployed dynamically, and at peak, you might have 8 instances of some of them.
Now, say you're at the point where you want to run some on-demand reporting queries on your DB, but you're only talking about tables with 10-100 million rows - not quite the point where you want to manage a reporting-specific database. So maybe you need a slightly larger pool because some of the queries can take over a minute.
Now, the default work_mem is a global setting, but it's super important and you really want to set it to 16MB (though 32MB would be ideal for a few of your cases). Drat, now you're getting squeezed on the most expensive hardware component: memory.
Well, 32MB * 400 clients ~= 12.5GB. It's not an irrelevant amount, but if you have that much load on your DB you should have a beefy machine anyway, and that isn't extravagant for one of the two main memory areas (and the shared memory is much smaller). In fact, I wouldn't even try to run SQL Server on that amount (let alone Oracle).
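The back-of-the-envelope version (strictly speaking work_mem is per sort/hash operation, not per connection, so this is more of a lower bound):

    WORK_MEM_MB = 32
    MAX_CLIENTS = 400

    # Worst case: every client runs a query using its full work_mem at once.
    print(WORK_MEM_MB * MAX_CLIENTS / 1024, "GB")   # 12.5 GB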
But yeah, if you go full microservices automatically deployed over Kubernetes, all connecting to the same place, you will need something in between so your DB server doesn't go crazy. This setup would break any central DB anyway, it's just that different DBMSs would break for different reasons.
But no one says you need a separate connection pool for each client application. There are cases when it's the right thing, but you can just as well have a single connection pool for each PostgreSQL instance (and all apps will go through that). So you absolutely don't need "8 instances" at peak or whatever.
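From the application side that just means everything points at the pooler instead of at Postgres directly - a sketch with psycopg2 (the host name is a placeholder, 6432 is merely pgbouncer's default port):

    import psycopg2

    # All services use the same DSN, aimed at the pooler; the pooler
    # multiplexes them onto a small number of real server connections.
    conn = psycopg2.connect("host=pgbouncer.internal port=6432 dbname=appdb user=app")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())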
The problem with Postgres vs. many connections is a combination of two simple facts.
Firstly, Postgres has a dedicated process managing each connection - which significantly simplifies some of the programming because most resources are "private" etc. This made perfect sense back when machines had very few cores, threading was much more expensive, etc. And development time is always precious, of course.
Secondly, people often don't distinguish between active and idle connections. But a system with X cores reaches top throughput at 2X-3X active connections, and at some point the throughput tanks because the active backends have to share resources (CPU, work_mem, etc.).
And the database and/or DBA has to manage that somehow - if you have 128GB of RAM available for query processing, it matters if you allow 100 or 10000 connections. With 100x more connections you can't allow the backends to use as much memory, which probably means less efficient sorts etc. You may assume most connections will be idle at any given time, but that may easily change (a bug in new app version or whatever), storming the DB, exhausting resources, etc.
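A rough sizing sketch of that trade-off (numbers purely illustrative):

    CORES = 32
    QUERY_RAM_GB = 128

    # Throughput typically peaks somewhere around 2-3x the core count
    # of concurrently *active* backends.
    active_target = 2 * CORES   # ~64

    # The same RAM has to be divided by the number of connections you
    # *allow*, because any of them might become active at the same time.
    for max_conn in (100, 10_000):
        per_conn_mb = QUERY_RAM_GB * 1024 / max_conn
        print(f"max_connections={max_conn:>6}: ~{per_conn_mb:.0f} MB per connection")
    # max_connections=   100: ~1311 MB per connection
    # max_connections= 10000: ~13 MB per connection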
Moreover, some (fairly hot) parts of the transaction management code need to walk active connections to check visibility etc., and the more connections you have the more expensive this is. And there are probably more places with similar behavior. We've fixed / optimized a lot of them, because the number of cores is growing and it was hurting even "reasonably" configured instances, but that gets you maybe to 500 - 1000 connections (?), not 100k.
And I don't think that'll change anytime soon, so your best bet is still a connection pool.
Now, I'm not claiming it's ideal - but there simply are historical and practical reasons why things are designed the way they are, and it's not a matter of swooping in and optimizing one or two places.
At that point you really want to front all database access with a service, otherwise you're asking for other problems with schema management, security, resource quotas, etc.
You can even have separate pools for different use cases like reporting if you don't want a separate server. But each pool really doesn't need more than ~5 connections; even with multiple pools and multiple replicas of your service, it will rarely add up to hundreds.
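Roughly like this (a sketch with SQLAlchemy; the names, sizes and timeout are made up for illustration):

    from sqlalchemy import create_engine

    # Small per-use-case pools inside the single data-access service.
    oltp = create_engine(
        "postgresql+psycopg2://app@db/appdb",
        pool_size=5, max_overflow=5,
    )
    reporting = create_engine(
        "postgresql+psycopg2://app@db/appdb",
        pool_size=2, max_overflow=0,
        # cap runaway reports at 5 minutes (server-side statement_timeout)
        connect_args={"options": "-c statement_timeout=300000"},
    )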
Modern web frameworks generate a lot of connections.
Let's say a typical application runs on 4 instances, each with 16 cores. That means a total of 64 processes, because one process per core.
Each process opens 16 connections to the database, either because of a bad configuration or because the pool defaults to the number of cores. We're now facing a total of 1024 SQL connections out of the box, for nothing.
That's the reason databases are commonly subjected to insane numbers of connections.
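Spelled out (same numbers as above; the second figure just assumes you explicitly cap each process at a 2-connection pool instead):

    INSTANCES = 4
    PROCESSES_PER_INSTANCE = 16   # one per core
    CONNS_PER_PROCESS = 16        # the bad default: pool size = core count

    print(INSTANCES * PROCESSES_PER_INSTANCE * CONNS_PER_PROCESS)  # 1024
    print(INSTANCES * PROCESSES_PER_INSTANCE * 2)                  # 128 with an explicit cap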
The fact that some other part of the software stack does something silly does not mean the database has to cater for that. If it's a misconfiguration, fix the misconfiguration.
Is this really such a problem? I feel like you'd want pgbouncer anyway in order to mitigate connection establishment latency. Like, in my infrastructure I generally have not one but two layers of pgbouncer, one on either side of the network, so the topology of underlying network connections could be reasonably static.
Well, I'd argue that's more an issue of the web application. If you're on a system with limited resources, and the webapp insists on opening hundreds of connections, assuming they're free of charge, it's a bit silly.
Anyway, this is unlikely to change anytime soon, given the Postgres connection/process model. We're continuously improving things, but each connection has some non-negligible costs (CPU, memory), so in those cases a connection pool is still a good idea.
Sometimes this is a symptom of an oversized connection pool at the application level. It might be worth monitoring how many of your connections are actively used vs. idle - you might be surprised.
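Something as simple as grouping pg_stat_activity by state will show you - a sketch with psycopg2 (the DSN is a placeholder):

    import psycopg2

    conn = psycopg2.connect("dbname=appdb user=monitor")  # placeholder DSN
    with conn, conn.cursor() as cur:
        # 'active' vs 'idle' vs 'idle in transaction', per pg_stat_activity
        cur.execute("""
            SELECT state, count(*)
            FROM pg_stat_activity
            WHERE state IS NOT NULL
            GROUP BY state
            ORDER BY count(*) DESC
        """)
        for state, n in cur.fetchall():
            print(state, n)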
Otherwise I find pgbouncer to be a good solution when it's needed
PostgreSQL made a poor decision to use processes instead of threads for connections. Processes are much more expensive, so you can't create as many connections as in other DBMSes.
Sorry, but that ignores about 99% of the context in which that decision was made.
Postgres started in the early 90s (1996 is the first open source release). We may have fast threading libraries now, but that was not the case when the decision was made. Moreover, Postgres aims to support a wide range of Unix-like platforms, and the maturity of threading varies quite a lot between them.
Plus the processes are often easier to develop with, exactly because they share less state, are easier to debug, etc. They may be more expensive to create, but that's mostly irrelevant for long-running connections. Threads may allow sharing more state, but that requires more locking which is not great either. And debugging with threads ... ewwww.
And it's not like switching to threads magically makes things cheaper. There are benchmarks comparing Postgres with databases using threads, and it's not like Postgres loses. See for example https://www.percona.com/blog/2017/01/06/millions-queries-per...
Now, I'm certainly not claiming processes are perfect, but presenting them as obviously "poor decision" is just wrong.
There's a bunch of reasons why processes vs. threads is ultimately not the main issue. For a general purpose database (like Postgres), the big challenge of supporting a large number of connections is pretty much resource management - how do you distribute memory/CPU/IO between the connections? You only have X cores, and whether you have Y processes or Y threads won't make a huge difference. Similarly, if you have 100GB of RAM, it'll make a huge difference whether you have to divide that between 100 or 10000 connections, but not whether those are threads or processes.