Interesting that the graph goes up to 10k connections without any cliff in throughput. I wonder what use cases people have beyond that. Or maybe the issue with idle connections is memory usage?
Yeah, the absence of a sudden cliff is very nice. I think we fixed the main causes back in ~9.5. But the gradient at the end is pretty clear, and even at 10k the throughput is already less than 50% of the max. And it's dropping faster and faster.
It's also important to keep in mind this is an extremely simple workload, essentially just read-only point queries, on about 3GB of data.
Postgres is improving rapidly. I started developing my own DB a few years ago, but decided to do something else. One of the main reasons was the progress this DB is making on all fronts.
When does a database need to maintain a large number of open connections? You really don't need a lot of connections to serve high throughput. The database system typically becomes a bottleneck before the connections do. A good client-side pooling implementation will manage and limit connection usage.
Say you have something really simple, web -> db. But at a minimum, you have 2 web servers for HA. So you're already at 2 client-side pools.
Now make it a tiny bit more complex and split this into 2 services, so you're now at 4 client-side pools. Should we make it 4 services instead? Or do you see where I'm going with this? (then add ad-hoc scripts / crons, ...). What if we do make it 4 services and some of them are being deployed dynamically, and at peak, you might have 8 instances of some of them.
Now, say you're at the point where you want to run some on-demand reporting queries on your DB, but you're only talking about tables with 10-100 million rows - not quite the point where you want to manage a reporting-specific database. So maybe you need a slightly larger pool because some of the queries can take over a minute.
Now, work_mem is a global setting, but it's super important and you really want to set it to 16MB (but 32MB would be ideal for a few of your cases). Drats, now you're getting squeezed on the most expensive hardware component: memory.
Well, 32MB * 400 clients ~= 12GB. It's not an irrelevant amount, but if you have that much load on your DB you should have a beefy machine anyway, and that, for one of the two main memory areas, isn't extravagant (and the shared memory is much smaller). In fact, I wouldn't even try to run SQL Server on that amount of RAM (let alone Oracle).
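Also, in practice you don't have to hand every connection the bigger value: work_mem can be overridden per role or per session, so only the reporting connections pay for it. A rough sketch (the "reporting" role name is made up):

    -- keep the global default modest (e.g. work_mem = 16MB in postgresql.conf),
    -- and give a hypothetical reporting role a bigger per-query budget
    ALTER ROLE reporting SET work_mem = '32MB';

    -- or just for one session / one heavy query
    SET work_mem = '32MB';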
But yeah, if you go full microservices automatically deployed over Kubernetes, all connecting to the same place, you will need something in between so your DB server doesn't go crazy. This setup would break any central DB anyway; it's just that different DBMSs would break for different reasons.
But no one says you need a separate connection pool for each client application. There are cases when it's the right thing, but you can just as well have a single connection pool for each PostgreSQL instance (and all apps will go through that). So you absolutely don't need "8 instances" at peak or whatever.
The problem with Postgres vs. many connections is a combination of two simple facts.
Firstly, Postgres has a dedicated process managing each connection - which significantly simplifies some of the programming, because most resources are "private" etc. This made perfect sense back when machines had very few cores, threading was much more expensive, and so on. And development time is always precious, of course.
Secondly, people often don't distinguish between active and idle connections. But a system with X cores reaches top throughput at 2X-3X active connections, and at some point the throughput tanks because the active backends have to share resources (CPU, work_mem, etc.).
And the database and/or DBA has to manage that somehow - if you have 128GB of RAM available for query processing, it matters if you allow 100 or 10000 connections. With 100x more connections you can't allow the backends to use as much memory, which probably means less efficient sorts etc. You may assume most connections will be idle at any given time, but that may easily change (a bug in new app version or whatever), storming the DB, exhausting resources, etc.
Moreover, some (fairly hot) parts of the transaction management code need to walk the active connections to check visibility etc., and the more connections you have the more expensive this is. And there are probably more places with similar behavior. We've fixed / optimized a lot of them, because the number of cores is growing and it was hurting even "reasonably" configured instances, but that gets you maybe to 500 - 1000 connections (?), not 100k.
And I don't think that'll change anytime soon, so your best bet is still a connection pool.
Now, I'm not claiming it's ideal - but there simply are historical and practical reasons why things are designed the way they are, and it's not a matter of swooping in and optimizing one or two places.
At that point you really want to front all database access with a service, otherwise you're asking for other problems with schema management, security, resource quotas, etc.
You can even have separate pools for different use cases, like reporting, if you don't want a separate server. But each pool really doesn't need more than ~5 connections; even with multiple pools and multiple replicas of your service it will rarely add up to the hundreds.
Modern web frameworks generate a lot of connections.
Let's say a typical application runs on 4 instances, each with 16 cores. That's a total of 64 processes, at one process per core.
Each process opens 16 connections to the database because of a bad configuration, or a default tied to the number of cores. We're now facing a total of 1024 SQL connections out of the box, for nothing.
That's the reason databases are commonly subjected to insane amounts of connections.
The fact that some other part of the software stack does something silly does not mean the database has to cater for that. If it's a misconfiguration, fix the misconfiguration.
Is this really such a problem? I feel like you'd want pgbouncer anyway in order to mitigate connection establishment latency. Like, in my infrastructure I generally have not one but two layers of pgbouncer, one on either side of the network, so the topology of underlying network connections could be reasonably static.
Well, I'd argue that's more an issue of the web application. If you're on a system with limited resources, and the webapp insists on opening hundreds of connections, assuming they're free of charge, it's a bit silly.
Anyway, this is unlikely to change anytime soon, given the Postgres connection/process model. We're continuously improving things, but each connection has some non-negligible costs (CPU, memory), so in those cases a connection pool is still a good idea.
Sometimes this is a symptom of an oversized connection pool at the application level. It might be worth monitoring how many of your connections are actively used vs. idle - you might be surprised.
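A quick way to check is a query against pg_stat_activity - just a sketch (the backend_type filter needs a reasonably recent version):

    -- breakdown of connections by state; lots of 'idle' usually means
    -- the application-side pools are bigger than they need to be
    SELECT state, count(*)
    FROM pg_stat_activity
    WHERE backend_type = 'client backend'
    GROUP BY state
    ORDER BY count(*) DESC;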
Otherwise I find pgbouncer to be a good solution when it's needed
PostgreSQL made a poor decision to use processes instead of threads for connections. Processes are much more expensive, so you can't create as many connections as in other DBMSes.
Sorry, but that ignores about 99% of the context in which that decision was made.
Postgres started in early 90s (1996 is the first open source release). We may have fast threading libraries now, but that was not the case when the decision was made. Moreover, Postgres aims to support a wide range of Unix-like platforms, and the maturity of threading varies quite a lot.
Plus the processes are often easier to develop with, exactly because they share less state, are easier to debug, etc. They may be more expensive to create, but that's mostly irrelevant for long-running connections. Threads may allow sharing more state, but that requires more locking which is not great either. And debugging with threads ... ewwww.
And it's not like switching to threads magically makes things cheaper. There are benchmarks comparing Postgres with databases using threads, and it's not like Postgres loses. See for example https://www.percona.com/blog/2017/01/06/millions-queries-per...
Now, I'm certainly not claiming processes are perfect, but presenting them as obviously "poor decision" is just wrong.
There's a bunch of reasons why processes vs. threads is ultimately not the main issue. For a general-purpose database (like Postgres), the big challenge of supporting a large number of connections is pretty much resource management - how do you distribute memory/CPU/IO between the connections? You only have X cores, and whether you run Y processes or Y threads won't make a huge difference. Similarly, if you have 100GB of RAM, it'll make a huge difference whether you have to divide that between 100 or 10000 connections, but not whether those are threads or processes.
I'm sure we'll get a bunch of (well deserved) praise for PG here but, does anyone have a case where PG really shit the bed? (Besides the Uber one) (which is its own long thread)
I’m a huge fan of Postgres. This one is “user error”, but we still got bit pretty hard.
A query plan changed, on a frequently-run query (~1k/sec) on a large table (~2B rows) without warning. Went from sub-millisecond to multi-second.
The PG query planner is generally very good, but also very opaque. The statistics collected during an ANALYZE and used by the planner are subject to some significant caveats. Essentially, the planner would sometimes wildly mis-estimate costs due to under-sampling, and would choose a bad plan. We fixed it in two different ways: 1) lower the auto-ANALYZE threshold; 2) increase the number of rows sampled when collecting statistics for the relevant column.
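For reference, the two fixes look roughly like this (table/column names are made up, and the numbers are just what worked for us, not general recommendations):

    -- 1) analyze the table more often than the default 10% change threshold
    ALTER TABLE big_table
        SET (autovacuum_analyze_scale_factor = 0.01);

    -- 2) sample more rows for the column the planner kept mis-estimating
    ALTER TABLE big_table
        ALTER COLUMN hot_column SET STATISTICS 1000;
    ANALYZE big_table;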
Again, this was “user error”. That said, it will probably happen again on the same or another query, because it’s hard to know if/when a query plan is about to change, and pg_hint_plan and similar are very heavy-handed solutions.
I'm not even too bothered by the opaqueness of the query planner (although I'd love better visibility into it). But the fact that the query plan can change any second is insane: you can't lock it, and you can't force another one as a short-term fix.
There's no option that I know of. If you reach an impossible-to-anticipate threshold and the query plan changes, your whole system can be down and you can only fix forward, which might take a _long_ time to figure out and is super dangerous as you'll pretty much have to experiment on your prod database.
It's insane, I've not yet been bit too bad by it but I know it's coming for me.
Well, what exactly would you expect for better visibility into the planner? I mean, you have the source code, and I'm not sure how to visualize the extreme number of combinations considered by the planner. Any examples of databases doing interesting things?
As for the "locking" of plans, I personally have rather serious doubts about that. Yes, I've heard it suggested as a viable solution, but knowing how vastly different plans may be "right" for the same query with just slightly different parameters ...
> what exactly would you expect for better visibility into the planner
Dunno. At the moment I need a fairly deep understanding of how the planner works (e.g. how it uses statistics or indexes) to optimise queries; I'd love to be able to _see_ that rather than guess. Not saying it's easy, I'm just wishing.
> As for the "locking" of plans, I personally have rather serious doubts about that. Yes, I've heard it suggested as a viable solution, but knowing how vastly different plans may be "right" for the same query with just slightly different parameters ...
What's the problem with vastly different plans being "right" for the same query?
All I am (and many other people are) asking for is a way to ensure PG doesn't bring down my entire system because it decided to change the query plan it uses without 1. any sort of warning 2. any way to revert it. It doesn't feel like it's asking for too much! Maybe locking plans is a good solution, maybe it's not, I'd just like _something_ that lets me sleep at night
Aurora PostgreSQL has something called Query Plan Management (which I like) that is meant to address this type of issue, especially for large tables with key queries that could basically blow up the DB if their plans go haywire.
Would definitely be a nice feature to see in PostgreSQL itself.
It'd be nice if one could tell pg to only change its query plans during certain change windows, say, the first Saturday each month, with on call staff ready.
I'm currently having a similar issue where the query planner refuses to use the indexes on a search query (it was fine for a while, but one day it just started de-optimizing itself) and instead just does a seq scan. Instead of the execution taking ~40ms with indexes, the query planner thinks that a seq scan of ~1.5s is better...
I've re-indexed the DB and run ANALYZE on the table. It gets better for 30 minutes at most, then PG de-optimizes itself again.
I'm kinda stuck on it, any ideas what I can do to resolve it?
Try lowering the random_page_cost value; this is the cost the query planner assigns to random reads, which is usually too high if you're using an SSD where random reads are cheap (on spinning disks they're expensive). Just setting it to 1 works well in my case.
This solves many "it does a slow seq scan even though there's an index"-cases.
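If you want to try it, something along these lines (1.1 is just an example value for SSD-backed storage):

    -- try it in the current session first, then re-run EXPLAIN ANALYZE
    -- on the problematic query to see if the plan flips to the index
    SET random_page_cost = 1.1;

    -- if it helps, persist it (needs the appropriate privileges)
    ALTER SYSTEM SET random_page_cost = 1.1;
    SELECT pg_reload_conf();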
If you're using an SSD or a similarly fast storage subsystem (or one that hides the higher random access time vs. sequential reads), you may indeed want to reduce random_page_cost so that the random_page_cost / seq_page_cost ratio lands in the 1.2-1.5 range.
But it's also wise to review the default_statistics_target being used, to check that autovacuum is running frequently enough (it's what does autoanalyze), and that the analyze thresholds are properly tuned...
Thank you for mentioning https://postgresqlco.nf - team member here :) All the parameters mentioned in this thread are well documented there, with recommendations.
Is it an HSTORE column with a GIN index? The default "FASTUPDATE=ON" option will delay updates to the index until vacuum time, but if you don't vacuum soon enough the planner can suddenly decide it should do a sequential scan instead of reading through the delayed updates.
This is behaviour I've seen on 9.x on the Aurora variant; there, the solution was to use the FASTUPDATE=OFF index storage option. You can see the delayed tuples with the "pgstatginindex" function.
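If you want to check whether that's what's happening, something like this (needs the pgstattuple extension; the index name is made up):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    -- pending_pages / pending_tuples show how much of the GIN index
    -- is still sitting in the fastupdate pending list
    SELECT * FROM pgstatginindex('my_gin_index');

    -- stop new entries from going into the pending list; entries already
    -- there are only merged at the next VACUUM (or, on 9.6+, by calling
    -- gin_clean_pending_list() on the index)
    ALTER INDEX my_gin_index SET (fastupdate = off);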
Using some of the extra options of EXPLAIN (ANALYZE, BUFFERS, COSTS) might give more hints.
If not HSTORE/GIN, then it could be that the analyzer, after some auto-analyze of the table, thinks that what you are asking for will match a significant number of the rows in the table. So there's no point in random-seeking through an index: if it thinks it needs to read e.g. 50% of the table anyway, it might just as well not use the index.
Try `set enable_seqscan = 'off'` or `set local enable_seqscan = 'off'`. This will push the PG query planner toward using indexes. Experiment with it until you figure out why your query performance deteriorates. Maybe you are doing a lot of updates/deletes? Increase the statistics sampling size? Autovacuum more frequently?
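For example, purely as a diagnostic (table/column names are made up; you wouldn't want to leave this on globally):

    BEGIN;
    SET LOCAL enable_seqscan = off;   -- only affects this transaction
    EXPLAIN (ANALYZE, BUFFERS)
        SELECT * FROM my_table WHERE my_column = 'needle';
    ROLLBACK;

Comparing the costs and row estimates in this plan with the default one usually shows what the planner is mis-estimating.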
Because the user is querying values that are no longer covered by the statistics - for example, an incrementing timestamp or id column.
If the stats are from yesterday and say nothing about the frequency of today's timestamps, the query will have to take a pessimistic view of the world. It might be that the data has radically changed since the last stats run, or not. It needs to analyze the table to know and make optimal choices.
I'm not quite sure why you consider this "user error"? I work on the optimizer a bit, and I wouldn't say it's a fault of the user ...
OTOH I'm not sure it's a fault of the DB either :-( The statistics collected by ANALYZE are pretty much a lossy compressed version of the database, and so some details are missing - that's kinda the point of collecting the stats.
I'm not sure why lowering the autoanalyze threshold would fix this - it increases the frequency of stats updates, so my feeling is it makes it more likely to trigger a similar issue. OTOH increasing the statistics target seems like the right thing to do (it also keeps more accurate stats, not just a larger sample).
I don't know if there are better solutions (both practical and in principle) :-(
Unfortunately, this is just the reality of using an RDBMS. I've seen similar behavior on Informix and SQL Server (with a smaller load than yours). They all occasionally generate suboptimal query plans. That's what your DBA is for.
SQL Server was somewhat notorious for it when migrating to 2014 because they rewrote the cardinality estimator. It generally worked better, but in some systems it really didn't. Some people ended up using a trace flag to use the legacy estimator. They have steadily improved the new estimator and it's no longer a problem, but it goes to show how much is going on under the surface.
It's the reality of Postgres, yes, but not all relational database. You mentioned SQL Server, which lets you lock in a query plan, specifically to cover the use case the parent described. When you have a Very Important frequently-run query that pulls from a monstrous table, it's nice to be able to sleep peacefully knowing the DB won't shit the bed because something completely unrelated changed someplace else in the database.
One fair criticism of Postgres (and many other open source projects) is that they can be a little too religious about how the thing should work in an ideal world (in this case, SQL being as declarative as possible), sometimes to the detriment of practicality and of making things easier for the business.
We update statistics weekly on SQL Server. One week we did a particularly aggressive data cleanup and then ran stats, which created a bad plan when the "tiny" table quickly grew.
> Unfortunately, this is just the reality of using an RDBMS. I've seen similar behavior on Informix and SQL Server (with a smaller load than yours). They all occasionally generate suboptimal query plans. That's what your DBA is for.
Other DBs let you lock in query plans or provide query hints, but postgres' developers are against either, which is not necessarily complete nonsense as it avoids users shooting themselves in the foot… but it also prevents users from digging themselves out of query planner stupidity.
PostgreSQL is robust across a wide range of applications but it does have some architectural sharp edges that can cause serious operational problems in practice if you run into them. Most of these only show up at scale. Only a few do not have any viable workaround in practice.
The worst one, in my experience, is that the statistics collector is architecturally broken for some large tables, which can cause the query planner to go haywire. As a consequence, query performance can be erratic, varying by a couple orders of magnitude each time the table is analyzed. There are no workarounds in many cases, and the architectural defect that causes it is decidedly non-trivial to address (otherwise it would have been fixed years ago).
This is the only defect that consistently rears its head in operational environments in my experience. As data volumes grow, more Postgres instances are experiencing this problem. At a minimum there needs to be a way to easily override the query planner when it goes insane.
I wonder if the planner can look at stats of full query execution in addition to the table stats. That way it can fall back to a previous plan (or previous table stats) if things get crazy.
No, we don't collect any optimizer stats during query execution. It's trickier than it seems, because (a) collecting the stats is actually pretty expensive, and (b) when using indexes, you may actually see just a tiny part of the data.
There have been some discussions about "learning" and correcting some of the estimates, but there was no patch so far.
Even if a full rework or overriding would be hard to implement, couldn't saving/loading statistics, or disabling the collector (after it has already managed to do a decent run), be done? Or does the nature/representation of the stats not allow for this? (So they need constant updating, even if done badly? But if stats only influence query plans, then after good plans are found for the most common queries they could be persisted, right? Am I missing something?)
I think import/export of stats is pretty doable. Not a tiny amount of work, because of how many stats there may be (regular, extended), but I don't see any obvious major challenges ...
Similarly for disabling autoanalyze. We kinda already allow that for autovacuum (ALTER TABLE ... SET ...), so adding autoanalyze_enabled seems possible too. Or you could just set the threshold/scale_factor very high ...
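Something along these lines already works today, per table (the values are deliberately absurd, just to illustrate "effectively off"):

    -- make autoanalyze essentially never trigger for this table
    ALTER TABLE big_table SET (
        autovacuum_analyze_scale_factor = 1.0,
        autovacuum_analyze_threshold    = 2000000000
    );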
I don't remember specifics, but that Uber article made it clear that they didn't understand how Postgres worked and made some very basic mistakes. The whole thing made them seem surprisingly incompetent, although the general crappiness of Uber apps maybe should have tipped me off sooner.
That said, Postgres has had a somewhat common problem with poorly selected defaults.
I remember that performance tests from over 10 years ago between Postgres (v7-v9) and MySQL (~v5.0) always showed MySQL way ahead. For a long time people assumed the reason was that MyISAM (the then-default engine) isn't transactional. Except InnoDB was still faster. Okay, but by default MySQL didn't `fsync()` after writes. But it was still faster when you enabled that, too.
Turns out that Postgres's default memory configuration was either largely unchanged from the initial v6 releases a decade earlier, or, perhaps more accurately, simply set so that the DB isn't a resource hog out of the box. Well, outside of the development environment, most people set up a dedicated server, and they want the server to be a resource hog! Most RDBMSs do that automatically. It's an odd choice.
It's been 6-7 years since I used the platform daily, but I would not be surprised if it was still that way. And given the complexity of the memory configuration [0] it's hard to NOT make "basic" mistakes.
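For anyone new to this, the handful of settings that usually matter are below - the values are purely illustrative (say, for a dedicated box with ~64GB of RAM), not recommendations:

    ALTER SYSTEM SET shared_buffers = '16GB';        -- default is tiny; needs a restart
    ALTER SYSTEM SET effective_cache_size = '48GB';  -- planner hint, not an allocation
    ALTER SYSTEM SET work_mem = '32MB';              -- per sort/hash operation, per backend
    ALTER SYSTEM SET maintenance_work_mem = '1GB';   -- VACUUM, CREATE INDEX, etc.
    SELECT pg_reload_conf();                         -- picks up everything except shared_buffers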
In production workloads, it makes sense for a database to use up all available system resources. If I self-host Nitter on localhost and it runs Redis, or if I install KDE PIM apps which insist on hosting emails and such on a localhost MySQL database, I don't want them to eat all available resources. In my experience, Redis for Nitter, and a barely-used MySQL WordPress database, are quite lightweight in practice.
Unfortunately database design and database administration are entire fields of expertise unto themselves, and I've never seen a resource that covers 100% of the need-to-know info. Oracle, SQL Server, Postgres, and MySQL are all massive, complex, and old, and they have lots of hidden footguns in them.
You should start by knowing how other people design the type of DB you're building. A reporting DB is going to work very differently from a transactional DB. If you're using PG's excellent JSONB support as an alternative to Mongo, that's also something you need to consider at the design stage.
pgbouncer is very commonly used in PG setups, to the point where I don't understand why it isn't included in the default build.
I am really loving Postgres after a foray with Oracle. But I get concerned about using it for large production loads. I welcome recommendations as well.
My DB knowledge is too patchy to give a good summary, but it was extensively discussed on HN several times[1].
In general it seems like they could have solved all of their problems within Postgres, but someone at Uber seemed to just know MySQL better and want to switch. They also made other major changes (like removing schemas from certain data stores) that likely gave them more of an advantage than switching could have.
I remember the article and the discussion, but the fundamental issue seemed to be the difference between MySQL and Postgres indices and the different trade-offs made. Postgres indices are better for read heavy workloads, and MySQL for write heavy. I don't see how this is representative of them not knowing what they are doing. Frankly it's condescending to think you understand their engineering problems better than they do.
It's not a limitation that affects most users, but the lack of official multi-master support makes it less suitable for "always on" databases. To be honest, though, multi-master is hard to get right in ANY database. PostgreSQL is great for many use cases and good enough for many more!
Lately I've been getting a lot of notifications for BDR (short for Bi-Directional Replication) from EDB; the term 'Always On' also appears in those notifications, apparently to woo the Oracle audiences. You might want to take a look into that.
Disclosures/info: I used to work for EDB. BDR was developed by 2nd Quadrant, which was recently acquired by EDB.
The basic HA support is there, and is very well described in the docs. What Postgres doesn't provide is the mechanism to pick when to automatically promote a replica to primary, but that's because this is a system-level concern that's entirely outside the database featureset itself.
For always-on or high-availability setups you want two DBs running in constant communication, with every write committed on both machines synchronously, so that if one of them goes down the other can take its place immediately with no downtime.
Postgres has no built-in support for the automatic failover part yet, but tools like Stolon exist.
PostgreSQL has master-slave replication (one read/write master, and multiple read-only standbys), the other users are talking about multi-master replication (all the databases are read/write).
I'm aware. What I'm asking is why multi-master is required. You can have a read-only standby that is promoted to be the master. Is it the little bit of downtime during failover that's in question? Doesn't pgpool make that possible without downtime?
If you use multi-master and DNS failover you still have partial downtime for every session routed to the unhealthy master. That doesn't solve the problem, it just makes things a little better.
Yup. The synchronous replication described in the docs will allow the configuration of a hot standby, but I don’t think it allows writing to the standby, which is what true master-master does.
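For reference, the built-in synchronous side of it is basically one setting on the primary (the standby name is made up) - what you get is a hot standby you can read from and promote, not a second writable master:

    -- don't acknowledge a commit until the named standby has flushed it
    ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby1)';
    SELECT pg_reload_conf();
    -- with the default synchronous_commit = on, every commit now waits for
    -- standby1; remote_apply is the stricter option if reads on the standby
    -- must immediately see committed data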
Transaction id wraparound. Transaction ids are 32 bit and they are used in a ring like manner, so a transaction id can only be reused if there are no open transactions before it. If you (accidentally) keep a transaction open for a long time, then after a while you run out of transaction ids, and it is tough to recover from, because management operations themselves involve creating transactions.
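An easy way to keep an eye on it (the age is roughly "how many transactions ago was this database last frozen"; trouble starts as it heads toward ~2 billion, and autovacuum gets aggressive well before that):

    -- how close each database is to transaction ID wraparound
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;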
PostgreSQL has a lot of really great features, but some things are seriously underdeveloped as well, mostly because of missing manpower. You cannot easily reorder the columns of a table, it does not support compression of connections (yet), and there's the XID wraparound issue others have already mentioned.
I would also personally love map-reduce indexes (for example for efficiently calculating sums or other aggregate functions...)
Until recent versions CTEs could be very inefficient where they would not be in other databases, though that has now been addressed unless you are running an older release. I can't think of any major data loss stories, though there have been one or two "why we switched from pg to <whatever>" ones.
I love Postgres, but foreign data wrappers are not really good for much production use in my experience. The query planning needs more work, as it sometimes does inconsistent things; the biggest issue is that it sometimes doesn't take advantage of indices. Hoping it's solid by version 15 or so.
Shit the bed worse than Oracle? Not in my experience, unless it's from abuse (storing 1000 128MB blobs in a single row, making 100000 simultaneous connections+transactions on a 1-CPU instance, etc).
Most of this seems to originate from people treating PG as "magic", which it is....... until it isn't
Can't you, though? I think I've seen several apps doing it, making the set-up and administration of the cluster abstracted away and transparent to the user (I can think of KDE's Akonadi suite for instance).
That doesn't make the whole DB live in a single file, sure, but PG can then serve as the data backend for a wide variety of applications.