I'm sure we'll get a bunch of (well deserved) praise for PG here but, does anyone have a case where PG really shit the bed? (Besides the Uber one) (which is its own long thread)
I’m a huge fan of Postgres. This one is “user error”, but we still got bit pretty hard.
A query plan changed, on a frequently-run query (~1k/sec) on a large table (~2B rows) without warning. Went from sub-millisecond to multi-second.
The PG query planner is generally very good, but also very opaque. The statistics collected during an ANALYZE and used by the planner are subject to some significant caveats. Essentially, the planner would sometimes wildly mis-estimate costs due to under-sampling, and would choose a bad plan. We fixed it in two different ways: 1) lower the auto-ANALYZE threshold; 2) increase the number of rows sampled when collecting statistics for the relevant column.
Again, this was “user error”. That said, it will probably happen again on the same or another query, because it’s hard to know if/when a query plan is about to change, and pg_hint_plan and similar are very heavy-handed solutions.
I'm not even too bothered by the opaqueness of the query planner (although I'd love better visibility into it). But the fact that the query plan can change any second is insane: you can't lock it, and you can't force another one as a short-term fix.
There's no option that I know of. If you reach an impossible-to-anticipate threshold and the query plan changes, your whole system can be down and you can only fix forward, which might take a _long_ time to figure out and is super dangerous as you'll pretty much have to experiment on your prod database.
It's insane, I've not yet been bit too bad by it but I know it's coming for me.
Well, what exactly would you expect for better visibility into the planner? I mean, you have the source code, and I'm not sure how to visualize the extreme number of combinations considered by the planner. Any examples of databases doing interesting things?
As for the "locking" of plans, I personally have rather serious doubts about that. Yes, I've heard it suggested as a viable solution, but knowing how vastly different plans may be "right" for the same query with just slightly different parameters ...
> what exactly would you expect for better visibility into the planner
Dunno. At the moment I need a fairly deep understanding of how the planner works (eg how it uses statistics or indexes) to optimise queries, I'd love to be able to _see_ that rather than guess. Not saying it's easy, I'm just wishing
> As for the "locking" of plans, I personally have rather serious doubts about that. Yes, I've heard it suggested as a viable solution, but knowing how vastly different plans may be "right" for the same query with just slightly different parameters ...
What's the problem with vastly different plans being "right" for the same query?
All I am (and many other people are) asking for is a way to ensure PG doesn't bring down my entire system because it decided to change the query plan it uses without 1. any sort of warning 2. any way to revert it. It doesn't feel like it's asking for too much! Maybe locking plans is a good solution, maybe it's not, I'd just like _something_ that lets me sleep at night
Aurora PostgreSQL has something called Query Plan Management - which I like - is meant to address this type of issue especially for large tables that have key queries that you could blow up DB basically if they go haywire in planning.
Would def be a feature that would be nice to see in PostgreSQL itself.
It'd be nice if one could tell pg to only change its query plans during certain change windows, say, the first Saturday each month, with on call staff ready.
I'm currently having a similar issue where the query planner refuses to use the indexes on a search query (was fine for w hile, but one day it just started de-optimizing itself). Instead just does a seq-scan. Instead of the execution taking ~40ms with indexes the query planner thinks that the seq scan of ~1.5s is better...
Re-indexes the db and run analyze the table. It gets better for max 30min then PG de-optimizes itself again.
I'm kinda stuck on it, any ideas what can I do to resolve it?
Try lowering the random_page_cost value; this is the performance cost query planner uses for random reads, which is usually too high if you're using an SSD where random reads are cheap (on disks it's expensive). Just setting it to 1 works well in my case.
This solves many "it does a slow seq scan even though there's an index"-cases.
If using SSD or similar fast storage subsystem, or those that hide a higher random access time vs sequential, you may indeed want to reduce random_page_cost to make random_page_cost / seq_page_cost in the 1.2-1.5 range.
But it's also wise to review the default_statistics_target being used, that autovacuum is running frequently enough (which does autoanalyze), that the analyze thresholds are also properly tuned...
Thank you for mentioning https://postgresqlco.nf Team member here :) All these parameters mentioned here are well documented there, with recommendations.
Is it a HSTORE column with GIN index? The default "FASTUPDATE=ON" option will delay updates to the index until vacuum time, but if you don't vacuum soon enough suddenly it can decide it should sequentially scan instead of reading through the delayed updates.
This is behaviour I've seen on 9.x on the Aurora variant; for that the solution was to use the FASTUPDATE=OFF index storage option. You can see the delayed tuples by using "pgstatginindex" function.
Using some of the extra options of EXPLAIN (ANALYZE, BUFFERS, COSTS) might give more hints.
If not HSTORE/GIN, then it could be that the analyzer, after some auto-analyze of the table things that what you are asking for will match a significant number of the rows in the table. So there's no point in random seeking through an index because it thinks it needs to read e.g. 50% of the table anyway, so it might just as well not use the index.
set `enable_seqscan` = 'off' or set local `enable_seqscan` = 'off'. This will force the pg query planner to use indexes. Experiment with it until you figure out why your query performance deteriorates. Maybe you are doing a lot of updates/deletes? Increase the statistics sampling size? Autovacuum more frequently?
Because the user is using values that are no longer covered by the statistics. For example incrementing timestamp or id column.
If the stats are from yesterday and they say nothing about the frequency of todays timestamps the query will have to take a pessimistic view of the world. It might be that the data has radically changed since the last stats run, or not. Need to analyze the table to know and make optimal choices.
I'm not quite sure why you consider this "user error"? I work on the optimizer a bit, and I wouldn't say it's a fault of the user ...
OTOH I'm not sure it's a fault of the DB either :-( The statistics collected by ANALYZE are pretty much a lossy compressed version of the database, and so some details are missing - that's kinda the point of collecting the stats.
I'm not sure why lowering the autoanalyze threshold would fix this - it increases the frequency of stats updates, so my feeling is it makes it more likely to trigger similar issue. OTOH increasing the statistics target seems like the right thing to do (although it also keeps more accurate stats, not just increase the sample size).
I don't know if there are better solutions (both practical and in principle) :-(
Unfortunately, this is just the reality of using an RDBMS. I've seen similar behavior on Informix and SQL Server (with a smaller load than yours). They all occasionally generate suboptimal query plans. That's what your DBA is for.
SQL Server was somewhat notorious for it when migrating to 2014 because they rewrote the cardinality estimator. It generally worked better, but in some systems it really didn't. Some people ended up using a trace flag to use the legacy estimator. They have steadily improved the new estimator and it's no longer a problem, but it goes to show how much is going on under the surface.
It's the reality of Postgres, yes, but not all relational database. You mentioned SQL Server, which lets you lock in a query plan, specifically to cover the use case the parent described. When you have a Very Important frequently-run query that pulls from a monstrous table, it's nice to be able to sleep peacefully knowing the DB won't shit the bed because something completely unrelated changed someplace else in the database.
One fair criticism of Postgres (and many other open source projects) is that they can be a little too religious about how the thing should work in an ideal world (in this case, SQL being as declarative as possible), sometimes to the detriment of practicality and of making things easier for the business.
We update statistics weekly on SQL server. One week did did a particularly aggressive data cleanup and then ran stats which created a bad plan when the "tiny" table quickly grew.
> Unfortunately, this is just the reality of using an RDBMS. I've seen similar behavior on Informix and SQL Server (with a smaller load than yours). They all occasionally generate suboptimal query plans. That's what your DBA is for.
Other DBs let you lock in query plans or provide query hints, but postgres' developers are against either, which is not necessarily complete nonsense as it avoids users shooting themselves in the foot… but it also prevents users from digging themselves out of query planner stupidity.
PostgreSQL is robust across a wide range of applications but it does have some architectural sharp edges that can cause serious operational problems in practice if you run into them. Most of these only show up at scale. Only a few do not have any viable workaround in practice.
The worst one, in my experience, is that the statistics collector is architecturally broken for some large tables, which can cause the query planner to go haywire. As a consequence, query performance can be erratic, varying by a couple orders of magnitude each time the table is analyzed. There are no workarounds in many cases, and the architectural defect that causes it is decidedly non-trivial to address (otherwise it would have been fixed years ago).
This is the only defect that consistently rears its head in operational environments in my experience. As data volumes grow, more Postgres instances are experiencing this problem. At a minimum there needs to be a way to easily override the query planner when it goes insane.
I wonder if the planner can look at stats of full query execution in addition to the table stats. That way it can fall back to a previous plan (or previous table stats) if things get crazy.
No, we don't collect any optimizer stats during query execution. It's trickier that it seems, because (a) collecting the stats is actually pretty expensive, and (b) when using indexes, you may actually see just a tiny part of the data.
There have been some discussions about "learning" and correcting some of the estimates, but there was no patch so far.
Even if a full rework or overriding would be hard to implement, saving/loading statistics, or disabling the collector (after it already managed to do a decent run) couldn't be done? Or the nature/representation of the stats don't allow for this? (So they need constant updating, even if badly? But if stats only influence query plans then after good plans are found for the most common queries they could be persisted, right? Am I missing something?)
I think import/export of stats is pretty doable. Not a tiny amount of work, because of how many stats there may be (regular, extended), but I don't see any obvious major challenges ...
Similarly for disabling autoanalyze. We kinda already allow that for autovacuum (ALTER TABLE ... SET ...), so adding autoanalyze_enabled seems possible too. Or you could just set the threshold/scale_factor very high ...
I don't remember specifics, but that Uber article made it clear that they didn't understand how Postgres worked and made some very basic mistakes. The whole thing made them seem surprisingly incompetent, although the general crappiness of Uber apps maybe should have tipped me off sooner.
That said, Postgres have had a somewhat common problem with poorly selected defaults.
I remember that the performance tests back over 10 years ago between Postgres (v7-v9) and MySQL (~v5.0) always showed MySQL way ahead. For a long time people assumed the reason was because MyISAM (the then-default) isn't transactional. Except InnoDB was still faster. Okay, but by default MySQL didn't `fsync()` after writes. But it was still faster when you enabled that, too.
Turns out that Postgres's default memory configurations were either largely unchanged from v6 initial releases a decade earlier, or perhaps more accurately they're simply set to make the DB not a resource hog out of the box. Well, outside of the development environment, most people set up a dedicated server, and they want the server to be a resource hog! Most RDBMSs do that automatically. It's an odd choice.
It's been 6-7 years since I used the platform daily, but I would not be surprised if it was still that way. And given the complexity of the memory configuration [0] it's hard to NOT make "basic" mistakes.
In production workloads, it makes sense for a database to use up all available system resources. If I self-host Nitter on localhost and it runs Redis, or if I install KDE PIM apps which insist on hosting emails and such on a localhost MySQL database, I don't want them to eat all available resources. In my experience, Redis for Nitter, and a barely-used MySQL WordPress database, are quite lightweight in practice.
Unfortunately database design and database administration are entire fields of expertise unto themselves, and I've never seen a resource that covers 100% of the need-to-know info. Oracle, SQL Server, Postgres, and MySQL are all massive, complex, and old, and they have lots of hidden footguns in them.
You should start by knowing how other people design the type of DB you're building. A reporting DB is going to work very differently from a transactional DB. If you're using PG's excellent JSONB support as an alternative to Mongo, that's also something you need to consider at the design stage.
pgbouncer is very commonly used in PG setups, to the point where I don't understand why it isn't included in the default build.
I am really loving Postgres after a foray with Oracle. But I get concerned about using it for large production loads. I welcome recommendations as well.
My DB knowledge is too patchy to give a good summary, but it was extensively discussed on HN several times[1].
In general it seems like they could have solved all of their problems within Postgres, but someone at Uber seemed to just know MySQL better and want to switch. They also made other major changes (like removing schemas from certain data stores) that likely gave them more of an advantage than switching could have.
I remember the article and the discussion, but the fundamental issue seemed to be the difference between MySQL and Postgres indices and the different trade-offs made. Postgres indices are better for read heavy workloads, and MySQL for write heavy. I don't see how this is representative of them not knowing what they are doing. Frankly it's condescending to think you understand their engineering problems better than they do.
It's not a limitation that affects most of users, but lack of official multi master support makes it less suitable for "always on" databases. But to be honest multi master is something hard to get right in ANY database. PostgreSQL is great for many use cases and good enough for many more!
Lately I've been getting a lot of notifications for BDR (short for Bi-Directional Replication) from EDB; the term 'Always On' also appears in those notifications, apparently to woo the Oracle audiences. You might want to take a look into that.
Disclosures/info: I used to work for EDB. BDR was developed by 2nd Quadrant, which was recently acquired by EDB.
The basic HA support is there, and is very well described in the docs. What Postgres doesn't provide is the mechanism to pick when to automatically promote a replica to primary, but that's because this is a system-level concern that's entirely outside the database featureset itself.
If for always on or High Availability setups you want two DBs running that are in constant communication, with every write being committed on both machines synchronously - so if one of them goes down the other can take its place immediately with no downtime.
Postgres has no built in support yet, but tools like STOLON exist.
PostgreSQL has master-slave replication (one read/write master, and multiple read-only standbys), the other users are talking about multi-master replication (all the databases are read/write).
I'm aware. What I'm asking is why multi master is required. You can have a read only slave that is promoted to the master. It's that little bit of downtime during failover in question? Doesn't pgpool make that possible without downtime?
If you use multi master and DNS failover you still have partial downtime for every session routed to the unhealthy master. That doesn't seem to solve the problem, just a little better.
Yup. The synchronous replication described in the docs will allow the configuration of a hot standby, but I don’t think it allows writing to the standby, which is what true master-master does.
Transaction id wraparound. Transaction ids are 32 bit and they are used in a ring like manner, so a transaction id can only be reused if there are no open transactions before it. If you (accidentally) keep a transaction open for a long time, then after a while you run out of transaction ids, and it is tough to recover from, because management operations themselves involve creating transactions.
PostgreSQL has a lot of really great features, but some things are seriously underdeveloped as well, mostly because missing man power. You cannot reorder columns of a table easily, it does not support compression of connections (yet), XID-wraparound as others have already said.
I would also personally love map-reduce indexes (for example for efficiently calculating sums or other aggregate functions...)
Until recent versions CTEs could be very inefficient where they would not be in other databases, though that has now been addressed unless you are running an older release. I'm can't think of any major data loss stories, though there have been one or two "why we switched from pg to <whatever>" ones.
I love postgres, but foreign data wrappers are not really good for much production use in my experience. The query planners need more work, as sometimes they do inconsistent things. The biggest issues is not taking advantage of indices sometimes. Hoping it's solid by version 15 or so.
Shit the bed worse than Oracle? Not in my experience, unless it's from abuse (storing 1000x 128mb blob per single row, making 100000 simultaneous connections+transactions on a 1-cpu instance, etc).
Most of this seems to originate from people treating PG as "magic", which it is....... until it isn't
Can't you, though? I think I've seen several apps doing it, making the set-up and administration of the cluster abstracted away and transparent to the user (I can think of KDE's Akonadi suite for instance).
That doesn't make the whole DB be contained in a single file, sure, but PG can then serve as the data backend of a versatile lot of applications