This is how you make your users upset. Speaking personally I have personally enc...

rbranson · on Feb 4, 2011

I have written quite a few very complex long-running queries with 10-12 tables involved and 4-5 deep nested SELECTs in PostgreSQL and never encountered a case where the query planner was just plain wrong. Most of this work started on 8.2.

It's usually a poorly configured server that causes the problem. The default configuration is almost absurdly conservative. For instance, in 8.3, the default_statistics_target is 10, which is brutally low. For larger indexes and/or more complex operations, this can really hamper the query planner.

When building a new PostgreSQL server, I highly recommend seeking the services of a respected consultant that can help properly tune the server for your hardware & query load. A few hours of their time will net you weeks of time savings, even if you think of yourself as an advanced PostgreSQL user, especially if you're building complex queries or working with larger data sets.

EDIT: typo on default_statistics_target (was 100).

btilly · on Feb 4, 2011

I don't doubt that a competent person could have improved the configuration substantially.

However I didn't get to choose the server, it already existed. Furthermore I was dealing with a restored backup of a production webserver, so it was configured appropriately for OLTP even though I was using it for complex reporting. There was simply no way that anyone was going to risk the possibility of production problems to make my life on the back end any easier. And there was no way that I could in good conscience recommend that they do so.

And I did find my temp table workaround, that effectively let me force any query plan I needed.

rbranson · on Feb 4, 2011

I've definitely been in these types of situations, having to make things work in a hostile environment or to meet a deadline. I am just recommending an option to avoid these types of situations for those who are able to seek out the best practice before starting.

I might also speculate that if you were writing analytics queries that ran scans across several, large, heavily contended production database tables, it's likely that dumping the data into a temporary table first might have been the best option even with a highly tuned server. Despite the fantasy we're often told about MVCC, there are still enormous amounts of resources wasted doing large analytics-type scans on live, hot tables. This isn't something that could be solved with a query plan hint and it's unlikely even a smarter query planner could have saved you.

If I had a magic wand, my environment of choice in this situation would be a separate dedicated, read-only slave running on ZFS that would allow me to create filesystem snapshots & bring up a PostgreSQL instance on that snapshot.

btilly · on Feb 4, 2011

I assure you that contention was not an issue on a restored backup. At that point PostgreSQL did not have the option to have read-only slaves. But for reporting, day old data was acceptable so that wasn't a problem.

It still doesn't solve the problem that you really want to tune differently for OLTP and data warehousing, and the server was tuned for OLTP.

rbranson · on Feb 4, 2011

I'll definitely agree that attempting to run OLTP and analytics queries on a single PostgreSQL server is not going to turn out well. I'm not sure if there's a way around it though, even in theory (e.g. MapReduce isn't very good for OLTP).

btilly · on Feb 4, 2011

I made it work acceptably well for my use case. But the query optimizer sometimes was doing stupid things.

danudey · on Feb 4, 2011

I've seen this kind of attitude a fair bit from postgres users and admins, especially as relates to replication. I was told it was a feature I didn't need, didn't really want, didn't understand, etc. I could accomplish the same thing by rsync, there were third-party patches to do it, etc. None of them actually solved the problem of wanting a live, hot spare that I could do reporting or reads against. Eventually they got defensive and hostile, accusing me of trolling or being a MySQL fanboy.

Now, of course, Postgres had a rudimentary replication implementation, and it's a great new feature and we're all excited about it, etc.

I'm not trying to say anything I'll about Postgres itself, or even the core developers, but this attitude doesn't always help.

jtbigwoo · on Feb 4, 2011

> This is how you make your users upset.

More like he's tired of upsetting users on an interactive, case-by-case basis and has decided to move to batch.

fendale · on Feb 4, 2011

I agree with this so much I wish I could upvote it more!

I'm a very experienced Oracle developer and Oracle does a really good job, but I need hintsto fix problems more often that I would like, especially on complex reporting queries.

brlewis · on Feb 4, 2011

Doesn't the last part of the article, proposing a system to selectively adjust cost calculations, acknowledge the existence of people like you?

btilly · on Feb 4, 2011

No. Because in my particular situation I did not have DBA access, and I was running queries against a restored backup of an OLTP system. The risk/reward is such that it made absolutely no sense to try to tweak things to make my queries run faster at the potential cost of doing who knows what damage to the queries that were running in the OLTP system.

Therefore, even if I was still in that reporting role, I would never, ever even CONSIDER playing with the knob they gave me. And even were I so reckless, no sane DBA should allow it.

brlewis · on Feb 4, 2011

This snippet from the article makes me think their proposal only affects one query at a time:

For example, one which allowed you to put "SELECTIVITY 0.1" after a WHERE clause

btilly · on Feb 4, 2011

Oops, you're right.

Let me see. I can peer inside of a running black box and twiddle knobs that do something I don't understand until I magically get the right result. Then I can cross my fingers and hope it doesn't change out from under me.

Why does this not sound appealing?

earl · on Feb 4, 2011

Here's the real reason: this dev is an jerk who thinks postgres users get too much sleep.

My experience on 8.4, within the last 3 years: a query that is run hundreds of times per second with an average run time in milliseconds and a max query time of .3 seconds suddenly starts taking 100 to 1000 times as long to run. Six hours of debugging later starting at 3 in the morning when systems started failing, we figured out that some magic in the query planner had tripped over and changed the query plan to something that is at least two orders of magnitude worse. No indices have changed. No schemas have changed. Data grows by maybe 30k rows per day which is very reasonable given table sizes and the 128GB of ram dedicated to pg.

Of course, there's no way to specify the query plan. Instead, we ended up fucking with configs until the query plan swapped back.

That's why people like locking query plans. Not necessarily to control the best case, but to control the average / worst case.

zmmmmm · on Feb 4, 2011

> That's why people like locking query plans. Not necessarily to control the best case, but to control the average / worst case.

This. I've had exactly the same experience as you.

Nobody wants to put hints in the query plan. But when your web site is down because one day the freakin' query planner decided all by itself that it was time for a change and some of your worst case queries are taking 2 minutes to return I don't want to be fudging about with statistics trying to understand the internal mind of the planner to convince it to return to sanity. I just want to make it do a plan that I know won't lose me my job.

The kind of attitude in this article reminds me - sadly - of the BS that used to come from the MySQL devs. "No you don't need transactions! Your system is broken if it uses transactions!". Of course it was BS and the minute they supported transactions they were all over how good it was.

saurik · on Feb 7, 2011

For the record, that form of statement ("no, you don't need X") cuts both ways, and therefore whether or not I agree that PostgreSQL needs query hints, this argument of comparison bothers me: you could use the same argument for "No, you don't need the ability to violate referential integrity! Your system is broken if it violates declared foreign key constraints!", which is to me a very correct (and "very PostgreSQL") statement to make about how horrible it is that MySQL /added a feature/ that turns off foreign key constraint checks in a way that leaves your data in an incosistent state due to the same kind of pressure from DBAs who weren't really thinking about the long term consequences. Put differently: you stripped away all of the reasoning and are now comparing the lexical syntax of a conclusion and trying to claim that this means two situations are alike, which is actively misleading to your readers and yourself.

btilly · on Feb 4, 2011

Absolutely. It is the nature of the profession that DBAs are conservative. If you have something that works, you want it to continue to work. So if it is working well enough, don't change it!

The optimizer is great. Wonderful. It found plans that are working well for me. Yay!

Now what? The potential upside of it finding an even better plan is minimal. I'm satisfied with its performance as is. The potential downside of it deciding that a worse plan is better is huge. Let me go and tell it to not change its mind!

Relational databases already have enough "fall over with no previous sign of problems" failure modes. (The hard one has to do with locking, if a particular fine-grained lock is operating at 98% of capacity it shows no sign of problems, but at 101% of capacity the database falls over.) There is no need to add more.

Update: I've actually run across people recommending a workaround for this problem: turn off automatic statistics gathering. That way you know that the query optimizer won't automatically change what it is doing. I bet I know what this particular developer thinks of that strategy!

wiredfool · on Feb 4, 2011

I've been there, and have the scars from it.

In several cases, I have order by (id{primary key}+0) desc just to bust up the very strong bias to just backwards scan on the primary key till it fills the limit window. That's perfectly fine if you're looking at the end of a time series, but if you're actually looking 200k records back, even if stuff is cached in memory, that's a hell of a hit.

The case that first triggered that one was when the query planner went from a constant time lookup to that index scan, and took the time to do a large update from 10s of seconds to 12 hours.

fendale · on Feb 4, 2011

Yea, I have seen Oracle do basically the same thing, and our solution is to stick in a stored outlined (effectively locking the plan). If you own the code (ie it is not a 3rd party COTS package) you can also use hints to control queries. I would hate to be without hints in Oracle.

Oracle 11g apparently can spot when a query plan changes and the new one is much slower than the old one - at least that is what is says in the documentation. Whether it works or not I will find out if we ever get a system upgraded to 11g!

jacques_chester · on Feb 7, 2011

> Oracle 11g apparently can spot when a query plan changes and the new one is much slower than the old one - at least that is what is says in the documentation. Whether it works or not I will find out if we ever get a system upgraded to 11g!

I'm surprised this isn't already the default.

I guess the next step for postrgres is to collect statistics on plans as well as data.