Why PostgreSQL doesn't have query hints (toolbox.com)
156 points by mcfunley on Feb 4, 2011 | 60 comments



This is how you make your users upset.

Speaking personally, I have encountered times when PostgreSQL (this was in the 8.2 series) simply Could Not Find The Right Plan. I looked at the tables and indexes. I saw the right query plan. It did not. I did not have DBA access, so I had no ability to figure out why. However, I did have the ability to rewrite my query as two queries, with the first going into a temp table. This effectively forced PostgreSQL to use the query plan I knew would work, and my query ran in under a second instead of taking multiple minutes.
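
A minimal sketch of that temp-table workaround (table and column names are made up, not the actual schema):

    -- Materialize the selective part first; this pins the evaluation order.
    CREATE TEMP TABLE recent_orders AS
      SELECT order_id, customer_id, total
      FROM orders
      WHERE created_at >= now() - interval '7 days';

    ANALYZE recent_orders;  -- give the planner real statistics on the temp table

    SELECT c.name, sum(r.total)
    FROM recent_orders r
    JOIN customers c ON c.customer_id = r.customer_id
    GROUP BY c.name;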

I had similar experiences with Oracle, and used query hints very successfully.

I don't dispute the claim that this likely happens in only 0.1% of queries. Based on what I've seen in Oracle, I also suspect that most people who use hints use them as voodoo, and most of the time use them incorrectly.

However, you're much more likely to hit that 0.1% if you're writing very complex queries (which happened to me when my role was devoted to reporting). People can learn how databases are supposed to work (I certainly did). Even then, I acknowledge that 95-99% of the time the optimizer does better than I could. But in the remaining 1-5% of the time, when the database gets it wrong, it is still really, really helpful to be able to tell it the right way to do its job. And in my personal experience, the cases where the database got it wrong were not temporary problems - it stayed wrong.

But they don't acknowledge the existence of people like me. They assume that I must be lazy, ignorant, or have had my bad experience decades ago. Because their wonderful software certainly couldn't have given me bad experiences in the last 3 years. (I'm no longer in that reporting role, so I don't hit that case any more.)


I have written quite a few very complex long-running queries with 10-12 tables involved and 4-5 deep nested SELECTs in PostgreSQL and never encountered a case where the query planner was just plain wrong. Most of this work started on 8.2.

It's usually a poorly configured server that causes the problem. The default configuration is almost absurdly conservative. For instance, in 8.3, the default_statistics_target is 10, which is brutally low. For larger indexes and/or more complex operations, this can really hamper the query planner.
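
For example, raising it looks roughly like this (the table and column names are hypothetical; the per-column form plus a re-ANALYZE is usually enough):

    -- postgresql.conf: raise the global default
    default_statistics_target = 100

    -- or just for the column the planner keeps misestimating:
    ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
    ANALYZE orders;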

When building a new PostgreSQL server, I highly recommend seeking the services of a respected consultant that can help properly tune the server for your hardware & query load. A few hours of their time will net you weeks of time savings, even if you think of yourself as an advanced PostgreSQL user, especially if you're building complex queries or working with larger data sets.

EDIT: typo on default_statistics_target (was 100).


I don't doubt that a competent person could have improved the configuration substantially.

However I didn't get to choose the server, it already existed. Furthermore I was dealing with a restored backup of a production webserver, so it was configured appropriately for OLTP even though I was using it for complex reporting. There was simply no way that anyone was going to risk the possibility of production problems to make my life on the back end any easier. And there was no way that I could in good conscience recommend that they do so.

And I did find my temp table workaround, that effectively let me force any query plan I needed.


I've definitely been in these types of situations, having to make things work in a hostile environment or to meet a deadline. I am just recommending an option to avoid these types of situations for those who are able to seek out the best practice before starting.

I might also speculate that if you were writing analytics queries that ran scans across several, large, heavily contended production database tables, it's likely that dumping the data into a temporary table first might have been the best option even with a highly tuned server. Despite the fantasy we're often told about MVCC, there are still enormous amounts of resources wasted doing large analytics-type scans on live, hot tables. This isn't something that could be solved with a query plan hint and it's unlikely even a smarter query planner could have saved you.

If I had a magic wand, my environment of choice in this situation would be a separate dedicated, read-only slave running on ZFS that would allow me to create filesystem snapshots & bring up a PostgreSQL instance on that snapshot.


I assure you that contention was not an issue on a restored backup. At that point PostgreSQL did not have the option to have read-only slaves. But for reporting, day old data was acceptable so that wasn't a problem.

It still doesn't solve the problem that you really want to tune differently for OLTP and data warehousing, and the server was tuned for OLTP.


I'll definitely agree that attempting to run OLTP and analytics queries on a single PostgreSQL server is not going to turn out well. I'm not sure if there's a way around it though, even in theory (e.g. MapReduce isn't very good for OLTP).


I made it work acceptably well for my use case. But the query optimizer sometimes was doing stupid things.


I've seen this kind of attitude a fair bit from postgres users and admins, especially as relates to replication. I was told it was a feature I didn't need, didn't really want, didn't understand, etc. I could accomplish the same thing by rsync, there were third-party patches to do it, etc. None of them actually solved the problem of wanting a live, hot spare that I could do reporting or reads against. Eventually they got defensive and hostile, accusing me of trolling or being a MySQL fanboy.

Now, of course, Postgres had a rudimentary replication implementation, and it's a great new feature and we're all excited about it, etc.

I'm not trying to say anything ill about Postgres itself, or even the core developers, but this attitude doesn't always help.


> This is how you make your users upset.

More like he's tired of upsetting users on an interactive, case-by-case basis and has decided to move to batch.


I agree with this so much I wish I could upvote it more!

I'm a very experienced Oracle developer and Oracle does a really good job, but I need hints to fix problems more often than I would like, especially on complex reporting queries.


Doesn't the last part of the article, proposing a system to selectively adjust cost calculations, acknowledge the existence of people like you?


No. Because in my particular situation I did not have DBA access, and I was running queries against a restored backup of an OLTP system. The risk/reward is such that it made absolutely no sense to try to tweak things to make my queries run faster at the potential cost of doing who knows what damage to the queries that were running in the OLTP system.

Therefore, even if I was still in that reporting role, I would never, ever even CONSIDER playing with the knob they gave me. And even were I so reckless, no sane DBA should allow it.


This snippet from the article makes me think their proposal only affects one query at a time:

For example, one which allowed you to put "SELECTIVITY 0.1" after a WHERE clause


Oops, you're right.

Let me see. I can peer inside of a running black box and twiddle knobs that do something I don't understand until I magically get the right result. Then I can cross my fingers and hope it doesn't change out from under me.

Why does this not sound appealing?


Here's the real reason: this dev is a jerk who thinks postgres users get too much sleep.

My experience on 8.4, within the last 3 years: a query that is run hundreds of times per second with an average run time in milliseconds and a max query time of .3 seconds suddenly starts taking 100 to 1000 times as long to run. Six hours of debugging later starting at 3 in the morning when systems started failing, we figured out that some magic in the query planner had tripped over and changed the query plan to something that is at least two orders of magnitude worse. No indices have changed. No schemas have changed. Data grows by maybe 30k rows per day which is very reasonable given table sizes and the 128GB of ram dedicated to pg.

Of course, there's no way to specify the query plan. Instead, we ended up fucking with configs until the query plan swapped back.

That's why people like locking query plans. Not necessarily to control the best case, but to control the average / worst case.


> That's why people like locking query plans. Not necessarily to control the best case, but to control the average / worst case.

This. I've had exactly the same experience as you.

Nobody wants to put hints in the query plan. But when your web site is down because one day the freakin' query planner decided all by itself that it was time for a change and some of your worst case queries are taking 2 minutes to return I don't want to be fudging about with statistics trying to understand the internal mind of the planner to convince it to return to sanity. I just want to make it do a plan that I know won't lose me my job.

The kind of attitude in this article reminds me - sadly - of the BS that used to come from the MySQL devs. "No you don't need transactions! Your system is broken if it uses transactions!". Of course it was BS and the minute they supported transactions they were all over how good it was.


For the record, that form of statement ("no, you don't need X") cuts both ways, and therefore whether or not I agree that PostgreSQL needs query hints, this argument by comparison bothers me: you could use the same argument for "No, you don't need the ability to violate referential integrity! Your system is broken if it violates declared foreign key constraints!", which is to me a very correct (and "very PostgreSQL") statement to make about how horrible it is that MySQL /added a feature/ that turns off foreign key constraint checks in a way that leaves your data in an inconsistent state, due to the same kind of pressure from DBAs who weren't really thinking about the long-term consequences. Put differently: you stripped away all of the reasoning and are now comparing the lexical syntax of a conclusion and trying to claim that this means two situations are alike, which is actively misleading to your readers and yourself.


Absolutely. It is the nature of the profession that DBAs are conservative. If you have something that works, you want it to continue to work. So if it is working well enough, don't change it!

The optimizer is great. Wonderful. It found plans that are working well for me. Yay!

Now what? The potential upside of it finding an even better plan is minimal. I'm satisfied with its performance as is. The potential downside of it deciding that a worse plan is better is huge. Let me go and tell it to not change its mind!

Relational databases already have enough "fall over with no previous sign of problems" failure modes. (The hard one has to do with locking, if a particular fine-grained lock is operating at 98% of capacity it shows no sign of problems, but at 101% of capacity the database falls over.) There is no need to add more.

Update: I've actually run across people recommending a workaround for this problem: turn off automatic statistics gathering. That way you know that the query optimizer won't automatically change what it is doing. I bet I know what this particular developer thinks of that strategy!
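
(For the record, that workaround looks roughly like this - globally or per table; the table name is made up, and I'm not endorsing it:)

    -- postgresql.conf: disable the autovacuum daemon, which also stops automatic ANALYZE
    autovacuum = off

    -- or per table, as a storage parameter in recent versions:
    ALTER TABLE orders SET (autovacuum_enabled = false);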


I've been there, and have the scars from it.

In several cases, I have used ORDER BY (id + 0) DESC - where id is the primary key - just to break the very strong bias toward simply backwards-scanning the primary key until it fills the limit window. That's perfectly fine if you're looking at the end of a time series, but if you're actually looking 200k records back, even if everything is cached in memory, that's a hell of a hit.

The case that first triggered that one was when the query planner went from a constant-time lookup to that index scan, and took a large update from tens of seconds to 12 hours.
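
The trick, sketched with a made-up table (the +0 makes the expression ineligible for the index ordering, so the planner stops insisting on the backwards scan):

    -- What the planner loves: walk the pkey index backwards until LIMIT is satisfied
    SELECT * FROM events
    WHERE  account_id = 42
    ORDER  BY id DESC
    LIMIT  50;

    -- The workaround: id + 0 can't use the index ordering, so another plan is chosen
    SELECT * FROM events
    WHERE  account_id = 42
    ORDER  BY id + 0 DESC
    LIMIT  50;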


Yea, I have seen Oracle do basically the same thing, and our solution is to stick in a stored outline (effectively locking the plan). If you own the code (ie it is not a 3rd party COTS package) you can also use hints to control queries. I would hate to be without hints in Oracle.
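
For anyone who hasn't seen one, an Oracle hint looks roughly like this (table and index names made up):

    SELECT /*+ INDEX(o orders_cust_ix) USE_NL(o c) */
           c.name, o.total
    FROM   customers c, orders o
    WHERE  o.cust_id = c.cust_id
    AND    c.region  = 'EMEA';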

Oracle 11g apparently can spot when a query plan changes and the new one is much slower than the old one - at least that is what it says in the documentation. Whether it works or not I will find out if we ever get a system upgraded to 11g!


> Oracle 11g apparently can spot when a query plan changes and the new one is much slower than the old one - at least that is what it says in the documentation. Whether it works or not I will find out if we ever get a system upgraded to 11g!

I'm surprised this isn't already the default.

I guess the next step for Postgres is to collect statistics on plans as well as data.


The bottom line here is if your site suddenly goes down because PostgreSQL changed its query plan in the middle of the night, the answer PostgreSQL has for you is "sorry, you're stupid and now you are fucked." Until you can decipher what exactly is happening and make an appropriate fix, which can take days or weeks, your site is effectively dead.

Nobody disagrees with the noble goal of building the best query planner, or thinks that index hints are anything less than a hack for dire situations. Where we disagree is whether it's OK for the sites we're responsible for to go down at the whim of the query planner. PostgreSQL's decision to not provide hints tells me they care more about the purity of their query execution engine than the applications that rely upon it.


A couple people have mentioned the query plan changing in the middle of the night. Is the scenario a single query getting executed two different ways by the same running system? Or is it from upgrading to a new version of PostgreSQL and hitting different behavior? If it's the first scenario, would using prepared statements help somewhat with avoiding unexpected changes?

// Edit: Sorry for the naive question. I see that other people are saying it's the first scenario.


tl;dr They trust the cost-based query optimizer more than the DBA; for the rare cases where the optimizer gets it wrong, they're more likely to add cost adjustments than the explicit "do it like this" directives that, say, Oracle DBAs are used to. There's already one cost adjustment in Postgresql 9.0.

(They also claim that query optimizers all over are good enough that they don't need the hints anymore, and the only reason that anyone asks for them is because they've gotten so used to messing with bad query planners that they don't know how to work with a good one. Which, regrettably, has not been my company's experience with Oracle 11g --- we haven't resorted to explicit hints, but we've restructured queries in other ways for order-of-magnitude improvements in performance.)


>Which, regrettably, has not been my company's experience with Oracle 11g --- we haven't resorted to explicit hints, but we've restructured queries in other ways for order-of-magnitude improvements in performance.

He's not claiming that query planners are so good that you'll never have to restructure your queries. I've written many queries that seem perfectly reasonable, but once I start optimizing I can see I'm doing things completely backwards. It's just like almost everything else programmers do. If I dash off a quick program, I don't expect the VM to figure out how to make it perform perfectly. Why would I expect database software to be different?


> It's just like almost everything else programmers do. If I dash off a quick program, I don't expect the VM to figure out how to make it perform perfectly. Why would I expect database software to be different?

Except that SQL wasn't designed as a programming language. It was designed to be a declarative language that normals could use.

In a normal language, you have the freedom to express an algorithm in a billion different ways. You're expected to find the right way to write it. That SQL falls down here is a fundamental flaw in what was supposed to be its greatest strength.


In what was supposed to be its greatest strength a long, long time ago. I don't know that anyone considers SQL to be anything other than a technical detail any more; certainly not to the extent of expecting normals to be able to get what they need out of a database by writing raw queries.


Yes, but that feature still hinders SQL today. If "usability by normals" was dropped as a design requirement, SQL would look a lot more like a normal programming language, and suck a lot less.


> It was designed to be a declarative language that normals could use.

We may be getting off the query hint topic, but normals can use SQL Server as you've described. They can write queries and get data out, they just can't expect those queries to perform optimally without expertise.

In normal language, you and I have the freedom to convey our thoughts a billion different ways. But only one guy wrote The Sun Also Rises (for example.) Hemingway found the right way to write it.


Oracle's preferred way to solve this kind of problem is to do a stored program outline. Oracle even has tools to help you identify a poorly performing query, have it do an extra thorough analysis (basically spend a minute or so looking for the best plan) and then store the outline for you.
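
For the curious, a stored outline looks roughly like this (outline and category names are made up; on 11g, SQL Plan Baselines are the newer mechanism for the same idea):

    CREATE OR REPLACE OUTLINE fix_slow_report
      FOR CATEGORY reporting
      ON SELECT c.name, SUM(o.total)
         FROM   customers c, orders o
         WHERE  o.cust_id = c.cust_id
         GROUP  BY c.name;

    -- then have sessions pick up outlines from that category
    ALTER SESSION SET use_stored_outlines = reporting;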


At one place I worked, Sybase told us we wouldn't need to hint the queries or force a plan with the new 12.x series. With the amount of data we were pushing, that was complete bunk. We ended up forcing every plan.


I think you could probably get hints into Postgres if you did them right (and volunteered to do the work!). Where "right" means trying to make the hints at least somewhat declarative: not "use index X for query Y", but "the values of these two attributes are not independent" or "this predicate is likely to be highly selective." Basically, look at the most frequent planner failures, figure out why the planner makes a mistake, and let the user specify the information the planner needs to choose the right plan. Postgres has added features in this direction: for example, you can now specify the cost associated with evaluating a function as part of CREATE FUNCTION.
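
For example, something like this (the function and tables are hypothetical; COST was added in 8.3):

    -- Tell the planner a call to this function is expensive, so it avoids
    -- calling it earlier or more often than necessary.
    CREATE FUNCTION risk_score(int) RETURNS numeric AS $$
      SELECT sum(o.total) * 0.01
      FROM   orders o
      WHERE  o.cust_id = $1
    $$ LANGUAGE sql STABLE
    COST 10000;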

The work required to implement this would be substantial, and most of the people with the necessary skills would rather improve the planner itself (e.g., collect statistics on cross-column correlations to avoid making the attribute independence assumption in the first place). So it isn't too surprising that this hasn't gotten done.


> Many older DBAs, and their proteges distrust query planners because of bad experiences back in the 80's and early 90's. They are sure they can do a better job, even though they can't.

I remember having cases in the past year (on Oracle 10g) where adding hints helped substantially. Since it's impossible for me to ever do better than the optimizer, I guess this must mean I've gone nuts?

> Other DBAs are lazy or labor under unrealistic deadlines. Applying a query hint is often faster and easier than diagnosing the real reason the query has a bad plan.

Hints are evil and you should never use them, even though they make your job easier.

> Ignorance and weak software play a role too: good diagnostic tools and techniques for troubleshooting bad queries did not become available until relatively recently, and most DBAs still don't know how to use them.

Instead of just telling the optimizer what to do, you should have an extensive discussion with it and attempt to persuade it with concessions so that you don't hurt its feelings.

> The developers who work on the PostgreSQL not-for-profit database project, though, have the privilege of not implementing a bad idea just because a lot of people seem to want it.

Hints offend us, and we don't care if people find them useful.

> All that aside, there are those 0.1% of pathological cases where the query planner does The Wrong Thing even when it's patently obvious what the right thing is. In our community, that usually leads to patches and improvements in the query planner and the statistics system

Not supporting hints helps us extract useful code from our users. (hey, this is an awesome idea, I need to think about how to apply it to my projects...)

> Well, one thought is a system which would allow DBAs to selectively adjust the cost calculations for queries. For example, one which allowed you to put "SELECTIVITY 0.1" after a WHERE clause to tell the planner that despite what it thinks from its statistics, that set of criteria will give you 10% of the table. Even better would be a system of fudging the statistics on database objects, to allow DBAs to indicate (for example) that using a particular index is more costly than it appears because of the poor clustering of the data or because of the complex calculated expression it uses.

Hints are normally done the wrong way, and when we say we won't implement hints we really mean we won't do them that wrong way. We think we know the right way to do hints, and we're working on implementing it.


What arrogance. Perfect optimizers are an illusion. They are guaranteed to fail, and to fail at the most inconvenient times. It is not only possible to do better than the optimizer on complex queries with hints, it's a given. Now that I know hints are not possible in PostgreSQL, I will strictly curtail the kind of projects I will apply this RDBMS to. If you haven't found the need to use hints, you have not explored the full range of what SQL is capable of (or worse yet, use an ORM).


Personally I have witnessed a pretty drastic level of hoop-jumping to get pgsql to force a particular index, and forgo destroying the site for another few days. And this hasn't been by schmuck DBAs either. Rather, by actual postgres committers that we have employed as contractors.

Granted, I don't know if reversing the orders of things in WHERE clauses, doing a rain dance, and hoping for the best would have been necessary if the original author of our schema hadn't made some critical mistakes. Like using text fields for enums--that boner seems to have confused the bejeezus out of the planner on numerous occasions.

Then again most big sites start out as some dude learning databases for the first time and making major mistakes. So anyway, I question the 0.1% figure Berkus throws out there.


to get pgsql to force a particular index, and forgo destroying the site for another few days.

It's just a gut feeling, but I smell a deeper problem in there. If your live queries are so complex as to require advanced massaging, then perhaps you missed some low-hanging denormalization or caching fruit earlier in the game.

The PostgreSQL planner has always done a flawless job for me in terms of choosing between indexes and tablescans - on reasonable queries. I can count the occasions where it made obviously bad decisions on seemingly simple queries (due to thrown off stats after some slony confusion) on two fingers.

However, if your site depends on cascades of sub-queries to build interactive views then I'd first take a step back and re-evaluate your persistence strategy before putting blame on the database.

if the original author of our schema hadn't made some critical mistakes

Okay, perhaps my gut feeling isn't too far off?


The PostgreSQL planner has always done a flawless job for me in terms of choosing between indexes and tablescans - on reasonable queries.

While that has been my general experience as well, things can break down at the edges sometimes. I sometimes have to join 200m-row tables that don't fit into RAM (though individual partitions of the table do), against 1-10m-row tables of new data. This task is probably something more suited to Hadoop, but we're using PostgreSQL for it as that's where the data is.

With a plain vanilla join using indexed columns for the join, the smaller table is scanned while an index is used on the larger table's join column. This is reasonable except for the fact that it results in a lot of random IO, so if you don't have an SSD or something, this will be slow. Of course, you can just increase random_page_cost to adjust the costs estimated by the query planner. I believe that results in the tables being sorted before the join. Sorting can be better than a thrashing disk, but still slow.

The core issue is random IO here, so how about just warming up the cache for that particular partition's index/table with a big sequential read of the entire index/table, and letting it use the index-based query plan as usual? The sequential read takes a minute in the worst case for a single partition, but then when everything's cached.. BAM. The join query runs in 2-20 seconds (for a single partition, one of 26). If Postgres were psychic, it could have done this itself instead of making me force caching with an otherwise pointless extra query. But I suppose that behavior would usually not be desired, as it would destroy the existing cache. In this case, destroying the older cache was the right thing to do, but Postgres can't know that.
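
Concretely, the two knobs I'm describing look roughly like this (partition and table names are made up):

    -- Option 1: make random IO look more expensive, so the planner sorts/hashes instead
    SET random_page_cost = 8.0;   -- the default is 4.0

    -- Option 2: warm the cache with a sequential read, then run the join as usual
    SELECT count(*) FROM big_table_part_07;   -- otherwise pointless, but pulls the heap into cache
    SELECT n.key, b.payload
    FROM   new_rows n
    JOIN   big_table_part_07 b ON b.key = n.key;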

The whole point of this comment being: DBAs matter and you can't rely fully on the query planner to do what you may expect. Even a scrappy one like me (I'm mostly a programmer, the DBA stuff just comes with the territory for me). This is partly in response to a comment elsewhere on this page, mentioning how the query-planner is better trusted than a DBA - I wish it were always so.


That doesn't sound right. In default conditions, PostgreSQL will shy away from doing index scans unless you're operating on <0.1% of the table. The planner builds a selectivity figure for the table based on table dimensions, index statistics, the operator used (= is more selective than <), and of course its estimate of the cost of the various operations needed to run the query. The decision to perform an index or sequential scan usually has little to do with the table size ratio during a JOIN. It's likely your index statistics were poor and/or the server configuration was messed up.


You're probably right, I'm an amateur at this. My current solution works pretty well, but I should really drill into what may be wrong here.

If you don't mind me asking, what in your estimation would be an ideal query plan for a join of the type I've described? A hash join? Also, I've left statistics to their defaults and run ANALYZE after bulk data uploads (the only time data is written), but I'll try bumping statistics collection up and running VACUUM ANALYZE again. I don't know what server configuration could be messed up to cause something like this; I have my memory settings (shared_buffers, effective_cache_size) set up fine, and cost parameters have been left alone. Other configuration settings I've changed shouldn't be affecting read queries.
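
A quick way to drill in is to compare the planner's row estimates against reality (a hypothetical version of the join described above):

    EXPLAIN ANALYZE
    SELECT n.key, b.payload
    FROM   new_rows n
    JOIN   big_table_part_07 b ON b.key = n.key;

    -- Look for plan nodes where the estimated rows and the actual rows diverge wildly;
    -- those point at the columns whose statistics targets need raising.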


Right, what I am describing there is battles in the long war to denormalize, once a normalized site gets popular. We have since denormalized many things into dirt-stupid yet very functional mysql shards.


Could you say more about the problems using text for enums? I use varchar(4) primary keys in some small lookup tables now. Didn't postgresql only introduce the enum type in 8.3?


By 'enum' I meant in the more general sense, a constrained set of a few values. In our case the original schema designer used a text column for states: 'active,' 'dead,' etc. The planner would generally guess incorrectly about the range of values it might expect to find in a text column like that and working around it required hand fiddling.

(Disclaimer: forgive me, it's been a few years since I've had to deal with this specific problem, some of the details may be off.)


Was it indexing the faux-enum columns? First thing I would do would be to drop the index if there is one considering it will add very little to the selectivity of the query.


Ouch, I'm not a DBA nor advocating for hint support in PostgreSQL but were I either, I'd take Josh's broad brush of "old", "lazy" and "ignorant" unkindly.

PostgreSQL is indeed an awesome technology, I'm using it again after several years of working with folks who were stalwart MySQL adherents. I'm glad to be back. The query planner, among many other things, is a huge improvement.


SQL is a declaration of "what should be done". Hints are procedural directives, i.e. "it should be done this way". Modern optimizers do a pretty nice job, so hints aren't usually necessary, yet taking away control that is extremely important in 10% of cases just because it isn't necessary in 90% is just plain stupid. That's one more reason why Oracle or MS can charge so much - because there is no viable alternative.

And I fail to see how supporting hints would prevent improving the optimizer. Oracle does it extremely well. MS seems to as well.


MS and Oracle also have large teams of people that work full-time on improving the optimizer. Meanwhile, the handful of people with the skills and inclination to work on the Postgres optimizer don't view hints as a high priority.


There is a difference between these statements: (typical marketing-style BS cover-up for a missing feature) "you don't need hints", and (honest) "we think our limited resources should all go into the optimizer; in our view, the few percent of cases where hints are necessary are a much lower priority than the rest, which the optimizer covers."

The latter shows that the people are rational and open to reason. The former shows that ideology and resulting BS byproduct have overtaken the project.


I agree that saying that hints have zero value is silly, but the blog post is just Josh's opinion. If you read the discussions on this topic on the pgsql-hackers list, you'll find a more balanced view.


I'm pretty sure SQL Server's query hints are just placebos!


I've had to use hints from time to time on stored procedures in SQL Server. I had to use WITH RECOMPILE more than once because of the server generating query plans that would take 60 seconds for one input and .002 seconds on another. The plan seemed to be optimizing for a weird case that made no sense for the general case. Adding that option made it work in .002 seconds for every call. I never noticed plan generation overhead doing this.
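
For reference, the shape of it (procedure, tables and columns are made up; OPTION (RECOMPILE) on a single statement is the finer-grained variant):

    -- Recompile the procedure's plan on every call instead of reusing a cached one
    CREATE PROCEDURE dbo.GetOrders @CustomerId int
    WITH RECOMPILE
    AS
    BEGIN
        SELECT OrderId, Total
        FROM   dbo.Orders
        WHERE  CustomerId = @CustomerId;
    END;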

In MySQL I've had to use FORCE INDEX, usually when sorting on a large result set where the primary key was being used for the sort but wasn't what I needed. I used FORCE INDEX on the column I was sorting by, which improved performance by around 50%, but I ended up caching the results anyway because the worst case was still too slow for my taste and the data didn't need to change as often as I was executing the query (it was a random sample of the latest actions on a site).
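
That query looked something like this (table and index names made up):

    SELECT *
    FROM   actions FORCE INDEX (idx_created_at)
    WHERE  user_id IS NOT NULL
    ORDER  BY created_at DESC
    LIMIT  100;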

These are edge cases I suppose but I'm glad they're there for me when I need them. I respect their reasons for not including them though as I normally see them used incorrectly or just strangely. (e.g. insert into table (nolock) (id, blah)...)


WITH RECOMPILE has the opposite effect. It's a kind of anti-hint.

You force SQL Server to never use cached plans and to regenerate the query plan every time you run the query. So you always have the plan generation overhead.

Also, such an SP can have only one running instance on the server because of the SP:Recompile lock. It's a kind of singleton in SQL Server.


You can take NOLOCK out of my cold, dead, hands! That one is a lifesaver for reading from terabyte-sized OLTP tables with 100/sec inserts...

I really like what the guy is saying in the article. However, it's the perennial idealism vs pragmatism argument. And, i'll put my hand up for pragmatism: I've needed hints in the past, because like everything else in this world, query planners simply aren't perfect.


In SQL Server it's generally better to write SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED once than to write the NOLOCK hint against each table.

The result is the same.

This is not a hint for the planner. You can write it for PostgreSQL as well (I mean SET TRANSACTION).
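
i.e. these two forms are equivalent (the table name is made up):

    -- Per-table hint, repeated for every table in every query
    SELECT COUNT(*) FROM dbo.Orders WITH (NOLOCK);

    -- Session-level equivalent, set once
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT COUNT(*) FROM dbo.Orders;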


I don't know that it's as rare to need query hints as the author points out. Perhaps Oracle's query optimizer is bad in some ways, but I have seen plenty of queries that greatly benefitted from query hints. Sure, you might not need them for most queries. But once you start dealing with more complex or bigger queries, they become more important.


Based on the excellent responses here, the conclusion is that this guy is completely wrong and that query hints are definitely useful and even required. Now I think it is interesting to figure out how it can be that he is so wrong. It seems to me the simplest explanation is that the pg devs aren't eating enough of their own dogfood and are just unaware of certain uses their product is put to. However, I would expect them to have been supplying consultancy services to exactly the users who come across these kinds of problems. In that case, the 'no query hints' stance must have become stuck in their mindset, remaining unmodified even after coming across practical use cases that actually require them.


Developers need control of their tools in the 0.1% case. When you are dealing with complex systems, you will start running into that 0.1%, whether due to a poorly designed schema, legacy cruft, or unexpected growth.

When things get hairy in other languages, sometimes it makes sense to bypass the built-in controls: go around the system libraries, implement your own, or even drop into assembly. The same should be true of a query planner. Yes, your query planner (compiler) is going to get things right 99.9% of the time, but man, when it gets it wrong, it's worth it to have tools to deal with the other 0.1%.


In my experience the best way to tackle those problems isn't to wrestle with execution plan hints; instead, break the query into two and use a temp table or table variable(s) for the in-between data. This has the additional benefit of helping you debug the cause of the slowness. And there's no way for the planner to stab you in the back.


It doesn't have hints, but you can turn off the ability of the query planner to use certain types of operations. And let me tell you, the PostgreSQL query planner does screw up. The way to fix it is to turn off its ability to use certain operations and then run your query. No, it's not a hint, but it's necessary to make PostgreSQL work.
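
What that looks like in practice - the enable_* settings can be set per session, or per transaction with SET LOCAL, so only the offending query is affected (the query itself is made up):

    BEGIN;
    SET LOCAL enable_nestloop = off;   -- or enable_seqscan, enable_hashjoin, ...
    SELECT c.name, sum(o.total)
    FROM   customers c
    JOIN   orders o ON o.cust_id = c.cust_id
    GROUP  BY c.name;
    COMMIT;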

They should change the FAQ to 'we don't have hints because we're so arrogant that we think our query planner is perfect'.


I like this approach. The attitude here is "Let's invest our time in making the optimiser good, instead of spending time implementing ways for users to walk around it."


If they trust automatic analysis so much, they should rewrite PostgreSQL in Haskell and trust GHC to generate C-like performance assembly.



