PostgreSQL query performance bottlenecks (pawelurbanek.com)
249 points by pawurb on Jan 18, 2022 | 41 comments



Another postgresql performance gotcha:

Find all the coupons that are expired (90 day expiration):

    SELECT * FROM coupon
    WHERE created_at + INTERVAL '90 DAY' < now()
This will not use the index on the "created_at" column and will be slow.

You should rewrite the inequality to:

    SELECT * FROM coupon
    WHERE created_at < now() - INTERVAL '90 DAY'
and now the query will be much faster. There are a lot of cases in postgres where simple equivalent algebraic manipulations can completely change the query plan.
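A quick way to confirm the difference is to compare plans with EXPLAIN (the index name here is just illustrative):

    EXPLAIN
    SELECT * FROM coupon
    WHERE created_at < now() - INTERVAL '90 DAY';
    -- the rewritten form should show an Index Scan (or Bitmap Index Scan) on
    -- something like coupon_created_at_idx, while the original form falls
    -- back to a Seq Scan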


They are not equivalent since `created_at + INTERVAL '90 DAY'` can overflow for every single row whereas `now() - INTERVAL '90 DAY'` is a constant for the purpose of the query execution.


Yes - this is a common restriction in any DB I've used, certainly in MS SQL Server. The idea is that your queries need to be "SARGable": https://en.wikipedia.org/wiki/Sargable


The DB we use (SQLAnywhere) doesn't consider now() a constant either, so no indexes are considered just due to that alone #thingsilearnedinproduction


Wow, that's a dealbreaker for me.


But that would never be desirable, so it's just another reason to do the other?


Yeah this seems very logical to me. I wouldn't call it a "gotcha".


What does "can overflow for every single row" mean in this context?


The maximum value in a postgres timestamp is `294276-12-31 23:59:59.999999`. Overflow means that `created_at + interval '90 days'` exceeds this value. This causes an error.
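For illustration, the overflow is easy to reproduce as a standalone expression (the exact error wording may differ between versions):

    SELECT timestamp '294276-12-31 23:59:59' + interval '90 days';
    -- ERROR:  timestamp out of range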


Ah, so a traditional overflow. It's just that it didn't come to my mind that dates this huge would actually be used, so I was wondering if it was some other thing being referred to. Thanks.


I was wondering the same thing, but after staring at it a bit I think the problem is that one of them is doing math on the values from every row and the other is doing the math once.

created_at + INTERVAL '90 DAY' < now() says: for every row, take the created_at column, add 90 days to it, and then see if it is less than now().

created_at < now() - INTERVAL '90 DAY' says: take now(), subtract 90 days, and then see which rows are less than the result.

At least, that's my guess. I rarely do any db stuff directly.


Is "overflow" a term used to express "computed for every row"?

I can see where the optimization would come from, when comparing `created_at` with a fixed value `now() - INTERVAL...` (assuming PostgreSQL is smart enough to evaluate it only once and reuse it for all the index comparisons), but the word "overflow" throws me off.


This smells less like an optimization developers should make and more like a bug or low-hanging-fruit improvement to the engine.


These 2 predicates are totally different.

The predicate in the 1st statement is on the expression "created_at + INTERVAL '90 DAY'", not on the bare column "created_at".

Some databases allow users to create indexes on expressions. So if you want to write the 1st statement, you need an index on that expression, not a normal index.
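In Postgres that looks something like the following (the index name is arbitrary; note the extra parentheses required around an expression):

    CREATE INDEX coupon_expiry_idx ON coupon ((created_at + INTERVAL '90 DAY'));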


Wow, is there any public list or documentation about these common cases and how to make them faster in PG? I would expect the PG query optimizer to fix this automatically, but as it doesn't, having this documentation would be of great use for many developers. Thanks for sharing!


Sites like https://use-the-index-luke.com/ capture a lot of wisdom around tuning. But IMO, it's easier to learn from doing.

So write your product, then start monitoring it as you release it to production.

Postgres can track aggregate metrics for queries using the pg_stat_statements extension [1]. You then monitor this periodically to find queries that are slow, then use EXPLAIN ANALYZE [2] to dig in. Make improvements, then reset the statistics for the pg_stat_statements view and wait for a new crop of slow queries to arise.

[1]: https://www.postgresql.org/docs/current/pgstatstatements.htm...

[2]: https://www.postgresql.org/docs/current/using-explain.html
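To make that loop concrete, a minimal sketch, assuming pg_stat_statements is already listed in shared_preload_libraries (the *_exec_time column names are the Postgres 13+ spelling):

    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

    -- find the statements consuming the most total time
    SELECT query, calls, mean_exec_time, total_exec_time
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 10;

    -- after tuning, start a fresh measurement window
    SELECT pg_stat_statements_reset();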


When releasing a new application (or feature) I've always loaded each table in my development environment's database with a few million rows. Tools like Python's Factory Boy [1] or Ruby's Factory Bot [2] help get the data loaded.

After the data is loaded up, start navigating through the application and it will become evident where improvements need to be made. Tools like Django Debug Toolbar [3] help expose where the bad ORM calls are, as does tailing the Postgres log files.

[1] https://github.com/FactoryBoy/factory_boy

[2] https://github.com/thoughtbot/factory_bot

[3] https://github.com/jazzband/django-debug-toolbar


It can't "fix" it because it isn't broken; they are not the same predicate.


I'm not sure if it's as much a bottleneck in Postgres as it is in MySQL, but I've just written a tome[1] on more effective offset/limit pagination by using something called a deferred join.

The main problem with offset / limit pagination is that the database is forced to fetch and discard more and more rows as you get deeper into the pages.

The trick to getting offset / limit to be faster is to reduce the amount of data that the database has to inspect.

Using a "deferred join" defers the expensive data access until after the pagination has been calculated.

So instead of

  select * from contacts order by updated_at desc limit 15 offset 10000
you'd do

  select * from contacts inner join (
    select id from contacts order by updated_at desc limit 15 offset 10000
  ) as tmp using(id)
  order by updated_at desc
That way the inner query can use a covering index to pull everything it needs (in some cases) and the expensive selecting is done on fewer rows.
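For example, an index along these lines (names are hypothetical; INCLUDE is Postgres 11+ syntax, and in MySQL a secondary index implicitly carries the primary key) lets the inner query be answered from the index alone:

    CREATE INDEX contacts_updated_at_covering
        ON contacts (updated_at DESC) INCLUDE (id);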

This works especially well when you intend to select a lot of data and when you're filtering / sorting on more than just the primary key.

[1] https://aaronfrancis.com/2022/efficient-pagination-using-def...


I'm guessing this is due to MySQL using clustered indexes, which require a separate b-tree lookup for each matching row in the secondary index since all you have is the primary key. A Postgres secondary index entry instead points directly at the row's physical location (the heap block and item offset in the slotted page).


Considering how easy the workaround for Ordering by NULLS LAST is (asking for ordered non-null values, then for null values) I wonder why postgres doesn't do that optimization internally. At first glance it seems like you could trivially rewrite any ORDER BY x NULLS LAST query that can't use an index into two queries that can use the index, and then UNION them together.

Maybe it's one of those things that are so easy to work around that nobody has prioritized fixing them.
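Roughly this, assuming an index on items (price). Postgres accepts ORDER BY inside parenthesized UNION branches, though strictly speaking only an outer ORDER BY guarantees the final order:

    (SELECT * FROM items WHERE price IS NOT NULL ORDER BY price)
    UNION ALL
    (SELECT * FROM items WHERE price IS NULL);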


I suspect the latter, but also that the win is limited.

In practice indexes are rarely used for sorting. It's almost always more profitable to use an index for filtering on the first table, and some mix of predicate pushdown (for hash and nested loop joins) and key lookups (for nested loop joins) for joined tables. And if a merge join is applicable, it's probably on PK/FKs and not on your presumably semantic sort order. And most of the time, the set of rows responsive to a query is small enough that not using an index for a final sort isn't a big deal.

Where an index is profitable for sorting is when you're doing pagination over a mostly unbounded set of rows, potentially with a predicate that matches on a prefix of the index you're planning to use for the sort. The plans for these queries tend to be fragile; you need to take care when adding extra joins and predicates that they don't cause the optimizer to abandon the index for the sort.


I had the same thought!

Is this almost-always faster?

(If not, would there be a workaround if a user didn't want this new behavior? Postgres core team hates new settings...)

Would it be hard to add to the optimizer as a query rewrite?


Isn't a more common and general solution to the first problem to index the expression lower(email) (1)?

I'll add some to the list:

1.

    select count(*) from x where not exists (select 1 from y where id = x.id)
can be thousands of times faster than

    select count(*) from x where id not in (select id from y)

2.

This one is just weird, but I've seen (and was never able to figure out why):

    select x from table where id in (select id from cte) and date > $1
be a lot slower than

    select x from table where id in (select id from cte limit (select count(*) from cte)) and date > $1

3.

RDS is slow. I've seen select statements take 20 minutes on RDS which take a few seconds on _much_ cheaper baremetal.

4.

pg_stat_statements (2) is probably the single most useful thing you can enable/use

5.

If you're ok with potentially losing data on failure, consider setting synchronous_commit = off (3). You'll still be protected from data corruption (4). A minimal example follows the footnotes.

(1) - https://www.postgresql.org/docs/14/indexes-expressional.html

(2) - https://www.postgresql.org/docs/14/pgstatstatements.html

(3) - https://www.postgresql.org/docs/14/runtime-config-wal.html#G...

(4) - https://www.postgresql.org/docs/14/wal-async-commit.html
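A minimal sketch of (5), per session or cluster-wide:

    SET synchronous_commit = off;               -- current session only
    ALTER SYSTEM SET synchronous_commit = off;  -- persisted in postgresql.auto.conf
    SELECT pg_reload_conf();                    -- apply without a restart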


> Isn't a more common and general solution to the first problem to index the expression lower(email) (1).

OP mentions & dismisses it in passing before the proposed solutions:

> A query searching by a function cannot use a standard index. So you’d need to add a custom index for it to be efficient. But, adding custom indexes on a per-query basis is not a very scalable approach. You might find yourself with multiple redundant indexes that significantly slow down the write operations.


> RDS is slow. I've seen select statements take 20 minutes on RDS which take a few seconds on _much_ cheaper baremetal.

I'm sure you observed this, but concluding that RDS is slow as a blanket statement is totally wrong. You had to have had different database settings between the two postgres instances to see a difference like that. 3 orders of magnitude performance difference indicates something wrong with the comparison.


You could easily observe this with a cache-cold query performing lots of random IO. EBS latency is on the order of milliseconds; even cheap bare metal nowadays is microseconds.


Also RDS caps out around 20k IOPS. You can hit 1 million IOPS on a large machine with a bunch of SSDs. Imagine running 50 RDS databases instead of 1.

It's a huge bummer that EBS is the only durable block storage in AWS, since the performance is so bad. Has anyone had luck using instance storage? The AWS white papers make it seem like you could lose data there for any number of reasons, but the performance is so much better. Maybe a synchronous replica in a different AZ?


I've used Aurora and the IO is much better there than on vanilla RDS. Postgres Aurora is basically a fork of postgres with a totally different storage system. There are some neat re:Invent talks on it if you are interested.


We use Aurora, actually. It's a lot more scalable, but also pretty expensive. The IO layer is multi-tenant, and unfortunately when it goes wrong, you have no idea why and no recourse. I've never had a positive experience with AWS support about it either. We've had IO latency go from <2ms to >10ms and completely destroy throughput. Support tells us to try optimizing our queries like we are idiots.


> This one is just weird, but I've seen (and was never able to figure out why):

The LIMIT has me suspicious it has to do with the "correlation" statistic - I know it applies when ORDER BY is involved, but dunno about the IN.

This statistic exists for every column in a table, and measures the correlation between the order of the column's data and the table's order on disk. If the correlation is "bad" and you're getting most/all of the table's data, then the query planner will do a full table scan and sort, to avoid lots of random access on disk. If instead the correlation is "good", it'll do an index scan because it won't have to do much jumping around to different parts of the disk.

CLUSTER can change the table data order on disk to match one of the indexes on the table. It would have to be run regularly though, since there's no way to insert new rows in the middle, and it locks the table for its whole runtime.
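Both are easy to inspect and try; the table and index names below are placeholders:

    -- how closely the column's order matches the on-disk order (-1 .. 1)
    SELECT attname, correlation
    FROM pg_stats
    WHERE tablename = 'orders' AND attname = 'created_at';

    -- rewrite the table in index order; holds an ACCESS EXCLUSIVE lock while it runs
    CLUSTER orders USING orders_created_at_idx;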


If your data allows it, use BRIN indexes. I have one case with append-only data, around 1TB, where a 250GB btree on a date field was converted to a BRIN index of around 25MB.
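For the append-only case that's a one-liner (names hypothetical):

    CREATE INDEX events_created_at_brin ON events USING brin (created_at);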


Why was the title changed? It now seems like a general article about query performance, while it's really just about 5 specific cases.


It's BuzzFeedy; who cares how many are discussed?

From the guidelines [0]:

> If the title contains a gratuitous number or number + adjective, we'd appreciate it if you'd crop it. E.g. translate "10 Ways To Do X" to "How To Do X," and "14 Amazing Ys" to "Ys." Exception: when the number is meaningful, e.g. "The 5 Platonic Solids."

(and I think it makes some attempt to do this automatically)

[0] - https://news.ycombinator.com/newsguidelines.html


> Execution Time: 0.128 ms

This is fantastic. Wouldn't it be nice if that was the end of it and your business logic was running with the data after that exact delay?

It doesn't really detract from the value of query optimization (i.e. system load reduction), but I do find it a little contradictory in principle to push for sub-millisecond execution and then decide to send those results to a completely different computer across a network.

The biggest bottleneck we ever identified with any database vendor was the network stack sitting between database server and application server. I've got some SQLite-backed business services that can transact an entire user request (which involves many database interactions) and then already be returning a final HTTP response in the amount of time it takes PostgreSQL to complete this one example query.


For OLAP-style use-cases, I wouldn't recommend SQLite, but rather something equally (apparently) unfashionable: stored procedures.

In both cases, the goal is to save the time spent round-tripping queries and responses to the DB. But with OLAP, you more-than-likely have a large dataset that needs its own dedicated beefy DB to process. So rather than moving the data to the code (with e.g. SQLite), move the code to the data, by moving as much of your business layer as possible into the DB. (Or possibly all of it, ala https://postgrest.org/.)
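As a rough sketch of what moving code to the data can look like (the table and function names here are made up): a single function call replaces several round trips, and PostgREST would expose it as POST /rpc/expire_coupons.

    CREATE FUNCTION expire_coupons() RETURNS bigint
    LANGUAGE sql AS $$
      WITH expired AS (
        DELETE FROM coupon
        WHERE created_at < now() - INTERVAL '90 DAY'
        RETURNING 1
      )
      SELECT count(*) FROM expired;
    $$;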


And I've got queries that dig through 2TB of data and return aggregations in 30 secs. Good luck doing that with SQLite.


Off topic, but never underestimate the read performance of an SQLite database with all of its indexes warm in the page cache. :)


> I've got some SQLite-backed business services that can transact an entire user request (which involves many database interactions) and then already be returning a final HTTP response in the amount of time it takes PostgreSQL to complete this one example query.

So? My in-memory hashmap can do all that in a fraction of the time it takes SQLite to do its thing. Does that mean that hashmaps are superior to SQLite?

SQLite is a different database with a different concept, different goals, different guarantees, different capabilities, and different tradeoffs than PostgreSQL. Choose the best tool for the job. Sometimes it's a hashmap, sometimes it's SQLite, and sometimes it's PostgreSQL.


> Choose the best tool for the job.

Totally agree. Just trying to get the community thinking about this stuff. Most business apps I have seen over the years do not have any actual value-add proposition for existing as 2+ separate computer systems.


Is anyone here an expert with MySQL, optimization and replication? Please contact me at greg at qbix,com and we’d engage you to help solve a couple issues.

Our site is experiencing slowdowns and SHOW PROCESSLIST may hold some clues. We have had 10,000 downloads of our apps worldwide and collect ping statistics in MySQL, so that may have something to do with it.



