> Heck, it was possible to completely rewrite a bad query on the fly and execute...

magicalhippo · 2024-04-15T04:25:36 1713155136

Highly database dependent, as the query optimizer can only perform the tricks programmed into it.

For instance, we use SQLAnywhere at work (migrating to MSSQL), and it wasn't smart about IN (sub-query) so EXISTS was much faster.

Or, as I mentioned in another comment here, MSSQL performs much worse with a single OR in the WHERE clause than with the query split in two and combined using UNION ALL, a rewrite that makes no significant difference in SQLAnywhere.
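To make the OR vs UNION ALL rewrite concrete, here is a minimal sketch. It runs on SQLite through Python's `sqlite3` module purely so it is self-contained; it says nothing about how MSSQL or SQLAnywhere plan either form. The `orders` table and its columns are made up for illustration. The point is only that the two forms return the same rows, provided the second branch excludes rows the first branch already matched.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        sales_rep_id INTEGER
    );
    INSERT INTO orders VALUES
        (1, 10, 7), (2, 11, 10), (3, 10, 10), (4, 12, 8), (5, NULL, 10);
""")

# Form 1: a single query with OR in the WHERE clause.
with_or = conn.execute(
    """
    SELECT id FROM orders
    WHERE customer_id = ? OR sales_rep_id = ?
    ORDER BY id
    """,
    (10, 10),
).fetchall()

# Form 2: two queries glued together with UNION ALL.
# The second branch skips rows the first branch already returned,
# otherwise row 3 (which matches both conditions) would appear twice.
# The IS NULL check keeps rows where customer_id is NULL.
with_union_all = conn.execute(
    """
    SELECT id FROM (
        SELECT id FROM orders WHERE customer_id = ?
        UNION ALL
        SELECT id FROM orders
        WHERE sales_rep_id = ?
          AND (customer_id <> ? OR customer_id IS NULL)
    )
    ORDER BY id
    """,
    (10, 10, 10),
).fetchall()

print(with_or)
print(with_union_all)
```

Whether the second form is actually faster depends entirely on the engine and the available indexes, so it is worth comparing the plans on the real database before committing to the rewrite.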

For MSSQL I've found that even a dozen sub-queries in the SELECT part can be much faster than a single CROSS APPLY for fetching per-row data from another table.

Also, the query might rely on certain assumptions that in practice always hold in that application but not in general, especially around NULL; for example, NOT IN vs NOT EXISTS[1].

[1]: https://www.mssqltips.com/sqlservertip/6013/sql-server-in-vs...
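The NULL pitfall with NOT IN vs NOT EXISTS is easy to reproduce. The sketch below uses SQLite via Python's `sqlite3` module just to stay self-contained; the `customers` and `orders` tables are hypothetical. A single NULL in the sub-query result makes NOT IN return no rows at all, while NOT EXISTS returns the customers without orders.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2), (3);
    -- Note the order with a NULL customer_id.
    INSERT INTO orders VALUES (100, 1), (101, NULL);
""")

# NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL rather than TRUE,
# so the row is filtered out. The same happens for every customer.
not_in = conn.execute(
    """
    SELECT id FROM customers
    WHERE id NOT IN (SELECT customer_id FROM orders)
    ORDER BY id
    """
).fetchall()

# NOT EXISTS: the correlated comparison simply never matches the NULL row,
# so customers 2 and 3 are returned as expected.
not_exists = conn.execute(
    """
    SELECT c.id FROM customers c
    WHERE NOT EXISTS (
        SELECT 1 FROM orders o WHERE o.customer_id = c.id
    )
    ORDER BY c.id
    """
).fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
```

If `customer_id` is declared NOT NULL, or the application never stores NULL there, the two queries agree, which is exactly the kind of application-specific assumption an optimizer cannot safely make on its own.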

Scene_Cast2 · 2024-04-14T18:10:20 1713118220

One example is that in Presto, joins assume that the left (IIRC) table is the big one you stream and the right one is the small one you hash. One of the newer features is that, in some cases, the optimizer can ignore the order written in the SQL and correctly pick for itself which table to hash and which one to stream.
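For readers unfamiliar with the hash/stream distinction, here is a rough Python sketch of a generic in-memory hash join. It is not Presto's implementation; the function name `hash_join` and the sample rows are made up for illustration. It shows why the choice matters: the build side has to fit in memory as a hash table, while the probe side is only iterated once.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Iterator, Tuple


def hash_join(
    build_rows: Iterable[tuple],
    probe_rows: Iterable[tuple],
    build_key: Callable[[tuple], Hashable],
    probe_key: Callable[[tuple], Hashable],
) -> Iterator[Tuple[tuple, tuple]]:
    """Inner join: hash the (ideally small) build side, stream the probe side."""
    # Build phase: the whole build side is held in memory.
    table = defaultdict(list)
    for row in build_rows:
        table[build_key(row)].append(row)

    # Probe phase: the probe side is streamed row by row and never stored.
    for row in probe_rows:
        for match in table.get(probe_key(row), ()):
            yield (row, match)


# Hypothetical data: a small lookup table and a larger fact table.
countries = [(1, "NO"), (2, "SE")]
orders = [(100, 1), (101, 2), (102, 1), (103, 3)]

joined = list(
    hash_join(
        build_rows=countries,          # small side: hashed
        probe_rows=orders,             # big side: streamed
        build_key=lambda c: c[0],
        probe_key=lambda o: o[1],
    )
)
print(joined)
```

Swapping the two inputs gives the same matches but hashes the large table instead, which is the situation an optimizer with good size estimates tries to avoid.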

legulere · 2024-04-14T19:51:49 1713124309

Isn't that just basic table/index statistics to know which table is the smallest?

Scene_Cast2 · 2024-04-15T00:49:10 1713142150

Not if you're joining on a filtered table or a result of another join.

arronax · 2024-04-15T15:14:38 1713194078

> Of course there are "bad queries" where you query for things you don't need, join on the wrong columns, etc.

You can inject a hint into the query, forcing it to use a plan that would not otherwise be used, for example. Although, fixing a plan through a baseline is way cleaner. Mostly, I just meant that as an extreme example of something you can do, not something you should do. And yes, the only reason to rewrite the query is when the query itself is bad, in that it asks for unnecessary data or misses a join column. Admittedly, rewriting it on the fly is an extremely dirty and dangerous thing to do, as it uncouples the app from the db, but it is possible.

atwebb · 2024-04-15T02:39:30 1713148770

>Is there really such a thing as a bad query that can be rewritten to give the same results but faster? For me, that's already the query optimizer's job.

I can't tell if your disclaimer covers it, but yes, there are lots of bad queries that take a small rewrite and then run significantly faster. Generally it is someone taking a procedural rather than set-based approach, or including things they don't need in an attempt to help (adding an index to a temp table that is only used once and is going to be fully scanned anyway). That's aside from the usual data typing and missing index problems.
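As a small illustration of the procedural vs set-based point, the sketch below does the same update both ways. It uses SQLite through Python's `sqlite3` module, and the `items` table is hypothetical. Both versions end with identical data, but the procedural one issues one statement per row while the set-based one issues a single statement.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema and data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL, category TEXT);
        INSERT INTO items VALUES
            (1, 10.0, 'book'), (2, 20.0, 'toy'), (3, 30.0, 'book');
    """)
    return conn


# Procedural approach: fetch the rows, then update them one at a time.
procedural = make_db()
rows = procedural.execute(
    "SELECT id, price FROM items WHERE category = 'book'"
).fetchall()
for item_id, price in rows:
    procedural.execute(
        "UPDATE items SET price = ? WHERE id = ?", (price * 2, item_id)
    )

# Set-based approach: describe the whole change in one statement.
set_based = make_db()
set_based.execute("UPDATE items SET price = price * 2 WHERE category = 'book'")

query = "SELECT id, price FROM items ORDER BY id"
procedural_result = procedural.execute(query).fetchall()
set_based_result = set_based.execute(query).fetchall()
print(procedural_result == set_based_result)  # True
```

On three rows the difference is invisible; the gap only shows up at scale, where the per-row version pays for a round trip and a statement execution for every row.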