Hacker News new | past | comments | ask | show | jobs | submit login

> Heck, it was possible to completely rewrite a bad query on the fly and execute a re-written variant.

Is there really such a thing as a bad query that can be rewritten to give the same results but faster? For me, that's already the query optimizer's job.

Of course there are "bad queries" where you query for things you don't need, join on the wrong columns, etc. And yeah the optimizer isn't perfect. But a query that you expect the query optimizer to "rewrite" and execute in an optimal way is a good query.




Highly database dependent, as the query optimizer can only perform the tricks programmed into it.

For instance, we use SQLAnywhere at work (migrating to MSSQL), and it wasn't smart about IN (sub-query) so EXISTS was much faster.

Or, as I mentioned in another comment here, MSSQL performs much worse using a single OR in WHERE clause vs splitting into two queries and using UNION ALL, something which has no significant difference in SQLAnywhere.

For MSSQL I've found that even a dozen sub-queries in the SELECT part can be much faster than a single CROSS APPLY for fetching per-row data from another table.

Also the query might rely on certain assumptions that will in practice always hold in that application, but not in general. Especially around NULL, for example NOT IN vs NOT EXISTS[1].

[1]: https://www.mssqltips.com/sqlservertip/6013/sql-server-in-vs...


One example is that in Presto, joins assume that left (IIRC) table is the big one you stream, and the right one is the small one you hash. One of the newer features was that in some cases, the optimizer is able to ignore the SQL order and correctly pick which table to hash and which one to stream.


Isn't that just basic table/index statistics to know which table is the smallest?


Not if you're joining on a filtered table or a result of another join.


> Of course there are "bad queries" where you query for things you don't need, join on the wrong columns, etc.

You can inject a hint into the query, forcing it to use a plan that would not otherwise be used, for example. Although, fixing a plan through a baseline is way cleaner. Mostly, I just meant that as an extreme example of something you can do, not something you should do. And yes, the only reason to re-write the query is when the query itself is bad in that it asks for unnecessary data or misses a join column. Admittedly, that's an extremely dirty and dangerous thing to do, as it uncouples app from db, but it is possible.


>Is there really such a thing as a bad query that can be rewritten to give the same results but faster? For me, that's already the query optimizer's job.

I can't tell if your disclaimer covers it but, yes, there are lots of bad queries that take a little bit of a re-write and run significantly faster. Generally it is someone taking a procedural vs set based approach or including things they don't need to try and help (adding an index to a temp table when it is only used once and going to be full scanned anyways). That's outside the general data typing/generally missing indexes.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: