Hacker News

I've often thought that a database that could automatically detect slow queries and create the necessary indexes would be neat. You run a load test on your application, which in turn calls the database, and you collect all the queries it makes. Then the database automatically adjusts itself.



Oracle DB is, or was, very close to that with its query profiles, baselines, and query patches. It wasn't automatic back in 2014 when I last worked on it, but all the tools were there. Heck, it was possible to completely rewrite a bad query on the fly and execute a re-written variant. I suppose it all stems from the fact that Oracle is regularly used under massive black boxes, including the EBS.

Also, the problem with automatic indexing is that it only gets you so far, and any index can, in theory, mess up another query that is perfectly fine. Optimizers aren't omniscient. In addition, there are other knobs in the database that affect performance. I suppose a wider approach than just looking at indexes would be more successful. Like OtterTune, for example.


> Heck, it was possible to completely rewrite a bad query on the fly and execute a re-written variant.

Is there really such a thing as a bad query that can be rewritten to give the same results but faster? For me, that's already the query optimizer's job.

Of course there are "bad queries" where you query for things you don't need, join on the wrong columns, etc. And yeah the optimizer isn't perfect. But a query that you expect the query optimizer to "rewrite" and execute in an optimal way is a good query.


Highly database dependent, as the query optimizer can only perform the tricks programmed into it.

For instance, we use SQLAnywhere at work (migrating to MSSQL), and it wasn't smart about IN (sub-query) so EXISTS was much faster.
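As a sketch of that rewrite (table and column names invented for illustration):

```sql
-- Original: IN with a sub-query, which SQLAnywhere handled poorly
SELECT * FROM orders o
WHERE o.customer_id IN (SELECT c.id FROM customers c WHERE c.region = 'EU');

-- Rewrite: correlated EXISTS, same results, much faster in that engine
SELECT * FROM orders o
WHERE EXISTS (SELECT 1 FROM customers c
              WHERE c.id = o.customer_id AND c.region = 'EU');
```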

Or, as I mentioned in another comment here, MSSQL performs much worse using a single OR in the WHERE clause vs splitting into two queries and using UNION ALL, something which makes no significant difference in SQLAnywhere.
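Roughly this shape (names invented; note the guard on the second branch so the split stays equivalent when both conditions can match the same row):

```sql
-- Slow on MSSQL in our case: a single OR
SELECT * FROM orders
WHERE status = 'open' OR assignee_id = 42;

-- Faster: split into two index-friendly queries and UNION ALL them;
-- the extra predicate keeps rows matching both branches from duplicating
SELECT * FROM orders WHERE status = 'open'
UNION ALL
SELECT * FROM orders WHERE assignee_id = 42 AND status <> 'open';
```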

For MSSQL I've found that even a dozen sub-queries in the SELECT part can be much faster than a single CROSS APPLY for fetching per-row data from another table.

Also the query might rely on certain assumptions that will in practice always hold in that application, but not in general. Especially around NULL, for example NOT IN vs NOT EXISTS[1].

[1]: https://www.mssqltips.com/sqlservertip/6013/sql-server-in-vs...
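The NULL trap in a nutshell (invented schema):

```sql
-- If any sub-query row has a NULL id, this returns no rows at all,
-- because "x NOT IN (1, NULL)" evaluates to UNKNOWN, never TRUE
SELECT * FROM orders o
WHERE o.customer_id NOT IN (SELECT c.id FROM archived_customers c);

-- NOT EXISTS sidesteps the NULL problem and is usually what you meant
SELECT * FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM archived_customers c
                  WHERE c.id = o.customer_id);
```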


One example is that in Presto, joins assume that left (IIRC) table is the big one you stream, and the right one is the small one you hash. One of the newer features was that in some cases, the optimizer is able to ignore the SQL order and correctly pick which table to hash and which one to stream.


Isn't that just basic table/index statistics to know which table is the smallest?


Not if you're joining on a filtered table or a result of another join.


> Of course there are "bad queries" where you query for things you don't need, join on the wrong columns, etc.

You can inject a hint into the query, forcing it to use a plan that would not otherwise be used, for example. Although, fixing a plan through a baseline is way cleaner. Mostly, I just meant that as an extreme example of something you can do, not something you should do. And yes, the only reason to re-write the query is when the query itself is bad in that it asks for unnecessary data or misses a join column. Admittedly, that's an extremely dirty and dangerous thing to do, as it uncouples app from db, but it is possible.
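For instance, something like this (table and index names invented) forces a plan the optimizer wouldn't pick on its own:

```sql
-- Hand-injected Oracle hint pinning a specific index
SELECT /*+ INDEX(o orders_cust_ix) */ *
FROM orders o
WHERE o.customer_id = :cust_id;
```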


>Is there really such a thing as a bad query that can be rewritten to give the same results but faster? For me, that's already the query optimizer's job.

I can't tell if your disclaimer covers it, but yes, there are lots of bad queries that take a little bit of a rewrite and run significantly faster. Generally it is someone taking a procedural vs set-based approach, or including things they don't need in an attempt to help (adding an index to a temp table when it is only used once and going to be full-scanned anyway). That's aside from the usual data-typing issues and plain missing indexes.


The problem of new indexes messing up otherwise good queries is something I've battled on and off for the past decade with Postgres. Definitely annoying.


How would an index mess up another query? AFAIK indexes would only hurt write performance marginally per index, but most slow queries are read-only. I’ve tended to just add indexes as I go without thinking about it and haven’t run into issues, so genuinely curious.


While I don’t recall running into issues either, I can certainly see that a new index could cause the query planner to make a different decision. And that decision could - in some cases - end up being worse than the previous behaviour.

I definitely have seen the query planner make some peculiar choices in the past.


This is what I ran into. Often they were indexes with a cost similar to another's, and that caused issues.

I think the main index type that bit me is the kind created by exclusion constraints. It often looks to the planner like "the right" index to use, but there is another (btree) that is way cheaper... the exclusion constraint is just there to ensure consistency.

In those cases, to fix things, I added a WHERE clause to the index (e.g. WHERE 1=1), and the planner wouldn't consider that index unless it saw that same 1=1 condition in the query's WHERE clause.
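A Postgres sketch of the trick, with an invented schema. The planner only considers a partial index when it can prove the query's WHERE clause implies the index predicate, and it won't prove a literal 1=1 unless the query repeats it verbatim:

```sql
-- The exclusion constraint's index is only there for consistency;
-- the WHERE (1=1) effectively hides it from the planner
ALTER TABLE bookings
    ADD CONSTRAINT bookings_no_overlap
    EXCLUDE USING gist (room_id WITH =, during WITH &&)
    WHERE (1=1);

-- Ordinary lookups then fall through to a cheap btree
CREATE INDEX bookings_room_ix ON bookings (room_id);
```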


Indexes cost memory. It could push other indexes out of RAM. Or updates could increase IOPS, pushing them over an expensive threshold.


I'm sure the database could, but it doesn't mean the database should. Indexes come at the cost of extra disk space, slower inserts, and slower updates. In some cases, some slower queries might be an acceptable tradeoff. In other cases, maybe not. It depends.


This is our posture for this extension on the Supabase platform. We could automate the creation of the indexes using the Index Advisor, but we feel it would be better to expose the possible indexes to the user and let them choose.


You could tell it "you have a budget of X GB for disk space, choose the indexes that best optimize the queries given the budget cap."

Not perfect, because some queries may be more time-critical than others.

You could even annotate every query (INSERT and UPDATE as well as SELECT) with the dollar amount per execution you're willing to pay to make it 100ms faster, or accept to make it 100ms slower. Then let it know the marginal dollar cost of adding index storage, throw this all into a constraint solver and add the indexes which are compatible with your pricing.
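As a toy sketch of that idea (all index names, sizes, and dollar values invented), index selection under a disk budget reduces to a 0/1 knapsack; brute force is fine at this scale:

```python
from itertools import combinations

# Hypothetical candidates: (name, size in GB, dollar value per day of the
# query speedups it buys, net of the write slowdowns it causes)
candidates = [
    ("idx_orders_customer", 2.0, 40.0),
    ("idx_orders_created",  1.5, 25.0),
    ("idx_items_sku",       3.0, 30.0),
    ("idx_users_email",     0.5, 10.0),
]

def best_indexes(candidates, budget_gb):
    """Pick the subset of indexes with the highest total value whose
    combined size fits within the disk budget (brute-force knapsack)."""
    best_value, best_set = 0.0, ()
    for r in range(len(candidates) + 1):
        for subset in combinations(candidates, r):
            size = sum(s for _, s, _ in subset)
            value = sum(v for _, _, v in subset)
            if size <= budget_gb and value > best_value:
                best_value, best_set = value, subset
    return [name for name, _, _ in best_set], best_value

names, value = best_indexes(candidates, budget_gb=4.0)
print(names, value)  # → picks customer + created + email, value 75.0
```

A real system would also weight per-query time-criticality, as noted above, and would need honest estimates for the value numbers, which is the hard part.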


this is the way ^^


Are the trade-offs measurable? If they are the database could just undo the index...

Not just indexing, but table partitions, materialized views, keeping things in-memory...


> Are the trade-offs measurable?

Yes, but you need the context about what is the correct tradeoff for your use case. If you've got a service that depends on fast writes then adding latency via extra indices for improved read speed may not be an acceptable trade off. It depends on your application though.


Andrew Kane built dexter, which is an automatic indexer for Postgres.

https://github.com/ankane/dexter

https://ankane.org/introducing-dexter


Because indexes have costs, you need a much more complicated system which can feed back into itself and downgrade probationary indexes back to unindexed.


Even this isn't sufficient, because some problems with over-indexing don't become apparent until the table gets much larger, which only happens a drop at a time. I suppose if indexes were always probationary and continually re-evaluated, at some point the system could recognize that, for example, INSERTs are now taking 1000x longer than they were 2 years ago. But that feels like a never-ending battle against corner cases, and any automatic actions it takes add significant complexity for whoever is debugging later.


Several databases index everything, needed or not. (And sometimes have mechanisms to force it off for some specific data)


That exists in Microsoft SQL Server. It can create new indexes, drop unused indexes, change query plans when it detects degradation, and so on.


Source? I’ve been working with SQL Server for a couple of decades and I don’t believe it will automatically create or drop indexes under any circumstances. You might be thinking of Azure SQL DB.


"Automatic tuning, introduced in SQL Server 2017 (14.x), notifies you whenever a potential performance issue is detected and lets you apply corrective actions, or lets the Database Engine automatically fix performance problems." [1]

I have used this in Azure SQL too, but according to that it should be in SQL Server.

https://learn.microsoft.com/en-us/sql/relational-databases/a...
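For reference, the piece of this that is configurable per database (automatic plan correction, available since SQL Server 2017) is switched on like so:

```sql
-- Opt the current database in to automatic plan correction
ALTER DATABASE CURRENT
    SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);
```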


Read that link carefully: only automatic plan regression is available in SQL Server, not the automatic index tuning portion. The index tuning portion only applies to Azure SQL DB.


Good link!

> Automatic index management identifies indexes that should be added in your database, and indexes that should be removed. Applies to: Azure SQL Database


What's the point of asking for a source when you would find it on Google in one minute? Odd way of learning. Not like I brought up some debated viewpoint.


Probably because most of the stuff you'd find in the top search results would include the GP's name.

Just a few sentences later "Automatic tuning in Azure SQL Database also creates necessary indexes and drops unused indexes" - that's not in on-prem SQL Server.


Google the name of the person you're replying to :)


They've had a non-automatic "query advisor" in there forever, it operated on profiling data and was highly effective.


That’s an Azure SQL thing, not MSSQL.


Big Guys do this. For big bucks, of course.


> big bucks

You get that feature in Azure SQL Database for $5/month.


I might be misremembering, but IIRC RavenDB does this (it's a commercial document DB, written in C#).


The default DB for App Engine (NDB) has this feature. The implicit indexes are a tad annoying.



