One of my side ideas is to write a postgres plugin offering an alternative syntax for queries, but one that requires you to name the index you use (or to explicitly mention a table scan).
I do understand the value of SQL + the planner for adhoc querying. But so many times I find myself reworking SQL to hint at the planner to use certain indexes, or to add "spurious" filters to make sure an index is used (spurious for application logic reasons). For applications with a relatively low cardinality of queries, some extra tedium might be worth stronger performance property guarantees.
Added advantage of the tedium is it would make a lot of "accidentally quadratic" stuff much clearer. There is no magic in the planner after all!
We can write inline assembler in C or Rust, it feels like it would make sense to offer something similar in SQL. Big problem is how to offer something that isn't too tedious.
Anyways FQL's index model seems to align with my idea, just unsure if I want the rest of it.
This is a thing that people say, but this is incorrect.
Classic example: Multitenant DB. I have a user table. I have a document table. I have a client table. documents and clients are key'd to user. In particular, there's a property of the system that document.client.user == document.user.
I do a "naive" query to count the number of documents per client, like:
- select count(*), client_id from document where document.text ilike %searchstring% and client_id in (select id from client where user_id = 1) group by client_id;
Now, it turns out that since I have this multitenant data, I have indexes alongside things like (user_id, client_id) on document. I also want to support text search so I have an full-text-search index on (user_id, document.text).
Without properly "guiding" the filtering (usually re-repeating the "where user_id = 1" fragment on every joining table) I will not be able to take advantage of my FTS index, because Postgres decides that's not slow (and _doesn't know about the relation between document.client.user_id and client.use_id). It instead tries hard to implement this either by pulling all of the client documents into memory and then doing an ad-hoc search there, or doing a FTS across all tenants (because I have an index on document.text for cross-tenant searching in an admin), and then scanning through that and removing by client_id.
This is a query plan I would not write if I wrote it myself, because it would be "obviously wrong". It would be a lot of work and clearly incorrect for what I want.
Meanwhile people writing features like this add indexes specifically to support certain workflows, and are often blissfully unaware that those indexes are not being used when it makes the most sense. Of course you gotta check experimental data and measure etc etc. But I like the idea of doing more data design before you start having performance issues, and I think that explicit index usage in particular would be a huge benefit on that front.
(there is a "fix" for the original query, involving repeating the user_id filter at other levels)
If you run a prepared query a few times, postgres will switch to a generic query plan. Sometimes the generic plan is much worse. I've never been able to fix this by running ANALYZE. I have been able to fix this by adding completely new statistics or changing the query (e.g. adding an unnecessary sort or removing a filter that we can apply in the app).
I do understand the value of SQL + the planner for adhoc querying. But so many times I find myself reworking SQL to hint at the planner to use certain indexes, or to add "spurious" filters to make sure an index is used (spurious for application logic reasons). For applications with a relatively low cardinality of queries, some extra tedium might be worth stronger performance property guarantees.
Added advantage of the tedium is it would make a lot of "accidentally quadratic" stuff much clearer. There is no magic in the planner after all!
We can write inline assembler in C or Rust, it feels like it would make sense to offer something similar in SQL. Big problem is how to offer something that isn't too tedious.
Anyways FQL's index model seems to align with my idea, just unsure if I want the rest of it.