Nice approach! The latter part of that workflow is where we are focused now, i.e. now that we can plan and recommend B-tree indexes for individual queries, how do we utilize this to recommend a set of indexes for a whole database (taking into account different predicate variations).
On open-sourcing this code: Not at this moment - we do however offer a WASM build of this logic for free use at https://pganalyze.com/index-advisor
> The latter part of that workflow is where we are focused now, i.e. now that we can plan and recommend B-tree indexes for individual queries, how do we utilize this to recommend a set of indexes for a whole database
what I've ended up doing is trying to approximate the number of rows, create dummy data that has a similar spread of data, and create indexes and query until I have a good understanding of exactly how each index behaves with the amount of data. It ends up being a lot of work, but sometimes getting a 50% performance increase is worth that amount of work overall.
> On open-sourcing this code: Not at this moment
completely understandable, the selfish side of me hopes that it will be open sourced eventually, but the practical side of me sees the benefits to your company for keeping it private for now.
those also require real or simulated data to get good recommendations, which is sometimes hard. if you're going to go that far, you might as well create real indexes.
besides, in some environments access to the real data (if there's PII) is much more difficult to get, so simulating is as far as you can go.
not sure if this is where pganalyze is going, if not - it's certainly worth a read. Whatever you guys end up building, it may be a good idea to architect it in a way that makes auto-indexing like in Azure possible.
> This is one of the feature that MS SQL Server has and it works very well. It can suggest indexes with a pretty good correctness.
Not really: it's just a comma-delimited list of columns to consider indexing, sorted by the column order in the table. It just looks scientific to folks because they don't realize that. https://dba.stackexchange.com/q/208947/426
I tried to use your index advisor but lots of functions present in my query are not supported (bool_or, numeric, etc..). Moreover it crashes with big complex query.
I have also tried to setup pg agent but it’s been a few days and still not able to setup log insights for a self-hosted PG (and our pg logs works). Basically we spent our trial just to try to setup the agent . And still not completed.
As a feedback, I think you should invest more in the onboarding phase and not counting days in trial if not logs are collected. Ironically if you record a screencast installing pganalyze on a digital ocean docker would be enough.
Thanks for trying out the pganalyze app! Sorry to hear about the challenges with log monitoring setup - happy to restart the trial and do a screenshare to debug - you can email me directly at lukas AT pganalyze.com
On the index advisor errors: Sounds like you are running into errors with cases where the planner calls functions (e.g. for type conversion, expression simplification). We've recently improved some cases there, but numeric specifically I know there is a pending bugfix. Happy to take a closer look.
The quick summary: Call libclang using the Ruby FFI bindings to parse C code into AST and locate function/symbol definitions, build a tree
of dependent code, remove all code that is not referenced + add a few stubs where there is direct disk access.
pg_query is very nice! But I'm not sure how valuable it is to have the standalone analyzer. Why not just ask to the database? You need the statistics and the catalog to do the math anyway, don't you?
* log the queries from the application
* parse the queries into an AST
* collate the ASTs to form the most common query patterns
* cross-reference existing indexes
* create index recommendations with and without predicates
that worked pretty well, up until you're trying to decide which index type is going to be best.
this is pretty darned neat in comparison. I do hope they open-source it like they did pg_query.