For something less PoC-y, https://www.sourcetrail.com/ stores its internal representation of the reference graph in a SQLite database file with what is essentially a triple-store schema.
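To make the "triple store" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, columns, indexes, helper `callers_of`, and sample rows are all hypothetical illustrations of a subject/predicate/object layout, not Sourcetrail's actual schema.

```python
import sqlite3

def build_reference_graph():
    """Hypothetical triple store: every edge is one (subject, predicate, object) row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE triples (
            subject   TEXT NOT NULL,
            predicate TEXT NOT NULL,
            object    TEXT NOT NULL
        );
        -- Two composite indexes so a lookup can start from either end of an edge.
        CREATE INDEX idx_spo ON triples(subject, predicate, object);
        CREATE INDEX idx_pos ON triples(predicate, object, subject);
    """)
    rows = [
        ("main", "calls", "parse_args"),
        ("main", "calls", "run"),
        ("run", "calls", "parse_args"),
        ("run", "defined_in", "driver.cpp"),
    ]
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)
    return conn

def callers_of(conn, symbol):
    """Hypothetical helper: every subject with a 'calls' edge pointing at symbol."""
    cur = conn.execute(
        "SELECT subject FROM triples "
        "WHERE predicate = 'calls' AND object = ? ORDER BY subject",
        (symbol,),
    )
    return [row[0] for row in cur]

conn = build_reference_graph()
print(callers_of(conn, "parse_args"))  # ['main', 'run']
```

With both composite indexes in place, "who calls X?" (via `idx_pos`) and "what does X call?" (via `idx_spo`) are both index searches rather than full scans.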
The idea for this experiment actually came from trying, time and again, to produce a Sourcetrail graph for the LLVM codebase and always failing for one reason or another. Then I discovered that they provide a gRPC interface to their clangd index, and I came up with this.
EDIT: also, doing things this way means you don't need to reimplement C++ indexing, because you can leverage the existing clang features.
Man. I love SQLite, but the current virtual table extension is a performance dumpster fire. SQLite doesn’t understand multiple-column indexes on virtual tables, and the secret sauce to make it pick the “best” index is found only in the Necronomicon. Inevitably, with even fairly trivial joins, SQLite bails out to a polynomial sequential scan.
I don't know if you're serious, but the `xBestIndex` method is supposed to give a literal description of the size and access efficiency of each table. Let's say you have four tables: A, B, and C are indexed (logarithmic access), while D needs a linear scan. Then `xBestIndex` should report an estimated cost (the `estimatedCost` field of `sqlite3_index_info`) of log(num-rows(X)) for A, B, and C, and num-rows(X) for D.
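As a toy illustration of that cost rule (plain Python, not SQLite's real planner or its C API), the function below computes the number a virtual table might put in `estimatedCost`. The table sizes are hypothetical and chosen so that D is very short, which is the case where a planner ranking whole tables in isolation would see D as the cheapest.

```python
import math

def estimated_cost(num_rows: int, indexed: bool) -> float:
    """Toy cost a virtual table might report via estimatedCost.

    Indexed access is modelled as a log2(n) lookup; anything else as an n-row scan.
    The floor of 2 rows keeps tiny indexed tables from reporting a cost of 0.
    """
    if indexed:
        return math.log2(max(num_rows, 2))
    return float(num_rows)

# Hypothetical sizes: A, B, C are large and indexed; D is tiny and scan-only.
tables = {
    "A": (1_000_000, True),
    "B": (1_000_000, True),
    "C": (1_000_000, True),
    "D": (10, False),
}

costs = {name: estimated_cost(n, idx) for name, (n, idx) in tables.items()}
for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name}: {cost:.1f}")
```

Ranked per table, D's scan cost (10) undercuts the indexed lookups (about 19.9 each), even though looping over A, B, or C inside a scan of D is far from free.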
The issue is that SQLite considers the entire table when building query plans, rather than the specific query that's about to be performed; this means that if D is especially short, it'll choose D as the "driver" table and then linearly scan A, B, and C. This is not how it behaves for its own internal tables: those are searched in log time off the "best" driving table.
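For contrast with the virtual-table behaviour described above, here is a minimal sketch of a join over ordinary SQLite tables, using Python's built-in sqlite3 module and `EXPLAIN QUERY PLAN`. The schema and query are hypothetical; with composite indexes available, each step of the join should come out as an index SEARCH rather than a full SCAN, though the exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT);
    CREATE INDEX idx_spo ON triples(subject, predicate, object);
    CREATE INDEX idx_pos ON triples(predicate, object, subject);
""")

# Hypothetical query: callers of any symbol defined in driver.cpp (a two-way self-join).
query = """
    SELECT c.subject
    FROM triples AS c
    JOIN triples AS d ON d.subject = c.object
    WHERE c.predicate = 'calls'
      AND d.predicate = 'defined_in'
      AND d.object = 'driver.cpp'
"""

# Each plan row has four columns; the fourth is the human-readable step description.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)
```

Both loops of the nested-loop join should use one of the composite indexes, which is exactly the guarantee that is hard to get out of a virtual table.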
I suppose what I'd really like is a strong guarantee that the plans SQLite compiles always use the index. I understand that there are N! possible orders for an N-way join, so the planner can't consider all of them, but whatever mechanism is exposed through `xBestIndex` is just bonkers bad.
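To put the N! figure in perspective, a short sketch that enumerates every nested-loop join order for four tables (one per permutation) and prints how the count grows. The table names are placeholders.

```python
import itertools
import math

def join_orders(tables):
    """Every nested-loop join order: one per permutation of the tables."""
    return list(itertools.permutations(tables))

orders = join_orders(["A", "B", "C", "D"])
print(len(orders))   # 24, i.e. 4!
print(orders[0])     # ('A', 'B', 'C', 'D')

# The count grows factorially with the number of joined tables.
for n in (4, 6, 8, 10):
    print(n, math.factorial(n))
```

Ten tables already give 3,628,800 orders, which is why a planner has to prune rather than enumerate.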
I tried doing this with Prolog last summer to extract some features from a codebase. I loved it. Being able to query a codebase like a database is extremely useful.
Pretty cool! One question though: if this were based on LSP in general, it could be generalised to any language, right? I wonder why they wired it to clangd specifically.