You’re totally right, but there are a few things missing: this isn’t a DBMS, really, and the files are not going to be huge.
You need fast listing/pagination, key value get/set, and transactional updates. Basically DynamoDB, but for a single file. Build a query layer on top of that, sure. Use those primitives to build persistent indexes if you want.
Or just iterate through the keys in a for loop. It fits in memory anyway.
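To make those primitives concrete, here is a rough Python sketch of what such an interface could look like. Everything in it is made up for illustration (the `DocStore` name, the `para/` key scheme, the method names), and it is an in-memory stand-in that never touches disk, so treat it as the shape of the API rather than an implementation.

```python
from contextlib import contextmanager


class DocStore:
    """Hypothetical in-memory stand-in for a single-file key-value store."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value

    def list_keys(self, prefix="", after=None, limit=10):
        """Return up to `limit` sorted keys with `prefix`, strictly after `after`."""
        keys = sorted(k for k in self._data if k.startswith(prefix))
        if after is not None:
            keys = [k for k in keys if k > after]
        return keys[:limit]

    @contextmanager
    def transaction(self):
        """All-or-nothing update: restore a snapshot if the block raises.

        The snapshot is a shallow copy, which is enough here because the
        example only stores immutable strings.
        """
        snapshot = dict(self._data)
        try:
            yield self
        except Exception:
            self._data = snapshot
            raise


store = DocStore()
for i in range(5):
    store.set(f"para/{i:03d}", f"text {i}")

# Cursor-style pagination: pass the last key of one page as `after`.
page1 = store.list_keys(prefix="para/", limit=2)
page2 = store.list_keys(prefix="para/", after=page1[-1], limit=2)

# A failed transaction leaves the store unchanged.
try:
    with store.transaction():
        store.set("para/000", "edited")
        raise RuntimeError("simulated failure mid-update")
except RuntimeError:
    pass
```

The "just iterate through the keys" option is the `sorted(...)` scan inside `list_keys`: fine while everything fits in memory, and the place a persistent index would go if it stopped fitting.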
You don’t need a fully fledged DBMS for a word document. And if you’re shuffling around lots of data in a structured format with no updates needed, you probably want Arrow/Parquet rather than SQLite because the read performance is going to crush SQLite.
Ok cool: so adding SQL to that is going to magically speed it up?
No. It’s the on-disk format that matters. It would be just as slow and scary if it used a SQLite file that was embedded in a zip file or something equally mad.
It’s not the SQL, it’s the file format.
If you decouple the file format from the SQL engine, it becomes simpler to reimplement, more agnostic and less vendor locked.
SQLite (not just the language SQL) would make it much easier to reimplement in a way that's fast and safe, yes.
> If you decouple the file format from the SQL engine
That alone would be a difficult project. If you really want to break things down, easier to say your document standard relies on SQLite's rather simple query language (https://www.sqlite.org/lang.html), and there, it's independent of SQLite's query planner and file format. Wouldn't be hard to make it work with Postgres or MySQL, for instance.
I’d love to understand your thinking behind the idea that a document standard should rely on a query language and not a file format…
Document standards are file formats…
Or are you saying a document format should just be some DDL statements? What? How is that interoperable? It’s coupled to the database that is storing the data as an implementation detail, which is exactly the problem with using SQLite.
> That alone would be a difficult project
I’m not suggesting using the SQLite file format. I’m suggesting the pretty basic idea that the storage for a general-purpose, widely used, interoperable document format should be logically decoupled from anything else, and definitely not be tied to the implementation details of a single library or even a single version of that library.
The file format is the most important part. It’s the only part. Nothing else matters because there is nothing else.
> Or are you saying a document format should just be some DDL statements? What? How is that interoperable?
Yes. As for how it's interoperable: it's quite easy to write DDL for SQLite that also works on many other DBMSes, given that SQLite is kinda the lowest common denominator of those.
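As a rough illustration of what "lowest common denominator" DDL might look like, here is a small Python sketch using the standard `sqlite3` module. The `paragraphs` table and its columns are hypothetical, not taken from any real document standard. The schema sticks to plain `INTEGER`/`TEXT` columns, `NOT NULL`, a primary key and an ordinary index, which I'd expect other engines to accept, but it has only been run against SQLite here.

```python
import sqlite3

# Hypothetical schema for illustration only. It deliberately avoids
# SQLite-specific features (AUTOINCREMENT, WITHOUT ROWID, etc.).
SCHEMA = """
CREATE TABLE paragraphs (
    id INTEGER PRIMARY KEY,
    sort_key INTEGER NOT NULL,
    body TEXT NOT NULL
);
CREATE INDEX paragraphs_sort_key ON paragraphs (sort_key);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# The `?` placeholder style belongs to the sqlite3 driver; other
# drivers use different parameter markers.
conn.executemany(
    "INSERT INTO paragraphs (id, sort_key, body) VALUES (?, ?, ?)",
    [(2, 20, "Second"), (1, 10, "First")],
)
conn.commit()

rows = conn.execute(
    "SELECT body FROM paragraphs ORDER BY sort_key"
).fetchall()
bodies = [r[0] for r in rows]
```

Note that this only shows the schema text being portable. It says nothing about the bytes on disk, which is the part the other side of this argument cares about.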
Maybe not as interoperable as ODF since it's easier to implement an ODF parser/writer than a SQLite clone, but probably more interoperable than some kind of advanced ODF designed for efficient updates. Just because you define a standard doesn't mean there are good portable implementations out there.