I've worked a few jobs now where application data was stored in text files of va...

simonw · 2024-03-11T19:01:18 1710183678

I'm usually the first person to suggest SQLite for just about anything, but in this particular case I do feel like JSON is a better default format.

Interacting with SQLite from different programming languages is easier than most other formats, but you still need a SQLite binding. They're available for every language but that's still a not-completely-trivial dependency.

I expect most tools that people build against JSON Canvas will run in a web browser. Adding SQLite as a dependency means you need SQLite running in WebAssembly - totally possible, and even officially supported these days (the SQLite team run their own WASM builds now) but still a sizable piece of extra complexity over just using JSON.parse(...)

Also: SQLite files aren't very easy to diff, so they're not great for collaboration in version control. JSON is better for that.

I'm 100% with you on the schema changes and versioning challenge. The best way to address this IMO would be for the spec to include a top-level "version": key which indicates the version of the spec that a file was created against.

Handled carefully and introduced right at the start of the project this could ensure an ecosystem grows up around the standard such that older spec versions can always be opened by newer implementations, and any implementation can fail-fast if it is given a file that it doesn't yet know how to handle.

asa400 · 2024-03-11T19:42:40 1710186160

Valid points and I generally agree! I think you're probably right that the SQLite dependency is possibly too much for some applications to pull in, though I will ask why we're so often willing to pull in more or less anything else regardless of weight (DOM manipulation, state management, animation, etc.) but a robust data layer is often a bridge too far. I wish I knew why this is, as it seems to be one of the larger cultural differences between folks who work mainly "on the backend" and "on the frontend".

> Also: SQLite files aren't very easy to diff, so they're not great for collaboration in version control. JSON is better for that.

Yeah, if we're talking about diffing the literal file itself, then that changes things. At that point, we're not just talking about a storage format, we're talking about interchange as well. In that case I'd ask - of course this is application specific, not general - what data are we putting on the wire? Where does that data live?

For example, if the main state state lives in a your browser instance, and you ship updates (i.e., "CREATE", "EDIT", "DELETE" or some such) back and forth between collaborators, then diffing the state of whatever you have is fairly easy, SQLite or JSON or whatever else. But if we're shipping the actual file itself over the wire or attempting to version control it, then you're absolutely right and diffing the SQLite file is inferior.

There are some interesting tradeoffs in this space. This is a fun discussion!

desmondl · 2024-03-11T20:25:04 1710188704

I think you're talking about the trade offs between supporting features like "DOM manipulation, state management, animation, etc." and "shipping updates" out of the box, versus only storing the data as simple files and leaving everything else to the implementation.

Sqlite as an application file format is great [1], but for a knowledge base / note taking app the benefits are not worth the tradeoffs in my opinion.

Sqlite is more performant more performant and provides lots of built-in features. However, most note taking users do not have enough notes or files to benefit from that performance. Sqlite will also lock the user into the application, whereas a "pack of files" can be used in the shell as a text editor. Using markdown files + a open json format has the benefit of being supported by multiple applications (e.g. sometimes i open my obsidian vault in vscode), while a sqlite database would need a proprietary schema coupled with a single application

I prefer an open file format that isn't tied to a vendor. A "data bridge" might handle syncing and diffing more efficiently than plain files, but it is still tied to the vendor. For example, I prefer not to pay for Obsidian Sync, and I'm able to use a git plugin and storing my files on nextcloud to sync between my devices. This leverages existing tech without having to implement features from the ground up

[1] https://www.sqlite.org/appfileformat.html

eviks · 2024-03-12T04:20:19 1710217219

Except the markdown files are tied to a vendor outside of trivial formatting since it's a simplistic underpecified format without extensions, so all these obsidians specify their own extensions to add complexity, which your vscode does not support (and neither would git diff help to see data change in a sea of formatting changes)

And this spec is for complicated layouts, not trivial notes you're comparing it to, so your intuition from simple notes doesn't translate to this use case

asa400 · 2024-03-11T20:59:49 1710190789

> I think you're talking about the trade offs between supporting features like "DOM manipulation, state management, animation, etc." and "shipping updates" out of the box, versus only storing the data as simple files and leaving everything else to the implementation.

I'm not sure I understand. Can you clarify?

> Sqlite is more performant more performant and provides lots of built-in features. However, most note taking users do not have enough notes or files to benefit from that performance.

For a graph with lots (thousands+) of nodes/edges, SQLite is probably capable of being more performant than a JSON file, depending on whatever specific kind of performance we're measuring. That said, to me, the most interesting thing that SQLite gives for applications when compared to flat files is data integrity, via transactions and schemas/constraints. Performance is nice but not even close to the most interesting thing about SQLite in most applications. Performance has never been the reason I've chosen SQLite over flat files for my applications.

> Sqlite will also lock the user into the application, whereas a "pack of files" can be used in the shell as a text editor. Using markdown files + a open json format has the benefit of being supported by multiple applications (e.g. sometimes i open my obsidian vault in vscode), while a sqlite database would need a proprietary schema coupled with a single application

In a sense you're right about this! I'll grant it's easier to open a JSON file in vscode and edit it if you already know vscode and JSON. That said, SQLite is in the public domain with a well-defined, stable format and there are countless free and open source database editors/viewers out there.

SQLite is self-describing, also. You open `sqlite3` and type `.schema` and it shows you the database schema. You enter a query and get some results. It's all right there. So, while a database might have a schema that was designed for a particular application, that doesn't mean you as the end user can't tinker with it, and the number of people who know SQL is rather large.

pastage · 2024-03-12T13:19:33 1710249573

> SQLite is self-describing, also. You open `sqlite3` and type .schema

My feeling is that xml and json is always easier to parse than SQL. What I often see is that a large amount of application logic is hidden in how the tables are joined or selected. In SQL this is almost always the case and of course this is often the case in json as well to some extent but usually alot less.

In the end it comes down to that xml was not the end all of integration formats, neither are its replacements. Actually being able to read and understand what you parse without tooling is an immense help.

apitman · 2024-03-11T20:29:54 1710188994

SQLite is awesome, but it's still an order of magnitude more complex than JSON. If it wants to be the one file format to rule them all, we're going to need high quality and heavily used implementations in most languages.

Adding C sqlite to a golang project adds a significant hit to build times and cross-compilation/static linking complexity[0]. When I looked into the native Go implementations of sqlite I came away with the feeling it wasn't worth the tradeoffs compared to using the C version, but now I still have to deal with the issues above.

I haven't looked deeply into how sqlite works, but my instincts tell me the reason we don't have high quality implementations in every language is because it's actually too complex to treat as a protocol.

I would love to see something fill the void between plain text and sqlite.

[0]: https://www.arp242.net/static-go.html

mlunar · 2024-03-11T23:48:18 1710200898

Check out https://github.com/zombiezen/go-sqlite if you're interested in trying out Sqlite in Go again. Nice interface, negligible compile time impact, fast, compiles without CGO. It's very comfortable.

I agree that going from text to sqlite is a bit of a hurdle, especially if you're not writing C :)

exceptione · 2024-03-11T20:53:55 1710190435

SQLite has no real types. Use SQLite if data quality is of no concern. The applicability domain of SQLite is far smaller than people think.

If you want to improve on JSON, you would have to go into an other direction. Maybe something like postgis would be helpful for extremely large canvasses.

JSON Schema is pretty powerful by the way. Checkout the documentation. SQLite is absolutely no match there. What Sqlite could bring is speed, but I dont see how in the contxt of canvas it would be of any help here.

asa400 · 2024-03-11T21:03:27 1710191007

SQLite's types are quite loose by default, no doubt.

There is a somewhat recent STRICT mode that strengthens them: https://www.sqlite.org/stricttables.html

samatman · 2024-03-11T21:37:17 1710193037

Exactly so. SQLite also has a rich collection of CHECK constraints which can raise errors if data is not to your liking in some fashion, this includes validating JSON. Not a JSON schema, admittedly, although (just like for Postgres) this is available as an extension. https://github.com/asg017/sqlite-jsonschema

Saying that SQLite doesn't have "real types" is simply false. If one doesn't want to learn how to use a tool, blaming it for that failure is poor form.

exceptione · 2024-03-11T22:03:05 1710194585

The author of SQLite is quite open about it. The lack of typing has been part of the design from the beginning. Sqlite has hardly any types:

    INT
    INTEGER
    REAL
    TEXT
    BLOB
    ANY

Of course one can program all kind of check constraints, like one can program all kinds of value validations in javascript.

Unfortunately, that is not the same as typing. Sqlite lacks typing because, as the sqlite author explains in the docs, flexibility is the goal. He continues with "But other developers are aghast", and so strict tables where born, but you can clearly see this cannot overcome real concerns. Try to look for the DATETIME datetype in that list.

Deep bow to sqlite, its design goal was to be the ini file replacement and it has outperformed itself on that one.

Thanks for the extension link. Although constraints are not reuable type definitions, they would still be helpful in this context. Pity that json doesn't have a type for dates, one has to rely on string formats: https://json-schema.org/understanding-json-schema/reference/...

samatman · 2024-03-12T03:45:17 1710215117

> Of course one can program all kind of check constraints, like one can program all kinds of value validations in javascript.

I don't consider this a valid distinction where databases are concerned.

If you define a datatype in an ordinary SQL database, and try to pass it invalid data, it will fail at runtime. How else could it work? There's no compile-time interaction between the value and the database.

If you define a field as BOOLEAN in Postgres, then the value must be 0 or 1, or the database will refuse to write it and return an error. In SQLite this is spelled INTEGER NOT NULL CHECK (col_name = 0 or col_name = 1). More verbose? Yes. Identical semantics? Also yes.

It would certainly be nice if SQLite had a datetime validator that could be used as a check constraint! Hold, up, I got you fam: CHECK(date IS strftime('%Y-%m-%d', date)). If you need a different format, those are,, available.

I guess it has a date type after all! Learn something new every day.

exceptione · 2024-03-12T11:02:33 1710241353

Types are a contract, that both parties can understand. This notion is not captured by operational semantics.

Parties that do not read the database contract will get caught by runtime validation. However, any code that targets the database contract could make use of that contract, e.g. with scaffolding. This might enable tighter integration with type checking in the client program.

What you propose is not a contract, your code doesn't understand it, so now you introduce a new problem. (I think that is why the sqlite author doesn't seem to be too enthusiastic about bolting on strictness checks, as its potential is really limited and it contradicts its design).

wizzwizz4 · 2024-03-12T20:32:58 1710275578

What are operational semantics to you, if not a contract?

asa400 · 2024-03-12T04:24:46 1710217486

Heh, I haven't seen that particular datetime constraint before, thanks for that!

samatman · 2024-03-13T15:48:18 1710344898

If there's a risk of properly-formatted but nonetheless invalid dates, like 2024-03-34, one can do this: DATE(date_column, '+0 days') IS date_column). The '+0 days' causes 2024-03-34 to normalize to 2024-04-03 (this is part of the ISO standard!) and therefore the check fails.

Admittedly these sorts of tricks are obscure, if by 'obscure' we mean "you have to feed a search engine a string like 'Validate SQLite datetime' and read some sources". But to reiterate my point slightly differently, the verbosity of these CHECK constraints doesn't indicate that they're doing anything different from a "typed database".

Out of curiosity, I asked ChatGPT, which got the "well formed" version, when I pointed out it would accept 2023-03-34, it gave a correct explanation of what SQLite would do with that date, and suggested `CHECK(date_column = strftime('%Y-%m-%d', date_column))`, which is more satisfying than the other one, and has the same effect. Really gotta keep an eye on the chatbot.

wizzwizz4 · 2024-03-12T01:44:51 1710207891

Aren't these the types?

  NULL
  INT(EGER)
  REAL
  TEXT
  BLOB
  (INTEGER PRIMARY KEY)

dolmen · 2024-03-13T16:40:37 1710348037

https://sqlite.org/flextypegood.html

chrisweekly · 2024-03-11T19:17:23 1710184643

Crucial context: its provenance is embedded infinite canvas in Obsidian (amazing markdown-based notes app++), which supports JS but has no external datastore per se. SQLite is fantastic, but inappropriate for this use case.

asa400 · 2024-03-11T19:50:09 1710186609

Oh I read the post and a few of the other linked posts so I'm aware. If we're storing a file, we're already storing a file of some kind, so we're just talking about what kind of file it is. I'm just talking about a different set of tradeoffs one can make if that file is something other than JSON.

LordDragonfang · 2024-03-11T20:48:11 1710190091

>If we're storing a file, we're already storing a file of some kind

It's not just a file "of some kind", it's a text file. That's one of Obsidian's key selling points - that a vault is simply a collection of text files. The "text" part is important to both Obsidian's philosophy and the majority of its users.

asa400 · 2024-03-11T21:09:16 1710191356

Sure! And in a lot of ways this is about values more than technology.

toddmorey · 2024-03-11T20:20:31 1710188431

Agree that SQLite is a great local format for this sort of thing, but seems to get tricky when trying to sync across clients without conflicts. I've seen most CDRT schemes working with json documents rather than SQLite, but curious if there's solid conflict-free syncing out there for SQLite files being used inside applications?

mch82 · 2024-03-12T00:01:13 1710201673

The SQLite docs have a great page on the use case `asa400` describes.

SQLite as an Application File Format, https://sqlite.org/appfileformat.html

eviks · 2024-03-12T04:35:52 1710218152

What's the best example of this format used for complex docs, has anyone implemented their OpenDocument replacement idea?

SQLite · 2024-03-12T10:41:25 1710240085

One example: Fossil (https://fossil-scm.org), the version control system used by SQLite itself. Fossil is Git-like in its underlying design but has a different interface. Fossil stores a complete source code repository as an SQLite database, rather than a pile-of-files as Git does. Storing content this way gives Fossil UI advantages over Git, such as the ability to easily find decedents of a check-in, and the ability to assign the same tag to multiple check-ins (ex: tagging every release with the "release" tag.)

dolmen · 2024-03-13T16:36:44 1710347804

SQLite doesn't even have strict validation of columns declared as integers.