Ws4sqlite: Query SQLite via HTTP (github.com/proofrock)
132 points by thunderbong on March 11, 2022 | hide | past | favorite | 41 comments



Interesting! We tunnel SQLite over JSON in the Notion iOS/Android/Mac/Windows apps, and the interface presented here looks pretty similar to our second iteration. Here are the things I think are missing:

- Precondition. One or more statements to run to check whether the transaction body can proceed. We use this to check pragma user_version, which we use to mark the schema version level. It prevents old processes from writing to the DB once another process has started a forward migration. You can also use it to assert on transaction state or remaining disk space.

- Userspace transaction batching. We found some use cases (such as during migrations) where we want to control transactions explicitly. Forcing each batch into a transaction is a good default, but ideally the JSON API doesn't restrict any DB semantics.

- Websockets! I thought “ws” could mean websocket but web server is fine too. Websockets work better for our low latency local-only connection use case, and have a more traditional socket-pool vibe. They also avoid retransmitting headers etc, and can allow the server to better understand client “sessions”.

- Extensions. I guess portability is a fine goal, but there’s so much great stuff in the extensions. Full text search, JSON support (although the latest version mainlined it), bulk data load. The main one for us is JSON.


Thanks for the comment. Some great points here.

- Precondition: this is a goal, once I find a good balance between allowing flexibility and not reinventing a scripting language. But yes, it's on the roadmap.

- Userspace transaction batching: interesting. I don't feel comfortable letting a transaction "escape" the request; I don't want to introduce timeouts, sessions or such. Could you describe a use case? This could be a goal.

- Uh, websockets are promising. Thanks!

- Extensions: basically, the point here is that Go doesn't allow a "clean" cross-distribution compilation if I enable them, and I cannot chase all the possible glibc configurations. But it's an open point for sure.

Isn't JSON included in the "main trunk" now?


Here's our precondition, it expects a single row with precondition_result to be 0/1. You could add an error reporting string column to the return type, but that's about all we needed. Very basic:

    SELECT
      CASE user_version
        WHEN ${schema.pragmas.user_version} THEN 1
        ELSE 0
      END AS precondition_result
    FROM pragma_user_version() LIMIT 1
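For reference, the same check runs against Python's stdlib sqlite3 (the ${schema.pragmas.user_version} template placeholder becomes a bound parameter; the helper name is mine):

```python
import sqlite3

def precondition_ok(conn: sqlite3.Connection, expected_version: int) -> bool:
    # Same shape as the query above; the ${...} template placeholder
    # is replaced by a bound parameter here.
    row = conn.execute(
        """
        SELECT CASE user_version
               WHEN ? THEN 1
               ELSE 0 END AS precondition_result
        FROM pragma_user_version() LIMIT 1
        """,
        (expected_version,),
    ).fetchone()
    return row[0] == 1

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA user_version = 5")  # mark the schema version level
print(precondition_ok(conn, 5))  # True: versions match, writes may proceed
print(precondition_ok(conn, 6))  # False: an old process must not write
```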
On extensions: Yes, JSON is included in trunk. Is the latest trunk included in your build? Or are you linking against the platform's sqlite3? At least for the built-in extensions, you can pre-build an amalgamation .c file that contains all the first-party extensions with `make sqlite3.c` and a few env vars. But maybe you're using a pre-built dependency ¯\_(ツ)_/¯.

One extension we like is LIMIT for UPDATE/DELETE. I believe (at least in the version of SQLite we build with) that we need to enable that one when producing the amalgamation.

On userspace transaction batching: our use case is internal to an app. When we built the second version of our SQLite bridge, we were replacing a safe but very rigid API, so we decided to build something with as few limitations as possible. The main use case today is batching multiple transactions into a single IPC message/JSON request object. But, for our platforms, there's exactly 1 client of the JSON bridge at a time, and we know out of band if that client dies or resets -- so we can terminate hanging transactions appropriately. You could also add an assertion that a batch never leaves a transaction open.

For an HTTP use-case, perhaps not a great idea because of multiple clients, sessions, etc. Maybe something to consider for the web socket API, which would give you a protocol session concept to reason with.
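The batching idea boils down to letting BEGIN/COMMIT appear in the batch itself instead of wrapping every batch in an implicit transaction. A minimal sketch with Python's stdlib sqlite3 in autocommit mode (the batch shape is hypothetical, not ws4sqlite's actual API):

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so transaction
# boundaries are exactly the BEGIN/COMMIT statements in the batch.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

# One "request" carrying two explicit transactions, the way a batched
# JSON payload might (hypothetical shape).
batch = [
    "BEGIN",
    "INSERT INTO kv VALUES ('a', '1')",
    "COMMIT",
    "BEGIN",
    "INSERT INTO kv VALUES ('b', '2')",
    "COMMIT",
]
for stmt in batch:
    conn.execute(stmt)

# A server could assert no transaction is left open after the batch:
assert not conn.in_transaction
print(conn.execute("SELECT count(*) FROM kv").fetchone()[0])  # 2
```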


Sorry, forgot to comment: SQLite is currently 3.38.0, see https://github.com/mattn/go-sqlite3/blob/master/sqlite3-bind... I don't package it directly, but via the go bindings by mattn (go-sqlite3).


Thanks for your comments, really interesting. I'll come back to this post for ideas ;-). I like the precondition; maybe I'd structure it so that it passes when 1+ rows are returned and fails on 0 rows... that should be equivalent, and it doesn't make any assumption about particular rows _and_ columns being there... but I'll think about it. Thank you again!
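That row-count variant could look like this (a sketch of the idea, not the actual planned feature; names are mine): the precondition passes iff the query returns at least one row, so no particular column needs to exist.

```python
import sqlite3

def precondition_passes(conn, sql, params=()):
    # Pass when the query yields 1+ rows, fail on 0 rows;
    # no assumption about which columns are present.
    return conn.execute(sql, params).fetchone() is not None

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA user_version = 5")

check = "SELECT 1 FROM pragma_user_version() WHERE user_version = ?"
print(precondition_passes(conn, check, (5,)))  # True
print(precondition_passes(conn, check, (6,)))  # False
```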


You can also query sqlite databases directly from an http server that supports range requests (like s3). There are a bunch of implementations of this in different languages including Go[0] and Javascript[1].

[0]: https://github.com/psanford/sqlite3vfshttp

[1]: https://github.com/phiresky/sql.js-httpvfs
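The trick behind these VFS-over-HTTP readers is that SQLite reads its file in fixed-size pages, which map directly onto HTTP Range requests. A rough illustration of the byte math (the helper name is mine):

```python
def page_byte_range(page_no: int, page_size: int) -> tuple:
    """Inclusive byte range of a 1-indexed SQLite page, as it would
    appear in an HTTP 'Range: bytes=start-end' header."""
    start = (page_no - 1) * page_size
    return (start, start + page_size - 1)

# Page 1 of a database with 4096-byte pages (it also holds the 100-byte header):
print(page_byte_range(1, 4096))  # (0, 4095)
print(page_byte_range(3, 4096))  # (8192, 12287)
```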


And Python - https://github.com/michalc/sqlite-s3-query

(although this is specific to S3 rather than generic HTTP)

(shameless plug - made mostly by me)


Author here. This is a pet project of mine, and what a surprise to see on HN! :-) Your discussions on PostgREST and on using SQLite for different use cases are what motivated me for this. So thanks!


This is an amazing project! I always wanted to leverage sqlite behind a REST API, this will make my life easier. Massive Kudos to the well-written info/documentation full of useful links to dive deeper into security and best practices about databases.


Thanks a lot!


How do you handle multiple readers and writers? Have you benchmarked performance? How is authentication and authorization handled?


Performance figures are here: https://germ.gitbook.io/ws4sqlite/features/performances. Reads and writes are serialized under the hood, but concurrency is allowed if the db is in read-only mode and WAL is not disabled.

Security discussion here https://germ.gitbook.io/ws4sqlite/security and here https://germ.gitbook.io/ws4sqlite/features#security-features

In particular, authentication https://germ.gitbook.io/ws4sqlite/documentation/authenticati...


In my testing of a similar project I found that WAL and multiple connections could still produce errors on concurrent writes; what I found was that a single writer with multiple readers performed extremely well. Have you seen anything similar?


I didn't find any concurrency issue; it isn't based on SQLite alone: the underlying layer (go-sqlite3) has a connection pool that I set to 1. I have a test case for this, and it works fine (as far as I can see). If you care to take a look at it, I'd like to hear your opinion.
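A connection pool capped at 1 is effectively a single shared connection guarded by a mutex, so writes can't race. A rough Python equivalent of that pattern (a sketch, not the project's actual Go code):

```python
import sqlite3
import threading

# One connection shared across threads, all access serialized by a lock --
# roughly what a max-1 connection pool gives you.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE t (n INTEGER)")
conn.commit()
write_lock = threading.Lock()

def writer(n_rows: int) -> None:
    for _ in range(n_rows):
        with write_lock:  # only one thread touches the connection at a time
            conn.execute("INSERT INTO t VALUES (1)")
            conn.commit()

threads = [threading.Thread(target=writer, args=(50,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(conn.execute("SELECT count(*) FROM t").fetchone()[0])  # 200
```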

About performance: I was blown away, yes. SQLite is an amazing piece of software; I just had to do my part while treating it with respect :-).


Or you can just use my HTTP native database: http://root.rupy.se

A complete JSON file database over HTTP in 200 lines of code.


(author) Different use case, but thanks, I'll take a look


This is just SQL in JSON; I prefer mapping SQL semantics onto HTTP semantics like PostgREST does. I'm not aware of any project that does that for SQLite.


https://github.com/bradleyboy/tuql converts an sqlite database to a GraphQL API


In a short time I'll release exactly that: PostgREST semantics backed by SQLite :)


No batches there. And only simple queries...

> For more complicated filters you will have to create a new view in the database, or use a stored procedure


You can do bulk/batch inserts, and because of the single request = single query behavior you can much more easily cache responses.

It's a tradeoff, but ws4sqlite just sees HTTP as a tunnel for SQL-in-JSON, while PostgREST tries to represent SQL in REST. Which you prefer is a matter of taste.


(author) Yes, it's a tradeoff. ws4sqlite was meant to give unstructured access to a SQLite database that otherwise cannot be reached over the net; whereas I see PostgREST (a wonderful product that I also use) as a way to implement persistence for a model you already have in your application. They're not "direct competitors", as I see it, and PostgREST is much more ambitious (and arguably more complex).


What does that look like?



I would just like to warn people that you shouldn't use this in any production or client-facing app, it's trivial for a malicious actor to change the SQL and drop your database or whatever.

If you need to access a sqlite database remotely, try https://github.com/rqlite/rqlite, it offers clustering on top of a basic SQLite database as well as a HTTP(s) API.


Author here. Thank you for this comment! Yes, it's probably a bad idea at this stage of maturity; there are security provisions put in place, please see https://germ.gitbook.io/ws4sqlite/security and of course never do anything of the sort without a reverse proxy. If you care to read it, I'd be interested in discussing things further.


If I'm not mistaken, rqlite is a different beast: it's an RDBMS which happens to use SQLite for storage. You can't just use it to share a live database over the Internet while it's being used by another regular SQLite application.


rqlite author here. Yes, that is a correct statement about rqlite.

https://github.com/rqlite/rqlite/blob/master/DOC/FAQ.md


Sending arbitrary SQL for remote execution feels very much like a remote shell, and can quickly turn into one with a misconfigured or buggy extension.


Not much different from a remote invocation of any other database (OK, this is more "tool-friendly", so it may open up more attacks). But your point stands.


Not using a version in the request, or response, or URL? That's a paddlin'.


Very true, thanks. It's on my to-do list.


This reminded me of that p2p SQLite project... I can't recall the name, but it seems like something like this running on mesh networks would be a game changer.

You could create databases that can't ever be taken down.


This one? https://news.ycombinator.com/item?id=30605252

Something very interesting is happening with SQLite. It has always been widely adopted (in the background?), but lately there's a lot of use-case development. Litestream for no-hassle replication (with live read replicas coming), querying static DBs with a VFS, torrent fetching, and now no-setup DB-to-API.


This might be the true decentralization platform that takes off. No Merkle tree nonsense.

Although the GUN ecosystem already seems quite popular, it's a graph DB, which makes it tough to use.


I definitely prefer this flavor of decentralization. Yeah not having to learn Graph DB would be a big selling point.


Ironically, one thing I keep wondering about is how to achieve immutability with a decentralized SQLite. Can't anybody read and write? Or is it possible with a column that references the previous record via cryptographic keys from multiple peers... my god, we have blockchain on SQLite.


This is the most interesting project I have seen in a while. Well done!


What about concurrency?


Hi! It doesn't add much to what SQLite allows. Concurrency is allowed if the database is set as read-only and you don't opt out of the default WAL mode; otherwise, transactions are serialized. Even so, you rarely notice it, because SQLite is quite fast.

Concurrency at the HTTP layer is permitted, though. Basically, the connection pool is capped at one connection.


Looks like a clean API.



