Cloud Backed SQLite (sqlite.org)
488 points by nalgeon on July 6, 2023 | 155 comments



I've just been exploring serving large SQLite databases in chunks and querying them with http range requests to prevent downloading the entire database. It's pretty awesome!

I found a really interesting library called sql.js-httpvfs[0] that does pretty much all the work. I chunked up my 350 MB SQLite db into 43 x 8 MB pieces with the included script and uploaded them with my static files to GitHub, which gets deployed via GitHub Pages.[1]

It's in the very rough early stages but you can check it out here.

https://transcript.fish

I recommend going into the console and network tab to see it in action. It's impressively quick and I haven't even fine-tuned it at all yet. SQLite rules.

[0] https://github.com/phiresky/sql.js-httpvfs

[1] https://github.com/noman-land/transcript.fish


The timing of this is incredible. I was recently given a 120 GB SQLite db to work with, and I was going to investigate ways to work with it in the morning.


I’m not sure what they offer on this front, but turso is a new edge-based SQLite platform. It might be worth checking out:

https://turso.tech/


The current pricing is free for 8GB (10GB or 1B rows read /month) or $3k/month unlimited? So not really possible to start relying on it for anything serious yet (except really serious).

The middle tier isn't even fully described yet, but does say 20GB limit (and from $29/month) - I suppose that means that's a base from which you'll be able to pay for more storage as needed.


Thanks! I'll have a look at this as well.


FYI, it's using the libsql fork of sqlite instead of sqlite.org itself.


Like this? Serving Wikipedia as a static SQLite file, using range request: https://news.ycombinator.com/item?id=27016630


Fantastic, thank you! I came here to find this link. :)


Yes, this is the post I learned about this from and that's the person who wrote the sql.js-httpvfs library I'm using. Extremely cool.


I often use (my[0]) sqlite http vfs to query databases stored in s3. You can simply make a presigned url and then give that to the vfs and then it all just works.

[0]: https://github.com/psanford/sqlite3vfshttp


>> querying them with http range requests to prevent downloading the entire database. It's pretty awesome!

Querying isn't a problem, you can query as much as you want. But where you'll hit the SQLite limitation is in scaling for multi-user write scenarios. Yes, even SQLite can handle a few concurrent write requests, but once your users start scaling into the millions, you'll eventually need a proper RDBMS like MySQL or Postgres.


I've been thinking about this a lot. In my particular use-case I can get away with having a totally read-only app.

However if I want people to be able to submit corrections to the transcriptions, I need a way to do that. I was thinking of setting up some sort of queue system to make writes non-realtime since, for this app, it's not a big deal.

It will be an interesting challenge, for sure.


sql.js-httpvfs is a read-only solution. The point is to be able to put up pre-baked datasets on any random web server w/range support, or a blob store.


I've been using sql.js-httpvfs to get full-text search on a static Github Pages site [1]. (Just make sure you have fts5 enabled on your database.) It works astonishingly well.

[1] https://matrixlogs.bakkot.com/WHATWG/search?q=database


This is awesome. Maybe you can help me figure out the best way to search. The transcriptions are one row per word since each word is timestamped. I'm currently just doing client side search in JS for the titles and descriptions but I'll have to do proper SQL search for the body of the transcriptions. What's the best way to search for strings of text if the db is one row per word (~4 million rows)?


In theory you can add an index on the token text column and then implement full text searching yourself using the index to find the set of documents containing each search term, counting the occurrences per document, and combining the results from all the terms. But FTS5 already does that for you, and it's more efficient than the above.

The traditional FTS5 technique as documented [0] should work, same as if this were a local SQLite database. You'll have to duplicate the transcriptions into TEXT form (one row per document) and then use FTS5 to create a virtual table for full text searching. The TEXT form won't actually be accessed for searching, it's just used to build the inverted index.

[0] https://www.sqlite.org/fts5.html
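
Roughly the shape of it, as a sketch using Python's sqlite3 module — the `words`/`documents` tables and column names here are hypothetical (a `words` table with episode, idx for word position, and word columns), not the real transcript.fish schema:

    import sqlite3

    conn = sqlite3.connect("transcripts.db")

    # One row per document: stitch the per-word rows back together per episode.
    conn.execute("CREATE TABLE IF NOT EXISTS documents (episode INTEGER PRIMARY KEY, body TEXT)")
    docs = {}
    for episode, word in conn.execute("SELECT episode, word FROM words ORDER BY episode, idx"):
        docs.setdefault(episode, []).append(word)
    conn.executemany("INSERT INTO documents (episode, body) VALUES (?, ?)",
                     [(ep, " ".join(ws)) for ep, ws in docs.items()])

    # FTS5 virtual table built from the document text; searches only touch
    # the inverted index, not the original TEXT column.
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS documents_fts USING fts5(body)")
    conn.execute("INSERT INTO documents_fts (rowid, body) SELECT episode, body FROM documents")
    conn.commit()

    # Which episodes mention a term?
    hits = conn.execute("SELECT rowid FROM documents_fts WHERE documents_fts MATCH ?",
                        ("database",)).fetchall()

(Requires an SQLite build with FTS5 enabled, as the sibling comment notes.)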


I don't know enough about how FTS works to help, I'm afraid. I don't know if it's possible to search across multiple rows.

Were I in your position I'd just create a copy of the DB with the full transcript in each row and run the search against that. If you only have 4 million words, creating an extra copy of the database shouldn't be prohibitively large.


I recommend checking a similar project - storing ClickHouse databases on GitHub pages for serverless analytical queries: https://github.com/ClickHouse/web-tables-demo


I don't know much about SQLite internals, but on the face of it, it sounds hacky as hell (pardon if I'm wrong).

Wouldn't it be better to make a proper client-server API similar to traditional SQL databases, but on top of SQLite?


The point is to use dumb, cheap blob storage instead of a CPU-consuming server someone has to manage.


There are many use cases where scaling a traditional centralized SQL database is problematic, and something like this can address them.

We have one: we run thousands of VMs at a time, all accessing the same "database". Since we already have a good amount of horizontally-scaled compute, having to maintain a separate database cluster, or large vertically-scaled database instance, to match our peak load requirements is problematic in terms of one or more of cost, complexity, and performance. In particular, horizontally-scaled distributed databases tend not to scale up and down efficiently, because of the complexity and time involved in joining the cluster, so the cost benefits of horizontal scaling of compute are lost.

An approach like this can fit well in cases like these.


I can see how this could work for a read-only database, but how would you resolve collisions on writes without making them super slow in the process?


Right, it certainly wouldn’t work for a big multi-tenant db. But the right data organization can make it work.

In our case, the data is organized such that there’s usually only one client at a time that would need to write to a given db file. Think for example of a folder per user, each containing a db file and various other files for that user. What we’re doing is actually even more granular than that - we have a folder for each object that a user could be working on at a given time.


Databases are really just data structures and algorithms along with some notion of durability.

Client/Server databases are just remote data structures. (E.g. Redis is short for "Remote Dictionary Server")

Sometimes you want your data structures and algorithms to run locally. Could be performance, privacy, cost, or any number of reasons.

Local, in-memory data structures hit a few bottlenecks. First, they may not fit in memory. A mechanism for keeping the dataset in larger storage (e.g. disk) and paging in the necessary bits as needed extends the range of datasets one can comfortably work with locally by quite a bit. That's standard SQLite.

A second potential bottleneck to local data structures is distribution. We carry computers in our pockets, on our watches, in our cars. Delivering large datasets to each of those locations may be impractical. A cloud-backed VFS gives those devices the benefits of local data structures over the subset they need without requiring them to fetch the entire dataset. That can be a huge win if there's a specific subset they need.

It always depends on the use case, but when the case fits there are a lot of big wins here.


One obvious problem that I see with this approach is that it will break if there is any change in the storage format.

With client-server architecture, the server code owns the data format, while in this storage-level remote access case, you have to ensure that all of your clients are updated simultaneously. Depending on your architecture it might or might not be feasible.


For the typical uses for this, you'd tend to serve up a version of Sqlite compiled to wasm or JS to the frontend, so you can be sure it's the same one. Sqlite's storage format is also unusually stable:

https://www.sqlite.org/formatchng.html


It's better to ensure 100% compatibility with the data format so the new server versions can read the old format without conversions. For example, in ClickHouse, you can install the latest version over a version from 7 years ago, and it will work just fine with the old MergeTree tables.


The VFS layer in SQLite is begging to be used like this. For a read-only implementation (where you generate the SQLite databases out-of-band and upload them), the code nearly writes itself. It's maybe hacky in the sense of "I had to write a SQLite module for this" but it's a very nice fit for SQLite VFS.


Yes this code was very straightforward to write and this approach lends itself very well to read-only databases of large datasets with no server infrastructure overhead. It's also "free" to host for the time being.


Depends on your needs. If it's just one or even multiple clients reading from the db, then range requests for blocks are a great option. Adding a server layer is a huge leap in complexity, as you now have code that has to be deployed, managed, secured, etc. vs. just simple blob storage for chunks (S3, etc.).


It's not hacky though. SQLite explicitly supports VFS implementations, and it's in wide use throughout industry. All this does is use emscripten + asyncify to compile SQLite to JavaScript and then implement a VFS that calls into regular JavaScript functions. SQLite is notoriously well tested, and so if it says that the VFS works a certain way, it probably does.


How so? It’s not like the CPU or the network care.


It's difficult to implement correctly if there's more than one client. There is no single source-of-truth to manage any conflicts. Might be ok if the clients have only read-only access.


Totally minor but this website does not show anything in lockdown mode on the iPhone.


Interesting, thanks for the tip! I've never heard of lockdown mode. Unfortunately I don't own an iPhone. Any chance you could post an issue with steps to repro?

https://github.com/noman-land/transcript.fish/issues

Thank you!


Wow, that's so cool!


it badly needs {cursor:pointer} for the things one can click on.


Yes, indeed! This has been bugging me and I've fixed it. Thank you.


This adds more cache consistency issues, concurrency issues, network blocking issues, authn+z issues, daemon process supervision issues, ... for what? To store your SQL data in a remote data store? I would rather my app shell out to gsutil (or curl!) than deal with all of this.

Simple hack: mount a tmpfs filesystem and write your sqlite database there. Every 30 seconds, stop writing, make a copy of the old database to a new file, start writing to the new file, fork a process to copy the database to the object store and delete the old database file when it's done. Add a routine during every 30 second check to look for stale files/forked processes.
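
A rough sketch of that loop in Python — the tmpfs mount point, bucket, and upload command (gsutil here) are all placeholders, and actually pausing/redirecting the application's writer is left as comments:

    import os
    import shutil
    import subprocess
    import time

    DB_DIR = "/mnt/tmpfs"                       # tmpfs mount (placeholder)
    BUCKET = "gs://example-bucket/db-archive"   # object store target (placeholder)

    active = os.path.join(DB_DIR, "app.db")

    while True:
        time.sleep(30)

        # Reap finished upload children from previous iterations (stale-process check).
        try:
            while os.waitpid(-1, os.WNOHANG) != (0, 0):
                pass
        except ChildProcessError:
            pass

        # 1. Stop writing (application-level pause goes here).
        old = active
        new = os.path.join(DB_DIR, "app-%d.db" % int(time.time()))
        shutil.copyfile(old, new)   # copy the old database forward into a new file
        active = new                # 2. start writing to the new file

        # 3. Fork a process to push the old file to the object store, then delete it.
        if os.fork() == 0:
            subprocess.run(["gsutil", "cp", old, BUCKET], check=True)
            os.remove(old)
            os._exit(0)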

Why use that "crazy hack", versus the impeccably programmed Cloud Backed SQLite solution?

- Easier to troubleshoot. The components involved are all loosely-coupled, well tested, highly stable, simple operations. Every step has a well known set of operations and failure modes that can be easily established by a relatively unskilled technician.

- File contents are kept in memory, where copies are cheap and fast.

- No daemon outside of the program to maintain

- Simple global locking semantics for the file copy, independent of the application

- Thread-safe

- No network blocking of the application

- Authentication is... well, whatever you want, but your application doesn't have to handle it, an external application can.

- It's (mostly) independent of your application, requiring less custom coding, allowing you to focus more on your app and less on the bizarre semantics of directly dealing with writes to a networked block object store.


>> Every 30 seconds, stop writing, make a copy of the old database

That sounds so simple that i almost glossed over it. But then the alarm bells started ringing.

If you don’t need to collaborate with other people, this kind of hacking is mostly fine (and even where it’s not fine, you can always upgrade to using filesystem snapshots to reduce the copy burden when that becomes an issue, and use the SQLite clone features to reduce global lock contention when you grow into that hurdle, etc.), but if you need to collaborate with others, the hacky approaches always lead to tears. And the tears arrive when you’re busy with something else.


If you really need to collaborate with other people, then you likely want something that implements GRANT and REVOKE.


If everything is just a URL, you already have GRANT and REVOKE.


> ... for what?

Backups, replication to different regions, access to the data etc become standardized when it's the same as any other cloud bucket. This makes compliance easier and there's no need to roll your own solution. I also never had to troubleshoot sqlite, so I'd trust this will be more robust than what I'll come up with, so I don't get your troubleshooting argument.

Not everyone will care about this, but those can just not use it I guess.


Or just use a database like Postgres. It's a one liner to download and start up a postgres container.


Is it really now? I have heavily preferred SQLite to postgres after having nightmares with it many years ago. I preferred SQLite because I often needed a single 1TB file to hand someone to get started on a project with the DB, which seemed far more complex with postgres. There were a ton of steps required to get things set up, just with the installation and authentication alone. I recall needing to make a few users and passwords and specify which tables were allowed for different users, etc. It was far, far too complex for just storing data. Multiple users are extremely uncommon for data analysis, and hiding certain tables isn't really needed most of the time.

I know it does have its use cases, but if you don't need access control and more complexities, postgres (at least then) seems like so much hassle.

If it's better now perhaps I may try it, but I can't say I have high hopes.


If you had to make a few users and passwords and specify which tables were allowed for different users it is only because you chose to design your app this way. You must have felt there was some good reason to divide it up like that.

If that was the case, sqlite would have been unsuitable for your needs.

In other words, the complexity you describe was not caused by postgres. It was caused by your app design. Postgres was able to accommodate your app in a way that sqlite cannot.

Sqlite does have the "it's all contained in this single file" characteristic though. So if and when that's an advantage, there is that. Putting postgres in a container doesn't exactly provide the same characteristic.


Using the docker container is a breeze - add a POSTGRES_PASSWORD env variable and you're all set. I'd be curious how it performs for a TB of data, but I would be surprised if it straight out breaks.

https://hub.docker.com/_/postgres


It works fine, even w/larger data. The data is stored on a bind-mount volume on the host; not in the container.


SQLite can be awesome for huge data sets as I've rarely found anything that can ingest data as rapidly but, as with any database, it requires some specific tweaking to get the most out of it.

The biggest headache is the single-write limitation, but that's no different from any other database; it's merely hidden behind various abstractions. The solution to 90% of complaints against SQLite is to have a dedicated worker thread dealing with all writes by itself.

I usually code a pool of workers (i.e. scrapers, analysis threads) to prepare data for inserts and then hand it off for rapid bulk inserts to a single write process. SQLite can be set up for concurrent reads so it's only the writes that require this isolation.
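
A minimal sketch of that shape in Python (hypothetical table and work items; the point is that only one thread ever touches the connection for writes):

    import queue
    import sqlite3
    import threading

    write_q = queue.Queue()

    def writer():
        # The only thread that writes. One transaction per handed-off batch.
        conn = sqlite3.connect("data.db")
        conn.execute("PRAGMA journal_mode=WAL")  # readers elsewhere don't block this writer
        conn.execute("CREATE TABLE IF NOT EXISTS items (url TEXT, payload TEXT)")
        while True:
            batch = write_q.get()
            if batch is None:        # shutdown sentinel
                break
            with conn:               # bulk insert in a single transaction
                conn.executemany("INSERT INTO items (url, payload) VALUES (?, ?)", batch)
        conn.close()

    def scraper(urls):
        # Workers prepare rows (fetch/parse/analyse) and hand off finished batches.
        write_q.put([(u, "scraped:" + u) for u in urls])

    writer_t = threading.Thread(target=writer)
    writer_t.start()
    workers = [threading.Thread(target=scraper, args=(chunk,))
               for chunk in (["https://a.example", "https://b.example"],
                             ["https://c.example", "https://d.example"])]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    write_q.put(None)
    writer_t.join()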


This is the way. It would be nice to have a very simple API out there that handles the multiprocessing of data for ETL-like pipelines and serializes it to one write thread. There are a few that get close to that area but I haven't seen any that really nail it.


Is cloud storage authnz any less convoluted? I realise you're responding only to the "one liner" argument, but as soon as you do anything with SQLite that involves the cloud or authnz or secondary processes, you're right back to where you were with client/server database systems in terms of complexity.


I never touch cloud dev, so it's a bit out of the realm I was discussing, but yes my understanding is that postgres is more geared towards companies trying to serve millions of simultaneous users or something. I haven't seen it as very useful simply because I don't develop in that space.


docker run --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -p5432:5432 -d postgres


There is an occasional downside with the Postgres container though... no automatic upgrade path between major versions:

https://github.com/docker-library/postgres/issues/37

So, if you want to keep on a supported version of PostgreSQL then every few years you'll need to figure something out.

Hopefully that gets fixed at some point though.

My in-development "automatically upgraded PostgreSQL" container is about ~400MB though, which is ~200MB more than the standard one. For my use-cases the extra space isn't relevant, but for other people it probably would be.


Just in case anyone here stumbles over this and is interested, there's now a PostgreSQL docker image that automatically upgrades the PostgreSQL data files before launching the server:

https://hub.docker.com/repository/docker/pgautoupgrade/pgaut...


> Every 30 seconds, stop writing, make a copy of the old database to a new file

I hope that by "copy of the old database" you mean "Using the SQLite Online Backup API"[1].

    This procedure [copy of sqlite3 database files] works well in many scenarios and is usually very fast. However, this technique has the following shortcomings:
    * Any database clients wishing to write to the database file while a backup is being created must wait until the shared lock is relinquished.
    * It cannot be used to copy data to or from in-memory databases.
    * If a power failure or operating system failure occurs while copying the database file the backup database may be corrupted following system recovery. 
[1] https://www.sqlite.org/backup.html
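
For anyone doing this from application code rather than the shell, the same API is exposed in Python's sqlite3 module (3.7+) — a minimal sketch, with placeholder filenames:

    import sqlite3

    src = sqlite3.connect("live.db")        # database being written to
    dst = sqlite3.connect("snapshot.db")    # backup target

    # Online backup: copies the database in chunks of pages while other
    # connections continue to read and write, instead of locking for a raw copy.
    src.backup(dst, pages=1024)

    dst.close()
    src.close()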


If I am reading this right, the main difference is that you are copying the entire db, whereas the linked version can copy just the changed blocks. Also, in your solution it seems 30 seconds of data loss is expected, although that could be avoided by using a cloud disk instead of tmpfs, and then you could take snapshots as well.


Oh I see, it has weird independent block file semantics (weird because they aren't reused, unless they are?). That's interesting, although potentially problematic, with the whole "I'll just keep adding files, unless I remember to cleanup, which could cause problems" thing

If you can't handle 30 seconds of data loss you should probably be using a real database. I could be wrong but I don't remember this guaranteeing every transaction synchronously committed to remote storage?

There's a lot of caveats around block store restrictions, multi client access, etc. It looks a lot more complicated than it's probably worth. I'll grant you in some cases it's probably useful, but I also can't see a case where a regular database wouldn't be more ideal.


This module DOES guarantee that transactions are synchronously committed to remote storage before completing the transaction. The transaction can be rolled back if uploading to remote storage fails; this is mentioned in the documentation. Ctrl+F "rolled back" in the doc. Doing it asynchronously would be more complicated and this is a pretty simplistic module.


I don't understand what the goal is here. Wherever the cloud is, there is PostgreSQL, and conveniently, PostgreSQL already solves all of the problems that "Cloud SQLite" creates, with support for concurrent access, higher performance, richer queries, and lots more.

Is this just a toy? What are the upsides in practice of deploying an embedded database to a cloud service?


Hosting files in S3 is much, much cheaper than running a live instance of PostgreSQL. Incomparably so. We do it specifically to move large, mostly cold, read-only data out of the main RDBMS and into S3 in a way that it can still be queried on-the-fly* without having to be fully downloaded and reconstituted on a server first. Works great and we get to pay S3 prices for storage and nothing for compute. As a kicker, because it's S3 we can access it from any availability zone without paying for cross-AZ data transfer. For the size and coldness of this data, the juice is worth the squeeze. None of the downsides matter for our use case at all.

* Using indexes and only touching the pages you need. This is specifically much better than S3 Select, which we considered as an alternative.


"Much cheaper" means is that S3 costs about $23 / TB OTH a db.t4g.small instance with 2 vCPU, 2 GB, and 1 TB is about $260 + cpu credits...


On the S3 side, you need to factor in Intelligent Tiering. We're not paying S3 Standard prices for this--recall that this is a mix of mostly cold data. It's a 10x difference per GB on average between EBS and our mix of S3 Intelligent Tiering.

Add in that your RDS instance needs to be high availability like S3 is (and like our RDBMS is). That means a multi-AZ deployment. Multiply your RDS cost by two, including the cost of storage. That still isn't as good as S3 (double price gets you a passive failover partner in RDS PostgreSQL; S3 is three active-active AZs), but it's the best you can do with RDS. We're north of $500/month now for your example.

Add in the cost of backups, because your RDS database's EBS volume doesn't have S3's durability. For durability you need to store a copy of your data in S3 anyway.

Add in that you can access S3 without cross-AZ data transfer fees, but your RDS instance has to live in an AZ. $0.02/GB both ways.

Add in the personnel cost when your RDS volume runs out of disk space because you weren't monitoring it. S3 never runs out of space and never requires maintenance.

$500/month for 1TB of cold data? We were never going to pay that. I won't disclose the size of the data in reality but it's a bunch bigger than 1TB. We host an on-prem database cluster for the majority of things that need a RDBMS, specifically because of how expensive RDS is. Things probably look different for a startup with no data yet, blowing free AWS credits to bootstrap quickly, but we are a mature data-heavy company paying our own AWS bills.

As a final summary to this rant, AWS bills are death by a thousand papercuts, and cost optimization is often a matter of removing the papercuts one by one. I'm the guy that looks at Cost Explorer at our company. One $500/month charge doesn't necessarily break the bank but if you take that approach with everything, your AWS bill could crush you.


And if you need to scale your DB its price goes up, while the cloud storage price remains the same.


> Hosting files in S3 is much, much cheaper than running a live instance of PostgreSQL.

If your app is running on premises and the DB is in the cloud, you can also move the DB on-premises so the costs are lower.

If your app runs in the cloud, too, then you already are paying for the cloud compute, so you can just fire up a VM and install Postgres on that.


If your app is a static JS file, now your DB can also be static.


Indeed, our RDBMS is on-prem; we'd never actually use RDS. This data's pages were hogging the cache leading to reduced performance for other unrelated tables, and SAN storage isn't particularly cheap or flexible. We wanted to get it out of our on-prem RDBMS. If you're over a hundred billion rows, it's time to think about whether your data belongs there. Maybe it does, maybe it doesn't. This data had an alternative and I took it.

> If your app runs in the cloud, too, then you already are paying for the cloud compute, so you can just fire up a VM and install Postgres on that.

This part doesn't make sense. If your app is on the cloud, you're paying for the cloud compute for the app servers. "Firing up a VM" for PostgreSQL isn't suddenly free.


You can absolutely do all of those things, but there is an intrinsic cost for someone to configure, manage, and monitor those things. SQLite will (generally speaking) have far less management overhead because of its relatively limited surface area (e.g., there is no database service).


This highly depends on app access patterns


Absolutely. I’ll go further and say that you must specifically design the database schema knowing that it’s going to be used this way. Your pages need to be packed full with the data you need and nothing you don’t. Spatial locality matters bigtime since “seeks” are so expensive (additional S3 requests), when in a traditional db it matters much less. Wide tables with a lot of columns that might not be used in a query are a bad idea here.

Here’s an in-the-weeds tip for anyone attempting the same: your tables should all be WITHOUT ROWID tables. Otherwise SQLite sprays rows all over the place based on its internal rowids, ruining locality when you attempt to read rows that you thought would be consecutive based on the primary key.


A few days ago, I tried to use the linked library (sql.js-httpvfs) for a graph network visualization, which went about as well as you'd expect given the poor spatial locality. Do you have any tips for optimizing spatial locality with more complex queries? Can you manually cluster data for some given properties in SQLite?

For my project I ended up just exporting the graph edges as JSON, but I'm curious if it would still be possible to make work.


In a WITHOUT ROWID table, you have control over the order of the rows. Make an id column as your primary key, and set the id appropriately so that rows accessed together will be next to each other in the file. This is how I manually cluster the data.

Aside from that, I use an entity-attribute-value model. This ensures that all the tables are narrow. Set your primary key (again, with WITHOUT ROWID) to put all the values for the same attribute next to each other. That way, when you query for a particular attribute, you'll get pages packed full with nothing but that attribute's values in the order of the IDs for the corresponding entities (which you manually clustered).

It's worth repeating one more time: you must use WITHOUT ROWID. SQLite tables otherwise don't work the way you'd expect from experience with other DBs; the "primary" key is really a secondary key if you don't use WITHOUT ROWID.
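
Concretely, something like this (hypothetical column names), where the composite primary key becomes the physical row order:

    import sqlite3

    conn = sqlite3.connect("dataset.db")
    conn.executescript("""
        -- Narrow entity-attribute-value table. WITHOUT ROWID makes the declared
        -- primary key the b-tree key, so rows are stored in (attribute, entity)
        -- order and all values of one attribute sit on consecutive pages.
        CREATE TABLE IF NOT EXISTS facts (
            attribute INTEGER NOT NULL,
            entity    INTEGER NOT NULL,   -- ids assigned so related rows cluster together
            value     BLOB,
            PRIMARY KEY (attribute, entity)
        ) WITHOUT ROWID;
    """)
    conn.commit()

    # A query for one attribute over a contiguous range of entity ids touches
    # only a handful of consecutive pages -- i.e. few range requests over the wire.
    rows = conn.execute(
        "SELECT entity, value FROM facts WHERE attribute = ? AND entity BETWEEN ? AND ?",
        (7, 1000, 1999),
    ).fetchall()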


Thanks for the info! In my case there's not really one primary key that would guarantee good clustering for my query, so I guess there's not much that can be done to optimize here.

I'm trying to find all the ancestors of a node in a DAG, so the optimal clustering would vary depending on the node I'm querying for


> I’ll go further and say that you must specifically design the database schema knowing that it’s going to be used this way.

If it ever turns out, at some point in the future, that you do need features from a standard RDBMS after all, you are going to regret not using Postgres in the first place, because re-engineering all of that is going to be vastly more expensive than what it would have cost to just "do it right" from the start.

So it seems that Cloud SQLite is basically a hyper-optimization that only makes sense if you are completely, totally, 100% certain beyond any reasonable doubt that you will never need anything more than that.


I can’t reveal too many internal details but this data lived in the RDBMS for years. Its access patterns are well understood. That’s exactly when you start cost optimizing. If this didn’t work out we’d just move back to the old DB schema that we were already using and pay for a bigger server. If we wanted, we could keep the schema as-is and just move it into SQL Server. That would work just fine, too. No re-engineering required.

Don’t know how else to say “we were not born yesterday; we thought of that” politely here. This definitely isn’t something to have your junior devs work on, nor is it appropriate for most DB usage, but that’s different than it not having any use. It’s a relatively straightforward solution to a niche problem.


If you'd mentioned Athena, maybe I could see how this follows what the earlier comment says, but as-is your use case doesn't really overlap with why people are building cloud SQL products.


Am I not “people”? :)

I built a SQLite VFS module just like the one linked here, and this is what I use it for in production. My use case obviously does not preclude other people’s use cases. It’s one of many.

GP asked whether this is a toy and what the upsides might be. I answered both questions with an example of my production usage and what we get out of it.


A SQLite VFS module isn't a cloud SQL product, my comment refers to companies like fly.io and MotherDuck that are actually selling "embedded database in the cloud"

It's entirely predicated on developer experience, otherwise there's no reason to specifically reach for an embedded database (in-process doesn't mean anything when the only thing the process is doing is running your DB)


Okay, sure. I’m satisfied that I answered GP’s questions about this VFS module. This HN post is about the VFS module, posted by a notable SQLite extension developer (nalgeon).


Why not use parquet files + AWS Athena?


The ability to use an index to seek directly to a handful of consecutive rows without processing the whole file was very important for our use case. Athena doesn't support indexing like this; it only has partitioning on a single column. It has to scan whole partitions every time. Both S3 Select and Athena are more useful when you want to aggregate massive data sets, but that's not what we're doing. We want to jump in and pull out rows from the middle of big data sets with reasonably low latency, not aggregate the whole thing.


> it only has partitioning on a single column

You can partition using several columns. But I get your point, it's not optimized for row-level operations in general.


To avoid depending on an AWS product.


Athena = managed Presto + managed Hive Metastore

I use a Presto container to mock Athena for local and CI tests + a psql container for the metastore.


From my POV this is part of the developer experience push.

SQLite running in-process is so convenient to build with that even when people use other DBs, they write wrappers to sub it in.

I guess this is an attempt to let that convenience scale?


I think letting people build their own system of isolated databases that can easily be started up and shut down for writing or randomly accessed for reading could be one use. It could be used for both performance and security. Offline use could also be improved, possibly by sending binary diffs.


This is a mechanism not a goal. You make the new goals with the new mechanism.


I've been (proudly) noting these elegant one-liners for ... 18 years now:

  pg_dump -U postgres db | ssh user@rsync.net "dd of=db_dump"

  mysqldump -u mysql db | ssh user@rsync.net "dd of=db_dump"
... but what is the equivalent command for SQLite?

I see that there is a '.dump' command for use within the SQLite console, but that wouldn't be suitable for pipelining ... is there not a standalone 'sqlitedump' binary?


The .backup command is better than the .dump command suggested in siblings, as it doesn’t block all writes until it completes. You can then do anything with the backup. (Do keep in mind there’s the .backup-never-finishing problem on a db with non-stop writes.)


The '.backup' command also carries across the metadata too, so you don't lose things like 'application_id'¹.

I mention this as I once wasted a bunch of time trying to get a backup created from a '.dump | sqlite3' pipe to work before taking a proper look at the application code, and finally saw that it silently ignored databases without the correct 'application_id' or 'user_version'.

¹ https://www.sqlite.org/pragma.html#pragma_application_id


The .dump command creates a read transaction[0], so it shouldn't block any writes.

From the SQLite docs:

  When a SAVEPOINT is the outer-most savepoint and it is not within a BEGIN...COMMIT then the behavior is the same as BEGIN DEFERRED TRANSACTION
Internally, the .dump command only runs SELECT queries.

[0]: https://github.com/sqlite/sqlite/blob/3748b7329f5cdbab0dc486...


A read transaction does block writes, unless you enable WAL mode, which I forgot to mention.


WAL Mode is the default anyway, I thought.


It is not, due to the many restrictions upon it, and warnings in its use.

“To accelerate searching the WAL, SQLite creates a WAL index in shared memory. This improves the performance of read transactions, but the use of shared memory requires that all readers must be on the same machine [and OS instance]. Thus, WAL mode does not work on a network filesystem.”

“It is not possible to change the page size after entering WAL mode.”

“In addition, WAL mode comes with the added complexity of checkpoint operations and additional files to store the WAL and the WAL index.”

https://www.vldb.org/pvldb/vol15/p3535-gaffney.pdf

SQLite does not guarantee ACID consistency with ATTACH DATABASE in WAL mode. “Transactions involving multiple attached databases are atomic, assuming that the main database is not ":memory:" and the journal_mode is not WAL. If the main database is ":memory:" or if the journal_mode is WAL, then transactions continue to be atomic within each individual database file. But if the host computer crashes in the middle of a COMMIT where two or more database files are updated, some of those files might get the changes where others might not.”

https://www.sqlite.org/lang_attach.html


True, but I think there aren't many reasons to not run in WAL mode by default.


Au contraire.


You could replace ‘dd’ with ‘cat’ here, as in “ssh user@rsync.net "cat > db_dump"”. The ‘dd’ command has absolutely no reason to be used in the vast majority of cases.


What is the computational difference between cat and dd?


AFAIK, ‘dd’ reads and writes a block at a time (whatever its default block size is), while ‘cat’ uses whatever buffer sizes it thinks are the most efficient for the devices it is using for that invocation.


I believe that you can echo the '.dump' command to SQLite.

    echo .dump | sqlite3 db | ssh user@rsync.net "dd of=db_dump"
It worked for me.


You can also pass special commands in the second argument to SQLite:

sqlite3 db .dump | ...


Thanks!

I have added that one-liner to the remote commands page:

https://www.rsync.net/resources/howto/remote_commands.html

... although based on some of your siblings, I changed .dump to .backup ...


Sadly, that doesn't look like it will work. The '.backup' command needs a filename and won't accept '-' or even '/dev/stdout', at least in the version available on Debian it will simply error out with "missing FILENAME argument on .backup".

As noted '.dump' will work in your pipeline, but with the caveats I and others noted elsewhere.


Fixed. Thank you.


  sqlite3 my_database .dump | gzip -c | ssh foo "dd of=db_dump"
Technically .backup is better because it locks less and batches operations, rather than doing a bunch of selects in a transaction. Do you prefer slowly filling up local disk or locking all writes?


You can pass console commands as arguments:

    sqlite3 /path/to/db.sqlite .dump


Couldn't you also just `cp` the SQLite DB file?


In many cases, `cp` on the DB file won't be enough. For instance, when WAL (Write-Ahead Logging) is enabled (which is much faster than the old rollback journal), some data may lie in other files alongside the main DB.

I once had a problem where my application couldn't read from a SQLite DB whose file was read-only. Turned out that even a read-only access to the DB required a write access to the directory, for a WAL DB. This was partly fixed years later, but I'd learned the hard way that a SQLite DB may be more than a single file.


Ah super interesting, thanks for sharing!


Why don't you write one? The whole API is presented in C.

https://sqlite.org/c3ref/backup_finish.html


Lots of people here talking about S3, and yet:

> The system currently supports Azure Blob Storage and Google Cloud Storage.

I'd interpret that as either a hard "fuck you" to AWS, or a sign that the S3 API is somehow more difficult to use for this purpose.


The S3 API doesn't have the concurrency control primitives necessary to guarantee consistency in the face of concurrent writes.

I wrote a distributed lock on Google Cloud Storage. https://www.joyfulbikeshedding.com/blog/2021-05-19-robust-di... During my research it was quickly evident that GCS has more concurrency control primitives than S3. Heck S3 didn't even guarantee strong read-after-write until recently.



While I'm sure you're right, this VFS implementation doesn't use any of that stuff. They just handwave concurrent writes away by saying the application has to ensure a single writer on its own. An S3 implementation appears to be planned; it's mentioned in the comments in the bcvmodule.c file.


Ah, I'm guessing that's the same reason why many "cloud storage as a backend" setups (e.g. Terraform/Vault storage backends, Delta.io backends) require DynamoDB + S3.


I thought it was a sign of who paid for the feature. I have no insight as to whether or not this is true, but leaving out S3 made me assume either Google or Microsoft (or a third party heavily tied to one of these clouds) paid for this to be added.


I highly suspect it would be the latter because if you just want to "fuck" S3 then you could just list Minio, Swift and Ceph RADOSGW support instead.


The whole point of SQLite for me is to get the application code as close as possible to the block storage device without getting my hands too messy in the process. Latency is a major reason you'd consider using SQLite. Simplicity is right up there too.

Putting the storage in the cloud is completely orthogonal to this ideology. Both latency & simplicity will suffer dramatically. I don't think I'd ever use this feature over something like Postgres, SQL Server or MySQL.


> Putting the storage in the cloud is completely orthogonal to this ideology. Both latency & simplicity will suffer dramatically.

I'm confused by your usage of the term "orthogonal". In my experience, it is used to express that something is independent of something else, similarly to how in mathematics orthogonal vectors are linearly independent and thus form a basis of their linear span. Here, however, it seems you mean to say that "putting storage in the cloud runs counter to this ideology"?


I see "orthogonal" used quite frequently on HN. I remember reading a comment here that explained it something like:

> Ironically, the meaning of "orthogonal" has become orthogonal to the real meaning because now it can mean both "perpendicular" and "parallel", both "at odds" and "unrelated"

Edit: found it https://news.ycombinator.com/item?id=1997093 (my memory was pretty close)


I have some workflows where a SQLite database is built from data sets and then used read-only by multiple processes. I currently implement this by putting the SQLite file into S3 and copying it to the machines that need it. This will save the copy step.


It is not clear to me - is this an officially supported, core module? Or more, "You can technically do this, but ehhhh" kind of deal?

Would this mean I might eventually be able to do something insane like run Datasette (from within Pyodide) against an external cloud storage?


It seems like they're taking a bottom-up approach. I can't see any link to it on the main site. It would probably first bubble up to the News page, like WebAssembly did on 2022-11-16. https://sqlite.org/news.html WebAssembly also has a URL inside the root that redirects to doc/trunk/www/index.wiki: https://sqlite.org/wasm/


It looks really promising, would also love to know the status of this. Can't find any announcements of this yet.


I don't think the blob storage APIs (which this seems to be using, instead of accessing over the web-gateway urls) are CORS accessible. So you might have to proxy it.

However, it might become possible to run datasette etc much more easily in an edge function.



SQLite's VFS layer is flexible and easy-to-use. There are lots of implementations (public and private) of this "use a SQLite database directly from cloud object storage" idea because it's pretty obvious how to do it when you look at the VFS functions that you have to implement. If you do a read-only implementation, it's an afternoon project. It's mostly a matter of following the VFS docs and when you're about halfway done implementing the functions, SQL queries start working instead of producing errors. The only tricky bit, as always, is caching.


I used this[0] awesome library to do just this over http. I posted more info in my other comment. I'm still exploring but so far it's been pretty impressive to me.

[0] https://github.com/phiresky/sql.js-httpvfs


> although ensuring that only a single client is writing to a database at any time is (mostly) left to the application

Sounds like trouble.


I am very puzzled by this. Wasn't the very point of Sqlite to be local? If it's not anymore, why would you prefer to use it rather than any other relational database?


How is this even comparable to other relational databases?

In this setup you use something like blobstore for public shared db access. Looks like single writer at a time.

In traditional databases you connect to a database server.

This is like having a cloud sync for your local app state - shared between devices via the cloud.


> Wasn't the very point of Sqlite to be local

Not quite. The very point is not needing a server. Subtle difference.

Think more data lake and less relational db.


I always wondered about but never bothered to implement:

What if I implemented S3-backed block storage? Every 16 MB chunk would be stored in a separate object, and the filesystem driver would download/upload chunks just like it does with an HDD.

And then format this block device with Ext4 or use it in any other way (LVM, encrypted FS, RAID and so on).

Is it a usable concept?


What you're describing already exists in Azure Blob Storage (Azure's S3 competitor), known as Page Blobs, which is also how you host VM HDDs (VHDs) in Azure, as it allows for page-level (i.e. disk-block-level) reads/writes.

I'm not clued-in to AWS's equivalent, but a quick google suggests it's AWS EBS (Elastic Block Storage), though it doesn't seem to coexist side-by-side with normal S3 storage, and their API for reading/writing blocks looks considerably more complicated than Azure's: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-acce...


Azure blob storage took me by surprise when I had to load a custom VHD. My experience with AWS/EC2/S3 led me to think it would take half a day or longer.

Turns out, there's no transformation layers or anything. You upload the image using a special tool, it saturates your uplink for a few minutes, and before you can brew some more coffee the VM will be booting up.


Their FUSE-based filesystem driver even supported CACHED block access, i.e. you would download, cache, and read the blocks you needed locally instead of sending every read over the wire. Close to local fs performance for little space.


How does that handle concurrent writes to the same block? How is the cache invalidated?


The single FUSE process can coordinate all that with an in-memory mutex.

This thread is in the context of filesystems like ext4, which are completely unsafe for concurrent mounts - if you connect from multiple machines then it will just corrupt horrifically. If your filesystem was glusterfs, moose/lizard, or lustre, then they have ways to handle this (with on-disk locks/semaphores in some other sector).


Behind the scenes, EBS Snapshots are stored in S3. When you first create an EBS volume from a snapshot, it pulls blocks from S3 on-demand.


Not sure if it's a usable concept but it sounds similar to what someone else did[1]:

> ...I implemented a virtual file system that fetches chunks of the database with HTTP Range requests when SQLite tries to read from the filesystem...

[1]: https://phiresky.github.io/blog/2021/hosting-sqlite-database...




It uses S3 as a filesystem. I'm talking about using S3 as a block device. Slightly different approach.


Sounds like a very expensive way to build a filesystem.


Yes, it's called s3backer.


Thanks, that's the thing.


Linus once said that database people seldom have good taste; I have to agree with him in this case.


Why would I use it over Litestream?


Because the DB you need to access might be too large for your device to download


You could use it in coordination with Litestream as a failover or read replica. It would be a lot slower, but maybe that would be an OK tradeoff for certain use cases.


Edit: I misread the date, as pointed out in a reply.

Why does the forum say some messages are from 19 years ago?

https://sqlite.org/cloudsqlite/forum


Because they really are years old, and not days old.

Except it's 1.19 years ago.

And the oldest one is from 2020. https://sqlite.org/cloudsqlite/forumpost/da9d84ff6e

It might be announced recently, but it has been under development for a while.


Make a plugin to put this in FoundationDB or TiKV. That'd be pretty interesting.


I guess you could just proxy a normal SQLite DB?

Should take only a few lines of PHP. Maybe just one:

    echo DB::select($_GET['query']);
All that is needed is a VM in the cloud. A VM with 10 GB is just $4/month these days.

And you can add a 100 GB volume for just $10/month.


Hmm seems like a more complicated setup than a classic WebAPI-based architecture with PostgreSQL/MySQL, with no apparent benefit.

Anyone care to prove me wrong?



