This is really neat. It looks like it's using a similar trick to https://phiresky.github.io/blog/2021/hosting-sqlite-database... from last year - querying SQLite databases hosted as static files using the HTTP range header to fetch back just the pages from the database file that are needed by a specific query.
I wish S3 supported multiple ranges in the Range header. I've done this sort of access with a custom columnar format that stored a set of columns in a single file.
It was non-trivial overhead to have to send an HTTP request per column when I could have sent one request with multiple ranges.
I can imagine something similar with this: fetching multiple pages in one request.
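For context, a minimal sketch of what the single-range version looks like today with boto3 (bucket and key names are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Fetch one column's byte range; S3 honours a single range per GET.
resp = s3.get_object(
    Bucket="my-bucket",
    Key="data/columns.bin",
    Range="bytes=0-4095",
)
first_column = resp["Body"].read()

# An RFC 7233 multi-range request would look like
#   Range: bytes=0-4095,16384-20479
# but since S3 only honours single ranges, each column (or, for this
# project, each group of SQLite pages) currently costs its own GET.
```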
I made a similar SQLite VFS in Go[1]. I really love the VFS API. HTTP range queries are about the mildest thing you can do with it. The VFS implementations that use non-traditional backing stores are more fun, like, say, using DynamoDB as a read/write backing store for your SQLite database[2].
That DynamoDB VFS looks cool! I agree that the VFS API makes one think about plenty of crazy ideas. Someone is working on a VFS based on FoundationDB[0] that looks very promising. It was recently discussed here[1].
I wish I knew Rust so I could contribute, as I think that's the best one. FDB has the best durability and transactional guarantees of any distributed key/value store I've seen. Scylla, for example, is also pretty great but doesn't have the same transactional guarantees.
To summarize: FDB provides transactional guarantees similar to an RDBMS, but it can be distributed and uses optimistic concurrency, meaning locks are not necessary.
In the context of SQLite, that's the main issue it has, and FDB is positioned to solve it.
Here's [1] a particularly crazy one which offers synchronous, low-latency (millis) replication of a SQLite database to a distributed recovery log. It does this by instrumenting the file operations SQLite is making and recording their effects into the log.
Something really neat about SQLite is how self-contained the database file is. You can take, say, a large book or log output and copy it into a table with full-text search, optimize the DB to get rid of empty space, then ship it out to S3 or something. Now you can fetch that down and have indexed and FTS access to the contents.
It's a cool way to store data, with indexes into it, that's as easy to use as copying around and opening a file; no need for dumps, restores, or server processes.
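A rough sketch of that workflow, assuming an SQLite build with FTS5 and a hypothetical bucket name:

```python
import sqlite3
import boto3

# Build a self-contained, searchable database locally
# (assumes your SQLite build includes FTS5).
conn = sqlite3.connect("book.db")
conn.execute("CREATE VIRTUAL TABLE pages USING fts5(body)")
with open("book.txt") as f:
    conn.executemany(
        "INSERT INTO pages (body) VALUES (?)",
        ((line,) for line in f),
    )
conn.commit()
conn.execute("VACUUM")   # reclaim empty space before shipping
conn.close()

# Ship the single file to S3 (hypothetical bucket name).
boto3.client("s3").upload_file("book.db", "my-bucket", "book.db")

# Later: fetch it anywhere and run full-text queries locally, e.g.
# SELECT rowid, body FROM pages WHERE pages MATCH 'some phrase';
```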
Most problems can be neatly packed into one SQLite database. Even a big AAA video game could benefit from this type of representation on disk.
I am also now starting to look at SQLite as a sort of application framework / operating system. The SQLite VM is running on billions of devices right now without any issues. Seems like a safe bet if I wanted to use the relational/SQL model to represent most of my application state & logic. Anything else I want to bring to the party I can inject by way of application-defined functions.
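As a minimal sketch of the application-defined-function idea, using Python's sqlite3 (the function here is just illustrative):

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")

# Inject application logic as a SQL-callable function.
def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

conn.create_function("content_hash", 1, content_hash)

conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes VALUES ('hello world')")

# Application state lives in tables; application logic is callable from SQL.
print(conn.execute("SELECT content_hash(body) FROM notes").fetchone())
```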
This paper was a very big deal for us. I thought “oh this is nice” and then I got to chapter 9...
SQL is so much more powerful than most developers are aware. It's not just some annoyance to trudge through in order to get to your data. It is an entire data-as-computation paradigm that also happens to be a domain-specific language.
SQL is functional. It is also relational. Figuring out how to combine both of these properties across your entire domain is peak elegance. We got pretty damn close on our current iteration. The walls of the matrix are a bit messy, but from inside a given SQL expression you can't tell. It all looks very clean from the inside.
SQLite represents the most stable and portable foundation upon which to construct one of these systems. Nothing else comes close.
I like it just because it's a single file (most of the time). I can put pretty much anything in it without having to think about how to store data on disk in a sane manner, and SQLite can handle a lot of data. You get so many good things for free: transactionality, durability, concurrent access, and an easy consistent backup mechanism.
Even if JSON blobs happen to be the most convenient way to represent your data, you can still just put those blobs in an SQLite database and maybe worry about converting them into something relational later.
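A small sketch of that pattern, assuming an SQLite build with the JSON1 functions:

```python
import json
import sqlite3

conn = sqlite3.connect("events.db")
conn.execute("CREATE TABLE IF NOT EXISTS events (payload TEXT)")

# Store the blob as-is for now.
conn.execute(
    "INSERT INTO events (payload) VALUES (?)",
    (json.dumps({"type": "login", "user": "alice"}),),
)
conn.commit()

# Query inside the blobs; promote fields to real columns later if needed.
rows = conn.execute(
    "SELECT json_extract(payload, '$.user') FROM events "
    "WHERE json_extract(payload, '$.type') = 'login'"
).fetchall()
```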
I mean, the database IS a file. The key here is that it colocates indexes and other metadata that can be used to efficiently access large data sets without having to hand-roll indexes and/or process them on the fly every time you load the data up...
This is cool. It would be interesting to see some stats, like how many separate requests are needed for a single SELECT statement, since S3 charges per GET even on a small range, I believe. Obviously that would depend on the query and various DB settings like page_size. I'm also wondering if this does aggressive local caching for the same reason.
Author here! This is something I want to look into. One of the reasons to use s3fs is that the underlying `fsspec` classes already handle caching.
S3File (from s3fs) inherits from AbstractBufferedFile[0], which has a cache[1], implemented here[2]. I haven't read through all the code yet, but experimenting with different cache implementations will probably make the VFS faster. It will also depend on the type of queries you're executing.
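For example, something along these lines (hedged: the cache_type values come from fsspec's caching module, and the bucket/key are hypothetical):

```python
import s3fs

fs = s3fs.S3FileSystem()

# Experiment with the fsspec cache implementation backing S3File.
with fs.open("my-bucket/test.db", "rb",
             block_size=4096,           # e.g. match the SQLite page_size
             cache_type="readahead") as f:
    header = f.read(100)                # SQLite's 100-byte database header

# cache_type="none" disables caching entirely, which makes it easy to
# count how many real GETs a given query pattern issues.
```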
Do you have any insight into whether the cache will alter consistency in the face of concurrent writes from multiple locations in a way that SQLite can't handle? I imagine this would be a very real concern for a project like this.
The SQLite CLI supports loading modules at runtime, so you can do things like load and use a DB via a custom VFS. My Go implementation of this VFS supports running in loadable-module mode[1].
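A rough sketch of the same trick from Python (the module path and VFS name are hypothetical, and your sqlite3 build has to allow extension loading):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.enable_load_extension(True)    # requires a build with extension loading enabled
conn.load_extension("./s3vfs")      # hypothetical loadable module that registers a VFS
conn.enable_load_extension(False)

# Open a database through the VFS the module registered (name is hypothetical).
db = sqlite3.connect("file:test.db?vfs=s3vfs", uri=True)
```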
I feel like people are pushing SQLite beyond its limits with this stuff. If you're going to do something like this, wouldn't just a normal client/server option like PostgreSQL be way more performant?
I think SQLite is fantastic, but this just smacks of scope creep.
There are some real benefits to using SQLite as a "serialization format" over something like JSON or Protobuf.
To elaborate on a specific case, think about a 'package-lock.json' file in a Node project. These files are frequently many megabytes in size, even for simple projects, and if you wanted to do something like inventory _every package_ used on GitHub, it would be a ton of data.
You might argue that you could compress these files because they're text, and that's true, but what about when you only need to read a subset of the data? For example, what if you needed to say, "Does this project depend on React?"
If you have it as JSON or Protobuf, you have to deserialize the entire data structure before you can query it (or write some nasty regex). With SQLite you have a format you can efficiently search through.
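As a sketch, suppose the lockfiles were flattened into a hypothetical `dependencies` table; the React question becomes an indexed lookup:

```python
import sqlite3

# Hypothetical schema: one row per (project, package) from package-lock.json.
conn = sqlite3.connect("lockfiles.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS dependencies (
        project TEXT,
        package TEXT,
        version TEXT
    )
""")
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_dep ON dependencies (project, package)"
)

# "Does this project depend on React?" -- no full deserialization required.
row = conn.execute(
    "SELECT 1 FROM dependencies WHERE project = ? AND package = 'react' LIMIT 1",
    ("my-project",),
).fetchone()
print(bool(row))
```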
There are some other formats in this space like ORC and Parquet but they're just optimized for column-reads (read only _this_ column). They don't provide efficient querying over large datasets like SQLite can.
This is at least my understanding. If anybody has a perspective that's opposite, I'd appreciate hearing it!
I would call this hacking, actually. Is it useful in a financially productive sense? Maybe, don't care. It's pushing a program we like past its limits. That's fun.
This would be quite convenient when dealing with MBTiles files hosted on S3 or similar. There are efforts such as PMTiles[1] that let us use HTTP Range queries to retrieve tiles directly from S3, but we still have to convert MBTiles to PMTiles first.
More importantly, OSM is considering providing MBTiles directly (besides PBF)[2].
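Since MBTiles is itself just an SQLite file, a tile lookup would be something like this (standard MBTiles schema; the coordinates are hypothetical):

```python
import sqlite3

# MBTiles is an SQLite database with a standard `tiles` table.
conn = sqlite3.connect("region.mbtiles")

z, x, y = 12, 2200, 1343                 # hypothetical XYZ tile coordinates
tms_row = (2 ** z - 1) - y               # MBTiles stores rows in TMS order

tile = conn.execute(
    "SELECT tile_data FROM tiles "
    "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
    (z, x, tms_row),
).fetchone()
```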
I admire people for their inventiveness; however, surely attempting to square-peg-round-hole something like this into AWS S3 is only going to end up biting them in the backside?
S3 is delivered over a stateless protocol (HTTP), and AWS makes no promises about any given request being fulfilled (the user is merely invited to retry). The safety of your data is then further dependent on what storage tier you choose.
There just seem to be so many footgun opportunities here. Why not just use the right tool for the job (i.e. a hosted DB or compute)?
> attempting to square-peg-round-hole something like this into AWS S3
I don't think it's like that. AWS already offers services like S3 Select[0] or Athena[1] that do something similar.
> the user is merely invited to retry
Another reason why I used s3fs instead of manually making requests.
> Why not just use the right tool for the job?
I certainly have multiple uses-cases where creating an SQLite database locally and distributing it via S3 (for read-only usage) is orders of magnitude more convenient than the alternatives. It's hard to beat the developer experience of a local SQLite database.
> Why not just use the right tool for the job (i.e. a hosted DB or compute)?
Because sometimes you want a database for 10000 rows, or you have 5 logins a month and you don't want a $30/month database running in the cloud.
There's a market out there for a real "server-less" database that charges you per row stored, is priced per read/write operation, and is reasonably priced for 10000 rows per month, without having to calculate how many 4KB blocks you are going to read or write.
Maybe my 10000 rows are 5 tables of 2000 rows each, and I want some regular SQL features like joins. Can I do that without having to do joins in code? Because if I have to do joins in code, I can just store data in flat files and process everything in memory. But what if I want to use an existing app designed to run against a DB system?
If I have to start writing joins by hand, what's the point of even having a database?
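For scale, a join over a few thousand rows is trivial for SQLite; a hedged sketch with hypothetical tables:

```python
import sqlite3

conn = sqlite3.connect("small.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE IF NOT EXISTS orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
""")

# The join the database does for me, instead of me doing it in code.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.total)
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
""").fetchall()
```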
DynamoDB doesn't do well with complex queries. Imagine you've got 4 integer fields, any combination of which can be filtered by a range.
Using DynamoDB, you're going to have a hard time querying if fields A, B, and D are included. If you add a fifth field, it's going to be a pain to add it to historical data.
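In SQL that's just a WHERE clause over whichever fields happen to be constrained; a hedged sketch with a hypothetical table:

```python
import sqlite3

conn = sqlite3.connect("metrics.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS samples (a INTEGER, b INTEGER, c INTEGER, d INTEGER)"
)

# Filter on any combination of fields; adding a fifth column later is
# just an ALTER TABLE, with NULLs for historical rows.
rows = conn.execute(
    "SELECT * FROM samples "
    "WHERE a BETWEEN ? AND ? AND b BETWEEN ? AND ? AND d BETWEEN ? AND ?",
    (0, 10, 5, 50, 100, 200),
).fetchall()
```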
Seems like it should be. That's pretty much the status quo with how Azure Synapse works. You put your Parquet file out on storage, and query it (via Spark or Pandas).
If I need to query the DB repeatedly (N times per second), isn't it more efficient (in latency and AWS cost) to just download the DB and run queries locally?
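A hedged sketch of that trade-off (bucket, key, and table names are hypothetical):

```python
import sqlite3
import boto3

# Pay the transfer cost once...
boto3.client("s3").download_file("my-bucket", "data.db", "/tmp/data.db")

# ...then every subsequent query is a local read with no per-request cost.
conn = sqlite3.connect("/tmp/data.db")
for _ in range(1000):
    conn.execute("SELECT count(*) FROM some_table").fetchone()
```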
I was thinking it would be nice if there were a generic VFS library for C that offered stdio-like APIs (fread, fprintf, etc.) in order to do tricks like this beyond SQLite, without having to make major changes porting existing codebases over to different APIs.
It occurs to me that because a FILE * is an opaque pointer, it ought to be possible for a libc to provide this functionality: the FILE struct could contain function pointers to the underlying implementations for writing, flushing, etc.
This ought to be backwards compatible with existing FILE * returning APIs (fopen, fdopen, popen), while allowing new backing stores without having to rewrite (or even recompile) existing code.
Local file semantics like FILE* don't map well to how large scale object storage systems operate. For example, FILE* supports seeking (specifically for writes) and mutating files after construction whereas object storage systems generate immutable copies of files once made visible after uploading. There's also no fsync.
FWIW, S3 kind of is that "de facto" interface. There were attempts at standardizing it via OpenStack's Swift project, but to my eye S3 remains the industry de facto standard, for better or worse.
I read of someone using AWS EFS for SQLite; EFS provides NFS functionality, if I recall correctly.
However, I like the idea of using a local store with Litestream replicating to S3, and updating to the latest DB data from S3 when using my application on another device or computer. Sort of like GitHub.
It's not really useful for deep dives into data, though it has gotten me out of a few jams that would have otherwise required a real database just to pull out a value or two from a large dataset.
A comparison to S3 Select would make sense, since S3 Select lets you leverage RANGE and compact formats like compressed Parquet, and would presumably have a similar fundamental cost, i.e. GET + data transfer.
Ah, even if this were the case, in some situations you can't mount filesystems (e.g. Docker without certain capabilities enabled?), so something more in the application layer would be the only way to access SQLite on S3?
Author here! I would say the main advantage is that this just needs a single file uploaded to S3. `sqlite-s3vfs` will split the database into pages and upload the pages separately to a bucket prefix.
Having just a single file has some advantages, like making use of object versioning in the bucket[0]. I also think that relying on s3fs[1] makes the VFS more flexible[2] than calling `boto3` directly as `sqlite-s3vfs` does.