Show HN: Query SQLite files stored in S3 (github.com/litements)
160 points by polyrand on Sept 13, 2022 | 61 comments



This is really neat. It looks like it's using a similar trick to https://phiresky.github.io/blog/2021/hosting-sqlite-database... from last year - querying SQLite databases hosted as static files using the HTTP range header to fetch back just the pages from the database file that are needed by a specific query.
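
For anyone curious what that trick looks like at the HTTP level, here's a minimal sketch (the URL is hypothetical; any static host or S3 object that honours Range requests works) that fetches only the 100-byte SQLite header and reads the page size out of it:

    import requests

    # Hypothetical URL of a statically hosted SQLite file.
    url = "https://example.com/data/mydb.sqlite3"

    # The SQLite database header is the first 100 bytes of the file; the page
    # size is stored as a 2-byte big-endian integer at offset 16.
    resp = requests.get(url, headers={"Range": "bytes=0-99"})
    assert resp.status_code == 206  # Partial Content
    page_size = int.from_bytes(resp.content[16:18], "big")
    print("page size:", page_size)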


[Shameless plug] I built a very minimal wrapper to get this working for the Observable web data visualisation environment https://observablehq.com/@mjbo/sqljs-httpvfs


I wish S3 supported multiple ranges in the Range header. I've done this sort of access with a custom columnar format that stored a set of columns in a single file.

It was a non-trivial overhead to have to send an HTTP request per column, when I could have sent one request with multiple ranges.

I can imagine something similar with this: fetching multiple pages in one request.
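
Since S3 only honours a single range per GET, the usual workaround is to issue the per-range requests in parallel. A rough sketch, with a hypothetical public object and made-up byte offsets:

    import concurrent.futures
    import requests

    # Hypothetical public S3 object holding several columns in one file.
    url = "https://my-bucket.s3.amazonaws.com/table.colfmt"
    # Hypothetical byte ranges for three columns.
    column_ranges = [(0, 4095), (8192, 12287), (65536, 69631)]

    def fetch(byte_range):
        start, end = byte_range
        r = requests.get(url, headers={"Range": f"bytes={start}-{end}"})
        r.raise_for_status()
        return r.content

    # One GET per range, issued concurrently, instead of a single multi-range
    # request like "Range: bytes=0-4095,8192-12287" (which S3 won't serve).
    with concurrent.futures.ThreadPoolExecutor() as pool:
        columns = list(pool.map(fetch, column_ranges))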


I made a similar SQLite VFS in Go[1]. I really love the VFS API. HTTP range queries are about the mildest thing you can do with it. The VFS implementations that use non-traditional backing stores are more fun. Like, say, using DynamoDB as a read/write backing store for your SQLite database[2].

[1]: https://github.com/psanford/sqlite3vfshttp

[2]: https://github.com/psanford/donutdb


That DynamoDB VFS looks cool! I agree that the VFS API makes one think about plenty of crazy ideas. Someone is working on a VFS based on FoundationDB[0] that looks very promising. It was recently discussed here[1].

[0]: https://github.com/losfair/mvsqlite

[1]: https://news.ycombinator.com/item?id=32269287


I wish I knew Rust so I could contribute, as I think that's the best one. FDB has the best durability and transactional guarantees of any distributed key/value store I've seen. Scylla, for example, is also pretty great but doesn't have the same guarantees for transactions.


> FDB has the best durability and transactional guarantees of any distributed key/value store I've seen.

Can you expand on this? I keep hearing that on HN, but what guarantees does FDB provide over other key/value DBs?


I’d read this:

https://apple.github.io/foundationdb/cap-theorem.html

And

https://apple.github.io/foundationdb/consistency.html

For a summary. TL;DR: FDB provides transactional guarantees similar to an RDBMS, but it can be distributed and has optimistic concurrency, meaning locks are not necessary.

In the context of SQLite, that's the main issue it has, and FDB is positioned to solve it.
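
For a flavour of the optimistic model, here's a minimal sketch using the FoundationDB Python bindings (assuming a locally running cluster and API version 630; the key name is arbitrary). The transactional function retries automatically on conflict rather than taking locks:

    import fdb

    fdb.api_version(630)   # pin the client API version before opening
    db = fdb.open()        # uses the default cluster file

    @fdb.transactional
    def increment(tr, key):
        # The read records a conflict range; if another client changed the key
        # before commit, the whole function is retried (optimistic concurrency).
        raw = tr[key]
        value = int(bytes(raw)) if raw.present() else 0
        tr[key] = str(value + 1).encode()

    increment(db, b"counter")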


Very useful. Thank you so much


Here's [1] a particularly crazy one which offers synchronous, low-latency (millis) replication of a SQLite database to a distributed recovery log. It does this by instrumenting the file operations SQLite is making and recording their effects into the log.

[1] https://pkg.go.dev/go.gazette.dev/core@v0.89.0/consumer/stor...


Something really neat about SQLite is how self-contained the database file is. You can take, say, a large book or log output, then copy it into a table with full-text search. Optimize the DB to get rid of empty space, then ship it out to S3 or something. Now you can fetch it back down and have indexed, full-text-search access to the contents.

It's a cool way to store data with indexes into it, in a way that makes it as easy to use as copying around and opening a file; no need for dumps, restores, or server processes.
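
A rough sketch of that workflow, assuming your SQLite build includes FTS5 and that the file, bucket, and key names are hypothetical:

    import sqlite3
    import boto3

    con = sqlite3.connect("book.sqlite3")
    # FTS5 gives full-text search over the stored text.
    con.execute("CREATE VIRTUAL TABLE pages USING fts5(content)")
    with open("book.txt") as f:
        con.executemany(
            "INSERT INTO pages (content) VALUES (?)",
            ((line,) for line in f),
        )
    con.commit()
    con.execute("VACUUM")   # reclaim free pages so the file ships compact
    con.close()

    # Ship the single self-contained file to S3.
    boto3.client("s3").upload_file("book.sqlite3", "my-bucket", "book.sqlite3")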


Most problems can be neatly packed into one SQLite database. Even a big AAA video game could benefit from this type of on-disk representation.

I am also now starting to look at SQLite as a sort of application framework / operating system. The SQLite VM is running on billions of devices right now without any issues. Seems like a safe bet if I wanted to use the relational/SQL model to represent most of my application state & logic. Anything else I want to bring to the party I can inject by way of application-defined functions.
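
Registering an application-defined function from Python is a single call; here's a small sketch with a hypothetical slugify() helper (the deterministic flag needs Python 3.8+ and a reasonably recent SQLite):

    import sqlite3

    con = sqlite3.connect("app.sqlite3")

    # Register a Python function so SQL expressions can call it directly.
    def slugify(title: str) -> str:
        return "-".join(title.lower().split())

    con.create_function("slugify", 1, slugify, deterministic=True)

    con.execute("CREATE TABLE IF NOT EXISTS posts (title TEXT)")
    con.execute("INSERT INTO posts VALUES ('Hello SQLite World')")
    print(con.execute("SELECT slugify(title) FROM posts").fetchone())
    # ('hello-sqlite-world',)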


Ever since I read the "Out of the Tar Pit" paper[1], I've been designing my data structures in a more relational form.

[1]: https://github.com/papers-we-love/papers-we-love/blob/master...


This paper was a very big deal for us. I thought “oh this is nice” and then I got to chapter 9...

SQL is so much more powerful than most developers are aware. It's not just some annoyance to trudge through in order to get to your data. It is an entire data-as-computation paradigm that also happens to be a domain-specific language.

SQL is functional. It is also relational. Figuring out how to combine both of these properties across your entire domain is peak elegance. We got pretty damn close on our current iteration. The walls of the matrix are a bit messy, but from inside a given SQL expression you can't tell. It all looks very clean from the inside.

SQLite represents the most stable and portable foundation upon which to construct one of these systems. Nothing else comes close.


I like it just because it's a single file (most of the time). I can put pretty much anything in it without having to think about how to store data on disk in a sane manner, and SQLite can handle a lot of data. You get so many good things for free: transactionality, durability, concurrent access, and an easy consistent backup mechanism.

Even if JSON blobs happen to be the most convenient way to represent your data, you can still just put those blobs in an SQLite database and maybe worry about converting them into something relational later.
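
A small sketch of that pattern, assuming an SQLite build with the JSON1 functions (the table, columns, and field names are made up for illustration):

    import json
    import sqlite3

    con = sqlite3.connect("events.sqlite3")
    con.execute("CREATE TABLE IF NOT EXISTS events (body TEXT)")  # raw JSON blobs
    con.execute(
        "INSERT INTO events VALUES (?)",
        (json.dumps({"type": "signup", "user": "ada"}),),
    )
    con.commit()

    # Query into the blobs with the JSON1 functions; promote fields to real
    # (indexed) columns later, once the schema settles.
    rows = con.execute(
        "SELECT json_extract(body, '$.user') FROM events "
        "WHERE json_extract(body, '$.type') = 'signup'"
    ).fetchall()
    print(rows)  # [('ada',)]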


You can also do the same with a plain file, and you can even open it in a text editor and read it.


I mean, the database IS a file. The key here is that it colocates indexes and other metadata that can be used to efficiently access large data sets without having to hand-roll indexes and/or process them on the fly every time you load the data up...


SQLite is made to replace "dread" by being generally better than storing the data as a JSON blob or CSV.

There's still an argument for columnar formats like Arrow, but compatibility is iffy compared to SQLite, which "just works".


This is cool. It would be interesting to see some stats, like how many separate requests are needed for a single SELECT statement, since S3 charges per GET even on a small range, I believe. Obviously that would depend on the query and various DB settings like page_size, etc. I'm also wondering whether this does aggressive local caching for the same reason.


Author here! This is something I want to look into. One of the reasons to use s3fs is that the underlying `fsspec` classes already handle caching.

S3File (from s3fs) inherits from AbstractBufferedFile[0], which has a cache[1], implemented here[2]. I haven't read through all the code yet, but experimenting with different cache implementations will probably make the VFS faster. It will also depend on the type of queries you're executing.

[0]: https://github.com/fsspec/s3fs/blob/ad2c9b8826c75939608f5561...

[1]: https://github.com/fsspec/filesystem_spec/blob/2633445fc5479...

[2]: https://github.com/fsspec/filesystem_spec/blob/2633445fc5479...
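
For anyone who wants to experiment along those lines, the cache implementation is selectable when opening a file through s3fs directly. This is plain s3fs/fsspec usage rather than anything specific to this VFS, and the path and tuning values are only illustrative:

    import s3fs

    fs = s3fs.S3FileSystem()  # credentials picked up from the environment

    # fsspec's buffered-file cache is chosen per open(); "readahead", "bytes",
    # and "blockcache" trade extra transfer for fewer GET requests.
    with fs.open(
        "my-bucket/mydb.sqlite3",
        "rb",
        block_size=4096,          # roughly align with the SQLite page size
        cache_type="readahead",
    ) as f:
        header = f.read(100)      # only a small range is actually fetched
        print(header[:16])        # b'SQLite format 3\x00'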


Do you have any insight into whether the cache will alter consistency in the face of concurrent writes from multiple locations in a way that SQLite can't handle? I imagine this would be a very real concern for a project like this.


This VFS is only for reading databases stored in S3, not writing. The README gives some alternatives for other implementations which allow writing.


This is cool, I'll probably end up using this (we store sqlite snapshots in S3).

It'd be very handy if this was packaged as a CLI that worked like `sqlite3`, so you could get a SQL repl by running:

    s3sqlite mybucket/mydb.sqlite3


The SQLite CLI supports loading modules at runtime, so you can do things like load and use a DB via a custom VFS. My Go implementation of this VFS supports running in loadable-module mode[1].

[1]: https://github.com/psanford/sqlite3vfshttp#demo


I feel like people are pushing SQLite beyond its limits with this stuff. If you're going to do something like this, wouldn't just a normal client/server option like PostgreSQL be way more performant?

I think SQLite is fantastic, but this just smacks of scope creep.


There are some real benefits to using SQLite as a "serialization format" over something like JSON or Protobuf.

To elaborate on a specific case, think about a 'package-lock.json' file in a Node project. These files are frequently many megabytes in size, even for simple projects, and if you wanted to do something like inventory _every package_ used on GitHub, it would be a ton of data.

You might argue that you could compress these files because they're text, and that's true, but what about when you only need to read a subset of the data? For example, what if you needed to say, "Does this project depend on React?"

If you have it as JSON or Protobuf, you have to deserialize the entire data structure before you can query it (or write some nasty regex). With SQLite you have a format you can efficiently search through.
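
As a hypothetical sketch of that kind of inventory (table and column names invented for illustration), the "Does this project depend on React?" question becomes an indexed lookup:

    import sqlite3

    con = sqlite3.connect("lockfiles.sqlite3")  # hypothetical inventory DB
    con.execute(
        "CREATE TABLE IF NOT EXISTS deps (repo TEXT, package TEXT, version TEXT)"
    )
    con.execute("CREATE INDEX IF NOT EXISTS deps_pkg ON deps (package)")

    # Indexed lookup instead of deserializing a multi-megabyte package-lock.json.
    hits = con.execute(
        "SELECT repo, version FROM deps WHERE package = ?", ("react",)
    ).fetchall()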

There are some other formats in this space like ORC and Parquet but they're just optimized for column-reads (read only _this_ column). They don't provide efficient querying over large datasets like SQLite can.

This is at least my understanding. If anybody has a perspective that's opposite, I'd appreciate hearing it!


I would call this hacking, actually. Is it useful in a financially productive sense? Maybe, don't care. It's pushing a program we like past its limits. That's fun.


This would be quite convenient when dealing with MBTiles files hosted on S3 or similar. There are efforts such as PMTiles[1] that let us use HTTP range queries to retrieve tiles directly from S3, but we still have to convert MBTiles to PMTiles first.

More importantly, OSM is considering providing MBTiles directly (besides PBF)[2].

[1]: https://protomaps.com/docs/pmtiles/

[2]: https://github.com/openstreetmap/operations/issues/565


Aha! This is done in https://github.com/uktrade/mbtiles-s3-server, but using an alternative library to query SQLite on S3, https://github.com/michalc/sqlite-s3-query

(Full disclosure: I wrote most of both of these)


I admire people for their inventiveness; however, surely attempting to square-peg-round-hole something like this into AWS S3 is only going to end up biting them in the backside?

S3 is delivered over a stateless protocol (HTTP), and AWS makes no promises about any given request being fulfilled (the user is merely invited to retry). The safety of your data is then further dependent on what storage tier you choose.

There just seem to be so many footgun opportunities here. Why not just use the right tool for the job? (i.e. a hosted DB or compute).


> attempting to square-peg-round-hole something like this into AWS S3

I don't think it's like that. AWS already offers services like S3 Select[0] or Athena[1] that do something similar.

> the user is merely invited to retry

Another reason why I used s3fs instead of manually making requests.

> Why not just use the right tool for the job?

I certainly have multiple use cases where creating an SQLite database locally and distributing it via S3 (for read-only usage) is orders of magnitude more convenient than the alternatives. It's hard to beat the developer experience of a local SQLite database.

[0]: https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectOb...

[1]: https://aws.amazon.com/athena/


> Why not just use the right tool for the job? (i.e. a hosted DB or compute).

Because sometimes you want a database for 10000 rows, or you have 5 logins a month and you don't want a $30/month database running in the cloud.

There's a market out there for a real "server-less" database that charges you per row stored, is priced per read/write operation, and is reasonably priced for 10000 rows per month, without having to calculate how many 4 KB blocks you are going to read or write.


Firebase Firestore?


Like… dynamodb?


Maybe my 10000 rows are 5 tables, 2000 rows each, and I want to have some regular SQL features like joins? Can I do that without having to do joins in code? Because if I have to do joins in code, I can just store data in flat files and process everything in memory. But what if I want to have an existing app designed to run against a db system?

If I have to start writing joins by hand, what's the point of even having a database?


DynamoDB doesn't do well with complex queries. Imagine you've got 4 integer fields, any combination of which can be filtered by a range.

Using DynamoDB, you're going to have a hard time querying when fields A, B, and D are included. If you add a fifth field, it's going to be a pain to add it to historical data.


Yes, but if you only have 10k rows, it's trivial to do manual joins or migrations.


This is nice, but before you go to the trouble of using it, consider just putting your CSV, JSON, or Parquet data file in S3 and querying it with S3 Select: https://docs.aws.amazon.com/AmazonS3/latest/userguide/select...

I haven't tried this myself, but Parquet + S3 Select should be especially good for analytical queries.
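
A hedged sketch of what that looks like through boto3 (bucket, key, and column names are hypothetical; the object is assumed to be a Parquet file):

    import boto3

    s3 = boto3.client("s3")
    resp = s3.select_object_content(
        Bucket="my-bucket",
        Key="deps.parquet",
        ExpressionType="SQL",
        Expression="SELECT s.repo FROM s3object s WHERE s.package = 'react'",
        InputSerialization={"Parquet": {}},
        OutputSerialization={"JSON": {}},
    )
    for event in resp["Payload"]:          # the response is an event stream
        if "Records" in event:
            print(event["Records"]["Payload"].decode())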


Seems like it should be. That's pretty much how Azure Synapse works: you put your Parquet file out on storage and query it (via Spark or Pandas).


If I need to query the DB repeatedly (N times per second), isn't it more efficient (in latency and AWS cost) to just download the DB and run queries locally?


I was thinking it would be nice if there were a generic VFS library for C that offered stdio-like APIs (fread, fprintf, etc.) in order to do tricks like this beyond SQLite, without having to make major changes porting existing codebases over to different APIs.

It occurs to me that because a FILE * is an opaque pointer, it ought to be possible for a libc to provide this functionality: the FILE struct could contain function pointers to the underlying implementations for writing, flushing, etc.

This ought to be backwards compatible with existing FILE * returning APIs (fopen, fdopen, popen), while allowing new backing stores without having to rewrite (or even recompile) existing code.



Thanks! Looks like it is a glibc (and musl) feature, though this led me to discover funopen() in BSD and macOS: https://man.openbsd.org/funopen.3


Local file semantics like FILE* don't map well to how large-scale object storage systems operate. For example, FILE* supports seeking (specifically for writes) and mutating files after creation, whereas object storage systems generate immutable copies of files once they're made visible after uploading. There's also no fsync.

FWIW, S3 kind of is that de facto interface. There were attempts at standardizing it via OpenStack's Swift project, but to my eye S3 remains the industry de facto standard, for better or worse.


I read of someone using AWS EFS for SQLite, which has NFS functionality, if I recall correctly.

However, I like the idea of using a local store with Litestream replicating to S3 and updating to the latest DB data from S3 when using my application on another device or computer. Sort of like GitHub.

Tailscale was looking into this option, instead of Postgres and other DBs. https://tailscale.com/blog/database-for-2022/

Cheers


I implemented a similar concept for Python using APSW and boto3.

https://gist.github.com/Q726kbXuN/56a095d0828c69625fa44c2311...

It's not really useful for deep dives into data, though it has gotten me out of a few jams that would have otherwise required a real database just to pull out a value or two from a large dataset.



There have been many of these custom VFS implementations. Is there a benchmark comparing their performance?


A comparison to S3 Select would make sense, since S3 Select lets you leverage range scans and compact formats like compressed Parquet, and would presumably have a similar fundamental cost, i.e. GETs + data transfer.


This could be done by mounting S3 as a filesystem and using plain SQLite. Any differences?


If you did that, wouldn't you have to fetch the entire SQLite database file before you could perform any queries?

With this you don't need to do that - the software magically fetches just the pages needed to answer your specific query.


That really depends on the S3 filesystem implementation. It could make the same sort of optimizations as this does using range reads.


Ah, even if this were the case, in some situations you can't mount filesystems (e.g. Docker without certain capabilities enabled?), so something more in the application layer would be the only way to access SQLite on S3?


Yep! This kind of thing works around these problems.

https://www.sqlite.org/draft/useovernet.html


Sorry for the stupid question: does this work with MinIO?


Yes, I actually used MinIO to test it (see the `test.py` script).
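
For reference, pointing s3fs at a local MinIO server is just a matter of overriding the endpoint. The endpoint and credentials below are MinIO's defaults and only for illustration:

    import s3fs

    fs = s3fs.S3FileSystem(
        key="minioadmin",
        secret="minioadmin",
        client_kwargs={"endpoint_url": "http://localhost:9000"},
    )
    print(fs.ls("my-bucket"))  # hypothetical bucket on the local MinIO server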


Has anyone created a VFS for Redis?


Is there a benefit over the existing S3 VFS sqlite3-s3vfs, which also uses APSW?


Author here! I would say the main advantage is that this just requires uploading a single file to S3. `sqlite-s3vfs` will split the database into pages and upload the pages separately to a bucket prefix.

Having just a single file has some advantages, like being able to use object versioning in the bucket[0]. I also think that relying on s3fs[1] makes the VFS more flexible[2] than calling `boto3` directly, as `sqlite3-s3vfs` does.

[0]: https://s3fs.readthedocs.io/en/latest/index.html?highlight=v...

[1]: https://s3fs.readthedocs.io/en/latest/index.html

[2]: https://s3fs.readthedocs.io/en/latest/index.html#s3-compatib...
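
A sketch of the versioning angle with boto3 (bucket and key names are hypothetical): with versioning enabled, every upload of the single database file keeps previous copies addressable by VersionId.

    import boto3

    s3 = boto3.client("s3")

    # Enable versioning on the bucket, then keep overwriting the same key.
    s3.put_bucket_versioning(
        Bucket="my-bucket",
        VersioningConfiguration={"Status": "Enabled"},
    )
    s3.upload_file("mydb.sqlite3", "my-bucket", "mydb.sqlite3")

    # List historical versions of the database; old ones can still be fetched.
    versions = s3.list_object_versions(Bucket="my-bucket", Prefix="mydb.sqlite3")
    for v in versions.get("Versions", []):
        print(v["VersionId"], v["LastModified"])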


I see. The drawback, of course, is that you don't support writing.

So they are for completely different use cases. Thanks!




