FWIW, this works for me with Python 3.12 from Homebrew, but not Python 3.12 from python.org. _sqlite3.cpython-312-darwin.so in Homebrew's Python appears to dynamically link /opt/homebrew/opt/sqlite/lib/libsqlite3.0.dylib, but the version in python.org's Python statically links the sqlite3 library.
EDIT: Python 3.9.6 from Xcode doesn't work either. It has _sqlite3.cpython-39-darwin.so, which dynamically links /usr/lib/libsqlite3.dylib, but that dylib doesn't exist on my system, and I don't know enough about macOS internals to tell where it's coming from. The _sqlite3 .so doesn't seem big enough to have SQLite statically linked.
EDIT2: Xcode's Python works when launching via the real path instead of using the /usr/bin/python3 alias, I assume because /usr/bin is SIP-protected or something.
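If anyone else wants to check their setup, this is enough to see which SQLite your Python is actually using (on macOS, running `otool -L` on the printed path then shows whether it links a libsqlite3 dylib or not):

    import sqlite3
    import _sqlite3

    print(sqlite3.sqlite_version)  # version of the SQLite library this Python is linked against
    print(_sqlite3.__file__)       # location of the C extension module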
From the original forum post [0] announcing this improvement:
> But if you modify your application to start storing JSONB instead of text JSON, you might see a 3-times performance improvement, at least for the JSON-intensive operations. JSONB is also slightly smaller than text JSON in most cases (about 5% or 10% smaller) so you might also see a modest reduction in your database size if you use a lot of JSON.
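For the curious, the switch really is just wrapping values in jsonb() - a sketch assuming the linked SQLite is 3.45.0+ (table and data here are made up):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB)")

    # store the binary JSONB form instead of text JSON
    con.execute("INSERT INTO docs (body) VALUES (jsonb(?))",
                ('{"user": "mina", "tags": ["a", "b"]}',))

    # the JSON functions and -> / ->> operators accept JSONB directly, no re-parsing of text
    print(con.execute("SELECT body ->> '$.user' FROM docs").fetchone())  # ('mina',)

    # convert back to text JSON at the boundary when you need it
    print(con.execute("SELECT json(body) FROM docs").fetchone())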
I for one am excited about these improvements (specifically the disk use reduction) since we store a lot of JSON here at Notion Labs, and we’re increasing our use of SQLite.
We’ve used SQLite in our native apps (including desktop) for years, like you’d expect. We’re considering how we could use it in the browser in a few ways now that OPFS and the ecosystem there are stabilizing. We’re also looking at some use cases server side, but not one-db-per-tenant.
I don’t think SQLite’s single-writer model would mesh well with Notion’s collaborative features. I’m actually very curious whether the one-db-per-tenant concept turns out to be a good idea or a fad. To me it seems like a small app can very happily fit all its users on a single Postgres instance with much less orchestration effort, and a large app that demands a lot of its database would hit the single-writer lock.
Who is doing this and where can I read more? What are the tradeoffs?
I imagine that you get a dataset that is significantly smaller, but it is much trickier to keep the dataset in memory the way you could with MySQL.
It's like having a free implicit index on the customer (because you had to look up the sqlite db file before you could start querying).
I spend a lot of time thinking about tenancy and how to handle it. Tenancy is such a common problem.
Performance is the number one reason tickets are hard to estimate. The second in my experience is security.
Time and tenancy are the biggest opportunities for SQL to just be better (I always need tenancy, and my ORDER BY, or at least one constraint, can typically be satisfied with time).
I'm doing it, though I haven't written anything up. Happy to share my opinion though, with a bit more experience than you have.
The databases I'm working with are pretty small - ballpark 4MB of data per "tenant". So, I guess, a single large database server with half a terabyte of RAM could keep well over a hundred thousand tenants in memory at the same time (I don't have anywhere near that many tenants, so I haven't tested that... and honestly if I did have that many I'd probably split them up between different servers).
Without getting into too much detail - "tenant" isn't really a good fit for how we split them up. Our business is largely based on events that happen on a specific date, with maybe a few months of activity before that date. We have an sqlite database for each event (so ~4MB per event). Once the event passes, it's essentially archived and will almost never be accessed. But it won't actually never be accessed, so we can't delete it.
I haven't run into any performance issues so far, just with regular sqlite databases on the filesystem. I expect the kernel is doing its thing and making sure "hot" databases are in RAM, as with any other frequently accessed file on the disk.
My understanding (it's a theoretical problem I haven't actually encountered...) is that SQLite only really struggles when you have a bunch of simultaneous writes. Our business model doesn't have that. The most actively written table is the one where we record credit card payments... and unfortunately we don't make tens of thousands of sales per second.
If we did have that "problem" I'm sure we could allocate some of our billions of dollars per day in profits to finding a way to make it work... my gut instinct would be to continue to use SQLite with some kind of cache in front of it. All writes would go to something faster than SQLite, then be copied to SQLite later. Reads would check the write cache first, and SQLite if the cache misses.
My experience working with a single large database is you end up with a lot of stale data that is almost never needed. When a table has a hundred million rows, with indexes on multiple columns, even the simplest operation, like adding a new row, can get slow. My approach with SQLite eliminates that - I'll often have just hundreds of rows in a table and access is blazingly fast. When I need to access another database that hasn't been touched in a long time (years, possibly), having to wait, what, an entire millisecond for the SSD to load that database off the filesystem into memory isn't a big deal. No user is going to notice or complain.
Obviously that's more challenging with some data sets and if you're constantly accessing old data, those milliseconds will add up to significant iowait and things will fall over. I definitely don't use SQLite for all of my databases... but in general if you're doing enough writes for SQLite's simultaneous write performance issue to be a problem... then chances are your data set is going to get very large, very quickly, and you're going to have performance headaches no matter what database you're using.
Finding some way to divide your database is an obvious performance win... and SQLite makes that really easy.
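Not my actual code, but the shape of what we do is roughly this (paths and names invented):

    import sqlite3
    from pathlib import Path

    DB_ROOT = Path("/var/lib/myapp/events")  # hypothetical layout: one file per event

    def event_db(event_id: int) -> sqlite3.Connection:
        """Open (creating if needed) the per-event database.
        The file lookup itself acts as the implicit 'tenant' index."""
        DB_ROOT.mkdir(parents=True, exist_ok=True)
        con = sqlite3.connect(DB_ROOT / f"{event_id}.sqlite3")
        con.execute("PRAGMA journal_mode = WAL")  # readers don't block the single writer
        return con

    # every query is already scoped to one event, so tables stay tiny
    con = event_db(42)
    con.execute("CREATE TABLE IF NOT EXISTS payments (id INTEGER PRIMARY KEY, amount_cents INTEGER)")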
> Fix a couple of harmless compiler warnings that appeared in debug builds with GCC 16.
Some projects use -Werror, only ever test with older GCC, and their builds fail with anything recent. SQLite, on the other hand, anticipates the new compiler warnings of a GCC three major versions in the future; that's impressive!
While I find your explanation plausible, who has the muscle memory to press "g", "c", "c", spacebar, lift hand to numpad, "1", "6(no 3 sire!)", hand back to home row?
Embarrassing question tbh, but with all the cloud-native sqlite stuff like Cloudflare D1 and Fly's LiteFS I'm seriously thinking of switching from postgres to sqlite.
Does anyone have a compare/contrast sort of thing between the two?
It is a lot less feature-rich than Postgres, so there are things you will miss: nothing like the range of types, I do not think it has transactional DDL (which is nice to have for migrations), and there are various other things like exclusion constraints and the different index types.
On the other hand, SQLite may do all you want, and not having to run and configure a separate server is a huge deployment advantage.
For the love of all that is holy, if you do, only use STRICT tables. By default [0], SQLite will happily accept that not only can an INTEGER column store “1234” (helpfully silently casting it first), but “abcd” can also be stored in the column as-is.
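To make that concrete, a throwaway example (STRICT needs SQLite 3.37+; the table names are made up):

    import sqlite3

    con = sqlite3.connect(":memory:")

    con.execute("CREATE TABLE loose (n INTEGER)")
    con.execute("INSERT INTO loose VALUES ('1234')")  # silently cast to the integer 1234
    con.execute("INSERT INTO loose VALUES ('abcd')")  # stored as the text 'abcd', no error
    print(con.execute("SELECT n, typeof(n) FROM loose").fetchall())
    # [(1234, 'integer'), ('abcd', 'text')]

    con.execute("CREATE TABLE strict_t (n INTEGER) STRICT")
    try:
        con.execute("INSERT INTO strict_t VALUES ('abcd')")
    except sqlite3.Error as e:
        print("rejected:", e)  # e.g. "cannot store TEXT value in INTEGER column"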
I actually kinda like the fact that whatever data you write to the table will actually be written.
I semi-regularly have to "fix" a serious data loss bug with an ALTER TABLE query. Maybe converting VARCHAR to TEXT or INT to BIGINT... of course it doesn't really "fix" your problem, because the data has already been lost/truncated.
What's a real world situation where completely the wrong type could be written to a column? Especially in modern software with good type safety checks/etc to ensure you don't have malicious data inserted into your database? If I ever did have that happen... at least the data hasn't been lost. You can run a simple script to clean up the "horrific" data.
I don't find a DB that losslessly stores what I told it to store regardless of types worrying at all.
So AFAIC the misfeature of SQLite is not that it's typeless; rather, it's that it has this notion of NUMERIC affinity, which is anything but lossless.
E.g. SQLite has a decimal extension that lets you work with decimal numbers represented as TEXT, and so is appropriate for handling money without rounding issues. However, if you have a column whose declared type is DECIMAL, MONEY, NUMBER, NUMERIC or whatever, it will have NUMERIC affinity. Then if you store a textual decimal number into it, SQLite will decide it looks like a FLOAT and convert it, losing precision.
Your only solution is to use BLOB affinity (declare no type), which is what I do, most of the time.
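A tiny repro of the trap (made-up table; the point is just that the text has more digits than a float keeps):

    import sqlite3

    con = sqlite3.connect(":memory:")
    # "DECIMAL" gets NUMERIC affinity; the second column declares no type, so it gets BLOB affinity
    con.execute("CREATE TABLE prices (amount DECIMAL, amount_raw)")

    value = "1.10000000000000001"  # an exact decimal amount, as text
    con.execute("INSERT INTO prices VALUES (?, ?)", (value, value))

    print(con.execute("SELECT amount, typeof(amount), amount_raw, typeof(amount_raw) FROM prices").fetchone())
    # roughly (1.1, 'real', '1.10000000000000001', 'text') - the NUMERIC column quietly became a float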
For me, the biggest trade offs for sqlite are just that you need to think about how you're going to store and backup the database a lot more. Specifically in container orchestration environments like kubernetes, I think sqlite presents a couple of challenges. With MySQL, you can set up a replicated database server instance outside of the cluster that you just connect to over the network, and you can use standard MySQL tools like mysqldump to back them up. Kubernetes isn't ideal for stateful workloads so that tends to be one of the more sane solutions there.
With SQlite you need to set up a persistent volume to keep the database around between container restarts, and you need to think of a clever way to then back up that sqlite database to somewhere like S3, likely using an sqlite3 command with a VACUUM statement and then an `aws s3 cp` command, which requires AWS credentials. Overall, a lot of additional work and privileges on the application container, at least in container orchestration environments. In lieu of all that, maybe you trust your persistent volume provisioner enough to try to do an online snapshot, but that always sketches me out / I don't trust the backup enough to rely on it.
Of course you can use a public cloud sqlite service like Cloudflare D1, but I haven't used that solution enough to say if it would be flexible enough to work with, say, an on-prem application server, or if it only works with Cloudflare workers. I'm sure I could find that out in the documentation but I've exhausted my mental stamina for the day with leafing through documentation pages.
Backing up sqlite databases is straightforward. `.backup` is a command in the sqlite3 shell for exactly this purpose. Since you already have a volume for the database, you can backup to that same volume (if offloading to S3/etc is too much work).
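And if you'd rather do it from application code than shell out to the sqlite3 CLI, Python exposes the same online-backup mechanism (paths here are made up):

    import sqlite3

    src = sqlite3.connect("/data/app.db")
    dst = sqlite3.connect("/data/backups/app-copy.db")
    src.backup(dst)  # safe while the app keeps reading/writing the source
    dst.close()
    src.close()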
> Since you already have a volume for the database, you can backup to that same volume (if offloading to S3/etc is too much work).
I think I just accidentally didn't see this part of your reply last night, or something. But backing up a database to the same place the primary one is stored is decidedly _not_ a real backup solution. Nobody should believe that offers them any of the protection a real backup would give them. At the absolute best, you're protected from your database getting malformed by your application server. But if you lose that volume for any reason, your backups are just gone. Imagine explaining to your boss in such a scenario that the solution you came up with was keeping the backups on the same Linux partition as the primary running database. They would fire me. I would fire me.
Yeah, my point was pretty specific to container environments, that using sqlite forces you to add a bunch of sqlite handling logic to your application. Whereas with MySQL (and other similar RDBMS’), you can have your application just worry about its own logic, and handle MySQL backups completely separately from it.
Note that sqlite `.backup` does not back up PRAGMA values. Some people use `PRAGMA user_version` for schema versioning (not a good idea, because of this trap).
It sounds great until you need a centralized billing database and then you might want to just stick with postgres rather than run two kinds of database.
Does anyone have ideas on how to solve that? Not to mention complicating migrations. Unfortunately sqlite-based product docs seem to end right before getting to the hard stuff. Or perhaps I missed them.
Django ORM also uses the same copy and move method, but it warns you against doing it on production databases.
That makes me reluctant to use SQLite for a use case it is otherwise well suited to: multi-tenant applications.
Alembic does not have such dire warnings, but still looks problematic with regard to constraints?
I use SQLite/Litestream for https://extensionpay.com! Serves about 120m requests per month (most of those are cached and don't hit the db), but it's been great!
Using SQLite with Litestream helped me launch the site quickly without having to pay for or configure/manage a db server, especially when I didn't know if the site would make any money and didn't have any personal experience with running production databases. Litestream streams to Backblaze B2 for literally $0 per month, which is great. I already had a Backblaze account for personal backups and it was easy to just add B2 storage. I've never had to restore from backup so far.
There's a pleasing operational simplicity in this setup — one $14 DigitalOcean droplet serves my entire app (single-threaded still!) and it's been easy to scale vertically by just upgrading the server to the next tier when I started pushing the limits of a droplet (or doing some obvious SQLite config optimizations). DigitalOcean's "premium" intel and amd droplets use NVMe drives which seem to be especially good with SQLite.
One downside of using SQLite is that there's just not as much community knowledge about using and tuning it for web applications. For example, I'm using it with SvelteKit and there's not much written online about deploying multi-threaded SvelteKit apps with SQLite. Also, not many example configs to learn from. By far the biggest performance improvement I found was turning on memory mapping for SQLite.
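For anyone looking for a starting point, this is the general shape of what I mean (Python shown just for brevity; the values are ballpark defaults, not my exact config):

    import sqlite3

    con = sqlite3.connect("app.db")
    con.execute("PRAGMA journal_mode = WAL")     # concurrent readers alongside the single writer
    con.execute("PRAGMA synchronous = NORMAL")   # common pairing with WAL
    con.execute("PRAGMA mmap_size = 268435456")  # memory-map up to 256 MB of the database file
    con.execute("PRAGMA busy_timeout = 5000")    # wait up to 5 s for locks instead of failing immediately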
Cool. I'm still trying to wrap my head around how to use it responsibly: Is this a "good, old fashioned" single durable server type of deployment?
I was wondering if you had any experience with, essentially, "always restore DB during startup".
I like my systems made such that the server can die, get scaled vertically and oops data disk wiped, forget to mount volume in a container, etc. without me lifting a finger. It looks like I can do that with litestream, but I always want to hear from people who have tried it in the real world.
> Is this a "good, old fashioned" single durable server type of deployment?
Yep!
> I like my systems made such that the server can die, get scaled vertically and oops data disk wiped, forget to mount volume in a container, etc. without me lifting a finger. It looks like I can do that with litestream, but I always want to hear from people who have tried it in the real world.
I can't really help you there unfortunately, but I know fly.io is trying to offer something like that.
Oh the complexity and pain I go through in my quest for "keeping it simple."
It's fine, my server for stupid side projects is a cheap VPS with a systemd-based configuration anyway, I am just trying to figure out all the failure modes here vs "apt-get install postgresql", including, most likely, myself.
In my tests sqlite was around 10X faster than postgres.
That means a single cheap server is capable of going very, very far for normal web workloads.
I think we often add complexity (virtualisation, cloud, a separate db server, horizontal scaling) when efficient and simple tech is able to go very, very far.
Trying to store JSON-like data in a way that's both compact and fast to operate on directly is a challenge. IIUC this is something SQLite has wanted to introduce for a while, but it took them some time to find a viable approach.
I find it really odd that the decision was made to store ints and floats as text in JSONB. It seems to defeat a lot of use cases for it as far as I can tell. There are few solutions for storing and retrieving/querying unstructured numerical data.
This sounds right. This JSONB seems to be oriented not around semantic parsing, just structural parsing. https://sqlite.org/draft/jsonb.html
Note too that JSON doesn’t really have ints or floats - the number type in JSON doesn’t specify a max size. Most implementations of course do rely on the native number types of their platform, so this implementation choice for SQLite allows them to keep a simpler implementation that sidesteps a lot of complexity that would come from deeper parsing.
The rationale was to design a format that can serve as a (flattened) parse tree for the JSON.
JSON handling functions in SQLite took in textual JSON and mostly spat out JSON text. So their structure was: (1) parse JSON, (2) massage the in-memory representation, (3) serialize JSON.
If you can come up with a format that can serve as the in-memory representation, and persist that to disk, your functions can skip (1) and (3) and focus on (2).
Still, many times you'll need JSON text at the boundary, so making (1) and (3) fast is still needed too.
Parsing and formatting numbers can also be an unnecessary expense, if you do that more often than you actually need it.
The only thing missing from the JSON support now seems to be something like a GIN index on JSONB fields, to make querying on any member of the field efficient.
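The closest thing available today is an expression index on a specific path - not "any member", but it covers keys you know you'll query (sketch):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE docs (body TEXT)")  # could equally be a BLOB column holding jsonb()

    # index one known path...
    con.execute("CREATE INDEX docs_user ON docs (json_extract(body, '$.user'))")

    # ...and queries written with the same expression can use it
    con.execute("SELECT rowid FROM docs WHERE json_extract(body, '$.user') = ?", ("mina",))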
Short version:
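(assuming python3 on your PATH)

    python3 -c "import sqlite3; print(sqlite3.sqlite_version)"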
That prints "3.45.0" for me.If you have https://datasette.io/ installed you can then get a web UI for trying it out by running: