For an app running on one VPS? Sure, this might make a decent amount of sense.
But... for an app running multiple instances, the article suggests a lot of extra complexity. Managing that complexity makes less sense to me than just firing up a managed database server (RDS, or whatever).
Even if you're self hosting, I think running a MySQL/Postgres cluster is a lot less complex than the options this article calls out.
edit: and more to the point - MySQL / Postgres replication is boring. It's done by a ton of deployments, and isn't going to surprise you, at least at smaller scale.
Yeah. I've gotten into discussions on HN before about this. sqlite is awesome when you run it in one instance on a VM or something, but solving migration between hosts or backup via the filesystem seems like a mess. Sure LiteFS helps, but is it better than a managed networked database?
> One huge benefit to SQLite is the fact that it runs as an embedded part of your application
But all production workloads I've been involved with the last ≈5 years have either been containerised apps offloading state to databases/caches/blob storage etc, or have strived towards that.
What I do think would be awesome would be an embeddable Postgres library/binary that could use a single state file on your local filesystem for development and use a networked database in production. Getting the benefits I like most about sqlite locally, and not having to deal with files in production.
> sqlite is awesome when you run it in one instance on a VM or something,
This article really needs to be taken in the context of the author’s target audience: He sells courses that primarily target web dev juniors doing learning projects and small projects. That’s why he includes the “most of you reading this” disclaimer in the article:
> So, can you use SQLite? For the vast majority of you reading this, the answer is “yes.” Should you use SQLite? I’d say that still for the majority of you reading this, the answer is also “yes.”
Pushing strong contrarian opinions is part of his social media marketing style. He’s also pushing a “why I won’t use next.js” article and opinion on his social media, while conveniently omitting the obvious conflict of interest that he sells courses that don’t happen to cover Next.js.
Even with his bias you still saw a lot of agreement to his criticisms on the NextJS subreddit (which you would assume was biased towards NextJS) and the React subreddit.
Last time I truly used NextJS was in 2020 (it was v9), and that was only to make a statically generated brochure type site. I had started the site with Gatsby and didn't love Gatsby, so I switched to NextJS and loved it.
Recently I did start up a Next v13 project using the new App Router. But I haven't gotten that far past the initial scaffold and creating a couple components. Seeing all of these complaints about v13, especially the App Router, are making me want to scrap that and use something else.
By the time you're labeling stating your opinions in public and then acting consistently with them as a "conflict of interest", you've abandoned any ability to get anything useful out of the term.
So if I like apples more than oranges, I write a blog post saying I like apples and won't eat oranges, and I also have a business selling apples...it's a conflict of interest?
If a company sells apples and they start pushing social media campaigns about why oranges are bad and they think you should be buying apples instead, then that’s a conflict of interest.
I think some people are missing the conflict of interest because the author blurs the lines between his personal opinion and his business. He turned his personal brand into his business and uses his name as his brand.
EDIT: I think people are missing the meaning of "conflict of interest". Having a conflict of interest doesn't mean something is wrong. You can be right and have a conflict of interest.
The issue is that conflicts of interest exist whenever someone's paycheck depends on an opinion being true. If someone on Twitter was alternating between trying to sell you vitamin supplements and then posting articles about why those vitamin supplements are good for you, HN would have no problem with pointing out the conflict of interest and taking it into consideration in the context of evaluating the claims.
If someone's entire job and personal brand are built around selling courses for particular stacks, that's important context to bring up when they start writing about why other technologies are bad. Agree or disagree with the conclusion, but you have to acknowledge that the writings should be read with the conflict of interest taken into account.
We're missing the "conflict of interest" because you are abusing the meaning of the term.
What you describe is simply a bias.
Conflict of interest would be if I take money from parties A and B. Party A pays me to give a professional opinion about which nutritional supplement to take, which frontend javascript framework to use, etc. Party B pays me to endorse their special Snake Oil Pills and Ointment or React or something else. I take Party B's money and make those endorsements to Party A without their knowledge and without any actual consideration of what is truly best for Party A. There is a conflict of my interests in Parties A and B.
Your definition is excessively broad and could be used to suggest, e.g., that Nike has a conflict of interest because it is their opinion that you should purchase their shoes.
> e.g., that Nike has a conflict of interest because it is their opinion that you should purchase their shoes.
It's even worse: that Nike has a conflict of interest because they think that wearing shoes while running is good, even though it would be extremely weird if a running shoe company thought that running with shoes is bad.
Remove the business from the equation and the opinion is fine — it's the existence of a business that aligns with the opinion that creates the conflict! Galaxy brain definition.
> EDIT: I think people are missing the meaning of "conflict of interest". Having a conflict of interest doesn't mean something is wrong. You can be right and have a conflict of interest.
He has motivations that make him not impartial, but that's not usually termed a "conflict of interest."
You're just using the term strangely. Usually the phrase "conflict of interest" implies the person has some sort of official commitment / obligation / duty to another party that is put in jeopardy because of another conflicting interest.
But this is just some guy selling stuff. He hasn't made any official commitments; he doesn't have any official duty or obligation to remain impartial.
By your definition, anyone selling anything has a conflict of interest because they want money from their customers, which may be against the customers' best interests.
> If a company sells apples and they start pushing social media campaigns about why oranges are bad and they think you should be buying apples instead, then that’s a conflict of interest.
Pretty sure that's just marketing.
A lot of product pages will spell out features they have and list competitors that don't have those features.
> I think some people are missing the conflict of interest because the author blurs the lines between his personal opinion and his business.
No, what's going on is that you don't understand what "conflict of interest" refers to.
Different people having different interests, as when the apple company says you should buy apples even though you prefer oranges, is just a regular conflict. A conflict of interest is when one person has two different interests. The apple company isn't experiencing a conflict in your example.
He made his courses before Next.js rose to popularity.
I wouldn’t be surprised if he offered Next.js courses if it becomes popular enough in the future. For now, the anti-Nextjs push seems like a clear defensive play to steer people back to the course material he already had for sale.
> What I do think would be awesome would be an embeddable Postgres library/binary that could use a single state file on your local filesystem for development
I wired Postgres up for our local dev. I don't believe in mocking the database, so all our tests and local dev run against a real Postgres instance.
The main tricks:
- Store the Postgres installation in a known location.
- Set the dynamic loader path to the installation lib dir, e.g., LD_PRELOAD.
- Don't run CI as root (or patch out Postgres' check for root)
- Create and cache the data directory based on the source code migrations.
- Use clone file to duplicate the cached data directory to give to individual tests.
One thing I'd like to pursue is to store the Postgres data dir in SQLite [1]. Then, I can reset the "file system" using SQL after each test instead of copying the entire datadir.
Yep, I used that trick as well. The evolution we went through:
1. Each test runs all migrations on a fresh database. Each test spends 2.1 seconds on db setup.
2. Each test suite runs all migrations. Each test copies the template database from the test suite. Each suite takes 2.1 seconds to run migrations, but cloning a template database takes 300 ms.
3. Bazel caches the data dir and rebuilds it once for all test suites. Reduces the initial test suite setup from 2.1 seconds to 160 ms (to copy the datadir).
4. Each test in a suite uses clonefile to copy the datadir. Reduces db setup overhead from 300 ms (to copy a template database) to 80 ms (to clonefile a datadir).
Currently, most of our testing overhead is clonefile and cleaning up the datadir. I'm interested in a single file sqlite FS because I could be clever and use LD_PRELOAD to replace Postgres's file system operations with sqlite, avoiding
most syscalls altogether.
Nice, any chance you’ll package all this up one day? I appear to be getting better numbers than these right now but I suspect it’s because the DB is simpler and fairly empty to begin with, so presumably we could be going even faster.
I've tried--macOS doesn't make it easy. It's not a clear performance win since tmpfs with Postgres fsync=off is mostly memory-backed storage via the page cache already.
Another problem is that parallel tests can exhaust memory.
The slowest part of our tests is syscall overhead. A mostly empty Postgres data dir for a medium-sized database with a few hundred tables consists of 3k files. On my M1 macOS, it takes 120 ms to delete the entire data dir. Copying is cheaper at 80 ms using clonefile.
> What I do think would be awesome would be an embeddable Postgres library/binary that could use a single state file on your local filesystem for development and use a networked database in production.
I mean this is more of less what using a local container for Postgres is. It’s not exactly one file but it’s hidden in a volume somewhere where you don’t need to go file digging, so for all intents and purposes sure, it’s one file.
Most ORMs also will let you swap out your RDBMS so you can run sqlite locally and Postgres in prod via the same code. I think this is a bit of an anti pattern though. While the two are similar in most standard use cases, it’s not always apparent when you veer off that standard use case and into something that’s not supported by sqlite. And then all of the sudden you either have a divergence between what works locally vs in production or you start to intentionally limit your app’s capability to only what works with sqlite because you can’t locally validate it.
My general pattern these days is a compose file with postgres and another container to run my migrations (along with anything else I need). Then I have an integration test suite that hits all the code paths that touch the database. The whole workflow gets run again in CI but with the new image via a profile in my compose file. Its so fast and easy to do at this point.
> What I do think would be awesome would be an embeddable Postgres library/binary that could use a single state file on your local filesystem for development and use a networked database in production.
Absolutely! And also, short of that but incrementally building toward it, some sort of simplification of initial db setup and configuration, including really solid, thoughtful defaults (many of them are but I am guessing they could be substantially improved).
I feel like half of people’s attraction to SQLite is the easy setup and first deploy story. Postgres will never match that but could get closer.
In production, use a managed service (RDS, Cloud SQL, tons of others...)
In dev, set up a Docker container.
You won't spend a lot of time on either of these things.
I mean the filesystem is going to be used somewhere of course, and everything depends on your setup / the scale of things.
My point was more that with a networked database, you can easily centralise your backing up of databases. If you are in the cloud with managed databases, the backing up is kinda solved out of the box. If you are doing it yourself or on-prem, you probably have a database cluster of some kind where backing up can be managed (which of course involves a filesystem at some point).
But if you are using sqlite, every sqlite file across your organisation needs to be backed up. You'll probably have some scripts creating the actual snapshot and then some other scripts shipping that away somewhere. Those scripts probably need to be quite bespoke for different hosting solutions. Are you going to ssh into the vm's running the apps and do the backup that way, or will you be running the backing up scripts on each vm?
Of course if you have a single VM hosting a few apps with one sqlite database each, none of this is an issue, and it might be easier in the short term than a networked database. Or even if you are operating at "scale" and treating your VM's more like pets than cattle, it's probably not going to be an issue either.
Why is this considered "not using the filesystem"?
What's the conceptual difference between backing up your Oracle database to an Oracle Cloud Infrastructure bucket and backing up your sqlite database to an S3 bucket?
Your comment implies that running a database cluster is easy, or easier than what OP describes. And while I do not endorse his suggestion of using SQLite in a multi node setup, I can tell you from personal experience that running a multi-node RDMS is hard as hell.
In $oldjob I designed and managed a system that was eventually running many hundreds of MySQL 3-node groups. MySQL in particular has a lot of nobs that you need to understand to set up a cluster that doesn't lose data during a fail over (semi sync, sync bin log, trx commit flush, ....). And then of course: neither MySQL nor PostgreSQL have a built-in mechanism to handle fail overs at all. It's another software stack around that, or even something custom built.
Again, not saying I agree with the author. I'm just saying you're portraying it easier than it actually is.
Tangentially related: my personal open source project (ntfy.sh) is starting to outgrow a one node SQLite-backed setup, and I am now faced with the decision we are talking about here (something I would not have imagined, I might add): do I use some SQLite-replication thing, or do I rewrite it all to work with a RDMS and deal with managing that? It's quite the annoying choice, so I really wish there was a great SQLite backed solution, hehe.
If by "outgrow" you mean you want to improve concurrent read/write performance, availability, and throughput via replication, then I would -- personally, for peace of mind and ease -- look at just buying a managed RDBMS from someone with predictable pricing and using that. Buy PlanetScale, Aurora, Neon, CockroachDB Cloud, etc and let them handle it. PlanetScale and Aurora in particular have plans that let you fix the I/O costs and only pay for compute and storage, which is pretty attractive. They also use standard MySQL/Postgres, so you can migrate in/out to other providers later. It does mean you must spend time on the migration though, and spend time understanding the implications of your new storage choice (latency, failure modes, etc.)
If you just want peace of mind and better disaster handling, and your current uptime, concurrent workload split, and performance is fine -- maybe your hosting provider eats it for an hour or two and you don't want to be hamstrung -- I would suggest just using a tool like Litestream, replicating the DB to S3, and setting up a hot standby server in some alternative region that you can fail over to and that actively synchronizes the working set. Always keep them up to date with deployment automation. Set up some alerting, and if a failure happens, terminate the main instance with prejudice, let the replica catch up by replaying any latest changes, and reorient your load balancer to point to it (Cloudflare tunnels are a good low-tech solution you could use to do that.) This might imply a small downtime window for the hot failover. For bonus points, you can automate this whole task, and actively perform it regularly, swapping between servers on regular cadence -- thus turning the design from having a primary/standby to simply having two interchangeable systems that swap roles. And then you have good confidence in hot-failover disaster recovery. e.g. just do this whole dance every week on Monday at 1am UTC, and you can have confidence it works and will stay working.
I do not know what other architectural constraints you might have. These are just suggestions. Good luck!
I know it's heresy, but could you defer the decision by sharding? E.g. customers with an email that hashes to a positive number goes to instance A; others to instance B. That might buy you some time.
I have and it works really well. This is less about SQLite not being able to handle it -- because it does -- and more about reliability and availability of the overall system. I don't want one server being down to affect the entire system.
The vagueness of this comment and its perceived derogatory nature is merely a result of my ignorance around these technologies. It was not meant the way you perceived it.
That said, I looked at 2/3 a while ago:
- rqlite looks great and mature, though it has a HTTP interface, so I'll need to rewrite my database layer anyway to use it. At that point might as well to Postgres
- dqlite seemed like a failed Canonical experiment to me last time I checked, but looking again I think I may have had the wrong impression.
There is one edge case where comparison makes sense: read only workloads. If you have a large, mostly static database (e.g. a car parts website) it can make sense to just sync a Sqlite file across all your servers.
On the other hand, read replicas are also the easy case for database clusters. And just making your one database server bigger also scales a lot, there is nothing stopping you from adding a terabyte of RAM.
And just making your one database server
bigger also scales a lot, there is nothing
stopping you from adding a terabyte of RAM.
Underrated development strategy. If your max envisioned workload won't exceed what a single server can reasonably handle, this is orders of magnitude easier&cheaper&performant than anything else.
Additionally, I think a large percentage of the HN crowd underestimates what a single large-ish single server can handle these days. Probably because we are all used to only seeing tiny virtualized slices these days.
192+ threads, 2TB of RAM, terabytes of crazy fast PCIe storage. This was multiple racks of servers not too long ago, now in ~2U and it costs less than a car.
You can get Dual AMD EPYC™ 9634 Processor 84-core 2.25GHz 384MB Cache
with 3TB RAM and 200TB of SSD storage for less than a car. So 336 threads in 2u box it's really crazy.
> I can tell you from personal experience that running a multi-node RDMS is hard as hell
There exist varying levels of complexity. Here at my company, we have a very small cluster comprised of a main PostgreSQL instance and four replicas. It just works and requires very low maintenance effort.
But I imagine that things can get really messy if you're dealing with dozens or hundreds of instances, for sure, especially if you need sharding.
> we have a very small cluster comprised of a main PostgreSQL instance and four replicas
As you said, it depends on what you want:
- If you are okay with losing data in the event of a failover, then a simple setup is fine.
- If you want lossless failovers (i.e. not a single transaction is lost), then that's hard.
- If you want entirely automatic failovers, that's also very hard.
- ...
My experience stems from MySQL, and not PostgreSQL, so this may or may not apply to pg. I wrote about my experience in this blog post: https://blog.heckel.io/2021/10/19/lossless-mysql-semi-sync-r... --- TLDR: Not losing data with MySQL is really really hard.
PG fully consistent automatic failovers are also pretty hard, at least they were 3-4 years ago when we did such a setup (initially it seemed easy, but tests showed its limits, and getting it reliable was pretty hard). Now I heard that there are Kubernetes based solution which are said to automate it pretty well, but have not tried those yet.
What concerns me more than anything with a process like LiteFS is the 'unknown unknowns' perspective of what can go wrong - whereas I think MySQL and Postgres have generally more understood failure modes.
I also think that a lot of people in middle sized projects where they're not big enough to have a dedicated database team, etc will end up using some kind of managed database solution - something like Amazon Aurora, or RDS, or even someone like DigitalOcean's managed database product? Those generally take care of sensible configuration, backup, and failover at least.
edit: FWIW, I've done both ways - self-managed MySQL & Postgres clusters and managed services. You're right - getting MySQL in particular set up to be crash-safe etc requires careful documentation reading, and it's very frustrating that default installs of MySQL don't come already configured like this. In my eyes, safe-but-slow is a much better default than fast-but-risky :/
I think the excitement about SQLite comes from LAMP stack fatigue
Every Wordpress site could have been built on sqlite just fine, with better security too. But beyond that, you probably want a real RDMS with auth and alter table.
Back in the day, WordPress always ran on Apache with mod_php, correct? Wouldn't that be unsafe to use sqlite, since every request ran in a separate thread?
Writes and changes to the database itself are locked to a single process at a time, per your link.
If the load is very read heavy the write locks aren't an issue, but if you're tracking page views, comments or any other write-heavy feature, it'll be harder to get good performance.
If you had something that needed very easy setup but also scalability, you could do it all as a monolith and not have anything else to set up besides just the one app and your certificates, and they'd just all automatically find each other.
I'm not sure how else you could do that besides a DHT without a central database or manually assigning certain pieces of data to specific servers.
Honestly, how big your application needs to be to outgrow SQLite? If you can serve X customers from one server with SQLite, good for you. If you need to serve more people, you should be in a position to afford migration to something more complicated and good for you too. I don’t see a problem here. Let’s start small with the right tools and don’t fall in the premature optimization trap. You don’t need DB cluster, Kubernetes and other fancy stuff when you start your project.
SQLite is not easy to use with migrations, because it doesn't support many ALTER TABLE options [1]: you need to create a new table instead of modifying a column, for example. Also, foreign keys are ignored by default and you need to explicitly enable them after connecting.
Also, column types are not checked and you can easily insert a string into numeric column.
Also it doesn't allow you to use multiple application servers.
SQLite attempts to coerce the data into the appropriate type using the usual affinity rules, as PostgreSQL, MySQL, SQL Server, and Oracle all do. If the value cannot be losslessly converted in the specified datatype, then an SQLITE_CONSTRAINT_DATATYPE error is raised.
To clarify the bit about coercion: "SQLite attempts to coerce the data into the appropriate type using the usual affinity rules, as PostgreSQL, MySQL, SQL Server, and Oracle all do."
Worth mentioning that numeric in Sqlite is still just a float.[0] A table of two rows where column a is 0.1 and 0.2, respectively, sum(a) will not yield 0.3.
In Microsoft SQL Server you should use MONEY or DECIMAL.
In Oracle you should use NUMBER.
In Postgres you should use MONEY or NUMERIC.
In MySQL you should use NUMERIC or DECIMAL.
In IBM DB2 you should use DECIMAL.
These are all backed by integer data storage and arithmetic, but the database handles scaling the values for you, to whatever number of decimal places you have configured. SQL Server and Postgres MONEY type will additionally format values with a currency symbol, when converted to a character string.
In SQLite you're out of luck - if you want accounting values you'll have to store them as integers and scale the values yourself.
Source: I work on a mobile app with offline storage of pricing and weighed quantities, using SQLite in the app, and SQL Server on the back end.
You should still not use MONEY in Postgres, even their own website says so [1].
I still wouldn't use floats/numerics/decimals to store currency either in any db generally, as said by others you're going to end up with inaccurate numbers [2].
Therefore using integers is in fact very good for this use-case, especially if you are in accounting or book-keeping!
Source: I work for a Fintech company that processes millions of payments.
At Prisma, the "twelve easy steps" for performing a table change in SQLite documented on https://sqlite.org/lang_altertable.html is a bit of a running joke.
SQLite is super versatile, but very different from other more traditional RDMBS.
A bit of unfair comment. These steps apply only to ALTER statements that are NOT supported natively by SQLite.
The natively supported statements are as easy to perform as in MySQL or Postgres, for example.
> "The only schema altering commands directly supported by SQLite are the 'rename table', 'rename column', 'add column', 'drop column' commands shown above. However, applications can make other arbitrary changes to the format of a table using a simple sequence of operations.
> Also it doesn't allow you to use multiple application servers.
Not in the same way you use postgres etc, but you can do it with sharding or with LiteFS, but you do have to consider carefully how you scale your app.
I'm not _really_ disagreeing with you, but I think you're painting with a bit too broad of a brush.
There's a workaround for the alter table flaw, but it's a bit involved and takes longer - create a new table with the updated fields, move all data, remove old table, rename the new table. I'm sure there's a historical or valid reason for it, it's just annoying.
Not a solid guarantee. More prone to bugs, errors, etc. What is someone changes something using the comnandline client and forgets to issue the pragma command?
I would feel a lot happier using SQLite if this was a per DB setting rather than a per connection one.
Judging by the documentation, if you issue a PRAGMA foreign_keys; and no row is returned containing a 0 or 1, then you are using an unsupported version of SQLite, or the library was compiled with foreign key support disabled. I am struggling to find any documentation that states if anything will occur if enabling foreign keys in the connection string, when the version does not support foreign keys.
It’s also only two years old which is forever in the web world and brand new by Databases ops/maturity standards. There are likely still warts waiting to be discovered (there always are, but the discovery rate tapers over decades).
One thing I don't get ... are people really having problems getting PostgreSQL, or even MySQL, up and running? Is this REALLY a dev/devops/ops hurdle for people?
Just seems strange that is actually a problem.
Don't get me wrong, I use SQLite ... it's just not the database for my company's management web portal.
Just let the HN hype wave wash over you my friend :)
I see the sqlite hype as a (legitimate) pendulum swing away from defaulting to 'web scale' everything to a realization that most of the systems we design don't need it.
On the other hand, when you start mixing in new tech like litefs to paper over shortcomings in the fundamental nature of an alternative, I start to question the sanity of the choice.
We have screwdrivers, hammers, wrenches, let's not select a wrench and wrap it in a 'screwdriver adapter as a service'.
> I see the sqlite hype as a (legitimate) pendulum swing away from defaulting to 'web scale' everything to a realization that most of the systems we design don't need it.
That was exactly my thought, the majority of work that gets put out there isn't going to need to scale to many different nodes to serve thousands+ of concurrent users. If you're actually serving tonnes of traffic and need clustering or high end redundancy, have at 'er. But for most of the professional work I've done, SQLite is more than enough (provided you back it up properly!)
I think this is also influenced by my philosophy of keep things simple and only scale when you need to. I haven't had need of k8s or anything like that and suspect many companies serving small-medium traffic loads don't either.
> We have screwdrivers, hammers, wrenches, let's not select a wrench and wrap it in a 'screwdriver adapter as a service'.
My hats off to you on this comment, even more so if you are the originator. Personal experience shows that I have added more complexity by spinning up a postgres server 'in case I need it' more often than not.
Why do we need to hype what amounts to a core software for the planet. Sqlite is good in some places. Mysql is good in others. Were all smart enough here to make that decision when to reach for what
I question your metaphor - SQLite has been popular here since I started reading this site. Wave suggests a rise and fall of popularity (possibly in a long lasting cycle)... sqlite has always just been popular and well regarded.
You may be noticing it more out of a frequency bias (baader-meinhoff effect) situation.
"Just starting a postgres" is possible but it's a lot harder than it should be.
Generally, you can just run the `postgres` command, but it creates many unnecessary hurdles:
* Just want to invoke `postgres`? Not possible, you need to invoke `initdb` first. Why can't `postges` do that for automatically you if the DB dir doesn't exist, like other servers do (e.g. `consul`)?
* Depending on your invocation, you're likely to run into errors mentioning "initdb: invalid locale settings; check LANG and LC_* environment variables". Why isn't there a simple UTF-8 default?
* Postgres refuses to run as `root`; that's unoverridably hardcoded in its code. So you can't "just start it" when you're `root` inside a container or VM in which only postgres runs. It can't run well under `unshare` when you want things to shut down reliably on cancellation in CI. This constraint should just be removed; no other software refuses to run as root. This mindset is indeed from 27 years ago.
* You can't start an in-memory postgres for quick testing or CI. This feature does not exist.
All of this is fixable, but it currently creats the "oh you can 'just run' postgres, but not in random conditions A, B, C and D".
I always put a docker-compose.yaml into the repo and in the readme some instructions to install docker and docker-compose and that they should run `docker-compose up`
Works fine for almost any team. Obviously it would be great if they would try to understand it, but it's really not necessary.
Not like most web devs know how most of the tools they use work (Source: All the teams I worked at)
“docker run -ti -p 5432:5432 -e POSTGRES_PASSWORD=dev postgres”? Easy enough to type on my phone. That downloads and starts everything you need and wipes the data on exit. Ideal for prototyping. If you want to keep the data, add a -v flag or use a compose file as suggested below, even easier.
Devs not knowing docker? Just install docker on their machine and ask them run the command above. One day has to be the first.
I agree. SQLite is great for some simple purposes, but this article acts like launching PostgreSQL is impossibly difficult and can only be done on a server 100ms away from your web server.
For simple apps I just run Postgres on the same server and size things appropriately in settings. It’s really not hard and it’s well-trodden territory.
If someone is making a single-tenant app that runs on an embedded device or something then SQLite is awesome.
The way this developer concludes that “most of you reading this” should use SQLite is very strange. Is that indicative of his audience building small or toy websites rather than actual production websites?
Exactly the point here. There’s a bunch of brilliant reasons to use SQLite. One good reason is that by the time you outgrow it, you’re probably shifting enough data for it to be a worthwhile use of an engineering team’s time to plan a migration.
If you use an ORM from the beginning, you might not even have to change very much of your application code.
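A hedged sketch of what that seam can look like even without an ORM: a single hypothetical `connect_from_url` helper routes a DATABASE_URL to the right driver, so application code never sees which engine it's on. An ORM does the same thing, just more thoroughly; `psycopg2` here is an assumption about the production driver.

```python
import sqlite3
from urllib.parse import urlparse

def connect_from_url(url):
    """Hypothetical seam: one connection string decides the engine."""
    scheme = urlparse(url).scheme
    if scheme == "sqlite":
        # sqlite:///app.db -> the path after the triple slash
        return sqlite3.connect(url[len("sqlite:///"):] or ":memory:")
    if scheme == "postgresql":
        import psycopg2  # assumed installed in the Postgres deployment
        return psycopg2.connect(url)
    raise ValueError(f"unsupported scheme: {scheme}")

# Development uses a file (or memory); production only changes the string.
conn = connect_from_url("sqlite:///:memory:")
one = conn.execute("SELECT 1").fetchone()[0]
```

The migration-day diff is then one environment variable, plus whatever SQL dialect differences your queries actually exercise.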
I had one use case at my previous job. It was a bit of management software installed on premises / on physical or virtual servers (it was a bit old fashioned). The total installation involved setting up PHP, Apache, MySQL, and then installing the application itself, which also involved running some migration scripts written in shell scripts.
What I replaced it with was a Go binary with SQLite, where installing it would be a matter of installing the package or just the files and starting it up, it would take care of the rest.
SQLite is great for systems you don't control or have to set up, but maybe not for webservers. Common use cases are apps' internal storage (does not need to be shared with other applications or distributed), you don't want to have to install, configure and run MySQL or Postgres in those cases.
I think you vastly underestimate how many people use SQLite for this. The number of times I've seen someone boot up an entire MySQL server for a dedicated app is far larger than the number of times I've seen them just reach for SQLite.
Indeed, and the constraints that allow for SQLite also allow for running a local PostgreSQL instance, so no service discovery, complicated auth flows, etc. are required. It should be even easier.
SQLite is fantastic, it absolutely should be used where appropriate.
I don't however understand those who argue that replacing the likes of PostgreSQL / SQL Server with it is generally appropriate.
The vast majority of my development time for any project is setting up a server, database, web framework, etc. It's a massive pain in the ass and gets in the way of what I'm actually trying to do.
Administering an RDBMS server is only part of the problem. It's also mentioned quite clearly there that by having data locality (just disk I/O) you get massive efficiency gains in queries and effectively don't have to worry about the N+1 query problem. Queries run in the order of microseconds, not milliseconds because of network latency.
A DB server won't have to go to disk for every query; SQLite almost always will, or at least has to reach from user space into the OS page cache, which is still slower than reading from the server's own buffer pool.
So ya, in process is faster for trivial stuff. But it's not faster when doing heavier workloads that benefit from things being in memory or having a better query planner.
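To make the N+1 shape concrete, here is a small sketch (table and column names are made up). With in-process SQLite each extra query costs roughly a function call; over a network each one would add a round trip, which is why the same code pattern behaves so differently:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

# N+1 shape: one query for the list, then one more per row.
titles_n1 = []
for (author_id,) in conn.execute("SELECT id FROM authors"):
    titles_n1 += [t for (t,) in conn.execute(
        "SELECT title FROM posts WHERE author_id = ?", (author_id,))]

# The single-query rewrite that any engine's planner prefers:
titles_join = [t for (t,) in conn.execute(
    "SELECT p.title FROM posts p JOIN authors a ON a.id = p.author_id "
    "ORDER BY p.id")]
```

Same results either way; the argument in this thread is only about how much the N extra calls cost.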
The query planner is still not nearly as good as PG's, but it's okay for simple applications.
I have actually used sqlite "at scale"... I'm still not sure people know that system (30k+qps dumping data into kafka) is running on sqlite on an EBS volume lol. But in this case it's just streaming stuff from disk, so pretty much any DB would have worked.
There's so, so much missing context from almost every "you should use this tech" post. Software engineering is such an incredibly broad profession. People working on firmware for embedded medical devices and people making line of business software for an insurance company have wildly different choke points on productivity and utility. Budgets vary too. Maybe scaling SQLite can be done for low hosting costs if you just invest 200 hours of labor, but for most orgs, the 200 hours of labor are way more expensive than the hosting costs, so just pay for Postgres on AWS and don't worry about it.
It's not about having problems, it's about the work that it takes to do. It's an extra hurdle that simply doesn't exist for SQLite. For a lot of projects that's worth it.
Your comment is basically the famous HN Dropbox comment but for databases: "are people really having problems getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem?"
It likely has more to do with the “data” team being different than the “application” team at a lot of big companies. It leads to many handoffs and inefficiencies and so people do a lot of things to route around it.
Not a knock against SQLite at all, I use it in production for my own projects and some professionally as well.
Note that your operating system caches file system reads so SQLite doesn't need to put anything extra in RAM. Most reads will be from RAM, so no disk access required.
True but the new hotness is edge and your database can be really far from where you actually deploy. This is what this article is trying to push back against. Mainly because Kent is all about Remix and Remix is more about just deploying to traditional servers.
Yes, the net can be very fast. Of course when you start adding IP, TCP, SSL, HTTP, json serialization, chatty protocols, less than stellar implementation languages... suddenly [1] the network is the bottleneck.
Also, SQLite has a configurable local page cache which it maintains directly in your application's process. There's a very good chance that you're just hitting that in-memory cache directly, let me emphasize, in your own process.
In-process memory cache hit is faster than separate-process memory cache hit over ethernet.
I'm just as much a corner-cutter as the next "just yeet it in prod and we'll deal with it later" lazy grug brained dev, AND I fucking love SQLite...
But saying SQLite is good because there's no latency is weird, and saying that THAT is good because you can keep writing avoidable "I am not just lazy but I don't care about thinking about my database for 30 minutes" code... now that's a Leaning Tower of Pisa of arguments that I can't follow.
Yes. We have a bunch of n+1’s in our product, but they are in places where it literally does not matter. A single record getting pulled down, on a page deep in a UX flow.
It’s not worth our time unless they work into pages with large data sets or core flows.
I have "ignore this n+1 warning" in a few places in our unit tests for exactly this reason (details page, so n will never be more than 1). But these don't cause a problem on any database. So, again, not sure how this is a win for SQLite.
AFAIK SQLite can scale quite a bit with WAL (and the experimental WAL2 branch), and then it is probably easy to upgrade to Postgres. If your data model was bad from the start, it doesn't matter whether you used Postgres initially.
If you find out your data model is bad and a constraint on scaling, you can find that out without spending a lot on insanely scaled RDB instances.
I'm for the build one to throw away approach, and SQLite is often a good tool in the toolbelt for the initial attempt of this.
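For reference, the concurrency headroom mentioned above mostly comes from the write-ahead log, which is one PRAGMA away (WAL2 only exists on a separate SQLite branch; this is plain WAL):

```python
import os
import sqlite3
import tempfile

# WAL only applies to on-disk databases, so use a temp file for the demo.
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)

# WAL lets readers proceed while a writer commits, which is the main knob
# for getting read concurrency out of a single SQLite file.
(mode,) = conn.execute("PRAGMA journal_mode=WAL").fetchone()
print(mode)  # wal
```

The setting is persistent per database file, so it only needs to run once at creation time.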
Now we're getting silly. If you have n+1 issues you already have a bad data model which is why what you are really saying is "SQLite lets you have a bad data model which gives you velocity!!!!!" which is obviously (I hope it's obvious) not a good thing.
From a business perspective: You might not have to solve that problem at all in 6 months, because you might not get so many users for the product, that scalability is a problem. You often need velocity more than clean code, because you are not sure if the idea will get traction at all. In that case, from a business perspective, to be able to experiment you need velocity more than a good data model.
I did this kind of exploratory development both with SQLite and Postgres, and both had different strengths and weaknesses. In one case the purist, clean data model in Postgres cost more than it should have when the product had to pivot. I have also seen the scalability limits of SQLite. I'm just now shutting down a POC project backed by SQLite, which failed to get traction mostly because of business execution problems; SQLite saved a lot of implementation effort, and using a "cleaner" and "more scalable" solution would only have reduced velocity without getting us more users. The system worked fine this way (it even has N+1 queries in some places). Had we got to the point where SQLite became a bottleneck, we could have afforded to clean up the data layer (we had reworked it countless times already as requirements were in flux) and move to a more scalable solution. Others might stick Firebase or MongoDB there, and that also works for a while, and is also a valid decision IMHO from a business perspective.
I think this is a kind of topic where the premature optimization is the root of all evil meme can apply, depending on the product you are making and the available resource pool. Not for the hello world examples of course, and not for the well defined well funded project, but there are lot of exploratory attempts out of the bigco/vc funded unicorn world.
It's frustratingly close to correct. Since it's on local disk, it winds up being cached in in-process RAM, which is..... still not zero latency... but has so much less latency than talking to the database over a network stack... that I'm alllllllllmost sorta kinda okay with it.
Yeah, it's interesting to see what happens when people with confirmation bias hit an unexpected opinion and can't explain it away. So all they can do is ignore it and ask everyone else to pretend it doesn't exist.
Saying it's not uncommon for a database to be tens or even hundreds of milliseconds away is ignorant at best. If you care about performance, you're not jumping across the US for every database query.
Honestly this is the kind of stuff written to fluff out a social presence and offers nothing to a technical audience.
There could/can be cases to use SQLite over SQL Server, MySQL, Postgres, etc. It really comes down to reasoning with the problem.
For me personally, I have only used SQLite for 2 scenarios.
1) When developing. SQLite is a simple method to getting something together, before building a proper MySQL/SQLServer/Postgres one.
2) Client software. Some client software needs a temporary data store, and I find SQLite great for this. These are not complicated databases; they barely have about 5 tables. Nothing more than a queue system, really: items go over the wire via REST API to a messaging queue, and once sent with a reply of "success", the item is removed from SQLite. It is a nice, simple system.
There can be a few other reasons.
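The queue pattern described in point 2 can be sketched in a few lines; `send` below is a stand-in for the REST call to the messaging queue:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path in the real client
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT)")

def enqueue(payload):
    conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
    conn.commit()

def flush(send):
    """Send each queued row; delete only after the server replies success."""
    rows = conn.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
    for row_id, payload in rows:
        if send(payload):  # stand-in for the REST call
            conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            conn.commit()

enqueue("event-1")
enqueue("event-2")
flush(lambda p: True)  # pretend the server acknowledged everything
remaining = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

Because rows survive process restarts when backed by a file, nothing is lost if the client dies mid-flush; unacknowledged items are simply retried.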
Other than the above, I don't think I would be confident using an SQLite file for a moderately sized database, likely to back some web portal or CMS system we created, where more than one request could be talking to the database at the same time. This is where I'd stick to those central, relational databases.
MySQL and Postgres, especially, are not that difficult to install, whether on the same system or its own dedicated server.
I hope my comment is not seen as a negative for SQLite. If anything, my attitude is the opposite. I really like the work and effort that has gone into SQLite. A virtual pat-on-the-back by those involved! It is a great approach to having data in 1 file, rather than writing your own.. or having lots of "records" in their own files (in a directory) which I have done.
Admittedly, I cannot comment on replication options for SQLite; I have not bothered to even look into them. To me, that already goes beyond the purpose of what makes SQLite great: being a powerful SQL database in one file. Great for programs that need a storage solution that doesn't rely on a connection to another machine.
In my client programs using Sqlite, it has been designed in such a way that if the Sqlite file is corrupted then it is not the end of the world. We know what didn't get sent to the server so we recreate the file and resend, etc.
(However, our Sqlite files have been very reliable with the exception of software updates, but we ensure the sqlite db tables are empty before you can update the client software)
That's why I specified "client software" in my original comment, but I don't see why server-side daemons couldn't use it as well. If more than one program needs it for a mix of CRUD operations, though, I would recommend something else.
SQLite is an awesome tool, but I wish we’d stop trying to force it into situations it wasn’t made for. It doesn’t belong anywhere near a web app deployment that currently or will span two or more machines. Once you have a “distributed system” and the complexities of a DR scenario, reach for a tool that was built for such. You can’t go wrong with PG/MySQL. You can go very wrong with a VC-hyped replicated file system from a company known for outages.
Yeah it kinda feels like we're just going back to how things were before everyone decided 12-factor was the new hotness.
SQLite has its uses for sure - just not on a system that might need horizontal scaling, HA, or separate worker processes.
Sure, sure, you can duct tape a few SQLites together but then you're dealing with this
> LiteFS only allows a single node to be the primary at any given time. The primary node is the only one that can write data to the database. The other nodes are called replicas and they provide a read-only copy.
Might as well have just set up a separate DB server, less faff. At the point where you need to scale MySQL/postgres beyond vertical scaling you can probably afford to pay someone to do it for you
I've seen people use a few hundred lines of javascript to create full blown distributed systems that are offline and end to end encrypted and backed by sqlite. They only use the server as a glorified caching system to sync to\from. It's esoteric (logical clocks, conflict free replicated data types), but that's a hell of a lot more distributed than a client-server architecture and it's backed entirely with sqlite.
I mean, this is kind of what I'm trying to get at. You need to build all this bespoke infra in order to fit SQLite into the round hole where a database with native replication is a far more proven and battle-tested solution.
When I say a few hundred lines of javascript (and I'm no fan of javascript) I literally mean a few hundred lines of javascript. No frameworks. No complicated server logic. Just CRDTs and a hybrid logical clock (written in fifty or so lines of JS) and a server running a little message bus. I once heard JLong talk about this, and I demoed the capability to a military contractor in an office in Old Town Alexandria, and they thought it was hilarious. In just a couple of crappy javascript files I was syncing data offline between twenty-five different clients regardless of internal system clock speeds (I used a laptop that was running hot and had a fast system clock) or other bullshit. It was magical. And all of that ran through SQLite. I even used an SQLite PostGIS-like extension during the demo. So yes, SQLite can be distributed in the purest sense of the word.
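A toy sketch of the ingredients being described, not the actual JS implementation: a hybrid logical clock plus a last-writer-wins merge persisted in SQLite. All names and the schema are invented for illustration:

```python
import sqlite3
import time

def hlc_tick(state, wall=None):
    """Hybrid logical clock as (wall_ms, counter). Stays monotonic even if
    the system clock stalls or runs hot, which is the point of the demo."""
    wall_ms = int((wall if wall is not None else time.time()) * 1000)
    last_wall, counter = state
    if wall_ms > last_wall:
        return (wall_ms, 0)
    return (last_wall, counter + 1)

state = (0, 0)
state = hlc_tick(state, wall=1.0)  # fresh wall time -> (1000, 0)
state = hlc_tick(state, wall=1.0)  # same wall time -> counter bumps

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE kv (
    key TEXT PRIMARY KEY, value TEXT, hlc_wall INTEGER, hlc_count INTEGER)""")

def merge(key, value, hlc):
    # Last-writer-wins by HLC order: an upsert that keeps the newer stamp,
    # so replicas converge regardless of message arrival order.
    conn.execute("""
        INSERT INTO kv VALUES (?, ?, ?, ?)
        ON CONFLICT(key) DO UPDATE SET
            value = excluded.value,
            hlc_wall = excluded.hlc_wall,
            hlc_count = excluded.hlc_count
        WHERE excluded.hlc_wall > hlc_wall
           OR (excluded.hlc_wall = hlc_wall AND excluded.hlc_count > hlc_count)
    """, (key, value, hlc[0], hlc[1]))
    conn.commit()

# Two "replicas" write the same key; the later HLC wins either way.
merge("color", "red",  (1000, 0))
merge("color", "blue", (1000, 1))  # newer, applied
merge("color", "old",  (999, 5))   # older, ignored
value = conn.execute("SELECT value FROM kv WHERE key='color'").fetchone()[0]
```

A real system layers message transport and per-field CRDT semantics on top, but this is roughly the whole trick for the clock-and-merge core.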
Running instances, not machines. There's like 15 years of architectural advancement in running web apps that says that more than one copy of stateless software is the best practice. SQLite was not designed to be used for web apps.
HN's bizarre groupthink occasionally decides to subvert architectural norms for fun, and any blog post that's trending on the front page suddenly counts as "good advice". Even if it's literally insane, if it's on the front page it's considered sage wisdom, and naysayers are downvoted for being negative about the insanity.
Those 15 years of architectural norms started at a time when a multi-core 2GB server was an expensive luxury!
We know a LOT more today about building web applications than we did 15 years ago. That's why some of us are ready to say that for a lot of applications, a single server running SQLite (with streaming backups to S3 via Litestream) is actually a pretty great option.
I don't mean use 15-year old patterns, I mean we've learned over 15 years what's good practice and what isn't.
We used to use .csv and .ini files that way. After that we used BerkeleyDB, since it was built into so many tools. Or serialized objects. If you got fancy you'd read/write them to a temporary filesystem, and occasionally sync the files to NFS; if the app restarts, sync the files from NFS back to temporary storage, or if the temporary file isn't there, read it from NFS and write it to temporary storage.
Today, if you have to write a web app that has dynamic content and writes and reads user data, you can do it a bunch of ways:
- "Mostly-JS": Write your app in JavaScript. Some hosted provider out there allows you to submit data to some backend app, and read it from the backend app. (This, by the way, is how CGI-BIN web apps worked 20 years ago)
- "Mostly-CGI": Use some managed hosting provider to host a web app with some framework they support, like Wordpress, or Django, or something else. Typically they bundle a networked relational database with it, but I suppose you could trade that out for an sqlite file. Probably wouldn't support Litestream but you could probably have a cron job copy the file to s3.
- "Mostly-standalone": Buy some virtual machine or physical server, set it up, maintain it, write your web app, run a web server that runs your web app, write the file to local disk, set up Litestream. Never touch it, patch it, upgrade it, etc, so it always has security vulns, but it will just keep running, so it will be much easier to get going.
- "Mostly-managed": Pay for some hosted provider to give you the ability to run your app in containers. You build it, you push it, and then set up the thing to run it. Like the previous option, you can opt to never patch or upgrade anything so there's nothing to maintain. But since you can use a bunch of SaaS providers to build, test, and deploy automatically, it's way less to maintain and much more automated. Again, typically you'd use a networked relational database, since they're largely designed to run one app at a time, but you can customize them to have some background task try to upload your sqlite file out of the container or a sidecar.
- "Completely managed": You upload your code. They completely manage the versions of your app platform, doing patches, upgrades, running different instances of your app, databases, web servers, whatever you need. You basically don't have to think about anything but code. You could use sqlite but there's probably no way to back it up.
If you don't need to read and write user data, you don't need a web app at all. A static site works just fine. Hell, you don't even need a static site generator anymore, just link to the right css or js files in your html to DRY up your static content.
Vast vast majority of web apps run fine on a relatively modest VPS. And almost all will run fine on a beefy dedicated server. No amount of best practice changes that.
Best practice is the practice that works the best. That isn't groupthink. If it works, it works, regardless of whether a single person or a group is doing it. Groupthink would be whatever the group thinks is best, which isn't necessarily what actually works well at all.
The reason why running an app on a VPS is not best practice isn't that the VPS can't handle it; it's not a question of scale. The issue is that there are many different problems with running an app on a VPS, and it is much better to avoid those problems by not using a VPS at all, rather than to spend a lot of time trying to mitigate those problems on the VPS. If there is an alternative that is lower cost than the cost of mitigating the issues on the VPS, then that's the better solution.
It turns out that there are solutions like that, among them Heroku/DigitalOcean App Platform, DigitalOcean K8s/AWS EKS/GCloud GKE, Fly.io/AWS Fargate, Google Cloud Run/AWS App Runner, AWS Lambda/Google Cloud Functions, and many more. Almost all of these are better than running your own VPS, because they are designed to remove the problems you will inevitably run into with your own VPS.
Some people buy cheap things and replace them frequently. Some people invest in better quality that lasts longer. If you'd rather patch the holes in things than buy something that simply lasts longer, a VPS may be the solution you want. But it's certainly not the best solution.
I’ve been using Pocketbase[0] on my projects and I can’t recommend it enough.
It is built on a SQLite db and has real-time pub/sub capabilities. Its JS SDK is incredibly easy to use and setup for CRUD as well. For side projects and some medium tasks, I’d say SQLite/Pocketbase has been super easy to work with.
One possible strategy is to have one directory/file per customer which is one SQLite file. But then as the user logs in, you have to look up first what database they should be connected to.
OR somehow derive it from the user ID/username. Keeping all the customer databases in a single directory/disk and then constantly "lite streaming" to S3.
Because each user is isolated, they'll be writing to their own database. But migrations would be a pain. They will have to be rolled out to each database separately.
One upside is, you can give users the ability to take their data with them, any time. It is just a single file.
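The derive-from-user-ID option can be as small as normalizing the ID into a filename. A hedged sketch (directory layout and schema invented):

```python
import os
import re
import sqlite3
import tempfile

DATA_DIR = tempfile.mkdtemp()  # stand-in for the real per-tenant directory

def tenant_db(user_id):
    """Derive the database path from the user ID, so no lookup table is
    needed; normalize defensively so IDs can't escape the directory."""
    safe = re.sub(r"[^a-zA-Z0-9_-]", "_", user_id)
    conn = sqlite3.connect(os.path.join(DATA_DIR, f"{safe}.db"))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
    return conn

a = tenant_db("alice@example.com")
a.execute("INSERT INTO notes (body) VALUES ('hers')")
a.commit()

b = tenant_db("bob")
isolated = b.execute("SELECT COUNT(*) FROM notes").fetchone()[0]  # bob sees nothing
```

The downside called out above still applies: migrations mean looping over every file in DATA_DIR, but "export your data" really is just handing over the one file.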
As a guy running a few Debian web servers for some small client projects and a few personal projects (some of which are medium sized, e.g. a website that gets 10 million pageviews / month) I actually tried to move from MariaDB to SQLite. The idea seemed great, at least in theory. Mainly to get away from mysqld, which actually crashed on me and corrupted some data once or twice in a timespan of 15 years.. think it was due to package updates every time.
Anyway, query and app logic modifications aside, I quickly ran into two unsolvable issues for me.
1. The lack of a robust web based admin tool comparable to phpmyadmin (no, phpliteadmin doesn't even come close)
2. Almost no ALTER TABLE support (you can rename things and add columns, but not much else). This hit pretty hard.
Plus lots of other idiosyncrasies - no enums, no type for dates, etc.
In the end I decided it's not worth the extra work, trouble and risks and stayed on Maria.
My Datasette tool doesn't quite serve the same purpose as phpMyAdmin unless you're OK with read-only access, in which case it works great: https://datasette.io
I'm finding Sqlite DBs in S3 to be a great solution for ETL tasks.
It's a project where multiple other applications want to process data produced by my application. Instead of forcing them to make API calls to retrieve all of the data, with all the network overhead and latency that entails, we periodically write SQLite DBs to S3 and post an event that the database is available. You can create one database per customer, or whatever your natural partition key is. Then it's just: pull the database to local disk, pull all the data you need with SQL queries, and delete from the bucket once every client has had a chance to process it.
I believe it offers much better throughput than going through an API.
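A sketch of that handoff, with a local directory standing in for S3; the real pipeline would add the boto3 upload/download and the availability event:

```python
import os
import sqlite3
import tempfile

BUCKET = tempfile.mkdtemp()  # stand-in for the S3 bucket

def publish(customer_id, rows):
    """Producer: write one self-contained SQLite file per customer, then
    'upload' it (here, just leave it in BUCKET)."""
    path = os.path.join(BUCKET, f"{customer_id}.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, data TEXT)")
    conn.executemany("INSERT INTO events (data) VALUES (?)",
                     [(r,) for r in rows])
    conn.commit()
    conn.close()
    return path

def consume(path):
    # Consumer: 'download' and then query locally with full SQL, no paging.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)  # read-only
    return [d for (d,) in conn.execute("SELECT data FROM events ORDER BY id")]

path = publish("acme", ["a", "b", "c"])
data = consume(path)
```

The file format doubles as the wire format, which is where the throughput win over a paginated API comes from.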
The following is our current strategy. We used to be all-in on SQLite but we have discovered some nuance as we grow. We do B2B SaaS/PaaS for banking sector.
If we have a customer that is OK with RPO measured in minutes-to-hours and they have fewer than 5k active users, we are perfectly happy running with SQLite on a customer-managed VM and instructing them to use snapshots for backup & restore. Everything on one simple cheap VM in Azure/AWS/on-prem. One storage volume to snapshot. No weird tricks.
If the customer demands stronger consistency between their underlying record system and our system (specifically at time of catastrophe) and/or has more than 5k active users, then we are starting to push towards a separate cloud-native stack. Azure SQL Server Hyperscale with geographic replication to another region within ~100ms of the primary region. I looked long & hard at some of the SQLite replication options, but there is no way in hell I could get them to pass due diligence with the kinds of CTOs I have to argue with regularly.
SQLite is incredible for keeping it light & simple, and we still advocate for this where it fits the overall technical strategy. In terms of performance, before you reach a certain breakpoint, it is very hard to beat SQLite. You have to do it wrong on purpose to make it slower than a single node hosted solution on equivalent hardware.
I'm a huge fan of SQLite! My org's apps use it heavily, often via this simple key-value interface built on sqlite: https://github.com/aaviator42/StorX
Handles tens of thousands of requests a day very smoothly! :)
As an aside, has anyone tried using a RAM-disk as the storage medium for SQLite DB files? We've started experimenting with it lately and results have been promising!
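The key-value-on-SQLite pattern can be sketched in a few lines (this is the general idea, not StorX's actual API):

```python
import json
import sqlite3

class KV:
    """Tiny key-value store on SQLite; values are JSON-serialized."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def set(self, key, value):
        # Upsert: insert, or overwrite the existing value for this key.
        self.conn.execute(
            "INSERT INTO kv VALUES (?, ?) "
            "ON CONFLICT(k) DO UPDATE SET v = excluded.v",
            (key, json.dumps(value)))
        self.conn.commit()

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return json.loads(row[0]) if row else default

store = KV()
store.set("user:1", {"name": "ann"})
name = store.get("user:1")["name"]
```

You get transactions, crash safety, and a single portable file for free, which plain JSON-on-disk stores have to reinvent.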
Yes! Like I mentioned in the other comment, multiple processes cannot access the same :memory: sqlite db, which makes it unsuitable for our use cases.
SQLite is cute and very appropriate in certain cases:
- Local storage for a mobile/desktop app, instead of a DIY file type.
- Local cache storage for a distributed system, instead of a DIY file type.
While you can use it as a general backend, and some people work hard to make it usable in distributed systems, it continues to be a fancy elaborate project to turn SQLite into something it isn't. It's a thought experiment that happens to get corporate funding. It's fun and interesting in the same way as Dogecoin is.
There's plenty of free DBaaS these days that are frankly incredible and FREE. I work for one, so I might sound like I'm shilling that space. But I recently went to work for one exactly because I think the new offerings are an incredible leap forward in developer productivity, and there's a lot of cool stuff to be done here. I could instead have gone to work for one of those who are working on SQLite, but I just don't believe they're real products for real web workloads. They're a marketing catch for curious learners and experimenters. I respect the intellectual curiosity but it's not how I would build anything.
If you're building a business, the Free tier should be a starting point, not a forever solution. Either the business dies, or it grows and you'll graduate to a paid option very quickly.
Sure, but building a business isn't the only reason for people to select a hosting solution.
Personal side projects are the most obvious example.
I do a lot of work in journalism. Newsrooms don't have a budget for maintaining interactive apps they built for a story that ran ten years ago, so they often made the sensible (at the time) decision to deploy them as a free, scale-to-zero Heroku instance.
Those apps are all gone now, thanks to Heroku bait-and-switching.
Many of those same newsrooms were paying customers of Heroku for other purposes!
I don't know what SQLite offers over a 5-line docker-compose.yml running PG locally for my project. It's ready for scale-up when needed, and supports more features: foreign keys without manually setting a flag, proper types, etc.
I get SQLite for mobile and toy projects but really, a PG set up in docker-compose.yml in your project is the easiest thing to do these days.
Can you expand on that? my use case is pretty simple and I haven't been able to find something on it that really bothers me.
Also, about PG: I run each project on a different port. Not using PG's default 5432 makes it easy to run several projects at once without clashing, just like you would with several SQLite files. I don't really see any drawbacks in using a little docker compose for your local dev set-up.
For something as simple as having your own little DB side by side to your project, it is really not that complicated. A 5 line docker-compose.yml gets you a fully encapsulated PG instance on a custom port. One per project.
I hope you give it a try someday if you get the chance - also, it's a process in a "jails"-like environment, not a VM, not another full blown kernel running - so really the performance overhead is minimal to none - just like if you were running PG "locally" (well, you actually are)
I actually get to see a little more of databases than most so:
"SQLite is a sql-based database with a particularly unique feature: the entire database is in a single file."
Digital Equipment Corporation (DEC) developed a SQL database known as Rdb for VMS. It could store everything in a single file. Oracle bought it, and maintains it here:
Serverless doesn't mean hard-drive-less, it just means that you, the person deploying the software, don't have to manage servers, only your application which may or may not access files on the filesystem that may or may not be a local disk.
If I was going to give folks advice, it would be to use an ORM, instead. That let me build low-investment prototypes using SQLite. When I outgrew it, I could easily switch to PostgreSQL by adding the appropriate test framework plugin and changing the database connection string.
SQLite, for all its features, has some pretty significant limits. In my case, I needed better date/time and decimal type handling than what's on offer in SQLite. I still test my code against SQLite because that is a useful way to lint my data model, plus it's nice for small-scale proof-of-concept deployments, but I expect deployers to use PostgreSQL or MariaDB in production.
Edit: I've had a chance to read the article. The comment about "zero latency" is wrong because it ignores disk I/O. And yes, I know SSDs are super fast. Still not zero. The bit about Docker Compose rings hollow. It sounds like they need to learn how to use multiple Compose files, e.g., https://mindbyte.nl/2018/04/04/overwrite-ports-in-docker-com.... As for development and testing, mock database setup should be handled by your unit test framework. I'm familiar with pytest, which makes testing the same model or ETL across multiple database fixtures very, very easy. SQLite has nothing to do with that.
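Agreed on the test-framework point; the same idea in stdlib unittest (a pytest fixture parameterized over sqlite/postgres URLs is the fancier version). The function and schema here are invented for illustration:

```python
import sqlite3
import unittest

def top_customers(conn, limit):
    # Code under test; trivially simple here just to illustrate the fixture.
    return [n for (n,) in conn.execute(
        "SELECT name FROM customers ORDER BY revenue DESC LIMIT ?", (limit,))]

class TopCustomersTest(unittest.TestCase):
    def setUp(self):
        # Fresh in-memory database per test: the framework owns DB setup,
        # not the application, so SQLite here implies nothing about prod.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE customers (name TEXT, revenue REAL)")
        self.conn.executemany(
            "INSERT INTO customers VALUES (?, ?)",
            [("ann", 90.0), ("bob", 10.0), ("cat", 50.0)])

    def test_ordering(self):
        self.assertEqual(top_customers(self.conn, 2), ["ann", "cat"])

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TopCustomersTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Swapping the fixture's connection for a Postgres one exercises the same model code against the production engine, which is the linting-both-ways benefit described above.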
I see posts like this and I can't help but think: This sounds exactly like something an academic would recommend, and then you hit the reality of industry engineering.
Its a similar take to a lot of the negativity surrounding Next 14's stabilized server actions. The negativity is academic; the productivity is industrial.
Here's my counter-hype take: SQLite actually kind of sucks. It has a place, but that place isn't significantly different from where it was five years ago despite all the "serverless read replicated VC funded hacker news hype startups" work that's happened since then.
If you're in crazy-enterprise hell-engineering; no one is going to reach for SQLite when more robust alternatives like Postgres exist.
If you're trying to get something out the door fast, you've got Postgres on Supabase, you've got MySQL on Planetscale, you've got Firebase, fifteen years of database-platform development. All of these aren't just cheap, they're free, and zero maintenance. You're not going to pay more money to do more work for a worse product by setting up SQLite on your single DigitalOcean VPS. You probably won't even reach for something like Cloudflare D1; sure, it's interesting, but why? It's just Planetscale, but not better in any way and worse in plenty (it's not even really much cheaper).
SQLite barely supports ALTER TABLE (you can add, rename, and drop columns, but little else). It doesn't have a date-time type. It doesn't enforce column types outside of strict mode, which isn't enabled on e.g. Cloudflare D1. Someone stop me. Everyone says "KISS, you can get so much performance out of one application server with SQLite running locally," but you can get the same f^cking performance out of one application server and a DBaaS like Planetscale, and it's easier, and it's cheaper, and you get backups, and you get full MySQL, and you get a paved road to actually paying them $30/mo if you need to, instead of hitting a bottleneck and suddenly having to wonder how the hell you're going to scale SQLite beyond this one instance ("we need VC funding for more engineers," you'll think in that moment).
Kent's statement "However, SQLite is capable of handling databases that are an Exabyte in size" actually makes me mad. It's so ridiculously academic I'm aghast anyone takes this seriously. "Oh, well, hc-tree uses 48-bit page numbers so it's an exabyte, and you'll hit other application problems first anyway." No; you'll hit problems with SQLite first, ten times out of ten, long before you hit an exabyte. It'll be before you even hit 100 GB.
I’ve helped too many teams migrate from SQLite (after a ton of growing pains and suffering from tech debt) to agree with this.
I think SQLite is awesome for some things. It’s also hard to beat in some cases. But if you use it in a non-ideal situation, you need to know when, why, and what your refactor path will be to get to the right technology. And there should be a good reason for doing that, and I can’t think of one that makes sense.
The teams I helped often said something to the effect of “SQLite is great because it got us this far!”, and while I appreciate the optimism, they wouldn’t have gone a shorter distance with MySQL or Postgres. They made the wrong choice, period, and lost a huge amount of productivity to it.
Yet there are people out there who have chosen something like firebase where SQLite would be perfectly fine and far simpler, too. I love SQLite, but I think it’s promoted incorrectly at times. It makes for some brutal growing pains when it’s used in the wrong places.
I get where you're coming from, but a few of these examples didn't grow in that sense. It's more that they evolved and discovered how SQLite couldn't meet their needs efficiently (or at all). It became technical debt before they could properly get off the ground.
I fully agree that there are projects out there which don't need to grow, won't grow, or otherwise perfectly suit the use of SQLite. I've encountered a lot of cases where it was simply the wrong choice, though. Growth or no growth.
I think Kent likes doctrine and tends to encourage practices without addressing nuance sufficiently. The kinds of people he teaches aren't likely to understand where SQLite falls short, and it's also difficult to explain to people with less experience. The advice rubs me the wrong way as a result; in practice, I don't see people use SQLite properly more often than not.
I'd say the same thing about MongoDB. It gets abused like crazy. It has a great fit in some applications and I love using it when that's the case. Yet it was promoted as the easy and scalable database for ages, and I can't count how many projects I've encountered which were badly encumbered by the misuse of that database. Is it a bad database? No, it works well in the right place. In the wrong place, it's a remarkably poor database though. SQLite is much the same (though I'd argue a better piece of technology all around).
^ NixOS one-liner to get a local Postgres going. I tried SQLite for a single VPC startup MVP but then just used Postgres because it was just as easy and it's just better to build backend software with.
And then when it made sense to use a managed Postgres instance, it was trivial to migrate.
The first two items are just wrong. I use PostgreSQL like SQLite all the time. That is: colocated with the application, connected only via UNIX socket. In these cases, the database is not meant to be a separate application, so I don't treat it as something separate.
Ironically, this is also the easiest way to set up PostgreSQL. At least in the distributions I use, it is already configured to connect locally over UNIX sockets, and all I have to do is create a system user to connect with.
PostgreSQL has a bad reputation for being difficult to set up because people expect it to be exposed on the network by default, and then they have to dig into the obscurely named pg_hba.conf and change multiple lines.
There is ongoing cargo cult wisdom that "the RDBMS HAS to be a standalone service" which is just nonsense and extremely counterproductive for the median use case.
I'm still soyfacing over sqlite and how elegant and powerful it is, and I'll continue deciding software architecture based on random blog posts. I have learned nothing.
WAL Mode worked as advertised. But the project ended up being split into several autonomous workers, and even with WAL mode, sqlite gets problematic when you have several separate processes trying to read from and write to the same file.
I have a desktop app that stores all its info in a single XML file. The XML file is read into memory and written to disk when they save. I lock the XML file so that only one user can have it open to write. Customers never need more than a few thousand records and it works fine. But some of my customers want to be able to read/update data from more than one PC/Mac on a local network. Performance requirements are low (generally <10 concurrent users and <1 update per second). I don't want to use a heavyweight database. I wondered if SQLite could be a way to handle this? But the SQLite FAQ seems to have quite a few caveats about accessing it from multiple processes across a network. Or is there a simpler/more reliable way to transition a single-user app to multi-user?
This is one of the few cases where I would recommend steering clear of SQLite, because the maintainers are very clear that running it on a network file system with multiple processes accessing it is a bad idea.
SQLite is great, but the article misses the greatest weakness: It is almost entirely untyped. Feel free to specify fancy types for your columns. But SQLite doesn't care in the least. You can create an INTEGER column and write text to it. Or bytes. Or anything you'd like!
The other thing I miss often is better support for schema migrations. For a lot of things (like adding certain constraints after table creation) the "create new table, copy data over, delete old table" workaround is needed. However, it's only a minor annoyance, and if it allows the codebase to be more maintainable, that's OK with me (given the fantastic quality track record of SQLite).
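The rebuild workaround described above is mechanical enough to script. A minimal sketch in Python (the table and the constraint being added are invented for illustration): adding a CHECK constraint by creating a new table, copying the rows, dropping the old table, and renaming — all inside one transaction.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
INSERT INTO users VALUES (1, 'a@example.com');
""")

# SQLite has no ALTER TABLE ... ADD CONSTRAINT, so rebuild the table:
con.executescript("""
BEGIN;
CREATE TABLE users_new (
    id INTEGER PRIMARY KEY,
    email TEXT CHECK (email LIKE '%@%')
);
INSERT INTO users_new SELECT id, email FROM users;
DROP TABLE users;
ALTER TABLE users_new RENAME TO users;
COMMIT;
""")

# The data survived, and the new constraint is now enforced.
rows = con.execute("SELECT id, email FROM users").fetchall()
```

(The SQLite documentation describes a fuller version of this procedure, including handling indexes, triggers, and foreign keys, which a real migration would need to follow.)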
SQLite has a solution for this! They're called STRICT tables, and they tell SQLite to get mad at you when you try to put a string into a number column, etc.
I think this speaks more to the target audience of the courses he sells:
> So, can you use SQLite? For the vast majority of you reading this, the answer is “yes.” Should you use SQLite? I’d say that still for the majority of you reading this, the answer is also “yes.”
If you’re making a simple proof of concept app, a side project you just need to get working, or a toy project for learning then SQLite will be fine.
If you’re trying to learn how to build maintainable web apps for a business, getting PostgreSQL or MySQL or similar up and running shouldn’t be that big of a hurdle.
He also either doesn’t understand the limitations and complexities of using SQLite (migrations, foreign key quirks, other issues mentioned in this thread) or he’s deliberately ignoring them because it would weaken his point.
For what it’s worth, we’ve had to un-teach some of this author’s material to junior devs after they read into it a little too literally. He likes to push his way as the “right” way to do things, which can turn into a cargo cult mentality for junior devs buying his courses. He pushes his new “Epic web stack” in the same way.
I'm logging traffic data (IP-pairs and packet+byte-count) into a SQLite database on a couple of machines, and what bothers me is that I'm unable to query them whenever I want. So the files end up being rotated whenever I want to inspect one closer.
While it's good to have a simple storage solution, it does have its drawbacks.
Then again, yesterday I was working a bit with MBTiles, which are SQLite databases with per-z,x,y protobuf blocks of vector data, and in that case it does make sense to use SQLite. Or the way browsers use them to store bookmarks and stuff.
Started a Minimum Lovable Product last week that helps people work through their anxieties or fears.
Using SQLite as a backend for everything from session management to logging. No ELK stack, no separate DB. Just a single file.
Development is so simple so far. Maybe I’ll need to move off of it later, but who knows maybe not. And the speed of development and simplicity is what matters more now.
A type system like what programming languages have doesn't make sense for databases.
For inserts, sure (but then, it's straightforward to use the host language for that).
But once you move beyond simple cases, data changes over time, not atomically. It's also common for data to be distributed. A static type system can't handle either of these things.
If your database can't go down for the duration of a migration (which will need more time now, to type-check all the data), and you don't have a way to ensure all clients are updated at the same time also, then you can't use static type checking like programming languages have.
I’ve used sqlx and SQLite for a lot of projects, and sqlx’s macros have prevented me from inserting the wrong types into columns, but I agree that it’s not a viable long-term solution, especially if you don’t use sqlx macros in the first place, which is not something you can always do.
On the contrary, I think that everything should be fine with Diesel since that leverages the rust ecosystem to manage types and migrations of the sqlite database.
Postgres is so incredibly easy to setup and has really low system requirements for simple applications and small quantities of data. SQLite has some really interesting features when it comes to portability - it's a great solution if you want to be able to trivially bundle your application and data state together and distribute it as a binary... but the barrier to running a postgres instance is incredibly low.
For simple applications and small quantities of data Postgres is a lot more complicated than SQLite. Just the requirement of a database server is a hurdle. Factor in that the data is stored across multiple files in some different directory, and you’ve got a lot of headache when you’re just dealing with a simple application and small amount of data.
True, but it’s still a separate server process, with access control and interprocess communication requirements. SQLite is just a C library so there’s no separate process to talk to.
On a serious note, there is definitely a time and a place for SQLite - it's a great technology - but the engineering being done to make it scalable across multiple servers etc. seems ill-placed. For applications of moderate complexity or scale, I suspect developers will quickly see issues trying to monitor and interact with SQLite in ways that they wouldn't with other tech.
SQLite is the MS Access of this generation. MS Access was an absolutely wonderful database when you were hosting multiple small sites on a server and wanted quick deployment, easy setup, and segregation. It connected easily with ODBC to a website and was robust and feature-rich enough for hobby projects. There was no way I would want to bet the farm on it for an enterprise project.
That's demonstrably not true, and not a useful comparison.
A growing number of real-world production apps are running on SQLite these days, precisely because it's so mature, robust and performant at this point - and, unlike Microsoft Access, is very actively maintained.
SQLite is by far my favorite database for any client-side application - mobile and desktop (and perhaps embedded). I still wouldn't use it server-side for anything serious.
While you can do it, especially with tech such as LiteFS, I much prefer using database systems with built-in support for replication, failover, and splitting data storage from the app server.
For side projects or non-prod environments where a single instance is fine, I use sqlite + litestream and it works reasonably well. I've looked into the more complicated setups with LiteFS/Consul/RAFT and at that point I'm not sure you're saving yourself much headache over running your own DB.
Zero Latency - irrelevant for 99% of use cases
One Less Service - premature optimization - saves almost nothing
Multi-instance replication - where is the advantage over say postgres?
Database size - irrelevant for 99% of use cases today
Development and Testing - no advantage here
It's always much easier to work with fully hosted turnkey-ready solutions like Supabase, Planetscale, Neon, or the like. And you get all of their huge benefits. Latencies of a few ms are hardly an issue for most use cases.
SQLite is great, the article is not. The author seems to be compiling a bunch of marketing claims and trying to sell them as best practices. Anyone seen exabyte-sized SQLite databases in production?
This post would have said "MongoDB" years ago, and soon one will say "Postgres" instead, and then after that whatever's trendy next. Probably CSV files.
You should probably be architecting your application to use the data management solution that fits your use case.
Honest question from someone currently in the middle of implementing my first SQLite deployment for a local data store in a desktop application (and is rather new to software development.)
Is SQLite not ergonomic? I’ve previously played with mongodb through work and it was super simple and intuitive. But SQLAlchemy is super verbose and not at all intuitive despite being intellectually easy to grasp. It feels to me like it has decades of cruft built into the syntax and while I can appreciate that it means options it has not at all been nice to work with. I understand the issues with mongodb, but I managed to get it working in 30 minutes. Meanwhile, I’m 3 days into trying to get SQLAlchemy working. It feels to me that perhaps there may be a hole in the market for “GPT” friendly databases. Because SQLAlchemy is so verbose it kills the context window of GPT lmao.
For local/single-instance apps? Heck yeah, sqlite is a great choice. Just wrap all your queries behind functions in a db.py file and never think about application state until you’re forced to. SQLAlchemy is tempting, but honestly sql is already easy enough that I’ve found I spend far less time just writing plain sql queries.
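A minimal sketch of what that db.py might look like (the schema and function names are invented for illustration — the point is just that plain SQL behind a few functions is the whole data layer):

```python
# db.py -- a thin, hand-rolled data layer; no ORM required
import sqlite3


def connect(path="app.db"):
    con = sqlite3.connect(path)
    con.row_factory = sqlite3.Row  # rows behave like dicts
    con.execute("PRAGMA foreign_keys = ON")
    con.execute(
        """CREATE TABLE IF NOT EXISTS notes (
            id INTEGER PRIMARY KEY,
            body TEXT NOT NULL
        )"""
    )
    return con


def add_note(con, body):
    cur = con.execute("INSERT INTO notes (body) VALUES (?)", (body,))
    con.commit()
    return cur.lastrowid


def list_notes(con):
    return [dict(r) for r in con.execute("SELECT id, body FROM notes")]
```

The application only ever calls `add_note` / `list_notes`, so if you later outgrow SQLite, the migration is confined to this one file.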
Most important thing is to get your actual application idea functioning quickly. Sqlite is fast to work with and low maintenance (your db is one file), and later if you found you made a mistake (too slow, no support for advanced sql features you need, you want concurrent access across multiple boxes) you can write a program to migrate your data in a single afternoon. Wasting time on more complex solutions early on is a pitfall that has stolen years of project time from me, just avoid misusing dependencies too badly and get your thing working first.
I can't believe that video games still have a "saving data. Please do not turn off your computer" like two-phase commit hasn't been a thing since 1987.
> SQLite being a file on disk does make connecting from external clients effectively impossible.
Not quite impossible (if you design an appropriate VFS), but it doesn't work as well as a real client/server database, since SQLite is not designed for this use.
> SQLite does not support enums which means you're forced to use strings. ... The main drawback to this is when it comes to the typings for the client which doesn't allow you to ensure all values of a column are only within a set of specific possible values for the string.
In SQLite you can use CHECK in a table definition to require all values of a column within a set of specific possible values.
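For example (table and values are illustrative), a CHECK constraint emulating an enum column:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Emulate an enum: the status column only accepts these three strings.
con.execute(
    """CREATE TABLE jobs (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL CHECK (status IN ('queued', 'running', 'done'))
    )"""
)
con.execute("INSERT INTO jobs (status) VALUES ('queued')")  # accepted
try:
    con.execute("INSERT INTO jobs (status) VALUES ('paused')")
except sqlite3.IntegrityError:
    print("rejected")  # not in the allowed set
```

This enforces the value set at the database level, though as the quoted comment notes, it doesn't help client-side typings know the allowed values.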
However, I think there are some actual flaws in SQLite, such as:
1. It uses Unicode (but only partially; case-insensitive is only with ASCII). This can make it less efficient than it should be if you are dealing with ASCII only, and makes it difficult (and somewhat inefficient) to deal with non-Unicode text. Although it is possible to store non-Unicode text in a TEXT value or in a BLOB value, and to use CAST everywhere or to override the built-in functions with your own, neither is really ideal, for several reasons (including causing some optimizations to not work, making your code even longer and less efficient, etc). It is also possible to patch SQLite to do this, but then if you upgrade, you must patch that one too.
2. The URI file name mechanism is a bit messy. Using a separate argument for the parameters might be less messy.
3. There is no standard way to define the time zone used for functions that deal with local time. (You can override the function to find the current time by the VFS, but overriding conversion to local time is only possible by use of an undocumented function (which is not guaranteed to stay the same in future versions).) Better (in my opinion) would be to add such a function into the VFS (since the current time function itself is in the VFS anyways, so it should go together).
4. It might have been better to store the journal in the same file as the database. One idea how this might be done: Set the read version to 1 and the write version to 3. If you begin a write transaction, lock the database and then make copy-on-write of any modified pages into free pages, but do not change the references to the free pages in the header (so other programs still believe they are free). To commit a transaction, make a table of the required page linking changes in the file, and then temporarily change the read version to 3, and then change all of the links (so that the pages containing the old data would now be considered free), and then change the read version back to 1 (the write version will still be 3) and then unlock the file. To roll back a transaction, simply unlock the file (which will be done automatically if the program crashes); there is no need to write or delete any file. However, one disadvantage of this method is that the database file is now up to twice as big (depending on how many pages need to be modified for each transaction), and there may be other disadvantages too.
5. You need file names (which is necessary for separate journal files, anyways). However, in my opinion it might be better to pass file descriptors or stream objects. (SQLite and some other libraries annoy me that they do not do such a thing, and require file names.)
I always use SQLite as the first database when I start a new project. It is a great way to get started quickly and not waste time while you are figuring out what problem you are solving and how you want to structure your data. On some projects I've been able to make do with SQLite only for over a year before having to bother with other databases.
However, I always design with the intent to use PostgreSQL or some other database. Sometimes with the intent of using two different data stores in the scaled up version. For instance an SQL database for stuff that has low volumes, and perhaps a time-series database for the high volume stuff.
This is why I always design a persistence interface. An internal persistence API layer inside the application. I also write the test suite to this interface. This makes adding support for other databases much easier later.
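A minimal sketch of that idea (the commenter works in Go; this is Python for illustration, with made-up names): the application and the test suite both depend on the interface, never on SQLite directly.

```python
import sqlite3
from typing import Optional, Protocol


class UserStore(Protocol):
    """The persistence interface the application codes against."""

    def add_user(self, name: str) -> int: ...
    def get_user(self, user_id: int) -> Optional[str]: ...


class SqliteUserStore:
    """SQLite implementation; a PostgreSQL one would satisfy the same Protocol."""

    def __init__(self, path: str = ":memory:"):
        self.con = sqlite3.connect(path)
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def add_user(self, name: str) -> int:
        cur = self.con.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self.con.commit()
        return cur.lastrowid

    def get_user(self, user_id: int) -> Optional[str]:
        row = self.con.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def check_store(store: UserStore) -> None:
    # Written against the interface, so the same checks run
    # unchanged against any future backend.
    uid = store.add_user("ada")
    assert store.get_user(uid) == "ada"
```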
Note that I say "add", because I usually keep the SQLite support after I've added support for the databases I want to use in production. Keeping SQLite is extremely useful for when people want to do integration testing or even when they are learning to use your server/service. Rather than having to fire up a database, people can just run with the built in SQLite. You download and run the binary and stuff just works. It also makes cleanup a lot easier. Not least if you run an in-memory database (use ":memory:" as the DB spec).
In the persistence layer I tend to avoid using too much abstraction. The interface is my persistence abstraction. I don't need more layers. I think ORMs are incredibly limiting and unnecessary. However, I do use sqlx (Golang) to make the SQLite implementation of the persistence layer less verbose (faster turnaround when figuring stuff out). I usually use it for PostgreSQL too since the performance penalty usually isn't significant enough to matter. (If you haven't used sqlx before: it is essentially like using the DB driver directly, but with a lot less boilerplate). sqlx is trivial to just rip it out if you think it introduces too much overhead.
Usually you will know ahead of time when you potentially need more than one data store. I tend to use SQLite for storing everything during initial development, but the API is usually designed in a way that doesn't make it awkward to split the store in two or more domains.
I sometimes write storage API middleware. For instance statistics, adapters that can combine different permutations of different storage domains, sharding, throttling, specialized logging etc. I've done ACLs, failover, housekeeping middleware as well, but usually they do not belong in persistence middleware.
I never put business logic or "cleverness" in the persistence layer. It should worry about storing and retrieving data.
You can probably use SQLite as your database for much longer than you think. And some projects honestly will never need more than SQLite. However, I think that if you start doing sharding and replication, and dealing with situations where throughput and concurrency are high, you really want to use something a bit more beefy. You have to weigh how much time you'd spend getting SQLite to do these things against how little effort it is to just fire up database instance(s).
No, it isn't. If we want to be on the front page, we'll publish a blog post; we literally rate limit ourselves on that. We're just a piece of the stack, and this person happens to use us to make a point about SQLite.
LiteFS isn't a Fly.io feature; it's Ben Johnson's open source project, and it runs just fine on GCP and AWS. Try it out!