How the SQLite virtual machine works

manimino · on Sept 7, 2022

These blog posts have been great. I'd love to see a deep dive on the query planner at some point.

I've done lots of benchmarking SQLite while writing ducks [1], and found some interesting stuff. It seems like SQLite will only use one index, even in cases where two indexes would better suit the query. Or, in cases where the query is fetching most of the table, it will use an index even though a full scan would actually be much faster.

Don't get me wrong, SQLite is awesome, and it's quite possible to work around these behaviors. Just interested in why they're not built in.

[1] https://github.com/manimino/ducks

chasil · on Sept 7, 2022

The author of SQLite recently published a paper on olap performance and duckdb that went into greater detail on the virtual machine.

https://news.ycombinator.com/item?id=32684424

Also, there is a "hint" mechanism in SQLite to force the use of a specific index. A variant of it ("not indexed") can be used to force a full table scan, which is appropriate when more than 5% of a table's rows are involved. The docs strongly admonish that these are not "hints," and the optimizer must obey these clauses when they are used.

https://www.sqlite.org/lang_indexedby.html

lilyball · on Sept 7, 2022

The docs also say these aren't meant to be used for performance tuning but instead of protecting against regressions if the schema changes. I guess the idea is once you're done and happy with performance you check the query plan for time-sensitive queries and update the relevant statements with INDEXED BY to reference the indexes they're already using.

chasil · on Sept 7, 2022

In Oracle, stats can change a CBO query plan in ways that adversely affect performance.

To address this, Oracle 7 added hints, and then 8i introduced the ability to "pin" an existing optimizer plan to a query.

Stats are run here and there in SQLite, so I can see how bad stats could throw the cost-based optimizer (CBO) into bad performance.

Stats can be manually run with the ANALYZE command. This should likely be done (among other things) prior to distributing SQLite databases to an audience, or perhaps in weekly maintenance of active databases undergoing extensive DML.

https://www.sqlite.org/lang_analyze.html

manimino · on Sept 7, 2022

Exactly. So the workaround looks like:

- Make one query with LIMIT of 5%

- If it returns LIMIT rows, do a query with NOT INDEXED

Which is surprising; why not build that into the query planning / execution in the first place?

dllthomas · on Sept 7, 2022

The workaround is to understand how the questions you're asking the database relate to the structure of your data. What you describe is a brute force way to acquire that understanding on the fly on arbitrary queries. It might sometimes be the best you can do, and then it might be worth it (benchmark!), but in quite a few cases it'll be quite a bit slower.

manimino · on Sept 7, 2022

That's a great point. The LIMIT workaround takes no knowledge of the data distribution, but an optimizer could do much better with some statistics such as estimated percentiles.

hinkley · on Sept 8, 2022

> full table scan, which is appropriate when more than 5% of a table's rows are involved

Am I wrong in thinking this ratio could be improved? Table scan being 20x faster than indexes seems like it might be leaving quite a bit on the table.

HanyouHottie · on Sept 8, 2022

I can believe it. Accessing sequential data that's being prefetched into cache is astonishingly faster than following references.

samatman · on Sept 7, 2022

Have you brought any of that up in the SQLite forum? Not that you must, but the team loves that kind of feedback. They won't take all opportunities to make queries faster, if it's too complex for example, but they care a lot about speed of execution.

derefr · on Sept 8, 2022

> It seems like SQLite will only use one index, even in cases where two indexes would better suit the query.

If you mean "one index per table per subquery", then AFAIK this is true of all RDBMSes. A query plan boils down to a bunch of "get [stream of tuples looking like X] out of [table Y]" leaf nodes; and those leaf nodes are then plan-optimized to extract those tuples using some particular strategy, which is always a very strict pipeline of "take a data source [table or index]; slice it to some range; then walk it either ascending or descending, filtering it, potentially with some offset+limit; and, if a non-covering index, dereference the index's file-offset-typed value to replace the index row-tuple with its backing table row-tuple."

There's no room in that pipeline for having two indices work together to materialize one scan. If you want the effect of two indices working together, you either have to create a compound index (for a logical conjunction); or you have to use two subqueries and join them with UNION (for a logical disjunction.)

whalesalad · on Sept 7, 2022

cool library, potentially have an immediate need for it will take it for a spin. currently maintaining multiple dict's in order to achieve concurrent (air quotes with my fingers here) writes but as more dimensions are added, more locks are needed etc. memory is cheap so I'd rather just store a big list of objects

manimino · on Sept 8, 2022

Thanks! Hopefully ducks is everything you need it to be.

skadamat · on Sept 7, 2022

> SQL was originally designed for non-technical users to interact with the database, however, it's used almost exclusively by software developers peppering it throughout their applications.

Interesting claim! In my experience, the # of data analysts / BI analysts / etc that write and have mastered SQL vastly exceeds the # of web developers (especially if we focus on developers actually writing SQL instead of using ORMs)

benbjohnson · on Sept 7, 2022

That's a good point. My claim is probably too specific to software developers. Maybe I should have said "technical users" instead.

I was paraphrasing from the Donald Chamberlin quote in the last section of the linked PDF[1]:

"When Ray and I were designing Sequel in 1974, we thought that the predominant use of the language would be for ad-hoc queries by planners and other professionals whose domain of expertise was not primarily database management... Over the years, I have been surprised to see that SQL is more frequently used by trained database specialists to implement repetitive transactions such as bank deposits, credit card purchases, and online auctions. I am pleased to see the language used in a variety of environments, even though it has not proved to be as accessible to untrained users as Ray and I originally hoped."

[1] https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6359709

jgilias · on Sept 7, 2022

I wonder what was the intended way of interacting with the database for technical users?

tyingq · on Sept 8, 2022

There's a pretty in-depth overview of that here: https://people.eecs.berkeley.edu/~brewer/cs262/SystemR.pdf

Where the original Sequel authors took their Sequel spec and implemented it on real hardware + software, with interfaces for both "non-technical" ad-hoc queries as well as integration with PL/1 and COBOL for more technical users.

I have "non-technical" in quotes since in that 1970/80's timeframe, you had to be pretty technical to even have that label.

jgilias · on Sept 8, 2022

Perfect, thanks! Exactly what I was hoping for someone would provide!

benbjohnson · on Sept 7, 2022

I don't have any evidence to back this up but my guess is that the underlying key/value store & cursor APIs were probably the intended way for programmers to interact with the database. SQL came out in the 1970s so every ounce of performance was important.

ok_dad · on Sept 7, 2022

The VM described in this post? I would imagine other databases have a similar VM?

paulhodge · on Sept 7, 2022

Yeah I think it's mostly a piece of fun trivia at this point. If we're talking about designing an interface for non-technical users to interact with the data (an interesting topic), then a modern one would probably look more like the omni-search bars that are part of Splunk/Datadog/etc. Those make it drastically easy to find data (with tableless text-based search by default), and easier to aggregate it too (with the left-to-right pipe | syntax).

xani_ · on Sept 7, 2022

It would be probably other way around if browsers, that just use sqlite underneath, exposed it as offical API for storing data and not that abortion of a database called IndexedDB

sgbeal · on Sept 7, 2022

Funnily enough: the WebSQL standardization effort was abandoned _because_ all implementors were basing theirs on sqlite (citation: https://www.w3.org/TR/webdatabase/). By convention, standards efforts (except for the one Microsoft bought from I$O for their office format) require two competing implementations, in order to demonstrate that the standard isn't entirely dependent on one provider.

bob1029 · on Sept 7, 2022

The EXPLAIN keyword is really awesome. It also allows you to quickly validate SQLite command text without actually executing it. We have found this to be useful for real-time validation of user inputs when working on SQL-based config tools.

Note that EXPLAIN also seamlessly incorporates any application-defined functions, so you can validate everything is as expected. Useful when being clever with DSLs and such.

flakiness · on Sept 8, 2022

I hope SQLite's EXPLAIN were more visual / tree-like. Using bytecode internally is fine, but what users care is typically more high level. Decoding the that high level intention from the bytecode is so burdensome that I often give up and just fall back to try-n-error.

carry_bit · on Sept 8, 2022

If you use EXPLAIN QUERY PLAN rather than just EXPLAIN, you get a high level description of the plan rather than the bytecode.

flakiness · on Sept 9, 2022

Great to know! Thanks!

spullara · on Sept 7, 2022

Most people probably don't know this but SQLite is used under the covers for most Mac OS applications like Photos and Mail. Been that way for a long time and lets you do some interesting data mining.

e.g. https://javarants.com/build-your-own-mail-analyzer-for-mac-m...

pak9rabid · on Sept 7, 2022

I believe it's because these applications likely use CoreData as their data persistence framework, which uses SQLite as the backend store. This has been nice because when I'm doing iOS app development that uses CD, I can simply pop the SQLite database open and query it directly to see what's going on as I'm testing.

adamnemecek · on Sept 7, 2022

It's one of the possible backend stores.

avl999 · on Sept 7, 2022

I wonder how an alternate timeline might have played out if Richard Hipp had not named it "SQLite" and instead called it "SQLightning" or "SQLExpress" or something like that. For much of its lifetime, SQLite wasn't taken seriously despite being an extraordinary technology and part of me is convinced it was in large part due to "Lite" being in the name.

dimitrios1 · on Sept 7, 2022

SQLite has been taken seriously by serious people for decades. We just don't hear about it in the cool, hipster startup circles because it isn't sexy. But it's steadily grown to be the most deployed embedded database in the world, in millions of devices, and have specialized strands that went through military and medical grade formal validations to end up in airplanes, fighter pilots, naval ships, medical devices.

You have to look where SQLite isn't rather than where it is these days.

tptacek · on Sept 7, 2022

The shift isn't in SQLite being taken seriously behind the scenes, or as a better file format. It's in SQLite being taken seriously as an alternative to n-tier database in full stack applications.

rrdharan · on Sept 8, 2022

I don’t necessarily disagree but I still get an Upton Sinclair vibe from fly.io folks declaring that this shift is happening as a matter of fact / by fiat.

tptacek · on Sept 8, 2022

Why? We don't have an SQLite offering; we do Postgres.

sgbeal · on Sept 7, 2022

> ... in millions of devices...

cough _Billions_ cough with a capital "B". Very nearly every non-trivial electronic device built over the past 10-15 years. (Non-trivial being very roughly: "anything with a UI or having the potential to run one.")

simlevesque · on Sept 7, 2022

You may already know but it's SQL-ite (sequel-ite), like graphite, dynamite, sulfite.

https://en.wiktionary.org/wiki/-ite

srcreigh · on Sept 7, 2022

Not even Sequel-ite. But S.Q.L.-ite. That's how Dr Hipp says it, anyways.

sgbeal · on Sept 7, 2022

FWIW... i've been in Richard's presence several times when he was asked how to pronounce it by various people and he's invariably answered, "pronounce it however you like!" (i belong to the ess-queue-lite school of thought.)

dec0dedab0de · on Sept 7, 2022

I love when projects have cute puns in the name. I always am sure to correct people so they don't miss out on the fun. Luckily most developers are pedantic and like language enough that they don't get too annoyed with me. Same thing applies to PostgreSQL, it drives me nuts that sqlalchemy is not sqlchemy.

completely off topic at this point, but Jim Cornette reading paid advertisements for codecademy was very funny during the pandemic. He insisted on pronouncing it code-cademy while his co-host tried to correct him.

avgcorrection · on Sept 7, 2022

Such ambigious names remind of “Rust”. It’s such an um-actually name (um actually it’s not first and foremost about oxidization it’s about fungi…)

randomdata · on Sept 8, 2022

> it’s not first and foremost about oxidization it’s about fungi…

Um, actually this[1] suggests that it originally wasn't about anything other than seeming like a good name and that any deeper explanation that has been given was made up. Of course, by the same token, "it seemed like a good name" could have also been made up. But it appears that the team eventually embraced it in the oxidization[2] sense.

[1] https://www.reddit.com/r/rust/comments/27jvdt/internet_archa...

[2] https://bugzilla.mozilla.org/show_bug.cgi?id=680521

avgcorrection · on Sept 8, 2022

Um actually I believe more in community consensus with regards to naming provenance as opposed to authorial intent

ad nauseam

randomdata · on Sept 9, 2022

Fair, but community consensus is that of oxidization to reflect how the Rust community comes in, takes over, and ruins every discussion about programming by finding some way to mention Rust. It is very much a virtual oxidization process.

Case in point: This discussion was originally about SQLite.

EGreg · on Sept 7, 2022

I had no idea! I was sure it was “SQL Lite” ever since I heard about it a decade ago. Its support for mostly text strings while MySQL and Postgres supported a myriad types made me think it’s just a small embedded “lite” library for when you wanted to bundle a mostly SQL database with your app lol

geewee · on Sept 7, 2022

Huh, I had no idea!

tveita · on Sept 7, 2022

I've used SQLite for toy services "in production", and it was really just as bad as people think it is. Sure, you could handle a large amount of read-only queries, but it only took a tiny bit of write traffic in the mix to make the random latency spikes jarring.

This was pre-WAL, presumably enabling WAL would help a lot (but is still not the default, so beware). But the caveats were real, it's not like people just took one look at the name and though "'SQLite?' I better put a big warning in our documentation to not use this in production."

LAC-Tech · on Sept 7, 2022

This was pre-WAL, presumably enabling WAL would help a lot

Indeed it would!

"WAL provides more concurrency as readers do not block writers and a writer does not block readers. Reading and writing can proceed concurrently."

https://www.sqlite.org/wal.html

I think the people advocating for SQLite to be used in more places are all assuming write ahead logging is enabled.

chasil · on Sept 7, 2022

There are many problems with WAL, as acknowledged by Dr. Hipp in the recent olap/duckdb paper.

The chief problem that I see with WAL is that it breaks ACID with databases that are ATTACHed, as the documentation shows:

https://sqlite.org/lang_attach.html

simonw · on Sept 7, 2022

What are the other problems?

chasil · on Sept 7, 2022

OK, I'll download the PDF onto my phone and get the quote...

...I forgot how significant these problems are. These are quite serious.

"However, WAL mode has notable disadvantages. To accelerate searching the WAL, SQLite creates a WAL index in shared memory. This improves the performance of read transactions, but the use of shared memory requires that all readers must be on the same machine. Thus, WAL mode does not work on a network filesystem. It is not possible to change the page size after entering WAL mode. In addition, WAL mode comes with the added complexity of checkpoint operations and additional files to store the WAL and the WAL index."

https://vldb.org/pvldb/volumes/15/paper/SQLite%3A%20Past%2C%...

https://news.ycombinator.com/item?id=32684424

yread · on Sept 7, 2022

It also complicates backups. I have another process that .backups the database but even when opening it in read-only it creates a .wal file. My first version of the backup script didn't delete it afterwards and the normal process didn't have the right to overwrite the backup's wal file so opening the db failed

chasil · on Sept 7, 2022

Maybe adjust the umask of the backup process?

gigatexal · on Sept 7, 2022

How often is attach really used though?

chasil · on Sept 7, 2022

How often are distributed transactions used elsewhere with two-phase commit?

All the time. I suspect this is similar.

gigatexal · on Sept 7, 2022

Right but that’s bending SQLite a lot no? I mean it’s meant to be used as a file that has a sql interface. But I digress.

taberiand · on Sept 8, 2022

I wonder how many times SQLite has been passed over for a more complex solution because people thought they needed highly scalable distributed two phase commit or whatever when all they really needed was a file with an SQL interface

gigatexal · on Sept 9, 2022

I'm guessing your response was sarcastic. I don't get it, though. If you need a distributed two-phase commit DB get one like CockroachDB or something like that. SQLite is not that.

simonw · on Sept 7, 2022

WAL was added 2010-07-21, so pre-WAL is over 12 years ago now!

hinkley · on Sept 8, 2022

Lots of changes with branch prediction in that time too.

tptacek · on Sept 7, 2022

I don't think it has much to do with the name. I think the biggest perceptual impact on SQLite came from Rails, and its default for using SQLite as a test database while strongly discouraging people from using it in production.

post-it · on Sept 7, 2022

Django too.

> By default, the configuration uses SQLite. If you’re new to databases, or you’re just interested in trying Django, this is the easiest choice. SQLite is included in Python, so you won’t need to install anything else to support your database. When starting your first real project, however, you may want to use a more scalable database like PostgreSQL, to avoid database-switching headaches down the road. [0]

[0] https://docs.djangoproject.com/en/4.1/intro/tutorial02/

tptacek · on Sept 7, 2022

Sure, Django too.

0x457 · on Sept 7, 2022

To be fair, serving any kind of traffic with Rails required multiple replicas of Rails application running. So SQLite wasn't an option for production for any kind of production rails workload.

Then most hosting for rails were stateless, so you had no way of storing SQLite on disk.

And finally, for serious production you need high availability and SQLite couldn't offer that.

Abishek_Muthian · on Sept 8, 2022

Would Go's concurrency be an issue for SQLite too?

Edit: I looked into a common Go driver for SQLite[1] and the FAQ reads,

> Can I use this in multiple routines concurrently?

> Yes for readonly. But not for writable. See #50, #51, #209, #274.

Every time I see a blog post from fly.io reg SQLite I'm tempted to use it for my next project, But the need to rewrite my framework for limited data types and the doubts regarding concurrency keeps me away.

[1] https://github.com/mattn/go-sqlite3

tptacek · on Sept 8, 2022

My general M.O. for SQLite in Go is to have a single writer thread with a channel of closures to apply transactions, and a shared db handle for reads. Works fine.

Abishek_Muthian · on Sept 9, 2022

That's an useful information, Thanks for sharing.

randomdata · on Sept 7, 2022

I think it's older than that, even predating SQLite. Back when Perl/CGI was all the rage it was common to store data in files. But the web was growing fast and scaling files was hard. Pushing that workload onto the heavyweight RDMBS was seen as the solution, reasonably so in some cases, but that also brought the cargo culting along with it.

When SQLite arrived it got lumped in with file storage that can't scale. Rails only perpetuated what everyone was already thinking.

nigma · on Sept 7, 2022

There are many reasons for using other databases like PostgreSQL or MySQL in production - feature set, (managed) hosting, backup and replication, etc.

SQLite is primarily embedded/local database and cannot be easily separated and shared over network [1] between multiple disposable backend/worker instances.

[1] https://12factor.net/backing-services

tptacek · on Sept 7, 2022

Yes, that's what Rails (and Django) say about it.

deepstack · on Sept 7, 2022

yeah when it comes to embbed, if one is willing to go the java road, then H2 is an excellent DB to use. It is some what Postgres compliant.

d23 · on Sept 7, 2022

Interesting. I didn't realize it wasn't for "lite".

A friend of mine made a similar point about GIMP. I'd never thought about it that way. What a shame to be hindered by such a terrible name choice (in GIMP's case).

TOGoS · on Sept 7, 2022

I'm always surprised at how quickly people will dismiss something just because of a name. I once wrote a bug tracking system/wiki for my company to replace the terrible one that we had been using. It did everything we needed and was much nicer to work with, but the reason people gave me for why we never used it was that the name sounded funny. It was a made-up word, totally an afterthought. sigh

busymom0 · on Sept 8, 2022

What was the name?

gnrlst · on Sept 7, 2022

I am a big believer in "nomen omen" -> latin for: destiny is in the name. It's obviously an exaggeration, but to your point, this is one of those cases where the name influenced how the tech was being used (and in what environments), regardless of its underlying potential.

mananaysiempre · on Sept 7, 2022

“Nominative determinism”[1]

[1] https://unsongbook.com/

resoluteteeth · on Sept 7, 2022

I'm not sure the problem is the name.

I think there has been a lot of recognition in the last 10 years that sqlite is actually quite robust, but it still hasn't been considered suitable for serious use is based on how software and database servers have traditionally operated.

It seems like what's changing that now is the recognition that other approaches may make more sense given modern software architecture.

kevincox · on Sept 7, 2022

I don't think stability is the main concern. However for the longest time SQLite didn't have great support for multiple writers and it is still pretty basic. So it works really well for smallish numbers of clients or write-light workloads but if you want to pound it with inserts and updates it still isn't as good of a choice as other RDBMSes.

ternaryoperator · on Sept 7, 2022

> For much of its lifetime, SQLite wasn't taken seriously

As far as I can recall, it was always well regarded as an embeddable database. What makes you think that it wasn't taken seriously "for much of its lifetime"?

aaaaaaaaata · on Sept 7, 2022

By standard web devs.

Been in missiles for a while.

_the_inflator · on Sept 7, 2022

I look at it differently. In my opinion Android started popularizing SQLite. For me kudos go out to Android, for integrating it on a large scale.

thro388 · on Sept 7, 2022

There were other embedded databases; Firebird, Foxpro etc..

sethev · on Sept 8, 2022

It's hard to imagine a timeline where SQLite was more successful.

einpoklum · on Sept 7, 2022

The post does not describe a virtual machine, nor does it really describe how SQLite, specifically, works, as opposed to other DBMSes.

ok_dad · on Sept 7, 2022

The "virtual machine" is not what you would probably consider a "VM" from the usual use of the term, like a "virtual x86/ARM computer" that you can do anything with.

In this case, it's simply a set of variables used to simulate CPU registers, an instruction set that's specific to the underlying database's data format and desired operations, and then some code that "virtually" emulates what a CPU might do if it were designed with this memory architecture (the database file) and instruction set. The VM supports primitive operations via the instruction set; liken this to the machine code for your laptop's CPU, but very specific to the use-case of a SQLite database. The SQL you send into the SQLite db transaction is compiled into this VM's "machine code" and then run on the "virtual CPU" that implements that instruction set, and then it spits out your results. The article was very clear on how it worked.

d23 · on Sept 7, 2022

It describes a virtual machine instruction set along with specific opcodes for the example query as well as a link to the full instruction set. It specifically talks about SQLite. Did you click the wrong link?

mbaris · on Sept 7, 2022

I really like this series of blog posts on SQLite, thanks if you are reading this. It was really informative for me since I did not know much about its internals before

benbjohnson · on Sept 7, 2022

Author here. Thank you for the kind words! I'm glad to hear it's been helpful.

tpetry · on Sept 7, 2022

Are you still planning to write the article about cases when SQLite is not a good fit? Would be really interesting because we only hear stories when someone believes it is the best fit.

benbjohnson · on Sept 7, 2022

Yes, I am still planning on writing about when SQLite is not a good fit. Folks on Twitter were helpful in listing use cases where it doesn't well. :)

orangepurple · on Sept 7, 2022

I am under the impression that sqlite may struggle with many concurrent writers

samatman · on Sept 7, 2022

I'm having a great time with it. SQLite's documentation is so good that it's difficult to usefully add to it, but you've managed that.

Some else said this, but I'd also love to see a deeper dive into the query plan, if you get to it.

adamrezich · on Sept 7, 2022

hey, while you're here, the links to the other articles from this one are broken

benbjohnson · on Sept 7, 2022

Oh shoot, thanks for letting me know. I'm getting them fixed up right now!

endisneigh · on Sept 7, 2022

I see so many SQLite posts these days, but which companies with a lot (>1M) concurrent users are using SQLite in a non embedded fashion?

It just seems so academic. I’d like to use a web service or app backed primarily by SQLite and see how it goes.

tptacek · on Sept 7, 2022

It works fine for read-heavy apps (ie: most apps) and, with things like Litestream, is especially attractive in distributed and edge settings, where you can stream out read replicas to a bunch of regions and direct writes to a single central region.

That might sound confining, but it's what people already do with Postgres: read replicas, single write master.

And the benefits aren't academic: having your database in-process not only simplifies your database code (because N+1 queries are no longer perf emergencies), but can also drastically speed up response times, because you're not dealing with compounding intra-DC latency responding to an HTTP request that incurs multiple database queries.

And, finally: SQLite wants a single writer per database. But there's no rule that says your entire app has to land in a single .db file. That's what you'd naturally do in a Postgres app, but it's not necessarily the natural answer for a SQLite-backed app, where it's pretty trivial to pull from multiple databases with a single SQL statement.

Don't get me wrong, I think it's still pretty niche-y as an alternative to standard n-tier app databases. You'd use it deliberately; it's unlikely to be a full-stack default. But people have definitely been sleeping on SQLite, and I think what you're seeing is some of those people beginning to wake up.

endisneigh · on Sept 7, 2022

> It works fine for read-heavy apps (ie: most apps) and, with things like Litestream, is especially attractive in distributed and edge settings, where you can stream out read replicas to a bunch of regions and direct writes to a single central region.

This is what I'm saying - I keep hearing this, which popular website is using SQLite as their main database?

I don't have any doubt you can serve 1000 users, or even 10,000 current users using SQLite. There's a power law distribution with the popularity of sites. The most popular sites have multiple orders of magnitude.

Can SQLite serve 500,000 current users with 75% reads and writes with the consistency you'd expect?

> And, finally: SQLite wants a single writer per database. But there's no rule that says your entire app has to land in a single .db file. That's what you'd naturally do in a Postgres app, but it's not necessarily the natural answer for a SQLite-backed app, where it's pretty trivial to pull from multiple databases with a single SQL statement.

Sounds reasonable in theory. How do you maintain consistency between all of these databases? Schema migrations? Backups and restores and maintaining referential consistency?

By the time you implement all of this, and you certainly can, why not just use Postgres?

ok_dad · on Sept 7, 2022

> which popular website is using SQLite as their main database

First, I don't think this is a good metric for software quality or use-case analysis. Just because people don't use X doesn't mean X wouldn't be a superior tool for their use-case.

> How do you maintain consistency between all of these databases?

You don't, in the GP's example, you would shard the database schema. Sharding is easy to figure about, because many database services do this already to spread data and analysis across several servers. You might use Litestream to then send those thousands/millions of database files to some S3 bucket somewhere. Then, you could analyze the shards together in that bucket using one of the many available solutions for such things, like Apache Hive or whatever the BigCo clouds provide.

> By the time you implement all of this, and you certainly can, why not just use Postgres?

Because you analyzed the pros and cons to SQLite vs. Postgres for your specific use-case and determined SQLite will be better in the long run. Perhaps your particular use-case will turn up "Postgres" and you won't use SQLite, or maybe you'll use both of them, but for different data stores.

No one technology is "the chosen one", sometimes you need several different options.

endisneigh · on Sept 7, 2022

> You don't, in the GP's example, you would shard the database schema. Sharding is easy to figure about, because many database services do this already to spread data and analysis across several servers. You might use Litestream to then send those thousands/millions of database files to some S3 bucket somewhere. Then, you could analyze the shards together in that bucket using one of the many available solutions for such things, like Apache Hive or whatever the BigCo clouds provide.

I'm not sure if you're being serious. Sharding isn't easy at all. Especially if you're doing it across different files that need to be synced in tandem with schema migrations that also need to be applied in tandem.

> Because you analyzed the pros and cons to SQLite vs. Postgres for your specific use-case and determined SQLite will be better in the long run. Perhaps your particular use-case will turn up "Postgres" and you won't use SQLite, or maybe you'll use both of them, but for different data stores.

The use case is already defined in my original post. A popular app with a lot of concurrent users. People claim SQLite is superior to Postgres for such a scenario, I'm curious to read about and use a service using SQLite for an app with a lot of users (>1M 75% reads)

From looking at the issues for Litestream on GitHub I'm not really convinced it's production ready. For some small app I'm sure it's fine, I'm talking about an app with a lot of users generating a lot of money and you need high availability.

All of this makes it sound like I dislike SQLite, I really don't - I've used it successfully for local apps on the web, Android and iOS. However I feel this popularity for SQLite on the web may be misplaced.

ok_dad · on Sept 7, 2022

> People claim SQLite is superior to Postgres for such a scenario

Only idiots would claim some technology is superior without doing some use-case analysis and experimentation.

> I'm not sure if you're being serious...

I am, but still it depends on use case. If you have an application that can support it for the use-case, then you can do simple things like making one DB per user and then doing aggregate functions on Apache Hive or something. If you have a schema that will absolutely NOT support sharding, then I guess it might not work in that case. I would argue I could design a SQLite solution for 95% of the applications that need a SQL database, but there are some which, for whatever reason, may require other solutions. I am also NOT saying that all of those designs would be the best for that use-case, but it would work.

Unequivocally: Your use cases define the tools you need to use.

> The use case is already defined in my original post. A popular app with a lot of concurrent users. People claim SQLite is superior to Postgres for such a scenario, I'm curious to read about and use a service using SQLite for an app with a lot of users (>1M 75% reads).

Still, you didn't define it. What's the data schema? What is the application? I don't know what to say, if your app is a TODO list app that's wildly popular, then I guarantee I can do what you want with SQLite easily. If it's Spotify-level stuff with special aggregation and data analysis, maybe not?

endisneigh · on Sept 7, 2022

> Still, you didn't define it. What's the data schema? What is the application? I don't know what to say, if your app is a TODO list app that's wildly popular, then I guarantee I can do what you want with SQLite easily. If it's Spotify-level stuff with special aggregation and data analysis, maybe not?

sigh. any app based on SQLite with more than 1 million concurrent users, 75% reads.

ok_dad · on Sept 7, 2022

Here's one specific one, I would imagine they have a pretty big user base: https://tailscale.com/blog/database-for-2022/

Here are a few other sources I googled:

https://www.sqlite.org/famous.html

https://stackshare.io/sqlite

This is the best I can do, because I have no idea what else you want. If I had the time, I could throw up a simulated million users doing a TODO app with SQLite as the backing DB engine, but I don't have time, I have a 2 year old. Do your own research from here, or don't and just use Postgres until you retire, I don't really care that much.

tptacek · on Sept 7, 2022

A million concurrent, with a database-backed website? Is that a useful threshold? How many monthly actives is a million concurrent? And, at that scale, is your database your gating factor, or is app architecture?

kasey_junk · on Sept 7, 2022

Can you name a single application that has 250k concurrent writes (per second I assume as you didn’t stipulate) using a consistent sql backend (that is no external caching/sharing layer)?

I don’t think your example is a good fit for SQLite as it’s not a good fit for any sql store that I know of.

rrdharan · on Sept 8, 2022

Uber, on Google Cloud Spanner, seems to be in this ballpark though they aren’t quite explicit about it:

https://www.uber.com/blog/building-ubers-fulfillment-platfor...

kasey_junk · on Sept 8, 2022

Spanner is explicitly sharded. Sort of famously so.

rrdharan · on Sept 8, 2022

All scale out databases use some form of sharding, tautologically.

The parent asked about use cases where there wasn’t an _external_ (which I interpret to mean “handled explicitly by the app, not the database”) sharding / caching layer.

Spanner internally handles splits based on load / size.

That having been said, more careful reading of the Uber post shows that they still do employ additional app-level explicit sharding to resolve hot-spots, so it’s probably a fair point that this doesn’t qualify as an example the parent was looking for.

endisneigh · on Sept 8, 2022

A few vendors doing time series data. Even if you lowered it a couple orders of magnitude. I haven’t seen any app doing even 5K concurrent writes backed by SQLite.

kasey_junk · on Sept 8, 2022

What time series vendor is offering high write throughput consistent sql?

Legitimately interested as this is something I have more than a little professional interest.

That said, “show me an app doing 5k concurrent writes with SQLite” seems like a much more reasonable standard.

lenkite · on Sept 8, 2022

Don't even bother with a million. Even 1k concurrent users with 75% read/25% write is not something plain SQLite can handle without significant latency compared to Postgres.

Scarbutt · on Sept 7, 2022

sigh. any app based on SQLite with more than 1 million concurrent users, 75% reads.

Definitely not the use case for SQLite unless you can easily do sharding but as you said, why bother?

bob1029 · on Sept 7, 2022

> And, finally: SQLite wants a single writer per database. But there's no rule that says your entire app has to land in a single .db file.

I like finding hints of our architecture scattered across these SQLite threads :D

It's very sketchy trying to slam all users into a single physical database. If you can slice up the users along unit of work, session, customer, account, etc., you can go horizontal very quickly.

sgbeal · on Sept 7, 2022

> I’d like to use a web service or app backed primarily by SQLite and see how it goes.

Look no further than sqlite's own source control system: <https://fossil-scm.org>

i've used that site almost daily since the end of 2007 and have encountered maybe (maybe) three "db is locked" errors.

TillE · on Sept 7, 2022

Not everyone is doing only web development for millions of concurrent users.

SQLite has been deployed on billions of devices, so "academic" is an odd term here.

endisneigh · on Sept 7, 2022

Non embedded. fly.imo for example exposes SQLite in a non embedded fashion.

ok_dad · on Sept 7, 2022

I don't understand your claim, with fly.io it seems like you attach some storage to your application and store the SQLite file there, semi-locally, much the same as any other use of SQLite. The SQLite database code is still inside your chosen application, it's just loading the file from some "persistent volume" abstraction.

jcwayne · on Sept 7, 2022

I'm curious how many companies building for >1M concurrent users have even a tenth of that in reality.

pjc50 · on Sept 7, 2022

That's pretty much the anti-use-case for SQLite?

bob1029 · on Sept 7, 2022

> I’d like to use a web service or app backed primarily by SQLite and see how it goes.

We've been doing this in production for over half a decade now. It goes so well it becomes boring very quickly.

endisneigh · on Sept 7, 2022

Stats on your service/app and workload?

bob1029 · on Sept 7, 2022

We shard out to 1 SQLite database per user's working session, so there really isn't a meaningful figure if you are looking for apples-to-apples in a traditional sense.

Our most heavily loaded SQLite instance probably sees 100 writes per second. In aggregate, we can handle tens of thousands of requests per second.

Abishek_Muthian · on Sept 8, 2022

> 1 SQLite database per user's working session

I presume you've designed your application to accommodate only the data types offered by SQLite and without any concurrent writing?

Thaxll · on Sept 7, 2022

You won't find anyone because it would mean that you have a single service talking to SQLite which most webservices won't do for HA reasons, you have a few of them behind a loadbalancer.

Using SQLite for any serious webservices is bad thing because you can only have a single app. Then you have other tools on top of SQLite for those shortcoming.

tptacek · on Sept 7, 2022

Having a single database service as a write master is already commonplace in HA Postgres.

Thaxll · on Sept 7, 2022

Right but they do that over the network not using the filesystem.

tptacek · on Sept 7, 2022

I'm not sure what's complicated about this. Instead of exposing a database instance as the write master, you expose a service as the write master. You can even do it transparently, if your load balancing fabric cooperates: just arrange to have read replicas redirect write requests to the write master.

avl999 · on Sept 7, 2022

With that setup you have essentially lost any benefits that SQLite provides, specifically any performance benefits by making a DB read/write a network call instead of a local call in the same process that manipulates some file on the same machine. You are closer to a postgres/mysql patterne except now your writes don't scale as well. What are we actually gaining with this setup other than perhaps easier administration of the "db service" compared to a mysql/postgres deployment?

tptacek · on Sept 7, 2022

Obviously, you have not lost those benefits: reads are satisfied from an in-process database, and, for most applications, most requests are reads.

adamrezich · on Sept 7, 2022

you've almost certainly used software that uses sqlite today, possibly even right now https://www.sqlite.org/mostdeployed.html

tptacek · on Sept 7, 2022

They stipulated non-embedded use cases for it --- really, their subtext is "show me places that use SQLite where ordinarily people would use an n-tier architecture", which is to say, Rails, Node, or Django. I think it's obvious to everyone that SQLite gets a lot of important use behind the scenes.

adamrezich · on Sept 7, 2022

"embedded" can mean a few different things so that's a bit confusing for me, but if the intent was "show me places sqlite is used as a database backend for user-facing web software", the Nim forum (https://github.com/nim-lang/nimforum) uses sqlite as its database backend.

sgbeal · on Sept 7, 2022

> but if the intent was "show me places sqlite is used as a database backend for user-facing web software", the Nim forum

Perhaps more appropriately, sqlite's own forum is sqlite-backed: <https://sqlite.org/forum> (running the Fossil SCM's forum interface, which itself exclusively uses sqlite for storage: <https://fossil-scm.org> (that whole site is an instance of fossil, serving all content from an sqlite db))

tptacek · on Sept 7, 2022

That's a good example.

zerr · on Sept 7, 2022

That's because of its prevalence usage as a better flat file or config file.

ngcc_hk · on Sept 8, 2022

Surprise a bit about the intent of sql. C J Dates of those days had no hint of that. Need to know set theory to understand what is going on I thought.

oxff · on Sept 7, 2022

For some reason, IME, SQLite feels the hardest to use of the popular databases. I did start my own database journey with Postgre and MSSQL so I feel kind of lost with SQLite whenever I use it.

sgbeal · on Sept 7, 2022

> For some reason, IME, SQLite feels the hardest to use of the popular databases.

sqlite is, above all else, a C API for interacting with a database. Try using the C APIs for Postgres or MySQL or Oracle/OCI, and i _guaranty_ you you'll find sqlite's easier to work with _by leaps and bounds_.

nijave · on Sept 7, 2022

I'm guessing this varies a lot by language and library. In Python, it uses DB-API 2.0 which is a Python interface so it's basically the same as, say, Postgres (minus the sql dialect quirks).

I'd think something like JDBC would do the same for Java, etc

sgbeal · on Sept 7, 2022

i can't speak to the high-level wrappers, but the native C API for MySQL is atrocious, the one for PostGres is meh, and the OCI is also (in my very limited experience) also meh. sqlite's C API (its core-most product) is a dream to use in comparison to its competitors.

HelloNurse · on Sept 8, 2022

With a similar library API as the competition (according to other comments, a better API) and a good SQL dialect, what can make you feel "kind of lost" with SQLite?

What are you missing? Obnoxious DBMS installations (e.g. DB2), colorful web management and monitoring consoles (e.g. Oracle), arcane semi-mandatory GUI tools?

purim · on Sept 7, 2022

Any dates on supporting range of ports to my running sqlite/db instance on? ex) 5999-8999

edit: not sure why this is being downvoted? not being able to define a range of port seems like a huge oversight. the forums there are not very active so I am asking here.

sgbeal · on Sept 7, 2022

> edit: not sure why this is being downvoted? not being able to define a range of port seems like a huge oversight.

It's likely been downvoted because your feature request demonstrates that you're not asking about sqlite. sqlite has no server element, and therefore no ports which need using.

> the forums there are not very active so I am asking here.

You're apparently not looking at the correct forums. As one of the sqlite forum moderators, i can say with some degree of authority that its forum is well-visited.

purim · on Sept 7, 2022

[flagged]

sgbeal · on Sept 7, 2022

> Do you work for fly.io? Horrible way to represent the company like this!

Wha!?!?? The top post is very specifically about sqlite3, not fly.io (whatever that is). Your comment about it was taken entirely within the context of sqlite, not fly.io (whatever that is).

> Why would I be on sqlite forums?

Because this whole thread is about sqlite and your post said "the forums," implying sqlite's forums, not fly.io's (whatever that is).

Edit: is see now that the article is hosted on fly.io (whatever that is), but the article is still entirely about sqlite, not fly.io (whatever that is).

tptacek · on Sept 7, 2022

Whatever that is?! Hey now!

simonw · on Sept 7, 2022

Your original comment didn't clarify that you were talking about ports on a Fly.io instance, which I imagine is why people didn't understand the relevance of your question.

duskwuff · on Sept 7, 2022

And even if the parent is asking about Fly.io, I don't see how that's relevant to a blog post about SQLite.

mrkurt · on Sept 7, 2022

https://news.ycombinator.com/item?id=32251586

This is answered in the forums, too: https://community.fly.io/t/new-feature-every-public-port-now...