Exactly! There are numerous applications where SQLite is replacing an ad hoc file-based solution with durability and atomicity that are nebulous at best, let alone scalability. Take an application that would otherwise persist state in XML/JSON/protobuf and have to manage the challenges of regularly persisting and reading that state; challenges which become unwieldy as the data size grows.
This is not a challenge commonly faced on the enterprise server side, which is why many of us cannot imagine a use case for SQLite. Our workloads involve multiple readers/writers for both scaling and availability, so an RDBMS cluster is the obvious choice.
But for mobile/embedded, all the data is local to a single device and commonly also a single process. In this case, an RDBMS would add unnecessary overhead. Additionally, there's desktop software that has to work with relatively large datasets, which require reliable persistence and efficient searching/reading. Adding an RDBMS would complicate installation and support.
Further, researchers and data analysts benefit from SQLite databases when the data is too large to hold in memory, yet not large enough to warrant a proper data warehouse. Even ~100 MB datasets can benefit from SQLite if you're performing a lot of random writes/reads or want to execute complex queries. There are other alternatives such as Apache Arrow, but SQLite is a tried and tested option. It can be populated and queried much like SQL-based data warehouses, and it also offers secondary indexes, an efficient query engine, and fast random writes with durability and atomic transactions.
There's a reason why SQLite can reputably claim to be the "Most Widely Deployed and Used Database Engine", with an estimated one trillion SQLite databases in use. [0]
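For a concrete flavor of that analyst workflow, here's a sketch of my own (the file, table, and columns are made up) using nothing but Python's built-in sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect("measurements.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS readings (
        sensor_id INTEGER,
        taken_at  TEXT,
        value     REAL
    )
""")
# A secondary index is what makes the random reads and complex queries cheap.
conn.execute("CREATE INDEX IF NOT EXISTS idx_sensor ON readings (sensor_id, taken_at)")

# Bulk-load rows inside one atomic, durable transaction.
rows = [(1, "2024-01-01T00:00:00", 20.5), (2, "2024-01-01T00:00:00", 19.8)]
with conn:  # commits on success, rolls back on exception
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# Query it like a small warehouse.
for sensor_id, avg_value in conn.execute(
    "SELECT sensor_id, AVG(value) FROM readings GROUP BY sensor_id"
):
    print(sensor_id, avg_value)
```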
There are also many use cases where a network service would use per-user SQLite files, which support concurrent reads from multiple processes (but a global write lock). If all processes accessing the file are on the same host, you can also enable WAL mode, which has its own pros/cons.
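For anyone who hasn't tried it, WAL is one pragma away. A minimal sketch in Python (the path is hypothetical):

```python
import sqlite3

conn = sqlite3.connect("/var/data/users/alice.db")

# WAL lets readers proceed concurrently with a single writer, at the cost
# of extra -wal/-shm sidecar files and the same-host requirement above.
conn.execute("PRAGMA journal_mode=WAL")

# A common pairing with WAL: fsync less aggressively for faster writes,
# with slightly relaxed durability on power loss.
conn.execute("PRAGMA synchronous=NORMAL")
```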
SQLite is a beast; there's a lot you can do with it. But there's no shared buffer cache or strategy for keeping disk pages in memory across transactions and files, so you pay with random disk I/O proportionate to your client load. Still, for many workloads it's a really low-maintenance option with lots of low-tech, low-effort backup/disaster-recovery options, and depending on how chatty your clients are, you can get a lot out of modern hardware/clouds.
You could also get clever and provide pure in-memory access by pinning clients to hosts, treating the local file like a write-through cache.
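A rough sketch of that idea using sqlite3's backup API (the schema and the flush policy are assumptions on my part, not a recipe):

```python
import sqlite3

# Load the user's on-disk database into memory for this pinned host.
disk = sqlite3.connect("users/alice.db")   # hypothetical per-user file
mem = sqlite3.connect(":memory:")
disk.backup(mem)

# Serve reads/writes from `mem` while this host owns the user.
# (The `profile` table is hypothetical.)
mem.execute("UPDATE profile SET last_seen = datetime('now')")

# Write-through: flush the in-memory copy back to disk, e.g. per
# transaction or on a timer, depending on how much loss you can tolerate.
mem.backup(disk)
disk.close()
```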
It keeps databases in a single local file like SQLite, and it supports PostgreSQL syntax (not sure which version). It does a load of funky stuff to be really fast at analytics workloads, but I would guess it's fast enough at transactional workloads for small-scale use.
Not a dependency in the sense of not being a statically linked binary? Sure.
Not a dependency in the sense of not requiring a specific version of a Postgres DB provider, or a specific version of Postgres itself, or being able to do any work without Postgres existing? Laughably, no.
DB-agnostic software is very uncommon, even though ODBC is well-established and mature.
I always found this very interesting. With the exception of enterprise software built to run in customer datacenters, with whatever infrastructure that may entail, practice seems to dictate that the overhead of making something DB-agnostic isn't a worthwhile trade-off.
I haven't spoken to many folks who've had to make this decision, but off the top of my head there are a number of factors, not obvious at first glance, that enter into this equation:
1. You need authors of queries to understand the least common denominator of supported SQL features and syntax (probably ANSI SQL, but there may be cases where it isn't)
2. Such projects need to test or otherwise verify that all SQL queries execute the way they're expected to. This can be tricky when query performance regresses from one database flavor to the next.
3. Your application can't have features that are made or broken by capabilities specific to a particular database. I can see this being a very limiting constraint in a world where business intelligence and data analysis at scale have become the norm.
Are there other factors here that would compel projects to adopt something like ODBC (or your database abstraction layer of choice), or push them away from it?
The most important factor for me has always been simplicity. If I know my app is going to hit Postgres, and Postgres has a feature that perfectly fits my problem, why would I spend a bunch of time and effort building it again, just so my app can run on a database it's never going to run on?
For me it's about using the full power of my underlying technologies. Just like I don't restrict myself to a subset of C that would compile with any C compiler ever, I don't restrict myself to generic SQL.
ODBC, like JDBC, isn't really a database abstraction layer. It lets you connect to multiple databases in a standard way; that's it. Your queries still need to be database-specific, or stick to some limited syntax. I've never seen a real project that didn't eventually require some vendor-specific SQL syntax.
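Python's DB-API makes the same point: the connect/execute/fetch interface is standard across drivers, but the SQL you feed it isn't. A small illustration (table and queries are made up):

```python
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, ts TEXT)")

# The connection/cursor interface is the portable part...
rows = conn.execute("SELECT id FROM events").fetchall()

# ...but this date formatting only works on SQLite. On PostgreSQL you'd
# write something like to_char(ts::timestamp, 'YYYY') instead, even though
# the driver (psycopg2, also DB-API) exposes the exact same interface.
rows = conn.execute("SELECT strftime('%Y', ts) FROM events").fetchall()
```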
Not a dependency in the sense that nobody is going to ship PostgreSQL with their 20 MB mobile app. SQLite is absolutely ubiquitous in mobile and desktop software. It also works well as a file format: you can download a DB full of data from the internet or a git repo.
It's designed to be embedded into software and to replace raw files. That's why the database is a single file, which is a feature in every situation except a server handling many concurrent writes.
The only reason SQLite even works as a stand-in for an RDBMS is that they did such a good job implementing the SQL standard.
It's actually disappointing that PostgreSQL doesn't support an embedded mode like MySQL does. I guess it's because of its concurrency model of one process per connection (MySQL uses threads, AFAIK), but for some embedded use cases like unit/integration tests, a single-user, single-connection mode would go a long way already.
Testcontainers are the proper solution for automated tests, but setting them up in CI is not trivial since, AFAIK, they require access to the Docker socket.
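For reference, the Python bindings look roughly like this (a sketch assuming Docker is reachable from the test runner, which is exactly the CI hurdle above):

```python
import sqlalchemy
from testcontainers.postgres import PostgresContainer

# Spins up a throwaway PostgreSQL container for the duration of the block.
with PostgresContainer("postgres:16") as postgres:
    engine = sqlalchemy.create_engine(postgres.get_connection_url())
    with engine.connect() as conn:
        print(conn.execute(sqlalchemy.text("SELECT version()")).scalar())
```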
Yeah, it does a lot more than fopen(). So does SQLite. It's still, bottom line, just secondary storage with a fancy API.