Replicating SQLite using Raft Consensus

otoolep · on Feb 23, 2015

Hello -- rqlite author here.

It was definitely interesting to code this. It was 50% to meet a need, 50% to better understand Raft consensus in general, and Ben Johnson's go-raft implementation in particular (https://github.com/goraft/raft). As it happens Ben and I now work together, on InfluxDB (http://influxdb.com/), which has a new Raft implementation at its core.

Definitely fun.

teraflop · on Feb 23, 2015

This exact problem was the final challenge of last year's Stripe CTF contest. They put together a really cool test harness that would systematically probe for any possible way that an unreliable network could cause the system to fail. In order to complete the challenge, you had to achieve a desired throughput without ever allowing the replicas to become inconsistent.

Unfortunately Stripe doesn't seem to have kept the main site[1] online, but the test harness is still available on Github[2].

[1]: https://stripe-ctf.com/ [2]: https://github.com/stripe-ctf/octopus

kijin · on Feb 23, 2015

> While it is possible to continually copy the SQLite file to a backup server every time it is changed, this file-copy must not take place while the database is being accessed.

What about the `.dump` command?

I've been using it to backup SQLite databases since forever. It produces an SQL script (similar to `mysqldump`) that you can import back into another SQLite database, or even into another RDBMS. It's much more elegant than copying the database file itself.

But come to think of it, I never really checked whether the `.dump` command produces a consistent backup, and the official documentation doesn't seem to answer that, either. If you're familiar with the internals of SQLite, please let me know whether I should keep trusting `.dump`.

otoolep · on Feb 23, 2015

You could be right, it might well produce a consistent backup. If fact, I'd be very surprised if it didn't. But doing that misses the point of rqlite, of course. rqlite replicates the database to other machines, after every transaction, without copying the entire sqlite file everytime.

jakobegger · on Feb 23, 2015

I'm not sure a backup that was never tested qualifies as backup.

kijin · on Feb 23, 2015

The backups do restore properly all the time, I made sure of that.

I simply don't have enough transaction volume to test all the edge cases that might produce an inconsistent dump (and if I did, I wouldn't be using SQLite).

stavros · on Feb 23, 2015

SQLite is amazing. I wonder if it's usable as a replacement for Postgres for web apps, I'm not sure what happens when multiple processes try to write to it.

Regardless, this is a very interesting project, good job.

pavpanchekha · on Feb 23, 2015

Multithreaded use is not supported. SQLite is not a replacement for Postgres—Postgres supports many advanced data structures (lists!), SQL commands, and is faster. But in certain cases, it is enough, and of course setting up SQLite is much easier.

teraflop · on Feb 23, 2015

> Multithreaded use is not supported

To be more precise, it's perfectly legal to use SQLite from multiple threads or processes at the same time. It will use its own mutexes and OS file-locking operations to ensure that at any given time, a database is being accessed by either a single writer, or an arbitrary number of readers.

In my opinion, the bigger issue is that clients can only safely access a SQLite database if it exists on a local filesystem. As soon as you decide you want to run multiple web servers, you're in trouble.

stavros · on Feb 23, 2015

Sorry, yes, that's what I meant. I use it for a mostly-read-only Web app, but wasn't sure if it's a disaster waiting to happen. Good to know it's a viable alternative to postgres in multithreaded or multiprocess scenarios.

reitanqild · on Feb 23, 2015

I ran it as a replacement for MySQL for 3 years, light usage php web app, around 100 - 200 users I guess.

It saved me quite some hassle and reduced the attack surface compared to MySQL.

FWIW I never bumped into speed issues. My biggest issue against it is that it would let me alter columns directly which effectively sabotaged my use of alembic when I started using Python/flask.

stavros · on Feb 23, 2015

Huh, that sounds like it worked very well. Plus, the reduced overhead of backups etc is definitely worth it for small apps.

cookrn · on Feb 23, 2015

Here is more information on the concurrency-related features of SQLite to complement teraflop's sibling comment: http://sqlite.org/threadsafe.html

lsb · on Feb 23, 2015

It's a good database in its own right. Use a write-ahead log, and you can get one writer and as many readers as your local disk can support.

The common thinking is that it's more of a replacement for fopen than it is for Oracle.

tveita · on Feb 23, 2015

This looks like statement based replication, which can come out of sync if you're not careful to make your statements deterministic.

Any use of random(), as a trivial example.

lsb · on Feb 23, 2015

This is pretty cool!

Interestingly, once SQLite 4 comes out you'd probably implement this the opposite way, using Raft consensus (and something like etcd) for block storage, and using the SQL engine over arbitrary pluggable block storage.

andrewstuart · on Feb 23, 2015

sqlite4 seems not to be moving so fast - has it stalled?

biokoda · on Feb 23, 2015

ActorDB takes this concept much further. It replicates the WAL file (not sql statements).

http://www.actordb.com

https://github.com/biokoda/actordb

shazeline · on Feb 23, 2015

I skimmed the source and didn't see anything regarding leader election. Am I missing something or is this implementation encapsulating some of the underlying terminology/analogies traditionally used by Raft?

masklinn · on Feb 23, 2015

Wouldn't that be done by the underlying Raft implementation (goraft)?

otoolep · on Feb 23, 2015

Yes, that's right. 'go-raft' is imported by rqlite, and uses that package for consensus and leader election.