Exploring the Virtual Database Engine inside SQLite

rogerbinns · on Oct 16, 2012

The big change hinted at in the documentation around the 3.5 era is that SQLite switched from a stack based VM to a register based one. Note that the implementation details are not exposed to a developer using SQLite and there is a massive test suite so the change had no visible impact.

What isn't mentioned is why SQLite is using a virtual machine in the first place. The reason is that SQLite only calculates the next row of results when you ask - it does not calculate all result rows at once in advance. Each time you ask for the next row of results it has to resume from where it last left off and calculate that next row. The virtual machine is a way of saving state between those calls for the next row (amongst other things).

This is also why there isn't a method in SQLite DB adapters to get the number of result rows. SQLite has no idea other than actually calculating them all which is the same amount of effort as getting all of them.

finnw · on Oct 16, 2012

>Each time you ask for the next row of results it has to resume from where it last left off and calculate that next row. The virtual machine is a way of saving state between those calls for the next row (amongst other things).

They could have achieved the same effect by implementing it in a language with continuations or coroutines, or maybe just iterator functions. That might be viable today if the language is fast enough and callable from C.

C# and LuaJIT spring to mind.

beagle3 · on Oct 16, 2012

SQLite predates both; the first official release is from August 2000.

Also, one of the goals of SQLite is to be available everywhere. It's available as super-standard-super-clean C code in one .c file, that works on just about every architecture. Had they used C# or LuaJIT, that could not be achieved.

I'm not sure if you are aware, but SQLite is what powers almost every app on Android or iPhone that needs tabular storage, including the built in ones like Contacts.

jerf · on Oct 16, 2012

"They could have achieved the same effect by implementing it in a language with continuations or coroutines"

They did. They just implemented their own minilanguage instead, that (AFAIK) simply has no conventional syntax-based serialization. It's still a language, though. Given their tight focus, it is completely plausible that they can create a little VM that will run far, far better than any general-purpose language VM could.

ominous_prime · on Oct 16, 2012

But in the end, you're just trading performance for abstractions. There's not really a functional difference between writing the DB in a language running on a VM implemented in C; or writing a DB with a VM implemented in C.

If performance and portability are your goals, C is going to be the logical choice.

rabidsnail · on Oct 16, 2012

You already have the abstractions, they're just C abstractions.

onetwothreefour · on Oct 16, 2012

SQLite is used everywhere. Embedded devices, all sorts of different architectures, etc.

C# and LuaJIT are not options. I don't really understand what your point is, because they implemented their own VM instead.

finnw · on Oct 16, 2012

Fair enough, but I was thinking "how would I implement it today" rather than "how should they have implemented it at the time."

lvh · on Oct 17, 2012

Implementing it today would have the same issues: SQLite runs on more platforms than even LuaJIT; and even then I still don't see that a whole general purpose language, no matter how small or embeddable (mostly thinking of Lua here) would compete against SQLite's "be good at SQL-y computation" VM. The main argument for Lua is having humans write it; but the human interface to SQLite is supposed to SQL. I'm not sure I understand the benefit of compiling the SQL down to Lua so it can then be interpreted by the standard Lua implementation...

shortlived · on Oct 16, 2012

Thanks for the insight. Do you know the reasons why they switched from using stack to registers in their VM implementation?

SQLite · on Oct 16, 2012

We found it is much easier to handle exceptions without risking stack leaks using a 3-address register machine. Code generation is also simplified in a register machine compared to a stack machine (at least in the case of a VM designed specifically to run SQL statement).

The VM used by SQLite is different from the VM used by things like Javascript or Python in that SQL is not a general-purpose programming language. Most of the opcodes in the SQLite VM are heavy-weight concepts such as "insert a key/value pair into a b-tree" or "decode the N-th column of a table row". These opcodes are implemented with hundreds or thousands of lines of C code. The overhead of instruction dispatch is insignificant in comparison. And so there isn't much of a performance advantage one way or the other between a stack machine and a register machine in SQLite. The choice of a register machine is for maintainability, reliability, and testability.

Someone · on Oct 16, 2012

I do not know, but if I had to guess, it is for performance.

I am no expert in this field, but AFAIK http://static.usenix.org/events/vee05/full_papers/p153-yunhe... (discussed on http://news.ycombinator.com/item?id=704576) has not really been refuted.

I base that on the fact that both Dalvik (http://davidehringer.com/software/android/The_Dalvik_Virtual...) and Lua 5.0 (http://www.lua.org/doc/jucs05.pdf) use register-based VMs.

draegtun · on Oct 16, 2012

The Parrot VM is also register based: http://en.wikipedia.org/wiki/Parrot_virtual_machine

bane · on Oct 16, 2012

I wonder if anybody has dared guestimate how much money SQLite has provided to the world in terms of time saved not hacking out buggy serialization and indexing code for software projects and just using this amazing tool instead.

I use SQLite frequently in small projects and it's absurdly invaluable. That it's free and PD almost beggars belief.

rabidsnail · on Oct 16, 2012

Neat! Is there a way to tell the backend "run this bytecode, please"? If there is sqlite would make a great test bed to try out new query languages. Or, inversely, you could write a distributed database whose wire protocol was sqlite bytecode. Or you could write code translator to let you run sqlite queries on hadoop.

motter · on Oct 17, 2012

Take a look at Oracle Berkeley DB:

"Berkeley DB is not exposed to the end-user. It is totally hidden below the SQLite APIs. It acts as the storage engine in place of SQLite's own BTREE. An application written to use the SQLite version 3 API can switch to Oracle Berkeley DB with no code changes, simply re-link against Berkeley DB."

http://www.oracle.com/technetwork/database/berkeleydb/overvi...

I haven't had time to read the full paper, but it looks very interesting.

lvh · on Oct 17, 2012

SQLite is even recommended[1] (granted, by the authors) as a testbed for SQL extensions. While an entirely new query language would obviously be a lot more work, I don't see why it wouldn't be an equally good choice.

[1]: https://www.sqlite.org/whentouse.html

euroclydon · on Oct 16, 2012

Informative but short article. I'd love to read more about when it's time to create a VM for your program or system. If anyone has links to more articles on VMs in practice, especially a story chronicling a transition from a non-VM architecture to a VM-based one, please post them.

themckman · on Oct 16, 2012

Now that's just one of those things I would have never guessed. Super interesting.