Internal Versus External BLOBs in SQLite

mattbillenstein · on Dec 27, 2014

How old is this benchmark? HDD or SSD (I think the former given the 2011 reference.

A filesystem is the canonical means for storing files - just use it - it's incredibly handy to have regular tools at your disposal for managing those files outside of the database. The only case I can think where maybe files in the db makes sense is if you have millions of really tiny files (100-200 bytes).

Having files in the db maybe avoids an extra seek - but in any real system high throughput system, you're probably leaning towards SSDs where you get many more seeks/s than you do on a spinny disk.

If your application needs something more like a distributed filesystem, there are better things (gluster, moosefs, even nfs, etc).

gilgoomesh · on Dec 27, 2014

> A filesystem is the canonical means for storing files - just use it

Sure but what about the many use cases where you're not really storing files but metadata or other generated data?

A simple example might be photo libraries. The photos themselves are files but are the 5kB small and 50kB medium thumbnails generated from the photos worth storing as files? Or should they be stored as blobs in the database along with their other metadata?

It's good to know where the cutoff should be.

mattbillenstein · on Jan 5, 2015

Metadata makes sense in the db - it's structured non-binary data.

But I think the binary thumbnails and so forth should almost always be files on the filesystem.

Bill_Dimm · on Dec 27, 2014

> How old is this benchmark?

It says they used SQLite version 3.7.8. Since 3.7.8 came out on September 19, 2011 and 3.7.9 came out on November 1, 2011, it was probably in that range.

jpgvm · on Dec 27, 2014

If consistency and reliability matters I would probably avoid GlusterFS like the plague. (things may have changed now but I worked with it in a petabyte scale bioinformatics cluster and it was not fun)

Look at Ceph if you want to store files in a consistent and distributed fashion. Not the filesystem though, the object store (RADOS).

mattbillenstein · on Jan 5, 2015

I had trouble with Gluster - and the Ceph filesystem as well. But MooseFS was solid.

anaisbetts · on Dec 27, 2014

Caution that on Windows (and even OS X to some extent), this graph will be very different - NTFS perf for accessing a lot of small files is very slow compared to Linux.

Spooky23 · on Dec 27, 2014

I've always pushed back at storing blob data in databases in most cases -- probably 80-90% of the time depending on the job.

Why? Databases are all about storing data in a structure defined by the relationships between the records. In most cases with files, the relevant data is the metadata or other structured data, not the file.

If you do the math, it's rarely cost effective to store unstructured data in a structured database. Databases are always harder to scale.

On the file side, there are dozens of ways to effectively deliver files that are cheaper, more scalable and simpler for many use cases.

electrum · on Dec 27, 2014

Databases also have advantages: transactions, durable logging, replication, integrated backups, etc.

Tom Kyte has some insightful points in this thread: https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUE...

"Do I want you to keep MY medical stuff in a directory, unsecured, unmanaged, not backed up like the database is - we have no idea if recovery is possible, who accessed it last? Who has access to it."

"if this data is valuable to your business, if this data is necessary for your business, if the loss of this data would damage your ability to do business - it has no business NOT being in the database."

Spooky23 · on Dec 27, 2014

Absolutely. I didn't say never do it... just that it doesn't make sense to me most of the time.

Tom is Mr. Oracle, so I am not shocked that he's a big advocate of using that platform. But I don't like the straw man argument that anything that isn't managed by an Oracle DBA is automatically flawed and unreliable. Give me a break... Robust auditing, backup, professional management, etc can be delivered anywhere. (Or f'ed up on an Oracle platform)

That said, it's a business decision -- somebody, somewhere has probably justified putting user home directories on an Oracle database.

twombly · on Dec 27, 2014

Actually yeah I do want to keep MY medial stuff in a directory ; but that doesn't mean that it's unsecured, unmanaged and not backed up. Ultimately all data ends up on the filesystem in one way or another if it's large enough. I would rather have a filesystem that I can trust rather than trying to fix/ignore that layer by building more layers on top of it.

xorcist · on Dec 27, 2014

Relatational databases has their advantages compared with hierarchical filesystems of blobs, but that comment is really completely beside the point.

File systems are simpler and backups and replication are really much more straightforward. As anyone who did disaster recovery of Oracle databases can attest to, there are a lot of details to get wrong.

It boils down to what data you want to store and what your access patterns look like.

Spooky23 · on Dec 27, 2014

Great point -- Oracle has a solid capability that is easy to operationally screw up. Database recovery is usually a real shitshow.

When you go to other platforms, you need solid process AND deep exterior in whether the database can do what you want. Can MySQL handle the same recovery scenarios that Oracle can? With which storage engines?

regularfry · on Dec 27, 2014

You don't get to avoid backing up the database just because your file contents happen not to be in it. Now you have two different backup jobs not just to run, but to run atomically.

bch · on Dec 27, 2014

I've ran into situations where this is pertinent information to know w/ a music pet-project. I ended up jamming all my .mp3s into the DB though, because not only is it storage and access in this case, but the collection (in a DB) is in a defacto archive; all I do is copy the (admittedly large) DB where I need it, and I've got my data and metadata. If I were doing it again, I'd rethink this design, which isn't to say I'd abandon it, but I'd reconsider it; it's unclear whether the all-in-one solution is worth the bulk, though it might still be.

heavenlyhash · on Dec 27, 2014

I have the opposite reaction to this situation: I avoid products that use DBs for smallish datasets because the DB blob doesn't dedup well, and that impacts me negatively for maintaining file-system level backups as well as for efficient sync between hosts.

Sure, I could write a custom script that dumps the DB to a format that deltas better, but I'd really rather my filesystem level dedup and sync tools just work and not have to do something special-case for every application I use.

(I'm currently going through this exact process with my media library -- it's a dozen megabyte database. It's so small loading the entire thing into RAM is easier than not; indexing it is beyond unnecessary, etc. And yet because it doesn't diff, every time I sync it between laptop and desktop, I'm irritated to discover I'm waiting for the entire thing to cross the network every single time.)

XorNot · on Dec 27, 2014

Agree with this. It's the current thing I am hating about every single music player application. There just isn't that much data to manage - why do they all insist on a binary blob for playlists and organization?

frik · on Dec 27, 2014

WinAmp, FooBar2000 (Plugins) and Windows Media Player (Win7) store the music metadata in the mp3 files itself (IDv2 format). Using the Windows search advanced syntax I can find all songs that I rated with >=4 stars. One can also add various metadata columns to Explorer details view (like album name, stars-rating, etc.). As playlist format use M3U or its unicode sibling - a format that every music player supports; avoid WPL or other lesser known/supported formats.

It would be great if more music players like car entertainment systems, iTunes, iOS Music would show the stats rating that is stored in the mp3 files itself (metadata). Monolithic databased based music management systems may be convenient for some users. I personally prefer to copy the mp3 files (with embedded metadata) to my other devices.

My wish list: More intelligent playlist playback that e.g. recognises song skipping as down-voting and manual replaying a song as up-voting and plays similar songs based on the music fingerprint like iTunes Genius but as completely offline feature.

Casual people nowadays listen music on streaming platforms like Youtube, Spotify or old-school radio stations. So it's unfortunate but we saw no new exiting new features in music players for some years.

akx · on Dec 27, 2014

Storing "post-fact" metadata such as ratings, etc. in the files themselves is a big no-no in my book, since the hashes, etc. of those files will now be changing.

For the first ingestion of the data, be it from a CD or from the artist/label themselves, sure, that metadata can, and should, be stored in the files, but not your own.

frik · on Dec 27, 2014

What music player do you use on your PC? How do you share/sync your ratings with your phone/car-audio/mp3-player/tablet/other device? Do you use ReiserFS4/ZFS - or what's the fetish with file hashes about? Mac/iOS or music streaming (Spotify, etc.) service walled garden eco-system?

There are several metadata formats for mp3, flac, mp4, etc. formats. Windows Media Player e.g. stores the music star rating per user name and adheres to the metadata standard. Most media player apps adhere to the metadata standards. Some players only read, but write to an internal database with no export option - I burnt myself with iTunes back in 2004. Some apps write older metadata formats like IDv1 instead of v1 and v2.

If you mean CRC32/MD5/SHA1-hash, than you do something wrong. There are special hashes for music data like qhash and minhash that Google and other similar services use. The media formats like mp3 have seperate sections for the audio data and the metadata payload, so you should use a hash of only the audio data anyway, the metadata is meant to be edited (parts of it as described in the standards).

Only with mp3 files that contain proper metadata one can sync his audio with various devices from different manufacturers.

quacker · on Dec 27, 2014

> since the hashes, etc. of those files will now be changing.

I'm having a hard time seeing hashes[1] being very valuable for much of anything regarding music library management. Hashes won't be any good for comparisons of music files containing the same song across different recording sessions, bit rates, or file formats.

I would rather all of my music playing programs respect standard metadata formats (like ID3) - which is usually included in most digital downloads and which is supported across all the popular audio file formats - and leverage that metadata for organization. Off-hand, I can't think of a music organization system that relies on hashes that is not brittle.

1: I'm talking about "dumb" hashes like SHA-1. Fingerprinting can work well for this, but is more complicated: http://en.wikipedia.org/wiki/Acoustic_fingerprint

cube00 · on Dec 27, 2014

Managing a playlist (sorting and searching multiple fields) is easier with some kind of DB layer. Usually those blobs are either MS Jet DB (MS Access) or SQLLite and can be opened for inspection/export. Unless of course of the blob is in a proprietary binary format in which case it's all about vendor lock in.

XorNot · on Dec 27, 2014

Some kind yes, but I'd prefer the index be transitive and the data confirmed against what's actually on the disk. Not to mention, those proprietary formats usually fail to sync to player devices automagically.

bane · on Dec 27, 2014

I'd always be concerned of some minor level of corruption to the big file, something that would only ruin a song or two, could render your entire dB useless.

pmoriarty · on Dec 27, 2014

This is a good point. Not just for DB's but for any large file.

It's not uncommon to lose an inode or two on certain filesystems during OS crashes. When one of those inodes is of a large, valuable file that you don't have a backup of, you've got problems.

Of course, the best solution is to have recent backups. Unfortunately, the more data you have, the harder, more time consuming, and more expensive your backups will get.

icefox · on Dec 27, 2014

So the natural question is why don't "file systems" (aka big database) become useless with some minor level or corruption?

hobs · on Dec 27, 2014

I was about to comment that a database will only be corrupt if it is corrupted in the same way a file system would be corrupted, or its not a great database file format.

I work with SQL Server and while database corruption can/does happen, I often can get away with a row level restore from a backup, no problem.

I still dont like storing big ass files in a database though, I dont love the additional level of abstraction when you generally dont need to have a high transactional/relational system.

emn13 · on Dec 27, 2014

I think you've got that backwards: if you have a database for something, is it worth the additional levels of abstraction to also store some data in files?

If your data is fairly small, and perf not the biggest issue (which is pretty common), you might as well go for the simplest solution, which is going to be purely DB or purely FS, not some mixture of the two.

hobs · on Dec 27, 2014

Agreed, the overhead of maintaining two solutions can be a huge burden (especially when troubleshooting something going wrong via the synthesis of two solutions), but I definitely called out "big ass files" because the difference between some text data and some smaller blob is almost nothing in terms of performance.

bch · on Dec 27, 2014

True -- that could be considered a point for "files to FS, metadata to DB".

icefox · on Dec 27, 2014

Were you worried that the OS makes it really easy for a user would muck with the filesystem and not update the applications metadata? On the easy of copying one file was it that coping all of the files was too error prone?

herf · on Dec 27, 2014

You could compare with an append-only-file (with offline compaction), in which case saving tons of "file open" operations would actually be a big win, especially if you ever use a network file system. In some cases, a "simple" log+index can beat both schemes.

Also, I've stored lots & lots of photos in a filesystem, and backup is very hard. Backing up big files is just a lot easier.

pjc50 · on Dec 27, 2014

How is backing up lots of small files harder than a small number of big files? With which software?

herf · on Dec 27, 2014

Using rsync and 20M files is a good enough example: - to check the date of each file on UNIX you must run "stat" once per file (unless you have an external log that says what to skip) so that's very slow. - to backup a big file is "--append-verify" or something like this and one streaming read per file.

electrum · on Dec 27, 2014

This sounds similar to Facebook's Haystack: https://www.usenix.org/legacy/event/osdi10/tech/full_papers/...

pmoriarty · on Dec 27, 2014

There are big files, and then there are Big files. Some files are just too big. At some point, it's easier to deal with smaller files.

herf · on Dec 27, 2014

Yeah, this doesn't preclude breaking things into chunks (typically limited by however many files a non-root user can open).

I just dislike unnecessary "open" calls because it can get kind of crazy. For instance, if you happen to send a short filename to a samba server (which must enumerate every file in every folder up to the root), or use NFSv4 ACLs, etc.

PythonicAlpha · on Dec 27, 2014

I think, that is a rather complicated question. You should also take into account how accessing other data is affected by large blobs that reside in the same database. For example: you have a table with large data blobs and other tables with conventional relational data. The blobs will take a lot of the cache size away from the other data. Also by increasing the database size, you will have potential longer access times, at least when you use spinning hard drives.

So, several things to think about (make cache bigger, how much grows the overall db size) that can change the database behavior.

Of course those numbers can give some directions, but since only the blob data was taken into account, it can not give full information.

I personally would most of the time prefer smaller db sizes, at least, on edge cases. Relational databases are great for structured data -- unstructured data is not their specialty.

cefstat · on Dec 27, 2014

About 9 years ago I had a small site where both page content and some large PDF files were stored in the same SQLite database. Nothing extravagant: about 20 pages and 10 PDF files, each PDF file averaging 2-3 MBs. The website was extremely and consistently slow: it would take several seconds to load any page. I eventually figured out that the problem was the binary blobs. After moving the PDF files from the database to the filesystem the load times went below 0.1s.

Since then I have not put large binary blobs in a database but I wonder if I would get reasonable performance by putting them in a _separate_ database instead of mixing them with other small-size rows.

emn13 · on Dec 27, 2014

That's weird, and should not have happened. I've done similar things with sqlite databases many times that size, and though the FS might have been even faster, sqlite was still more than fast enough (as in, sub-10ms responses the norm).

I suspect there's some way in which your access pattern or usage was suboptimal. Perhaps you had big reads with small writes in a transaction, or your network streamed directly from db leaving connections open a long time, or... something, because there's no way this should have taken so long.

geku · on Dec 27, 2014

Wondering how MySQL and Postgres perform. Does anybody have links to such benchmarks?

btrask · on Dec 27, 2014

SQLite supports a single writer at a time, so writing large amounts of data limits your write throughput. A file system can cache data for several files concurrently and fsync them independently when you're done.

I believe SQLite also has to write the data twice, using the WAL. Maybe there is an optimization to avoid that though.

tines · on Dec 27, 2014

> I believe SQLite also has to write the data twice, using the WAL. Maybe there is an optimization to avoid that though.

If I understand correctly, it writes the data twice if you're not using WAL. With WAL, it only writes the data once [1].

[1]: The "Performance Considerations" section of https://www.sqlite.org/wal.html

seppo0010 · on Dec 27, 2014

The data is written twice when using WAL. From the same link you pasted:

> The original content is preserved in the database file and the changes are appended into a separate WAL file.

The difference is whether the journal keeps the new data (WAL) or the old data (not WAL). If you are writing big chunks of data, WAL will probably be more I/O intensive.

Taking a wild guess I'll say that if you are adding data to the database, and not using WAL, the data will only be written once since the journal won't keep a reference to old data if no pages is being overwritten.

tines · on Dec 27, 2014

Ah, I see the difference now, thanks.

Sami_Lehtinen · on Dec 27, 2014

With WAL mode data is always written twice for sure, unless same pages are updated several times inside the active part of WAL journal. Without WAL mode data isn't getting written twice unless you're updating it. These are annoyingly complex issues, and I'm not even 100% sure if I got it right, I'm just using common logic, I haven't checked source.

microcolonel · on Dec 27, 2014

HTTP headers are telling me that this was last modified on 2014-12-18, I really hope they didn't JUST get around to testing ubuntu 11.04.

Sami_Lehtinen · on Dec 27, 2014

Another great question is, if there's any reason to use smaller than 4096 byte pages with current operating and file systems?

Sami_Lehtinen · on Dec 31, 2014

Isn't file system just a hierarchical key value storage? Think about it.

Animats · on Dec 27, 2014

Note that the performance difference rarely exceeds 2:1 either way. So this isn't a performance issue, unless you're using SQLite for something bigger than SQLite is for.

delinka · on Dec 27, 2014

I don't think I'd define "rarely" as "nearly 25% of cases." Seems it happens with enough frequency to at least be aware.