OK, so I've got 50GB of audio samples. Does anyone actually believe that these 5...

wswope · on Sept 3, 2022

Can be; it’s all contextual.

https://www.sqlite.org/fasterthanfs.html

PaulDavisThe1st · on Sept 3, 2022

Fair enough. The problem is that at some point, the data has to hit some sort of storage hardware. Presumably between the DB and the hardware, there's some layer that somewhat abstracts the storage hardware. Isn't that ... a filesystem?

tremon · on Sept 3, 2022

That's a storage volume (partition, RAID volume, ZFS pool, etc.), not a filesystem. A filesystem is the abstraction layer on top of the storage volume that translates user-assigned data identifiers (aka file names) into byte ranges.
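A deliberately tiny Python sketch of that layer. The names are hypothetical, and it is nothing like a real filesystem's on-disk format: the volume is just an array of bytes, and the "filesystem" is only the table that turns a name into an (offset, length) byte range. Real filesystems add directories, block allocation with free-space reuse, permissions, journaling, and so on.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFS:
    """Hypothetical toy 'filesystem' over an in-memory volume: it only maps names to byte ranges."""
    volume: bytearray
    next_free: int = 0  # bump allocator: freed space is never reused
    table: dict[str, tuple[int, int]] = field(default_factory=dict)  # name -> (offset, length)

    def write(self, name: str, data: bytes) -> None:
        if self.next_free + len(data) > len(self.volume):
            raise OSError("volume full")
        offset = self.next_free
        self.volume[offset:offset + len(data)] = data
        self.table[name] = (offset, len(data))
        self.next_free += len(data)

    def read(self, name: str) -> bytes:
        offset, length = self.table[name]  # name -> byte range: the filesystem's core job
        return bytes(self.volume[offset:offset + length])


fs = ToyFS(bytearray(1024))
fs.write("kick.wav", b"\x01\x02\x03")
fs.write("snare.wav", b"\x04\x05")
print(fs.table)  # {'kick.wav': (0, 3), 'snare.wav': (3, 2)}
print(fs.read("snare.wav"))  # b'\x04\x05'
```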

Talking specifically about databases: they often implement their own data organization. Oracle and Sybase famously performed better when working on raw partitions than with files.

jonjacky · on Sept 4, 2022

This 1981 paper by Stonebraker, "Operating System Support for Database Management" [1], explains why conventional file systems are not a good foundation for databases. He writes:

"The bottom line is that operating system services in many existing systems are either too slow or inappropriate. Current DBMSs usually provide their own and make little or no use of those offered by the operating system. ..."

Now, 40+ years later, Stonebraker is a participant in the present DBOS project!

1. http://www.cs.fsu.edu/~awang/courses/cop5611_s2022/os_databa...

wswope · on Sept 3, 2022

If I’m reading you right, you’re correct that the database is still technically passing its data to the filesystem at the end of the day.

However, databases generally subsume most responsibility for the on-disk representation of data as well as I/O patterns. What’s really being compared here is the performance of the database as a storage engine vs. the file system itself as a storage engine - not the raw I/O potential of the filesystem itself.

https://en.wikipedia.org/wiki/Database_engine

jandrewrogers · on Sept 3, 2022

Many databases implement their own internal filesystem, heavily optimized for database-y use cases and access patterns but missing the standard POSIX and other features a "real" filesystem would have. When this filesystem is installed on top of the OS filesystem, there is a cost due to duplication of effort, design impedance mismatch, limitations of the OS filesystem, etc. This is partially mitigated by turning the files in the OS filesystem into a giant block store to minimize interaction with the OS filesystem.
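A minimal Python sketch of the "giant block store" idea, under stated assumptions: a hypothetical PageStore class, an assumed 4096-byte page, and the Unix-only os.pread/os.pwrite calls. The OS filesystem only ever sees one file; page n always lives at byte offset n * PAGE_SIZE, so the engine never asks the filesystem to look up a name after the initial open. Real engines add checksums, write-ahead logging, their own buffer pool, and sometimes O_DIRECT to bypass the OS page cache.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size; real engines pick their own


class PageStore:
    """Hypothetical: treats one big OS file as an array of fixed-size pages addressed by number."""

    def __init__(self, path: str):
        # One file, opened once; every later access is a positioned read or write.
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

    def write_page(self, page_no: int, data: bytes) -> None:
        if len(data) > PAGE_SIZE:
            raise ValueError("data larger than a page")
        # Pad to a full page so page n always starts at n * PAGE_SIZE.
        os.pwrite(self.fd, data.ljust(PAGE_SIZE, b"\0"), page_no * PAGE_SIZE)

    def read_page(self, page_no: int) -> bytes:
        return os.pread(self.fd, PAGE_SIZE, page_no * PAGE_SIZE)

    def sync(self) -> None:
        os.fsync(self.fd)  # the engine decides when data must be durable

    def close(self) -> None:
        os.close(self.fd)


with tempfile.TemporaryDirectory() as d:
    store = PageStore(os.path.join(d, "data.db"))
    store.write_page(3, b"row data for page 3")
    store.sync()
    page = store.read_page(3)
    print(len(page), page.rstrip(b"\0"))  # 4096 b'row data for page 3'
    store.close()
```

Because the class only needs a descriptor that supports positioned reads and writes, pointing it at a raw block device instead of a file is roughly the "no OS filesystem in the middle" setup, though real engines must also respect the device's alignment rules.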

Some database filesystems can be installed directly on raw block devices if you desire with no OS filesystem in the middle. This usually offers significant performance and efficiency gains since everything above the raw hardware is purpose-built for the requirements of optimal database performance.

samus · on Sept 3, 2022

A filesystem offers a hierarchical interface. Meanwhile, a DBMS needs nothing more from the OS than access to blocks and preferably information about HDD layout. That's a level below.

hcta · on Sept 3, 2022

Can you clarify what point you're making? If you're trying to argue that adding an extra layer can only reduce performance, any cache is an obvious exception to that. Are you saying it's extraneous to use a database as a storage abstraction because databases have to sit on top of filesystems, and filesystems already exist?
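To illustrate the cache point, a hypothetical Python sketch of a least-recently-used read cache layered over a slower read function: the extra layer costs a dictionary lookup, but repeat reads of a hot block skip the slow path entirely. The class name and the lambda standing in for a disk read are made up for the example.

```python
from collections import OrderedDict
from typing import Callable


class LRUReadCache:
    """Hypothetical extra layer in front of a slow read function; repeat reads skip the slow path."""

    def __init__(self, backing_read: Callable[[int], bytes], capacity: int):
        self.backing_read = backing_read
        self.capacity = capacity
        self.entries: OrderedDict[int, bytes] = OrderedDict()
        self.misses = 0

    def read(self, block_no: int) -> bytes:
        if block_no in self.entries:
            self.entries.move_to_end(block_no)  # mark as most recently used
            return self.entries[block_no]
        self.misses += 1
        data = self.backing_read(block_no)  # the slow path, e.g. a disk read
        self.entries[block_no] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used block
        return data


cache = LRUReadCache(lambda n: f"block {n}".encode(), capacity=2)
for n in [1, 2, 1, 1, 3, 2]:
    cache.read(n)
print(cache.misses)  # 4
```

Six reads cost only four trips to the backing store; with a workload that rereads the same blocks more often, the ratio improves further.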

PaulDavisThe1st · on Sept 3, 2022

The point I'm making (and I'm not certain that it is true) is that ultimately if you want to store raw data, a filesystem seems more likely to be what you want to use. Put differently, BLOBs in the DB end up (necessarily) as blobs on the disk, and managing blobs on a disk is precisely what filesystems are intended for.

But yes, on top of that, there's the question that in the end even the DB will need something very, very much like a filesystem between it and the storage hardware ... which raises the question of whether this should remain hidden from every other application, or whether it makes sense for certain kinds of applications to use it too (i.e. just like today).

layer8 · on Sept 3, 2022

> managing blobs on a disk is precisely what filesystems are intended for.

A filesystem does much more, e.g. providing naming and management (directories, symlinks, access control, extended attributes, cache management, …) of files for manipulation by humans and applications, whereas RDBMSs only need fixed-size blocks of storage.

Some databases actually support using raw disks without a normal filesystem, which can have advantages by removing the extra layer of abstraction, e.g.:

https://dev.mysql.com/doc/refman/8.0/en/innodb-system-tables...

https://docs.oracle.com/en/database/oracle/oracle-database/2...

https://www.ibm.com/docs/de/db2/9.7?topic=creation-attaching...

fuckstick · on Sept 4, 2022

> But yes, on top of that, there's the question that in the end even the DB will need something very, very much like a filesystem between them and the storage hardware

So the answer to this question is no. The “filesystem” that a relational database uses (i.e. how it organizes and allocates on the block layer) is so different from DOS/POSIX semantics that you wouldn’t recognize it as a filesystem, so saying it is very, very much like a filesystem is dubious.
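As a rough illustration of what "allocates on the block layer" can look like, here is a hypothetical Python sketch of a free-page bitmap: the engine's "filesystem" hands out and reclaims page numbers and knows nothing about names, directories, or permissions. It is one common approach, heavily simplified (one byte per page, linear search), not the layout of any particular database.

```python
class PageAllocator:
    """Hypothetical free-page bitmap: tracks which fixed-size pages of a block store are in use."""

    def __init__(self, n_pages: int):
        self.used = bytearray(n_pages)  # 0 = free, 1 = in use (a byte per page for clarity)

    def allocate(self) -> int:
        try:
            page_no = self.used.index(0)  # first free page
        except ValueError:
            raise RuntimeError("no free pages") from None
        self.used[page_no] = 1
        return page_no

    def free(self, page_no: int) -> None:
        self.used[page_no] = 0


alloc = PageAllocator(4)
a = alloc.allocate()
b = alloc.allocate()
alloc.free(a)
c = alloc.allocate()  # the freed page is handed out again
print(a, b, c)  # 0 1 0
```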

didgetmaster · on Sept 3, 2022

I created a kind of object store (https://www.Didgets.com) that originally was designed to replace file systems. It manages the data streams (i.e. blobs) for each object very much like a file system does for each file. Although I have a few algorithms that make allocation and management of all the blocks very efficient, my testing shows almost equivalent I/O speed for reading/writing the data.

It is in the metadata management where my system excels. The table of file records for a volume with over 200M files needs only 13GB to be read from disk, and the same amount of RAM to cache it all. Contextual metadata tags can be attached to each object, and lightning-fast queries that use them can be executed. The objects (Didgets) can be arranged in a hierarchical folder tree just like file systems use, but they don't need to be.
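The comment doesn't say how the tag queries are implemented. One standard way to make such queries fast is an inverted index from each tag to the set of object ids carrying it, so a multi-tag query becomes a set intersection instead of a scan over every object's metadata. This Python sketch is hypothetical and is not Didgets' actual design.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical inverted index: tag -> set of object ids (not Didgets' actual design)."""

    def __init__(self) -> None:
        self.by_tag: defaultdict[str, set[int]] = defaultdict(set)

    def tag(self, obj_id: int, *tags: str) -> None:
        for t in tags:
            self.by_tag[t].add(obj_id)

    def query(self, *tags: str) -> set[int]:
        """Return ids of objects carrying every requested tag."""
        if not tags:
            return set()
        # Start from the smallest posting set so the intersection stays cheap.
        sets = sorted((self.by_tag.get(t, set()) for t in tags), key=len)
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
        return result


idx = TagIndex()
idx.tag(1, "audio", "drums", "44.1kHz")
idx.tag(2, "audio", "vocals")
idx.tag(3, "audio", "drums")
print(sorted(idx.query("audio", "drums")))  # [1, 3]
```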

esjeon · on Sept 4, 2022

The page says:

> SQLite reads and writes small blobs 35% faster¹ than the same blobs can be read from or written to individual files on disk using fread() or fwrite().

This is likely because of the hierarchical nature of the filesystem: path lookup can be slower than a DB index lookup (depending on the implementation). I haven't tried this, but something like getdents(2) would improve the performance of the test C code here, since one can skip the full path lookup.
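A rough Python sketch for trying the comparison yourself. It is not a port of the C test code from the SQLite page, and the blob count and size are arbitrary assumptions. It reads the same blobs three ways: from a SQLite table, from one file per blob opened by full path, and from the same files opened relative to a directory descriptor via os.open(..., dir_fd=...), which uses openat(2) on Linux and resolves only the final path component. That last variant is a swapped-in way to skip the full path lookup; getdents(2) lists directory entries rather than opening files, so it isn't used here. Results depend heavily on the filesystem, the OS page cache, and blob size, so no expected numbers are given. Unix only, because of dir_fd.

```python
import os
import sqlite3
import tempfile
import time


def bench(n_blobs: int = 1000, blob_size: int = 10_000) -> dict[str, float]:
    """Time reading the same small blobs from SQLite and from individual files."""
    payloads = [os.urandom(blob_size) for _ in range(n_blobs)]
    timings = {}
    with tempfile.TemporaryDirectory() as d:
        # Variant 1: one SQLite table, one row per blob.
        db = sqlite3.connect(os.path.join(d, "blobs.db"))
        db.execute("CREATE TABLE blobs(id INTEGER PRIMARY KEY, data BLOB)")
        db.executemany("INSERT INTO blobs VALUES (?, ?)", enumerate(payloads))
        db.commit()

        # Variants 2 and 3 read one file per blob from a single directory.
        files_dir = os.path.join(d, "files")
        os.mkdir(files_dir)
        for i, p in enumerate(payloads):
            with open(os.path.join(files_dir, str(i)), "wb") as f:
                f.write(p)

        start = time.perf_counter()
        for i in range(n_blobs):
            (data,) = db.execute("SELECT data FROM blobs WHERE id = ?", (i,)).fetchone()
            assert data == payloads[i]
        timings["sqlite"] = time.perf_counter() - start
        db.close()

        # Full path: every open resolves the whole path from the root.
        start = time.perf_counter()
        for i in range(n_blobs):
            with open(os.path.join(files_dir, str(i)), "rb") as f:
                assert f.read() == payloads[i]
        timings["files_full_path"] = time.perf_counter() - start

        # Relative to an open directory fd: only the file name is looked up.
        dir_fd = os.open(files_dir, os.O_RDONLY)
        start = time.perf_counter()
        for i in range(n_blobs):
            fd = os.open(str(i), os.O_RDONLY, dir_fd=dir_fd)
            with os.fdopen(fd, "rb") as f:
                assert f.read() == payloads[i]
        timings["files_dir_fd"] = time.perf_counter() - start
        os.close(dir_fd)
    return timings


for name, secs in bench().items():
    print(f"{name}: {secs * 1000:.1f} ms")
```

All three loops read data the OS has just written, so they mostly measure lookup and call overhead against a warm page cache rather than disk speed.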