This is similar in design to the old-school Unix File System (UFS), more so than to ext4 (which grew out of similar ideas and, of course, still has inodes, data blocks, and a superblock, but also has a lot of other things going on).
In a production Rust filesystem, I would not use libc types like uid_t and gid_t in defining the on-disk format -- they may vary from libc to libc, but you probably want your filesystem to be portable between systems.
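To make that concrete, here is a minimal sketch of what I mean, not the article's actual code. The `DiskOwner` struct, its 8-byte layout, and the choice of little-endian `u32` fields are all my own assumptions for illustration; the point is only that the on-disk width and byte order are pinned down explicitly instead of being inherited from whatever libc defines.

```rust
// Hypothetical on-disk ownership record: the names and layout are
// assumptions for illustration, not taken from the article.
#[derive(Debug, Clone, Copy, PartialEq)]
struct DiskOwner {
    uid: u32, // fixed 32-bit width, regardless of the platform's uid_t
    gid: u32, // fixed 32-bit width, regardless of the platform's gid_t
}

impl DiskOwner {
    /// Serialize to exactly 8 bytes, little-endian, on every platform.
    fn to_bytes(self) -> [u8; 8] {
        let mut buf = [0u8; 8];
        buf[..4].copy_from_slice(&self.uid.to_le_bytes());
        buf[4..].copy_from_slice(&self.gid.to_le_bytes());
        buf
    }

    /// Parse the same 8-byte layout back into a record.
    fn from_bytes(buf: [u8; 8]) -> Self {
        DiskOwner {
            uid: u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]),
            gid: u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]),
        }
    }
}

fn main() {
    let owner = DiskOwner { uid: 1000, gid: 100 };
    let bytes = owner.to_bytes();
    println!("{:?} -> {:?}", owner, bytes);
    assert_eq!(DiskOwner::from_bytes(bytes), owner);
}
```

Converting to and from the host's uid_t/gid_t would then happen at the boundary where the filesystem talks to the OS, not in the format itself.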
The use of direct, indirect, double indirect, etc. pointers to data blocks shows how antiquated some file system designs are. Now that files routinely reach gigabytes in size and can still be fragmented, extents are a far more efficient way to track their data blocks.
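For anyone unfamiliar with extents, a rough sketch of the idea follows. The `Extent` struct and `lookup` function are hypothetical names of my own, not any real filesystem's format: each extent maps a contiguous run of logical file blocks to a contiguous run of physical blocks, so a large unfragmented file needs one record rather than one pointer per block.

```rust
// Hypothetical extent record: maps `len` consecutive logical blocks,
// starting at `logical_start`, to consecutive physical blocks starting
// at `physical_start`. Names are assumptions for illustration.
#[derive(Debug, Clone, Copy)]
struct Extent {
    logical_start: u64,
    physical_start: u64,
    len: u64,
}

/// Translate a logical block number within the file to a physical block
/// number, or None if no extent covers it (a hole or past end of file).
fn lookup(extents: &[Extent], logical: u64) -> Option<u64> {
    extents
        .iter()
        .find(|e| logical >= e.logical_start && logical < e.logical_start + e.len)
        .map(|e| e.physical_start + (logical - e.logical_start))
}

fn main() {
    // A file stored in two fragments: 256 blocks, then 100 more elsewhere.
    let extents = [
        Extent { logical_start: 0, physical_start: 1000, len: 256 },
        Extent { logical_start: 256, physical_start: 5000, len: 100 },
    ];
    println!("logical 300 -> physical {:?}", lookup(&extents, 300));
    println!("logical 400 -> physical {:?}", lookup(&extents, 400));
}
```

A linear scan is fine for a handful of extents; real designs generally keep them sorted or in a tree so lookups stay fast on heavily fragmented files.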
It's always annoyed me how I can query hundreds of millions of records in a SQL database instantly, but searching metadata from my filesystem is still comparatively slow. Are there any filesystems out there that take a more database-like approach to this (while still appearing and integrating like a traditional filesystem)? I'm not after object storage, nor do I want to rely on a decoupled indexing process.
I built a system that I designed to be a file system replacement. It can efficiently manage the metadata for hundreds of millions of files without a separate indexing system that can become out of sync with the actual data. You can attach dozens of metadata tags to each file and query for every file that has a certain tag or other attribute (size, type, datetime stamp, etc). It works a lot like a database in that results are returned almost instantly.
Hi, I did not understand the arithmetic behind indirect pointers and the file size it can store.
It says "A single indirect pointer can point - (12 + 1024) * 4 KiB", which works out to roughly a 4 MiB file.
My question is, what does 1024 represent? I understand 12 is the number of direct pointers, and I assumed we are multiplying by 4 because a pointer takes 4 bytes of space. It would help if anyone could explain it.
Might be a silly question, sorry about that. But ya, thanks for the help!
We're pointing to a block of data; that's the 4 KiB. You have 12 direct pointers — that's the "12".
The "1024" appears to be the single indirect pointers: it seems like the inode only has 1 "single indirect pointer", so that points to a block's worth of pointers. (Hence the "indirect" part.) A block in this design is 4 KiB, and a pointer in this design is 4 B, so the block of pointers will hold 4 KiB / (4 B/pointer) = 1024 pointers — that's the "1024".
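If it helps, here is the same arithmetic as a small Rust sketch. The constants come from the numbers in this thread (4 KiB blocks, 4-byte pointers, 12 direct pointers); the function names, and the assumption that the inode has exactly one pointer at each level of indirection, are mine.

```rust
const BLOCK_SIZE: u64 = 4096; // 4 KiB per data block
const POINTER_SIZE: u64 = 4; // 4 bytes per block pointer
const DIRECT_POINTERS: u64 = 12; // direct pointers stored in the inode

/// How many block pointers fit in one block: 4096 / 4 = 1024.
fn pointers_per_block() -> u64 {
    BLOCK_SIZE / POINTER_SIZE
}

/// Maximum file size in bytes, assuming one pointer per indirection level
/// (levels = 0: direct only, 1: plus single indirect, 2: plus double indirect).
fn max_file_size(levels: u32) -> u64 {
    let mut blocks = DIRECT_POINTERS;
    for level in 1..=levels {
        // A level-n indirect pointer reaches 1024^n data blocks.
        blocks += pointers_per_block().pow(level);
    }
    blocks * BLOCK_SIZE
}

fn main() {
    println!("pointers per block: {}", pointers_per_block());
    println!("direct only: {} bytes", max_file_size(0));
    // (12 + 1024) * 4096 = 4243456 bytes, a little over 4 MiB
    println!("direct + single indirect: {} bytes", max_file_size(1));
    println!("+ double indirect: {} bytes", max_file_size(2));
}
```

So the "* 4 KiB" is the size of each data block being pointed at, while the 4-byte pointer size only shows up in where the 1024 comes from.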
Interesting article. I wonder whether the FS abstraction is too large; maybe it could be split into layers, such as a physical layer (block KV store), an intermediate layer that builds files from blocks, and a final UI layer offering FS or object-store style access. Kind of similar to what's happening in the database space with query engines like DataFusion.