Hacker News
DwarFS: A fast high compression read-only file system (github.com/mhx)
166 points by metadat on July 24, 2022 | hide | past | favorite | 64 comments



Favorite part: “I had several hundred different versions of Perl that were taking up something around 30 gigabytes of disk space”. That’s a lot of Perl.


I have this same kind of thing, for running https://perl.bot/ and related services. I'm using BTRFS however (due to a long story and actual reasons) and use it to dedupe, compress and discard a file-backed filesystem for them. I'm at a logical store of about 46GB with an on-disk size of 25GB. Most of that is the fact that I have several hundred (550ish last I counted) libraries and modules installed into each install of Perl. This means they dedupe incredibly well, since they're basically perfect copies, and most of it is text-based, so it compresses well too.
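For anyone wanting to reproduce that setup on btrfs, the rough shape is something like this (paths are hypothetical; `duperemove` and `compsize` are separate packages, not part of btrfs-progs):

```shell
# Mount the file-backed image with transparent compression
mount -o loop,compress=zstd /srv/perls.img /mnt/perls

# Deduplicate identical extents across the installed perl trees
# (-r recursive, -d actually dedupe, -h human-readable sizes)
duperemove -rdh /mnt/perls

# Report logical vs. on-disk size after compression + dedupe
compsize /mnt/perls
```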

I've looked at DwarFS before for this same use case, but the fact that it's read-only makes it more difficult to handle, since I'd have to have an uncompressed version sitting out elsewhere. Though I've now got the hardware to actually put that all into a CI/CD pipeline that generates the image. I might actually work on that once I get my Turing Pi 2 boards, since I want to port this whole setup to ARM as well as x86_64. I might put it on my RISC-V board too, but it's too slow, so I think it won't be as useful.

EDIT: fixed the tbd size, it was taking a while to calculate.


Can you have OverlayFS on top of it to make it writable and periodically generate new base DwarFS from that overlay?


People do something similar with e.g. Raspberry Pi machines with a microSD card; they make an overlayfs setup where the OS logs et al. get written to memory and are only synced to the main microSD card once an hour or so -- to avoid wearing the card out too quickly.

So I'd imagine making something like that but with DwarFS below would be quite easy although it'd require you to set it up by hand. Still, once done it'll likely be a rock-solid setup for a long time.
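A minimal sketch of that layering, assuming a DwarFS image at `image.dfs` and made-up mount points (the `dwarfs` FUSE tool and `mkdwarfs` ship with the project; the overlay mount is standard Linux overlayfs):

```shell
# Read-only base layer from the DwarFS image
dwarfs image.dfs /mnt/base

# Writable upper layer (could live in tmpfs)
mkdir -p /tmp/upper /tmp/work /mnt/merged
mount -t overlay overlay \
      -o lowerdir=/mnt/base,upperdir=/tmp/upper,workdir=/tmp/work \
      /mnt/merged

# Periodically fold the accumulated changes into a fresh image
mkdwarfs -i /mnt/merged -o image-new.dfs
```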


I'd imagine so; that might be a good strategy for doing it, and you could measure the newly set-up stuff fairly easily.


It'd be very cool to have something like virtualenv or Anaconda that uses DwarFS. Python environments take up so much of my harddrive.


"DwarFS compression is an order of magnitude better than SquashFS compression, it's 6 times faster to build the file system, it's typically faster to access files on DwarFS and it uses less CPU resources."

Credit and thanks to coldblues for alerting me about this!

https://news.ycombinator.com/context?id=32212870


"DwarFS compression is an order of magnitude better than SquashFS compression, it's 6 times faster" : I suspect this is on a highly specific test-case and no generalities can be made...

N.B. there is also EroFS https://www.kernel.org/doc/html/latest/filesystems/erofs.htm...


I thought I was the one mentioning it?

https://news.ycombinator.com/item?id=32211651


You did mention it, although with less context and somehow it didn't stick in my brain. Regardless, it wasn't intentional to not give you credit, and I am happy to give you all the credit sir, madame, or they!


Well, don't get me wrong -- not that I care that much but I found it impossible to find the other reference to it that you mentioned so I was wondering if you (or me) made a mistake. Thanks for clarifying!


Tangentially related -- what's a good option today for a cross-platform (portable to all platforms with a filesystem) read-only virtual filesystem, like e.g. Quake's pk3 file format? Say I want to access a few tens of thousands of small files fast -- much faster than what e.g. NTFS allows -- since I know I'll likely have to read more-or-less all the files and I can mmap the whole thing. What are my options? My prime concern is having an API such as

    handle = vfs_fopen("/my/file1.txt")
    pointer_to_the_file_bytes = vfs_map(handle, <start offset>)
which would be as fast as possible. Compression and encryption aren't needed.


ZIP is the closest to an "industry standard" portable filesystem. It's directly comparable to Quake's PK3 format because that's all that PK3 was, a ZIP with a custom file extension.

It's also what "powers" a wide range of portable filesystem-in-a-single-file formats, such as DOCX, ODT, and quite a few other modern Office and Office-adjacent file formats.
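As a sketch of how little is needed to treat a ZIP as a read-only VFS -- this uses Python's stdlib with made-up file paths; note that only ZIP_STORED (uncompressed) members could be mapped in place, since compressed members have to be inflated first:

```python
import io
import zipfile

# Build a tiny archive in memory. ZIP_STORED keeps members uncompressed,
# which is what you'd want if you plan to mmap the whole archive and
# hand out pointers into member bytes directly.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("my/file1.txt", "hello")
    zf.writestr("my/file2.txt", "world")

# Read-only "VFS" access, roughly analogous to vfs_fopen + read
with zipfile.ZipFile(buf) as zf:
    data = zf.read("my/file1.txt")

print(data)  # b'hello'
```

The central directory at the end of the archive plays the role of the filesystem's index, which is why lookup stays cheap even with tens of thousands of members.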


Would a RAM disk fit the bill? Just read all the contents from the copy in non-volatile storage at boot. Cross-platform then by virtue of using what-ever RAM-based filesystem or block-device options are commonly available on the target OS.

For “as fast as possible” you'll need to experiment and benchmark with your workload. Which filesystem is optimal may depend on how you are laying out the data and where the latency/throughput sensitivities are in your use case and the given filesystems.

> My prime concern is having an api such as...

Having a different API, rather than it looking like a filesystem, would make cross-platform support more of a concern, as you then have a data access library, not a general filesystem. It will likely be necessary for best performance though: any filesystem is going to have significant overheads (orders of magnitude) compared to being able to map chunks of the data directly into your process' address space.

If abandoning a generic filesystem, perhaps something like sqlite with an in-memory table/db (https://www.sqlite.org/inmemorydb.html)? Again like the ramdisk option just load up the content from permanent storage on first use.
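A minimal sketch of that idea (the table and column names here are made up for illustration):

```python
import sqlite3

# In-memory "file store": load blobs once, then serve all reads from RAM.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, data BLOB)")
db.execute("INSERT INTO files VALUES (?, ?)", ("/my/file1.txt", b"hello"))
db.commit()

row = db.execute(
    "SELECT data FROM files WHERE path = ?", ("/my/file1.txt",)
).fetchone()
print(row[0])  # b'hello'
```

The PRIMARY KEY on `path` gives you indexed lookups, which is the part a flat directory of thousands of small files on NTFS struggles with.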


> Would a RAM disk fit the bill?

A normal unprivileged app cannot create RAM disks easily on any OS as far as I know; also, it wouldn't really work on e.g. WASM.

> as you then have a data access library not a general filesystem.

That's fine for me -- although I don't see a particular difference between the two; a filesystem is just a system to access files, whatever that means.


> although I don't see a particular difference

The distinction I'd make is that a filesystem provides a common generic API that practically all processes on the OS understand and share access to. It's pretty much always implemented out-of-process (in the kernel, or in another userland process via kernel stubs/hooks like FUSE).

A data access library is usually much more specific to a particular data set or set of applications, and likely doesn't follow the filesystem abstraction (at least not in the same way).


SQLite?

There's also a vfs module for it that imitates a filesystem on top of a single SQLite DB.


I ended up biting the bullet and started https://github.com/celtera/uvfs


Maybe this is a really dumb question but how does one use a read-only filesystem? Can you mount it as writable temporarily or something?

Or is it that you create a compressed 'file' that you can mount as a file system? Like a zip file kinda I guess?


You mount it as a read-only file system at a mount point. It's like putting a CD (remember those?) into a drive.

The source data can be a raw block device or more likely a local file. It doesn’t matter as either way the kernel is just reading blocks of bytes.


> remember those?

Not really tbh. I haven't used a CD in my adult life. I remember "burning a CD" was a thing and that's about it.


How could you make us all feel so old. You monster.


I was born in the early 2000s and I've used CDs a lot.


As an adult? For what?

I was born in 1991 and I haven't used one since I was in 9th grade and made a girl a 'mix tape'.


We won't ask about 3.5" drives; funny thing is, they have their special uses too.


Or 5 1/4” for that matter! I remember my cousin’s old PC used those. Very flimsy but seemed to work remarkably well!


IMHO they weren't flimsy, just a bit floppy


Another option is to use a read-only filesystem with overlayfs to provide a writeable filesystem (like JFFS2) that also has a solid static FS underneath in case of error. OpenWrt does, or at least did, this.


You got a few good “how” answers but as to “why”, it’s common for containers to use an overlay file system to handle writes. So in a container situation, you’ve already paid the overlay tax. Adding compression is a smaller incremental cost.


Just like a read-only file or service. There's some kind of a construction step, and thereafter it's read-only. One might do that to make it explicit that updates are expensive, to grant read-only privileges to a less trusted process, or whatever.

Somebody else can chime in with the exact mechanism by which this one is written, but common solutions include being writable sometimes or having a program to build the filesystem from known data. That might be filesystem-as-a-file, filesystem on a separate partition, or what have you.


Right, I want to know what the construction step is in this case, but it makes sense that there are multiple approaches.


Just like creating any sort of archive. You (or a program of sorts) create the fs structure in some directory, then invoke something like `mkdwarfs -i /path/to/that/directory -o /path/to/output/file.dfs`


And squashfs works the same way - mksquashfs takes a directory as input and writes a file as output. That file can then be loopback-mounted to present the readonly filesystem.
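Putting the two side by side (paths hypothetical; `dwarfs` is the FUSE mount tool that ships with DwarFS):

```shell
# squashfs: build from a directory, then loopback-mount read-only
mksquashfs /path/to/dir image.sqsh
mount -o loop image.sqsh /mnt/ro

# DwarFS equivalent: build with mkdwarfs, mount via FUSE
mkdwarfs -i /path/to/dir -o image.dfs
dwarfs image.dfs /mnt/ro
```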


Got it, yeah so that tracks then, thanks.


Slightly OT.

What resources would people here suggest for learning about file systems? I see a lot of new file systems like zfs, btrfs, etc. I looked up for resources but couldn't find anything substantial.

I want to learn how they work so that I can appreciate projects like this and compare them.

I looked into the Build Your Own X repo but didn't find anything. I found a book called Practical File System Design: The BeOS file system but it's apparently dated, and I'm not sure I want too much of a deep dive.


The BeFS book is seriously good. I would start there.


I remember learning about filesystem design from the classic "demon" book (The Design and Implementation of the 4.4 BSD Operating System by Marshall Kirk McKusick) which has a lot of details about the original Fast File System.


Is ZFS still considered new?


A few years ago, I designed an incredibly fast write-only file system, but for some reason couldn't leverage it into a commercial product that anyone was willing to buy.


http://www.supersimplestorageservice.com/

> The Super Simple Storage Service (S4) is a new innovation in cloud storage. Our advanced write-only storage provides the highest security, lowest cost, and simplest management available.



I didn’t dig into the details, but unless DwarFS is a joke, I assume they mean the system supports create, read and delete operations.


There's plenty of read-only media out there. I can see this being useful for container deployment -- you've got a database somewhere that you have read/write access to, but the deployed code doesn't need to be updated in the container. A readonly filesystem is a great way to enforce principle of least privilege.


I'd guess you build a fs image from an existing filesystem, so you'd only have read operations.


It looks good for NixOS (or Guix) store.


> Clustering of files by similarity using a similarity hash function.

Does that mean that I can store all those 100s of photos taken at slightly different angles, in an efficient way?


That depends on whether they are bytewise similar, which in turn depends on the details of the image encoding of the format you’re using. I would expect that, by default, no.


That does lead to the question whether, with enough files, storing a lot of similar photos in uncompressed png is more space efficient in DwarFS than it is storing them in jpg.


How does this compare to WoF compression in NTFS (which uses LZX)?


Sounds like the main benefit is hash-based dedup. You can do this with ZFS and borgbackup - they'd likely be a better comparison - but most other compressed filesystems won't do it.

Most compressed filesystems fail to dedupe because, for various performance-related reasons, the compression window is usually quite small (maybe up to 128 KiB), and the files are often not sorted in the archive, so small duplicate files rarely end up close enough together to compress against each other within that window.

EDIT: Apparently squashfs does include file de-dupe; however, DwarFS is fixing what I basically said above. It sorts files in order of similarity so that you can compress across file boundaries as well as just removing complete duplicates. That's pretty cool.

"Clustering of files by similarity using a similarity hash function. This makes it easier to exploit the redundancy across file boundaries."

This kind of thing is more difficult to do on a writable filesystem, though not impossible.
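A toy illustration of the window problem and why ordering helps -- this uses zlib and a hand-arranged ordering rather than a real similarity hash, so it only sketches the principle: zlib's back-reference window is 32 KiB, so a near-duplicate placed more than 32 KiB later in the stream can't be matched, while the same duplicate placed adjacently compresses to almost nothing.

```python
import os
import zlib

# Two near-duplicate "files" plus two large unrelated files.
a = os.urandom(8_000)
a_dup = a[:-10] + os.urandom(10)                 # near-identical copy of a
spacer1, spacer2 = os.urandom(40_000), os.urandom(40_000)

# Unsorted: the duplicate starts ~48 KiB into the stream, far outside
# zlib's 32 KiB window, so it can't be encoded as back-references.
unsorted_stream = a + spacer1 + a_dup + spacer2

# "Similarity-sorted": the near-duplicates sit adjacent in the stream.
clustered_stream = a + a_dup + spacer1 + spacer2

n_unsorted = len(zlib.compress(unsorted_stream, 9))
n_clustered = len(zlib.compress(clustered_stream, 9))
print(n_unsorted, n_clustered)  # clustered is several KiB smaller
```

DwarFS automates that ordering with a similarity hash over all input files, so redundancy between similar-but-not-identical files lands inside the compressor's window.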


Seems like this is a pretty nice read-only filesystem.

Shame that it can't be used much commercially in many cases due to GPLv3 license. This could improve so many embedded systems currently using SquashFS.


Well, if you want to build a commercial product using dwarfs, you could always (shock, horror) contact the dev and *pay* him to license it to you under a bsd-like license (or whatever).

(the dwarfs dev would have to work it out amongst all who contributed to his repo)


Who said I'd want (or need) to use it in a product?

What I'm saying is that due to the license fewer companies want to use it, and thus we get somewhat worse products as a consequence.

Of course I completely agree the developer should get licensing income (heck, that's in my self interest as well!), but what's going to really happen is that the cheapo companies just make do with less and use SquashFS or whatever instead.

It's a tragedy of the commons that there's so little will to donate to or crowdfund open source projects, and that they're taken for granted.


> It's a tragedy of the commons that there's so little will to donate to or crowdfund open source projects, and that they're taken for granted.

Which is not made better by your sentiment. I mean, the GPL at least forces you to either try negotiating a different license with the author (ideally for money), or suck it up and use something inferior. Using a permissive license that allows multi-billion-dollar companies to use your code for free certainly doesn't help change the mindset of "open source is other people working for me for free".


"I mean GPL at least forces you to either try negotiating a different license with the author (for money ideally)"

Sounds great in theory, but even for a popular open source library developer this happens so rarely that it hardly pays the bills, regardless of license.

"Using a permissive license that allows multi-billion dollar companies"

Most embedded devices by far are not developed by multi-billion dollar companies. Their legal departments are also actively steering away from GPL licenses.

"try negotiating a different license"

In my experience, by far most companies don't want to negotiate anything. If there's a product with a set price, yes, then it might be purchased. They typically also want some kind of product support with it.

"use your code for free certainly doesn't help changing the mind set of "open source is other people working for me for free"."

How to solve this? Ideally the library developer would get paid AND the consumers get more value for their money. This would encourage more developers to write useful libraries that provide great value in the big picture, but are uneconomical or otherwise too much trouble to deal with on an individual company or product level.

As it is, everyone seems to lose.


This is all true, but how do you think a more permissive license would improve things for open source authors? Maybe there is a miscommunication here, but how is a company more likely to give you money if you release under MIT or BSD?


By "can't be used" do you mean by unjustified whiners, or is there a real problem?

If you're including this existing filesystem you shouldn't have any relevant patents to worry about, and the clause about letting the user write their own firmware shouldn't be an issue for 99% of products.

Is there anything else I missed in the differences between v2 and v3?


The clause about letting the user write their own firmware is a huge issue for 99% of products.

Use some vendor blob -> problem. Have some NDA for registers used in some hardware -> problem. Hardware does not have user firmware flashing -> problem. Legal compliance requirements for hardware bearing your name -> problem.


> Use some vendor blob -> problem. Have some NDA for registers used in some hardware -> problem.

That's the same as GPLv2. If you don't mix the filesystem into your proprietary code, you don't have to reveal any of those things.

> Hardware does not have user firmware flashing -> problem.

GPLv3 only requires you to let the user have the same flashing ability you have. If it can't be flashed, you don't have to do anything.

> Legal compliance requirements for hardware bearing your name -> problem.

That's a 1% of products situation.


It's not in the standard Linux kernel (unlike squashfs), plus it's less field-tested compared to squashfs. More importantly, I too avoid GPLv3 in any commercial product. Anything GPLv3 is off my check-out list immediately, because I honestly don't know what GPLv3 really means for me. I use GPLv2 cautiously wherever it fits, as I feel I at least know what it implies.


Doesn't it "just" say that on top of providing the sources of the binaries you distribute, you must provide/document a way to update them?


I think the author overlooked that a read-only file system is of absolutely no practical use. If you can't write to it, then there will never be any data in it to be read.


There are plenty of data archival projects that can benefit from DwarFS. Stuff like e.g. old magazines or ROM collections or scanned books/comics that don't have copyright attached etc., don't need modification and if the collection is big enough then the deduplication can reduce the final size of the archive by a lot.


Read-only means you can't change it _after_ you initially put all your files in it.



