Composefs: Content-Addressable Overlay Filesystem for Linux (github.com/containers)
80 points by ignoramous on Jan 25, 2023 | 23 comments



There are a ton of good techniques here to dive into, but in my mind they almost all address one leading sore point of containers:

With composefs, if two containers have the same file, wherever it sits on the filesystem, that file will only be stored once on the host, and (equally if not more importantly) it will be shared in the page cache (the cache for file contents). Currently, on most systems, different images end up with replicas of the same file that aren't shared on disk or in the page cache.

If, for example, your org works from a handful of base images, this could drastically reduce the footprint of containers, both on disk and in memory, by effectively sharing the things that can be shared.


Combining this with IPFS could be pretty interesting.


Could you elaborate?


I've read the project's description and still failed to understand what it does and what it is useful for.


The main advantage is the content-addressable part. Existing overlay filesystems only handle the overlay aspect; tools like docker/containerd then attempt to reuse layers efficiently, but it's not perfect. Files from two different layers may have identical content, yet that content is still stored twice, because the layer is, roughly speaking, the "unit" of storage. By making a single filesystem that handles both content addressability and the overlay aspect, you can avoid duplicating files that are identical but live in different layers.
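
A minimal sketch of the content-addressing half (the store location and layout here are made up for illustration, not composefs's actual on-disk format):

    import hashlib, os, shutil

    STORE = "/var/lib/cas/objects"  # hypothetical object store

    def store_file(path):
        # Name the blob by the hash of its content; identical files
        # from different layers hash to the same name, so the bytes
        # land on disk only once.
        digest = hashlib.sha256(open(path, "rb").read()).hexdigest()
        target = os.path.join(STORE, digest[:2], digest[2:])
        if not os.path.exists(target):  # already present -> deduplicated
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copyfile(path, target)
        return digest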


Another way of describing it: rather than a Docker/container image being a group of layered archives, each with changes, a list of file hashes is distributed, detailing where those files need to be in the mounted filesystem and with what permissions.

Since everything is named based on hashes, content is naturally deduplicated if two images share the same files and all files are stored in the same place.

If you boot on top of a composed filesystem, you also get easy file verification as long as the booted list is signed and unmodified. If you modify the local files, the hashes won't match.
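
As a sketch, the distributed "image" boils down to a manifest like this (hypothetical format and field names, not composefs's real metadata), and verification is just re-hashing:

    import hashlib

    # Hypothetical manifest: where each blob appears, and with what mode.
    manifest = [
        {"path": "usr/bin/app",  "sha256": "ab12...", "mode": 0o755},
        {"path": "etc/app.conf", "sha256": "cd34...", "mode": 0o644},
    ]

    def verify(root, manifest):
        # Re-hash every mounted file and compare against the signed list;
        # any locally modified file fails the comparison.
        for entry in manifest:
            data = open(root + "/" + entry["path"], "rb").read()
            if hashlib.sha256(data).hexdigest() != entry["sha256"]:
                raise ValueError("hash mismatch: " + entry["path"])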


Imagine a workflow:

- clone a repo

- run a command

- run another command

It runs several times daily. Maybe it's CI or something.

Now suppose you want to cache the filesystem state after each command, so that the commands can be rerun in a debug scenario where you'd expect them to behave the same as they did the first time, because they have the same filesystem. (Having recreated the bug, you could then start making changes towards a fix.)

You either end up with many, many copies of that repo, or you use something like this to store only the unique files and instead keep many, many indices into that store.
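
A sketch of that "one store, many indices" shape (the helper names are made up):

    import hashlib, os

    def snapshot(tree, store):
        # Walk the tree; unique content lands in `store` once, and the
        # returned {relative path: hash} dict is the cheap per-run index
        # that can recreate this filesystem state later.
        index = {}
        for dirpath, _, filenames in os.walk(tree):
            for name in filenames:
                full = os.path.join(dirpath, name)
                data = open(full, "rb").read()
                digest = hashlib.sha256(data).hexdigest()
                blob = os.path.join(store, digest)
                if not os.path.exists(blob):
                    open(blob, "wb").write(data)
                index[os.path.relpath(full, tree)] = digest
        return index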


Git has 2 object stores: the loose object store and the packed object store.

What you said applies to the loose object store, where a full, standalone copy of each file is stored as a single object. Those can be deduplicated quite nicely, and git does just that: the loose object store is a CAS.

However, the packed object store is trickier. It stores duplicated objects delta-compressed against each other and then zlib-compressed, so deduplicating packfiles at the filesystem level is almost never worth it.

Git is moving toward using the packed object store more and more. With some of the latest patches, you can effectively use git with very little loose object storage utilization (zero if you are on a server hosting git repositories).
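
For reference, a loose object's name really is just a hash of a tiny header plus the content, which is why that store is a CAS:

    import hashlib

    def git_blob_oid(content: bytes) -> str:
        # Git's documented blob format: sha1 over "blob <len>\0" + bytes.
        # Identical file content always yields the identical object id.
        header = b"blob %d\x00" % len(content)
        return hashlib.sha1(header + content).hexdigest()

    print(git_blob_oid(b"hello\n"))
    # ce013625030ba8dba906f756967f9e9ca394464a (matches `git hash-object`)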


Hmm that's good to know, thanks.

It still only applies to files that are tracked by git, though. If your workflow applies a patch and then invokes a compiler which generates intermediate files, neither the post-patch file nor the intermediate files will end up in the packed object store. So if you're taking filesystem snapshots of those states, you'll still want to deduplicate them some other way.


By chance, I had been reading this blog post on the same topic just the other day:

https://blogs.gnome.org/alexl/2022/06/02/using-composefs-in-...


I briefly chased that mythical composef, whose plural the composefs might be, to no avail.


Don't forget about the singular zf and btrf.


Those couldn't be pronounced like single words. I'd pronounce composef like compose-eff.


Very promising, but given that they are still fixing binary-search bounds checks, it probably needs time to stabilize: https://github.com/containers/composefs/commit/64640fa0fe256...


Oh, interesting! I wonder if this could be used for an alternative implementation of Nix/Guix's store/profiles. It seems conceptually very similar, but implemented as a filesystem rather than a big bundle of symlinks.


Nix, at least, is not content-addressed; it's addressed by the contents of the derivation. Under ideal circumstances the same derivation will result in the same output contents, but that's not guaranteed.
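
The difference is what gets hashed — a conceptual sketch, not Nix's actual store-path algorithm:

    import hashlib

    def derivation_addressed(derivation: bytes) -> str:
        # Address comes from the build recipe + inputs, computed before
        # building; two builds of the same derivation share an address
        # even if their outputs happen to differ.
        return hashlib.sha256(derivation).hexdigest()

    def content_addressed(output: bytes) -> str:
        # Address comes from the built output itself; identical outputs
        # collapse to one entry regardless of how they were produced.
        return hashlib.sha256(output).hexdigest()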


Content-addressed Nix is currently in testing; see: https://discourse.nixos.org/t/content-addressed-nix-call-for-...


I've often imagined a system that tries to build consensus around which (content-addressed) code snippets can be treated as pure functions with (content-addressed) memoized outputs, and which ones need to be rerun.

If you're on a well-worn path, you could operate mostly by lookup and only run the code if something doesn't smell right.
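
A toy version of that lookup-instead-of-run idea (all names here are hypothetical):

    import hashlib, inspect, json

    cache = {}  # (hash of code + inputs) -> memoized output

    def memo_run(fn, *args):
        # Treat fn as pure: key the cache on the hash of its source plus
        # its arguments, and only execute on a cache miss.
        key = hashlib.sha256(
            (inspect.getsource(fn) + json.dumps(args)).encode()
        ).hexdigest()
        if key not in cache:
            cache[key] = fn(*args)
        return cache[key]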


Unison (lang)?


Indeed. I'm watching that one closely, although I haven't made time to do much coding in it.

Even if it turns out to be perfect in every way though, it takes a long time for the masses to adopt new languages. I think it might be worth finding ways to build consensus around claims like:

> this particular bit of python can be treated like a pure function

...even though there aren't any guarantees built into the language.


I guess it's a solution to Yocto build directories being huge while containing lots and lots of duplicates. I had thought about a filesystem layer that would dedup like this, and this seems to be a solution to that problem.


Or dwarfs.


fun consideration:

any file system can act as a key-value store

any key-value store can act as a content addressable data store

though with limitations, like fundamental constraints not being enforced, performance problems, and potential unexpected issues (like limits on the number of files per folder)

for the above reasons, and given that content-addressable storage is a poster child for fast(1), distributed, reliable solutions (1: at least in the "single entity owned backend database" case), I would never recommend using the file system for it outside of prototyping (and system-internal use cases like docker storage).
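
Both halves of that observation fit in a few lines (illustrative only; note how the CAS invariant is not enforced anywhere, one of the limitations above):

    import hashlib, os

    class FsKV:
        # A filesystem is trivially a key-value store...
        def __init__(self, root):
            self.root = root
        def put(self, key, value: bytes):
            open(os.path.join(self.root, key), "wb").write(value)
        def get(self, key) -> bytes:
            return open(os.path.join(self.root, key), "rb").read()

    def cas_put(kv, value: bytes) -> str:
        # ...and any KV store becomes content-addressable by deriving
        # the key from the value. Nothing stops a caller from bypassing
        # this and writing a "wrong" key directly.
        key = hashlib.sha256(value).hexdigest()
        kv.put(key, value)
        return key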



