GCFS: a Garbage-Collected Filesystem for Linux (madore.org)
83 points by spooneybarger on March 6, 2011 | 30 comments



Unfortunately, the need to distinguish between files and directories (and the lack of filesystem-level transactions) has made a lot of fancy filesystems impossible. There are tons of research papers and projects that just cannot be implemented on GNU/Linux in a non-kludgy way.

For example, there would be no need for archivers (like tar) if you could do `cp ./directory archive.tar` (OS X has something similar to this with dmg HFS images). Or you could have `/etc/foo.config` as a text file, but also access separate options as `/etc/foo.config/section/option`. Or you could have versioning built into a filesystem: `./foo.txt` is the current version, `./foo.txt/history/` is a directory with a list of revisions, and `./foo.txt/history/2010-01-23T16:24:00Z.001` is an old version snapshot.
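
To make the `/etc/foo.config/section/option` idea concrete, here is a minimal userspace sketch of the lookup such a filesystem would have to do. It assumes an INI-style config file and POSIX `/` separators; `read_option` and `demo` are hypothetical names used only for illustration, and nothing here is part of any real filesystem.

```python
import configparser
import os
import tempfile


def read_option(virtual_path):
    """Resolve '<config file>/<section>/<option>' against a real INI file."""
    parts = virtual_path.split("/")
    # Find the longest prefix of the path that is an existing regular file.
    for i in range(len(parts), 0, -1):
        candidate = "/".join(parts[:i])
        if os.path.isfile(candidate):
            rest = parts[i:]
            break
    else:
        raise FileNotFoundError(virtual_path)
    if len(rest) != 2:
        raise ValueError("expected <file>/<section>/<option>")
    section, option = rest
    parser = configparser.ConfigParser()
    parser.read(candidate)
    return parser[section][option]


def demo():
    """Build a throwaway config file and read one option through a 'path'."""
    with tempfile.TemporaryDirectory() as tmp:
        config = os.path.join(tmp, "foo.config")
        with open(config, "w") as f:
            f.write("[section]\noption = blue\n")
        return read_option(config + "/section/option")
```

The point of the comment above is that this resolution has to live in each application today, because the kernel will not treat `foo.config` as both a file and a directory.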


> There are tons of research papers and projects that just cannot be implemented on GNU/Linux in a non-kludgy way.

Do you recall any of these offhand? Thanks.


Well, I was only able to quickly recall and find this one: "The Box: A Replacement for Files", Francisco J. Ballesteros et al. (http://lsub.org/ls/export/2kblocks/index.html)

While I was writing the comment above I was mostly thinking of Plan 9 and GNU Hurd. I couldn't quickly remember exact names, but you should look in the direction of those two OSes. They were made to address the shortcomings in UNIX design, and had some fancy concepts.


You can implement most of those ideas with FUSE filesystems. The archiver is a simple one (keep the file tree, compress on unmount). Your config file is another (although imposing a config file structure from the filesystem reminds me of the Windows registry, so I doubt the wisdom of it).

I have been using the snapshot feature in BtrFS for 6 months now. Love it.


No, I can't. FUSE, like everything relying on the kernel VFS switch, forces me to explicitly distinguish between files and directories.

Look at the current state of FUSE-based archiver filesystems. They use kludges: `./archive.tar.gz` stays a file, and its contents appear under a separate directory name such as (AVFS example) `./archive.tar.gz#/file.txt`, to get around this limitation.

As for the Windows Registry, the idea is not an absolute evil; it's just that the Windows implementation of a centralized tree-like key-value configuration storage is unbelievably crappy. For example, there are the (Plan 9-inspired) /proc and /sys, and they feel perfectly fine.
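
The file-or-directory split can be observed from userspace. The sketch below is an illustration only (the function name is made up, and it assumes a POSIX system): it creates a regular file named `archive.tar.gz` and then tries to look up a path "inside" it, which the kernel rejects before any filesystem driver gets a say in the matter.

```python
import errno
import os
import tempfile


def stat_through_regular_file():
    """Return the errno raised when a regular file is used as a directory."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "archive.tar.gz")
        with open(archive, "wb") as f:
            f.write(b"placeholder bytes, not a real archive")
        try:
            # Path resolution stops at the regular file.
            os.stat(os.path.join(archive, "file.txt"))
        except OSError as exc:
            return exc.errno
        return None
```

On Linux the returned value is `errno.ENOTDIR`, which is why AVFS-style tools need a second name (the `#` suffix) for the directory view.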


The snapshot idea has been implemented in DragonflyBSD's HammerFS.


And ZFS.


Not quite. Hammer lets you look up the state of a file at any given timestamp.

ZFS makes you take snapshots regularly and then lets you look at the state of the file at a given snapshot.
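
The distinction can be shown with a toy model. This is not how Hammer or ZFS are implemented; it only contrasts "look up by any timestamp" with "look up by the nearest earlier snapshot". All names are hypothetical, and timestamps are plain integers written in increasing order.

```python
import bisect


class VersionedFile:
    """Keeps every write, so the state at any timestamp can be recovered."""

    def __init__(self):
        self._times = []
        self._contents = []

    def write(self, timestamp, content):
        # Assumes writes arrive in increasing timestamp order.
        self._times.append(timestamp)
        self._contents.append(content)

    def as_of(self, timestamp):
        """Fine-grained lookup: latest write at or before `timestamp`."""
        i = bisect.bisect_right(self._times, timestamp)
        return self._contents[i - 1] if i else None


def as_of_snapshot(versioned_file, snapshot_times, timestamp):
    """Snapshot lookup: only states captured at snapshot instants are visible.

    `snapshot_times` must be sorted in increasing order.
    """
    i = bisect.bisect_right(snapshot_times, timestamp)
    if i == 0:
        return None
    return versioned_file.as_of(snapshot_times[i - 1])
```

With writes at times 1, 5 and 9 and snapshots taken at 2 and 10, asking for time 6 gives the second write in the fine-grained model but only the first write in the snapshot model, because the write at time 5 was never captured by a snapshot before time 6.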


And BtrFS. On Linux.


I remember back on my Amiga pretty much anything could be represented as a filesystem.

This led to lots of weird things being implemented (like a reboot-persistent ram-drive), but one of the things I remember distinctly is the tar example you mentioned (except it was lzh/lha). You would literally do `cd myArchive.lha` and that was it. This was a legitimate filesystem path where you could do whatever you wanted. Operations may have been slow on bigger archives, but you could treat it as just another directory. It was very neat.

I've always missed having that feature in the subsequent OSes I've used, and I'm very disappointed that it's been 10 years and seemingly the best we have these days is Windows Explorer's Zip support.


The Xbox Media Center has supported transparent access to multiple archival formats with a variety of compressions (including transparently folding multi-file archives into one file) for years. It's an illusion implemented in user space and not a part of the operating system itself, but in practice there's no difference to the user.


GNU/Linux systems do this too with GnomeVFS/GVFS and KIO. Unfortunately, they are only usable if all the applications you use are aware of them.

This is why filesystems should be connected to some central place (in GNU/Linux that's the kernel VFS switch).


The latest GNOME doesn't have this limitation because it uses FUSE. Any application, e.g. vim, can use those files.


Except for one problem: you have to mount the GVFS filesystem first (by accessing it). If you use Nautilus to navigate to the file and start vim, it's fine, but it won't work for any non-GIO file browsing (bash, mc, emacs-dired, gmplayer, ...)


Not that it makes a lot of difference, but this page is from 2000.


It would also have been quite impractical before the development of affordable solid state drives - GCs tend to involve quite a lot of random access, after all...

That said, I'm far from convinced allowing cycles in a directory structure would actually be useful for anything.


Yeah, I was wondering if it would allow applications to create "detached" filesystems. I could see that being (sorta) useful: a filesystem which is automatically cleaned up when the application exits.

Of course you can already do that easily enough with /tmp, but /tmp has its own problems: if it is shared between all users then it is a well-known source of security problems, and if your OS has private /tmp then that has its own problems too (i.e. it's not shared between users of the same application). The other problem with /tmp is that it isn't "garbage collected" very quickly -- on my Fedora server, unreferenced /tmp files stay around for up to 10 days.

With GCFS it looks like you could get rid of /tmp altogether. Applications could just create a directory anywhere (e.g. some random name under $HOME) and then "detach" it by removing "..", and then keep it open for as long as they need it, after which time it gets GC'd quickly and automatically.


> The other problem with /tmp is that it isn't "garbage collected" very quickly -- on my Fedora server, unreferenced /tmp files stay around for up to 10 days.

Technically, apps using temp files are supposed to unlink them after opening them, so that they get cleaned up as soon as they are closed. This is sometimes true of things that are shared, like UNIX sockets, too.
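
A minimal sketch of the unlink-after-open pattern on a POSIX system (the function name is made up for illustration). The file loses its name immediately, but the open descriptor keeps the data alive until it is closed, at which point the kernel frees the space.

```python
import os
import tempfile


def scratch_roundtrip(data):
    """Write and read back `data` through a temp file that has no name."""
    fd, path = tempfile.mkstemp()
    try:
        os.unlink(path)                      # the name is gone right away
        still_named = os.path.exists(path)   # False: nothing left in /tmp to clean
        os.write(fd, data)
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(data)), still_named
    finally:
        os.close(fd)                         # storage is reclaimed here
```

Python's `tempfile.TemporaryFile` does essentially this for you on POSIX platforms.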


Are you talking about Fedora's seunshare? That's not without its issues: http://vigilance.fr/vulnerability/Fedora-RHEL-file-access-vi....


To avoid race conditions you would need a way to create a pre-detached directory.


Yes, this was my first thought. Graphs are more generic than trees, but sometimes the benefit doesn't justify the complexity cost. The main byproducts would be a) making it easier to crash your system with an infinite directory loop and b) making the term 'up a directory' more wishy-washy.
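
To illustrate the loop problem: any tool that walks a directory graph with cycles has to remember where it has been. The sketch below uses a symlink to build the cycle (ordinary filesystems refuse directory hard links) and tracks `(st_dev, st_ino)` pairs so that each directory is visited once. The names `walk_once` and `demo` are hypothetical, and a POSIX system is assumed.

```python
import os
import tempfile


def walk_once(root):
    """Yield each directory reachable from `root` once, following symlinks."""
    seen = set()
    stack = [root]
    while stack:
        path = stack.pop()
        st = os.stat(path)                 # follows symlinks
        key = (st.st_dev, st.st_ino)       # identity of the directory itself
        if key in seen:
            continue                       # already visited: this is the cycle
        seen.add(key)
        yield path
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=True):
                    stack.append(entry.path)


def demo():
    """Build a/b/loop -> a and return the directories visited, relative to the root."""
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "a", "b"))
        os.symlink(os.path.join(tmp, "a"), os.path.join(tmp, "a", "b", "loop"))
        return sorted(os.path.relpath(p, tmp) for p in walk_once(tmp))
```

Without the `seen` set, the walk would descend through `a/b/loop/b/loop/...` until the path length limit is hit.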


This is an old article. `mount --bind` solves the directory hard-linking problem for root, and there is no compelling reason to allow users to hardlink directories.

Also, making a file-system not a tree adds a lot of complexity for questionable gains.


The Posix-Breaking Party that the author describes could be implemented by using any COW file system for the backing store. For example, the new SCSI Unmap support in ZFS: http://gdamore.blogspot.com/2011/03/comstar-and-scsi-unmap.h...


Does anybody know how they implemented directory hardlinks in Mac OS (they use them for Time Machine, I think)?


I just tried to do a directory hard link and it failed.

  $ ln ~ hardlinkdir
  ln: /Users/travis: Is a directory
  $ ln -s ~ hardlinkdir
  $

My guess is that they have implemented something like soft links with cleaner shell support, so you can't tell they are something like soft links.


The ln command just doesn't expose the functionality.

http://stackoverflow.com/questions/80875/what-is-the-bash-co...
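
One way to check what the underlying system call does on a given machine, independent of `ln`, is to call `link()` directly. This is only a probe, with a made-up function name; the result depends on the OS and filesystem. Linux refuses directory hard links outright, while other systems may behave differently.

```python
import os
import tempfile


def try_directory_hardlink():
    """Return the errno from hard-linking a directory, or None if it worked."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "dir")
        dst = os.path.join(tmp, "hardlinkdir")
        os.mkdir(src)
        try:
            os.link(src, dst)   # thin wrapper around the link() system call
        except OSError as exc:
            return exc.errno
        return None
```

On Linux this returns `errno.EPERM`, so there the restriction sits in the kernel rather than in the `ln` command.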


Hmm, I guess you can do too much damage with directory hardlinks. For example, you could link the root into your home directory, and a recursive delete on the home directory would then delete all files...


I wonder what happened to it...


What exactly would be the advantages of this for a user? (I can think of lots of disadvantages.)

My suspicion is anyone deleting a directory either wants the information within it destroyed or doesn't want to delete the directory after all.

Certainly, more "fuzzy" relations and storage could have their uses - like a downloaded music directory where the least listened songs go away automatically when you need more space. But you would not want to make your entire file system "fuzzy" in the sense that you don't know what's gone and what isn't.

The trashcan/recycle-bin is a great interface for undoing what you thought was the deletion of a file. This, not so much.


Then it seems you haven't used symlinks or hardlinks very much. I use them to organize my code, because it's useful to have more than one way to get to folder X (I have things organized by language, then they all go to a master code folder which reorganizes them into versioned/not, mine/downloaded, etc.). It makes finding what I'm looking for very easy, and it's easy to script changes, restrict searches, or update everything in one go.

But using symlinks means I can't reorganize without great cost, because they don't follow movements around. It also means some of my folders look like shortcuts while some don't, stepping into one causes my path to change wildly (no backing up, usually), and in order to protect my data I have to make sure I never delete (or move) the originals, only the symlinks.

Or say you want to organize your images in two photo applications at the same time - hardlinks mean each can have full control over where they place their file, without interrupting the other, and without duplicating what could be an enormous amount of data.

OSX attempts to solve this with Aliases - essentially, "smart" symlinks that follow changes. If you alias a file and then move the original, the alias will still work. The downside is that essentially no command-line tools handle them in any way, because they're decidedly non-standard. Being able to hardlink folders would allow filesystem organizations not possible before, with safety, tool-compatibility, and efficiency.
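
The difference between the two link types under a move can be checked with regular files, where hard links are allowed. This is a small sketch with hypothetical file names, assuming a POSIX filesystem: the hard link shares the inode and survives a rename of the original, while the symlink is left dangling.

```python
import os
import tempfile


def link_survival():
    """Compare a hard link and a symlink after the original file is renamed."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "photo.jpg")
        hard = os.path.join(tmp, "hard.jpg")
        soft = os.path.join(tmp, "soft.jpg")
        with open(original, "wb") as f:
            f.write(b"image bytes")
        os.link(original, hard)      # second name for the same inode, no data copy
        os.symlink(original, soft)   # stores the path of the original
        same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
        link_count = os.stat(hard).st_nlink
        os.rename(original, os.path.join(tmp, "moved.jpg"))
        return {
            "same_inode": same_inode,
            "link_count": link_count,
            "hard_readable": os.path.exists(hard),   # still reaches the data
            "soft_readable": os.path.exists(soft),   # target path no longer exists
        }
```

The same behaviour for directories is what the comment above is asking for.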



