Practical File System Design: The Be File System (1999) (nobius.org)
154 points by tosh on Sept 15, 2017 | 50 comments



I remember reading this book way back when. It is a great introduction to file system design. HOWEVER, the core thesis of BFS is basically "Let's see what happens when we make a filesystem that is more like a database." The TL;DR of the BeOS experience is that this is in fact a terrific idea, if you don't care about backwards (POSIX) compatibility.

The book is terribly dated though and I'm not sure I would recommend it to anyone except as a historical reference. It lacks much of the latest file system research that has gone into things like ZFS and Btrfs.

If you want to go the BFS-like route, then I recommend just accepting the premise and getting a book or other resource on DBMS design, getting one on modern file system design, and then merging the two yourself.


> if you don't care about backwards (POSIX) compatibility.

...nor overall performance, as the benchmarks in chapter 9 show. Apparently BFS is very good at sequential accesses (but then, so is FAT... which has probably become the most popular filesystem in use, especially in embedded devices) but performs poorly for a lot of other operations.

This book also reaffirms a common adage that books with "theoretical" in their title tend to be very practical, and vice-versa... but then again, it was written at a time when a lot of people were betting on BeOS being The Next Big Thing and they never expected the current dominance of Linux and Windows.

Ultimately, it seems the "keep the filesystem simple and build further abstractions inside files" design has led to the greatest success, because it's more flexible and allows more interoperability --- building too much "intelligence" into the FS, proprietary or otherwise, makes it harder to exchange data with other systems, whereas you can e.g. move or copy a file containing a relational database between systems and operate on it more easily. Filesystem drivers are usually implemented in the OS kernel, and so it makes sense to keep them minimal from a reliability and security perspective.


The fundamental question is: what is the filesystem for? If it's for users to store documents, you want something like BeFS, with slower performance but more features. If it's for programs to persist data, you want something minimal which you can build database libraries on top of. Trying to do both at the same time pulls you toward a mediocre middle ground.

Operating systems should really have two separate systems, one for user documents and one for low-level persistence (the former probably being built on top of the latter). Programs needing plain persistent storage shouldn't have to pay for nice features at the document level, and we shouldn't have to compromise the user experience of document browsing to get high-performance persistence.


You can look at what Apple does for an example of how you compromise on a higher level than the filesystem. They offer regular file storage (in the file system), a structured object storage API (Core Data, which is really just built on top of SQLite databases stored in the file system), and a cross-application metadata and search index (Spotlight).

Software that wants to store structured documents without thinking up their own file format can use Core Data, and still get their documents indexed properly. Software that's cross-platform or has to support existing file formats can use regular documents and supply a file metadata/content indexer to Spotlight.


They do. That's what a modern file system is. The on-disk data structures and related code are usually less complex than the higher-level software features. It's just that unfortunately these layers aren't often exposed for use (ZFS is the exception).

If I were writing a file system from scratch, and I have thought about this, I would make a base layer that did nothing more than provide a hardware-independent interface to block devices, plus B-tree and PATRICIA index implementations, and that included block-level features such as RAID, snapshots, log-structured updates, etc.

Then on top of that you can either write a fast DBMS for a BeOS-like experience, or a POSIX compatibility layer that provides a more traditional hierarchical interface.
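A minimal sketch of the layering described above, assuming an in-memory device and a 512-byte block size. The names `BlockDevice`, `MemoryDevice`, and `SnapshotDevice` are made up for illustration. The point is only that a block-level feature such as snapshots can sit behind the same interface the higher layers (a DBMS or a POSIX layer) would consume:

```python
from typing import Dict, Protocol

BLOCK_SIZE = 512  # assumed block size for this sketch


class BlockDevice(Protocol):
    """Hardware-independent interface that higher layers build on."""

    def read_block(self, n: int) -> bytes: ...
    def write_block(self, n: int, data: bytes) -> None: ...


class MemoryDevice:
    """Stand-in for real hardware: a fixed number of zero-filled blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, n: int) -> bytes:
        return self.blocks[n]

    def write_block(self, n: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.blocks[n] = bytes(data)


class SnapshotDevice:
    """Copy-on-write overlay: the base device is left untouched and
    therefore acts as a snapshot of the state before any writes."""

    def __init__(self, base: BlockDevice) -> None:
        self.base = base
        self.overlay: Dict[int, bytes] = {}

    def read_block(self, n: int) -> bytes:
        if n in self.overlay:
            return self.overlay[n]
        return self.base.read_block(n)

    def write_block(self, n: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.overlay[n] = bytes(data)
```

Because `SnapshotDevice` exposes the same two methods as `MemoryDevice`, whatever is written on top does not need to know whether snapshots are in play.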


> Operating systems should really have two separate systems, [...]

That was the promise of exo-kernels: remove all abstractions from the kernel, leaving only what's required for secure multiplexing of resources, and then let user level libraries handle abstractions and cross-platform concerns.

That includes multiple different file systems in user land.


I have been thinking about another tradeoff recently. For lots of rapid access, you want your filesystem to behave like a character device, and to be able to async dispatch all operations (including open and close). For fewer accesses to large data selections you want block devices that map to memory (in the process, you lose async options).


The RDBMS stuff about BFS seemed pretty overstated to me; it amounted to something more like arbitrary file metadata plus indexing that metadata. Frankly, the integration with Tracker for media and mail was pretty compelling: Tracker did half the work of XMMS and half the work of the mail application, so the media player just had to play files and do scrubbing and the mail app just had to let you view/edit individual messages. The search/list stuff many apps have was provided by the filesystem+Tracker.
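To make "arbitrary file metadata plus indexing that metadata" concrete, here is a toy in-memory model. It is not BFS's on-disk format or API: the class `AttributeIndex` and the attribute names in the usage note are illustrative assumptions.

```python
from collections import defaultdict


class AttributeIndex:
    """Files carry arbitrary name/value attributes; only attributes that
    were explicitly marked as indexed get a value -> paths lookup table."""

    def __init__(self, indexed_names):
        self.indexed_names = set(indexed_names)
        self.attrs = {}  # path -> {attribute name: value}
        self.index = defaultdict(lambda: defaultdict(set))  # name -> value -> paths

    def set_attr(self, path, name, value):
        file_attrs = self.attrs.setdefault(path, {})
        if name in self.indexed_names:
            # Drop the stale index entry before adding the new one.
            if name in file_attrs:
                self.index[name][file_attrs[name]].discard(path)
            self.index[name][value].add(path)
        file_attrs[name] = value

    def query(self, name, value):
        if name in self.indexed_names:
            # Indexed attribute: direct lookup, no scan over all files.
            return sorted(self.index[name].get(value, ()))
        # Unindexed attribute: fall back to scanning every file.
        return sorted(p for p, a in self.attrs.items() if a.get(name) == value)
```

A mail client in this model would only write attributes such as `"MAIL:from"` onto each message file and leave listing and searching to `query`, which is roughly the division of labour described above.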

Keep in mind, this was during the era of Microsoft FindFast. FindFast worked by continuously crawling the filesystem, so the extra cost you might be incurring on BFS on-demand was happening in batch mode on Windows at great cost. slocate still works essentially the same way and is widely used on Linux but doesn't really do metadata queries; Beagle and Spotlight do the same thing as FindFast but plug into filesystem notifications to make the reindexing less painful. I don't think it's necessarily an insane idea to make this part of the OS at a lower level, especially in an era when filesystem notifications were kind of a new idea.

As someone who lived through that era, BeOS was always a long-shot. But it was crazy responsive and looked good. I don't think BeOS failed because of technical shortcomings really. BFS was compelling compared to HFS+, FAT32 and ext2 (and it could mount at least FAT and ext2 natively); ext3 and ReiserFS came out around the same time as Be Inc dissolved so journalling was still a significant filesystem differentiator at the time. Whether BFS was relevant to it or not, BeOS had a reputation for being very good at media operations.


A few people might have been betting or at least hoping BeOS becomes popular, not so sure about 'next big thing'. Windows was already dominant when BeOS popped up.


IIRC much of the upside for alternative OS's at the time was premised on the not outrageous possibility that Microsoft would be broken up and more or less forced to port Office everywhere.


Also BeOS was wooing Apple to be the replacement for the pre-X macOS. It would have been too, if Jobs hadn't returned and decided on NeXT instead.


I think how close that was has been greatly overstated and is a part of a somewhat nostalgic BeOS mythology. BeOS wasn't anywhere near being a consumer OS Apple could build on.


It's what the Be leadership were betting on. They lost that bet and the company folded. Whether they were ever in the running or just deluding themselves I have no special insight into.

BeOS was less production ready but a better multitasking media OS than NeXT. NeXT had a longer history but did take a long time (years!) to get ready for consumer use. Which was a better call is hard to say, even in hindsight.


What took a long time was not getting it ready for consumer use. It's also worth remembering Apple had no shortage of half-finished OS's sitting about. It's trivial to say which was a better call. BeOS was an interesting prototype for its time but to treat it as somehow equivalent or comparable to what Apple got with Nextstep doesn't reflect the reality at all.


Not equivalent but a different set of trade offs.


No, no, that's exactly the point. Not equivalent and not comparable. If I want to fly a bunch of people from LA to NYC and am picking out a Boeing or Airbus passenger airliner, there's a different set of trade offs. There is no 'different set of trade offs' between an airliner and a crop duster (or a super-advanced ultralight designed and hand-assembled by Burt Rutan). Because one of those things is not a passenger airliner. It can't fly bunches of people from LA to NYC.


> They lost that bet and the company folded

Many, many years passed between the beginning of this sentence and the end. Remember BeIA? Internet appliances were the final bet that actually killed Be.


> Ultimately, it seems the "keep the filesystem simple and build further abstractions inside files" design has led to the greatest success ...

Perhaps another example of the "end to end principle"? [1] While the original statement of the principle is about networks, the same seems to apply here, where a file system is like a network protocol for communicating with the same system over time.

[1] https://en.wikipedia.org/wiki/End-to-end_principle


The filesystem for an AS/400 box from IBM is basically DB2.


The predecessor of the AS/400, the IBM System/38, was a really radical architecture for 1978/1979: it had capability-based addressing and a file system based on relational database concepts.

Some of the same leadership that ran that project (Glenn Henry) was also responsible for IBM's adoption of Unix.

This IBM Unix system, AIX (1986), was designed to run on another interesting hardware platform, the earliest generation of IBM's RISC hardware PowerPC. That hardware in turn evolved from IBM's 801 system (1975). John Cocke, who won the Turing Award in 1987 for the invention of RISC, was responsible for the 801. I remember him coming into my office to talk to me about some ideas he had for a high capacity disk drive around 1987 when I was working on AIX.


>"It lacks much of the latest file system research that has gone into things like ZFS and Btrfs."

Could you or anyone else recommend a book or other single resource about filesystem design that does include more recent research and concepts?


Not exactly what you need, but "File System Forensic Analysis" by Brian Carrier goes over a lot of the internal data structures of common filesystems. It is an older book, so it mainly covers FAT, ext2 and NTFS.


WinFS was a similar idea https://en.wikipedia.org/wiki/WinFS


Yeah, "what if the file system was a database?" was a very common line of thought in the late '80s/early '90s -- see BFS, WinFS/Cairo, Apple's "soups" for the Newton PDA (https://en.wikipedia.org/wiki/Soup_(Apple)), Palm OS' virtual file system (https://www.netmeister.org/palm/PalmMisc/PalmMisc.html#1.7), et al.


You could just feed Oracle DB raw filesystem partitions back in the day. Don't know if that is still possible.


I'm curious about whether SQLite could do something similar.


InnoDB (MySQL) still supports it


WinFS wasn't a file-system. Its data store was in files on NTFS. Further, if my memory serves, every part of WinFS ran in user mode.


"It lacks much of the latest file system research that has gone into things like ZFS and Btrfs."

Could you recommend a recent book on file system design?

thanks.


Is this still true in the era of SSDs?


Even SSDs need file systems.


Pshaw, just make it a general-purpose K/V store and that will definitely, of course, solve all problems; no need for any kind of datastore.

/s


Since the author of BeFS (1999), Dominic Giampaolo, is the lead architect for Apple's APFS (2016), it would be more helpful if there were an article that shows the delta between BeFS and APFS.

In the intervening 17 years, Mr. Giampaolo surely learned new things that invalidated some assumptions in 1999 and/or discovered new demands on file systems that he didn't foresee.

Imo, the intellectual evolution from the vantage point of 2017 & hindsight would be much more interesting than trying to read through a 247-page PDF.



This may be relevant for seeing how it affected APFS design decisions, given that BFS developers are working on APFS.

I am hoping Apple releases APFS with the xnu 10.13 code drop.


A 27-minute presentation: https://systemswe.love/archive/minneapolis-2017/ivan-richwal... ; https://player.vimeo.com/video/209021697

"Metadata Indexes & Queries in the BeOS Filesystem", by Ivan Richwalski, at Systems We Love.

He explains cool features like extended attributes in files, and search queries whose results are updated in real time.
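For readers who have not seen live queries, a rough sketch of the idea: a query stays registered with the store, and every attribute change is pushed to it so its result set is always current. The classes `Store` and `LiveQuery` and the `"status"` attribute are hypothetical and model only the behaviour, not how BFS implements it.

```python
class LiveQuery:
    """A query whose result set is kept up to date as attributes change."""

    def __init__(self, predicate):
        self.predicate = predicate  # function: attribute dict -> bool
        self.results = set()

    def on_change(self, path, attrs):
        # Re-evaluate just the file that changed.
        if self.predicate(attrs):
            self.results.add(path)
        else:
            self.results.discard(path)


class Store:
    """Holds per-file attributes and notifies registered live queries."""

    def __init__(self):
        self.attrs = {}  # path -> {attribute name: value}
        self.queries = []

    def watch(self, predicate):
        query = LiveQuery(predicate)
        # Seed the query with files that already match.
        for path, attrs in self.attrs.items():
            query.on_change(path, attrs)
        self.queries.append(query)
        return query

    def set_attr(self, path, name, value):
        self.attrs.setdefault(path, {})[name] = value
        for query in self.queries:
            query.on_change(path, self.attrs[path])
```

An application would call `watch` once, for example with `lambda a: a.get("status") == "new"`, and then read `results` whenever it redraws, with no re-crawling of the directory tree.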



Please put [PDF] in the title.


Normally HN does that automatically if a link ends in .pdf, but there's a GET parameter here (m=1) that prevented it from working.


That HN doesn't parse out the path prior to checking if it ends in .pdf fills me with warm and fuzzies.
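For illustration only (this is not HN's actual code, and the URL below is a made-up example rather than the submitted link), the difference between checking the whole URL and checking just its path looks like this with Python's standard `urllib.parse`:

```python
from urllib.parse import urlsplit


def naive_is_pdf(url: str) -> bool:
    # Looks at the whole URL string, so a trailing query string defeats it.
    return url.lower().endswith(".pdf")


def path_is_pdf(url: str) -> bool:
    # Looks only at the path component, ignoring query string and fragment.
    return urlsplit(url).path.lower().endswith(".pdf")


url = "http://example.com/book.pdf?m=1"  # hypothetical URL
print(naive_is_pdf(url), path_is_pdf(url))
```

With the `?m=1` parameter present, the naive check returns `False` while the path-based check returns `True`.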


Out of curiosity, why is this still something people desire in a link? Can't any modern browser pretty much open a PDF as quickly and easily as a web page?


Principle of least surprise. Web browsers primarily display web pages.


I personally have PDF rendering disabled in browser for security, just as I have JS disabled by default. So no, my browser cannot display a PDF at all.


To be fair- if you turn off standard features, you can't complain when you get a sub-optimal experience.


PDFs are a sub-optimal experience. Putting [PDF] in the link just saves me from wasting the time of clicking.


Mobile will download and store on the device.

Web pages are quite large these days and most can be larger than a PDF while delivering less content, so I don't know if there is a size issue anymore.


iOS shows it inline. Android can do too; it usually starts a download instead but IMO that's the browser's fault (should default to embedded).


Is it really a matter of fault when the original question was 'who doesn't', not 'why don't they'?


Personally, I feel it is simply good courtesy.


A few years ago, in college, this was a holy grail for our final assignment



