We should think more *out of the box*. BeOS BFS, (ReiserFS4) and early versions ...

sdrinf · on June 11, 2014

| We should think more out of the box. BeOS BFS, Reiser4 and early versions of NTFS supported extensive object-oriented metadata and search capabilities ("Cairo"): something like WinFS, but directly in the filesystem driver. Data could be organized like in relational databases today, and you could find your data in various ways, not just through the old and proven hierarchical directory tree.

The second you open that up to application layers, you are facing a wide range of problems. Specifically:

* Common schema: each application considers itself a unique snowflake, and might want to use different columns to mean different sorts of things. We've been here before (semantic web, RDF, etc.), and the correct solution was letting the apps just manage their own database of stuff.

* Selfish apps: There is nothing stopping apps from overwriting or manipulating meta-info in ways that are detrimental to the user experience. Consider the current case of "Set <x> as default browser" or "Set <x> as default media player" at each launch, then multiply it across every schema column.

* Interop: BeOS attempted to solve this by dumping meta-info into the archives. The problem is that when a file leaves the original system/OS, these attributes get truncated. Any workaround would require full agreement on changing all file transfer protocols and storage methods on every OS.

Also note that 90% of the cases (locating files instantly on multi-terabyte consumer drives) are already handled by Windows Search and Finder. In my experience, you can safely drop the hierarchical madness in favor of filename-based search for most media/music/document cases, and there are ways to apply this to development as well.
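Filename-based search in its crudest form is a case-insensitive pattern match over every file name in a tree. A minimal sketch using only the standard library; `find_by_name` is a hypothetical helper, and Windows Search and Finder/Spotlight answer such queries from a prebuilt index rather than walking the disk on every query.

```python
import fnmatch
from pathlib import Path
from typing import List


def find_by_name(root: Path, pattern: str) -> List[Path]:
    """Return files under `root` whose name matches the glob `pattern`, ignoring case.

    Walks the whole tree on every call; real desktop search tools query an index instead.
    """
    pattern = pattern.lower()
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and fnmatch.fnmatchcase(p.name.lower(), pattern)
    )
```

For example, `find_by_name(Path.home() / "Music", "*let it be*.mp3")` finds the song regardless of which folder it was filed under.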

frik · on June 11, 2014

> the correct solution was letting the apps just manage their own database of stuff.

Applications have been able to do this forever, but there is no correct way per se. The question is whether applications should sit on a lot of data for themselves (walled garden, vendor lock-in, no interop). Example: music ratings in iTunes are all lost or inaccessible if you decide to also use non-Apple software.

I would argue metadata in user-mode applications is already a solved problem [2]: most applications adhere to common metadata format standards [1], with a few outliers [3].

WinFS, NEPOMUK and the semantic web failed or haven't gained traction.

A practical common schema is being developed for search engines on schema.org by Bing, Google, Yahoo!, Yandex & co: https://schema.org/docs/full.html

> Selfish apps

Metadata access would be part of the operating system API. If software intentionally renames files or moves them to different directories (for no good reason), it's a virus or worm.

> Interop

Most common file formats support metadata anyway; just keep it up to date. And Adobe created the XMP sidecar format specifically for this use case: http://en.wikipedia.org/wiki/Extensible_Metadata_Platform , http://en.wikipedia.org/wiki/Sidecar_file
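A sidecar is just a small XML file stored next to the media file. A minimal sketch with the standard library that writes and reads back only the `xmp:Rating` property; the namespace URIs are the ones defined by the XMP specification, but the `<basename>.xmp` naming is an assumed convention (some tools use `photo.jpg.xmp` instead), real sidecars written by tools like Lightroom carry many more properties, and `write_sidecar` / `read_rating` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

# Namespaces defined by the XMP specification.
NS_X = "adobe:ns:meta/"
NS_RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS_XMP = "http://ns.adobe.com/xap/1.0/"

for prefix, uri in (("x", NS_X), ("rdf", NS_RDF), ("xmp", NS_XMP)):
    ET.register_namespace(prefix, uri)


def sidecar_path(media: Path) -> Path:
    # Assumed convention: IMG_0001.jpg -> IMG_0001.xmp
    return media.with_suffix(".xmp")


def write_sidecar(media: Path, rating: int) -> Path:
    """Write a minimal XMP sidecar that records only xmp:Rating for `media`."""
    root = ET.Element(f"{{{NS_X}}}xmpmeta")
    rdf = ET.SubElement(root, f"{{{NS_RDF}}}RDF")
    ET.SubElement(
        rdf,
        f"{{{NS_RDF}}}Description",
        {f"{{{NS_RDF}}}about": "", f"{{{NS_XMP}}}Rating": str(rating)},
    )
    path = sidecar_path(media)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
    return path


def read_rating(media: Path) -> Optional[int]:
    """Return the xmp:Rating stored in the sidecar next to `media`, or None."""
    path = sidecar_path(media)
    if not path.exists():
        return None
    desc = ET.parse(path).getroot().find(f"{{{NS_RDF}}}RDF/{{{NS_RDF}}}Description")
    value = None if desc is None else desc.get(f"{{{NS_XMP}}}Rating")
    return None if value is None else int(value)
```

Because the media file itself is never touched, the rating survives any tool that copies both files, even if that tool knows nothing about the format's embedded metadata.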

[1] MP3 ID3, JPEG IPTC/Exif/XMP, Office formats, PDF, EPUB, etc.

[2] Windows Explorer, Windows Media Player, Windows Photo Gallery, foobar2000, Winamp, Photoshop, Acrobat, etc. (and Linux applications as well) usually read/write file metadata for common formats just fine.

[3] iTunes, iPhoto, Aperture and Photoshop Lightroom store their metadata in a per-app SQLite database.

optimiz3 · on June 11, 2014

This ignores the massive body of research and evidence that suggests putting a database in a file system is a Bad Idea.

File systems deal with organizing unstructured data (i.e. blocks of bytes); databases deal with organizing structured data (i.e. typed records).

Efficiency and scalability come from decoupling the FS and the DB and letting them specialize.

Examples: GFS + Bigtable, Azure extent/partition manager + Table store, Amazon's various storage elements.

Pushing the DB into the filesystem doesn't really buy you anything - you still have to solve the unstructured page management/allocation problem.

Counter-examples: WinFS, Cairo, Windows Registry (which I'd argue was a large failure).

It's an idea that sounds good on paper, but fails on the theoretical (unstructured vs structured) and practical aspects (distributing structured data is MUCH harder than distributing unstructured data).

frik · on June 11, 2014

I would argue that NEPOMUK, Cairo and WinFS failed because project management failed to meet the milestones, not because a filesystem with an index and query interface is a bad idea. Please point me to the research documents.

The Cairo project documents never specified the query language and UI parts of Cairo, and those were basically the parts that never got implemented; everything else made it. WinFS was doomed to fail because it ran in user mode on .NET (and Longhorn-era PCs were slower), instead of adding the query part to the NTFS driver in kernel mode. The shell integration, with only a UNC path and a .NET-only API, was bad. And the object-oriented metadata schema was way too complicated, especially on top of an SQL database.

WinFS Beta 1 worked okay; it was just very slow (.NET services + SQL Server in the background, stored in a hidden directory on NTFS). WinFS never made it because it was way behind schedule and too slow.

NTFS and similar modern file systems are modular enough to make it possible to add the missing feature, a query interface, directly to the kernel driver. The operating system would of course need to expose an API too, so that files could be accessed both through the directory tree (with C functions like fwrite(), WinAPI WriteFile(), etc.) and through a query language (e.g. the one Windows Search exposes in the Explorer address bar).
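A user-space approximation of that idea: keep an index of per-file metadata and answer queries from it, while the files stay in the ordinary directory tree and are still opened the usual way. This is only a sketch under assumptions; a real implementation would live in the filesystem driver or an OS indexing service and update incrementally, `FileRecord`, `index_tree` and `query` are hypothetical names, and plain Python predicates stand in for a query language.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class FileRecord:
    """Basic metadata a hypothetical filesystem-level index might keep per file."""
    path: Path
    name: str
    ext: str      # lower-case extension without the dot, e.g. "pdf"
    size: int     # bytes
    mtime: float  # modification time as a Unix timestamp


def index_tree(root: Path) -> List[FileRecord]:
    """Walk the directory tree once and collect a record for every file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = Path(dirpath) / filename
            st = path.stat()
            ext = path.suffix.lower().lstrip(".")
            records.append(FileRecord(path, filename, ext, st.st_size, st.st_mtime))
    return records


def query(records: Iterable[FileRecord],
          predicate: Callable[[FileRecord], bool]) -> List[Path]:
    """Answer a query from the index instead of navigating the tree."""
    return sorted(r.path for r in records if predicate(r))
```

For example, `query(index_tree(Path("Documents")), lambda r: r.ext == "pdf" and r.size > 1_000_000)` returns ordinary paths, so reading and writing the results works exactly as before; only the way of finding them changes.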

optimiz3 · on June 11, 2014

The problem with Database-as-a-File-System is that people are really asking for The One Unified Schema. The problem with The One Unified Schema is that it has to anticipate all future requirements, which is impossible.

At best you can add a few more structured primitives, but that's not much better than SQLite (or whatever you prefer) running on top of a block store, since you don't know, or really care about, the domain of every application.
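The layering described here is easy to see with SQLite: the application's typed schema lives entirely inside one database file, and the filesystem underneath only ever stores that file's bytes. A minimal sketch; the `track` table and its columns are hypothetical, loosely echoing the per-app music ratings mentioned earlier in the thread.

```python
import sqlite3
from pathlib import Path
from typing import List


def open_app_db(path: Path) -> sqlite3.Connection:
    """Open (or create) an app-private metadata database stored as a single file."""
    conn = sqlite3.connect(str(path))
    # The schema is the app's own business; the filesystem never sees it.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS track (
               path   TEXT PRIMARY KEY,
               artist TEXT NOT NULL,
               rating INTEGER NOT NULL CHECK (rating BETWEEN 0 AND 5)
           )"""
    )
    return conn


def set_rating(conn: sqlite3.Connection, path: str, artist: str, rating: int) -> None:
    """Insert or replace one track's metadata inside a transaction."""
    with conn:  # commits on success, rolls back if e.g. the CHECK constraint fails
        conn.execute(
            "INSERT OR REPLACE INTO track (path, artist, rating) VALUES (?, ?, ?)",
            (path, artist, rating),
        )


def top_rated(conn: sqlite3.Connection, min_rating: int) -> List[str]:
    """Return track paths rated at least `min_rating`, best first."""
    rows = conn.execute(
        "SELECT path FROM track WHERE rating >= ? ORDER BY rating DESC, path",
        (min_rating,),
    )
    return [path for (path,) in rows]
```

Types, constraints and queries are enforced by SQLite; the filesystem just sees one file (starting with the header `SQLite format 3`), which is also why other applications cannot read the ratings without knowing this particular schema.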

frik · on June 11, 2014

Microsoft SharePoint does everything WinFS was supposed to do: it acts as a WinFS-like file server for Office documents. It comes with a default schema (columns), and the administrator can add company-specific metadata fields. You can group, filter, search and create custom views based on metadata. It all works great, but at the end of the day it's just a website, and managing more than one file at a time is cumbersome (it's a website, not Explorer/the shell). You can open directories in Explorer using the built-in WebDAV support, and that integration is OK, but it is as featureless as the zip-file support in the Windows shell (no right-click menu entries, no "New file", etc.).

With native OS integration, other applications could take advantage of these new features.
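The group/filter views described for SharePoint boil down to selecting items by column values and grouping the survivors by another column. A minimal sketch over in-memory metadata records; the column names (`department`, `status`) and the `view` helper are hypothetical and are not a SharePoint API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Optional


def view(items: Iterable[Mapping[str, str]], group_by: str,
         where: Optional[Mapping[str, str]] = None) -> Dict[str, List[str]]:
    """Keep items whose columns equal the `where` values, then group their names by `group_by`."""
    where = where or {}
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        if all(item.get(col) == val for col, val in where.items()):
            # Items lacking the grouping column land in a catch-all bucket.
            groups[item.get(group_by, "(none)")].append(item["name"])
    return dict(groups)
```

For example, `view(docs, "department", {"status": "Final"})` yields one bucket of final documents per department, the same kind of view a directory tree can only offer for the single hierarchy it was built around.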

optimiz3 · on June 11, 2014

Also, Cairo/WinFS failed because their requirements were not achievable and there was no real market demand for them, just like natural language input for the general market (no one outside of very specific niches actually wants to interact with a computer using voice).

What value would a "filesystem" that understands contacts add to a web server or load balancer? All it does is couple application-specific domains and complexity into layers of the system where they don't belong.

No one is debating that it would be nice for all computers to have a unified understanding of what a document is and what a contact is, but those requirements are orthogonal to what a filesystem needs to do.

cma · on June 11, 2014

Filesystems have tracked date and time metadata for a long time. Once Apple further structured this with the file system event store (FSEvents), excellent applications like Time Machine and an improved Spotlight emerged.
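Without an event store, the fallback for a backup or indexing tool is to rescan the whole tree and compare modification times, roughly as below. `changed_since` is a hypothetical helper; it illustrates the full walk that an event log like FSEvents is meant to spare tools such as Time Machine, not how Time Machine itself works.

```python
import os
from pathlib import Path
from typing import List


def changed_since(root: Path, since: float) -> List[Path]:
    """Return every file under `root` modified after `since` (a Unix timestamp).

    The cost grows with the size of the whole tree, not with the number of changes.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = Path(dirpath) / filename
            if path.stat().st_mtime > since:
                changed.append(path)
    return sorted(changed)
```

An event store flips this around: the filesystem records which directories changed as it happens, so the next backup only has to look there.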

VLM · on June 11, 2014

There are two aspects not discussed so far.

The first is labor. I'm cool with "MySQL as my filesystem", but most people have no DBA-fu and will be horribly lost. Who gets paid more, a "filesystem-oriented administrator" (aka a generic sysadmin) or a "database-oriented administrator" (aka a DBA)? It's going to be much harder to use, not easier.

The second is "trends and fads" in persistent data. The calls for NoSQL as the universal cure for all ills have quieted a little. Fundamentally, 99.9% of the time I just want something to quickly persist a binary blob, say a video file of a movie. I don't want to reimplement git in my filesystem, because if I did I'd use git; I don't want a DB as my filesystem, because if I did I'd use a DB. Ditto spreadsheets or VRML (remember that?) files. So the "NoSQL" analogy for a database-driven filesystem is that all the cool kids would "upgrade" to ext3 for performance reasons anyway, and then come on HN to lecture everyone about how ext3 is the only way to solve all storage problems, instead of the old-fashioned and obsolete database-filesystem being proposed.

frik · on June 11, 2014

The query language can work like Google and Windows Search (note the optional advanced query syntax). It's all about adding an advanced query system to the filesystem driver; the files themselves would be stored the same way as today.
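A toy version of such a query syntax: `key:value` filters plus free-text terms, matched against a file's metadata. This is loosely modeled on the style of Windows Search's advanced query syntax but is not its actual grammar; the keys (`kind`, `artist`) and the helper names are hypothetical, and quoting is handled by Python's standard `shlex` module.

```python
import shlex
from typing import Dict, List, Mapping, Tuple


def parse_query(q: str) -> Tuple[Dict[str, str], List[str]]:
    """Split a query like 'kind:music artist:"The Beatles" yesterday'
    into {key: value} filters and free-text terms, all lower-cased."""
    filters: Dict[str, str] = {}
    terms: List[str] = []
    for token in shlex.split(q):  # keeps double-quoted values together
        key, sep, value = token.partition(":")
        if sep and key and value:
            filters[key.lower()] = value.lower()
        else:
            terms.append(token.lower())
    return filters, terms


def matches(meta: Mapping[str, str], filters: Mapping[str, str], terms: List[str]) -> bool:
    """True if each filter value occurs in its field and each term occurs in some field."""
    fields = {k.lower(): str(v).lower() for k, v in meta.items()}
    if any(value not in fields.get(key, "") for key, value in filters.items()):
        return False
    haystack = " ".join(fields.values())
    return all(term in haystack for term in terms)
```

Plain words behave like an ordinary filename search, while the optional `key:value` part narrows results by metadata, so users who never learn the advanced syntax lose nothing.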

The benefit: users and application developers could access files in various ways. (A directory tree is so limited and outdated. See my other comment about SharePoint for what is already possible and successful, just at a higher level: an intranet website.)