Why Files Exist (filepicker.io)
183 points by brettcvz on June 29, 2012 | 90 comments



If you want to get existential, files don't exist. What exists is a way to name a non-volatile data set. Given the name, a non-volatile memory unit, and an algorithm for translating between that name and a memory-unit-specific representation of its internal structure, you can retrieve the data set. If the name is sufficiently portable, you could in theory hand it to another program/process/thread, and that thread could translate it to the actual data on the memory unit.

It sounds horribly abstract, but it is the actual reason file names, file systems, and file system APIs exist. Then there is a whole different layer of semantic interpretation of the contents of files, whether it is a simple stream of UTF-8 encoded code points or an ELF file which can describe executable code that is ready to be loaded into memory and executed.

The OP decries the lack of an interchange format, which is simply a convention by which two programs can both interpret the contents of non-volatile memory which they have both accessed using a unique name. And that is mostly because of iOS devices and the applications which have eschewed the idea of putting the names of their non-volatile data sets into a globally accessible namespace.


That's exactly right. Files don't exist. The "filesystem" is just a big fat KV store.

Somewhere along the way someone decided it'd be a useful abstraction to imagine a hierarchical organization system (directories) on top, so that was glommed on, but it's not real either.
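
If it helps to see the analogy spelled out, here's a minimal sketch of the "get" half using ordinary POSIX calls (kv_get is an invented name; error handling kept minimal):

    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/stat.h>

    /* "get" on the big fat KV store: the key is a path, the value is bytes. */
    char *kv_get(const char *key, size_t *len) {
        struct stat st;
        int fd = open(key, O_RDONLY);               /* look up the key */
        if (fd < 0) return NULL;
        if (fstat(fd, &st) < 0) { close(fd); return NULL; }
        char *value = malloc(st.st_size);
        *len = read(fd, value, st.st_size);         /* fetch the value */
        close(fd);
        return value;
    }

Everything else (directories, permissions, metadata) is bookkeeping layered on top of that lookup.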

It was never a perfect system, but it worked well enough when most users were fairly technical and had a humanly comprehensible set of labelled data streams.

I appreciate the effort to insulate users from the complexity that has grown up underneath them. The simple fact is that most people fail at large scale taxonomy and organization. It's hard. And it's a lot of work to maintain even if you're good at it. See: library science. So I don't think there is another model that will succeed as well as "files" have.

iOS hides the filesystem, but it's still there obviously. So far all we've seen is insulation for those who need it, as a byproduct of huge control loss for everyone. The other (valuable) byproduct is security.

We haven't found the compromise yet. There might not be one.


It's a little more than a key-value store. The keys are impure.

One of the first issues is that the disk likes to work in terms of blocks, which tend to lend themselves to arrays, preferably of fixed size so that related data is contiguous. This leads to a limit on the number of files in a directory and so nesting directories is one of the easiest ways to contain more files.

But it's also a little more than that. Directories can group semantically related files together. This means their meta-data is in the same directory inode and so can be read in as a group. This creates more efficiency. Chances are that you'll often access many related files at once, even if it is just to list a directory, so it helps the file system to have some structure so that the meta-data of related files is all read in at once. It's an optimization that takes advantage of our own semantic information to structure the data.

This arises from disks being a really crappy way to access data. They are slow and work best with sequential reads of large amounts of data. They really aren't that useful for a persistent key-value store whose meta-data alone may be much larger than the amount of available memory.

But I tend to think of it as a KV store most of the time anyway and often wonder why we have the silly idea of directories.


> I ... often wonder why we have the silly idea of directories.

We want to associate our keys with namespaces.


Yes, but often not just one. Now we can use links (mainly symbolic), but this can be a mess.

Tags may be a better way for an end user to sort out the data. You don't have to lose the hierarchical structures that directories give us. But it may be simpler to think of a file as some atomic stuff that just lies at the root of some disk, and belongs to a number of (sub)categories.


Sounds good, but what uniquely identifies a file? Right now it's path + name.

If I have two files with the same name, one tagged A and the other tagged A and B, are they the same file or not? What if I add a tag of B to the first one?

A directory hierarchy makes this unambiguous.


I think we should use several mechanisms at once to identify files.

Tags. The default mechanism for sorting and searching files. The assumption is, most files are passive data. When sharing a file, its tags should be sent along with it, so the receiving system can propose them by default to its user. Note that one may want to categorize tags themselves (meta-tags?). I'm not sure, but it may be necessary if a given system uses many tags.

Descriptive names. This is the user-facing name of the file. No need for it to be unique. Like tags, a file's descriptive name should be sent along with it.

Locations. It may be of import to know where a given file is physically located. It is cool to transparently access more files when you plug your thumb drive in. It is less cool to forget to actually copy those you need.

Unique keys. Devised by the system and not directly accessible by the user. When a search yields several files with the same descriptive name, or when two files share tags and name and location, the system can be explicit about the ambiguity.

Unique names. Devised by the user. The system checks uniqueness (or, more likely, uniqueness by location). Follow a directory structure convention. Discouraged by default. Their primary usefulness would probably be for system and configuration files, which need to be accessed automatically and unambiguously by programs. May be implemented on top of descriptive names (the system could treat descriptive names that begin with "/" as unique, and enforce that uniqueness).

There. End users would primarily use tags, descriptive names, and locations. With the right defaults, users may actually be able to avoid making a mess of their data. To prevent unwanted access to sensitive system files, the system can by default exclude those files from search results: typically those both tagged "system" and located on the main drive. Unique names would be for programs, power users, and those who want to fiddle with their system settings (either directly or through a friendly interface). Unique keys belong to the kernel.
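
If it helps, here's a rough sketch of the per-file record such a system might keep (all field names invented, nothing authoritative):

    #include <stddef.h>
    #include <stdint.h>

    /* One record in the hypothetical tag-based store. */
    struct FileRecord {
        uint64_t   unique_key;    /* devised by the system; kernel-only     */
        char      *display_name;  /* descriptive; not necessarily unique    */
        char      *unique_name;   /* optional, "/"-prefixed, checked unique */
        char      *location;      /* which physical store holds the bytes   */
        char     **tags;          /* primary search criteria; sent on share */
        size_t     ntags;
    };

A search would return every record matching the given tags, names, and locations; the unique key only surfaces when the user has to disambiguate.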

So, how does that sound?


Hard to say. It may be brilliant, and may be the Future Of Files, for all I know.

My first reaction, though, is that it sounds a bit confusing to me, and very confusing for novice users.

Right now, Mom understands that "C:\My Documents\bird.jpg" is not the same as "C:\My Documents\My Pictures\bird.jpg". The rule is simple: unique names per folder.

What's the new rule?


This is kind of a paradigmatic change. Right now, the default when dealing with files is to point to them. What I envisioned in the grandparent was to make search the default. Tags, descriptive names, and locations are all search criteria.

In a way, it is more complicated: instead of 0 or 1 file, you now get a whole list. On the other hand, everyone understands search. My hope is, the initial extra complexity would be dwarfed by the ease of sorting and finding your files. Because right now, one or the other is difficult: it's hard (or at least bothersome) to properly sort one's data in a directory tree, but it's even harder to find it if your disk is messy.

Now there are two snags we might hit: first, I'd like to do away with unique names, because they get us back to the old, difficult to manage, directory tree. Second, to have good tags, you have to internationalize them. For music stuff for instance, French speaking folks would like to use "musique", while English speaking ones will use "music". It has to work transparently when they exchange files, or else it would defeat the purpose of default tags. I can think of solutions such as aliases, normalization at download time, or standard tag names that can be translated by the system, but I'm not sure that's really feasible or usable.


I think that all these different types of identifiers might make security a challenge.


Access rights should of course not be tied to identifiers, but to files themselves.


There is a hierarchy of organization structures that are typically used for "content." For a small number of items, a list works best. For example, the home screen is a list. With a medium number of files, hierarchical structures like traditional file systems work best. However, when you reach many, many files, tagging and searching are typically needed.

Files do exist. They are the values in the filesystem KV store, but they are also a schema for the value so that interoperability works. If we followed your logic, apps would not exist either but we all agree they do.


Filenames are the keys, file data are the values, and there's metadata too, encapsulated in the directory data, inode data, etc. But the concept of a file is just a useful metaphor. A file is exactly identical to a directory or an executable, from the KV store's low level perspective. All the handling and interpretation happens up higher.

I disagree that (most) files contain their own schema though. Sometimes the schema is incorporated by reference in the metadata, sometimes in the key name (filename).

Some structured data formats contain a sort of self-descriptive sub-schema, but they always(?) require something higher up the chain to make sense of it. I can't think of any examples where that isn't the case, but I'll leave the question mark on the always because I'd like to be proven wrong!
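
For instance, the "something higher up the chain" is often just a magic-number check. A sketch that sniffs for the PNG signature (those eight bytes really are the PNG header):

    #include <stdio.h>
    #include <string.h>

    /* Returns 1 if the file starts with the PNG signature. */
    int looks_like_png(const char *path) {
        static const unsigned char sig[8] =
            { 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n' };
        unsigned char buf[8];
        FILE *f = fopen(path, "rb");
        if (!f) return 0;
        int ok = fread(buf, 1, 8, f) == 8 && memcmp(buf, sig, 8) == 0;
        fclose(f);
        return ok;
    }

Even then, the signature only tells you what the bytes claim to be; making sense of them still needs a decoder higher up the chain.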


> when you reach many, many files, tagging and searching is typically needed.

When you reach many, many files, just try to get people to tag them. Just try.


Indeed, you either tag from the get-go expecting to have many, many files in the future, or you give up on ever contextually managing those files outside of large containers.

How do you tag a file when you can't find it to tag it, and when do you tag a file that you've forgotten about?

Ultimately, someone comes up with yet another abstraction that makes it just a little bit easier... Now, if we made every application that wrote a file also tag it meaningfully, and then had meaningful translations, and... oh geez. I normally just delete everything and start over when I realize I have no idea what 90% of the files I just scanned were for. If they were truly important, I would've known what they were. I guess the people with ten bazillion files on a PC are just data hoarders. "But, but, but I'm going to need that report one day!" (Bet you would tag it, now wouldn't you?)


Well, there is such a thing as digital hoarding: http://online.wsj.com/article/SB1000142405270230340470457730...

First link for "digital hoarding", but there is more to read about it.


There are auto-taggers. For example, I use Picard for music files.


The Filesystem != Files.

The File is one of the most important core abstractions of our computer systems. Stdin? Stdout? Network I/O? Etc. All files.


Actually no, they are file descriptors. They provide a unique identifier which, used in conjunction with an API call, will operate on the data in, or the metadata of, a data set.

And while it's true that UNIX (and, taken to its logical extreme, Plan 9) used a common API to handle all of the transfers between non-memory-resident data sets and memory, there are many examples of operating systems that use other schemes. One of my favorites, which I got to help build when I was in the kernel group at Sun, was a thing called 'mmap' which assigned addresses in your memory space to locations on disk. The 'magic' happened in the VM HAT layer. That scheme is having something of a comeback on 64-bit processors: since few machines actually have 22PB of RAM, there is plenty of spare address space, and it is possible to do something like:

    struct Stuff *my_stuff =
        mmap(NULL, file_len, PROT_READ, MAP_SHARED, file, 0);
And then be able to make a statement

    if ((my_stuff+100)->is_ready) { ... }
 
And have the operating system translate between my notion of a memory address and the data set's offset to instance 100 of the structures contained therein.

I expect to see more of that in experimental OS code. Something like char *data_file = ChuckOSOpen("some.dat"); which lets me then address data_file like an array of char, so

    for (char *t = data_file; IsValid(t); t++) { ... }
would then iterate over the contents of data_file as 8 bit values until there wasn't any more data and thus no more validly mapped pages. Read and write are simply assignment, seek is simply math, and close is simply unmap.

All without the notion of 'files' but with a notion of 'named data sets' which clearly can be implemented in a number of ways.
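
For the curious, a self-contained sketch of that first pattern with standard POSIX calls ("some.dat" and struct Stuff are the stand-ins from above; error handling omitted, and the file is assumed big enough to hold 101 structs):

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    struct Stuff { int is_ready; /* ... */ };

    int main(void) {
        int file = open("some.dat", O_RDONLY);
        struct stat st;
        fstat(file, &st);                 /* learn the data set's length */

        /* Map it into the address space; no read() is ever issued. */
        struct Stuff *my_stuff =
            mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, file, 0);

        if ((my_stuff + 100)->is_ready) { /* paged in on demand */ }

        munmap(my_stuff, st.st_size);     /* close is simply unmap */
        close(file);
        return 0;
    }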


> Actually no, they are file descriptors

That is a distinction without a difference.


Without distinction? Only if you look at it simplistically, from the view of a single process. Otherwise:

- Shut down a machine, and all file descriptors are gone. Its files, one would expect, would still be there.

- You can have multiple file descriptors for a single file.

- You can have file descriptors that aren't 'attached' to a file proper (stdin, stdout).
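
The second point is a two-liner to demonstrate (POSIX; the file name is made up):

    #include <fcntl.h>
    #include <unistd.h>

    int main(void) {
        int fd1 = open("data.txt", O_RDONLY); /* one file...                */
        int fd2 = open("data.txt", O_RDONLY); /* ...two descriptors, each   */
                                              /* with an independent offset */
        int fd3 = dup(fd1);   /* a third descriptor, sharing fd1's offset   */
        close(fd1); close(fd2); close(fd3);
        return 0;
    }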


Hmm. Homescreens seem to work well enough for organisation of apps, better than start menu folders anyway. Perhaps file "grids" instead of the traditional hierarchy of folders might work better.


Instantly reminds me of coworkers' desktops filled with icons and documents.


I used to be of the mindset to keep a clean desktop, then I came to my senses and used it as a working space. It's easy to get to from anywhere in the OS, as it generally has its own shortcut'd location. Temporary working files just get stored there. Permanent files might have a shortcut to them stored there instead, leaving the real file in a more suitable location. The 'ugliness' associated with such a desktop is meaningless once I realised most of the time the desktop is covered anyway.

The most infuriating thing about Gnome 3 is that it decides for you that the desktop is an unholy place to keep anything, because you're too stupid to figure out how to do things efficiently.


It seems crazy, but it may be more intuitive for some people. People often use muscle and visual memory to remember where things are, not necessarily name or location.

Edit: Plus, you can always add naming and search with it. And Microsoft has an interesting grouping concept in Windows 8 with grouped and optionally named sections, not folders.


Doesn't seem crazy to me at all. It works well for that use case, but it doesn't scale well to hundreds of entries.

Once you get into groupings you're creating the same problems that people have with filesystems (implicit or explicit organization challenges, loss of discoverability, etc).

Search (or something like what we call "search" today) might be the best step forward from here, but you can layer that on top of any other (or no specified) KV metaphor you like.


>It seems crazy, but it may be more intuitive for some people.

It seems crazy, but it may not be more intuitive for some other people. :)


Of course, me included. I like hierarchical organisation.

I suppose you could mix the Windows 8 approach (all groups) with iOS-esque folders, and allow subfolders, and then you have the best of both worlds.


They may be a tolerable layout for your set of apps, which tends to be pretty small, but for organizing all your files it quickly becomes a giant mess.


Well, then there's the file metadata (the inode in Linux: http://www.linux-mag.com/id/8658/). It is a very useful chunk of properties attached to the memory unit and provides a base for a ton of essential facilities that make life easier.


Interesting to think of it from that layer of abstraction. I definitely agree, but "Why Files Exist" fits nicer as a title


"What exists is a way to name a non-volatile data set."

   1. What if it's "code" not "data"? I thought "data" in this case was just 1's and 0's, i.e. what they represent, e.g. electric charges on a medium? I thought that how we interpret these charges is what makes them "code" or "data".

   2. What if the "file" is stored on a "RAM disk"? Is that non-volatile? If no, then does that mean this is not, by your definition, a "file"?


1. Code is data. Thanks, Mr. von Neumann!

2. The implementation of the storage is another level of abstraction away. The filesystem assumes the media it exists on is non-volatile; what exactly that means in implementation is irrelevant. Theoretically you could call a traditional spinning disk volatile because I could hit it with a hammer and destroy it.


There is one aspect that everybody here seems to be ignoring. Files can span boundaries of time, space, connectivity, bandwidth, and trust. They also span boundaries of architecture--CPU and OS.

I have files "in the cloud" that were born on systems that haven't been manufactured since before Google existed. Those files are self-contained units that I control and can move to whatever system I desire.

And, although people here say that end-users just don't know how to use files, I have relatives who are over 85 years old who still manage to attach photos--and Powerpoint presentations, for some reason--to emails and share them.

Saying that we only need an API is saying that it's okay for the data to die when the manufacturer goes out of business, or decides that it's time to shut down the DRM servers, or you just lose your phone.

Files are reifications of data that allow us to separate some concerns. Transporting and backing-up files are orthogonal to the data that is in the files. We can compress and email and FTP any kind of data whatsoever.

That's not an insignificant thing.


You "could" in theory write apps for the iPhone that interface over localhost ports. Which would be awful. (too bad ports in unix are described with file descriptors!)


I disagree that files are the only solution. Back in the 90's Apple had an OS that had fully interoperable data in applications and that OS didn't have a file system.

It was Newton-OS and it used something known as soups for persistent storage. Soups were discoverable databases that intelligently handled Flash card insertion/ejection. The ability to handle Flash on removable media is still something that mobile OSs have trouble with to this day.

The OS could merge soups on different stores dynamically and detect if some data in a soup was currently in use on an ejected card and ask for the card back. This merging of soups on different storage devices is something I've never seen duplicated in the subsequent 20 years.

Files are not the only way to achieve the requirements in the article. They are just the common solution.


Hey, someone remembers! (I did the Newton object store.)

I spent years of my life trying to get rid of treating direct user access to the filesystem as a foundational UI metaphor, at both Apple and Microsoft. As I liked to say, why is the UI based on a filesystem debugger? (If you can see /dev or C:\windows\system32, then yeah, you're running a debugger.)

Many people who aren't programmers don't seem to get deep hierarchy (deep meaning > 2 levels). Searching works, tags kind of work, but few people really know how to set up and use a folder hierarchy.

The reason it works to let the app deal with navigation is that the app knows how to do type-specific, contextual navigation. People like concrete things (whereas programmers like abstract things—a constant struggle). If you're trying to find a song, you want to have a UI that knows about songs: they come in albums, the same song may be on multiple albums, they have artists and composers, etc. Any attempt to represent that in a filesystem hierarchy can be nothing but a compromise.

This has nothing to do with defining standard formats for exchanging units of data. Just how you find them once you've stored them.


>> Many people who aren't programmers don't seem to get deep hierarchy (deep meaning > 2 levels). Searching works, tags kind of work, but few people really know how to set up and use a folder hierarchy.

Yep. Search works. (But forget tags and taxonomy. Taxonomy is sooo 1994 Yahoo! Library science! LOL!)

Imagine if the whole internet were below /. Now, where's Walter? Lessee. His initials are wrs. He did the Newton object store. Hmm.

Finder can't find. Explorer can't explore.

However, if I simply type "wrs newton object store" into my Chrome address bar, it instantly coughs up the Newton Hall of Fame! https://www.msu.edu/~luckie/hallofame.htm

QED. Wallyscript indeed!


First, hat-tip for your accomplishments.

Now on to the bashing....

(I kid)

Seriously, though: For a song, that works fine. But what happens when it's a note I jot down in a hurry? And then an address I tap in for later. And my grocery list.

Now, I have to keep this mental mapping of where my data lives. I have to, essentially, remember file types and associations myself.

Not saying I need a file browser, but the current iOS facility for this isn't good enough. Look at the card-wallet thingy for iOS6. Maybe what would work is something like that for each general type of content. You want to see any stored gift cards and boarding passes? Open your wallet. You want to see any stored notes and grocery lists and what not? Open your moleskin.

You've clearly thought about this more than I have, though. So what's your take on it?


Well, your examples are kinda covered in iOS already: the note goes in Notes, the address goes in the Address Book, and the list goes in Reminders. But I think I see what you mean -- where do you throw random bits of stuff and how do you get it back?

I think in the sort of usage you're describing, you just make random things and save them, and you get them back with search and a chronological list. The three things you describe don't sound like you'll need them after, say, tomorrow afternoon. So why put a ton of effort into organizing them?

It is of course useful to be able to organize arbitrary files in a more permanent way. The repeated mistake (to me) is that the process of organization is not itself considered a concrete application based on specific use cases. For some reason, a document format is considered application-specific, but as soon as you want to group two documents together you're dropped into this pure universal abstraction of a filesystem hierarchy. In other words, applications get to define how files work, but not how folders work.

For example, you could have a "project" that let you group various things together (maybe some CAD drawings of an office remodel along with various random notes and a budget spreadsheet). That's what a folder does, but a project would be much more specific--maybe do some time tracking, have some client-based organizational functions, etc. And of course you'd look at projects in the project application, not in a filesystem debugger.


Hey Walter. I remember that you did a lot more on the Newton than the object store :)

NewtonScript made me a better programmer. I don't say that about many languages.


Awesome. I still use my 2000 from time to time. That device was so far ahead of its time in every regard. Second hat tip for your work.


We stored lots more than just "notecard entries" in the Newton object store. We had application packages (with demand paging of compressed code). We had "large binary" support, similarly demand-paged. And all of this stuff was hooked up to the garbage collector.

I don't know how well it would scale to a non-hand-held device, but it worked really well on the Newton.

Files are useful, but they are not necessary. We are used to them, but there can be better ways to do things.


Interesting solution - this is similar to what I was referring to in the last paragraph: a traditional folder-based filesystem isn't necessarily the only way, but a system-wide, abstract, and interoperable content wrapper is the key requisite.


git-annex is a similar system. Files are tracked and can be on many storage devices (local, removable and even "cloud") simultaneously, and you can always know where they are from any machine.


True. Another approach was used by the OLPC project. Their UI had a journal (chronological) view of user-created documents. There must be other approaches?


Databases are often excellent for interoperability. This happens all the time in business environments; e.g. a reporting tool can access an application's data using the database as the middle man.

Files, on the other hand, often have peculiar formats. Is .xls "interoperable"?


In MeeGo they used an RDF triplestore for much of the personal data.


Sounds like ZFS.


IMHO, interoperability between apps is the main benefit of Android against iOS (see Intents in Android).

I can take any photo, URL or file and open/share it in/to another app. The OS and app developers take care of which apps support which resources, so as a user I'm always presented with a sensible list of apps currently available on my device.

I'd say the notion of a file with a specific file type is too abstract and technical for most use cases for casual users. The UI should group pieces of data as human-understandable resources (i.e. a "picture" can be a .jpg, .png etc.). With this level of abstraction, a user can be expected to understand when presented with a list of apps:

OS: "What do you want to do with this URL?"

User: "Share to Twitter/Facebook/My other browser"


Don't you see the problem with this? By allowing the app, rather than the user or the system, to own the file you wind up with multiple copies.

App A shares data with App B. The user makes some changes and App B saves its new copy. It doesn't send the data back to the originating app. It could, but then the user would have to manually do that and save it in App A again.

This is horrendous! People are confusing the poor UI that we have for files (file pickers) with thinking that files themselves are a bad abstraction.


I understand what you're saying, but you need to illustrate an actual problem beyond "technically there's two" that this causes in mobile workflows for typical consumers.

That the current Send-To idiom is limited is not under dispute. That it's worth changing is what's worth discussing.


The 2 copies are not guaranteed to be in sync. If I modify data in App B, App A now has an old copy. I shouldn't have to explain why this is undesirable.


Relax. :) Now we're getting to the crux of the situation.

Let's talk spreadsheets. That's a pretty data-freshness-critical thing, yes? Jennifer gets an e-mail with an Excel attached. She previews the Excel, decides it needs an edit, and taps Send-To -> Numbers (the most common current scenario). A duplicate is created and shipped over to be owned by Numbers.

She makes edits, then clicks Send To -> Email. The correct version goes out. The updated version remains in Numbers. The user assumes that Numbers has the spreadsheet, because that's where she was making changes. The copy in the Mail Attachments archive remains old but it's never invoked a second time and the workflow never takes in stale data.

The only scenario in which Send-To causes problems is the case of two applications that have equal abilities to process a given filetype AND roughly equal chances of being invoked by a user, and how often is that going to come up on a telephone or a tablet?

You could always build a file locking system at the app-level or OS-level or cloud-FS level, but then we're back to Who's Freshest?, the hardest game of all to win.

Apple chose not to play this game and pushed all the filesystem yuckiness out onto a rarely-traveled edge of a regular user's possibility space. It's not ideal, but what is?


Why do there have to be two copies of the file? Could there not be only one copy of the file, with two "begin pointers" that are dated? When a user decides that they no longer need an older version, the older section can be purged. Obviously, this could result in quite large files, and we start wanting to do smarter edits to the contents, but I'm not sure having two distinct copies of the data serve the average user any better than one with two internal copies of the data? Obviously, there are technical challenges, but let's just assume for a minute that years of database and similar use case-driven development have solved many of the basic issues already.


Two copies solves more problems than it wastes space.

What you're describing is concurrent access to a shared resource, which means we now need to start having the following discussions:

- Who's accessing?
- Where's the thing they're accessing?
- When are they starting?
- When are they done?
- What happens if they stop talking without saying goodbye?
- What happens if changes from User A happen before User B gets a full copy of the starting work object?
- What happens if User B deletes everything? Should she be able to?
- How do we manage identity?
- What happens if the work object is asked to save but is in an incomplete or non-logical state?
- What happens if the work object is damaged?

(Now multiply the word 'user' by multiple axes: actual humans, programs, network services, filesystem services)

This is an awful, awful, awful lot of work designed to save the space used by a duplicate 300KB .PPTX file. And frankly, I've only ever seen SubEthaEdit get it right.

Also: the user will never, ever, ever, ever decide to review or delete an old file. That's shit-work, and we are mortal.

A majority of customers aren't even aware that there can be multiple versions of the same work or that the work they're doing can be expressed discretely - they're just doing stuff, and cannot be expected to think about stuff.

So in short, let's waste some space and save some hassle. That's what flash chips are for.


This is a very good point. Merging file changes between applications is not a trivial task and usually applications just end up saving their own copy (as you mentioned).

For example, if I share a photo from the Android Gallery app to Instagram, add some trendy filters and share the modified photo in Instagram, I end up with three copies in the filesystem: The original, and two copies made by Instagram to preserve both the original and filtered version.

After all, Instagram doesn't know/care whether the photo came from a camera or from another app, so it always saves a copy of the original.

Also, each time I share the photo to Twitter, Facebook or other cloud service I create another copy with different resolution, (re)compression and even dimensions, which is saved to the service out of my reach.

I guess having multiple unsynced copies of the same information is a problem that won't be solved as long as we want to have the ability to distribute information freely. One interesting development would be better content-based matching of files (e.g. Google similar image search, content-based indexing of photo subjects in Google Drive).


seconded


I think the "here today, gone tomorrow" nature of app stores is impairing file interoperability. There's just little incentive to allow your productivity or creative app to play with others (unless that's the whole point of your app, like PlainText). I've given up on fancy note-taking apps, knowing there will always be a better one that's not compatible with my old data.

In another decade we'll have a whole lot of unreadable proprietary app data, inaccessible because the original app doesn't work on new hardware. Extracting it will be a tedious process of either reverse engineering or emulating the old hardware/software combination.

Not that we haven't been down this road before, but it just seems like it's worse this time. Even the term "file format" seems archaic, and not many (other than pirates) seem interested in reverse engineering and/or documenting them.


File systems have to evolve. These days, a file system means two things: an application-independent API to access common documents, and hierarchical local storage. But it doesn't have to be that way.

The best thing I've seen in file system evolution is KDE's KIO: any KDE application can take any KIO URL and use it; all file operations are asynchronous (even if you open local files, and that's very nice for local files that are big), and any program can use network resources as easily as local ones with little to no effort.

But we should improve on that: a heterogeneous user file system should provide discoverability (e.g. your social network photos are automatically available in any program once you bind the account, and you know where to find them). File system branches could restrict some operations on files or hint at their cost (scanning a huge photo bank is a very expensive operation; you can't access the contents of audio files inside a streaming service, but you can play them in a program). There should also be other ways to organize files than just dumb hierarchies (imagine a search box in place of a folder: you enter a query and then you enter the search results; or you could have a tag cloud in your file system).

There's a great deal of room for innovation here, and nobody seems to be doing the work at the moment.

Sorry for mistakes 'cause I'm hurrying to go to bed :)


Files exist because decks of punched cards were cumbersome. A best practice was to make a diagonal stripe across the top with magic marker so that you could restore the ordering if you accidentally dropped the deck. Files in a filesystem eliminated the need for that. And the cards were heavy.

That's why files exist. Not sure what the article is trying to say, TL/DR


The diagonal trick reminds me of CDROM CRC error mechanism (reversed of course)


The key isn't the specific file abstraction used today. The key is being able to name data. Whether that is through a traditional hierarchical file system, through an activity log, through URLs, through some hash-based key-value store, the requirement is being able to refer to data independent of the application that produced it.
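
To illustrate the hash-based option: a name can be computed from nothing but the bytes themselves, independent of any application. A sketch using FNV-1a for brevity (a real content-addressed store would use a cryptographic hash so collisions are negligible):

    #include <stddef.h>
    #include <stdint.h>

    /* FNV-1a: derive a 64-bit name for a data set from its contents. */
    uint64_t name_of(const unsigned char *data, size_t len) {
        uint64_t h = 14695981039346656037ULL;    /* FNV offset basis */
        for (size_t i = 0; i < len; i++) {
            h ^= data[i];
            h *= 1099511628211ULL;               /* FNV prime */
        }
        return h;
    }

Two programs that compute the same name are, collisions aside, talking about the same bytes, which is exactly the application-independence being asked for.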


We will still need a way to pass information between applications, but that may be so different from the concept of files that it would be ridiculous to call it files. For example, applications on the internet exchange information via APIs.

Microsoft is doing something similar with Windows 8: if you want to get a photo into an application, you show the user a menu to get a photo. The user gets a list of all his photos to pick from. Where this list comes from depends on which other applications are installed: if you have a Facebook application you can choose your photos from Facebook, if you have Picasa then you can also choose photos from Picasa, etc. This works because each application that has photos is supposed to provide an API to the OS to access its photos.

Exchanging information directly via standardized APIs makes a lot more sense than exchanging it via an abstraction layer designed to operate on top of a hard disk. This is similar to the difference between Unix pipes and getting the output of one program, storing it on your hard disk, and then reading it in with another program. With the API model the disk loses its special status, and instead becomes just one other data source/sink like any other (FUSE turned on its head, if you will).


I love this post.

There is no doubt in my mind that the file system (as we know it) is dead. Daily workflows are becoming more and more integrated with the social graph. It's one thing to manage your own file set, but try keeping track of everyone's files... or even your own across multiple different-purpose devices, for that matter.

If I save files using one filing scheme and someone else saves to the same shared drive using another scheme... both of our files eventually become lost in a mess.

Like others have posted, I believe the solution is search. Maybe not textbox search like Google, but certainly different ways to view lists of files. Can you imagine viewing the most recent files edited by a certain coworker, or the most recent files edited within range of a certain GPS location? I don't have an exact answer for how to sort the data, but in my mind... there is a lot of additional data that can be used to help filter file presentations beyond just the file index and file attributes used today.

I'm in the bay area, working on a startup to address this shift. Message me if interested... I'm always looking for people to talk about it with.


"but in every OS there needs to be at least some user-facing notion of a file, some system-wide agreed upon way to package content and send it between applications."

This is what the world wide web does. DNS, HTTP, and MIME types solve these problems. The problem is that it is still too difficult to make things on a device into URLs.


But even on the web there is limited ability for applications to share data without explicitly working with the APIs. A central filesystem allows for "star network" integration rather than point-to-point.


The API is already there: it's HTTP. read() is GET, write() is PUT or PATCH, unlink() is DELETE. If you want to be fancy you can use WebDAV, which is also a standard API.
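
A sketch of "read() is GET" with libcurl (the URL is made up; build with -lcurl):

    #include <curl/curl.h>

    int main(void) {
        CURL *h = curl_easy_init();
        /* The "open": name the data with a URL. */
        curl_easy_setopt(h, CURLOPT_URL, "https://example.com/notes/todo.txt");
        /* The "read": a plain GET; the body goes to stdout by default. */
        curl_easy_perform(h);
        /* A "write" would set CURLOPT_UPLOAD to do a PUT instead. */
        curl_easy_cleanup(h);
        return 0;
    }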

You don't need APIs, you need standard file formats, just like with filesystems.


It's just a different kind of addressing. The way different things are integrated is an independent concept.

Perhaps you could expose URLs in a Unix file-model style and pretend that they are files. Or build HTTP into the kernel and give every file and executable on the system a URI. Or even map the 64-bit processor address space to IPv6 addresses. Based on a 4k block size you could address every hard drive in the world in a single 64-bit subnet.

My point is that files are just one type of abstraction. URLs are just better.


How many different file types do you typically use in a week?


    $ find ~ -type f -atime -1w | awk -F/ '{print $NF}' | awk -F\. '{if (NF>2 || (NF>1 && $1!="")) print tolower($NF)}' | sort | uniq | wc -l
          83
Granted a lot of those are system files.


And this is the best counterexample of why not having files/a filesystem would suck. You could not do this rather simple calculation at all.

Somehow this crusade against files and the filesystem just feels like it has ulterior motives behind it. I have yet to see even a computer-illiterate user who has a hard time understanding the "folder" metaphor and that folders may have items inside them, including other folders.


If some files are archives (or any encapsulation format/mechanism), then the count is wrong.

Files and folders are too generic and not generic enough. Some files aren't files, some files are ~folders. Actually most of those files are ~folders: they are containers for other kinds of data and relationships. Lists of samples, trees of names, graphs of points.

IIRC Plan9 tried to be a little more generic (in a good way): you could read/write/list anything, even visual objects, with one single mechanism.

We need maps to see/categorize/find data. Graphs of atoms that you can close (as in closure: any datum involved in the meaning of an operation has to be included) to transmit them in a consistent state. Moving files is wrong and everybody has seen it; it's full of hardcoded context.


Code (multiple types), images (multiple types), documents (multiple types), videos (multiple types), audio (multiple types)

And for all of them, I should be able to move them between services and applications as I please


And archives (.tar.gz, .rar, .zip, etc.)


Just today I worked with Markdown, LaTeX, PDF, Postscript, SVG, Python, GIF, and HTML. Most of that was in some way or another related to the Master's thesis I'm working on, but it's honestly pretty typical. (The HTML was not -- I decided to curl some spam sites linked from my email and read through their JavaScript, to trace down and see what sorts of deliciously painful exploits they were trying to install -- turned out it was just advertising crap.)


Directly or indirectly?


The limitation of apps not saving to the iOS file system is not a bad thing. It is progress.

There is nothing preventing my shiny new iOS app from sharing files with other applications. Apple is just preventing those files from being stored on the device. Instead, if an app developer wants interoperability, they can have the app save a file to Dropbox or my Google Drive. Any other application can access that same cloud storage and access the file.

The beauty is we've moved beyond sharing between applications on a single device. Now EVERY application I run on EVERY device I have has the potential to share the same data seamlessly.

This is why iOS doesn't open its file system. It wants the app developers to use something a little more flexible and reliable.


> if an app developer wants interoperability

Implementing interoperability with all the possible cloud storage systems shouldn't be left to each app developer separately. This should be a feature of the operating system.

As an Android user, I'm genuinely interested if iOS users find the sharing options between apps too limited. Do you often end up requesting new sharing options from the developers of your favorite apps?

Also, not every piece of data is a file I'd want to save to Dropbox. For example, I share article URLs from Flipboard to 2cloud many times a day (2cloud opens the URL on my desktop browser). I'd hate to have an extra save/open step between the apps.


My argument to this is that having access to the file system just gives developers a cop-out. If you give app devs access to the FS, they will all use it because it's easier and avoids the challenges of supporting cloud storage. However, this is ultimately worse for the user experience in the end.

I get what you mean about only wanting certain documents on Dropbox - that was just a limited example. The spirit of the concept is that the developer can choose what cloud storage to use based on the application.

In the case of a mobile Photoshop app, Dropbox might make sense. In your example, the storage medium would be different (maybe even proprietary) but a cloud space would still be ideal for the end user over just storing these URLs on the local device.


Nice though Dropbox is, I don't have an account on it. You're saying that in order to share data between two apps on the same device, I should have to sign up to a third-party system to do so?

Then, if I have different kinds of documents, I should sign up to a bunch of different third-party cloud systems? Each sign up being another username and password, another point-of-failure for security, more management overhead? All for a system that won't work if I lose network coverage (rural, underground, airplane mode, choked tower, foreign travel etc)?

The cloud does have its good points, but I do not buy this snake oil.


We'll see how it plays out, but I imagine the notion of a file will continue to decline, and end up replaced by APIs.

APIs continue to provide interoperability, but instead of having the user select a file to upload, they select an image through the Facebook API. This should ultimately improve the user's experience, but there are some downsides during this transition period (e.g. Photoshop Touch's lack of sharing features).

While you could argue FilePicker brings back the file, you could also hedge your bets the other way, and work as much as possible to abstract away the file. Instead of grabbing Facebook photos as a set of files, what if I could easily grab the set of photos that contain me and a friend?


This way, if you start a new social network, in addition to the network effect it would have the huge disadvantage versus Facebook that no programs are willing to interoperate with it, because they only know the proprietary Facebook API.

Good for Facebook, bad for you and me.


The problem is that it requires each application to wrap each API. You end up with a huge amount of redundant work and, in general, a dearth of integrations as developers are lazy or reprioritize. For example, there are apps that support integration with Dropbox, but don't support Google Drive, Box, SugarSync, etc.


Forget about "filesystems" for a moment.

Files exist because the amounts of "stuff" users want to "store" do not always correlate well to block size.

To put it another way, block size is fixed. But the size of "stuff" is variable.
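
The arithmetic is the whole story. A toy example, assuming 4 KB blocks:

    #include <stdio.h>

    int main(void) {
        long block = 4096;    /* fixed: what the disk hands out          */
        long stuff = 10000;   /* variable: what the user wants to store  */
        long blocks = (stuff + block - 1) / block;  /* round up: 3 blocks */
        printf("%ld blocks, %ld bytes of slack\n",
               blocks, blocks * block - stuff);     /* 2288 bytes wasted  */
        return 0;
    }

A "file" is the bookkeeping that maps variable-sized stuff onto fixed-size blocks and remembers where the slack begins.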

OK, now you can go back to thinking in terms of "file systems".


Oh, I thought this was going to be about who invented files in the first place. Why'd you call something a file? How'd that idea arise? The only real-life "files" I know are these reports the police keep on people, or I guess any dataset. But who invented filesystems?

Edit: 302 found http://en.wikipedia.org/wiki/Computer_file#History


Interfaces (read: APIs) and data types (PDF, PNG, JSON, Markdown, etc.) are much superior to files for consumer-level users. This is where iOS is heading. It seems by evolution, not design.

Files are great when there is a competent, skilled user to provide the interface glue between apps. To automate, and have things just work, interfaces and data types are needed.


I remember the Palm Pilot didn't use files. I think I was using the Palm IIIxe, maybe. Developers, perhaps a little freaked out by the idea of no files, actually created an API to make it look like there were files. I thought it was pretty funny at the time.


My response to this, including how file systems and file explorers will never die -> http://blog.nitinkhanna.com/why-the-file-system-will-never-d...


I recently switched to using MiniKeePass on my iPhone. I have my encrypted KeePass file on Dropbox. To get it into MiniKeePass, I just went to the iOS Dropbox app and clicked on my key database file. MiniKeePass was registered as an app for the corresponding MIME type, and the file opened. Easy.

That worked great for me. What else is needed?



