The escalator never went down in the first place. Files are a weird, unclean semi-abstraction (growable virtual sparse block devices addressed as seekable byte-streams, with heavy metadata and OS-level memory caching?) that we only grant a sense of primacy because of how common they've been.
Consider: unikernels (like, say, any old cartridge game ROM) don't have any need for files. They deal with data in terms of precisely three abstractions:
• a .data section in ROM (maybe needing bank-switching to get in place);
• some kind of byte-addressable NVRAM (like battery-backed "save RAM", or CMOS memory), either bus-mapped or accessed through MMIO;
• tape or (floppy) disk, sometimes at an extremely low level (send commands to the drive motor, write guard nybbles, etc.), sometimes through a DOS where you can just request to seek to a given track, then read or write a given sector on that track. Either way, it's more like a block device than a filesystem.
---
For today's kids, I'd suggest: don't start with files. Teach key-value storage first. Have them interact with a library like LMDB without explaining where it's putting the data. They'll understand this just fine.
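For example, a first lesson with the py-lmdb binding can be this small (the environment name is arbitrary; where the bytes land on disk stays deliberately unexplained):

    import lmdb

    # Open (or create) an environment. Where this lives on disk is
    # deliberately not part of the lesson.
    env = lmdb.open("toybox", map_size=10 * 1024 * 1024)

    # Put and get are the whole mental model: keys and values are bytes.
    with env.begin(write=True) as txn:
        txn.put(b"favorite-color", b"green")

    with env.begin() as txn:
        print(txn.get(b"favorite-color"))  # b'green'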
Then, teach object storage in terms of key-value storage. Object storage—especially once you add object versioning—is much closer to the modern metaphor that user-facing apps expose. You compose a complete new version of an object in an in-memory scratch buffer, and then it atomically replaces the previous object. You can't corrupt an object by half-saving it. Etc. Again, don't bother explaining how this works yet; just give them a scripting runtime hooked up to a Minio instance.
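For instance, a first session against a classroom MinIO instance might be nothing more than this sketch (the endpoint, credentials, and bucket name are placeholders, and the bucket is assumed to already exist):

    import io
    from minio import Minio

    # Placeholder endpoint and credentials for a classroom instance.
    client = Minio("play.min.io", access_key="...", secret_key="...")

    body = b"version 2 of my story"
    # The object either fully replaces the old version or never appears;
    # there is no half-saved state a reader can observe.
    client.put_object("homework", "story.txt", io.BytesIO(body), length=len(body))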
After they get that, you can ask them what they'd do if they needed to create an object that wouldn't fit in memory. Then you can explain block devices, as a "place where large scratch buffers can live"—but don't force them to figure out how to allocate those buffers from the block device! That's gonna pull in a whole bunch of prerequisite teaching. Instead, pull out another API: Linux's LVM. Logical volume management takes block devices in, and spits block devices out. The logical volumes are the scratch buffers. Explain mmap(2), and how these buffers end up a lot slower than memory buffers. Explain how these buffers, unlike memory buffers, survive a crash.
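A minimal sketch of that lesson, assuming a logical volume has already been created for the student (the volume-group and LV names below are made up):

    import mmap
    import os

    # Hypothetical logical volume, created beforehand with something like:
    #   lvcreate -L 1G -n scratch vg0
    fd = os.open("/dev/vg0/scratch", os.O_RDWR)

    # Map the first 1 MiB of the volume into memory.
    buf = mmap.mmap(fd, 1 << 20)

    buf[0:5] = b"hello"  # writes look like ordinary memory writes...
    buf.flush()          # ...but an msync makes them durable on the volume
    buf.close()
    os.close(fd)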
After you get to that point, then you can explain that all the other ways computers durably store data are built on top of these logical-volume durable scratch buffers. You can explain how LMDB works in terms of disk pages; and then you can explain how content-addressable storage works in terms of combining an LMDB-like KV store with hashing and splitting.
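The "hashing and splitting" part fits in a few lines. Here's a toy sketch, with a plain dict standing in for the LMDB-like KV store:

    import hashlib

    CHUNK = 4096
    kv = {}  # stands in for an LMDB-like KV store

    def cas_put(data: bytes) -> str:
        """Split data into chunks, store each chunk under its own hash,
        and return the hash of the chunk list: the object's address."""
        chunk_keys = []
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            key = hashlib.sha256(chunk).hexdigest()
            kv[key] = chunk  # content-addressed: the key derives from the value
            chunk_keys.append(key)
        manifest = "\n".join(chunk_keys).encode()
        root = hashlib.sha256(manifest).hexdigest()
        kv[root] = manifest
        return root

    def cas_get(root: str) -> bytes:
        keys = kv[root].decode().splitlines()
        return b"".join(kv[k] for k in keys)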
And, after that—if you like—you can explain that sometimes, when we need something that's like object storage but where everything in the storage bucket is actually a tiny scratch buffer, we use filesystems. You can explain what an "extent" is by talking about how something like LMDB, which has a "freelist" of pages from its underlying logical volume, can reserve a contiguous set of those pages and then let something else access them. Then you can explain a filesystem as a key-value store that has buckets called "inodes", whose keys are page ranges and whose values are extent addresses. (And an associated versioned object store of directory-objects, where each object is a serialized list of records (dirents), each containing a reference to an inode and giving it a name and other stuff.)
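To make the mapping concrete, here's a toy model of those structures in plain data (illustrative only; real filesystems pack all of this into disk pages):

    # An inode: a tiny KV "bucket" mapping page ranges to extents
    # (an extent here is just a starting block on the volume; the
    # length is implied by the page range).
    inodes = {
        7: {  # inode number 7
            (0, 16): 1024,   # file pages 0-15 live at volume block 1024
            (16, 20): 5120,  # file pages 16-19 live at volume block 5120
        },
    }

    # A directory object: a serialized list of dirents, each giving an
    # inode a name. (Versioned atomically, like any other object.)
    directories = {
        "root-dir-v3": [
            {"name": "notes.txt", "inode": 7},
        ],
    }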
I didn't say the implementation of a filesystem was hard. You can certainly communicate, as a complete standalone fact, the brute engineering perspective on what makes a particular filesystem work. Probably even in a single lesson.
But the point isn't to teach students about a filesystem; the point is to teach students what files are when they don't understand why you'd even want the abstraction that "files" represent; when "files" aren't an abstraction they're any more familiar with than the alternatives. You have to justify "files" in the context of things they do know—to explain what purpose files would serve if they were invented today; what reason you'd have to introduce a filesystem into a data architecture that had all these other things (KV storage, object storage, logical volumes) but not files.
Students would ask: Why are files streams? Why are the streams arbitrarily writable, when this means files can become corrupted from a partial write? Why are files extendable, but only at the end, not in the middle or the beginning? Why do files have names? (And, once you explain inodes+dirents, they'll ask why there's a directory tree if hard links exist; and why it's the dirents that have the names, but the files themselves that have the other metadata—xattrs, alternate data streams, etc.) What is the difference between a file stored in a directory, a file stored in a filesystem stored in a mounted loopback image stored in another file, and a file stored in a subvolume or filesystem container? Why do directories need special system calls to read and write them? Why is a virtual filesystem used in many OSes to represent kernel objects like named pipes and device nodes? Etc.
(This isn't even a hypothetical; engineers at AWS and GCP probably had to answer this very question when asked to justify building EFS and Cloud Filestore. Why do we need a filesystem as a service, when we already have these other services that provide these other abstractions?)
This is not altogether unlike teaching a student what a programming language is and why you'd even want one of those, when they're immersed in an environment where software is created without them. Would you just sit down and show the kids C, because it's small and easy-ish to understand?
A filesystem is a data-storage abstraction, like C is a machine abstraction. But neither is a primitive abstraction. They build on numerous other, theoretically purer abstractions. It's much easier to explain the motivation for the creation of a language like C if you already understand the motivation for the creation of a simpler language that rests upon fewer other abstractions—like ASM, or like Scheme. Likewise, it's much easier to understand the motivation behind the creation of the abstraction that is filesystems if you already understand the motivation behind the creation of simpler data-storage abstractions, like logical volumes or memory-page-backed search trees.
The answer to "why do you want files as an abstraction" is that files are the units of ownership of data. If you don't control the files that represent your data, you don't own it. You might think you do, but you don't, because someone else ultimately decides the fate of those files.
My point was that there don't need to be files anywhere to represent data at all. Computers can work entirely without files. For example, your whole hard drive could consist of an RDBMS, where you'd not "download" files, but rather "download" streams of tables, which would import directly as tables into the RDBMS.
"Files" are a very specific abstraction; thinking they're the only way to transfer chunks of data around is a symptom of lack of imagination. There are very similar abstractions to files, such as object-store objects. The only practical difference between files and objects is that updates to objects are transactional+atomic, such that nobody ever sees an "in progress" object. But an object store (backed by a block device) is a simpler system than a filesystem (backed by a block device.)
You can control the objects that represent your data. You could also control, via an RBAC-like system, the K-V or tuple-store records that represent your data. Or the Merkle-tree commits. Or the blockchain transactions. Or the graph edges. Or the live actor processes holding in-memory state. You can transfer all of these things around between computers, both by replication (copying) and by migration (moving.) What do files have to do with any of this? An accident of history is what.
They are not the only way to transfer chunks of data.
They are the most successful and versatile mass way to transfer chunks of data and define ownership in the history of computing.
I’m sure an RDBMS or a graph DB can do those things as well. But no one has succeeded in doing it even close to as effectively as files have. And many have tried. In fact, probably the greatest computer-software failure of all time, Windows Longhorn, was largely a failure of trying to replace a file-based system with a graph-DB-based one.
People very much can imagine alternatives. There is no shortage of imaginable alternatives. There is a huge shortage of successful, in-use alternatives that are as versatile or effective as files.
You're focusing on "files" as they compare to things very different from them. But imagine for a moment what an OS with an object store in place of a filesystem would be like. Pretty much exactly the same, except that the scratch buffers backing temp files and databases wouldn't hang off the object-store "tree", but rather would either be anonymous (from mmap(2)), or would be represented by a device node (i.e. a logical volume) rather than being objects themselves. All the freestanding read-only asset bundles, executables, documents, etc. would stay the same, since these were always objects being emulated under a filesystem to begin with.
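The "anonymous" option is already a one-liner today; a sketch:

    import mmap

    # An anonymous mapping: a scratch buffer handed out by the kernel,
    # with no name anywhere in any storage namespace.
    scratch = mmap.mmap(-1, 64 * 1024)  # 64 KiB of pages, backed by no file
    scratch[0:4] = b"temp"
    scratch.close()  # gone; nothing to clean up in any tree of objects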
And downloads would also be objects. Because, when you think about it, at least over HTTP, downloads and uploads already are of objects—the source doesn't get allocated a scratch buffer on the destination that it can then freely seek(2) around and write(2) into; instead, the source just streams a representation to the destination, where it gets buffered until it's complete, and then a new object is constructed on the destination from that full, locally stream-buffered copy. (WebDAV introduces some file semantics into HTTP's representational object semantics, but it doesn't actually go all the way to enabling you to mount a DBMS over WebDAV.) Other protocols are similar (e.g. FTP; SMTP.) Even BitTorrent is working with objects, once you realize that it's the pieces of your files that are the objects. Rsync is the only weird protocol, in that it would really need to be reimplemented in terms of syscalls that allocate explicit durable scratch buffers. (That, and SMB/NFS/AFP/etc., but those are protocols with the explicit goal of exposing a share on a remote host as something with filesystem semantics, so you'd kind of expect them to need filesystem support on the local machine.)
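This is visible in how browsers already behave: they stream into a local ".part" buffer and only then let the finished artifact appear. A sketch of the same pattern (the URL and filenames are placeholders):

    import os
    import urllib.request

    url = "https://example.com/photo.jpg"  # placeholder URL

    # The source streams a representation; we buffer it locally...
    with urllib.request.urlopen(url) as resp:
        body = resp.read()

    # ...and only once the stream is complete does the new "object" appear.
    with open("photo.jpg.part", "wb") as f:
        f.write(body)
    os.replace("photo.jpg.part", "photo.jpg")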
Now, want to know something interesting? We already have this. Any inherently copy-on-write filesystem, like APFS or btrfs, is actually an object store masquerading as a filesystem. You get filesystem semantics, but they're layered on top of object-storage semantics, and it's more efficient when you strip them away and use the object-storage semantics directly (like when using btrfs send/receive, or when telling APFS to clone a file in place). And these filesystems also have exactly what I mentioned above: special syscalls (or in this case, file attributes) to allocate scratch buffers that bypass copy-on-write, for things like databases.
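For the btrfs case specifically, both halves of that claim are things you can type today. A sketch driving the real CLI tools (the paths are hypothetical, and note that btrfs only honors the no-CoW attribute on empty or newly created files):

    import subprocess

    # Disable copy-on-write on a (still-empty) file that will back a
    # database, so it behaves like a plain durable scratch buffer:
    subprocess.run(["chattr", "+C", "/srv/db/scratch.img"], check=True)

    # Ship a read-only snapshot as its object representation directly,
    # skipping filesystem semantics entirely:
    send = subprocess.Popen(
        ["btrfs", "send", "/snapshots/home-2024-01-01"],
        stdout=subprocess.PIPE)
    subprocess.run(["btrfs", "receive", "/mnt/backup"],
                   stdin=send.stdout, check=True)
    send.stdout.close()
    send.wait()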
There's no reason that a modern ground-up OS (e.g. Google's Fuchsia) would need to use a filesystem rather than an object store. A constructive proof's already there that it can be done, just obscured a bit behind a need for legacy compatibility; a need that wouldn't be there in a ground-up OS design.
(Or, you can take as a constructive proof any "cloud native" unikernel design that just uses IaaS object/KV/tuple/document-storage service requests as its "syscalls", and has no local persistent storage whatsoever, never even bothering to understand block devices attached to it by its hypervisor.)
> "Files" are a very specific abstraction; thinking they're the only way to transfer chunks of data around is a symptom of lack of imagination.
I didn't say they were the only way to transfer chunks of data around. I said they were the units of ownership of data. If your data is somewhere in a huge RDBMS mixed together with lots of other people's data, you don't own it, because you don't control its fate; whoever owns and manages the RDBMS does. The same goes for all the other object control and storage systems you mention: individual people who have personal data don't own any of those things.
This is true. Way back when, I was teaching a commercial course on C to a class which included a COBOL programmer. At the end of one session, he came up to me and asked, "But what is this 'memory' stuff, and why would I ever want to use it?"
That's a cool idea! What resonates most with me is starting somewhere in the middle of the ladder of abstraction, low enough that a lot of the fundamental CS concepts are visible, but high enough that beginners can make something real, now. I like that networking would be built in from the start.
I'd also want to teach beginners in a way that lets them keep building and tinkering outside the environment I provide for them. This is where I'd love to see schools providing and encouraging the use of hosting space and resources like Minio, so that students can just count on them being available.
Filesystems are hard!