
"why maintain two logical address spaces? Why not have the file system directly manage the flash?"

Because managing flash is hard. For this to work, the number of exceptions in the filesystem itself would make it unmanageable.

An example: until recently, a Fusion-io card used low-quality flash yet managed to get stellar performance, much higher than the equivalent Intel SSD card (and Intel has access to much higher-quality flash). Some of that is down to layout, the rest firmware.

Then you have the flash that lives in a device like your phone, where the firmware of the SoC is tweaked to run that particular flash controller.

At its heart, flash/HDD is just a key:value database (presented with a 4 KiB value size). Why would we want to complicate that? (Sure, we need to indicate to the hardware that we've deleted a page, but to be honest not much else.)
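
To make that concrete, here is a toy sketch (in C, purely illustrative; the struct and function names are made up) of the interface described above: a flat array of fixed-size values keyed by logical block address, plus a hint that a block is no longer needed.

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define BLOCK_SIZE 4096   /* the "4 KiB value size" */
    #define NUM_BLOCKS 1024   /* tiny toy device */

    /* A block device viewed as a key:value store: LBA -> 4 KiB value. */
    struct toy_blockdev {
        uint8_t data[NUM_BLOCKS][BLOCK_SIZE];
        uint8_t in_use[NUM_BLOCKS];   /* set by write, cleared by trim */
    };

    static void bdev_write(struct toy_blockdev *d, uint32_t lba, const void *buf) {
        memcpy(d->data[lba], buf, BLOCK_SIZE);
        d->in_use[lba] = 1;
    }

    static void bdev_read(const struct toy_blockdev *d, uint32_t lba, void *buf) {
        memcpy(buf, d->data[lba], BLOCK_SIZE);
    }

    /* The "we've deleted a page" hint, analogous to TRIM/discard. */
    static void bdev_trim(struct toy_blockdev *d, uint32_t lba) {
        d->in_use[lba] = 0;
    }

    int main(void) {
        struct toy_blockdev *d = calloc(1, sizeof(*d));
        if (!d) return 1;
        uint8_t buf[BLOCK_SIZE] = "hello";
        bdev_write(d, 42, buf);
        bdev_read(d, 42, buf);
        bdev_trim(d, 42);
        free(d);
        return 0;
    }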




> Because managing flash is hard. For this to work, the number of exceptions in the filesystem itself would make it unmanageable.

The obvious way to implement this is to expose the raw flash to the OS and put an abstraction layer into the OS that does what the drive firmware currently does so it can be used with existing filesystems. Just that would have its own benefits because it would remove the black box from around that code and let people improve it. It would also make the drives less complicated/expensive and remove the burden from smaller manufacturers to maintain solid firmware (which they've often failed to do), and make it a lot easier and more reliable to secure erase drives.

Once you have that you can start looking at improving particular filesystems by taking advantage of the additional information now available to the OS.
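
As a rough illustration of what such an in-OS layer would have to do (a toy sketch; the names are mine, not from any real driver), here is the core of an FTL: flash pages cannot be overwritten in place, so every logical write is remapped to a fresh physical page, and that is before you even get to garbage collection and wear leveling.

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_LOGICAL  8             /* logical blocks the filesystem sees */
    #define NUM_PHYSICAL 16            /* physical flash pages, overprovisioned */
    #define UNMAPPED     0xFFFFFFFFu

    /* Toy FTL state: logical-to-physical map plus a free-page cursor. */
    struct toy_ftl {
        uint32_t l2p[NUM_LOGICAL];
        uint8_t  phys_valid[NUM_PHYSICAL];
        uint32_t next_free;
    };

    static void ftl_init(struct toy_ftl *f) {
        for (int i = 0; i < NUM_LOGICAL; i++) f->l2p[i] = UNMAPPED;
        for (int i = 0; i < NUM_PHYSICAL; i++) f->phys_valid[i] = 0;
        f->next_free = 0;
    }

    /* Every write goes to a fresh page; the old page is merely marked
     * invalid, to be reclaimed later by garbage collection (omitted here). */
    static uint32_t ftl_write(struct toy_ftl *f, uint32_t lba) {
        uint32_t old = f->l2p[lba];
        if (old != UNMAPPED) f->phys_valid[old] = 0;
        uint32_t page = f->next_free++;
        f->phys_valid[page] = 1;
        f->l2p[lba] = page;
        return page;
    }

    int main(void) {
        struct toy_ftl f;
        ftl_init(&f);
        printf("lba 3 -> page %u\n", (unsigned)ftl_write(&f, 3));
        printf("lba 3 -> page %u (rewrite lands elsewhere)\n", (unsigned)ftl_write(&f, 3));
        return 0;
    }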


"The obvious way to implement this is to expose the raw flash to the OS and put an abstraction layer into the OS that does what the drive firmware currently does so it can be used with existing filesystems."

It may or may not work better than what we have now. Doing so would require a lot more knowledge from the developers involved. Actually, "developer" is a fancy term for a job that couldn't be called "engineering", being almost completely insulated from the class of problems that an engineering job has to consider as a norm. Expose the raw flash (or other raw access to electrical components or whatnot) and you find yourself with real engineering problems on your hands. No more (software) "development", but engineering! And engineering is hard.

And there's one more thing - if we think the current state of software fragmentation is bad, wait until this unified physical computing interface (which the firmwares more or less adhere to and which we take for granted today) is taken away.


> And there's one more thing - if we think the current state of software fragmentation is bad, wait until this unified physical computing interface (which the firmwares more or less adhere to and which we take for granted today) is taken away.

It isn't about removing abstraction layers, only moving them. It should be part of the OS rather than the hardware. The people who write the abstraction layer have to deal with hard problems, but they do that now. What does it matter if they work for Microsoft and RedHat instead of Samsung and Intel?

The point is to publish and standardize how that abstraction layer works so the people who write filesystems and filesystem tools have better information and can suggest or provide improvements. And to stop forcing every SSD manufacturer to duplicate the software engineering efforts of the others instead of focusing on hardware.


It seems like the article maybe misrepresents what the thesis that inspired it says[1]. From the abstract:

"... the device still has its freedom to control block-allocation decisions, enabling it to execute critical tasks such as garbage collection and wear leveling ... Next, we present Nameless Writes, a new device interface that removes the need for indirection in flash-based SSDs. Nameless writes allow the device to choose the location of a write; only then is the client informed of the name (i.e., address) where the block now resides."

So it sounds like this approach is actually removing responsibility from the filesystem, not the firmware.

[1] http://research.cs.wisc.edu/adsl/Publications/yiying-thesis1...
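
For what it's worth, the interface described in that abstract presumably looks something like the following (a toy sketch; the function name and signature are my guesses, not the thesis's actual API): the filesystem hands over data with no destination, and the device replies with the physical address it chose, which the filesystem then records in its own metadata.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096

    /* Device-side state for the toy: just a bump allocator over pages. */
    static uint32_t next_physical_page = 0;

    /* A "nameless write": the caller supplies only data; the device picks
     * the location and returns it, so the device no longer needs a full
     * logical-to-physical map for ordinary data blocks. */
    static uint32_t nameless_write(const uint8_t *data, size_t len) {
        (void)data; (void)len;          /* toy: the data isn't actually stored */
        return next_physical_page++;    /* the device's allocation decision */
    }

    int main(void) {
        uint8_t block[PAGE_SIZE] = {0};
        /* The filesystem has to wait for the reply before it can record where
         * the block ended up, which is the write latency discussed below. */
        uint32_t addr = nameless_write(block, sizeof(block));
        printf("device placed the block at physical page %u\n", (unsigned)addr);
        return 0;
    }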


I have now read the thesis lightly, and I cannot, for the life of my children, find out exactly WHY you would want to do away with the simplest of abstractions on the device, namely the logical-to-physical translation. It's a well-known abstraction, and the benefits are clear as day. The alternative, with migration callbacks and now two different kinds of data that you must write differently, just screams bad idea and goes against everything I've learned in CS and in my career. It introduces unneeded complexity for the OS. The main benefit seems to be lower-cost devices (saving roughly 1 GiB of RAM per 1 TiB), which is negligible. The performance isn't really better, the thesis doesn't exactly go into detail on the CPU overhead of this implementation during heavy IO, and we now face an entirely different problem with crashes. Today we can build SSDs that ensure confirmed writes are guaranteed to be persisted.

This abstraction is what has allowed us to transition from traditional spinning data stores to SSDs without much effort (save for delete flags to help the device perform GC and improve performance).


Nameless writes save an indirection layer -- but quite a cheap one!

They add significant latency to write operations, though, which is a high price to pay for such a small gain.


SSDs are so fast let's make them slow again!


Indeed; it sounds like it's creating a clean abstraction layer for the kinds of time-space guarantees flash memory wants to give, where what we have now is very muddled due to being based on the kinds of time-space guarantees spinning media (or even tape drives) wanted to give.


The Linux kernel already supports directly managing Flash without a controller. The JFFS2 filesystem is designed to run directly on top of the Flash device. This is used in most routers and other small Linux devices in order to keep costs down.


There's also f2fs [1][2], which has a similar design but runs on flash devices with an FTL and thus takes a more middle road. It still has a log structure and tries to make things easy for the FTL by doing large, sequential, out-of-place writes whenever possible, but it takes advantage of the FTL when it makes things simpler, like for certain metadata that's easier to update as a small random write.

[1]: http://lwn.net/Articles/518988/ [2]: http://lwn.net/Articles/518718/
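
Roughly, the "large, sequential, out-of-place writes" idea looks like this (a toy sketch with invented names, not f2fs's actual on-disk layout): data is appended at a log head so the FTL sees mostly sequential traffic, while a small block map is updated in place and left for the FTL to handle.

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_BLOCKS 64

    /* Toy log-structured layout: data always appends at the log head,
     * while a small "file block -> log position" map is updated in place. */
    struct toy_log_fs {
        uint32_t log_head;                /* next sequential write position */
        uint32_t block_map[NUM_BLOCKS];   /* small random-write metadata */
    };

    static uint32_t logfs_write_block(struct toy_log_fs *fs, uint32_t file_block) {
        uint32_t pos = fs->log_head++;    /* sequential, out-of-place data write */
        fs->block_map[file_block] = pos;  /* tiny in-place metadata update */
        return pos;
    }

    int main(void) {
        struct toy_log_fs fs = {0};
        /* Rewriting the same file block still produces sequential device I/O. */
        printf("block 5 written at log position %u\n",
               (unsigned)logfs_write_block(&fs, 5));
        printf("block 5 rewritten at log position %u\n",
               (unsigned)logfs_write_block(&fs, 5));
        return 0;
    }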


Does YAFFS follow the same principle?


Yes.

It is interesting to note the design trend in mobile devices like phones.

Years ago, it was typical to have a NAND flash controller built into the SoC (like a TI OMAP, Freescale i.MX series or similar).

This is raw Flash memory, and it is up to the SoC plus OS to manage error recovery, remapping bad sectors, etc.

However, in recent years, most mobile devices just use one of their SD interfaces (often 8-bit these days) to access an eMMC chip. This looks just like an SD card, because it has an FTL in it which takes care of a lot of the low-level details needed for Flash management.

Some SoCs these days don't even include a NAND Flash controller anymore.


Exactly!

The actual memory contents behind the controller are dependent on physical characteristics. Using the FTL firmware, the flash+controller (such as eMMC) vendor is free to do all kinds of tricks, for instance, depending on the quality of a particular batch of NAND. Bad batch? Use more of the spare for error correcting code. Particular memory pattern that generates interference? Tweak the scrambler. Slow? Interleave between a couple of NAND chips. (These examples are hypothetical)
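
For instance, the interleaving trick could be as simple as striping consecutive pages across dies so that several slow program operations can be in flight at once. A hypothetical round-robin sketch (not any vendor's actual firmware):

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_DIES 4     /* hypothetical number of NAND dies */

    /* Round-robin striping: consecutive host pages land on different dies. */
    static void map_page(uint32_t host_page, uint32_t *die, uint32_t *die_page) {
        *die      = host_page % NUM_DIES;
        *die_page = host_page / NUM_DIES;
    }

    int main(void) {
        for (uint32_t p = 0; p < 8; p++) {
            uint32_t die, die_page;
            map_page(p, &die, &die_page);
            printf("host page %u -> die %u, page %u\n",
                   (unsigned)p, (unsigned)die, (unsigned)die_page);
        }
        return 0;
    }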

Tying filesystems to the physical layer makes no sense. It would mean I wouldn't be able to use 'dd' to copy a partition to some other device, since it would have different physical characteristics which the filesystem would need to take into account. It would mean that I wouldn't be able to take an iSCSI volume and write it out to disk to 'de-virtualize' virtualized storage.


> Then you have the flash that lives in a device like your phone, where the firmware of the SoC is tweaked to run

Nowadays all of the things (ha) use flash behind a higher level of abstraction, be it eMMC, UFS, or an SD controller. There are no phone SoC firmware tweaks.


> At its heart, flash/HDD is just a key:value database (presented with a 4 KiB value size). Why would we want to complicate that?

To avoid an inherent bottleneck? Perhaps we'd be better served by a larger number of key:value stores with greater parallelism?


SSDs are already parallel and I don't think updating the single FTL mapping table is a bottleneck.


The same was said about cylinders and sectors a couple of decades ago.

To use your example: imagine if you could use the Fusion-io FS on the Intel flash.

Also remember that several SSD companies that even had their own flash fabs are now gone because their firmware was crap.



