I would recommend that anybody interested in filesystems watch Jeff Bonwick (ZFS inventor) explain the design of ZFS: https://www.youtube.com/watch?v=NRoUC9P1PmA. They share a few very nice war stories explaining why they found it useful to have the user data checksummed as well.
I really like the hack of block pointers being a data structure that contains the birth time (the transaction ID, I guess) and how that avoids having to manage bitmaps.
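Roughly the idea, as I understand it (a simplified sketch with made-up field names, not ZFS's actual blkptr_t layout): since every pointer records the transaction group in which its block was born, a block freed in the live filesystem can be released immediately if it was born after the newest snapshot, with no per-snapshot bitmaps to maintain.

    # Hypothetical, simplified model of a ZFS-style block pointer.
    from dataclasses import dataclass

    @dataclass
    class BlockPointer:
        address: int      # stand-in for the block's location on disk
        checksum: int     # checksum of the block being pointed to
        birth_txg: int    # transaction group in which the block was written

    def can_free_immediately(bp: BlockPointer, newest_snapshot_txg: int) -> bool:
        # A block born after the newest snapshot can't be referenced by any
        # snapshot, so it can be freed outright; older blocks go on a dead list.
        return bp.birth_txg > newest_snapshot_txg

    bp = BlockPointer(address=0x1000, checksum=0xDEADBEEF, birth_txg=1042)
    print(can_free_immediately(bp, newest_snapshot_txg=1000))   # True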
The explanation of the advantages of send/receive vs rsync was also nice.
It is not often that you see a technology and think, "Oh, this is great stuff," but ZFS is like that. I haven't played much with it yet, but I get that impression just from learning about it so far.
Both Jeff and Bill are great at communicating and explaining the technology. I like how they tag-team, with minor funny bits here and there.
Regarding the main issue here, checksums -- yeah I don't see how Apple engineers could have watched this and said "Meh, don't need data checksums". Maybe they do have a secret vault with magic new holographic storage, immune to cosmic rays and other vagaries of physics, who knows.
I might be wrong, but I read APFS's no-checksum decision as founded upon Apple's hardware-and-software strategy. Given their situation, they can decide that checksumming is a hardware problem. They can build their latest devices with ECC, and require customers to only use Apple-developed USB keys. (I would not be shocked to learn that, in late 2017, Apple starts selling USB-C storage keys preformatted as APFS.)
I think it is a decision based on Apple's image. They can only lose by enabling data checksums.
The user would have to be informed, since Apple's hardware has no storage redundancy. That would be perceived as an admission of failure on Apple's part; your $2k device just murdered your data.
There is not really a UX flow apart from telling the user to recover the file from backup.
Technically, having data checksums is the dominant strategy, even if you assume the hardware is awesomely perfect.
>There is not really a UX flow apart from telling the user to recover the file from backup.
I agree with all your points, but just want to offer a possible UX solution:
Since the vast majority of data on people's hard drives is video and images, where minor data corruption in most cases results in just visual artifacts, we could have a pop-up dialog that says, "A higher quality version of this file was found in your backup. Do you want to restore it?" when the corrupted file is a video or image and there's a confirmed backup of it.
If there's no confirmed backup then just silently ignore the corruption since the user won't notice anyways.
> There is not really a UX flow apart from telling the user to recover the file from backup.
There absolutely is, since that could be performed automatically. Inform the user about the corruption, rename the corrupted file, then restore the most recent backup in its place.
That's assuming that there's a backup and the OS knows about it. The huge majority of people don't have any backups, or their backups are not automatic, so the OS can't restore the file.
To the extent that this is true in the modern era of cloud services, it's irresponsibly abetted by so many programs silently ignoring errors rather than reporting them.
In Apple's case, they have complete control over iCloud and could offer this easily for any file which is stored there. They could also add some sort of metadata API so services like CrashPlan, Backblaze, etc. could register the presence of other copies in a generic manner. Third party services could also integrate background scrubs into their existing application.
In each case, the first time that dialog appeared you'd likely have a customer for life from anyone who's gone through the hassle of losing a personal memory, an important document, etc., or made a panicked search for other/older copies.
If there isn't a backup, it should at least notify the user of the error. But the OS would know about backups, since it has a built-in backup system.
Also, I wouldn't be surprised if Apple adds a cloud-based backup for macOS once APFS is the default filesystem, since change sets would be extremely efficient.
Checksumming at that level is a bit pointless, because then you can't repair the data. Instead of being able to recover from a mirror or parity data, all you get is "it's corrupt, oh well".
> Instead of being able to recover from a mirror or parity data, all you get is "it's corrupt, oh well".
That's exactly what you want – a clean failure which prevents other software from silently propagating corrupt data. Even more importantly, with corruption it's usually much easier to recover from another copy if you notice the problem quickly.
Think about what happens with e.g. photos – you copy files from a phone, USB card, etc. and something gets corrupted on local storage. With integrity checks, the next time you load your photo app, go to upload photos somewhere, run a backup, etc., the filesystem immediately gives you an unavoidable error telling you exactly what happened. If it's recent, you can go back to the source phone/camera and copy the file back.
With all current consumer filesystems, what you instead get are things like a file which looks okay at first – maybe because it's just displaying an embedded JPEG thumbnail, maybe because software like Photoshop is enormously fault-tolerant – but doesn't work when you use it in a different app. Or, because you didn't get an error, the underlying hardware problem affecting your storage gets worse over time until something fails so badly that it becomes obvious. By the time you notice the problem, the original file is long gone because you reused the device storage, and you have to check every backup, online service, etc. to find which copies are corrupt.
Checksumming at the file system level solves the problem of corruption that occurs off the media (on the bus). The media has checksumming on it that allows detection and recovery from errors that occur on the media itself.
The questions I'm sure Apple engineers asked were: how often do we see bit rot occurring off-media, and is the media that we're deploying sufficiently resistant to bit rot?
And, with APFS's flexible structure, this is a feature that can be added at a later time. It probably made more sense to deliver something rock solid in 2017 that they could build on than to either (A) push out the delivery date, or (B) not fully bake all the features of the file system.
Others have pointed out the potential HW issues, but I implemented something similar in a product that stored data using its own disk format, and found that adding checksums to all data written to disk yielded a number of cases where what we thought were HW failures were actually SW failures. That is, really, really, really obscure bugs that only happened under obscure conditions (think of the equivalent of fsck bugs when checking the filesystem after power loss, for one example; the journal needed to be in exactly the right state to trigger the recovery bug).
I have no idea if Apple flash storage devices do that (or will), but Hamming (7,4) can correct single-bit errors. Most ECC devices are SECDED (single-error correction, double-error detection).
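For the curious, here's a toy Hamming(7,4) round trip (illustrative only; real flash and DRAM controllers typically use wider SECDED or BCH/LDPC codes, but the principle is the same):

    # Hamming(7,4): 4 data bits -> 7-bit codeword that can correct any
    # single-bit error.
    def hamming74_encode(d):                 # d = [d1, d2, d3, d4]
        d1, d2, d3, d4 = d
        p1 = d1 ^ d2 ^ d4                    # covers codeword positions 1,3,5,7
        p2 = d1 ^ d3 ^ d4                    # covers codeword positions 2,3,6,7
        p3 = d2 ^ d3 ^ d4                    # covers codeword positions 4,5,6,7
        return [p1, p2, d1, p3, d2, d3, d4]  # positions 1..7

    def hamming74_decode(c):                 # c = 7-bit codeword, possibly damaged
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
        s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
        err = s1 + 2 * s2 + 4 * s3           # 1-indexed position of a flipped bit
        if err:
            c = c[:]
            c[err - 1] ^= 1                  # correct the single-bit error
        return [c[2], c[4], c[5], c[6]]      # recovered data bits

    word = hamming74_encode([1, 0, 1, 1])
    word[5] ^= 1                             # flip one bit "in transit"
    assert hamming74_decode(word) == [1, 0, 1, 1]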
Failing to replicate corrupt data is the point. By delivering the corrupt data to user space, it proliferates into backups. And since it's common practice, even if suboptimal, to conflate archive and backup and rotate those media, it's just a matter of time before all backups have a corrupt copy of the file.
You may be interested in http://daringfireball.net/thetalkshow/2016/06/17/ep-158 where Craig Federighi discusses how Apple implemented wear-your-Apple-Watch-to-unlock functionality for their Macs; as it turns out, they make calculations based on how fast the Bluetooth responses are to judge the distance between the laptop and the watch — that way, a snoop can't unlock your laptop if you're merely in the room. It's nice to know that Federighi, at least, is involved in the nitty-gritty even though details like these aren't reflected in the keynotes.
Fun fact: Dominic Giampaolo (who wrote the BeOS file system) is on the APFS team. His book "Practical File System Design" is an excellent description of a traditional UNIX file system design. It may be out of print now, but I think used copies turn up on Amazon.
> Fun fact: Dominic Giampaolo (who wrote the BeOS file system) is on the APFS team. His book "Practical File System Design" is an excellent description of a traditional UNIX file system design. It may be out of print now, but I think used copies turn up on Amazon.
Fun fact when it's in the article? Third paragraph:
"APFS, the Apple File System, was itself started in 2014 with Giampaolo as its lead engineer...(he built the file system in BeOS, unfairly relegated to obscurity when Apple opted to purchase NeXTSTEP instead)"
More than a fun fact. The BeOS file system is said to have been extremely fast and capable by the standards of its time. The C++ API was clean enough that even a newbie like me could appreciate the possibilities.
The metadata use across applications was great and exactly what others tried to mainstream (WinFS et al). You could search quickly for media files, emails, etc. and save the queries for fast access later on. mbox vs maildir, etc.
Edit: And it wasn't slow or heavyweight like the indexers we have right now for OS-wide search.
Just wanted to second this: using BeOS for email, Usenet, etc., where each message was a single file, was a faster and more capable experience than any email client shipped before Gmail arrived several years later. For things like smart folders, Apple Mail or Outlook on a modern SSD-equipped system has more noticeable UI lag than BFS delivered on a mid-range Pentium.
The really cool part is that it was also portable – there were multiple email clients which seamlessly interoperated because they were all just querying BFS, which made it faster for developers to experiment with alternative UIs since you didn't have to reinvent or tune a lot of core functionality.
Unfortunately, our industry's mainstream systems have, for the most part, badly done, half-hearted implementations of past innovations. What's most disheartening is that, for example, low-latency audio still isn't reliably available on mainstream platforms. There's always some dance you have to go through if you need it. On one platform the dance takes longer than on others, but the fact remains that even this was no problem on BeOS. I'm not saying BeOS was without faults, but it was a general-purpose system that excelled in some regards in a way that hasn't been matched since, unless you're willing to restrict yourself to certain hardware or APIs. The saddest part is that the BeOS IP has since been tucked away inside that Japanese company (Access Ltd.?) and is lost to the world. I've tried Haiku, but it's not the same. Just imagine if BeOS had been open sourced under an acceptable license back when it still had good hardware support for its time.
All this is an unfortunate yet very obvious result of the free market and capitalism. Wherever money goes, technology goes. Android is about to bring back low-latency audio. Just like PHP, the whole system is optimized around optimization.
It's up to the people who care to organize and assemble teams, etc., but only rarely does it go somewhere, I believe.
I still hope for a sane CPU, sane GPU, sane DSP, all very, very open, so that solid software can be written without too much reverse engineering and friction (I saw that tagged-pointer architectures are back in research on RISC-V, so who knows).
I remember feeling physically sad rewatching BeBox demos (with SMP). The low-latency UI while processing 3D and video, all of it dynamically... even now that's still not a given on any system. Except maybe Macs?
Agreed - my P90 could seamlessly transfer video from a camera, build Mozilla from source, and surf the web while never missing a frame of music playback. Even now that's not a given, although SSDs have made it a lot less common than it used to be with iTunes.
Speaking of SMP, it's also sad that whenever an x86 or ARM or whatever CPU with more than 4 cores is announced, what's the first thing tech media and most people say? "Who can make use of more than 4 cores anyway?" or "There aren't any programs that can take advantage of more than 2 cores."
I've never understood how people seem to ignore the fact that you can run more than one CPU-intensive application at the same time, or even have breathing room for light applications while most cores are busy with heavy processing.
I can easily saturate all the cores you throw at me with the usual tasks I'm busy with at a computer, so give me more cores, more memory. And I run highly parallel tasks like image or video processing or compiling code.
On the latest Linux kernel, copying from SSD to USB or USB to USB, it's easy to lock up the copying application or even the whole machine while data is flushed. It's less of an issue with the deadline or BFQ I/O schedulers, but it's still there. Granted, we copy around at least gigabytes these days, but still. When I was using a Mac, the first thing I always did was completely disable Spotlight (including all indexing).
This WWDC was the first time I saw or heard of him, then I went to look up more about what he did because he seemed such a genuinely nice, warm person.
APFS is slated for a 2017 release, yet development started as recently as 2014. By comparison, development on Btrfs started all the way back in 2007, yet many still consider it to be unsuitable for widespread deployment, particularly in mission-critical settings.
If Apple can actually pull off this turnover so quickly, does that suggest the complaints about Apple's declining software engineering quality were overblown?
Edit: Ted Ts'o in this talk (1) (at the 8 min mark) discusses the task force that birthed Ext4 and Btrfs and its estimate (based on Sun's experience with ZFS and Digital's with AdvFS) of 5-7 years at a minimum for a new file system to be enterprise ready--an estimate which definitely proved optimistic with regard to Btrfs. Will APFS be different?
While APFS' design mirrors that of current generation file systems such as ZFS and Btrfs, its scope is much reduced. It does not target large amounts of storage or disk arrays (such as RAIDZ and the normal RAID family) with automatic error correction, it is not required to perform competitively in server scenarios (especially for traditional databases, where COW file systems tend to perform poorly), and it can completely disregard rotating hard drive performance.
Snapshots are read-only, unlike Btrfs (but like Linux' LVM, which has been around for quite some time), and we won't even delve into online rebalancing and device addition.
APFS is a very welcome addition given that it brings Apple file systems into the 21st century, but the reason for its speedy implementation is that they purposefully constrained its goals — a good call, in my opinion.
If you have the same workloads as Apple's customers and large amounts of SSD storage, you'll find Btrfs and ZFS both fit your needs on Linux.
> an estimate which definitely proved optimistic with regard to Btrfs.
SUSE has had Btrfs support for enterprises since SLE11SP2[1,2] (2012). And it's been the default for the root partition since SLE12[3] (2014). So it wasn't overly optimistic at all; it was actually a very accurate estimate. The same support applies for openSUSE, but I think they had it for longer.
Do you have any good source to point people at when they say btrfs is unstable and corrupts data? (and I mean, either positive or negative data, I don't want to be biased either way) I'm kind of tired of people posting comments like that, which are based on "google is full of people saying this", or "I know one person who lost data" (but no idea which kernel was it - could well be from 2014).
If I look at btrfs patches on lkml, on the one hand I can see some fixes for data loss, but on the other they're usually close to "if you change the superblock while the log is zeroed and the new superblock is already committed and there's a power loss at exactly that point, you'll get corruption" - which are really obscure edge cases people are unlikely to ever hit.
So what can I look at to get a realistic picture of what's going on? (what would SUSE point me at)
(for the negative results, I know of the recent filesystem fuzzing presentation where btrfs comes out worst, but honestly I don't consider it interesting for real-world usage - car analogy: I'm interested in how the car behaves on a typical road, not which fizzy drinks added to the gas tank will break it)
As for "unstable and corrupts data", this is just not true. Far more users run it on stable hardware and don't have problems. I've used Btrfs single, raid0, raid10, and raid1 for years, and I have never had corruption that I didn't induce myself.
I did stumble upon parity corruption in raid5 just days ago, however. The raid56 code is much, much newer, hasn't been as well tested, and has been regarded as definitely not ready for prime time. So that's a bug, and it'll get fixed.
Bunches of problems happen on mdadm and LVM raid also due to drive SCT ERC being unset or unsupported, resulting in bad sector recovery times that are longer than the kernel will tolerate. That results in SATA link resets, rather than the fs being informed what sector is bad so it can recover from a copy and fix up the bad sector.
So there are bugs all over the place, it's just the way it goes and things are getting quite a bit better.
It is totally true that Apple can produce their own hardware that doesn't do things like lie to the fs about FUA or its equivalent of req_flush being complete when it's not, or otherwise violate the order of writes the fs is expecting in order to be crash tolerant. But we're kind of seeing Apple go back to the old Mac days where you bought only Apple memory and drives; third-party stuff just didn't happen then. The memory is now soldered on the board, and it looks like the next generation of NVMe and storage technologies may be the same.
Windows and Linux will by necessity have file systems that are more general purpose than Apple's.
>If Apple can actually pull off this turnover so quickly, does that suggest the complaints about Apple's declining software engineering quality were overblown?
The complaints are usually about minor parts of the OSX/iOS stack -- parts Apple might not even particularly prioritize.
A faulty filesystem, on the other hand, is something else entirely, and something they can't ship unless it's good.
I'm inclined to agree that 3 years to squeeze out a perfect file system is ambitious, to say the least. That said, consider the robustness of HFS+; APFS is blessed with desperate, captive users.
Besides being old, there's nothing much to the "robustness of HFS+". It's not like users are losing data left and right, as it's being painted. In fact it's an FS running just fine on about a billion devices...
Engineers who have worked on HFS+ believe that it actually is losing data left and right. I'm inclined to believe that they're right and that most people simply don't notice.
I used to work on the largest HFS+ installation in the world, and we saw data corruption all the time: mostly large media files with block-sized sequences of zeroed data. We were lucky in that we had non-HFS+ file systems backing us up, but deeply unlucky in that the nature of our media meant random bad blocks were much more likely to cause media-level problems than container-level ones, and thus were much harder to catch.
It doesn't have checksumming yet. I have a feeling it will be added before the final release since data integrity is one of the tenets they're pushing.
And the author's own conclusion, at the end of the post, was that it was due to bad hardware, not HFS+.
To quote:
>I understand the corruptions were caused by hardware issues. My complain is that the lack of checksums in HFS+ makes it a silent error when a corrupted file is accessed.
This is not an issue specific to HFS+. Most filesystems do not include checksums either. Sadly…
A bad filesystem would have corrupt metadata. Plain old corrupt data is the fault of the storage media, which does have its own error correction. Clearly it wasn't good enough, or the path from the HD back to the system can't report I/O errors.
BTW he didn't lose any data, since he had backups. If he had a checksummed filesystem, but not backups, he would still have lost data. Checksums, like RAID, aren't backups!
I know. And I am not happy about that. But I'm not sure if the issue really was bit rot or bugs in HFS+. I haven't had a lot of issues in years so I lean towards the latter.
>> does that suggest the complaints about Apple's declining software engineering quality were overblown
The type of developer you would have working on a filesystem is likely going to be from a different world than those who work on UI applications. Speaking about their apps, the quality problems Apple faces are usually design-based rather than functional. When working on a filesystem, you're not going to be forced into developing a crappy application by some designer who ruins the entire application.
What I found most interesting about the review is that Apple chose not to implement file data checksumming for the reason that the underlying hardware is very 'safe' and already employs ECC anyway.
Which is silly, and fails to isolate problems when/where they happen. Pretty much every significant level should have its own checksum, in many cases hopefully an ECC of some form. Hardware has bugs/failures as does software. What is particularly evil is when they both collide in a manner which causes silent/undetected corruption of important data for long periods of time.
That's not the only reason, though. There's other factors going into that decision that make it totally rational:
APFS isn't designed as a server file system. It's meant for laptops, desktops, and, most importantly (to Apple), mobile devices. Note that most of these devices are battery powered. That means "redundant" error checking by the FS is a meaningful waste.
That's not to say they might not add error checking capability in the future, but it makes total sense to prioritize other things when this file system is mostly going to be used on battery powered clients basically never on servers.
Actually, the reason for it is that lower layers already do checksumming, and generally at that layer you don't get scrambled packets. You only lose packets, which happens when there's congestion.
Alternately, just look at "netstat -s" for any machine on the Internet talking to a bunch of others. Here's the score for the main web host of the Internet Archive Wayback Machine:
    3088864840 segments received
    2401058 bad segments received.
One of the key innovations in ZFS is storing checksums in block pointers which is something that cannot be done efficiently outside the file system. Storing checksums elsewhere is far more complex and expensive.
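A toy illustration of the point (hypothetical structures, not ZFS's actual code): the checksum of a block is stored in the pointer its parent holds, so every read is validated against a value that was itself validated one level up, all the way to the überblock.

    # Toy model: the checksum lives in the *pointer*, not in the block itself,
    # so corruption below is caught whenever the tree is traversed.
    # (A byte-wise Fletcher-style sum for illustration -- not ZFS's fletcher4.)
    def fletcher_ish(data: bytes) -> int:
        a = b = 0
        for byte in data:
            a = (a + byte) % 0xFFFFFFFF
            b = (b + a) % 0xFFFFFFFF
        return (b << 32) | a

    class BlockPtr:
        def __init__(self, payload: bytes):
            self.payload = payload                  # stand-in for a disk address
            self.checksum = fletcher_ish(payload)   # stored alongside the pointer

        def read(self) -> bytes:
            if fletcher_ish(self.payload) != self.checksum:
                raise IOError("checksum mismatch: block is corrupt")
            return self.payload

    ptr = BlockPtr(b"some user data")
    ptr.payload = b"some user dbta"                 # simulate silent bit rot
    # ptr.read() now raises instead of silently returning bad data.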
It tells you that your file is corrupted. You can then restore from backups, re-download, or take some other corrective action, such as delete the file, reboot the machine, re-install the operating system, or play Quake 2 to test your RAM and graphics.
Never underestimate the value of a reason to play Quake 2.
The average user might have no redundancy, but they still ought to have a backup. Checksum failure tells them they need to restore.
At the very least, a checksum failure might tell them (or the tech they're consulting) that they have a data problem, rather than, say, an application compatibility problem.
"Why is my machine crashing?" "Well, somelib.so is reporting checksum failures" is a much better experience then "weird, this machine used to be great but now it crashes all the time"
Today you can verify backups on OS X with "tmutil verifychecksums", at least on 10.11. The UI to this could be improved, but user data checksums don't necessarily need to be a filesystem feature. On a single-disk device, the FS doesn't have enough information to do anything useful about corrupt files anyway.
> On a single-disk device, the FS doesn't have enough information to do anything useful about corrupt files anyway.
Some filesystems can be configured to keep two or more copies of certain filesystem/directory/etc. contents. Two copies is enough information to do something useful.
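ZFS exposes this as the copies property (so-called "ditto blocks"); here's a toy model of what two copies plus a checksum buy you (not any real filesystem's code):

    # With a checksum and a second copy, a bad read can be detected and the
    # damaged copy rewritten from the good one.
    import zlib

    def read_with_self_heal(copies, expected_crc):
        good = next((c for c in copies if zlib.crc32(c) == expected_crc), None)
        if good is None:
            raise IOError("all copies corrupt: restore from backup")
        for i, c in enumerate(copies):
            if zlib.crc32(c) != expected_crc:
                copies[i] = good                          # heal the bad replica
        return good

    copies = [b"family ph0to.jpg", b"family photo.jpg"]   # first copy bit-rotted
    data = read_with_self_heal(copies, zlib.crc32(b"family photo.jpg"))
    assert copies[0] == copies[1] == data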
Well, Apple is moving in the direction of syncing everything with iCloud - iCloud Drive has been around for a while, and Sierra adds the ability to sync the desktop and Documents folder; of course on top of long-existing things like photo sync. If the file was previously uploaded to iCloud, there is redundancy, and you definitely don't want to overwrite it with the corrupted version.
How big an issue this is in practice I don't know.
The author is apparently proud of the fact that they have “literally never seen or heard of [the OS X document revision system] until researching this post”. The dismissive tone of the entire article is hard to stomach. Ars has really gone downhill since the departure of John Siracusa.
The author (me) isn't dismissive; he's skeptical. Siracusa did a great job of upbeat, critical, accurate reviews. Conversely, the coverage of APFS has been inane. A video, a PDF, and some high level docs were shuffled and reworded; that's not a review. Tech journalists live in a tough world where the details are obscure and hard to evaluate with deadlines that are becoming ever shorter and a reward system that values views over depth. My skepticism was meant to counterbalance the fluffy, fawning tone of what I had seen.
I stood around with Giampaolo, Tamura, and other Apple folks. I had a significant leg up on journalists; I know the subject well and the Apple folks know and respect me based on technology I've worked on. (And they didn't kick me out when I revealed that I didn't actually have a conference pass). I knew (some of) the questions to ask, and they were remarkably open with their answers.
John Siracusa's departure came after Ars had already gone downhill significantly. Also, he was not an editor or a regular contributor outside of his OS X reviews, so his departure doesn't have much effect on the day-to-day. It's more of a symptom than a cause.
The cause is the site's editorial standards, which have been declining for years, so every year is a new all-time low. Not to mention the really invasive ads and even sponsored content.
I would trace the beginning of the decline of Ars Technica back to when Jon Stokes sold it to Conde Nast. But Siracusa's departure was another nail in the coffin.
People have been complaining about the editorial standards continuously since at least 2000, but that doesn't magically lend this complaint more weight than any other subjective assessment unsupported by evidence.
As for advertising, how do you expect them to hire writers and editors or run a high-traffic website without advertising? They've offered paid subscriptions for years and subscribers don't see ads at all but not enough people have taken them up on it.
Give me a break, I am not complaining about advertising, period. Ars has always had advertising, and it used to be fine. But over time the ads have gotten more and more intrusive. There are more and more auto-play ads with sound, ads that take over part of the screen, etc.
The lack of subscriptions is probably a symptom of the editorial quality going down. You can say there's no evidence of it, but when lots of people complain about it, that is evidence of it. I was a longtime Ars reader, and the b.s. finally got so thick that I stopped going to the site regularly in the last year.
It used to be a site for intelligent, balanced articles about tech. Now it's got shills like DrPizza who basically just reprint whatever Microsoft's PR department emails him.
One of the (many) things that struck me in the article was his dismissive remarks regarding copy on write within a filesystem. That seems like a fantastically useful feature for development, etc, where you are almost always building, then copying the artifacts into the deploy/test directory. Avoiding the disk IO in those situations seems like a pretty sound win to me in terms of giving a performance boost for free.
COW is great. It has applicability around on-disk consistency, and it makes snapshots much more coherent. It's also not without downsides such as fragmentation--though that's far, far less important on SSDs than HDDs. Apple's fast file copy mechanism also uses COW. I would suggest that it's an instance of listening to what your customers say rather than providing them what they need. They say file copies are too slow, but what they really want is a way to track versions, retain old ones, etc. It would be cool if Apple built on their existing approach with regard to file versioning.
Yeah, I know ZFS is CoW, I use snapshots all the time when building jails/containers precisely for that reason. I was just struck by a few paragraphs from the article that really stuck out:
> With APFS, if you copy a file within the same file system (or possibly the same container; more on this later), no data is actually duplicated. Instead, a constant amount of metadata is updated and the on-disk data is shared. Changes to either copy cause new space to be allocated (this is called "copy on write," or COW). btrfs also supports this and calls the feature "reflinks"—link by reference.
> I haven't seen this offered in other file systems (btrfs excepted), and it clearly makes for a good demo, but it got me wondering about the use case. Copying files between devices (e.g. to a USB stick for sharing) still takes time proportional to the amount of data copied of course. Why would I want to copy a file locally? The common case I could think of is the layman's version control: "thesis," "thesis-backup," "thesis-old," "thesis-saving because I'm making edits while drunk."
CoW is one of those features that is superficially questionable until you start noticing the bits and pieces of workflow it really makes faster and easier. Given the keynote was towards an audience of developers, I'm really surprised that there wasn't a demo showing how much faster deploys, etc are with such tech.
He is dismissive of file-level CoW. ZFS will also create new blocks if you cp a file; only dedup will remedy that. Also, APFS only seems to do file-level CoW via a special syscall.
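For reference, this is roughly what user-space reflink cloning looks like on Linux (a hedged sketch; on APFS the analogous call is, I believe, clonefile(2), or copyfile with COPYFILE_CLONE):

    # Clone a file without copying its data blocks (Linux FICLONE ioctl,
    # works on reflink-capable filesystems such as btrfs and XFS).
    import fcntl

    FICLONE = 0x40049409  # _IOW(0x94, 9, int), as defined in <linux/fs.h>

    def reflink_copy(src_path: str, dst_path: str) -> None:
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())

    # Only metadata is written; data blocks are shared until either file is
    # modified. Both paths must live on the same reflink-capable filesystem.
    reflink_copy("thesis.tex", "thesis-backup.tex")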
Note he was also not even aware of support for the same feature in btrfs until I pointed it out to him in an earlier HN discussion of his original blog post -- the version on Ars already pretends otherwise, without being marked as a fixup (though I am ready to blame this on Ars fully; his blog version includes an UPDATE tag and has better flow).
> APFS addresses this with I/O QoS (quality of service) to prioritize accesses that are immediately visible to the user over background activity that doesn't have the same time-constraints. This is inarguably a benefit to users and a sophisticated file system capability.
Could someone clear up how this can be determined at the filesystem level rather than the scheduler level (I suspect it cannot be, or the article is making bogus claims)?
We can speculate on the management decision, but from an engineering point of view Dominic Giampaolo said that he didn't dive into btrfs, ZFS, or HAMMER in order to avoid potentially bad influence. At least that's how I read his answer as cited by Adam. It's interesting that Larry Ellison (Oracle's CEO) being a good friend of the late Steve Jobs didn't result in finding an acceptable ZFS licensing agreement for Apple. I mean, they've incorporated DTrace (into the kernel of all things), so why not ZFS as well?
Another possible interpretation of "to avoid becoming tainted" is that it refers to avoiding "residual knowledge" that in some circumstances can be used to make a copyright claim. E.g. if someone reads the ZFS source code to see how they solved a particular problem and then goes and solves it the same way that residual knowledge might cause legal problems.
That being said IANAL and have no idea whether residual knowledge has been explored w.r.t. the GPL or CDDL.
> from an engineering point of view Dominic Giampaolo said that he didn't dive into btrfs, ZFS, or HAMMER in order to avoid potentially bad influence.
I'm not sure I quite understand this approach. It sounds like a pure NIH kind of method (which is admittedly common for Apple). I.e., of course one can always try to reinvent the wheel, but why is it bad to first analyze what already exists and evaluate its good and bad sides? Or is his approach simply always to build everything from scratch and not look at anything else?
>why is it bad to first analyze what already exists and evaluate its good and bad sides?
It's a legal defense strategy.
ZFS is covered by multiple patents.
If someone who has never read about any of ZFS's designs and patents independently reproduces one of ZFS's patented features, then the courts could rule that it was non-infringement.
That's why the author said of Giampaolo: "...but didn't delve too deeply for fear, he said, of tainting himself". Reading too much about a patented product effectively taints yourself from being able to freely create your own designs.
Independent reinvention is not in general a defense to a claim of patent infringement. Is there some fairly specific set of circumstances you're referring to here?
I understand what you're saying, but I also find it funny that Apple would try to avoid patents. If you're saying that Apple is happy to patent stuff but not so much when it comes to using patents by others, maybe you're right. I wish filesystems were more like media codecs in that Apple had no choice but to support an industry standard.
I don't believe Dominic and the team were in NIH-mode. There are most likely other factors which due to Apple's secrecy, and this not being a public project like Linux or Firefox, we may or may not learn. Though, with the information we have right now, it's easy to come to your conclusion.
How so? If anything, ignoring ZFS's cache subsystem, a filesystem is just another module in a pluggable system of many filesystems. DTrace, on the other hand, requires adding probes all over the kernel to be most effective.
You're forgetting the weight of data and user practice for a mainstream project. Consider how those two scenarios would unfold:
Apple makes a mistake in 10.11.6 and ships a kernel without DTrace. Everything runs but a few nerds notice and file bug reports. 10.11.7 quietly ships and all is right with the world.
In contrast, shipping a new filesystem is enormously invasive: billions of hours are spent globally rewriting EVERY storage pool in existence. Some percentage of those will fail due to hardware problems or corruption which happened years ago (or even somewhere else) but was previously unnoticed, flooding support and the news with dire predictions. Every crash or performance issue noticed for the next year will probably be vocally blamed on the new filesystem, even if there are clear signs pointing elsewhere.
You don't want to deal with that any more than you have to.
Apple will _not_ remove the existing filesystems, and neither will they automatically migrate an existing HFS volume to APFS without first asking, especially not during an OS update. There are many filesystems in the kernel, and compared to DTrace probes, they are very minimally invasive. Adding DTrace probes means touching all kinds of places in the kernel, so it's definitely more invasive.
Again, look at it from the perspective of a user or system administrator rather than a kernel developer. Yes, someone has to remove a bunch of easily searched for and removed code and rerun their test suite but that's a handful of people for a very small amount of time compared to the number of people involved with something which will make a major change on hundreds of millions of running systems. More critically, the development work is completely under Apple's control and shipping can be delayed as long as it takes to complete testing but once they ship a filesystem it will very quickly reach the point where it will be in the wild for years even if they immediately ship a new release which takes a different direction.
The question isn't whether Apple will force this on users soon or without notice but rather the observation that they're going to be very careful not to do this more often than actually necessary. That's why despite having implemented and shipped ZFS in the 10.5 beta series they removed it prior to release. Sure, it might have gone okay but if that changed later there would be no easy way to go back without forcing users to migrate existing data and if that hit a licensing/patent case, the other side's lawyers enjoy the extra leverage which that would give them.
As an aside, “There are many filesystems in the kernel” is technically true but misleading: HFS+/HFSX is used constantly on every device, FAT/ExFAT are used regularly by many users, and everything else is a rounding error. A few Mac users use NFS or SMB, even fewer use UFS, etc. and no iOS device uses any of those.
I agree with most of that, but I don't understand why you think DTrace is less invasive than another filesystem. They could have shipped ZFS all along and only now declared it stable, if they wanted to. They also could have integrated it again, now that it's actually more portable than before due to OpenZFS. I don't use Macs, so I don't have a direct interest in ZFSonMac, but using a filesystem which can be accessed from Linux, illumos and FreeBSD has some practical aspects which are definitely missing with APFS. Someone will eventually, unless Apple prevents it with IP and Patents, implement APFS for BSD or Linux, but just as NTFS-3G, it will never be fully stable and reliable as a Mac version of OpenZFS could have been. Apple's own APFS implementation will be stable of course, and for other versions to be good as well, they would have to open source the critical bits.
I think the point is that if Apple had to remove DTrace for legal reasons, the impact to Apple's business and customers would be minimal. If a hypothetical AppleZFS shipped on a billion devices and then Apple lost a patent lawsuit, Apple would be totally screwed. A filesystem can't just be pulled out of the OS without shipping a new one and converting every device. Apple could potentially be in a position where they couldn't sell any new devices at all, and ZFS support in other OSes wouldn't help at all.
What about BTRFS? It's more lightweight. Or is it very Linux-specific?
And I'm not sure you need snapshots and many of those industrial features on a watch anyway. Sometimes simplicity is actually a plus. So who said they need "one size fits all" to begin with? It's never a good approach.
I don't think BTRFS has all the features they want. One thing you can do with APFS is do an upgrade of an HFS+ partition to APFS. That might be difficult with BTRFS.
You don't necessarily need snapshots on a watch, but I can see it coming in useful on a phone when you're editing pictures and video.
Snapshots on a watch could be useful for system recovery. Say something went wrong during an update; you could potentially use the snapshot to revert to a previous version.
Filesystem snapshots are useful for recovery, but they're a very powerful mechanism for it. I.e., I think for a watch it's overkill.
On the other hand, computers are gradually moving towards miniaturization, so I suppose all this really is quite transitory, and soon enough all such considerations will simply be irrelevant.
BTRFS can upgrade from ext4 by slotting itself into free space and marking the old metadata as a special snapshot. The same thing should work on other filesystems with few new challenges.
Support for that has been dropped within Btrfs AFAIK. Also, "upgrading a filesystem in place" is more of a lottery than a feature that people should actually use. So many bad things can happen.
Well I did it a month ago, on a reasonably-updated system, at least.
The main point is that upgrading "in-place" does not require any of the metadata to be in the same place between old and new filesystem. With a mildly flexible destination filesystem, the old filesystem can stay there until you're completely sure the conversion is a success. You end up with two read-only filesystems sharing a partition, and you choose which one to go forward with. Cancelling at any point is trivial, even if the conversion process crashes.
No they don't. They use and license various pieces of software that are under BSD and other similar licenses. For instance, they released Swift at the end of 2015. It's on GitHub and has an open development process. Plus, they use GPLv2, just not GPLv3.
I was specifically referring to their kernel when talking about "exclusively proprietary software" (which is what matters in discussions about porting btrfs). Sure, they've liberated some software but their entire stack is essentially proprietary.
I was under the impression that the released sources for the kernel are not complete (they are missing critical features), but I'm not sure whether or not this is the case with the repos you linked. To be clear, you still need proprietary software to compile it (so it's not practically free under the FSF definition).
"Or it's very Linux specific" Bingo. One of their design goals was to re-use as much existing Linux code as possible. It seems like if they succeeded it would be very hard to port it to another OS.
Because Apple systems are non-expandable, I usually store my large data on external drives (USB, NAS, etc). Any idea if APFS will work on them, either direct connect or iSCSI?
USB devices can definitely be formatted with APFS, since they did that on-stage during the demo.
I'm less sure about iSCSI targets, but iSCSI is a block-storage protocol. If you can format an iSCSI volume with HFS+ today, you'll probably be able to format it with APFS tomorrow.
A file system is usually independent of the storage media, with some exceptions
(and I would kind of avoid formatting external drives in "funny" FSs as I might want to read them in some other OSs. Unfortunately this usually means FAT32)
But I do have external media formatted as HFS+ to work with Time Machine
For what it's worth, there are NTFS drivers for Linux and OS X (ntfs-3g). NTFS often gets a mixed reception, but regardless of your opinion of it, it's still massively better than FAT32.
Another option is ext3. The only caveat there is that the Windows ext drivers don't fully support ext3 (unless I've missed an announcement). However, they do fully support ext2, and ext3 is backwards compatible, so you can effectively get ext3 support in Windows.
Sadly though, all the good stuff requires 3rd party libraries. It's a real pity everyone can't agree on a standard to replace FAT32. :(
> It's a real pity everyone can't agree on a standard to replace FAT32. :(
Every OS in the universe supports UDF now, and it's the only really ubiquitous non-proprietary filesystem. I use it on all external storage that I cannot guarantee will be touching Linux machines exclusively.
I've been using NTFS on USB sticks and external drives for over 5 years of mixed Linux/Windows use. No problems so far. That of course doesn't mean it's problem-free or 'rock solid', but in normal daily use I haven't encountered any issues. Why? Because of metadata journaling. exFAT works as well, but it doesn't journal, so it's less reliable.
UDF works well for thumb drives that are large enough that you might want to copy a file to the thumb drive that exceeds the max file size of a FAT filesystem.
exFAT was supposed to be the new standard. But Microsoft went a little crazy with patents and license restrictions. As a result it's about as popular as JPEG2000.
It's actually a shame. A better file system for removable devices was a huge opportunity for an open standard.
Ideally they could have released a filesystem that was backwards (read at least) compatible. Large files could show up as multipart files when viewed as FAT32.
Is NTFS support good enough for writing? I haven't kept up. At one time you could read NTFS on Linux well enough, but writing was not supported (or was "experimental") I think mostly because the permissions attributes were quite different from the Unix approach? And also because it was reverse-engineered and not officially endorsed by MS?
Did Microsoft ever open-source the NTFS specs and drivers?
There were a few NTFS drivers for Linux. The native ones didn't support writing, but the ntfs-3g drivers I recommended do. ntfs-3g runs on FUSE, so it's not a native kernel driver, but honestly the slowdown from running in FUSE is hardly noticeable and they've been stable for at least 10 years. So you shouldn't have any issues running ntfs-3g.
I don't know the answers to your other questions. Sorry.
Shouldn't have any issues, but don't treat it as rock-solid either. About 5 years ago ntfs-3g made a pretty big folder of mine simply disappear. To the point that when I tried a dozen recovery tools, only two of them could even see the files.
To be fair, even the most battle tested of file systems can run into occasional glitches like this when running on consumer hardware. So on any sufficiently large forum, you'll always find a few members that have experienced data loss at some point on an otherwise agreed to be stable file system driver. But for what it's worth, I've been using ntfs-3g for about 15 years and never had a single issue with it. So as much as I do sympathise with your pain, it's definitely more an edge case than the norm.
Yes, writing was the problem, not sure how good it is today
Since my main usage for external media was big files (you know, the ones with extension mov, mpg, avi, etc) using FAT32 was not a big issue (unless it was bigger than 4GB of course)
> I know there was a way to write to ntfs from Linux, but it required to install ntfs driver file from Windows.
That isn't true. You need to use ntfs-3g, which is a free software implementation of NTFS (that allows both reading and writing). It's been stable for 10 years. Using NTFS doesn't require anything from Windows and doesn't require proprietary software.
I just downloaded the source for ntfs-3g[1], and it doesn't appear to have any binary blobs in it. In addition, it's under the GPLv2 so integrating proprietary components is unlikely to be legal. The answer you linked quite clearly says that the company offers a proprietary version of ntfs-3g. The answer does not say that ntfs-3g is proprietary.
Also, Trisquel (an FSF-approved GNU/Linux distribution, meaning that it doesn't have any proprietary software within 100km of the distro) has packages for ntfs-3g[2]. So it's _definitely_ entirely free software.
So again, you're wrong on this point. In addition, I strongly believe that you were never correct on this point. Maybe you confused ntfs-3g with the proprietary version that company sells?
Apple contrasts [space sharing] with the static allocation of disk space to support multiple HFS+ instances, which seems both specious and an uncommon use case.
Really? I depend on this use case every year to safely test drive pre-release OS versions.
Maybe I'm misreading the author, but I take "static allocation of disk space to support multiple HFS+ instances" as a fancy way to refer to multiple partitions on the same physical volume.
(So, Yosemite on one, and a pre-release of El Capitan on the other.)
Space sharing just promises to allow the same thing but without wasting space.
This seems too obvious and important a use case to call "specious and uncommon".
Correct—I reached out to Adam to ask if Ars could syndicate the piece, and then I did some minor cleanup and clarification editing on it (mostly style conformance, but also some minor grammar tweaks and a few sentence re-writes). After checking with him to make sure my changes didn't change anything substantive, we ran the piece this AM.
Link wherever you'd like, of course, but the more traffic this pulls in, the more ammo I have to be able to get Adam contributing to Ars as a regular freelancer!
(edit - hi, adam!)
(edit^2 - corrections corrected. Apologies for the errors. I am just a simple caveman. Your mathematics confuse and annoy me!)
Some of these are probably a little too deep to get much traction on the Ars front page, but there are some solid ideas here (especially the oft-cited problems one). Thanks for the feedback!
We did run a big piece by Jim Salter a couple of years ago on next-gen file systems that focused on ZFS and btrfs (http://arstechnica.com/information-technology/2014/01/bitrot...), but yeah, I'd love to have more filesystem-level stuff showing up. The response is generally very, very strong—turns out people really like reading about file systems when the authors know what they're talking about!
edit -
> I should say that I'd only support an article like that if Ars allows parts of the written text to be incorporated into OpenZFS wiki/documenation.
That's more complicated, unfortunately. I am not a lawyer etc etc and I am only speaking generally here, but Ars and CN own the copyright on the pieces we run (though syndications like Adam's piece today are different), and wholesale reuse of the text without remuneration isn't something that the CN rights management people like. Fair use is obviously fine, so quoting portions of pieces as sources in documentation is not a problem, but re-using most or all of something isn't (necessarily or usually) fair use.
(again, not a lawyer, my words aren't gospel, don't take my word for it, etc etc)
I'm also not a lawyer, but my thought process is like this: in the open source spirit, given this is not a book to be profited from, and profiting from technical books is very hard anyway, developers of some software could contribute technical content and, instead of compensation, receive only editor time plus permission to include the text in the project's documentation. Real World Haskell and Real World OCaml somehow managed to convince the publisher this is fine. Again, IANAL, just thinking out loud.
I used to think that (1024 vs 1000), but not any more. There's ancient precedent for the other interpretation.
For example, when I was a kid growing up in NY, one of the local radio stations I listened to was WPLJ, 95.5 MHz. That's 95,500,000 cycles per second, not 100,139,008.
Go back nearly 100 years, and the Chicago area got a radio station called WLS[1], one of the original clear-channel stations. It broadcasts at 870 kHz. That's 870,000 cycles per second, not 890,880.
Much as computer people would like "kilo", "mega", "giga" etc to mean 1024^(whatever), there's a lot of precedent for doing things the old fashioned way!
As Wikipedia explains:
tera-, from Greek word "terastios"="huge, enormous", a prefix in the SI system of units denoting 10^12, or 1 000 000 000 000
SI is a well accepted standard. Just because it's more logical for chip designers to implement memory chips using powers of 1024 isn't a good enough reason to ignore SI.
I would hardly call it old fashioned. SI prefixes were just misapplied to storage sizes, hence the more correct kibibyte (KiB) [0], mebibyte (MiB), etc. Hence also Apple's somewhat recent switch to using 1 kB = 10^3 bytes.
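The arithmetic behind the examples upthread, just to make the two conventions concrete:

    # Decimal (SI) vs binary (IEC) interpretations of the same prefix.
    MHz, binary_mega = 10**6, 2**20
    print(95.5 * MHz)                    # 95500000.0  -- what 95.5 MHz means
    print(95.5 * binary_mega)            # 100139008.0 -- if "mega" were 1024*1024
    print(870 * 10**3, 870 * 2**10)      # 870000 890880

    # Storage marketing vs what an OS using binary prefixes would report:
    print(10**12)                        # 1 TB  = 1,000,000,000,000 bytes
    print(2**40)                         # 1 TiB = 1,099,511,627,776 bytes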
This sentence has a plural snapshots where singular makes more sense:
APFS brings a much-desired file system feature: snapshots. A snapshots lets you freeze the state of a file system at a particular moment and continue to use and modify that file system while preserving the old data.