Early Linux filesystem reliability

Florin_Andrei · on May 30, 2017

I've actually used Ext2 in production, back in the '90s. That's all we had back then. A bit later I also played a lot with XFS, both on Linux and Irix (I worked at SGI for a while in the '00s).

Back in ye olden days, yanking the power cord basically guaranteed a corrupt Ext2 FS and a visit from fsck. However, in the vast majority of cases, fsck would actually do the job. You had to peruse lost+found and recover the lost souls out of it, but that was usually the extent of it. I did 'fsck -y' many, many times - in most cases with good results.

XFS on PC was worse. It tended to be stupendously fast when dealing with lots of I/O to/from very large files (video editing, running VMs), but a power failure on PC was a lot worse than with Ext2. There was a good chance you would lose the whole volume.

On Irix / SGI hardware it was a very different story. Those suckers were quite reliable. Heavy as hell, too.

---

One habit from back then that's very hard to shake off is to run sync before reboot, often as "sync; reboot" - you know, just in case something gets stuck on the way down and you have to hit the reset button. A more extreme example would be to manually stop all services except sshd, then do "sync; reboot".

It's completely unjustified today, and yet I still do "sync; reboot" occasionally. It's baked into the muscle memory of my fingers after sleepless nights caused by losing stuff due to a flaky driver that froze the system on reboot.

throwaway76543 · on May 30, 2017

If you're concerned about flushing writes consider `mount -oremount,ro` -- this guarantees writes are flushed and is the only truly important step of a system shutdown anyway WRT filesystem integrity. Once filesystems are mounted read-only you can safely power off the machine.
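A small sketch of how one might confirm that the remount actually took effect before pulling the plug, assuming the Linux `/proc/mounts` format (device, mount point, type, comma-separated options, ...). The function name and the set of pseudo filesystems to ignore are illustrative choices, not part of any standard tool.

```python
# Pseudo filesystems that stay rw but hold no persistent data (illustrative list).
IGNORED_FSTYPES = {"proc", "sysfs", "tmpfs", "devtmpfs", "devpts", "cgroup", "cgroup2"}


def writable_mounts(mounts_text, ignored=IGNORED_FSTYPES):
    """Return (mountpoint, fstype) for every mount whose options include 'rw'.

    `mounts_text` is the contents of /proc/mounts: one mount per line,
    whitespace-separated fields, with spaces in paths escaped as octal \\040.
    """
    still_rw = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype in ignored:
            continue  # not backed by a disk, nothing to protect
        if "rw" in options.split(","):
            still_rw.append((mountpoint, fstype))
    return still_rw
```

Fed the contents of `/proc/mounts`, an empty result means every disk-backed filesystem reports itself as read-only.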

Filesystems can be remounted read-only via the serial console as well, with a break-u. Useful even when userland is otherwise inaccessible such as in the event of a fork bomb.

mschuster91 · on May 30, 2017

Nice to learn this - but how does one send a "break-u" e.g. from inside minicom, or in KVM virtual machines, via virsh console?

throwaway76543 · on May 30, 2017

I usually use `cu`, but the docs for minicom tell me it's control-A followed by F to send a break signal. Then just hit "u." There are a number of single characters that trigger actions when sent after a break; the full doc on this kernel feature is here: https://www.kernel.org/doc/Documentation/admin-guide/sysrq.r...

I'm not terribly familiar with KVM but the VM console tools probably have some way to generate a break signal. Check the docs for how to "send a break."

Oh, and don't forget to enable this feature via sysctl -- per the above linux doc.

jk2323 · on May 31, 2017

"Once filesystems are mounted read-only you can safely power off the machine."

In Ubuntu/Kubuntu there was once a key combination to forcefully unmount all file systems. They seem to have discontinued this? No?

kbaker · on May 31, 2017

You are looking for the Magic SysRq Key: https://en.wikipedia.org/wiki/Magic_SysRq_key

- Alt+SysRq+u remounts all filesystems readonly.

Looks like some of the Magic SysRq keys are disabled by default now in newer Ubuntu releases.

vanni · on May 31, 2017

https://askubuntu.com/questions/11002/alt-sysrq-reisub-doesn...
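The REISUB sequence from that question can also be driven from a root shell by writing letters to `/proc/sysrq-trigger`; per the kernel's sysrq documentation, writes there are allowed for root regardless of the `kernel.sysrq` keyboard setting. A minimal sketch, assuming Linux and root; the two-second pause between steps is an arbitrary choice, and the function defaults to a dry run because the real thing kills every process and reboots without a clean shutdown.

```python
import time

# The classic REISUB order; letters and meanings follow the kernel's sysrq docs.
REISUB = [
    ("r", "take the keyboard out of raw mode"),
    ("e", "send SIGTERM to all processes except init"),
    ("i", "send SIGKILL to all processes except init"),
    ("s", "emergency sync of all mounted filesystems"),
    ("u", "remount all mounted filesystems read-only"),
    ("b", "reboot immediately, without syncing or unmounting"),
]


def reisub(trigger_path="/proc/sysrq-trigger", delay=2.0, dry_run=True):
    """Send each REISUB letter to the SysRq trigger file, pausing between steps.

    With dry_run=True (the default) nothing is written and the function just
    returns the letters it would send. With dry_run=False it needs root and
    will kill every process and reboot the machine.
    """
    sent = []
    for letter, _meaning in REISUB:
        if not dry_run:
            with open(trigger_path, "w") as trigger:
                trigger.write(letter)
            time.sleep(delay)  # give sync/remount time to finish
        sent.append(letter)
    return sent
```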

Piskvorrr · on May 31, 2017

Yup. You need to set kernel.sysrq=1, probably in /etc/sysctl.conf
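`kernel.sysrq=1` enables every SysRq function; 0 disables them all, and any other value is a bitmask that enables only some of them. That is how a distribution can leave sync, remount and reboot working while disabling the rest. The bit meanings below are taken from the kernel's sysrq documentation; the decoder itself is just a sketch. The current value can be read from `/proc/sys/kernel/sysrq`. For example, 176 (which, as far as I know, is what recent Ubuntu releases ship) decodes to sync, remount read-only and reboot/poweroff only, which would explain why some keys seem "disabled".

```python
# Bit values for kernel.sysrq, per the kernel's admin-guide/sysrq documentation.
SYSRQ_BITS = {
    2: "console log level",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync",
    32: "remount read-only",
    64: "signal processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nice all RT tasks",
}


def decode_sysrq(value):
    """List the SysRq function groups enabled by a kernel.sysrq value."""
    if value == 0:
        return []  # SysRq disabled entirely
    if value == 1:
        return list(SYSRQ_BITS.values())  # 1 is a special case: everything on
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]
```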

marcosdumay · on May 30, 2017

And, of course, in the early '00s those habits got ingrained into init scripts. Every distro would automatically run fsck after a forced shutdown, and sync after the heaviest daemons were stopped.

Somehow, I remember filesystems being more reliable by then, with the time needed for fsck being the biggest issue. Then we had a Red Hat person demonstrate a forced reboot on a computer running ReiserFS at my university, and I think within the next month no computing student was suing Ext2 anymore.

Florin_Andrei · on May 30, 2017

> the time needed for the fsck being the biggest issue

Right, that was always a big problem.

jk2323 · on May 31, 2017

You mean "using"? I always was happy with ReiserFS. What a pity.

marcosdumay · on May 31, 2017

In the end EXT3 became much better anyway. It just took some time.

mediaserf · on May 30, 2017

I still do: sync; sync; shutdown now -r

Before you think I'm too old-fashioned, it's at least the shutdown command and not "init 6"

fnj · on May 30, 2017

I'm old enough to still retain the triple-sync habit: sync; sync; sync; reboot. A habit picked up on SVR3.

davidw · on May 30, 2017

I worked with a guy who did that; I wonder what the original thing was that required a triple sync? It was never needed on Linux as far as I know.

Maakuth · on May 31, 2017

To my understanding, the idea was to actually type 'sync' thrice, pressing return in between. The time it took to type it in again supposedly allowed previous sync to finish.

There's more about it in this old usenet thread: https://groups.google.com/forum/#!topic/alt.folklore.compute...

mmjaa · on May 31, 2017

I'm one of those old-timers who still has 'sync;sync;sync' in muscle memory, and I also remember why we did it way back in the day. In my case, I learned it during days of development on MIPS Risc/OS, and what three rapid syncs in succession did was tell the TAPE driver to rewind the tape.

Mass storage back then was often done with tape decks, which were a lot cheaper than hard disks, and the triple-sync trick was a common firmware trick that tape-drive vendors used to get a cheap 'rewind command' out there for their users, without requiring platform-specific binaries for the job.

Maakuth · on June 1, 2017

Ah, that's interesting to know. Was the development system also RISC/OS, or did some UNIX variant do this as well? I thought in UNIX, tape was always used with utilities such as tar, not mounted as a part of filesystem. That should of course be possible, for the patient at least :).

mmjaa · on June 1, 2017

It was definitely possible, and some of us remember the joys of spooling data from tape before we could use it. :)

And if I recall, this was a built-in of the tape-drive itself, which could be used on multiple different systems, and so the triple-sync being used to rewind wasn't Risc/OS specific; I remember using the same tape-deck later on Linux and Irix machines, same ol' sync command worked every time.

I can still hear the whirr in my mind ..

msisk6 · on May 30, 2017

Same here. I actually just did this today doing some work on Linux VMs where it's totally not needed. Old habits die hard.

ianai · on May 31, 2017

In some ways, you're protecting yourself for that one time you find yourself on (very) old hardware that still needs it. Some may laugh, but I'm sure there are more than a few legacy installs out there, just waiting to die because somebody forgot an old trick.

avar · on May 30, 2017

Always relevant in this context, a parable from Theodore Ts'o about XFS & FS reliability: http://zork.net/~nick/mail/why-reiserfs-is-teh-sukc

Florin_Andrei · on May 30, 2017

ReiserFS was so fast when you had LOTS of tiny files, like with a caching proxy.

Eventually the Ext family caught up with it.

rconti · on June 1, 2017

Yeah, I ran ext2 in production a ton. I actually don't remember ever losing anything and subsequently finding it in lost+found; in fact, almost every single one of my fsck or fsck -y runs was successful.... EXCEPT

One time on my personal machine, someone talk-bombed me, and I somehow had to not only recover the superblock from about the 8th backup location (yeah, the first 7 or so failed), but I also had to manually recover some files based on the inode data. But it worked!
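For anyone facing the same superblock hunt: the backup locations follow a fixed pattern, so they can be computed rather than guessed. This is a sketch assuming the `sparse_super` feature (backups only in group 1 and in groups that are powers of 3, 5 and 7; filesystems without it keep a copy in every group), with `blocks_per_group` = 8 × block size and the first data block at 1 for 1 KiB blocks, 0 otherwise. The resulting numbers can be passed to `e2fsck -b`, and `mke2fs -n` should print the same list without writing anything, provided it is given the parameters the filesystem was created with.

```python
def is_power_of(n, base):
    """True if n == base**k for some integer k >= 1."""
    if n < base:
        return False
    while n % base == 0:
        n //= base
    return n == 1


def backup_superblocks(group_count, blocks_per_group=8192, first_data_block=1):
    """Block numbers of ext2 backup superblocks, assuming sparse_super.

    Defaults match a 1 KiB block size (8192 blocks per group, data starts
    at block 1); for 4 KiB blocks use blocks_per_group=32768, first_data_block=0.
    """
    locations = []
    for group in range(1, group_count):  # group 0 holds the primary copy
        if group == 1 or any(is_power_of(group, b) for b in (3, 5, 7)):
            locations.append(first_data_block + group * blocks_per_group)
    return locations
```

With the defaults, the first eight backups are 8193, 24577, 40961, 57345, 73729, 204801, 221185 and 401409, so "about the 8th backup location" would be somewhere around block 401409 on a 1 KiB filesystem.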

dheera · on May 30, 2017

I always had the same issues with ext2. Dozens and dozens of errors, fsck -y, kernel panics, and then giving up. It wasn't until ext4 that I started to see some serious reliability.

Especially given how often I have to hard power-off machines because they just freeze, or because shutdown -h now doesn't work, or because the keyboard input doesn't work ...

bogomipz · on May 31, 2017

>" It tended to be stupendously fast when dealing with lots of I/O to/from very large files"

Can you or someone say what it was in the design that allowed XFS to have that performance with large files? Was this mostly extent-based allocation or something else?
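Extent-based allocation, which the question mentions, is easy to picture: instead of recording one pointer per block, the filesystem records (start, length) runs. A toy sketch, not XFS's actual on-disk format, showing how a large, contiguously allocated file collapses from thousands of block pointers to a single extent, while a fragmented one does not.

```python
def to_extents(blocks):
    """Collapse a file's physical block numbers (in logical order) into
    (start, length) extents; each contiguous run becomes a single entry."""
    extents = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # extend the current run
        else:
            extents.append((block, 1))  # gap or first block: start a new run
    return extents
```

A 4096-block file laid out contiguously needs one extent here versus 4096 entries in a per-block map, which is one reason large sequential I/O is cheap to describe and track.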

Dimi9909 · on May 30, 2017

Not uncommon in the embedded world. Lots of products are still using ext2. "sync" is definitely required before rebooting an ext2 fs. We all know :D

tetha · on May 30, 2017

Hell, the attitude in the article evokes strong memories of my last workplace.

Almost all development teams tried so hard to build smart software that did the right thing, recovered from everything, and fixed so many things. They spent insane amounts of work trying to be smart in error situations. And all of that was useless or harmful: it either didn't work, or it made things worse.

My team operated under simple principles. Crash early, crash hard, log well, trust the operator. But you know, Ops at that place loved our applications. It took time to get them going, but it was easy to understand what to do whenever they crashed. It was easy to understand why it stopped. It was easy to understand when to file a bug.
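The "crash early, log well" approach can be sketched in a few lines: validate inputs up front, log exactly what was wrong, and stop, instead of guessing a default and limping on. The `load_port` function and `ConfigError` type here are hypothetical, purely to illustrate the pattern.

```python
import logging

log = logging.getLogger("service")


class ConfigError(Exception):
    """The configuration can't be trusted; the process should stop, not guess."""


def load_port(config):
    """Return a valid TCP port from `config`, or log the reason and raise."""
    raw = config.get("port")
    if raw is None:
        log.error("config has no 'port'; refusing to start")
        raise ConfigError("missing 'port'")
    try:
        port = int(raw)
    except (TypeError, ValueError):
        log.error("config 'port' is not an integer: %r", raw)
        raise ConfigError(f"bad port: {raw!r}") from None
    if not 0 < port < 65536:
        log.error("config 'port' out of range: %d", port)
        raise ConfigError(f"port out of range: {port}")
    return port
```

A caller would typically let `ConfigError` propagate and exit non-zero, so the operator sees one clear log line instead of a service quietly running on a guessed port.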

kpil · on May 30, 2017

Getting good at error handling, queue sizing and writing relevant logs requires some repeated exposure to real world pear-shaped events.

Failing early and visibly is always good, especially before a program has been taken into production. When things have stabilized for a while, recoverable errors can be ignored and just logged.

It's also important to take the opportunity to improve the error handling and logging first, before fixing the actual problem. Errors are hard to fake "right" so getting a good reproducible error is an opportunity.

kokada · on May 30, 2017

Quite a coincidence. I recently went back from XFS to EXT4 in my work laptop using Arch Linux, and before that I went from BTRFS to XFS.

I went from BTRFS to XFS because I hit a really bad performance corner case in BTRFS. My work consists mostly of backend development using the Ruby web stack and PostgreSQL/Redis, and my laptop would often freeze completely and only come back after a dozen seconds. This being on an SSD was unacceptable*. So I decided to go back to XFS, and for some time everything went well: no more random freezes.

This weekend I updated my system by running pacman, as usual. Chrome seemed to consume all the memory and the system froze during the update. OK, not bad, I thought; I rebooted and tried to run pacman again, hoping that only a few files were corrupted. However, so many files were corrupted that I needed to reinstall all packages on the system. Another freeze happened during that update, and my system was essentially dead after the reboot. I tried to recover using chroot, but multiple files were broken beyond repair.

So I decided to go back to EXT4, and reading this article does make me more confident that this shouldn't happen again.

kokada · on May 30, 2017

BTW, I know that database workloads are not optimized in CoW filesystems like BTRFS, and I could disable CoW for PostgreSQL/Redis, however this is just a development machine with maybe a hundred database transactions per second, so this shouldn't be a problem.

kev009 · on May 31, 2017

Hard not to come off smug but as a ZFS user it is always jarring to read anecdotes like this in {{ current_year }}.

kokada · on May 31, 2017

I am quite curious to test ZFS, however Arch Linux does not maintain official ZFS packages in its repositories, so I would need to maintain my own kernel updates just to use ZFS. Not interesting to me, especially considering that this is my work laptop.

When (if?) Arch supports ZFS in main repos I will probably test it. That is, unless bcachefs comes first.

codys · on June 1, 2017

> maintain my own kernel updates

Not really: zfs is available in aur as a dkms package. Pacman automatically rebuilds dkms modules when it updates the kernel. All you'd need to do is modify mkinitcpio.conf to include "zfs" in the HOOKS at the appropriate stage.

So: more complicated than just using ext4, but not "maintain my own kernel" level of difficulty.

szatkus · on May 31, 2017

How did you find out that it was filesystem problem?

wazoox · on May 31, 2017

XFS is for server workloads. It definitely doesn't play well with laptops.

darksaints · on May 30, 2017

So last I checked, btrfs was the way of the future according to Ted, but every time I see it discussed, it's Here Be Dragons galore. Is there some timeframe where btrfs will take over? Or at least be stable enough for a Debian or Red Hat to switch to it as a default?

Mister_Snuggles · on May 30, 2017

SuSE/OpenSuSE has been using BTRFS by default for a while and it seems to work well enough. There's a default schedule of running 'btrfs balance' every week, seemingly based on when the OS was installed (it relies on the timestamp of a file that gets updated with every run), that makes the system(1) virtually unusable for about 15 minutes.

(1) I've only seen this on one machine, so maybe it's a quirk of that machine's workload. But it sure does suck when it's in the middle of a workday.

codys · on May 30, 2017

On btrfs causing latency: I've got a few systems with btrfs as the rootfs (on top of lvm, on top of dm-crypt, on a SSD).

I recently started using `snapper` to create snapshots on a schedule on them. I enabled quota support in btrfs so I could see how much space snapshots were using.

I noticed that filesystem wide latency tended to spike when removing snapshots (several minutes of all fs access stalling).

Balancing with quotas enabled is even worse: my systems were hung for multiple days, until I forcibly restarted them and disabled quotas. Then the fs hangs were much shorter (a few seconds) and not too noticeable. Balancing finished in something on the order of an hour.

While I had quotas enabled, I was constantly having btrfs tell me the data was bad and needed rescanning (rescanning quotas would also induce fs wide latency).

The thing is, ZFS has snapshot space usage info, and doesn't have awful latency (it also doesn't have a "balance" operation, but I'm not sure how relevant that is).

Given my experience with both btrfs & ZFS, I'll likely consider using ZFS as my rootfs in the future.

Mister_Snuggles · on May 31, 2017

I have no idea what the state of ZFS on Linux is, but I've been using it on FreeBSD for a while now and it's fantastic. Comparing FreeBSD to Linux is a bit of an apples/oranges thing though.

cyphar · on May 30, 2017

Yeah, that happens to me too. I believe the reason why the lag is so bad on openSUSE / SUSE [aside: note the spelling :P] is because we carry patches that make quotas actually apply correctly (but it increases the big-O complexity by quite a lot -- making balances way more expensive).

krylon · on May 30, 2017

I've seen it, too.

I have three machines running openSUSE, and the one I've seen it on is both the least powerful one (Lenovo IdeaPad 100, some Atom chip, 2GB RAM) and the only one with a non-SSD drive.

ansible · on May 30, 2017

We've been running in production with our small-scale setup for (three?) years now. Mostly problem-free. But...

We did have one incident recently with an Ubuntu 14.04 system, which had RAID-1 across 3 drives. We lost one physical drive, and thereby lost the entire btrfs filesystem. Running the btrfs fsck wasn't able to fix it. I likely should have run the latest btrfs-tools to try to fix it, instead of the version that came with 14.04.

Still, we're not planning on switching anytime soon. Been using btrfs send/receive for snapshot backups, which is awesome.

sp332 · on May 30, 2017

I was running 14.04 with its "stock" kernel (can't remember what version, but it was old) and had stability problems after losing a RAID1 drive. Upgrading the kernel made a huge difference in stability, and I was able to recover most of the data.

Zardoz84 · on May 30, 2017

Hardware RAID, fake RAID, dmraid or btrfs RAID?

ansible · on May 31, 2017

It was btrfs RAID-1.

Zardoz84 · on June 1, 2017

Actually, it's well known that BTRFS RAID-1 has problems if it degrades to a single hard disk. Perhaps it's related to this.

eeZi · on May 30, 2017

I'm running btrfs in production with a very heavy workload with millions of files and all sorts of different access patterns. Regular deduplication runs, too. We're probably one of the largest btrfs users.

Had a LOT of unplanned downtime due to various issues with older kernel versions, but 4.10+ has been solid so far. You definitely need operational tooling (monitoring, maintenance like balance) and a good understanding of the internals (what happens when you run out of metadata space, etc.).

Happy to answer questions!

On a related note: Never ever use the ext4 to btrfs conversion tool! It's horribly broken and causes issues weeks later.

lloeki · on May 31, 2017

> On a related note: Never ever use the ext4 to btrfs conversion tool! It's horribly broken and causes issues weeks later.

Care to give some details about this and other failures? Part of what makes a FS reputation is not just people telling "it works" but also stories about how the thing crashed and how they recovered from it. IOW, it always works, until it doesn't, and then it still "works" because I can dig myself out of the hole this or that way.

Inspired by the way you can convert a Debian VM to Arch Linux on Digital Ocean, I happen to have been toying with it recently to auto-convert a blank Debian 8.x VM from ext4 to btrfs. Looks like things are fine, but only because the kernel is <4.x and the VM has very little data on it since it's blank.

WARNING: This is a toy. Do not use for production.

https://github.com/lloeki/digitalocean-ext4-to-btrfs

eeZi · on May 31, 2017

It resulted in random, hard to reproduce ENOSPC errors down the line without either data or metadata being anywhere close to full. Neither us nor the btrfs developers that took a look at it were able to figure out what exactly went wrong, but it was something about new blocks not fitting anywhere despite lots of free space.

Someone on #btrfs said that the filesystem layout is a lot different when using the conversion tool and all of the regression testing happens with regular filesystems, not converted ones.

We reinstalled all machines from scratch. Never happened again.

cmurf · on May 30, 2017

It has been stable on stable hardware for some time. The multiple device stuff has missing features, mainly related to error handling when a device starts to go crazy, and that flat out requires a sysadmin who understands all of that. I.e. Btrfs won't consider a block device unreliable and just ignore it; it'll keep retrying reads or writes while filling up the system log with all of the errors. When there's redundancy, it does fix up these problems automatically, but it can drown in its own noise if a device is producing an overwhelming amount of spurious data. And there's no notification system for this, but note there's no standard notification for this on Linux at all either: LVM and md/mdadm RAIDs do not share the same error handling or notification.

The main issue for whether it's the default filesystem is whether the distro has the resources to support it for their users. Mostly this is in terms of documentation, and understanding what sort of backports to support. Really the only distro doing that work is Suse and maybe Oracle. I don't expect the more conservative distros to support it for some time until they feel they can depend on upstream's backporting alone.

cyphar · on May 30, 2017

> Or at least be stable enough for a Debian or Red Hat to switch to it as a default?

SUSE / openSUSE has had btrfs as the default filesystem for a few years (and we have a bunch of tools built around it adding features like boot-to-snapshot and auto-snapshot of upgrades). Personally I have had issues with it, but I've also messed around with btrfs subvolumes quite a lot (developing container runtime storage drivers) so it might be self-inflicted.

sp332 · on May 30, 2017

If you stick to the green features, you're probably good. https://btrfs.wiki.kernel.org/index.php/Status

Zardoz84 · on May 31, 2017

It depends on what you do with BTRFS. We are using it on our small-scale servers, where we have many small VMs (20~40 GiB) that have BTRFS as root with transparent compression. It makes it easier to expand a VM's hard disk, as we only need to add a new virtual hard disk and add it to the BTRFS filesystem. Some of our VMs are on Hyper-V, which means stopping the VM to add a new hard disk or resize the virtual disk. Others are on a new server running Proxmox, which allows adding virtual hard disks without stopping the VM, so we can add extra disk space without stopping the VM. I only need to schedule a weekly rebalance of the BTRFS filesystem (Ubuntu Server 16.04 doesn't do this!), and I have a script to check free space and available chunks on the FS to avoid any problems with a full BTRFS partition.

I remember having an issue with BTRFS two years ago when we got an unexpected power-down (the UPS didn't help us, as it was a short circuit after the UPS!), where a Hyper-V VM with BTRFS had its FS corrupted, but we managed to recover the data. We also have a physical server running Jenkins and GitLab that uses FakeRAID + BTRFS + btrbk to schedule backups using snapshots and btrfs send; these are stored as compressed files on a network folder (and this folder is backed up to magnetic tape).

I didn't notice any slowdown when I launched a manual rebalance, but we operate at a small scale, so we don't have a lot of I/O. Our real bottleneck is the databases that are on Windows servers.
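A rough sketch of the kind of free-space check mentioned above, using only `os.statvfs`. The threshold and function names are hypothetical. Note that on btrfs the statvfs figures can be misleading and say nothing about chunk allocation; for unallocated-chunk numbers you would need `btrfs filesystem usage` instead, which this sketch does not parse.

```python
import os


def free_space_percent(path):
    """Percent of the filesystem holding `path` still available to normal users."""
    st = os.statvfs(path)
    if st.f_blocks == 0:
        return 0.0  # some pseudo filesystems report a size of zero
    # f_bavail excludes blocks reserved for root; both counts are in f_frsize units.
    return 100.0 * st.f_bavail / st.f_blocks


def check_free_space(path, min_percent=10.0):
    """Return (ok, percent_free); ok turns False below `min_percent`."""
    percent = free_space_percent(path)
    return percent >= min_percent, percent
```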

mjevans · on May 30, 2017

The TLDR version:

BTRFS for a 'simple' use case is fine.

Push redundancy to another layer for the time being. EG: MDADM or hardware raid.

disconnected · on May 30, 2017

I don't use it, but in practical terms, I think btrfs is safe to use, from what I've been reading. There are some corner cases where something might bork, and you'll be staring at a recovery job, but hopefully no data will be lost.

The main thing that I think is stopping widespread adoption is that none of the developers seems to want to come forth and say "Here's a stable, rock solid version. Go nuts.". There's always a caveat or a disclaimer. Use at your own risk and all that.

When even the developers are kinda jittery about it, it isn't exactly reassuring.

dom0 · on May 30, 2017

It still looks like the FS exhibits perf degradation (well, uh, worse than Linux already degrades under I/O load anyway) under not-that-untypical workloads. A few years ago it had the same problems with bog-standard workloads such as using it on / and installing or updating packages. Though, like I mentioned, Linux generally does not shine when it comes to I/O scheduling and system responsiveness. Just a couple days ago I made my whole workstation bog down by writing to the /home SSD at 400 MB/s avg (ext4). It's just not very good there, and it feels like the desktop software is getting worse at dealing with it (probably due to more I/O in more spots, like delayed loading of resources or history files that are read/written in the UI thread and stuff like that)... especially considering that we were all on spinning rust a few years ago, and now everyone uses SSDs with orders of magnitude more IOPS and at least 2-4 times the read/write speed.

api · on May 31, 2017

Synology uses btrfs by default for a NAS product, and I find it hard to believe they'd pick that for a product whose explicit goal is reliable storage if it were full of dragons.

Once something gets a reputation for having issues that reputation tends to stick pretty much forever. I'm thinking maybe older versions of btrfs were problematic and the FUD has never gone away.

X86BSD · on May 31, 2017

What other choice did they have if they deploy on Linux? Btrfs is the best Linux land has, and it's no ZFS.

BrainInAJar · on May 31, 2017

Just use ZFS. It's been battle tested for the last 13 years in production, and was written by careful, thoughtful, intelligent engineers to solve a problem. BTRFS was written sloppily to solve the problem of "oh no, Sun did a thing good"

simula67 · on May 31, 2017

> There is yet another example of "Worse is Better" in how Linux had PCMCIA support several years before FreeBSD/NetBSD. However, if you ejected a PCMCiA card in a Linux system, there was a chance (in practice it worked out to be about in 1 in 5 times for a WiFI card, in my experience) that the system would crash. The *BSD's took a good 2-3 years longer to get PCMCIA support, but when they did, it was rock solid. Of course, if you are a laptop user, and are happy to keep your 802.11 PCMCIA card permanently installed, guess which OS you were likely to prefer --- "sloppy but works, mostly", or "it'll get there eventually, and will be rock solid when it does, but zip, nada, right now"?

Time to market matters. It probably explains how codebases with a reputation for low quality still seem to win in the real world: MySQL vs Postgres? MongoDB vs RethinkDB?

kev009 · on May 31, 2017

Ted's post is quite balanced and accurate vs the revisionist fawning in the quoted message from McVoy. But Warner's message elsewhere in the thread is also good http://minnie.tuhs.org/pipermail/tuhs/2017-May/009880.html.

The biggest Linux defect was not ext{2,3,4} but LVM and MD, which would throw away write barriers until something like kernel 2.6.31 (which was especially painful on Ubuntu LTS). Many distros used LVM by default, and many servers used mdraid somewhere in the stack. I saw many corrupt Linux systems from the 2000s through the first part of this decade; it was especially egregious for DBs and hypervisors with file-based disk images.
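For context on why dropped barriers hurt databases in particular: applications that care about durability typically rely on the pattern below (write a temporary file, `fsync` it, rename it over the original, then `fsync` the directory), and every step assumes that `fsync` really reaches stable storage. If a layer underneath silently drops the cache flush, the pattern still "succeeds" but guarantees nothing after a power cut. A minimal POSIX sketch; the `.tmp` suffix is an arbitrary choice.

```python
import os


def durable_write(path, data):
    """Replace `path` with `data` so a crash leaves either the old or new bytes.

    Every step depends on fsync() really reaching stable storage; if a
    lower layer drops the cache flush, the guarantee silently disappears.
    """
    tmp_path = path + ".tmp"  # arbitrary temp name in the same directory
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to push file data to the device
    os.replace(tmp_path, path)  # atomic rename within one filesystem
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)      # persist the rename itself (the directory entry)
    finally:
        os.close(dir_fd)
```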

jdblair · on May 30, 2017

It took me a few passes to figure out that "Things Just Worse" should be "Things Just Work." Autocorrect much?

codewiz · on May 30, 2017

> PC class hardware tends not to have power fail interrupts, and when power drops, and the voltage levels on the power rails start drooping, DRAM tends to go insane and starts returning garbage long before the DMA engine and the hard drive stops functioning.

This is insane, I refuse to believe it. Even a junior EE knows how to design a PCB so that the RESET signal is asserted on all ICs as soon as the voltage drops below a safe operating level.

DannyB2 · on May 30, 2017

The PCB may be well designed, but the power supply may not be. I have witnessed what a cheap PC can do. Purchased from a Walmart some years ago. Windows 7 immediately replaced with Ubuntu. After a brownout, that machine was unbootable. Also un-fsck'able.

My thought was that the power supply should guarantee adequate power, or none at all. Not something in between. Also, not rapidly alternating states of adequate power / no power.

Short term solution: rebuild that system and get it a UPS.

Longer term solution: get a much better box.

tetha · on May 30, 2017

> My thought was that the power supply should guarantee adequate power, or none at all. Not something in between. Also, not rapidly alternating states of adequate power / no power.

That is very hard to do, actually. Voltage does fluctuate to a certain degree, because physics. And you don't want to cut power to a system just due to a small power dip. And you can't distinguish a single small power dip from 10 small power dips spread over some period of time. And your flapping detection only works if the power drops are close enough to each other.

Proper control theory is amazingly hard. As my prof on that said - no one fully understands PID controllers, but some blokes have a really lucky thumb.
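The usual way supervisor circuits sidestep part of this is hysteresis: the reset is released only above an upper threshold and asserted again only below a lower one, so small wobbles around a single threshold don't make the output flap. A toy model, with made-up threshold voltages; a real supervisor would also hold reset for a delay after power returns, which this does not model.

```python
class PowerGood:
    """Toy power-good signal with hysteresis (thresholds in volts are made up)."""

    def __init__(self, low=4.5, high=4.75):
        if not low < high:
            raise ValueError("low threshold must be below high threshold")
        self.low = low
        self.high = high
        self.good = False  # start held in reset until the rail is proven good

    def sample(self, volts):
        """Feed one voltage reading; return True while the system may run."""
        if self.good and volts < self.low:
            self.good = False  # assert reset: rail has sagged too far
        elif not self.good and volts > self.high:
            self.good = True   # release reset: rail is clearly back up
        return self.good
```

Readings of 4.6 V keep the output in whatever state it was already in, which is exactly the "don't react to every small dip" behaviour, but it also shows the limit: a dip below 4.5 V still trips it, however brief.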

codewiz · on May 30, 2017

The case Ted Ts'o describes is much simpler: DRAM returning garbage while the CPU is still executing instructions and sending commands on the SATA bus.

This can't possibly happen even if the power supply is crazy bad, because the reset logic on the main board will halt everything before the DRAM starts malfunctioning.

pdkl95 · on May 30, 2017

> SATA

ext3 predates SATA, briefly. While SATA was commonly used with ext3 soon after ext3's release, a lot of the hardware Ted Ts'o is probably referring to would have been ATA-4/UDMA (or SCSI).

> while the CPU is still executing instructions

The problem Ts'o described doesn't involve the CPU:

    DRAM tends to go insane and starts returning
    garbage long before the DMA engine and the hard
    drive stops functioning.

A bus-mastering ATA (or SCSI) controller - possibly on the motherboard, possibly an expansion card, regardless almost certainly PCI - may be copying data from RAM directly as Direct Memory Access.

> the reset logic on the main board will halt everything

In theory, there is no difference between theory and practice. In practice, there is, especially when cheap, poorly-designed hardware is involved.

DSMan195276 · on May 30, 2017

> RAM returning garbage while the CPU is still executing instructions and sending commands on the SATA bus.

When I read it, I got the impression he was saying the DRAM was going crazy while a DMA transfer to the hard-drive was still going on. That doesn't require the CPU to be functional when the DRAM is corrupt, only the DMA controller. I can't personally say if that makes it any more likely though.

codewiz · on May 31, 2017

That's right, but all stateful ICs on the motherboard have a reset pin, including the north and south bridges when they were still separate packages. Even PCI cards will receive the reset signal simultaneously.

Not sure what the hard drive would do with a truncated ATA command though.

GalacticDomin8r · on May 31, 2017

A more in-depth explanation of journaling vs soft-updates:

https://www.freebsd.org/doc/handbook/configtuning-disk.html
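The journaling half of that comparison can be shown with a toy: log the new block contents, append a commit record, and only then write the blocks to their home locations. After a crash, recovery replays committed transactions and discards uncommitted ones by reading only the journal, which is why ext3 after an unclean shutdown didn't need ext2's full-disk fsck. This is an in-memory illustration with made-up names, not ext3's actual JBD format.

```python
class ToyJournal:
    """In-memory write-ahead journal over a 'disk' that maps block -> data."""

    def __init__(self):
        self.disk = {}
        self.log = []  # records: ("write", txid, block, data) or ("commit", txid)

    def write(self, txid, writes, crash=None):
        """Journal `writes` (block -> data), commit, then checkpoint to disk.

        crash="before_commit" or "before_checkpoint" simulates power loss
        at that point.
        """
        for block, data in writes.items():
            self.log.append(("write", txid, block, data))
        if crash == "before_commit":
            return  # no commit record: recovery must discard this transaction
        self.log.append(("commit", txid))
        if crash == "before_checkpoint":
            return  # committed, but home locations not yet updated
        self.disk.update(writes)  # checkpoint to home locations

    def recover(self):
        """Replay committed transactions in log order and drop the rest."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in self.log:
            if rec[0] == "write" and rec[1] in committed:
                _kind, _txid, block, data = rec
                self.disk[block] = data  # idempotent: safe to replay twice
        self.log.clear()
```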

seltzered_ · on May 30, 2017

This post brought up memories of trying Linux in the late 90s and giving up (or reinstalling the os) because ext2 crashed to a point unrecoverable by fsck. Anyone else have a similar experience back then?

ianai · on May 30, 2017

I stand corrected, sorry everybody.

blackflame7000 · on May 30, 2017

ZFS without ECC Ram isn't as dangerous as you might think. http://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-yo...

"There’s nothing special about ZFS that requires/encourages the use of ECC RAM more so than any other filesystem." -Matthew Ahrens (Cofounder of ZFS at Sun Microsystems and current ZFS developer at Delphix)

ianai · on May 30, 2017

Well, this hurt, but I learned something today. Thank you.

swinglock · on May 30, 2017

It's a common myth being repeated.

I think it comes from an assumption that if you use ZFS you care more than users of other filesystems about getting the same data you put onto disk back out of the disk later when reading it.

If that's important it's good advice to use ECC RAM. The point that's often lost is that if you cared about correct data, then having ECC RAM would have been a good idea regardless of using ZFS or not, and you're not worse off with ZFS than you would have been with another filesystem without ECC RAM.
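A toy illustration of that point: a checksum computed when data is written catches corruption that happens afterwards (on disk, in cables, in controllers), but if RAM flips a bit before the checksum is computed, the bad data gets a valid checksum and nothing downstream can tell. ZFS stores checksums in the parent block pointer and defaults to fletcher4; SHA-256 is used here only because it is in Python's standard library, and the dict-based "pool" is hypothetical.

```python
import hashlib


def store(pool, key, data):
    """Write `data` together with a checksum computed at write time."""
    pool[key] = (data, hashlib.sha256(data).digest())


def load(pool, key):
    """Return the stored data, raising if it no longer matches its checksum."""
    data, checksum = pool[key]
    if hashlib.sha256(data).digest() != checksum:
        raise OSError(f"checksum mismatch for block {key!r}")
    return data
```

Bit rot after `store` makes `load` raise; a flip before `store` is checksummed along with the data and reads back "fine", which is the same exposure any other filesystem has without ECC.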

Another common myth is that ZFS requires large amounts of RAM. It doesn't do that either, but the assumption is that if you use ZFS you're building some large multi-user system which benefits from a large cache. If that's not your use-case then that's not true, you could run ZFS on your single disk laptop if you wanted to.

kasabali · on May 31, 2017

Coincidentally both myths originate from the FreeNAS forum.

laumars · on May 30, 2017

If you can afford ECC then that is obviously preferable. However the horror stories regarding running ZFS on non-ECC are greatly exaggerated.

michaelmrose · on May 30, 2017

Not merely exaggerated, nonsense. All filesystems using non ecc memory are more likely to experience damage to the data stored therein.

ZFS is no more likely to be damaged than anything else.

This whole confusion stems from the ZFS devs' suggestion that ECC RAM be used, and from people who don't know any better thinking that must be some special requirement for ZFS.

These folks then spread this misinformation all over the Internet, where it's passed on by people who know nothing about the topic.

Pro tip: don't talk about things on the Internet that you only know about 3rd or 37th hand. If you know nothing about the topic you aren't improving the store of human knowledge by passing on noise.

laumars · on May 30, 2017

> Not merely exaggerated, nonsense. All filesystems using non ecc memory are more likely to experience damage to the data stored therein.

If it were "nonsense" then one could flip the argument and say "ECC memory has no beneficial effect whatsoever", which we both know is untrue. However, I can see why people do say "nonsense", given that the ECC myth seems to only be discussed in relation to ZFS and the probability of data errors due to non-ECC memory is small.

For what it's worth, I did originally include a comment about the risk being the same as on any other file system, but then deleted it, fearing I might have overlooked something on another fs I'm less familiar with.

> Pro tip: don't talk about things on the Internet that you only know about 3rd or 37th hand. If you know nothing about the topic you aren't improving the store of human knowledge by passing on noise.

I don't think that's fair. The problem with 1st hand experience is that it's often just based on anecdote, which can often be worse than 3rd hand advice. And in the case of failures, some products are too widely used for it to be practical to constantly badger the development team for help, which means you end up having to rely on the advice of others. The real problem is that some people are terrible at researching, so they take 3rd hand advice without bothering to fact-check it.

In any case, I don't know if your comment was aimed at me or not, but I do have nearly 10 years of experience (wow, that's gone quick!) running ZFS across a variety of systems, some with ECC memory, others without. I've also been a keen student of the Sun/Oracle docs. So I do consider myself reasonably well informed, from both credible sources and personal experience. Though I'm not arrogant enough to assume I'm an expert either; communities like HN can be deeply humbling places.

dom0 · on May 30, 2017

> > Not merely exaggerated, nonsense. All filesystems using non ecc memory are more likely to experience damage to the data stored therein.

> If it were "nonsense" then one could flip the argument and say "ECC memory has no beneficial effect whatsoever", which we both know is untrue.

The original argument is like an implication ("no ECC => don't use ZFS"), which he refutes by saying (correctly) that file systems are generally affected the same way by memory corruption. Your argument seems to invert the original implication and draw conclusions from that (the fallacy of the converse, I believe). ∎

laumars · on May 30, 2017

He was refuting my "greatly exaggerated" remark by going on to make the same points I was alluding to albeit directly and in more detail. So I was in turn elaborating on my choice of language.

I feel that between this post and your previous one, you are basically just agreeing with me via the process of nitpicking the language I used.

dom0 · on May 30, 2017

s/exaggerated/wrong/

dsr_ · on May 30, 2017

We used to say that the fastest way to get a good answer is to post a wrong assertion on Usenet.

ianai · on May 30, 2017

I learned to do that in academia, too. It's effective but you need your flame retardant boots on.

michaelmrose · on May 30, 2017

I don't understand how you could think this. It's not merely that it's incorrect; I don't understand the line of logic that would lead to the conclusion that ZFS requires a more sophisticated setup, or ECC, to function.

Furthermore, backups and RAID may be related, but they are pretty orthogonal. Having a good backup strategy doesn't obviate the need for a reasonable strategy to ensure performance, integrity, and reliability.

If you have to restore from backup, aren't you going to lose some amount of data, even if it's only the last hour or day?

datatan · on May 30, 2017

There is no reason not to run ZFS on non-ECC memory, let alone it being "insane".

iso-8859-1 · on May 30, 2017

triple negation? Surely this isn't needed...

regularfry · on May 30, 2017

Complex Things Fail In Complex Ways: An Object Lesson

Dimi9909 · on May 30, 2017

Slightly off topic: how Facebook uses Btrfs:

https://www.linux.com/news/learn/intro-to-linux/how-facebook...

Dimi9909 · on May 30, 2017

http://masoncoding.com/presentation/ks-14/btrfs.html#/1

Dimi9909 · on May 30, 2017

Ted is brilliant