
The "why not btrfs" line boils down to "it took a long time to be stable".

That's a weird argument. Even if it's true, it is now stable, and has been for a long time. btrfs has long been my default, and I'd be wary of switching to something newer just because someone was mad that development took a long time.




In 2019, btrfs ate all my data after a power cut. Btrfs peeps said it sounded like my SSD was at fault. Well, ZFS is still chugging along on that drive. I am not surprised btrfs took ages to stabilize, and it will take ages again before I rely on it. I’ve had previous btrfs incidents too. I think the argument against btrfs is that it was not good enough during the years when btrfs devs were telling people to use it in production.


Anecdotally, and absolutely not production experience here, but I've had a Synology device running btrfs for 7 or 8 years now. The only issue I ever had was when I shipped it cross-country with the drives in it, but I was able to recover just fine.

This includes plenty of random power losses.


They do use btrfs. However, Synology also uses some additional tools on top of btrfs. From what I remember (could be wrong about the precise details), they actually run mdadm on top of btrfs, and use mdadm in order to get the erasure coding and possibly the NVMe cache disk too. (By erasure coding, I mean RAID 5/6, or SHR, which are still generally unstable in BTRFS).


I assume you mean running btrfs on top of md (mdadm) or dm (dmraid), not the other way around?


Woops, you are correct! And it looks like it is dmraid, not mdadm.

https://daltondur.st/syno_btrfs_1/

Sorry about that!


Yeah, in the last year and a half, I've had three btrfs file systems crash on me with the dreaded "parent transid verify failed". Two times it was out of the blue, third time was just after it filled up.

The people on IRC tend to default to "unless you're using an enterprise drive, it's probably buggy and doesn't respect write barriers", which shouldn't have mattered because there was no system crash involved.

Yes, I did test my RAM, I know it's fine. For comparison, I've (unintentionally) run a ZFS system with bad RAM for years and it only manifested as an occasional checksum error.


> Yes, I did test my RAM, I know it's fine. For comparison, I've (unintentionally) run a ZFS system with bad RAM for years and it only manifested as an occasional checksum error.

Just luck. Software can't defend itself against bad RAM. There's always the possibility that bad RAM will cause ZFS to corrupt itself in some way it can't recover from.

Everything is in RAM. The kernel, the ZFS code, everything. All of that is vulnerable to corruption. No matter how fancy ZFS is, it can't stop its own code from being corrupted. It's just luck that it didn't happen.


Well, yes and no. The amount of RAM consumed by the filesystem driver is negligible compared to the truckloads of filesystem data shoveled through it. If we assume that errors are comparatively rare, the code itself is unlikely to be affected. Even if you're unlucky enough to get RAM corruption in the 0.01% occupied by the ZFS driver, the chance that a bit will flip in just such a way as to make a checksum succeed when it should have failed due to a second bit flip is virtually nonexistent. Much more likely that it simply crashes in some way. As such ZFS is much more resilient to on-disk filesystem corruption from bad RAM than systems which don't do any checksumming at all.


ECC RAM helps


> For comparison, I've (unintentionally) run a ZFS system with bad RAM for years and it only manifested as an occasional checksum error.

Be careful though. If whatever data was to be written got corrupted early enough, i.e. before ZFS got to see it, ZFS happily wrote the corrupted data to disk with a matching checksum and you're none the wiser. But yes, it didn't blow up the entire filesystem the way btrfs likes to do.


Btrfs never actually stabilized; it's still garbage compared to ZFS.


Care to substantiate that statement? It seems rather arbitrary to just say that it's garbage when it is running and has been running successfully for the vast majority of its users. It also offers two features that ZFS does not: the ability to grow a pool, and offline deduplication.


Based on the reports of corruption and data loss from actual users, I don’t think this claim is true at all.


Does it even have RAID5?


Why should it matter? It's an extremely niche technology that's only interesting to some home users. I see no reason why other users should care about a RAID level they're not interested in.

(I don't use btrfs or any other COW filesystem because of significantly worse performance with some kinds of workloads, but it has nothing to do with maturity of any of them.)


> Why should it matter? It's an extremely niche technology that's only interesting to some home users.

I use RAID-Z2 in lots of places for bulk storage purposes (HPC).

There's a reason why Ceph added erasure coding:

* https://ceph.io/en/news/blog/2017/new-luminous-erasure-codin...

* https://docs.ceph.com/en/latest/rados/operations/erasure-cod...

When you're talking about PB of data, storage efficiencies add up.


Wtf? This is a bizarre take. Facebook poured millions of dollars of R&D into btrfs.


but they likely put money into features which they are interested in, and not raid56


Yes but you will lose data if you are writing to your array when the power goes out. RAIDZ (ZFS) does not have this problem. See BTRFS RAID5 write hole.


Btrfs is still unacceptably less reliable than ZFS, after _decades_ of development. This is unacceptable, IMHO. I've lost so much data due to btrfs corruption issues that I've (almost) stopped using it completely nowadays. It's better to fight to keep the damned OpenZFS modules up to date and get an actual _reliable_ system instead of accepting the risk again.


> I've lost so much data due to btrfs corruption issues that I've (almost) stopped using it completely nowadays.

Just out of curiosity: is there a specific reason you're not using plain-vanilla filesystems which _are_ stable?

Personal anecdote: i've only ever had serious corruption twice, 20-ish years ago, once with XFS and once with ReiserFS, and have primarily used the extN family of filesystems for most of the past 30 years. A filesystem only has to go corrupt on me once before i stop using it.

Edit to add a caveat: though i find the ideas behind ZFS, btrfs, etc., fascinating, i have no personal need for them so have never used them on personal systems (but did use ZFS on corporate Solaris systems many years ago). ext4 has always served me well, and comes with none of the caveats i regularly read about for any of the more advanced filesystems. Similarly, i've never needed an LVM or any such complexity. As the age-old wisdom goes, "complexity is your enemy," and keeping to simple filesystem setups has always served my personal systems/LAN well. i've also never once seen someone recover from filesystem corruption in a RAID environment by simply swapping out a disk (there's always been much more work involved), so i've never bought into the "RAID is the solution" camp.


ZFS is just too convenient, IMHO:

- ZStandard compression is a performance boost on crappy spinning rust

- Snapshots are amazing, and I love being able to quickly send and store them using send and receive

- I like not having to partition the disk at all, and still be able to have multiple datasets that share the same underlying storage. LVM2 has way too many downsides for me to still consider it, like the fact that thin provisioning was quite problematic (i.e. ext4 and the like have no idea they're thin provisioned, ...)

- I like not having to bother with fstab anymore. I have all of my (complex) datasets under multiple boot roots, and I can mount pools from a live environment with an altroot and immediately get all directories properly mounted

- AFAIK only ZFS and Btrfs support checksums out of the box. I hate the fact that most FS can in fact bitrot and silently corrupt files. With ZFS and Btrfs you can't necessarily repair the corrupted data itself, but at least you'll know it got corrupted and can restore it from a backup

- I like ZVOL; I appreciate being able to use them as sparse disks for VMs that can be easily mounted without using loopback devices (you get all partitions under /dev/zvol/pool/zvol-partN)

- If you have a lot of RAM, the ZFS ARC can speed up things a lot. ZFS is somewhat slower most of the time than "simpler" FS, but with 10+ GB available to the ARC it's been faster in my experience than any other FS

I do use "classic" filesystems for other applications, like random USB disks and stuff. I just prefer ZFS because the feature set is so good and it's been nothing but stable in day to day use. I've literally had ZERO issues with it in 8+ years - even when using the -git version it's way more stable than Btrfs ever was.


> Just out of curiosity: is there a specific reason you're not using plain-vanilla filesystems which _are_ stable?

I'd guess that it's the classic case: figuring out whether something works without using it is a lot harder than giving it a go and seeing what happens. I've accidentally taken out my own home folder in the past with ill-advised setups and it is an educational experience. I wouldn't recommend it professionally, but I can see the joy in using something unusual on a personal system. Keep backups of anything you really can't afford to lose.

And one bad experience isn't enough to get a feel for how reliable something is. It is better to stick with it even if it fails once or twice.


> And one bad experience isn't enough to get a feel for how reliable something is.

For non-critical subsystems, sure, but certain critical infrastructure has to get it right every time or it's an abject failure (barring interference from random cosmic rays and similar levels of problems). Filesystems have been around for the better part of a century, so should fall into the category of "solved problem" by now. i don't doubt that advanced filesystems are stupendously complex, but i do doubt the _need_ for such complexity beyond the sheer joy of programming one.

> It is better to stick with it even if it fails once or twice.

Like a pacemaker or dialysis machine, one proverbial strike is all i can give a filesystem before i switch implementations.


snapshots every 15 minutes are a big selling point of ZFS for me; losing a file to a tired

    $ grep bar foo.txt | tr A-Z a-z > foo.txt
is much more common than losing a disk


> losing a file to a tired ...

If the file isn't in source control, a backup, or auto-synced cloud storage, it can't be _that_ important. If it was in any of those, it could be recovered easily without replacing one's filesystem with one which needs hand-holding to keep it running. Shrug.


ZFS is the mechanism by which I implement local (via snapshots) and remote (via zfs send) backups on my user-facing machines.

- It can do 4x 15-minute snapshots, 24x hourly snapshots, 7x daily snapshots, 4x weekly snapshots, and 12x monthly snapshots, without making 51 copies of my files.

- Taking a snapshot has imperceptible performance impact.

- Snapshots are taken atomically.

- Snapshots can be booted from, if it's a system that's screwed up and not just one file.

- Snapshots can be accessed without disturbing the FS.
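
Mechanically it's roughly this (names and dates made up; in practice a tool like sanoid or zfs-auto-snapshot drives the schedule from a timer):

    zfs snapshot -r tank@auto-hourly-202406011300    # take this tier's snapshot
    zfs destroy -r tank@auto-hourly-202405311300     # expire the oldest one in the tier
    zfs send -R -I tank@auto-daily-old tank@auto-daily-new | ssh backuphost zfs recv -d backup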

In my experience it hasn't required more hand-holding than ext4 past the initial install, but the OSes that most of my devices use either officially support ZFS or don't use package managers that will blindly upgrade a kernel past what out-of-tree modules I'm using will support, which I think fixes the most common issue people have with ZFS.


> is there a specific reason you're not using plain-vanilla filesystems which _are_ stable?

my personal reasons are raid + compression


Funny because I have the opposite experience. The main issue with btrfs is a lack of tooling that would let a layperson fix issues without btrfs-developer-level knowledge.

I've personally had drive failures, fs corruptions due to power loss (which is not supposed to happen on a CoW filesystem), fs and file corruption due to RAM bitflips, etc. Every time, btrfs handled the situation perfectly, with the caveat that I needed help from the btrfs developers. And they were very helpful!

So yeah, btrfs has a bad rep, but it is not as bad as the common sentiment makes it look.

(note that I still run btrfs raid 1, as I did not find real-world experience reports regarding raid 5 or 6)


It's funny because Facebook uses btrfs for their systems & doesn't have these issues.

ZFS lovers need to stop this CoW against CoW violence.


Someone correct me if I'm wrong but to my understanding FB uses Btrfs in either RAID 0, 1, or 10 only and not any of the parity options.

RAID56 under Btrfs has some caveats but I'm not aware of any anecdata (or perhaps I'm just not searching hard enough) within the past few weeks or months about data loss when those caveats are taken into consideration.


> RAID56 under Btrfs has some caveats but I'm not aware of any anecdata (or perhaps I'm just not searching hard enough) within the past few weeks or months about data loss when those caveats are taken into consideration.

Yeah this is something that makes me consider trying raid56 on it. Though I don't have enough drives to dump my current data while re-making the array :D (perhaps this can be changed on the fly?)


What's your starting array look like? If you're already on Btrfs then I recall you could do something like `btrfs balance -d raid6 -m raid1c3 /`

https://btrfs.readthedocs.io/en/latest/Balance.html


Yeah I'm on btrfs raid 1 currently, with 1x1TB + 2x3TB + 2x4TB drives. Gotta love btrfs's flexibility regarding drive size :D

I'll have a look, thanks! I guess failing this will make me test my backup strategy that I have never tested in the past.


Out of curiosity, how much total storage do you get with that drive configuration? I've never tried "bundle of disks" mode with any file system because it's difficult to reason about how much disk space you end up with and what guarantees you have (although raid 1 should be straightforward, I suppose).


I get half of the raw capacity, so 7.5TB. Well, a bit less due to metadata: 7.3TB as reported by df (6.9TiB).
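(With 1 + 3 + 3 + 4 + 4 = 15 TB raw and two copies of every chunk, that works out to 15 / 2 = 7.5 TB usable.)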

For btrfs specifically there is an online calculator [1] that shows you the effective capacity for any arbitrary configuration. I use it whenever I add a drive to check whether it’s actually useful.

1: https://carfax.org.uk/btrfs-usage/?c=2&slo=1&shi=1&p=0&dg=1&...


Just want to do a follow-up and make a correction: the command to go from whatever to RAID 6 for data and RAID 1c3 for metadata in Btrfs is `btrfs balance -dconvert=raid6 -mconvert=raid1c3 /`, instead of what I originally posted.
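
For anyone following along, I believe the fully spelled-out form is run against the mounted filesystem, so the conversion happens online and you can watch it:

    btrfs balance start -dconvert=raid6 -mconvert=raid1c3 /mnt
    btrfs balance status /mnt    # progress; a full rebalance can take a long time

(with /mnt being wherever the array is mounted)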


> It's funny because Facebook uses btrfs for their systems & doesn't have these issues.

they likely have a distributed layer on top which takes care of data corruption and losses happening on a specific server


fs corruption due to power loss happens on ext4 because the default settings only journal metadata, for performance. I guess if everything is on batteries all the time this is fine, but it's intolerable on systems without a battery.


The FS should not be corrupted, only the contents of files that were written around the time of the power loss. Risking only file contents and not the FS itself is a tradeoff between performance and safety where you only get half of each. You can set it to full performance or full safety mode if you prefer.
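
For reference, the knob is ext4's data= mount option (device and mount point here are placeholders):

    mount -o data=ordered   /dev/sdX2 /mnt   # default: journal metadata, flush data before committing it
    mount -o data=writeback /dev/sdX2 /mnt   # fastest: no ordering between data and metadata
    mount -o data=journal   /dev/sdX2 /mnt   # safest: data goes through the journal too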


True, this is file corruption.


>It's better to fight to keep the damned OpenZFS modules up to date and get an actual _reliable_ system

Try CachyOS (or at least the ZFS kernel); it has excellent ZFS integration.


This. I may still give up on running ZFS on Linux due to the common (seemingly intentional from the Linux side) breakage, but for my existing systems switching them over to CachyOS repos has been a blessed relief.


Well i use mainly FreeBSD but have used CachyOS for about 3mo to have some systemd refresher :)


Hadn't heard of CachyOS before, looks very nice! Was looking to move to Arch from KDE Neon, but this might be a much better fit.


Well or don't move from arch and just use the cachyos repos:

https://wiki.cachyos.org/de/cachyos_repositories/how_to_add_...

No reinstall needed ;)


Fair point. Currently running KDE Neon though, which is Debian based so reinstall needed...


For me the killer feature of btrfs is "RAID 1 with different sized disks". For a small and cheap setup, this is perfect since a broken disk can be replaced with a bigger one and immediately (part of) the extra new disk space can be used. Other filesystems seem to only increase the size once all disks have been replaced with a bigger capacity one (last time I checked this was still the case for ZFS)


Exactly. Provisioning a completely different set of disks when running out of capacity might be fine for a company but not for home office.


How does that work? You have two 100gb drives in raid1, 80% full, you replace one with a 200gb disk and write 50gb to the array - how is your 130gb of data protected against either drive failing?


I don't know the ins-and-outs of btrfs in detail, but having dug into other systems that offer redundancy-over-uneven-device-sizes and assuming btrfs is at least similar: with your two drive example you won't be able to write another 50gb to that array.

For two devices, 1x redundancy (so 2x copies of everything) will always limit your storage to the size of the smaller device, otherwise it is not possible to have two copies of everything you need to store. As soon as you add a third device of at least 100gb (or replace the 100gb device with one of at least 200gb) the other 100gb of your second device will immediately come into play.

Uneven device size support is most useful when:

- You have three or more devices, or plan to grow onto three or more from an initial pair.

- You want flexibility wrt array growth (support for uneven devices usually (but not always) comes with better support for dynamic array reshaping).

- You want better quick-repair flexibility: if a 4TB drive fails, you can replace it with 2x2TB if you don't have a working 4TB unit on hand.

- You want variable redundancy (support for uneven devices sometimes comes with support for variable redundancy: keeping 3+ copies of important data, or data you want to access fastest via striping of reads, 2 copies of other permanent data, and 1 copy of temporary storage, all in the same array). In this instance the “wasted” part of the 200gb drive in your example could be used for scratch data designated as not needing to be stored with redundancy.


It only works with 3+ disks. All data needs to be on two disks.

e.g. you have 3 100GB drives, total capacity in raid 1 is 150GB.

If you replace a broken one with a 200GB one, the total capacity will be increased to 200GB.


My understanding is that RAID1 is just a mirror, and all disks have identical contents. Are you talking about something else?


Traditional RAID1 will mirror whole drives, yes. BTRFS RAID1 will mirror chunks of data (iirc 1GB) on two drives. So you can have e.g. two 1TB drives and a 2TB one just fine.


You can do this with ZFS, you just have to do it manually, i.e. by partitioning up the disks into, say, 100GB or 1TB partitions, then constructing vdevs using these partitions.

You can then extend the pool by adding more such partition-based vdevs as you replace disks, just add the new partitions and add new vdevs.

So if you have a 1TB disk, a 2TB disk and a 4TB disk, you could have mirrors (d1p1,d3p1), (d2p1,d3p2), (d2p2,d3p3) with a total of 3TB mirrored, and 1TB available. If you swap the 1TB for a 2TB disk, partition it, replace the old d1p1 partition with the new one and resilver; once that's done you can add the mirror (d1p2,d3p4) and get the full 4TB of redundant storage.
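
A rough sketch of that layout (device and partition names invented for illustration):

    zpool create tank \
        mirror /dev/d1p1 /dev/d3p1 \
        mirror /dev/d2p1 /dev/d3p2 \
        mirror /dev/d2p2 /dev/d3p3
    # after swapping the 1TB disk for a 2TB one and partitioning it:
    zpool replace tank /dev/d1p1 /dev/d1new-p1       # resilver the first mirror onto the new disk
    zpool add tank mirror /dev/d1new-p2 /dev/d3p4    # then grow the pool with one more mirror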

Not a great solution though, as it requires a lot of manual work, and especially write performance will suffer because ZFS will treat the vdevs as being separate and issue IOs in parallel to them, overloading the underlying devices.


Thanks, that makes sense. I think the bcachefs --replicas option does something similar


Yeah, BTRFS is really not good for any sort of redundancy, not even very good for multi-disk in general.

1. The scheduler doesn't really exist. IIRC it is PID % num disks.

2. The default balancing policy is super basic. (IIRC always write to the disk with the most free space).

3. Erasure coding is still experimental.

4. Replication can only be configured at the FS level. bcachefs can configure this per-file or per-directory.

bcachefs is still early but it shows that it is serious about multi-disk. You can lump any collection of disks together and it mostly does the right thing. It tracks performance of different drives to make requests optimally and balances writes to gradually even out the drives (not lasering a newly added disk).

IMHO there is really no comparison. If it wasn't for the fact that bcachefs ate my data I would be using it.


That, plus offline deduplication.


Bcachefs has this too


Development taking long usually means that the model itself is too complicated to be done right in a reasonable time, which indicates that the "stable" implementation could still be buggy, but only if you stray away from the common path. It's hard to feel comfortable using such software in a role as fundamental as a file system.


> which indicates that the "stable" implementation could still be buggy, but only if you stray away from the common path

Or that the complexity is such that if a new bug is found, it may take a long time to fix, or it is fixed fast and has unexpected knock-on effects even for circumstances on the common path.

Something that takes a long time to be declared stable/reliable because of its complexity needs to spend a long time after that declaration without significant issues before I'll actually trust it. Things like btrfs definitely live in this category.

bcachefs won't even be something I use for important storage until it has been battle-tested a bit more for a bit longer, though at this point it is much more likely to take over from my current simple ext4-on-RAID arrangement (and when/if it does, my backups might stay on ext4-on-RAID even longer).


I think it's not quite so simple. The problem of organising storage is at least complex, on a scale of "simple / complicated / complex / chaotic". The inherent complexity might be impossible to reduce to something simple or even just complicated, except _maybe_ with layering (à la LVM2), each layer tackling one issue independently of the others. But then it's probably at the cost of performance and other efficiency. Each layer should work such that it does not interfere too much with the performance of other layers. Not easy.

Given the rather cheap price of durable storage these days, I would favour rock-solid, high-quality code for storing my data, at the expense of some optimisations. Then again, I still like RAID, instantaneous snapshots, COW, encryption, xattr, resizable partitions, CRC... Is it possible to have all this with acceptable performance and simple code bricks combined and layered on top of each other?


In this case I think it's that bcachefs has only a very small set of developers working on it.


But that was not the case for btrfs.


> Development taking long usually means that the model itself is too complicated to be done right in a reasonable time

yeah, a feature-rich/complete fs is complicated; that's why we have very few of them.


One interesting titbit I've only recently found out is that btrfs can't really serve reads from different drives in RAID1; it picks a drive based on the process ID.

ZFS does something smarter here, it keeps track of the queue length for each drive in a mirror, and picks the one with the lowest number of pending requests.


It's not simply that it took a long time to become stable; it's that during the time when it was unstable, a lot of people got exposed to btrfs by having it lose their data.

Personally, I was one of those people. Very excited about the prospects of btrfs, switched several machines over to it to test, ended up with filesystem corruption and had to revert to ext. Now, whenever I peek at btrfs, I never see anything that's compelling over running ZFS, which I've run for close to 15 years, and run hard, and have never had data loss. Even in the early days with zfs+fuse, when I could regularly crash it, the zfs+fuse developers quickly addressed every crash I ran into once I put together a stress test.


> it is now stable, and has been for a long time.

Is it really? I must have missed the news. Back when it was released completely raw as a default for many distros, there were fundamental design-level issues (e.g. "unbound internal fragmentation" reported by Shishkin). Plus all the reports and personal experiences of getting, and trying to recover, exotically shaped bricks when the volume fills to 100% (which could happen at any time with btrfs). Is it all good now? Where can I read about btrfs behaving robustly when no free space is left?


Btrfs lost its credibility and many people would never trust it.


So a year ago i tried to repeat my old trick for damaging btrfs (as a user, NOT root): fill the volume with dd if=/dev/urandom of=./file bs=2M && sync && rm ./file, then reboot the machine. And yes, the trick still works: it's not booting anymore, bravo.

BTW: Even SLES (SUSE Linux Enterprise Server) says to use XFS for data and btrfs just for the OS, i wonder why


> BTW: Even SLES (SUSE Linux Enterprise Server) says to use XFS for data and btrfs just for the OS, i wonder why

Because XFS is far quicker for server-related software such as databases and virtual machines, which are weak points on btrfs due to its COW model.


Doesn't "chattr +C" give you back that performance, while still letting you keep the rest of the benefits of Btrfs?


Nodatacow is an ugly hack because it disables btrfs's core features for the affected data. It also should not be used with raid1.


Yeah and maybe additionally you want to keep your data and have a stable filesystem for them ;)


> So a year ago i tried to repeat my old trick for damaging btrfs (as a user, NOT root): fill the volume with dd if=/dev/urandom of=./file bs=2M && sync && rm ./file, then reboot the machine. And yes, the trick still works: it's not booting anymore, bravo.

Do you know how ZFS handles that?


Without any problems. No other filesystem i tested has that problem (ext4, XFS, ZFS, NTFS, JFS, Nilfs2).


Good to hear :) My understanding is that it's easier to break a CoW filesystem like that because if you run out of space you can't even delete things (because that requires writing that change), so I'm not surprised that the rest (the non-CoW filesystems) did fine, but I'm happy to hear that ZFS also handles it.


As little as one year ago I experienced damage on a lightly used btrfs root partition on my laptop. Never again. I use ext4 root and ZFS for /home for snapshots and transparent compression now, all on top of LVM


btrfs still has many weird issues, e.g. you can't remove a drive if it has I/O errors, even if the rest of the array still has enough space to accommodate the data.

You can do a replace, but then you need to buy a new drive.


btrfs is not stable, at least not for me. it lost my data only a couple months ago. no power cut, no disk failure, data just gone.


My personal grievances with btrfs are multifaceted.

- I never agreed with the btrfs default of root raid 1 system not booting up if a device is missing. I think the point of raid1 is to minimize downtime when losing a device, and if you lose the other device before returning it to a good state, that's 100% on you.

- Poor management tools compared to md (though bcachefs might be in the same boat). Some tools are poorly thought out, e.g. there is a tool for defragmentation, but it undoes sharing (so snapshots and dedupped files get expanded).

- If a drive in raid1 drops but then later comes back, btrfs is still quite happy.

- The need to use btrfs balance, and in a certain way as well: https://github.com/kdave/btrfsmaintenance/blob/master/btrfs-... .

- At least it used to be difficult to recover when your filesystem became full. It helps if you have it on an LVM volume with extra space.

- Snapshotting or having a clone of a btrfs volume is dangerous (due to the uuid-based volume participant scanning)

- I believe raid5/6 is still experimental?

- I've lost a filesystem to btrfs raid10 (but my backups are good).

- I have also rendered my bcachefs in a state where I could no longer write to the filesystem, but I was still able to read it. So I'm inclined to keep using bcachefs for the time being.

Overall I just have the impression that btrfs was complicated and ended up in a design dead-end, making improvements anywhere from hard to very difficult, and I hope that bcachefs has made different base design choices, making future improvements easier.

Yes, the number of developers for bcachefs is smaller, but frankly as long as it's possible for a project to advance with a single developer, it is going to be the most effective way to go—at the same time I hope this situation improves in the future.


> I never agreed with the btrfs default of root raid 1 system not booting up if a device is missing.

Add "degraded" to default mount options. Solved.


Bad defaults are a huge issue even when you can change the config to something sane.


Defaulting to degraded is a bad default. Mounting a btrfs device array degraded is extremely risky, and the device not booting means you'll actually notice and take action.


Md devices do "degraded" by default and it seems fine. Indeed, I believe this is the default operation of all other multi-device systems, but of course I cannot verify this claim. I dislike all features that by default prevent from booting my system up.

The annoying part of this is that if you do reboot the system, it will never end up responding to a ping, meaning you need to visit the host yourself. In practice it might even have other drives you could configure remotely to replace the broken device. I use md's hot spare support routinely: an extra drive is available, should any of the drives of the independent raids fail.

Granted, md also has decent monitoring options with mdadm or just cat /proc/mdstat.


Defaults should favor safety over convenience. Fail fast and fail hard. Md-raid's defaults are simply wrong.


Based on this logic, should the default mode of operation be to stop I/O when a disk dies?


As someone else pointed out: how is that different from losing a disk while running? Do you want the file system to stop working or become read-only if one disk is lost while running too? I think the behaviour should be the same on boot as while running.


You want a flaky system to pick a random disk at every boot and pretend that's the good one?


The usual reason I've seen RAID 1 used for the OS drive is -so- it still boots if it loses one.

Not doing so is especially upsetting when you discover you forgot to flip the setting only when a drive fails with the machine in question several hours' drive away (standalone remote servers like that tend not to have console access).

I think 'refusing to boot' is probably the right default for a workstation, but on the whole I think I'd prefer that to be a default set by the workstation distro installer rather than the filesystem.


That sounds like the right default then. If you're doing a home install, you get that extra little bit of protection. If you're doing a professional remote server deployment, you should be a responsible adult understanding the choices - and run with scrubbing, smart and monitoring for failures.


"Will my RAID configuration designed so my system can still boot even if it loses a drive not actually let it still boot if it loses a drive?" is not a question that I think is fair to expect sysadmins to realise they need to ask.

Complete Principle of Least Surprise violation, given that every other RAID1 setup (that I'm aware of) will still boot fine.

Also said monitoring should then notify you of an unexpected reboot and/or a dropped out disk, which you can then remediate in a planned fashion.

If this was a new concept then defaulting all the safety knobs to max would seem pretty reasonable to me, but it's an established concept with established uses and expectations - a server distro's installer should not be defaulting to 'cause an unnecessary outage and require unplanned physical maintenance to remediate it.'


> and the device not booting means you'll actually notice and take action.

How often do people reboot their systems? IMO, if it's running (without going into "read-only filesystem" mode) with X disks, it should boot with the same X disks. Otherwise, it might be running fine for a long time, and it arbitrarily not coming back in case of a power failure (when otherwise it would be running fine) is an unnecessary trap.


Since RAID is not a backup, isn't availability the main point of RAID1?


Good luck doing that after the disk shuts down okay but never comes back online


This option is only needed if you can mount the filesystem, but only degraded. If you're in that situation, you can remount easily. If not, that's not the solution you need.


Add rootflags=degraded to the grub kernel options.



