I think the biggest missing feature for home/casual use in XFS right now is shrinking. Currently, it's impossible to reduce an XFS filesystem (partition) in size, so you have to commit to your disk layout once you've set it up. Whether you want to install another OS side-by-side, grow a swap partition, or experiment with or gradually migrate to another filesystem - none of that is currently possible without adding additional physical storage or using loop devices.
The same applies to ZFS (it's not possible to shrink a ZFS pool), which is why I'm currently using btrfs (with all its pain points) on my machines.
I really would like more data about it. My impression is that the biggest problem btrfs suffers from nowadays is poor communication. Even the official wiki is not very up to date, and a lot of horror stories from years ago are still circulating.
It has been quite reliable for some time now (single disk, raid{0,1,10}). Moreover, the feature set of btrfs is really wide (on par with ZFS) and very flexible: you can mix and match disks of different capacities, shrink and expand pools, change their redundancy level through balance filters, and do all of it online...
Even if it doesn't break your filesystem anymore: performance is subpar in every aspect except sequential read/write, even compared to ZFS. A high snapshot count degrades performance, and there are lots of gotchas you only discover after using it for a while. RAID1 is not really RAID1 - the odd/evenness of the reading process's PID decides which disk to read from... scrub impacts I/O massively. Tooling and documentation are not exactly great... lots of quirky hacks to make up for design errors, IMHO.
If it works for you, fine. It also appears to be getting better - but I won't touch it anymore if I can avoid it.
Well, so is ZFS, but in some situations, like choosing what filesystem to use on a rented dedicated server, any choice that could lead to a situation requiring physical hardware configuration changes is a non-starter.
ZFS licensing is an issue too, as it means that you can't just boot into any ol' Linux live CD (or remotely boot into a rescue environment) to fix the system or salvage the data on it.
> ZFS licensing is an issue too, as it means that you can't just boot into any ol' Linux live CD (or remotely boot into a rescue environment) to fix the system or salvage the data on it.
Sure you can. I've done just this with 3 different live CDs: ArchLinux, FreeBSD and OpenSolaris. I'm fairly sure I've also used ZFS on the Ubuntu Desktop live CD as well, but that was just for playing rather than rescuing a degraded system.
FreeBSD and OpenSolaris probably aren't very useful when trying to rescue a Linux system. Especially if you need to chroot and run things from there. (My need so far was to rescue a non-booting system, because a zfs package upgrade went wrong and didn't update spl as well. Re-running dracut would be somewhat problematic from those two systems.)
Ubuntu desktop live CD doesn't contain zfs, you have to install it from apt.
However, if you have a ZFS system, I see no problem with having a USB stick with a minimal installation of your distro of choice, with ZFS support included. I'm glad I've had one around ever since the ZFS install.
I think you're nitpicking a little to be honest. None of those problems are hard to work around:
> FreeBSD and OpenSolaris probably aren't very useful when trying to rescue a Linux system. Especially if you need to chroot and run things from there. (My need so far was to rescue a non-booting system, because a zfs package upgrade went wrong and didn't update spl as well. Re-running dracut would be somewhat problematic from those two systems.)
I do see your point but it really depends on the problem, as not all recoveries require chroot / package management access. I've rescued Solaris (not OpenSolaris) with an OpenSuse live CD back when a cavalier op chmodded /etc. I've rescued OpenSolaris with a FreeBSD CD back when a faulty RAID controller borked the file system. As for ArchLinux ISOs, I've used them to rescue more systems than I can count. But as you said, some problems do just require booting an instance of the host OS via some means.
> Ubuntu desktop live CD doesn't contain zfs, you have to install it from apt.
It took me all of about 10 minutes to bake the ZFS driver into the ISO. It's not hard compared to the other technical challenges you've discussed. Though if that's too much effort then I think you can also just apt it from the Live CD and manually modprobe it into the running kernel.
> However, if you have a ZFS system, I see no problem with having a USB stick with a minimal installation of your distro of choice, with ZFS support included. I'm glad I've had one around ever since the ZFS install.
Indeed. My preferred method is having rescue disks available over PXE booting. Before then I was forever hunting down my recovery disks or spare USB keys / CD-Rs. Not to mention the pain involved if the system I was trying to recover was my main workstation (ie the hardware I'd normally use to download and burn CDs on).
Sure, there are very few problems that cannot be solved by throwing some time and sweat at them. However, when I do need to solve something, I prefer not to be sidetracked by sub-problems. Smooth sailing and all that.
It's much simpler to pull a USB key from the drawer or PXE boot, as you mentioned, and get on with fixing the damaged system, than to start downloading and preparing a live distro somewhere.
Again, you're overstating things. If it genuinely takes you more than a couple of minutes to run apt and modprobe then I really think you shouldn't be allowed anywhere near a degraded system to begin with. These aren't "sub-problems" - they're the absolute basics of system administration.
It's a bit more than a couple of minutes to download an installer, install it somewhere (the live CD doesn't have a persistent /), install zfs there, and only then get on with whatever you were doing.
Compared to grabbing standard media you have somewhere, it will take at least 15 minutes extra.
Knowing the basics of system administration doesn't mean you should waste your time on them, especially on something you can do without.
> It's a bit more than a couple of minutes to download an installer, install it somewhere (the live CD doesn't have a persistent /), install zfs there, and only then get on with whatever you were doing.
You don't need a persistent root. I'd already addressed that point. Just run modprobe and you're done.
> Compared to grabbing standard media you have somewhere, it will take at least 15 minutes extra.
Bullshit. I've done exactly what I described and it did not take me 15 minutes. Furthermore, all you're doing is pre-emptively pushing the work to before your outage - which you could do just the same with the ISO (if you really wanted to compare apples with apples).
> Knowing the basics of system administration doesn't mean you should waste your time on them, especially on something you can do without.
The whole point of this tangent was about when one needs a live CD. Not about whether creating a live CD is worthwhile when you already have a USB key. That new argument you've invented is stupid because the answer is quite clearly "use the USB key if that's already in your drawer." But what happens if you have a ZFS volume on a system and you don't already have recovery media? (ie the original question) Well in that case you can use any of the methods I described. Or, of course, you can create a USB key too. But that will take just as long as the methods I described anyway (you still have to download the OS image and ZFS drivers and write them all to your storage medium, so all you're really doing is swapping one chunk of plastic for another chunk of plastic).
> You don't need a persistent root. I'd already addressed that point. Just run modprobe and you're done.
That assumes too much. For example, that you have a network connection while booted from the live media. You may not have one; then you cannot run apt/yum and you need persistent media that you prepared somewhere else. (Happened to me).
> Bullshit.
Surely. Or you have extra-speedy USB keys. Just installing a minimal distro on USB takes a good chunk of that time.
> The whole point of this tangent was about when one needs a live CD.
When you are doing something non-standard - and installing ZFS on Linux is pretty nonstandard - you know in advance that the normal live media won't work. It's prudent to have something prepared for if/when an SHTF event occurs.
Specifically with regard to filesystems: when you are installing with the root on a filesystem the distro doesn't provide, you need to make such media anyway, just to install the system in the first place. So instead of throwing it away, just label it and put it into the drawer. (When you are not installing on a non-distro filesystem root, you don't need support for that fs in the live media at all; the standard one will do for making the system boot.)
You've been assuming a crap load of stuff as well when it suits your argument. Like having a pre-prepared USB key to begin with.
> For example, that you have a network connection while booted from the live media. You may not have one; then you cannot run apt/yum and you need persistent media that you prepared somewhere else. (Happened to me).
Indeed. You might also not have a CD drive on the host (happened to me), or any blank CD-Rs, or a CD burner on your workstation. Or the internet connection might not work on your workstation either. But then most of those arguments can be made for creating a USB key as well so your point is moot. In fact my latest workstation (Macbook Pro) only has USB-C so I couldn't use my USB keys when I went to install Linux on that.
My point is, if you're looking for ways to nitpick, there are plenty for your examples as well. In fact there will be a thousand different exceptions for any solution you could dream up. Thus is the nature of working in IT.
> Just installing a minimal distro on USB takes a good chunk of that time.
Arguably yes but that also takes longer and your original point was about getting stuff done as quickly as possible. So you're now contradicting yourself.
> When you are doing something non-standard - and installing ZFS on Linux is pretty nonstandard - you know in advance that the normal live media won't work.
Except the whole point of this tangent is me demonstrating where it does work.
> It's prudent to have something prepared for if/when an SHTF event occurs.
Now you're arguing a different point from the one I was discussing. I'm not going to disagree with you there (since I've already mentioned I run a PXE server for situations like these) but that wasn't the topic we were discussing.
I seriously just think you're now just arguing for the sake of winning an internet argument. I'm not going to argue with you that a CD is better than USB because it's pretty obvious that isn't the case. But that wasn't the point I was discussing. So for the benefit of my own sanity can we please get back on topic: you can use live CDs to repair a degraded system running ZFS. Sure there will be occasions when you cannot; but that's the case when doing anything in IT (and thus why us sysadmins get to command such a good wage). But generally you can. And I literally have. Many times in fact. So enough with the dumb "death by a thousand paper cuts" and goal post moving arguments please.
> You've been assuming a crap load of stuff as well when it suits your argument. Like having a pre-prepared USB key to begin with.
You are still conveniently ignoring what I said: if you want to install a system with a ZFS root, you have to make such media. That's also the reason why I have it. I just didn't throw it away after the installation.
> Except the whole point of this tangent is me demonstrating where it does work.
Yes, if everything is aligned right, it can work.
> I seriously just think you're now just arguing for the sake of winning an internet argument.
You are free to think whatever you want.
> you can use live CDs to repair a degraded system running ZFS.
Yes, under certain conditions. How they apply in your environment is up to you to assess.
> Sure there will be occasions when you cannot; but that's the case when doing anything in IT (and thus why us sysadmins get to command such a good wage). But generally you can. And I literally have. Many times in fact. So enough with the dumb "death by a thousand paper cuts" and goal post moving arguments please.
It's not goal post moving, it's what happens. Having a live CD that supports your configuration is advantageous compared to not having one. Being able to download a ready-made one is advantageous compared to having to make it. Etc.
So when I can choose between a FreeBSD or OpenSolaris ISO and a native system that fully supports whatever I need (that was the original issue, remember?), of course I will choose the latter - or at least having the latter available is preferable.
> You are still conveniently ignoring what I said: if you want to install a system with a ZFS root, you have to make such media. That's also the reason why I have it. I just didn't throw it away after the installation.
I'm not ignoring it; I've repeatedly addressed it and pointed out how it's not true (the Ubuntu Desktop example). Want a few more examples? When I installed ArchLinux with a ZFS root I didn't use a custom ISO (read their ZFS wiki if you don't believe me). I also didn't create a custom Ubuntu Server ISO when I installed that with a ZFS root. Both were installed from CD - the vanilla CD available on their respective websites.
Also, even if you did install from a USB key, what's to say you don't then lose said key afterwards? I'm forever losing them.
The point is whichever argument you're going to make will be full of more exceptions than you can count. So nitpicking one over the other, like you are, is an utterly pointless exercise and a distraction from the original point I was making.
> So when I can choose between a FreeBSD or OpenSolaris ISO and a native system that fully supports whatever I need (that was the original issue, remember?)
No, that wasn't the original issue. The original issue was whether there are any live CDs that can be used to rescue a degraded ZFS system - which I've demonstrated there are.
However I do agree with you that running ZFS on Linux is a little pointless when FreeBSD and the OpenSolaris forks are all solid platforms and have unencumbered native ZFS support. Though installing a ZFS root on FreeBSD was just as painful as doing so on ArchLinux (at least that was the case a few versions ago - things might have improved since, but thankfully FreeBSD never really needs rebuilds so I've not had to revisit that particular pain point).
Mind that it doesn't work quite as well as ZFS: btrfs send/receive doesn't produce a filesystem identical to the source. ctimes aren't preserved, nor are some attributes, and depending on mount options, ACLs and xattrs can vanish too.
The two most obvious reasons to assume so are that btrfs is in comparison relatively new and quite complex. Not to say zfs isn't complex, but I'd rather trust zfs just because of its age.
That being said, unless I really need those specific features, I go for ext4 whenever possible, as it has to be the most battle tested one, at least when it comes to *nix. It also seems that fsck.ext4 has almost magical powers sometimes, but that shouldn't stop you from making backups obviously.
> I think the biggest missing feature for home/casual use in XFS right now is shrinking. .... The same applies to ZFS (it's not possible to shrink a ZFS pool)
I have to admit that recently I've only had the exact opposite use-case: Wanting to expand volumes.
Except for authoring install-media or images where you want to reduce the final file-size... What common use-cases are there for volume-reductions?
Since XFS spreads multiple allocation groups across the volume, shrinking would require migrating data out of the area you want to reclaim, which would mean a pretty significant reshuffle of the volume contents.
It's easier to just back up and refill with a single set of reads and a single set of writes than it is to suffer the huge overhead of all that random metadata updating and seeking.
You mean, by overcommitting the filesystem size, and using discard to free up underlying space?
Possibly... how well does that work in practice? What happens when LVM runs out of space due to some runaway disk-filling process? Is the filesystem ready to handle out-of-space errors coming from LVM in all situations?
There are 3 filesystems that can do transparent compression: NTFS, ZFS, btrfs. One is for Windows, another isn't available for RedHat/CentOS, and the third is being sunset. XFS could really benefit from compression support.
ZFS is available on RHEL/CentOS. Not as a first-party option, but thanks to the kABI guarantees that Red Hat provides, the ZoL project also ships it as a prebuilt binary kernel module, ready to go, without having to fumble around with DKMS and compilers.
Btrfs is being deprecated by RHEL only, because they have XFS expertise but not Btrfs expertise - no need to split limited resources. Other distributions will continue with development; SUSE doesn't intend to stop.
The general mix has some potential for compression.
Looking at one machine available:
* / has 2.08 compression ratio,
* /var/cache 2.15
* /var/tmp 1.0
* /home 1.02
* /srv 1.10 (few web apps, pgsql instance for these web apps, svn repo, samba shares, local prometheus store).
If your server uses spinning rust, you can also increase I/O throughput by using compression. You are trading CPU time for I/O bandwidth. Depending on your workload, it may be a sensible trade-off.
I was thinking more in terms of application data - thank you for giving me this perspective.
But I'm still wondering: does the amount of this data really warrant compression? I mean, the smallest sensible size for an SSD is around 100GB, and it has lots of performance.
In this case, the svn repos and samba shares are way over 100GB and even with classic HDDs, the machine can saturate the network and still be basically idling. No need for SSDs, classic drives were more cost effective.
The compression was just something nice to have. LZ4 in the filesystem is basically free.
Logfiles. ZFS with lz4 has given me compression ratios of over 100× with /var/log, due to the huge amount of repetition.
My first test of PostgreSQL on ZFS was also quite instructive. lz4 again achieved respectable compression (>10× for my datasets) and improved throughput several fold (with no other tuning!).
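If anyone wants a quick feel for how compressible their own data is before committing to a compressed dataset, here's a rough sketch of mine (not anything measured by the posters above): it samples files under a directory and compares raw size to compressed size. It uses zlib from the Python standard library as a stand-in, since lz4 isn't in the stdlib - zlib usually compresses somewhat better but much slower than lz4, so treat the numbers as a ballpark rather than a prediction of ZFS's compressratio.

    import os
    import zlib

    def compress_ratio(root, sample_bytes=1 << 20):
        """Rough estimate of how compressible a directory tree is.

        Reads up to sample_bytes from each regular file and compares the
        raw size against the zlib-compressed size. Unreadable files are
        skipped. Returns raw/compressed (higher = more compressible).
        """
        raw = packed = 0
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                if not os.path.isfile(path):
                    continue  # skip sockets, FIFOs, etc.
                try:
                    with open(path, "rb") as fh:
                        chunk = fh.read(sample_bytes)
                except OSError:
                    continue  # permission errors, vanished files
                if chunk:
                    raw += len(chunk)
                    packed += len(zlib.compress(chunk, 6))
        return raw / packed if packed else 1.0

    if __name__ == "__main__":
        print("/var/log compresses roughly %.1fx" % compress_ratio("/var/log"))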
I've been having some fun with a revived SGI Octane running IRIX and it occurred to me, quite out-of-the-blue, that XFS development essentially ceased before SSDs were ever even contemplated. For a few moments, I pondered the apparent profundity of this realisation, and then I moved on, cursing the lack of package management while trying to get something to work.
I worked down the hall from the XFS team at SGI in 2005. Development had definitely not ceased, and their work did not seem heavily related to Octane systems. They were focused on XFS for Suse and RedHat Linux for enterprise and supercomputing.
Ah, you're right up to a point. Around 2014 there was a great flurry of work done, which improved its metadata speed by more than 50x, turning it from a great streaming filesystem into a good all-rounder.
It was a bit before that, IIRC. There's an LWN article summarizing the changes at [1]; they're mostly delaylog, if we are referring to the same thing. It was default-enabled in 2011, on Linux 3.3.
... I’ve happened to spend much of the last three weeks learning about XFS.
Basically copying files without actually copying them. For example, when making versioned backups, or when copying a large number of files from production to various development environments.
Something I wish there was a better solution for. Perhaps there is a filesystem out there that supports CoW in a way that fits this use case (XFS perhaps even), but I haven't looked into it.
The disadvantage of using hardlinks is that you can't hardlink between users (the owning user is a property of the file, not of the link to the file), and there's always the danger that a write takes place through one of the links. IMHO, that should really be solved at the filesystem level using a CoW scheme.
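For what it's worth, that filesystem-level CoW scheme already exists on Linux in the form of reflinks: btrfs has had them for a long time, and XFS supports them when the filesystem is made with reflink=1 (the default in recent xfsprogs, as far as I know). cp --reflink=always uses it; from code it's the FICLONE ioctl. A minimal sketch, with made-up example paths:

    import fcntl

    # FICLONE from linux/fs.h: _IOW(0x94, 9, int)
    FICLONE = 0x40049409

    def reflink_copy(src_path, dst_path):
        """Create dst_path as a CoW clone of src_path.

        The two files share data extents until one of them is written to,
        so the "copy" is instant and takes no extra space up front. Raises
        OSError on filesystems without reflink support or across
        filesystem boundaries.
        """
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())

    # Example (hypothetical paths): clone a large dump instead of copying it.
    # reflink_copy("/srv/prod/dump.sql", "/srv/dev/dump.sql")

Unlike a hardlink, each clone is its own inode, so the copies can have different owners, and a write through one copy never shows up in the other - the filesystem copies the affected extents at that point.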
> Basically copying files without actually copying them. For example, when making versioned backups, or when copying a large number of files from production to various development environments.
To me it sounds like you want simple snapshots and backups without the redundancy at the storage level.
So why not use a filesystem which supports that natively, like ZFS or Btrfs?
That's one backup every hour for 7+ years. I'm really curious about the use case.
Anyway, theoretically[0]:
> As you might know btrfs treats subvolumes as filesystems and hence the number of snapshots is indeed limited: namely by the size of files. According to the btrfs wiki the maximum filesize that can be reached is 2^64 byte == 16 EiB
But practically it seems you hit the mud at ~100 snapshots[1] - but be sure to read the reply to that mail, as it will depend on the use case and it might turn out to be fine way beyond that.
git clone uses hardlinks to reuse object storage if you clone a local repository. I use it when I need another working copy; that way you save some space, and time as well, since you don't need to pull it all from the network.
edit: Local in this case means cloning inside one filesystem. Hardlinks cannot span filesystem boundaries of course.
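If you want to see that sharing for yourself, here's a small sketch (the repo paths are made up) that counts how many files under .git/objects two local clones share, by comparing device and inode numbers:

    import os

    def shared_object_files(repo_a, repo_b):
        """Count files under .git/objects that two clones share via
        hardlinks (i.e. same device number and same inode number)."""
        def inode_set(repo):
            seen = set()
            obj_dir = os.path.join(repo, ".git", "objects")
            for dirpath, _dirs, files in os.walk(obj_dir):
                for name in files:
                    st = os.stat(os.path.join(dirpath, name))
                    seen.add((st.st_dev, st.st_ino))
            return seen
        return len(inode_set(repo_a) & inode_set(repo_b))

    # After `git clone /srv/repo /srv/repo-copy` on the same filesystem,
    # most object files should show up as shared:
    # print(shared_object_files("/srv/repo", "/srv/repo-copy"))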
Just out of curiosity, what is the use case for having multiple local clones? Is it to test run pruning or manipulate the reflog? Or is it for developing on git itself?
Why? Isn't the whole idea of using git that changing branch is just as simple/fast as doing "cd"?
You're probably not doing your ongoing work in the branch you are merging for release anyway; if you did, you'd have to solve any conflicts all over again when you switch clones.
> When a CoW filesystem writes to a block of data or metadata, it first makes a copy of it
Is this really a precise description? I was sure that (for data) an actual copy only happens in the case of a block with multiple references. If there was a single user of a block, I expected it to be modified in place in most real-world filesystems. The summary on Wikipedia seems to confirm that.
Am I missing something, or is that just unfortunate wording?
What they describe - writing the data and then writing the indexes as new elements - sounds more like a log-structured filesystem.
No, the copy is generally done every time. In-place writes cause all kinds of problems with atomicity, ordering, torn writes, etc. Also, for ZFS/btrfs/bcachefs style filesystems the pointer contains a checksum of the block and thus it needs to be updated on every write.
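To make that concrete, here's a toy sketch - nothing like how a real filesystem lays data out on disk, just the shape of the idea: a write never touches the old block, it allocates a new one and hands back a new pointer carrying the new checksum, while any old pointer (say, one held by a snapshot) keeps referencing the old, untouched block.

    import hashlib

    class ToyCoWStore:
        """Toy model: blocks are append-only, never rewritten in place."""

        def __init__(self):
            self.blocks = {}   # block id -> bytes
            self.next_id = 0

        def write(self, data):
            """Allocate a fresh block and return a new checksummed pointer."""
            block_id = self.next_id
            self.next_id += 1
            self.blocks[block_id] = data
            return {"block": block_id,
                    "checksum": hashlib.sha256(data).hexdigest()}

        def read(self, pointer):
            data = self.blocks[pointer["block"]]
            assert hashlib.sha256(data).hexdigest() == pointer["checksum"]
            return data

    store = ToyCoWStore()
    v1 = store.write(b"hello")            # original version
    v2 = store.write(b"hello, world")     # "overwrite": really a new block
    assert store.read(v1) == b"hello"     # old pointer (snapshot) still valid
    assert store.read(v2) == b"hello, world"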
There are lots of things called copy-on-write where CoW really means copy-if-shared-otherwise-update-in-place. Like the qcow2 disk image format. Or the Cow type in Rust.
tl;dr XFS will support using filesystem images as if they were directories, kind of like an "internal loop", which will allow (with the help of copy-on-write data, which they support in recent versions) having subvolumes/snapshots.
It might just be how you present it, but to me that sounds like using multiple layers of hacks to implement use-cases other filesystems were carefully designed to support in the first place, and that sounds extremely unreliable and brittle.
Yeah, these subvolumes are going to have scalability issues compared with cleanly designed subvolumes such as in ZFS. But I wouldn't describe them as a hack - it's a rather interesting feature that no other filesystem has explored before. I would describe it as "loop devices done well". I don't think reliability will be a problem; for the upper layer these embedded filesystems are in fact just files.