
"Issues with storage" is an understatement. Linux has major issues with storage. I see no way forward for Linux in that area unless they adopt ZFS.

How many filesystems has the Linux world created in the last 20 years? Of those, how many are rotting piles of tire fire?

Filesystems are hard to get right.

ZFS is the way forward. For cross compatibility, for reliability, for stability, for lots of use cases.

Btrfs is another tire to throw on the burning pile of other filesystems that won't work out in Linux.

Let the downvotes begin.

But Linux has some major architectural issues that eventually the Linux faithful will have to admit to.

Containers.

Filesystems.

Observability. Though, bless him, Brendan Gregg is trying his hardest to help here.

Portability.

And this is what is so painful for the rest of us to watch or deal with: this inability of the Linux folks to admit these glaring architectural problems. They refuse to look outside their bubble to see how others have already solved these problems and to adopt those solutions. They just want to keep burying their heads in NIH soil and double down on solving these issues, poorly, on their own, without any awareness of how others have solved them. People outside Linux land just might have the right ideas on how to solve these problems!

But I expect no one in that camp to acknowledge this, so downvote away.

Sigh.




I can see a lot of inflammatory content here - vague accusations about "tire fire" filesystems and how Linux users are "faithful" zealots, blind to the truth, that will have to "admit" their folly and implicitly humble themselves before "the rest of us", whoever they are. And baiting a bad response, as you ask people to "downvote away" as if you want to inspire anger. All of this is needlessly confrontational. You don't need to insult your opposition to make your point.


You may view them as inflammatory, but they come from a place of pain: having to deal with broken architecture, piss-poor portability, and the general grind of hacking around poorly engineered Linux internals.

Frustration and pain are more appropriate terms than inflammatory, from my POV. But YMMV.

It's like "Come on folks! You can do better than this! Please! Just stop! Think! Please. I'm begging you."


Regardless of whether your point is true (I agree that the state of Linux is suboptimal), this kind of attitude, in which you spend more time venting your emotions than actually describing the issues, will not help you get your point across and will not help make the world better. In the worst case you may even give BSD people a bad name by making the public perceive them as a bunch of emotional haters.


> ZFS is the way forward. For cross compatibility, for reliability, for stability, for lots of use cases.

I'm not entirely convinced we should settle on ZFS just yet. It's fantastic and quite possibly the best option right now but it has a few limitations:

- The Linux implementation seems to have issues with releasing memory from the ARC back to the system

- The only way to expand a zpool is to add a new vdev (rough sketch below). Pools are essentially a RAID0 of vdevs, so if a single vdev fails, your entire pool fails. You can mirror or RAID within a vdev, but this means the reliability of each vdev caps the reliability of your entire pool. The problem here is that you can't just add 1 or 2 new disks, since adding a 1- or 2-disk vdev would be data suicide. For smaller servers, this is silly.
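
For concreteness, here's a rough sketch of what expansion looks like today (the pool and device names are made up):

    # the only supported way to grow the pool: add a whole new vdev
    zpool add tank mirror /dev/sdc /dev/sdd

    # adding a single bare disk also "works", but it becomes a
    # non-redundant vdev the whole pool then depends on; zpool
    # warns about the mismatched redundancy and demands -f
    zpool add -f tank /dev/sde

    zpool status tank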

BTRFS looked like it was getting there but it proved buggy and unreliable. Personally, I'm waiting for bcachefs: https://www.patreon.com/bcachefs


There's nothing wrong with a 2-disk vdev. Mirrored instead of RAIDZ is probably the best choice for "smaller servers."
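
A minimal sketch of that layout, with made-up device names:

    # pool built from 2-disk mirror vdevs; it grows a pair at a time
    zpool create tank mirror /dev/ada0 /dev/ada1 mirror /dev/ada2 /dev/ada3

    # later, add another pair
    zpool add tank mirror /dev/ada4 /dev/ada5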


I strongly disagree. If I have an 8-disk server at home with mirroring, the pool is only guaranteed to survive a single disk failure, and the usable space would be 50% of the raw capacity. Further, I'd have to buy any extra disks in pairs.

If instead I ran RAID6, I'd have 75% of the disk space available and I could add disks in single-disk increments.

I think ZFS makes great sense for businesses that can throw money at disks but for smaller businesses or home servers it's kinda bad.


I don't think it's a very good idea to design storage solutions around how inexpensive it is to add capacity to them. A 9+-drive RAID6 is going to take forever to rebuild; 4+ mirrored vdevs (or mirrored RAID, of course) will not be a problem at all.


I run a FreeNAS device at home (ZFS underneath).

It was kind of a pain to configure (albeit quite flexible), but it's been pretty nice overall. It has already survived one disk failure and a capacity upgrade (during which I had to resilver after every individual disk swap, which was time-consuming, but after the last disk was upgraded, the extra space finally showed up).
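
For anyone curious, the in-place capacity upgrade looks roughly like this at the command line (FreeNAS drives it from the UI; pool and device names are made up):

    # let the pool grow once every member disk is bigger
    zpool set autoexpand=on tank

    # for each disk, one at a time: swap the hardware, then
    zpool replace tank /dev/ada0    # kicks off a resilver
    zpool status tank               # wait for it to finish before the next disk

    # after the last disk is replaced, the extra space appears
    # (zpool online -e can nudge it if it doesn't)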


You also wouldn't get the same I/O performance a striped mirror will give you. You can get extra resiliency using a three-disk mirror.

Disk is cheap. There's no reason to design like it's not.


Disk is at least ~$0.023/GB. Whether it's "cheap" or not depends on how much of it you need.

Plus, disk may be cheap, but servers to house it are not (the kind of servers you'd run in your house, I mean; I know you can get a cheap SC846 off eBay).


Servers can be inexpensive. It really depends on your use case. You don't have to drop $50K on a box to build a killer ZFS storage array.


My scale starts at 40PB with a "P". Disks aren't cheap.


I use striped mirrors for most of my ZFS deployments. Disk is cheap. If you want more resilience, do a three-way mirror.


I'd extend your point on zpool expansion to ZFS's general inflexibility when it comes to reshaping pools or rebalancing RAIDZ vdevs.

I'm sure it's not an issue in enterprise environments, but I've personally been bitten by ZFS's inability to shrink or rebalance pools (even offline) several times now in personal use.

Those sorts of use cases need to be handled if ZFS is to become a more general-purpose filesystem.
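
The only workaround I know of today is the full send/receive dance to a fresh pool, roughly (pool names made up):

    zfs snapshot -r tank@migrate
    zfs send -R tank@migrate | zfs recv -F newtank
    # verify the copy, then: zpool destroy tank

which is exactly the kind of offline shuffle a general-purpose filesystem shouldn't require.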


> Personally, I'm waiting for bcachefs

tires++

;-)


And this is what is so painful for the rest of us to watch or deal with: this inability of the Linux folks to admit these glaring architectural problems.

Alright, the suspense is killing me: which OS is the one that gets it right? I ask this of "the rest of us," as you put it.


SmartOS, FreeBSD, Tredly, etc


Tredly is "just" a FreeBSD distribution. And Linux can't have ZFS in-tree because CDDL.


It is not just a version of FreeBSD. The base is FreeBSD, but the container tooling is not stock; it adds value-added bits that are shaping up to be very nice for containers on a more stable, secure, and storage-friendly stack.


Yes, but the storage parts this discussion is about come from FreeBSD.


Ah, you must be from the alternate universe where most of us have heard of more than one of those!


What you've heard of, and what's workable and useful, do not necessarily intersect.


From the username, x86 BSD perhaps?


NixOS intrigues me as an attempt to create a completely deterministic build.


Many Linux users in particular are pretty well aware of the shortcomings of Linux's filesystem situation, as well as the fact that we never had features like DTrace. (I still think the over-proliferation of Linux tracing tools that fit 10,000 different cases is a bit ridiculous.) As someone who worked on a distro, I've been acutely aware of many of these things, as have our users (because they inevitably come to us for advice, suggestions, or bug reports).

I'm not sure what reality you live in where Linux users, like, actively deny things such as the lack of an appropriate low-latency COW filesystem, or the fact that it took us a decade to get even close to DTrace. Your own mind? Six-month Ubuntu users? Internet forums where you talk to other BSD users and nobody else?

> Let the downvotes begin.

Don't worry, I'm sure people will oblige.


I like ZFS.

However, XFS is perfectly good for storage. I've used it on 15 PB worth of fileservers with no particular nonsense.

I wouldn't touch BTRFS with a barge pole, not just because of its inherent stability issues (although that is a big factor) but because of the utter nastiness of the tooling.

ZFS is a joy to set up; it's simple and logical, and the man pages tell you useful things.

BTRFS, not so much


My favorite XFS workflow:

1. filesystem is corrupted, once again

2. try to repair it

3. oh right, the repair tool cannot replay the journal

4. try to mount it

5. admit after 3 hours of nothing that the journal-replay code triggered on mount really cannot deal with corruption either

6. reboot the server to get the filesystem unstuck

7. rerun repair, this time throwing away the journal

8. look at the empty filesystem with everything in lost+found

9. restore from backup

The team I'm on only runs 1,400 servers, yet this happens regularly.
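
For reference, step 7 is the unpleasant bit (device name made up):

    # zero the log and repair without replaying it; whatever the
    # journal would have recovered is lost or dumped into lost+found
    xfs_repair -L /dev/sdb1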


Point taken. I have had xfs_check run out of memory because $reasons; this reminded me of that. But then we also had Lustre, so, you know, by comparison XFS is a paragon of stability. (We also had clustered XFS...)

But mostly it happens because the fileservers don't have UPSes. (The pipeline tools are almost exclusively COW, and backups are tested many times a week.)

I think we paid for XFS support from either Red Hat or SGI, but I can't remember; I left that place a year or so ago.

I've never had the balls to run ZFS on large (100 TB+) arrays. Last time I tried, the way the slab handling was translated from Solaris caused many problems (but that was more than 4 years ago). Plus the support is a bit odd: you either go with Oracle (fuck that) or one of the OpenZFS lot.

To get the best performance and stability you ideally need to let ZFS do everything, instead of letting the enclosure do the RAID6 (4 x 14-disk RAID6 with 4 spares); see the sketch below. This, of course, is a break from the norm and is to be treated with the utmost suspicion.
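
A scaled-down illustration of the ZFS-native equivalent, with made-up device names (pass the disks through as JBOD and let the pool handle redundancy):

    # raidz2 vdevs plus hot spares, instead of enclosure RAID6
    zpool create tank \
        raidz2 da0 da1 da2 da3 da4 da5 \
        raidz2 da6 da7 da8 da9 da10 da11 \
        spare da12 da13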

Have you seen the GPFS RAID replacement? That's quite sexy.

I run ZFS at home, because it works, and has bitrot checking.


Ah yes, ZFS the saviour.

Just as long as it doesn't get anywhere near capacity.


I assume you are referring to the same limitations almost every FS on earth to date suffers from when you consume about 90% capacity?

What a useless comment. If you are running out of capacity, adding disks to a zpool is mind-numbingly easy:

"zfs add poolname mirror disk1 disk2"

Every filesystem's performance hits the floor when you consume all available capacity. This is not unique to ZFS; it's just reality.
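
A quick way to keep an eye on it before it becomes a fight (pool name made up):

    zpool list tank    # the CAP column shows how full the pool is; grow it well before ~90%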


No, I'm talking 100% capacity. Shit happens, and I don't want to fight the FS when it does.


100% is a lot better than ~80% (looking at you, btrfs).


Sir, you have spoken well. What OS/distro should I be looking at?


Enterprise: Solaris and use zones. Internal/Personal/POC: FreeBSD 11 and use jails.

Both get you ZFS and something lighter-weight than VMs. Need Linux? Bite the bullet and use VMs.


I agree with this. SmartOS would be my choice over Solaris. I think FreeBSD is getting there, with things like Tredly. The building blocks are solid and available; they just lack some polish.

So if you have the skill, FreeBSD. Otherwise, SmartOS makes a killer setup for containers, unmatched by anyone else.


A warning, though: don't try to use FreeBSD with SCSI tape drives. In my admittedly limited experience, based on what I found a few months ago, the driver is capable of writing data without error that it then can't read back (though Linux can read it back).

Maybe not so important in the container context, but sooner or later, somewhere you need persistence, and tape offers certain persistence features you pretty much can't get elsewhere, especially at its media price points.

(Plus, if they got this wrong, when it's really not that hard to get right (I've done SCSI at this level before), I wonder what else they got wrong.)


Well, check out hypercontainer (https://hypercontainer.io/), which runs on Linux, uses VMs, and performs like containers.

No killing yourself necessary, really ;)


Solaris is discontinued but Illumos and SmartOS have taken up the torch.


That would come as a surprise. Solaris is alive and well, thank god: https://www.oracle.com/solaris/solaris11/index.html


See? Solaris is dead


CoreOS, as stated by the roadmap of the article. /s


Init systems.


[flagged]


Just because you disagree with him? Seems like he sparked a legitimate discussion.


Genetic fallacy, in the wild!



