Demystifying OpenZFS 2.0

todd8 · on Nov 19, 2021

This is a great discussion on ZFS. I thoroughly enjoy reading HN! For those who aren't real IT people here is how I set up my first ZFS system at home.

My background includes a few years at IBM as an OS architect in charge of, among other things, the file systems and network services for AIX. However, this was a long time ago (~1988). So I understood NFS, inodes, what it meant to mount a device and the classic Unix commands for managing filesystems.

The first ZFS problem for me was just the vocabulary. What's a pool? What's a VDev? Fortunately, there are lots of high-level descriptions of ZFS on the internet that cover the high level concepts.

Getting the hardware set up was the second hurdle. There are a lot of discussions on forums that cover hardware issues and there you will find answers to many hardware issues, like: is ECC memory required, how fast does the processor need to be, are SATA drives recommended and what drive controllers are compatible with ZFS. Naturally, the answers to these questions depend on how critical is the system you are putting together. I suggest for most people to try setting up a ZFS on some old hardware just to get practice with it. I had a low end Dell tower server model that I wasn't using so it became my first FreeBSD/ZFS server. It had four hard drives.

The real difficulty for me was just the lack of a deep understanding of how ZFS works. ZFS has lots of features totally new to me so working with only the man pages for the ZFS related commands was not giving me much fun. I ended up buying two small books that I can't recommend enough, FreeBSD Mastery: ZFS and FreeBSD Mastery: Advanced ZFS by Michael W Lucas and Allan Jude, see [1] and [2].

These books are each around 200 pages long. The chapters of the first book are:

1. Introducing ZFS

2. Virtual Devices

3. Pools

4. ZFS Datasets

5. Repairs and Renovations

6. Disk Space Management

7. Snapshots and Clones

8. Installing ZFS

Each of the chapters is quite detailed, and I found these two books were just what I needed.

[1] https://read.amazon.com/kp/embed?asin=B00Y32OHNM&preview=new...

[2] https://read.amazon.com/kp/embed?asin=B01E40YIRM&preview=new...

bhouston · on Nov 18, 2021

EDIT: There is a now an incorrect response to this post of mine stating that I forgot to mention the ZIL SLOG -- but the poster doesn't understand how the ZIL SLOG actually works unfortunately. :(

I applaud OpenZFS for these major improvements, but its performance is not competitive with some of the commercial offerings, even if OpenZFS is a more easily administered solution.

The main issue with OpenZFS performance is its write speed.

While OpenZFS has excellent read caching via ARC and L2ARC, it doesn't enable NVMe write caching nor does it allow for automatic tiered storage pools (which can have NVMe paired with HDDs.)

This means that large writes, those which exceed the available RAM, will drop down to the HDD write speed quite quickly instead of achieving wire speed. Which means I get write speeds of 300 MBps versus 10 GBps wirespeed.

The solutions competitors have implemented are:

- SSD read-write caching, usually using mirrored NVMe drives to ensure redundancy.

- SSD tiering, which is including your SSDs in your storage pool. Frequently accessed data is moved onto the SSDs while infrequently accessed data is moved to the HDDs.

Without these features, my QNAP (configured with SSD tiering) kicks the pants off of ZFS on writes that do not fit into memory.

References:

[1] QNAP Tiering: https://www.qnap.com/solution/qtier/en-us/

[2] QNAP SSD Read-Write Caching: https://www.qnap.com/solution/ssd-cache/en/

[3] Synology SSD Read-Write Caching: https://kb.synology.com/en-my/DSM/help/DSM/StorageManager/ge...

walrus01 · on Nov 18, 2021

> Without these features, my QNAP (configured with SSD tiering) kicks the pants off of ZFS on writes that do not fit into memory.

> Synology

I am perfectly willing to sacrifice some performance in exchange for not being locked into a vendor-specific proprietary NAS/SAN hardware platform or closed source software solution that's proprietary to that one vendor.

danudey · on Nov 18, 2021

While I like the idea of moving to a proper self-managed system, my Synology NAS is the least proprietary of all the proprietary systems I've used.

That is to say that yes, it does have a bunch of proprietary stuff on top, but it also:

1. Runs Linux

2. Uses Linux systems, like BTRFS and LVM, to pool data

3. Provides SSH access

4. Has tons of packages that you might want to use (Squid, Node, Plex, etc.) available

5. Supports Docker, among other things, so you can run what you like.

So yeah, they have a lot of proprietary offerings on top of a generic Linux storage solution, but if you change your mind you can take that generic Linux storage solution somewhere else without being locked in to anything Synology provides. The only caveat is if you choose to use their all-the-way proprietary systems, like their Google Docs equivalent or their cloud music/drive/etc. stuff, all of which is still stored on the same accessible drives. You might lose your "music library" or "photo library" or "video library", but your music and photo and video files will still be accessible.

xoa · on Nov 18, 2021

>The main issue with OpenZFS performance is its write speed.

At first I thought you were talking about actual raw read/write speed and how things like ARC or write caches can actually become bottlenecks when using NVMe storage, which can easily get to 200 Gbps with a small number of devices. That's being worked on via efforts like adding Direct IO.

Instead though I think you've fallen into one of those niche holes that always takes longer to get filled because there isn't much demand. Big ZFS users simply have tons and tons of drives, and with large enough arrays to spread across even rust can do alright. They'll also have more custom direction of hot/cold to different pools entirely. Smaller scale users, depending on size vs budget, may just be using pure solid state at this point, or multiple pools with more manual management. Basic 2TB SATA SSDs are down to <$170 and 2TB NVMe drives are <$190. 10 Gbps isn't much to hit, and there are lots of ways to script things up for management. RAM for buffer is pretty cheap now too. Using a mirrored NVMe-based ZIL and bypassing cache might also get many people where they want to be on sync writes.

I can see why it'd be a nice bit of polish to have hybrid pool management built-in, but I can also see how there wouldn't be a ton of demand to implement it given the various incentives in play. Might be worth seeing if there is any open work on such a thing or at least feature request.

Also to your lower comment:

>I am using TrueNAS for my ZFS solution and it doesn't offer anything out of the box to address the issues I am bringing up.

While not directly related to precisely what you're asking for, I recently started using TrueNAS myself for the first time after over a decade of using ZFS full time and one thing that immediately surprised me is how unexpectedly limited the GUI is. TONS and tons of normal ZFS functionality is not exposed there for no reason I can understand. However, it's all still supported, it's still FreeBSD and normal ZFS underneath. You can still pull up a shell and manipulate things via the commandline, or (better for some stuff) modify the GUI framework helpers to customize things like pool creation options. The power is still there at least, though other issues like inexplicably shitty user support in the GUI show it's clearly aimed at home/SoHo, maybe with some SMB roles.

bhouston · on Nov 18, 2021

You are correct -- I am in a niche area. I have less than 12 HDDs in a RAIDZ2 configuration. If I had more HDDs in the RAID I would see better write speeds. But I think most home users with TrueNAS ZFS do not have massive arrays, thus my envy of the tiering or true SSD caching.

You are correct that prices of small SSD are near HDD prices, but most people would want larger drives like 12 to 16TB drives in their array, and SSDs can not compete with these on price at all.

Maybe we are in the twilight of HDDs? And SSDs will compete on price across the whole storage capacity range soon? Maybe...

rsync · on Nov 18, 2021

"You are correct -- I am in a niche area. I have less than 12 HDDs in a RAIDZ2 configuration. If I had more HDDs in the RAID I would see better write speeds."

That is incorrect.

RaidZ(x) vdev write speed is equal to the write speed of a single drive in the vdev.

If you want a raidZ zpool to have faster writes, you add a second vdev - that would double your write speed.

A wider vdev, on the other hand (15 or 18 drives, etc.) would not have faster write speeds.

walrus01 · on Nov 18, 2021

another consideration is that the "cheap" 4TB SSDs are things like quad-level cells, such as the samsung 870 QVO, which are slightly better in a $ per GB ratio, but also suffer in ultimate GB written lifespan for cell endurance wear-out. Thus the reason why the samsung "pro" are almost double the cost.

if you needed to write a LOT (let's say something like a sustained 70MB per second all the time) but the data rate was not particularly high, you might easily wear out a RAIDZ2 composed of 2TB-4TB sized cheap SSDs and kill them in a fairly short period of time, when the same RAIDZ2 composed of 2.5" 15mm height 5TB spinning drives, or 3.5" 14-16TB drives would successfully run for many years.

also re: cheap SSDs, the quad level cell tech and write cache is a big limitation in sustained write speeds:

https://www.firstpost.com/tech/news-analysis/samsung-870-qvo...

vladvasiliu · on Nov 19, 2021

> But I think most home users with TrueNAS ZFS do not have massive arrays, thus my envy of the tiering or true SSD caching.

I may be heavily influenced by my own situation, but in France, SSDs still are much more expensive than HDDs. I've just bought four IrownWolves, €100 for 4 TB. The cheapest 2 TB SSD I can remember was around €180.

My point is that given the price and the usage profile, I kind of agree with the other poster: I don't have a use for tiered ZFS and would much rather they spent the time on other things.

As it is, I'm already doing a tiered storage of sorts: data that I'm currently working with requiring fast i/o is directly attached to my computer. "Cold" or less i/o sensitive data is plenty fast on spinning drives across the network.

As a home user living in an apartment, this affords another optimization: kick the NAS out and access it over the internet, because gigabit speeds become sufficient.

laumars · on Nov 18, 2021

What larger HDDs would you recommend? I need to upgrade a bunch of 3TB and 5TB drives that have been running for several years but really concerned about failure rates. I used to swear by WD Red however I’m hearing some horror stories about them in recent years.

blancNoir · on Nov 18, 2021

https://www.backblaze.com/blog/backblaze-drive-stats-for-q3-...

walrus01 · on Nov 18, 2021

> Basic 2TB SATA SSDs are down to <$170 and 2TB NVMe drives are <$190.

You sure wouldn't want to use them in any write heavy zfs application however, because you'll kill them quick in their ultimate TB write endurance. Those are triple level and quad level cell SSDs. Double the dollar figures there for anything with a ultimate lifespan TB written figure that you would want to try in a ZFS array.

tw04 · on Nov 18, 2021

>The main issue with OpenZFS performance is its write speed.

>While OpenZFS has excellent read caching via ARC and L2ARC, it doesn't enable NVMe write caching nor does it allow for automatic tiered storage pools (which can have NVMe paired with HDDs.)

Huh? What are you talking about? ZFS has had a write cache from day 1: ZIL. ZFS Intent Log, with SLOG which is a dedicated device. Back in the day we'd use RAM based devices, now you can use optane (or any other fast device of your choosing including just a regular old SSD).

https://openzfs.org/w/images/c/c8/10-ZIL_performance.pdf

https://www.servethehome.com/exploring-best-zfs-zil-slog-ssd...

magicalhippo · on Nov 18, 2021

> Huh? What are you talking about? ZFS has had a write cache from day 1: ZIL.

The ZIL is specifically not a cache. It can have cache-like behavior in certain cases, particularly when paired with fast SLOG devices, but that's incidental and not its purpose. Its purpose is to ensure integrity and consistency.

Specifically, the writes to disk are served from RAM[1], the ZIL is only read from in case of recovery from an unclean shutdown. ZFS won't store more writes in the SLOG than what it can store in RAM, unlike a write-back cache device (which ZFS does not support yet).

[1]: https://klarasystems.com/articles/what-makes-a-good-time-to-...

y04nn · on Nov 18, 2021

That explains the results I've got last week when I was doing some tests on an slow external HDD and a SLOG file placed on my internal SSD, the write speed dropped much faster that I expected. I found it disappointing that ZFS needs to keep the data in RAM and on the SLOG dev. You really need a lot of RAM to make full use of it. And a larger SLOG dev than RAM space available makes no sense then? A real write cache would be awesome for ZFS.

magicalhippo · on Nov 18, 2021

> And a larger SLOG dev than RAM space available makes no sense then?

Exactly, and it's even worse. By default ZFS only stores about 5 seconds worth of writes in the ZIL, so if you have a NAS with say a 10Gbe link that's less than 10GB in the ZIL (and hence SLOG) at any time.

> A real write cache would be awesome for ZFS.

Agreed. It's a real shame the write-back cache code never got merged, I think it's one of the major weaknesses of ZFS today.

rincebrain · on Nov 19, 2021

I haven't tried Nexenta in a long long time, but it seems like it was part of their allocation classes code [1]. The comment at the top of the file says the implementation there doesn't support WBC, but [2] seems to suggest otherwise...

So if you really wanted to, you could try looting code from there, though I believe Nexenta's implementation of that was either orthogonal or significantly different from the one that landed in OpenZFS...

[1] - https://github.com/Nexenta/illumos-nexenta/blob/release-5.3/...

[2] - https://github.com/Nexenta/illumos-nexenta/blob/release-5.3/...

bhouston · on Nov 18, 2021

That you for being right!

So many people misunderstand the ZIL SLOG, including the guy you are responding to.

I think most posts on the internet about the ZIL SLOG do not explain it correctly and thus a large number of people misunderstand it.

magicalhippo · on Nov 18, 2021

Yes it took me some time to fully grasp how the ZIL and SLOG worked and how it is not a cache.

There was some work done[1] on proper write-back cache for ZFS by Nexenta several years ago, but it seems it either stalled or there was a decision to keep it in-house as it hasn't made its way back to OpenZFS.

[1]: https://openzfs.org/wiki/Writeback_Cache

bhouston · on Nov 18, 2021

> Huh? What are you talking about? ZFS has had a write cache from day 1: ZIL. ZFS Intent Log, with SLOG which is a dedicated device. Back in the day we'd use RAM based devices, now you can use optane (or any other fast device of your choosing including just a regular old SSD).

You are spreading incorrect information, please stop. You know enough to be dangerous, but not enough to actually be right.

I've benchmarked ZIL SLOG in all configurations. It doesn't speed up writes generally. It speeds up acknowledgements on sync writes only. But it doesn't act as a write through cache in my readings and in my testing.

What it does is it allows a sync acknowledgements to be sent once a sync write is written to the ZIL SLOG device.

But it doesn't actually use the ZIL SLOG for reading at all in normal operation, instead it uses the in-memory RAM to cache the actual write to the HDD-based Pool. Thus you are still limited to RAM size when you are doing large writes -- you may get acknowledges quicker on small sync writes, but your RAM limits the size of large write speeds because it will fill up and have to wait for it to be saved to the HDD to accept new data.

Here is more data on this:

https://www.reddit.com/r/zfs/comments/bjmfnv/comment/em9lh1i...

tw04 · on Nov 18, 2021

You've modified this response 4 times and counting to try to twist my response into something it wasn't and isn't so I'll respond once and be done with the conversation. I never said ZIL is a write-back cache and neither did you. Sync writes are the only writes you should be doing if you care about your data in almost all circumstances.

bhouston · on Nov 18, 2021

A ZIL SLOG doesn’t even speed up sync writes if your writes do not fit into RAM. It isn’t a write cache either, it is a parallel cache for integrity purposes so sync acknowledgements happen faster. But you are still limited by RAM size for large write speed as I said in my original post.

reacharavindh · on Nov 18, 2021

Isn't that what ZIL(Intent Log)/SLOG is for? I have a huge ZFS server (> 1 PB usable ZFS storage) that runs with 768 GiB of memory and 480 GB of Intel Optane as SLOG. All our writes are sync by virtue of using NFS from our compute cluster. I'm sure we see better write speeds, and not wait on spinning disks for write IOs.

formerly_proven · on Nov 18, 2021

Are you using a single ZFS server as the primary storage system for the cluster? If so, how do you handle availability? Do nodes have local scratch space or is it a do-everything-on-the-cluster-FS kind of setup? How is this setup treating you (administration, "oh no something is slow / hanging / broke, whatdoido", availability, performance, especially on those heavy days)? Is this some kind of COTS solution supported by one of the usual vendors or did you roll it yourself? Being ZFS I'm guessing it is far better at dealing with small files, and with directories containing many files than clustered FS which fall over pretty easily in these cases?

e: Not doing market research here, I'm just seeing (sometimes, feeling) all the pain involved with cluster filesystems and wondering how your solution stacks up.

reacharavindh · on Nov 18, 2021

Yes. Single ZFS server attached to 3 JBODs(60 disks each). Well, single ZFS server per namespace. We have 3 similar configurations for logically separate storage needs.

Yes, nodes have NVMe SSD local scratch space for fast compute needs. Large datasets reside in central storage. Smaller chunks are usually brought to local scratch to let compute jobs do their thing. Results go back to central storage again.

I built it for our needs based on freely available docs and community support. The server runs FreeBSD + ZFS + NFS and nothing else. Snapshots from this primary storage gets shipped over to a secondary location that has a slow mirror(similar setup but with cheap SATA disks). Our need is only minor geographical data durability and not lower downtime DR.

Experience: beats most commercial stuff we have run so far. Minimal setup of FreeBSD, ZFS, and NFS runs buttery smooth with almost no operational needs. I run upgrades on OS after it has been in community for a while, and it has only been smooth. We didn’t find a need to buy specialist support as our setup is minimal, and workload not any crazy, and availability targets not so strict.

Performance wise this setup far exceeded our expectations in all our usecases. We do have 768 GiB of memory for this that ZFS happily makes use of. ARC is about 60-70% of mem most of the time.

I will happily build the same again should there be a need for storage.

yjftsjthsd-h · on Nov 18, 2021

But how are you handling availability? If something happens to the hardware (ex. motherboard) or even just for installing OS updates (ex. FreeBSD security patches), do you lose storage for the whole cluster?

reacharavindh · on Nov 18, 2021

Yes. Our availability target is not high. Our priorities while designing this system were 1. Data durability - need to make sure scientific data is never lost, 2. cost - our budget is small, and we build what we can afford, 3. performance - extract most value out of the hardware we use, 4. Availability - maximise availability by virtue of having fewer things that can fail. However, fancy failover systems and distributed filesystems may be out of reach(at least for now).

So, we make do with what we have. We have three similar storage configs(and three mirrors as slow backups). We carry spare memory sticks, Optane drives, disks, and power supply for minimising down time due to hardware failures. If something were to go wrong for which we don't have spares, yes, the storage will be offline until we can fix it.

Planned updates? - we have two scheduled downtimes in a year. All patches to these machines happen at that time. These machines are strictly internal and don't have internet access except for those planned maintenance days.

Datagenerator · on Nov 18, 2021

>7PB on ZFS in production here. No vendor support, FreeBSD with send/receive pairs to another 7PB location monitoring with Grafana, delayed sync so we always have proper DR available.

reacharavindh · on Nov 18, 2021

Similar setup here. Do you have any pointers on your monitoring setup? I’m working on settling it up right next week, and could use a nice reference.

magicalhippo · on Nov 18, 2021

> Isn't that what ZIL(Intent Log)/SLOG is for?

No, the ZIL is just for integrity[1].

[1]: https://news.ycombinator.com/item?id=29266628

tw04 · on Nov 18, 2021

That's not true. When a write is committed to the SLOG, it acknowledges back to the host immediately. If you've got disk contention on the back-end, the host will absolutely see faster write response until the point your memory is 3/8th full. If you do not have an SLOG, the system will not ack to the host until it has committed the TXG to disk on a sync write, meaning any disk contention will present itself as massive latency. With 768GB of memory and a 480GB SLOG in the system that is a LOT of outstanding transactions that could be buffered.

Sure if your workload is primarily long, sustained transactions the SLOG will do little. But for any workload that is even a little bursty in nature (like hosting virtual machines) it will absolutely make a difference.

All of the above is assuming you have enabled sync=always, which I hope people are doing for production workloads unless they happen to have an app that can tolerate lost writes/partial writes.

magicalhippo · on Nov 18, 2021

The ZIL is not a write-back cache. The ZIL is for maintaining integrity. The SLOG is for speeding up the ZIL.

As such it can behave a bit like a write-back cache as I mentioned in the linked post.

A crucial difference is that the ZIL is limited by the available memory regardless (and tunables, by default 5 seconds worth of data), which would not be the case for a write-back cache device.

tw04 · on Nov 18, 2021

I don't know why you keep responding by saying it's not a write-back cache.

1. I didn't ever call it a write-back cache and never even hinted at it operating like one

2. Write-back caching is only one of many options for write caches

Regardless of both of those points, the SLOG ABSOLUTELY will speed up writes if you have a bursty workload (like the virtual machines he's running) and sync=always enabled.

>A crucial difference is that the ZIL is limited by the available memory regardless (and tunables, by default 5 seconds worth of data), which would not be the case for a write-back cache device.

And yet that point is irrelevant to OPs question of whether or not SLOG was speeding up his writes, which it almost assuredly is doing with the amount of memory and the size of his SLOG device when he is doing sync writes.

magicalhippo · on Nov 18, 2021

Fair enough, I should have just called it cache.

The point remains though. The ZIL is not meant to be a cache. The SLOG (which is a "ZIL on a separate device") could be viewed as a cache, but it's a cache for the ZIL. Thus being a cache for writes is incidental, not the primary purpose.

Certainly, if your workload aligns with those conditions, then you will experience a performance boost. For other workloads it it won't help and for certain workloads it can even be detrimental[1].

[1]: https://www.reddit.com/r/zfs/comments/m6ydfv/performance_eff...

myrandomcomment · on Nov 20, 2021

https://jrs-s.net/2019/05/02/zfs-sync-async-zil-slog/

ZIL and SLOG are NOT a write cache. They are 100% about not loosing data. A ZIL is part of a zpool where data is written to when the file protocol requires sync. When it is written the sync is cleared. In this case you are sharing the zpool with all its others needs to write and read, so more load on the pool. The data is also in RAM given to the write aggregator process. The RAM is only written latter. If the system crashes before the aggregator writes the data, on restart the data written to the ZIL is read into the aggregator and then is written out to the zpool. By default the ZIL is never read. An SLOG is just a ZIL that is on a separate device from the zpool. So a sync write comes in and is written to the SLOG and not part of the spinning rust zpool so no shared resources with the pool. The sync is marked as cleared when the data is written to the SLOG. The SLOG should be really fast as the goal is to ack the sync quickly. Remember at this point the data is NOT written to the zpool, it’s RAM and the aggregator will write the data out later. Same thing applies on crash. Reboot, SLOG is reviewed for uncleared items, sent to RAM, written to pool.

TLDR - zil/slog are write devices where the data is written when sync is required and almost never read back. Data is alway in RAM cache and written out to the spool from RAM. ZIL/SLOG is not a buffer. Speed of ack sync will be as fast as data can be written to ZIL/SLOG.

gorkish · on Nov 18, 2021

ZLOG/ZIL devices journal writes; they have always been in zfs. SSD/NVMe with power loss protection are recommended. Yes, it requires dedicated devices, but so does L2ARC.

bhouston · on Nov 18, 2021

It is commonly misunderstood that ZIL SLOGs are write-back caches, they are not. They do not speed up large writes that do not fit into memory. I have tested this myself. It seems to be an incredibly popular misconception that has spread very widely.

reacharavindh · on Nov 18, 2021

_that_do not fit into memory_ is an important caveat to your abandoning the speedup offered by SLOG for sync writes.

Case A : No SLOG. You have sync writes and they hit memory, the transaction groups get flushed to disks and _only then_ the ack goes back completing the write IO operation.

Case B : low latency SLOG. You have sync writes and they hit memory, immediately gets written to SLOG, immediately the ack goes back completing the write IO operation while batching in transaction group does the flush to disks in background(at slow disk speeds). Sure, if you are out of memory to commit those write IOs, you're going to hit the disks in hot path and get undesirable perf.

Case B surely is always better than case A wouldn't you say?

bhouston · on Nov 19, 2021

I agree with your statement above completely. You understand ZIL SLOG perfectly. I have an NVMe as my ZIL SLOG device but it really doesn’t help much for my workloads - which involve transfer of large files around.

In my scenarios were I ran out of memory on large writes and them get slow downs I would love a real write cache on an NVMe or tiered storage. It would make a huge difference.

Intermernet · on Nov 18, 2021

Is QNAP doing something outside the ZFS spec here? If not, it's almost certain that solutions like TrueNAS can replicate this behaviour. If they are, I'd be concerned about vendor lock-in and data recovery.

I'm assuming that QNAP are using ZFS, which a cursory search supports.

bhouston · on Nov 18, 2021

> I'm assuming that QNAP are using ZFS, which a cursory search supports.

QNAP has two NAS OSes right now. The newer one is QuTS and it support ZFS. The other older one is QTS which doesn't use ZFS. Only the older QTS OS supports SSD Tiering.

> Is QNAP doing something outside the ZFS spec here? If not, it's almost certain that solutions like TrueNAS can replicate this behaviour. If they are, I'd be concerned about vendor lock-in and data recovery.

I am using TrueNAS for my ZFS solution and it doesn't offer anything out of the box to address the issues I am bringing up.

lizknope · on Nov 18, 2021

I don't use ZFS or QNAP but Linux has its own native dm-cache and bcache implementations for creating hybrid volumes out of SSD plus hard drives.

I would suspect that QNAP is using one of them.

https://en.wikipedia.org/wiki/Dm-cache

https://en.wikipedia.org/wiki/Bcache

tw04 · on Nov 18, 2021

TrueNAS very much supports a write log:

https://www.truenas.com/docs/references/slog/

bhouston · on Nov 18, 2021

It is commonly misunderstood that ZIL SLOGs are write-back caches, they are not. They do not speed up large writes that do not fit into memory. I have tested this myself.

You are spreading incorrect information, please stop. You know enough to be dangerous, but not enough to actually be right.

tw04 · on Nov 18, 2021

Says that guy that has three times now claimed I called SLOG a write-back cache and yet can't seem to find that quote anywhere. I'd appreciate if you stopped putting words in my mouth and stopped editing your posts to remove your false statements. You're acting like a child.

xoa · on Nov 18, 2021

Looks like a decent basic rundown of what was another nice jump forward (2020 fwiw, 2.0 wasn't just introduced it launched about 1 year ago, current version is 2.1.1). Worth noting a number of these have significant implications for tuning performance in common use cases, so some resources from even a few years ago are now a bit out of date. I know a lot of people for example have followed JRS' "ZFS tuning cheat sheet" [0], and a lot of that still applies. But given the increasing usage of enormously faster but pricier solid state drives, commonly a relative surfeit of processing power, and the performance of ZStandard, I think most people should now consider it over lz4 by default when using solid state drives unless using a very weak system or every last cycle is needed for other tasks (VMs perhaps). A bunch of detailed performance graphs comparing different compression algorithms were posted in the ZSTD feature introduction [1] which can help give a better idea of the right balance. HN had a long discussion on the kind of improvement it can make at scale just a few days ago [2]. Persistent L2ARC has also potentially changed the math there in some cases, though that one in turn may well be obsoleted entirely for SSD pools. And not new here I think but somewhat more recent, for those who do still have things that make use of atime ZFS now includes the Linux style 'relatime' support which does a nice job of splitting the difference between noatime and full atime.

Also, since a reasonable number of people may use ZFS via TrueNAS (formerly FreeNAS), note the GUI there doesn't expose a ton of useful features itself but can still be told to do them since it handles interfacing with ZFS via a plain Python. Currently, GUI pool/fs creation stuff is handled via:

  /usr/local/lib/python3.9/site-packages/middlewared/plugins/pool.py

'options' handles pool flags and 'fsoptions' fs ones. Modifying that (reboot seems needed to make it take effect afterwards though perhaps reloading something else would do it) will then allow all the GUI setup to work and slice things the way TrueNAS wants while still offering finer grained property control. Particularly useful for things that can only be set at creation time, like pool properties such as ashift/feature flags or fs properties like utf8/normalization.

----

0: https://jrs-s.net/2018/08/17/zfs-tuning-cheat-sheet/

1: https://github.com/openzfs/zfs/pull/9735#issuecomment-570082...

2: https://news.ycombinator.com/item?id=29164727

john37386 · on Nov 18, 2021

What is the difference between Klara and IXSystems? Are they competitor, partners or something else?

magicalhippo · on Nov 18, 2021

To add a bit more, iXsystems makes enterprise NAS systems, and develops TrueNAS (formerly FreeNAS and TrueOS) which has ZFS as one of it's main features.

Klara systems is more of a development and support consultancy firm for FreeBSD and ZFS.

vermaden · on Nov 18, 2021

Just two different independent companies. Like Netflix and Apple TV for example.

willis936 · on Nov 18, 2021

Competitors, but they both participate in OpenZFS development. Yet another success story for open standards.

https://www.ixsystems.com/blog/openzfs-2-on-truenas/

thijsvandien · on Nov 18, 2021

Competing how, exactly?

willis936 · on Nov 18, 2021

They both make NAS software and have similar monetization strategies. Business to one is lost to the other.