At last I got a good grasp of ZIL and SLOG, thanks to this great article :)
I'm still confused about the ZIL in the default scenario, though. On a freshly installed ZFS system with just a number of disks and the default ZIL configuration, the ZIL seems superfluous to me, or even outright harmful! Sync writes go to the ZIL (i.e. to the pool disks), and then the same data is written to the datasets (i.e. to the pool disks again). So, double the amount of writes? That makes no sense to me, especially for SSDs, which would wear out twice as fast.
The article covers this scenario:
> If the disks that comprise the pool are sufficiently fast (e.g., high performance SSDs), the pool shouldn't noticeably bog down under heavy sync write activity.
But I can't help wondering why ZFS doesn't simply say "if the ZIL is configured, as per default, to write into the storage pool, then skip it altogether". In case of power loss, any data that hadn't yet been written to storage wouldn't have had time to be written to the ZIL either.
From recollection, so apologies if I get it wrong because I didn't go check, but...
IIRC, above a certain size you skip the double write. Part of the reason for the ZIL batching is, I believe, to group tiny sync IOs together in a persistent form that can be replayed later, so you can return sooner instead of having to do lots of random IOs before returning.
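To make that concrete, here is a toy model of the accounting, not how ZFS actually implements it. The cutoff `INDIRECT_THRESHOLD`, the `POINTER_SIZE` of a reference-only log record, and all function names are made up for illustration; the only thing taken from the above is the idea that small sync writes get copied into the log while large ones skip the second copy.

```python
from dataclasses import dataclass

# Hypothetical cutoff, for illustration only; not a value taken from ZFS.
INDIRECT_THRESHOLD = 32 * 1024
# Assumed size of a log record that only references data already on disk.
POINTER_SIZE = 128


@dataclass
class WriteCost:
    log_bytes: int   # bytes written to the intent log before the call returns
    pool_bytes: int  # bytes written to their final location in the pool


def sync_write_cost(size: int) -> WriteCost:
    """Toy accounting of where the bytes of one sync write land."""
    if size > INDIRECT_THRESHOLD:
        # Large write: the data goes to its final location once,
        # and the log only records a reference to it.
        return WriteCost(log_bytes=POINTER_SIZE, pool_bytes=size)
    # Small write: the data is copied into the log now and written
    # again to its final location by a later batched flush.
    return WriteCost(log_bytes=size, pool_bytes=size)


def amplification(size: int) -> float:
    """Total bytes written divided by the bytes the application asked for."""
    cost = sync_write_cost(size)
    return (cost.log_bytes + cost.pool_bytes) / size
```

In this model `amplification(4096)` is 2.0, while a 1 MiB write comes out only barely above 1, so the "double write" worry applies mainly to small sync writes, which are also the ones that benefit most from being grouped into a log.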
> (part of the reason is) to group tiny sync IOs together in a persistent form
Yes, that seems to match precisely what the latest messages in that discussion clarify.
(Just dropping a suggestion for the article author.) Since this looks like a reasonable doubt for ZFS newbies to have, it could be an interesting point to mention. The article goes into great detail on the internals, so it is probably already well placed to mention why the ZIL makes sense even when no SLOG is in use.
Hm, the author citing the documentation as the basis for advice such as not using a partition for L2ARC or slog makes me want to go update the documentation so it explicitly doesn't say that, since... that's not the advice we give people who ask.
So much to update, so little time. Going through this and figuring out what needs updating in the docs is now on my TODO.
I happen to be learning about ZFS, and IMO it would help to also add "dataset" to the initial list of terms (ZFS Terminology), given that pools, vdevs, and snapshots are also introduced there as concepts.
Otherwise, the word "dataset" is used in some of those descriptions but isn't clearly defined until much later in the article, which was a bit confusing.