Would the Core Infrastructure Initiative be willing to sponsor a cryptographer to review it?
Getting this right would make it a lot easier to have faith in encrypted backups. Right now, you have to trust that each network backup product like bacula is doing it right.
I just randomly picked bacula and then looked up what they are doing. They're using OpenSSL with AES-CBC. I don't think anyone would recommend that.
As a start, it would be nice if Datto could document the design decisions. For instance, AES-CCM is a bit of a strange default and it's not explained anywhere. I see Solaris chose AES-128-CCM and maybe that influenced why OpenZFS is using AES-256-CCM. I haven't found a cryptographer in favor of CCM over GCM. For instance, Matthew Green doesn't appear to like it over GCM.
Given that GCM IV reuse is catastrophic and one of the main use cases for this is backups to potentially untrusted sources, I'm curious if AES-GCM-SIV would be a more conservative choice.
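To make the IV-reuse concern concrete, here is a toy sketch. It is not AES-GCM: the keystream below is a made-up stand-in built from SHA-256 in counter fashion, and all names are hypothetical. It only illustrates the general property of counter-style modes (GCM's encryption half is one) that encrypting two messages under the same key and IV leaks the XOR of the plaintexts. With real GCM, IV reuse also endangers the authentication key, which this sketch does not show.

```python
import hashlib


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream: SHA-256(key || nonce || counter). NOT AES."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Encrypt by XORing the plaintext with the toy keystream."""
    return xor_bytes(plaintext, toy_keystream(key, nonce, len(plaintext)))


key = b"k" * 32      # illustrative key
nonce = b"n" * 12    # the SAME nonce is reused for both messages (the mistake)
p1 = b"attack at dawn!!"
p2 = b"retreat at noon!"

c1 = toy_encrypt(key, nonce, p1)
c2 = toy_encrypt(key, nonce, p2)

# The keystream cancels out: an observer learns p1 XOR p2 without the key.
leaked = xor_bytes(c1, c2)
```

Anyone who knows or guesses one plaintext can then recover the other with a single XOR.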
I'm Tom Caputi, author of the ZFS encryption patch, and I can answer a few of your questions.
First of all, the choice of AES-CCM. I have had a few people ask me why we didn't choose something like ChaCha20 instead of AES. This is largely because AES is by far the most scrutinized block cipher around. Its use is currently so widely accepted that modern Intel CPUs have built-in AES instructions to improve performance. While it's true to say that ChaCha20 (or other ciphers) might theoretically be faster or that they ARE faster on some architectures like 32-bit cell phone CPUs, this is not currently the case with the vast majority of ZFS deployments.
As far as the choice for CCM as a default goes, this one was a little bit harder. Originally this decision was made to match the Oracle implementation as much as possible (a design decision which has since been dropped). Later, when we re-evaluated the decision, we found a paper (http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/comment...) indicating there might be weaknesses with GCM's authentication mechanism, although the paper only mentioned cases with truncated authentication tags. So in the interest of being as conservative as possible, we chose the option which looked the most secure. We did not look into AES-GCM-SIV since it is very new (it looks like it actually came out this year) and so I would not by any means consider it a "conservative" choice.
As far as performance goes, we have not yet (as far as I'm aware anyway) seen a case where read or write speed wasn't bottlenecked by the disk speed. The benchmarks you posted are (as far as I can tell) single threaded and ZFS processes each block asynchronously.
The biggest thing here is that AES-256-CCM is only the default. It is easy for users to pick GCM for the time being and for developers to add newer, better encryption algorithms and change the default as time goes on. I wouldn't be surprised if we had changed the default by the time that the patch ends up in a tagged release.
I'm not a cryptographer, but for your dedup version, you might want to consider using SIV mode [1,2]. It provides a well known, already vetted deterministic authenticated encryption, so you don't have to build your own from HMAC etc and then get it vetted.
Thank you for the suggestion. As I said above, there is nothing preventing something like that from being implemented in ZFS in the future. Unfortunately, however, the encryption implementation uses a port of the Illumos Kernel Crypto Framework, which has not yet implemented an SIV mode.
As far as using our own HMAC goes, we are using a standard SHA512-HMAC implementation, which should be just as secure.
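For readers who haven't used one, a standard HMAC-SHA512 looks roughly like this in Python's standard library. This is only a sketch: how ZFS actually feeds data into its HMAC is not described in this thread, and the function name and key below are hypothetical. It shows the one property relevant to the dedup discussion, namely that the same key and data always yield the same 64-byte tag.

```python
import hashlib
import hmac


def block_tag(key: bytes, block: bytes) -> bytes:
    """Return the HMAC-SHA512 tag of a data block under the given key."""
    return hmac.new(key, block, hashlib.sha512).digest()


key = b"\x01" * 32                      # illustrative key, not a real one
tag_a = block_tag(key, b"same data")
tag_b = block_tag(key, b"same data")
tag_c = block_tag(key, b"other data")

# Deterministic: identical input gives an identical tag (compare in constant time).
same = hmac.compare_digest(tag_a, tag_b)
different = not hmac.compare_digest(tag_a, tag_c)
```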
This is really good! I was trying to find some time to write about the crypto details of it, but your post is a great summary :) I'll add a reference to it in my post.
I almost cannot wait for this. Native file system encryption will be so good, because it removes the ugly/inelegant layer of indirection with Geli or LUKS, and it allows stuff like encrypted ZFS send and receive. It allows you to switch encryption on and off dynamically. That will allow you to use services like rsync.net without having to place as much trust in their internal security practices.
It's great they're taking their time, but I've been looking forward to a stable release since its announcement almost a year ago :)
Uncertain that it enables encrypted send and receive. It very likely decrypts to get it into the send format, and then is reencrypted on the receive side. Those are separate file systems so it's not certain that they share, or even should share, the same encryption key.
Update: Ahh well, it helps to read the whole article before posting. Both encrypted and decrypted send/receive are possible. But this also suggests multiple keys per pool, a key per file system I guess?
The nice thing about LUKS for the end-user, however, is that it's well-supported in most Linux distros these days, and the end-user can turn it on within the graphical installer literally by checking a box, "Encrypt my Hard-drive."
When ZFS gets there, I will probably switch to it. Until then, however, I find manually partitioning volumes on the command-line to be terrifying, and I imagine I'm not alone.
Can I zfs send encrypted incremental snapshots to a remote server and have them merged without ever having the remote server decrypt anything? That would be awesome to backup to an untrusted remote computer.
64 bytes seems like an arbitrarily short limit for passphrases; it's going to be hashed at some point anyway (preferably with a decent KDF designed for the purpose) and 64 bytes is nowhere near long enough for natural English language text to be used to derive 32 bytes of entropy (estimates for English language text are ~1.5 bits of entropy per character).
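The arithmetic behind that claim, as a quick sketch. It takes the ~1.5 bits per character estimate from the comment at face value and assumes one byte per character (plain ASCII text); the variable names are just for illustration.

```python
import math

BITS_PER_CHAR = 1.5      # estimate quoted above for English text
LIMIT_BYTES = 64         # passphrase length limit under discussion
TARGET_BITS = 32 * 8     # a 32-byte key is 256 bits

# Entropy available from a passphrase that uses the full 64 bytes.
entropy_at_limit = LIMIT_BYTES * BITS_PER_CHAR

# Characters of English text needed to reach 256 bits at that rate.
chars_needed = math.ceil(TARGET_BITS / BITS_PER_CHAR)

print(f"{entropy_at_limit:.0f} bits at the limit, {chars_needed} chars needed")
```

Under these assumptions a maxed-out 64-byte passphrase carries about 96 bits, and reaching 256 bits would take roughly 171 characters.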
At the risk of invoking Cunningham's Law, nobody uses passphrases anywhere near 64 bytes that he / she has to memorize. And if you're using a password manager, just use the raw or hex key option.
32 bytes => 64 hex digits, which is probably not a coincidence. :-)
Eh, Materialistic doesn't allow me to edit my comment. I mean you're right about passwords, but passphrases (e.g. sentences) are easy to remember, more secure and can often be longer than 64 bytes.
Er, of course it's not a coincidence; hex is base 16, so it takes 4 bits to represent each digit, which is half a byte. That is, hex always is 2 digits per byte.
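A two-line check of the hex arithmetic, for anyone who wants to see it. The key here is just random bytes generated for illustration.

```python
import secrets

raw_key = secrets.token_bytes(32)   # 32 random bytes, i.e. a 256-bit key
hex_key = raw_key.hex()             # two hex digits per byte

digits_per_byte = len(hex_key) // len(raw_key)
round_trip = bytes.fromhex(hex_key)  # decoding gives back the same 32 bytes
```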
64 bytes is a bit arbitrary. Up until now I had not heard any complaints about this since most people I know who would use a 64 character passphrase are using a password manager instead. I am adding a test to the PR today and when I do I'll bump it to 512 bytes. We don't want to allow arbitrary passphrase lengths, in order to prevent crazy amounts of passphrase hashing.
We use PBKDF2 as a key derivation function, which is specifically designed to turn low entropy, arbitrary length strings into fixed length, high entropy keys suitable for encryption. It also has the added bonus of making password brute forcing significantly harder.
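As a rough illustration of what PBKDF2 does, here is a sketch using Python's standard library. The hash function, salt size, and iteration count below are assumptions picked for the example, not the parameters ZFS uses, and `derive_key` is a hypothetical helper name.

```python
import hashlib
import os


def derive_key(passphrase: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Stretch a passphrase of any length into a fixed 32-byte key with PBKDF2."""
    return hashlib.pbkdf2_hmac(
        "sha256",                    # underlying hash (assumed for this example)
        passphrase.encode("utf-8"),
        salt,
        iterations,                  # more iterations make each guess costlier
        dklen=32,                    # output length: 32 bytes = 256 bits
    )


salt = os.urandom(16)  # random salt, stored alongside the wrapped key
short_key = derive_key("hunter2", salt)
long_key = derive_key("a much longer passphrase made of several ordinary words", salt)
```

Both calls return exactly 32 bytes regardless of passphrase length, and the iteration count is what slows down brute forcing.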
Free(BSD|NAS) so far create a "normal" ZFS on top of an encrypted block device, produced via the cryptographic GEOM provider geli ( https://www.freebsd.org/cgi/man.cgi?geli(8) )
This instead is ZFS doing the actual encryption on a normal block device.
It has only supported encryption the same way Linux does: by doing lower-level block device encryption. There are a lot of disadvantages to it, and native ZFS dataset encryption would be nicer.
Does anyone know a good guide for getting set up with ZFS (particularly on Linux/Ubuntu)? The Ubuntu wiki basically just says "apt install zfs" and then links to a page that appears to be a brain dump of all the possible commands. Something with a more coherent explanation of the relevant concepts would be really helpful.
Booting with UEFI, ZFS and RAIDz works, but I did not figure out how to install the UEFI grub-boot on all RAIDz drives.
Therefore, if the one and only drive with the boot partition fails, I have to rely on a USB stick as a backup plan for the bootloader.
Probably if I could start over I would try to skip UEFI, use legacy booting, and install GRUB in the MBR of each drive (if that is possible).
For encryption we are using LUKS right now, since it encrypts block devices and ZFS uses these block devices to form its RAID.
LUKS needs to decrypt ALL RAID devices inside the GRUB boot prompt in order for ZFS to start its raidz.
To make this happen, we had to modify:
/usr/share/initramfs-tools/hooks/cryptroot and remove an early return in a for-loop in get_fs_devices() , since otherwise only the first device was decrypted.
Make sure /etc/lvm/lvm.conf contains use_lvmetad = 0
Make sure /etc/default/grub contains GRUB_ENABLE_CRYPTODISK=y
GRUB_CMDLINE_LINUX_DEFAULT="text"
So, yeah, I am looking forward to zfs-native encryption on Linux to make booting "safer/easier".
Maybe I had better luck; for me, UEFI boot with ZFS in VirtualBox did work. I used CentOS, though.
Multiple UEFI boot partitions are possible, but just like you note in your article, they have to be outside the ZFS pool. All of them. Then you set the order for each one using efibootmgr. For each, you will get a separate entry in the UEFI boot manager.
It is not necessary to install grub2 after rebooting into ZFS root. It is perfectly fine to run grub2-install in chroot from your temporary installation.
If it's about management, then you might want to look at the docs I recommend in the post: Oracle's docs are very detailed[1] and the FreeBSD Handbook[2] is very useful as well.
[1] is a nice explanation of getting set up with ZFS root on Ubuntu 16.04 with explanations for what each of the commands are doing.
I'm not in a good position to judge whether this is the correct level of detail for someone new to ZFS; it seems slightly lighter on explanation of concepts than I'd prefer.
That's for Ubuntu 14, but it's basically the same process for Ubuntu 16. Neither installer supports root on ZFS, so you have to install Ubuntu to a flash drive, create the ZFS pool from the live environment, and then copy the contents of the flash drive onto the ZFS pool.
I'm not really familiar with the development process between ZFS on Linux and ZFS on BSD, do these pull requests usually get merged upstream for use on BSD, Solaris, etc?
Together FreeBSD, ZoL and the Illumos groups work on OpenZFS.
Oracle closed off Solaris. The descendant of OpenSolaris is Illumos. Oracle ZFS and OpenZFS are not the same thing.
The OpenZFS repo is the upstream, though it closely tracks the Illumos version.
From there the projects pull on OpenZFS to create their implementations. New features are developed independently by the projects with rules that a new feature must be enabled in a downstream distro for X period of time before it can be upstreamed into OpenZFS.
Yes, and they have a poor implementation of encryption at that. Pawel, who did the port of ZFS to FreeBSD, did not like how they implemented it. As far as ZFS goes, Oracle will be the odd man out compatibility-wise with the rest of the platforms supporting ZFS and the OpenZFS encryption scheme.
Oracle ZFS does have encryption, but this implementation fixes a few security / usability issues with it, adds more features, and brings the implementation into the open source world.
I've been really looking forward to OpenZFS encryption support. ZFS on LUKS leaves a lot to be desired, and I've only just recently begun trying to shift all my PCs and devices to be fully encrypted by default.