ZFS Encryption and Organization

To learn Unix system administration and to make backups easier, I have been maintaining a home server. I chose to avoid options like TrueNAS, which put the user at a level of abstraction so high that the system’s individual components, like ZFS, are hidden from view. Everything becomes a checkbox on a webpage. That is great for ease of use, but it would have limited what I learned.

I gathered three 1 TB hard drives (a 2.5" drive from an old laptop, a 3.5" drive from an old desktop, and one from Craigslist) and one 128 GB solid state drive (SSD). While I cannot recommend that you reproduce this setup, it does offer good data resiliency. Because the drives come from different sources, they are unlikely to fail at the same time, which can happen with drives bought in a single batch. In a raidz1 configuration, the data will survive one drive failure.
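As a rough worked example of the capacity trade-off: the usable space of a single RAID-Z vdev is about (number of disks minus parity disks) times the size of the smallest disk. The function below, raidz_usable_gb, is a hypothetical helper (not a ZFS command) that does this arithmetic. It ignores metadata, padding, and reserved space, so the numbers that zfs list reports on a real pool come out somewhat lower.

```shell
# Rough usable capacity of one RAID-Z vdev, in GB.
# Arguments: number of disks, parity level (1 for raidz1),
# size of the smallest disk in GB (larger disks only count this much).
# Ignores metadata, padding, and reserved space: treat it as an upper bound.
raidz_usable_gb() {
  local disks=$1 parity=$2 size_gb=$3
  echo $(( (disks - parity) * size_gb ))
}
```

For my three 1 TB drives, raidz_usable_gb 3 1 1000 gives 2000, so roughly 2 TB of space in exchange for surviving any single drive failure.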

Overview of previous config

I installed Debian Buster onto the SSD, using a plain ext4 filesystem for the base install. I had not initially considered that booting Linux from ZFS is not well supported, but keeping the operating system off ZFS turned out to make the server’s layout more robust. The server can boot without any of the ZFS drives, and the ZFS drives could be imported and mounted on any other machine with the ZFS packages installed. In this way, the ZFS storage is independent of the operating system and only has to concern itself with my media and backups.
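Moving the pool to another machine is mostly a matter of exporting it on the old host and importing it on the new one. The two helpers below, detach_pool and attach_pool, are hypothetical names that wrap the real zpool export and zpool import subcommands; the pool name is whatever yours is called (mainpool, in my old setup).

```shell
# On the old machine: unmount the pool's datasets and mark the pool as
# exported so that another host can import it cleanly.
detach_pool() {
  zpool export "$1"
}

# On the new machine (ZFS packages installed): import the pool by name.
# Datasets that have a mountpoint and no locked encryption key are
# mounted as part of the import.
attach_pool() {
  zpool import "$1"
}
```

Running zpool import with no pool name lists the pools that are available for import, which is a handy first check on the new machine.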

I had an incomplete understanding of ZFS when I first deployed it on the server, and the implementation was sloppy. For one, I was so concerned with getting everything to work that I overlooked encryption until after the storage was in use, complicating migration to an encrypted setup.

Organizational problems arose from the misconception that my group of drives could only be mounted in one place. This is not at all the case. I began by using only the default mountpoint, which is a directory named after the pool at the root of the filesystem. For me, this was /mainpool. Predictably, this caused a mess with permissions: everything in that directory was owned by root, so the system user for Jellyfin could not read my media.
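The permissions problem is easy to spot once you know where to look. The sketch below, show_mount_owner (a hypothetical helper), asks ZFS for a dataset's mountpoint with zfs get and then prints the owner, group, and mode of that directory using GNU stat, as shipped with Debian.

```shell
# Print "owner:group mode path" for the directory a dataset is mounted on.
# -H drops the header line and -o value prints only the property value.
show_mount_owner() {
  local mnt
  mnt=$(zfs get -H -o value mountpoint "$1") || return
  stat -c '%U:%G %a %n' "$mnt"
}
```

On my old layout, show_mount_owner mainpool would have reported /mainpool as owned by root.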

How ZFS actually works

Now I will explain how ZFS works in my environment, so that you know enough to avoid the mistakes described above. On Linux, block devices appear as files like /dev/sda or /dev/disk/by-uuid/... ZFS groups one or more of these disks into a “virtual device”, or “vdev”. Redundancy is set at the vdev level, either as a mirror or as RAID-Z. A mirror copies the same data to every drive in the vdev, whereas RAID-Z spreads data across the drives and reserves a chosen number of drives’ worth of space for parity. For instance, raidz1 has one drive of parity, so it survives one drive failure. One or more vdevs are then combined into a zpool, and data is distributed across all of the pool’s vdevs. Vdevs with more free space receive a larger share of new writes, so it can be more efficient to keep vdevs of similar sizes.

I have transcribed an example config that captures the properties covered so far. Here, the zpool is example, and the vdevs are raidz1-0 and mirror-1:

$ zpool status example

  pool: example
 state: ONLINE
  scan: none requested
config:
    NAME        STATE   READ WRITE CKSUM
    example     ONLINE     0     0     0
      raidz1-0  ONLINE     0     0     0
        ada1    ONLINE     0     0     0
        ada2    ONLINE     0     0     0
        ada3    ONLINE     0     0     0
      mirror-1  ONLINE     0     0     0
        ada4    ONLINE     0     0     0
        ada5    ONLINE     0     0     0

errors: No known data errors
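For reference, a layout like the example above would come from a single zpool create call, where each vdev keyword (raidz1, mirror) starts a new vdev and the disks listed after it belong to that vdev. The function name is hypothetical, and ada1 through ada5 are just the FreeBSD-style disk names from the transcribed config, not disks on my server. As far as I know, zpool create treats a mix of raidz and mirror vdevs as a mismatched replication level and refuses unless -f is given.

```shell
# Build the "example" pool: one raidz1 vdev of three disks plus a
# two-way mirror vdev. -f overrides the mismatched-replication error
# that zpool create raises when raidz and mirror vdevs are mixed.
create_example_pool() {
  zpool create -f example \
    raidz1 ada1 ada2 ada3 \
    mirror ada4 ada5
}
```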

Once the zpool has been created, its storage can be mounted in multiple locations through filesystem datasets. You can create as many datasets as you like, and each one is mounted just like any other filesystem: you can have a dataset for every media folder and for every user’s home directory. Datasets are cheap, so there is little cost to having many of them. Encryption is configured per dataset, at the time the dataset is created.
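Creating a dataset with its own mountpoint takes one command. The helpers make_dataset and list_datasets below are hypothetical wrappers around zfs create and zfs list; -p also creates any missing parent datasets, and -o mountpoint=DIR chooses where the new dataset appears. The dataset names and paths in the usage line are only illustrations.

```shell
# Create a dataset (and any missing parents) mounted at a chosen directory.
# Arguments: dataset name, mountpoint.
make_dataset() {
  zfs create -p -o mountpoint="$2" "$1"
}

# Show every dataset in a pool together with its mountpoint.
list_datasets() {
  zfs list -r -o name,mountpoint "$1"
}
```

For example, make_dataset tank/media/movies /srv/media/movies and make_dataset tank/home/alice /home/alice, followed by list_datasets tank. A dataset can be moved later with zfs set mountpoint=DIR.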

Better Implementation

To reconfigure ZFS without losing data, I copied everything to an external hard drive and then destroyed the old pool with zpool destroy (zfs destroy only removes datasets and snapshots, not the pool itself).
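A cautious version of that migration copies the data, checks the copy, and only then destroys anything. backup_tree below is a hypothetical helper: it copies a directory tree with cp -a (which keeps ownership, permissions, and timestamps when run as root) and then compares the two trees with diff -r, returning a non-zero status if they differ.

```shell
# Copy everything under SRC into DEST, then verify the copy.
# Returns non-zero if the copy fails or the two trees differ.
backup_tree() {
  local src=$1 dest=$2
  mkdir -p "$dest" || return
  cp -a "$src/." "$dest/" || return   # "/." includes hidden files
  diff -r "$src" "$dest" > /dev/null
}
```

For example, backup_tree /mainpool /mnt/external/mainpool-backup (a hypothetical path for the external drive), with the old pool destroyed only if that returns success. On a large media library the diff -r pass is slow, but it is cheap insurance before a destructive step.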

Starting from scratch, I ran:

$ zpool create -O mountpoint=none tank raidz1 sdb sdc sdd

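Here sdb, sdc, and sdd are the three hard drives. The -O mountpoint=none option sets the mountpoint property on the pool’s root dataset, so nothing is mounted at /tank; only the datasets I create and place explicitly get mounted.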
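Since zpool create takes over the listed disks, it is worth previewing the layout first: zpool create -n prints the configuration it would use without creating anything. The hypothetical create_tank wrapper below shows that preview, asks for confirmation, and only then runs the same command as above.

```shell
# Preview the pool layout with -n (nothing is written), ask for
# confirmation, and only then create the pool for real.
create_tank() {
  zpool create -n -O mountpoint=none tank raidz1 sdb sdc sdd || return
  local answer
  printf "Create pool 'tank' on sdb sdc sdd? [y/N] " >&2
  read -r answer
  [ "$answer" = y ] || return 1
  zpool create -O mountpoint=none tank raidz1 sdb sdc sdd
}
```

Any answer other than y leaves the disks untouched.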

Here is the config section of the output of zpool status tank:

NAME        STATE     READ WRITE CKSUM
tank        ONLINE       0     0     0
  raidz1-0  ONLINE       0     0     0
    sdb     ONLINE       0     0     0
    sdc     ONLINE       0     0     0
    sdd     ONLINE       0     0     0

Very nice. Next I will create an encrypted dataset mounted at my Jellyfin media directory.

$ zfs create -o encryption=aes-256-gcm -o keyformat=passphrase -o keylocation=prompt -o mountpoint=/var/lib/jellyfin/media tank/jelly
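To confirm that the new dataset really is encrypted, zfs get can print the relevant properties. show_encryption is a hypothetical helper name; encryption, keyformat, keylocation, and keystatus are real ZFS properties.

```shell
# Print the encryption-related properties of a dataset.
show_encryption() {
  zfs get -o name,property,value encryption,keyformat,keylocation,keystatus "$1"
}
```

With the command above, show_encryption tank/jelly should list aes-256-gcm, passphrase, and prompt, and keystatus should read available while the key is loaded.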

Now I can begin copying the media back over from the external hard drive.
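Copying the media back is also the moment to fix the ownership problem from the old setup, since the top directory of a newly created dataset is owned by root. restore_media is a hypothetical helper; it assumes the service account is named jellyfin (the name the Debian package uses, as far as I know) and takes the owner as an optional third argument in case yours differs.

```shell
# Copy media from a backup into the Jellyfin dataset, then hand ownership
# to the Jellyfin service account so the server can read it.
# Arguments: source dir, destination dir, owner (default jellyfin:jellyfin).
restore_media() {
  local src=$1 dest=${2:-/var/lib/jellyfin/media} owner=${3:-jellyfin:jellyfin}
  cp -a "$src/." "$dest/" || return
  chown -R "$owner" "$dest"
}
```

For example, restore_media /mnt/external/backup/media, where the backup path is hypothetical and the destination defaults to the mountpoint used above.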

To mount the encrypted dataset in the future, for example after a reboot, its key must be loaded first. This is done with zfs load-key, followed by zfs mount:

$ zfs load-key tank/jelly
$ zfs mount tank/jelly
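The two steps can also be combined, and the dataset locked again when it is not in use: zfs mount -l loads the key for an encrypted dataset as it mounts it, and zfs unload-key drops the key once the dataset is unmounted. The function names unlock_dataset and lock_dataset are hypothetical, with tank/jelly as the default dataset.

```shell
# Prompt for the passphrase and mount the dataset in one step.
unlock_dataset() {
  zfs mount -l "${1:-tank/jelly}"
}

# Unmount the dataset, then remove its key from memory so the data
# stays unreadable until the passphrase is entered again.
lock_dataset() {
  local ds=${1:-tank/jelly}
  zfs unmount "$ds" && zfs unload-key "$ds"
}
```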

Key Takeaways

In my previous configuration, relying on the default mountpoint at the root of the Linux filesystem led to poor organization. It is better to create and mount a separate dataset for each specific need, with options that fit that need, like encryption. Also be aware that some settings are effectively permanent: encryption can only be chosen when a dataset is created, so an existing unencrypted dataset cannot simply be switched to encrypted, and its data has to be copied into a new encrypted dataset instead. So, make sure that datasets are configured correctly before extensive use.

posted on: 09/06/2021 02:34:49 PM, last edited on: 01/14/2022 11:12:35 PM, written by: Nate Buttke