
Migrating from VM to Hierarchical Jails in FreeBSD

A few days ago, coinciding with the update to FreeBSD 14.0-RELEASE, I took stock of several servers that were operational, stable, and efficient. These servers host a number of service jails and, above all, several VMs managed with bhyve. The VMs, running mainly FreeBSD, OpenBSD, and Linux, are stable and efficient; each has its own IP address and is dedicated to specific functions. However, I had a realization: virtualization makes sense for Linux and OpenBSD, but what’s the point of having FreeBSD VMs within FreeBSD?

Certainly, a VM can provide greater security and more freedom in kernel usage, but for services like the ones I’m managing that is somewhat secondary. What I do notice is that each of these VMs consumes all of its assigned RAM (caching the unused portion, which, added to the host’s cache, leads to double caching and wasted memory), keeps its virtual disks on the physical host’s ZFS pool (so two layers of storage), and, since disk images (which are faster than zvol volumes) currently cannot be trimmed, also wastes disk space.

Therefore, I considered two scenarios. The first: migrate all the jails directly onto the host and reconfigure them. This would be a sensible choice, but I need to maintain a certain separation between environments. The second: something I had already experimented with but not yet deployed in production, namely nested jails.

FreeBSD has supported nesting of jails natively since version 8.0, which dates back to 2009. The jail(8) man page has an entire section, Hierarchical Jails, that explains the concept of jail hierarchy well. It’s one of the many gems of FreeBSD that, although not widely known or used, is, in my opinion, extremely useful.
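To get a feel for what the hierarchy looks like in practice, here is roughly what jls reports on the physical host once the setup described below is running (names follow this article; nested jails show up with the full dotted parent.child name):

root@host:~ # jls name
jail01
jail01.jailA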

I decided to conduct an experiment with the following setup:

  • On the physical host, create a “main” jail (the equivalent of a VM) and, under it, all the other jails.
  • Adjust the configurations of the main jail to ensure that the underlying jails can run without issues.
  • Create a dedicated ZFS dataset for the main jail and delegate it to the jail, so that it can create further datasets for the underlying jails.
  • Tweak the configurations and migrate BastilleBSD’s main dataset, which contains all the jails.

I managed to get everything up and running successfully.

The first step was to create the main VNET jail. Using BastilleBSD, the process was quite straightforward:

bastille create -B jail01 14.0-RELEASE 0.0.0.0 bridge0

This command creates a VNET jail, attaches it to bridge0, and assigns the IP address via DHCP. Then, to stop it, I used:

bastille stop jail01

Next, it was time to create a dataset dedicated to that jail:

zfs create -o jailed=on zroot/jail01
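To confirm the property took effect before moving on, a quick check along these lines should show jailed=on (the dataset will not be auto-mounted on the host):

root@host:~ # zfs get jailed zroot/jail01
NAME          PROPERTY  VALUE   SOURCE
zroot/jail01  jailed    on      local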

Finally, it was necessary to modify the jail configuration:

vi /usr/local/bastille/jails/jail01/jail.conf

The next step involved commenting out the enforce_statfs = 2; and securelevel = 2; lines and adding several configurations. It’s important to note that some of these options might not be necessary, and the permissions granted to the main jail are more than what is typically required. This approach was intentional, as my priority was to get everything functioning first, then gradually tighten the settings to reach the minimal necessary permissions:

     children.max = 256;
     allow.mount;
     allow.mount.devfs;
     allow.mount.procfs;
     allow.mount.fdescfs;
     allow.mount.linprocfs;
     allow.mount.zfs;
     allow.mount.nullfs;
     allow.mount.tmpfs;
     allow.raw_sockets;
     allow.socket_af;
     allow.sysvipc;
     allow.chflags;
     enforce_statfs=1;
     securelevel=0; 

Please note: securelevel=0 is required to install and update the BastilleBSD templates using the bastille bootstrap 14.0-RELEASE update command. However, once the BastilleBSD templates have been bootstrapped and updated, it is recommended to change this value to securelevel=2 for increased security.
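Once the jail is running (see further below), the effective values of these parameters can be double-checked from the host with jls, for example:

# Query a few of the parameters set above on the running jail
jls -j jail01 children.max enforce_statfs securelevel allow.mount.zfs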

In the section where the VNET network interface lines are located, some modifications are necessary. Immediately after the last exec.prestart line, add the following:

  exec.poststart += "/sbin/zfs jail jail01 zroot/jail01";
  exec.poststart += "jexec jail01 zfs mount -a";

These two lines ensure that, after the jail is activated, the additional dataset will be “attached” to it, and within the jail, a command is issued to mount all the datasets.
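For debugging, the same two steps can also be run by hand from the host while the jail is up (for instance after editing jail.conf without restarting it):

/sbin/zfs jail jail01 zroot/jail01
jexec jail01 zfs mount -a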

Theoretically, the configuration should have been complete at this stage, but I encountered an issue: if the main jail is shut down while the underlying jails misbehave, it may become impossible to restart them, because the various nullfs mounts may still be active, which causes BastilleBSD to fail.

To address this, I created a small script that, after the jail has been shut down, unmounts all the underlying filesystems, including the nullfs mounts.

Add the following line at the end:

exec.poststop += "/usr/local/sbin/umount_jail.sh /usr/local/bastille/jails/jail01/root/jail01";

The umount_jail.sh script can be structured as follows:

#!/bin/sh

# Check if a path argument was provided
if [ $# -eq 0 ]; then
    echo "Usage: $0 <base_path>"
    exit 1
fi

# Get the base path from the command line argument
BASE_PATH=$1

# Function to recursively unmount all mount points under a given directory
unmount_recursive() {
    mount | grep -E "^.* on ${1}" | awk '{print $3}' | sort -r | while read -r mount_point
    do
        umount -f "$mount_point" || echo "Failed to unmount $mount_point"
    done
}

# Call the function with the specified base path
unmount_recursive "$BASE_PATH"
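Remember to make the script executable; it can also be run by hand against the same path used in exec.poststop to verify it behaves as expected:

chmod +x /usr/local/sbin/umount_jail.sh
/usr/local/sbin/umount_jail.sh /usr/local/bastille/jails/jail01/root/jail01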

Now, it’s possible to launch the jail using:

bastille start jail01

Once it’s up and running, you can enter the console with:

bastille console jail01

At this point, it should be possible to verify that the external dataset has been mounted:

root@jail01:~ # zfs list
NAME           USED  AVAIL     REFER  MOUNTPOINT
zroot          305G  98.3G       96K  /zroot
zroot/jail01    96K  98.3G       96K  /zroot/jail01

Now, it’s time to install BastilleBSD on jail01:

pkg install -y bastille

Follow the configuration instructions available in the BastilleBSD Getting Started guide, keeping in mind that the network interface will not be vtnet0 but vnet0 (relevant for pf.conf).
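For reference, the resulting /etc/pf.conf inside jail01 is essentially the ruleset from the Bastille documentation with the interface macro changed; roughly the following, to be adapted to your own needs:

# Sketch based on the Bastille Getting Started example, with the external
# interface switched to the jail's VNET interface
ext_if="vnet0"

set block-policy return
scrub in on $ext_if all fragment reassemble
set skip on lo

table <jails> persist
nat on $ext_if from <jails> to any -> ($ext_if:0)
rdr-anchor "rdr/*"

block in all
pass out quick keep state
antispoof for $ext_if inet
pass in inet proto tcp from any to any port ssh flags S/SA keep state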

Modify the /usr/local/etc/bastille/bastille.conf to include the correct options for using ZFS and the appropriate dataset (in this case, zroot/jail01).
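With the dataset created earlier, the ZFS section of bastille.conf should end up looking roughly like this (zpool and prefix combine into zroot/jail01):

## ZFS options
bastille_zfs_enable="YES"
bastille_zfs_zpool="zroot"
bastille_zfs_prefix="jail01"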

Next, proceed with the bootstrap:

bastille bootstrap 14.0-RELEASE update

Once completed, we can create our first jail:

bastille create jailA 14.0-RELEASE 192.168.1.1 bastille0

However, this will lead to an error:

root@jail01:~ # bastille create jailA 14.0-RELEASE 192.168.1.1 bastille0
Valid: (192.168.1.1).
Valid: (bastille0).

Creating a thinjail...

[jailA]:
mount: .: Operation not permitted
jail: jailA: /sbin/mount -t devfs -oruleset=4 . /usr/local/bastille/jails/jailA/root/dev: failed

[jailA]: Not started. See 'bastille start jailA'.
[jailA]: Not started. See 'bastille start jailA'.
[jailA]:
mount: .: Operation not permitted
jail: jailA: /sbin/mount -t devfs -oruleset=4 . /usr/local/bastille/jails/jailA/root/dev: failed

After some research, I found a workaround for the jail creation issue. The solution involves modifying the /usr/local/bastille/jails/jailA/jail.conf file and changing the devfs_ruleset value from “4” to “0”.
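A quick way to apply the change (the exact spacing in the generated file may differ, so adjust the pattern if needed):

sed -i '' 's/devfs_ruleset = 4;/devfs_ruleset = 0;/' /usr/local/bastille/jails/jailA/jail.conf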

This is not immediately intuitive, so I proposed a modification to BastilleBSD: check the security.jail.jailed sysctl and, if it’s set to 1 (meaning we are already running inside a jail), pick the devfs ruleset accordingly.
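In shell form, the idea is roughly the following (a sketch, not the actual patch submitted to BastilleBSD; the variable name is illustrative):

# Inside a jail (security.jail.jailed = 1) mounting devfs with ruleset 4 fails,
# so fall back to ruleset 0; on the host keep the usual default of 4.
if [ "$(sysctl -n security.jail.jailed)" -eq 1 ]; then
    ruleset=0
else
    ruleset=4
fi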

With this change made, it’s now possible to launch the jail:

bastille start jailA

And then connect to it:

bastille console jailA

Note that BastilleBSD may have terminated prematurely, so the nameserver won’t have been inserted in the new jail. You can easily add it to /etc/resolv.conf to rectify this.
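Adding it by hand is a one-liner inside jailA (192.168.1.254 is just a placeholder; use your actual resolver):

echo "nameserver 192.168.1.254" >> /etc/resolv.conf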

Welcome to the jail-in-a-jail setup. In my case, I transferred the entire BastilleBSD dataset from the VM to the jail using zfs send and zfs receive. After modifying the devfs_ruleset from 4 to 0, everything started working perfectly.
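For the record, the transfer itself was an ordinary recursive snapshot plus send/receive; something along these lines, where the dataset names are purely illustrative and depend on how the pools are laid out on both sides (host is a placeholder for the physical machine):

# On the VM: recursive snapshot of the Bastille dataset tree
zfs snapshot -r zroot/bastille@migrate
# Stream it to the physical host, receiving it (unmounted) under the dataset
# delegated to jail01
zfs send -R zroot/bastille@migrate | ssh host "zfs receive -u zroot/jail01/bastille"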

