How we are migrating (many of) our servers from Linux to FreeBSD - Part 3 - Proxmox to FreeBSD

In recent years, we’ve been migrating many of our servers from Linux to FreeBSD as part of our consolidation and optimization efforts. Specifically, we’ve been moving services that were previously deployed using Docker onto FreeBSD, and it has proven to be a great choice for handling workloads efficiently.

To this end, we’ve also been migrating many of our virtual machines (VMs) to FreeBSD, deploying services within FreeBSD jails. In some cases, these jails have even replaced entire VMs and run on bare metal. Although we prefer to move to native FreeBSD whenever possible, it’s not always the best option for every service we offer. As a result, one of our most critical physical servers has been left behind for years.

This server was a Proxmox server that we installed many years ago and updated to version 6.4. It hosted some critical services, but upgrading to Proxmox 7.x posed some challenges. In particular, some of the LXC containers required tweaks.

Unfortunately, this server was quite old, with only four physical disks and 64 GB of RAM. It was located in an OVH data center and had been running well until one of the disks started to malfunction once a week, on Sundays. This would trigger a RAID reconstruction that kept the system busy for about two days.

Despite my preference for simple setups, this server had been deployed gradually over many years, and everything was tied together. As a result, unraveling the system to resolve the issues was not a simple task. Sometimes the combination of simple things can make everything complex.

The Proxmox Server

The Proxmox server was configured as the central hub for various services, including primary DNS, web hosting, VOIP, and more. It featured several bridges, each with its own specific purpose, and was connected to a virtual machine running MikroTik CHR. This machine was responsible for consolidating all incoming VPNs from the MikroTik devices we managed, both ours and those belonging to our clients. Additionally, it provided a series of bridges to manage these devices and all server management VPNs and other services. The Proxmox server also housed several virtual machines running Linux, FreeBSD, OpenBSD, and NetBSD, as well as LXC containers.

Over the last two years, we’ve been migrating most of these virtual machines and containers to FreeBSD-based VMs, which feature their own specific jails. Consequently, most of the VMs we’ve had to move were BSD-based, while only five Linux VMs remained. The LXC containers hosted a range of services, including servers managed by Virtualmin, a large installation of Zimbra (which was hosted within an LXC container running CentOS 7), as well as some minor Alpine Linux-based machines. We located all these virtual machines and containers in a LAN created and managed by CHR. All public IPs were managed by CHR, which relied on NAT mappings to establish communication between them. CHR had thus become the heart of our system, and if it experienced any issues, it could potentially take down the entire system. Fortunately, it remained stable for years.

Migration - first steps

The first step I took was to install FreeBSD on the new server. Easy peasy. The next step was to find a way to migrate the CHR to the new server (under bhyve) while it continued to manage all the public IPs of the original server. The problem is that OVH, with its failover IPs, ties a specific MAC address to each individual IP address. Therefore, the only way was to create a bridge on the FreeBSD server (on the Proxmox server, I already had a bridge on the physical network card) and create an L2 tunnel between the two servers - I used OpenVPN with tap interfaces added to the bridges. I could have used other methods and techniques, but I wanted to experiment with a setup that would make it possible, if necessary, to bridge a larger number of physical and virtual servers even if the IPs are all mapped to a single server. OVH, in fact, does not allow an IP class to be split, so the entire class must be moved at once, not a single IP address.
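As a rough illustration of the bridge-plus-tap idea, the sketch below prints the FreeBSD rc.conf lines and the OpenVPN directives such a setup might use. The interface names (em0, tap0, bridge0), the peer name old-server.example.org and port 1194 are assumptions, not the values used in this migration; authentication, keys and MTU tuning are deliberately left out.

```shell
#!/bin/sh
# Sketch only: prints configuration, changes nothing on the system.

# rc.conf lines: create a bridge and a tap device, then join the physical
# NIC and the tap so that both servers share one L2 segment.
bridge_rc_conf() {
    nic="$1"; tap="$2"; bridge="$3"
    printf 'cloned_interfaces="%s %s"\n' "$bridge" "$tap"
    printf 'ifconfig_%s="addm %s addm %s up"\n' "$bridge" "$nic" "$tap"
}

# OpenVPN directives for an L2 (tap) tunnel towards the other server.
openvpn_tap_conf() {
    tap="$1"; peer="$2"; port="$3"
    printf 'dev %s\n' "$tap"
    printf 'dev-type tap\n'
    printf 'proto udp\n'
    printf 'remote %s %s\n' "$peer" "$port"
}

bridge_rc_conf em0 tap0 bridge0
openvpn_tap_conf tap0 old-server.example.org 1194
```

Because the tunnel carries Ethernet frames, a VM attached to the bridge on either side can answer for a failover IP as long as it uses the MAC address registered for that IP.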

Initially, MikroTik CHR 7 did not boot on bhyve. In the end, I managed to make it work (I will publish a brief article on the topic), but I had other problems, probably related to the MTU of the interfaces. So I decided to take the opportunity to unbind the LXC containers and VMs from CHR and remove MikroTik from the setup. RouterOS version 7, in fact, also supports WireGuard-based VPNs, so within a few days, it was possible to update the few routers still on 6.x and recreate some VPNs using WireGuard. I mapped both the VMs and LXC containers directly to their respective public IPs, greatly simplifying the steps. Everything worked perfectly.

The next step was to test the first migrations, starting from the VMs already on FreeBSD. For simplicity, I created a new FreeBSD VM in bhyve and copied (via zfs-send and zfs-receive) the datasets related to BastilleBSD. All services are installed in jails managed by Bastille, so this was enough to have, in a short time, a new operating server equivalent to the previous one. At that point, I shut down the original server, connected the VM to the bridge linked to the tunnel (after modifying its MAC address), turned on the new FreeBSD VM (on bhyve), and everything started to work correctly - but from the new physical server.
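The dataset copy can be sketched as a small helper around zfs send and zfs receive. The dataset name zroot/bastille, the snapshot name migrate and the host root@new-vm are assumptions chosen for illustration; the datasets actually used in this migration are not stated.

```shell
#!/bin/sh
# Sketch only: dataset, snapshot and host names are illustrative assumptions.

# Snapshot a dataset tree recursively and stream it to another host.
# -R sends the dataset together with its children, snapshots and properties;
# -u on the receiving side leaves the received file systems unmounted.
replicate_dataset() {
    src="$1"; snap="$2"; host="$3"; dest="$4"
    zfs snapshot -r "${src}@${snap}" || return 1
    zfs send -R "${src}@${snap}" | ssh "$host" zfs receive -u "$dest"
}

# Example (not executed here):
#   replicate_dataset zroot/bastille migrate root@new-vm zroot/bastille
```

This form assumes the destination dataset does not exist yet. A later incremental pass, taken just before the cutover, would keep the downtime short.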

One by one, I moved all the FreeBSD VMs. For Linux, NetBSD, and OpenBSD, I simply copied the images and pointed bhyve to them. After a few small guest-specific tweaks to the vm-bhyve configuration, everything started to work correctly. Where possible, I replaced the “virtio” disk driver with “nvme”, as it performs much better on bhyve.
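For reference, a vm-bhyve guest configuration with an NVMe disk might look like the output of the sketch below. The loader, CPU count, memory size, switch name and MAC address are placeholders, not the values of any VM described here; the MAC would have to be the one registered with the provider for that guest's IP.

```shell
#!/bin/sh
# Sketch only: prints a vm-bhyve guest configuration that uses an NVMe disk.
# All values (UEFI loader, 2 CPUs, 4G RAM, switch "public") are assumptions.
vm_conf() {
    mac="$1"
    cat <<EOF
loader="uefi"
cpu=2
memory=4G
network0_type="virtio-net"
network0_switch="public"
network0_mac="${mac}"
disk0_type="nvme"
disk0_name="disk0.img"
EOF
}

# Placeholder MAC address, for illustration only.
vm_conf "02:00:00:00:00:01"
```

Switching disk0_type from virtio-blk to nvme only works if the guest can find its root device under the new name, which is why the initrd and fstab may need attention.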

Migration - LXC containers to Virtual Machines

For LXC containers, I initially thought of creating an Alpine Linux virtual machine, installing LXD, and copying each individual container. It worked for some of them, but for others, I started to encounter strange issues, similar to those that would have required manual intervention to upgrade from Proxmox 6.x to 7.x. As is often the case with Linux-based solutions, compatibility is not always preserved between updates, so I would have had to fine-tune all the containers, which I didn’t feel like doing. The containers had been created (at the time) to optimize RAM usage on the Proxmox machine, but to date, they have caused more problems than benefits. In some cases, certain processes got “stuck,” making it impossible to “reboot” the LXC container, requiring the entire physical node to be rebooted. If they had been virtual machines, I could have given a “kill” command from the virtualizer (to the respective KVM process, in that case) and restarted it.

For greater compatibility and ease of future management, I decided to convert the LXC containers into actual VMs on bhyve. The process was simple:

  • Creating an empty VM with vm-bhyve and booting the VM with SystemRescueCD.
  • Creating destination partitions and file systems in the VM, then doing a complete rsync of the original LXC container.
  • Adjusting the fstab file, installing the kernel on the destination VM, and creating the initrd. (Some containers had originally been created as copies of VMs, so the kernel was still installed and updated, even though it wasn’t being used. The initrd, on the other hand, did not include the nvme or virtio drivers, so I had to regenerate it anyway.)
  • Adjusting the bhyve vm configuration file, doing one last rsync after shutting down the services, shutting down the original LXC container, and starting the bhyve VM.
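The rsync passes in the steps above could be sketched as follows. The host name and the /mnt mount point are assumptions, as is copying over SSH as root; the exact rsync options used in this migration are not stated.

```shell
#!/bin/sh
# Sketch only: host names and mount points are illustrative assumptions.

# Copy a container's root file system into the mounted target root,
# preserving ACLs, xattrs, hard links and numeric owner IDs.
# Pseudo file systems are excluded; they are recreated at boot.
# Any further arguments are passed to rsync (e.g. extra --exclude options).
copy_container() {
    src_host="$1"; dest_root="$2"; shift 2
    rsync -aAXH --numeric-ids \
        --exclude='/proc/*' --exclude='/sys/*' --exclude='/dev/*' \
        --exclude='/run/*' "$@" \
        "root@${src_host}:/" "${dest_root}/"
}

# Example, from the rescue system with the new root mounted on /mnt:
#   first pass, services still running:
#     copy_container old-container.example.org /mnt
#   final pass, services stopped; keep the files adjusted on the target:
#     copy_container old-container.example.org /mnt \
#         --exclude=/etc/fstab --exclude='/boot/*'
```

Excluding /etc/fstab and /boot on the final pass keeps the second rsync from overwriting the fstab and the regenerated initrd prepared in the previous step.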

Everything worked correctly, so one by one, I moved all the containers. The largest one ended up on another physical node (also FreeBSD with bhyve) temporarily because the space on the new server was not sufficient to contain it. It didn’t need to be on this server, so no problem.

One by one, the LXC containers started on the new server. Apart from some minor adjustments to the destination VMs (different network interface names, etc.), I didn’t encounter any particular problems even after several days. Everything works perfectly.

At the very end, I re-created the MikroTik CHR VM. I’ll keep this setup separate for now, as it is strictly tied to EoIP interfaces. This was the main reason why I hadn’t performed the migration earlier: things were too tied together, and I had to untie everything, step by step.

…and then one of the Linux VMs started to freeze

Several Linux VMs are just the basis on which Docker runs. One of them (not even among the busiest) started to freeze completely every 12 to 15 hours. It stopped responding to ping, and it was impossible to issue any command from the console. In a word: stuck.

Searching the web, I found some references to this problem and, observing the errors of an ssh session that was left connected (stuck, but still showing the last error), I found it to be a problem similar to the one described in this post, namely:

"watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [khugepaged:67]"

I tried various solutions such as changing the storage driver, the number of cores, the distribution (from Alpine to Debian), etc., but none of these operations solved the issue. I also noticed that the problem occurs with all Linux VMs, but only those with a recent kernel (> 5.10.x) freeze, while the others continue to work. The problem does not occur, however, with the *BSDs.

In the end, I:

  • Reduced the number of cores to 1 for the VMs that did not have a high load (some remained with multiple cores), hypothesising a problem with allocating cores that were too busy
  • Gave the command: “/usr/bin/echo 60 > /proc/sys/kernel/watchdog_thresh” to the VM.

The VM became stable, and I have not seen that error/warning on any other machine since. I will investigate further, but I believe it is a problem related to the Linux kernel, which, for some reason, reports a soft lockup and freezes when particular situations of CPU contention arise.
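A value written to /proc/sys lasts only until the next reboot. One way to make it persistent is a sysctl drop-in file, sketched below; the file name 90-watchdog.conf and the directory argument are assumptions. On Linux, the soft lockup threshold is twice watchdog_thresh, so a value of 60 means the warning fires only after 120 seconds.

```shell
#!/bin/sh
# Sketch only: writes a sysctl drop-in so the setting survives a reboot.
# The file name and the directory argument are illustrative assumptions.
persist_watchdog_thresh() {
    value="$1"; dir="$2"
    printf 'kernel.watchdog_thresh = %s\n' "$value" > "${dir}/90-watchdog.conf"
}

# Example, inside the Linux guest as root (not executed here):
#   persist_watchdog_thresh 60 /etc/sysctl.d
#   sysctl -p /etc/sysctl.d/90-watchdog.conf   # apply it immediately
```

Raising the threshold hides the symptom rather than removing the cause, so it is best treated as a workaround while the root cause is investigated.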

The End…and a nice OOM!

After moving everything, I was finally able to migrate the entire class of OVH IPs from one physical server to another. The operation was quite quick, but in order to avoid problems, I notified all users and performed the operation on a Sunday and during off-peak hours. The whole process took about 10 minutes and there were no hitches of any kind.

For safety reasons, I kept the Proxmox machine active for a few more days, but there was no need to use it. However, after a couple of days, I encountered a problem: the largest VM, in some cases, was being “killed” because FreeBSD generated an OOM. I had never seen, from FreeBSD 13.0 onwards, any OOM related to “abuse” of RAM usage by ZFS, but in this case, it actually happened.

In the end, I understood that ZFS, on FreeBSD, is able to release memory, but not quickly enough to absorb “spikes” in individual VMs. In fact, the VMs do not know the state of the physical host’s RAM, so they will tend to occupy all the space allotted to them (even if only for caching). A sudden spike (e.g., when you create and launch a new VM) could cause a sudden increase in RAM usage by the bhyve process, and FreeBSD could be forced to kill it, even if part of the RAM is only ARC cache. While Proxmox supports HA (i.e., it checks whether the VM is running), vm-bhyve only launches the VM (the bhyve process). I should manage it with tools like monit, but for now, I preferred to simply set a limit on ZFS RAM usage using “vfs.zfs.arc_max”, and there have been no more problems.
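One simple way to pick a value for vfs.zfs.arc_max is to subtract the RAM promised to the VMs, plus some headroom for the host, from the total. The sketch below does that arithmetic; the figures (64 GiB host, 40 GiB of VMs, 8 GiB headroom) are purely illustrative and are not the values used on this server.

```shell
#!/bin/sh
# Sketch only: all sizes below are illustrative, not the author's values.

# ARC ceiling in bytes = host RAM - RAM promised to VMs - headroom.
# All three arguments are given in GiB.
arc_max_bytes() {
    host_gib="$1"; vms_gib="$2"; headroom_gib="$3"
    echo $(( (host_gib - vms_gib - headroom_gib) * 1024 * 1024 * 1024 ))
}

limit="$(arc_max_bytes 64 40 8)"
echo "vfs.zfs.arc_max=${limit}"   # prints vfs.zfs.arc_max=17179869184

# Apply at runtime (as root on the FreeBSD host, not executed here):
#   sysctl vfs.zfs.arc_max="${limit}"
# To keep it across reboots, add the printed line to /boot/loader.conf.
```

With a fixed ceiling, the ARC can no longer grow into memory that a VM may claim suddenly, at the cost of a smaller cache when the host is otherwise idle.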

Final considerations

The operation was long but linear. The most complex part was unraveling all the configurations related to MikroTik CHR and the VPNs linked to each individual LXC machine/container. Once everything was implemented on a dedicated VM, the operation was fairly straightforward.

The hardware specifications of the destination physical server are only slightly better than those of the original one, but the final performance of the setup has greatly improved. The VMs are very responsive (even those that were previously LXC containers running directly on bare metal) and, thanks to ZFS, I can make local snapshots every 5 minutes. In addition, every 10 minutes, I can copy (using the excellent zfs-autobackup) all the VMs and jails to other nodes, both as a backup and as a way to restart immediately in case of disaster. I just need to map the IPs, and everything will start working very quickly. Proxmox also allows you to perform this type of operation with ZFS, but you still need to have Proxmox (in a compatible version) on the target machine. With the current setup, all I need is any FreeBSD node that supports bhyve.
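A snapshot-and-replicate schedule of this kind could be wired up roughly as follows. The dataset zroot/vm, the group name offsite, the backup host and the script path are assumptions; the plain zfs snapshot call for the 5-minute snapshots is also an assumption, since the tool used for them is not stated. Pruning of old local snapshots is not covered.

```shell
#!/bin/sh
# Sketch only: pool, group and host names are illustrative assumptions.

# Build a timestamped snapshot name, e.g. zroot/vm@auto-20240101-1205
snapshot_name() {
    dataset="$1"; stamp="$2"
    printf '%s@auto-%s\n' "$dataset" "$stamp"
}

# Take a recursive local snapshot of the dataset that holds the VMs.
take_snapshot() {
    zfs snapshot -r "$(snapshot_name "$1" "$(date +%Y%m%d-%H%M)")"
}

# Replicate every dataset tagged autobackup:<group>=true to a backup node.
replicate_group() {
    group="$1"; host="$2"; target="$3"
    zfs-autobackup --ssh-target "$host" "$group" "$target"
}

# One-time tagging, then two root crontab entries (script path hypothetical):
#   zfs set autobackup:offsite=true zroot/vm
#   */5  * * * * /root/bin/backup.sh snapshot
#   */10 * * * * /root/bin/backup.sh replicate
```

Since the replicas are ordinary ZFS datasets, any FreeBSD node with bhyve that receives them can start the VMs once the IPs are mapped to it.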

Proxmox is an excellent tool: well-developed, open-source, efficient, and stable. We manage many installations, including complex ones (ceph clusters, etc.), and it has never let us down. However, not all tools are ideal for all situations, and for setups like the one described, the new configuration based on FreeBSD has shown noticeably better performance and finer-grained management and maintenance.

Virtualizing on vm-bhyve is not complex, but in its current state it is certainly not comparable to the simplicity of a clean and complete interface like Proxmox’s. A complete HA system is still missing (sure, it’s achievable manually, but…), as is a complete web management interface. However, for knowledgeable users, it is undoubtedly a powerful tool that lets you build on the excellent FreeBSD as a base. I’m totally satisfied with my migration, and the result is far better than I expected.