Please note: This article has been automatically translated and adapted. There could be some errors.
Backup: methods
In a previous article I dealt with the problem of performing a correct backup: the proper ways to do it and why it is essential, why RAID cannot be considered a kind of backup, and some software I had examined and used. This time I will give some insight into how I make my own backups and how I manage to guarantee a certain level of security.
Backups: the basics
I'll use a bulleted list in order to cover the main issues to be dealt with:
- Operating system and data to save: every operating system is a different universe. It is reasonable to think that there is no universal solution, even if many of the main open source solutions are multi-platform. The main gap is between Unix-like systems (GNU/Linux, *BSDs, macOS, etc.) and Windows. The same software, although available for multiple platforms, may not be the best for you or your needs.
- Type of data to be saved: each type of data requires a different solution. In some situations you always need incremental copies, because files grow but are never modified. In that case, any incremental system (e.g. Duplicity) may be powerful enough. For data of a different kind, however (e.g. a database that changes continuously), a solution like the previous one can be extremely inefficient, especially in the long run.
- Need for encryption or not: backups of my invoices, for example, can be saved in an unencrypted form...
- Speed of execution and recovery: there are tools that are extremely efficient at performing the copy, but that make recovery extremely slow. One of the (few) shortcomings I found in BURP Backup, which I mentioned in the previous article and still use successfully in certain situations, is that it requires files to be restored and does not allow you, at least directly, to browse the backup as a local file system. The same is true, for example, of the Proxmox native backup: it is very easy to set up, complete and natively available, but the recovery time can be very long, especially if done from or to remote locations over slow connections. Recovering a single file will quite often require restoring the entire virtual machine.
- Snapshot: last on the list, but certainly of primary importance. A backup of a "live" file system has a start time and an end time. In between, the data keeps changing and inconsistencies may occur. I have had problems because of this in the past: a MySQL database, a few GB in size, was compromised by a client and I was asked to recover it. I confidently took the last backup they had and restored the various files (not a native dump). Needless to say, it was impossible to restart the database: the very large file had changed too much between the beginning and the end of the backup, so it was inconsistent. Please note: I also had the dump, of course, so I recovered from that. But the problem remains clear: backing up a live file system is dangerous, unless you are only copying the "My Documents" or "Images" folder. An open database, even one as simple as the browser's, has a very high chance of being copied in a corrupted state, making the backup completely useless. The technique is to take a snapshot of the entire file system before starting the copy. This procedure has risks too (the backup will be in roughly the same state the machine would be in if it were suddenly unplugged), but they are much lower. To date, using snapshots, I have been able to recover everything.
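For databases, a native dump like the one that saved the day in the story above avoids the consistency problem of copying live files. What follows is only a minimal sketch, not the procedure used in that incident: it assumes MySQL or MariaDB with InnoDB tables, credentials read from the usual client configuration, and a hypothetical target directory `/var/backups/mysql`. With `DRY_RUN=1` the pipeline is printed instead of executed.

```shell
# Sketch: consistent logical dump of a running MySQL/MariaDB server.
# DUMP_DIR is a hypothetical location; adjust it to your layout.
DUMP_DIR=${DUMP_DIR:-/var/backups/mysql}

# Build the dump file name for a given date (YYYY-MM-DD).
dump_file_for() {
    echo "$DUMP_DIR/all-databases-$1.sql.gz"
}

# Dump every database into a dated, compressed file.
dump_all_databases() {
    target=$(dump_file_for "$(date +%Y-%m-%d)")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "mysqldump --single-transaction --all-databases | gzip > $target"
        return 0
    fi
    mkdir -p "$DUMP_DIR" || return 1
    # --single-transaction reads InnoDB tables from one consistent view
    # without locking them; other storage engines do not get that guarantee.
    mysqldump --single-transaction --all-databases | gzip > "$target"
}
```

Calling `dump_all_databases` just before the file-level backup means the dump itself ends up in the backup, next to the raw database files.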
Backup: Snapshot
Taking a snapshot of the file system is important in order to have a sufficiently consistent backup. Over the years I have tried several solutions: Windows has its own VSS, while on Linux there are several different options. I will list the main ones:
- Native File System Snapshot (e.g. BTRFS or ZFS): if your file system natively supports the option to take a snapshot, you should take advantage of this. It will undoubtedly be the least expensive and the most correct solution, formally speaking.
- LVM snapshot: if you use LVM, you can request a snapshot of the logical volume, mount it and back it up. You will need to specify a maximum size for the changes (i.e. how much data can change on the system over the lifetime of the snapshot), and this space must be part of the VG but not allocated to any LV. Some space will therefore be wasted, in direct proportion to the working set, i.e. the data handled and written in a given time frame. It is a solution that I still use, but it has caused me some problems: in certain circumstances the file system hangs during the destruction of the snapshot, requiring a restart. That's not very nice. Luckily it has happened rarely, but in different situations, so not because of a specific disk, controller or other hardware.
- DattoBD: I have followed the development of this tool since the beginning. The very first versions gave me some problems, which disappeared release after release. It has repositories for the main distributions and now works in a stable and, as far as I have experienced so far, safe way. To take snapshots with Datto I generally use the UrBackup scripts, which are convenient and fast (UrBackup is another great backup system that I used a lot, but now use only for Windows systems).
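The LVM route can be sketched roughly as follows. This is an illustration under stated assumptions, not a tested production script: the volume group `vg0`, the logical volume `root`, the snapshot name, the 5G change budget and the mount point are all placeholders, and `rsync` merely stands in for whatever backup tool does the actual copy. With `DRY_RUN=1` the commands are printed instead of executed.

```shell
# Sketch of an LVM snapshot backup cycle. Every name below is a placeholder.
VG=${VG:-vg0}                 # volume group with free, unallocated extents
LV=${LV:-root}                # logical volume to back up
SNAP=${SNAP:-backup_snap}     # name given to the temporary snapshot
SNAP_SIZE=${SNAP_SIZE:-5G}    # how much may change while the snapshot exists
MNT=${MNT:-/mnt/lv_snap}      # where the snapshot gets mounted

# Print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: lvm_snapshot_backup DEST_DIR
lvm_snapshot_backup() {
    dest=$1
    run lvcreate --snapshot --size "$SNAP_SIZE" --name "$SNAP" "/dev/$VG/$LV" || return 1
    run mkdir -p "$MNT"
    if run mount -o ro "/dev/$VG/$SNAP" "$MNT"; then
        run rsync -a "$MNT/" "$dest/"
        run umount "$MNT"
    fi
    # Remove the snapshot in every case: while it exists it keeps
    # filling the space reserved for changes.
    run lvremove -f "/dev/$VG/$SNAP"
}
```

A call such as `DRY_RUN=1 lvm_snapshot_backup /backup` is a cheap way to review the command sequence before letting it touch a real volume group.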
Backup: push or pull?
One of the debates that has always divided the experts concerns how a backup is managed and initiated: should the client connect to the server and actually "start" the backup (push), or should the server connect to the clients and "request" that the backup procedure be started (pull)?
In my opinion, it depends. Generally speaking, I tend to have centralized backup systems on dedicated servers, which I therefore keep in a maximum-security environment, running only the basic services. In some cases I use Docker to encapsulate the entire backup system and avoid other services as much as possible. This is why I tend to prefer the "pull" approach, in which the server connects to the client and asks for the operation to start, or the mixed approach of the already mentioned BURP, in which the client connects to the server, but the server decides whether or not it is time to make the backup and directs the operation. The server, in this case, is not "dumb" storage, but a real piece of software that performs its function.
Unfortunately, however, my current favorite systems do not support this kind of functionality. That's why I have put in place alternative solutions that can somehow remedy the situation.
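One way to approximate a pull model with push-only tools is to let a central machine open the SSH connection and start each client's own backup script. The sketch below illustrates the idea only: the account name, the script path `/usr/local/bin/backup.sh` and the host names are hypothetical, and it is not necessarily how my own setup is wired. With `DRY_RUN=1` the commands are printed instead of executed.

```shell
# Sketch: the central server starts each client's push-style backup over SSH.
CLIENT_SCRIPT=${CLIENT_SCRIPT:-/usr/local/bin/backup.sh}   # hypothetical path on the clients
SSH_USER=${SSH_USER:-root}                                  # hypothetical account

# Print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: trigger_backups HOST...
# Returns non-zero if at least one host failed, so a scheduler can raise an alert.
trigger_backups() {
    failed=0
    for host in "$@"; do
        # BatchMode makes ssh fail instead of waiting for a password prompt.
        if ! run ssh -o BatchMode=yes -o ConnectTimeout=10 "$SSH_USER@$host" "$CLIENT_SCRIPT"; then
            echo "backup failed on $host" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

The client still pushes its data, but the decision about when the backup runs stays with the central machine, for example `trigger_backups web1 web2` launched from a scheduler.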
I will now examine the two tools that I use the most (together with the already mentioned BURP), namely Borg Backup and Restic.
Borg Backup
Borg Backup is the tool that, for more than two years, has guaranteed the security of almost all my backups. Some of its positive aspects:
- Compression and deduplication of data within the same repository
- Ability to mount backups in a directory, allowing easy & convenient navigation, browsing and recovery
- Fast in every feature, generates a local cache to keep track of files already copied and uses it for subsequent backups
So goodbye to incremental systems, etc. Borg allows you to have all backups deduplicated and compressed as if they were all complete, avoiding slow reconstructions during file browsing or restoring.
I found some weak points that, in my opinion, are not very important but must be taken into consideration:
- The most important: it is recommended to use one repository per server, because a backup locks the repository while it runs. This implies that deduplication only takes place between "backup generations" and data from the same server. In my case, I have hundreds of "similar" servers (OS, configurations, etc.), so a lot of space is lost. This can be considered both a defect and an advantage (a broken repository affects only that repository, not all backups). Your mileage may vary.
- Being written in Python (I love Python), it takes a few seconds to load and run. Nothing extreme, mind you, and it's fast and streamlined, but in some cases it can be a bit annoying.
- When mounting a repository, it starts creating internal indexes to show the directories. When backups start to hold a lot of data, this can take a few minutes. No problem if you are not in a hurry, but when you have the client on the phone desperately asking you to put his site back up or retrieve a very important and urgent document, those minutes are precious.
- It is also a push solution, so it is the client that connects to the storage server to send its data there. If the storage, as in my case, is a server equipped with the borg executable, the operation will be faster and more effective while still running via ssh.
Borg is fast, effective and my experience suggests that it is safe. On many occasions I have been able to recover entire servers, as well as scattered data, in a very short time and in a complete manner.
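As an illustration of the mount feature mentioned above, here is a minimal sketch built on the `borg list`, `borg mount` and `borg umount` commands. The repository address and the mount point are placeholders, and the archive name in the usage note is made up. With `DRY_RUN=1` the commands are printed instead of executed.

```shell
# Sketch: browse a Borg archive as a normal directory tree.
REPOSITORY=${REPOSITORY:-user@backup.example:repo}   # placeholder repository
MOUNTPOINT=${MOUNTPOINT:-/mnt/borg}                  # placeholder mount point

# Print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the archives in the repository so that one can be picked.
borg_archives() {
    run borg list "$REPOSITORY"
}

# Usage: borg_browse ARCHIVE_NAME
borg_browse() {
    run mkdir -p "$MOUNTPOINT" || return 1
    run borg mount "$REPOSITORY::$1" "$MOUNTPOINT"
}

# Unmount when the restore is finished.
borg_release() {
    run borg umount "$MOUNTPOINT"
}
```

A restore session would then look like `borg_archives`, `borg_browse daily-2024-01-31T02:00:00`, a plain `cp` from the mount point, and finally `borg_release`.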
Restic
I've been following Restic and experimenting with it for a while now, but until recently I still preferred Borg. They are two very different pieces of software (Restic is written in Go) and yet very similar, both in approach and functionality.
Lately, however, Restic has been improving: it is now much faster than before and development keeps moving forward. Restic now supports compression!
Here are some advantages of Restic:
- Data deduplication even between different servers: Restic, unlike Borg, suggests the use of a single data repository to increase the possibility of deduplication even between different machines. It's a big advantage in situations, like mine, where there are many very similar servers and it also streamlines the first backup
- Fast: my own (unscientific) tests say it's faster than Borg
- Ability to mount backups in a directory, allowing you to browse and restore in a simple & convenient way. Navigation is quick and organized by snapshot, by host and by tag, making it very easy to identify the required backup. Unlike Borg, the structures are created as you navigate through the directories, which slows down access to the data (just a little bit) but always ensures acceptable performance.
- Possibility to separate the operations of "forget" and "prune": at the end of the backup, I tend to make the client say "ok, throw away the oldest backups of ..." and it executes immediately. The actual deletion operation will then be done later, launching the "prune" command. In my case, this is done by the server and for the whole repository (ensuring different data retention policies depending on the server and the client), reducing the actual backup time to the bare essentials
- Extremely good community: the main developer, Alexander Neumann, is extremely kind, courteous and helpful in the official forum, where he assists users and welcomes feedback and proposals from anyone.
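The split between "forget" and "prune" described in the list above could look like the following sketch. The retention values are the ones used in the Restic script later in this article; the repository name `myrepo` comes from the same script, while the function names and the idea of a separate scheduled job on the server are illustrative assumptions. With `DRY_RUN=1` the commands are printed instead of executed.

```shell
# Sketch: fast "forget" on the client, slow "prune" later on the server.
REPO=${REPO:-myrepo}   # repository name as used in this article's Restic script

# Print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Client side, right after the backup: only drop the old snapshots of one host.
# Usage: client_forget HOSTNAME
client_forget() {
    run restic -r "$REPO" forget --host "$1" \
        --keep-daily 30 --keep-weekly 8 --keep-monthly 12 --keep-yearly 1
}

# Server side, at a quiet time: actually delete the unreferenced data
# for the whole repository.
server_prune() {
    run restic -r "$REPO" prune
}
```

Keeping `server_prune` out of the client script is what keeps the client's backup window short: the expensive repacking happens once, for all hosts, when nothing else needs the repository.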
Restic, like any software, is not perfect. Here are some downsides I've found in using Restic:
- Restic is also a push solution
- It may seem a trivial problem, but it can be uncomfortable: when you mount the repository (to browse backups), Restic doesn't return the prompt; instead you get a message telling you to unmount at the end of the operation. That is, you need a second shell to access the backup itself. Sometimes this can be frustrating, especially if you only have emergency access. When I remember this situation, I open a tmux before the mount and a second shell under the first one, so that I don't forget to unmount at the end of the operation.
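The tmux habit described in the last point can be scripted. The sketch below is only one possible arrangement: the session name and the mount point are placeholders, and it assumes it is started from outside tmux. With `DRY_RUN=1` the commands are printed instead of executed.

```shell
# Sketch: keep the blocking "restic mount" in one tmux pane, a free shell in another.
REPO=${REPO:-myrepo}                   # repository name as used in this article's script
MOUNTPOINT=${MOUNTPOINT:-/mnt/restic}  # placeholder mount point
SESSION=${SESSION:-restore}            # placeholder tmux session name

# Print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

restic_mount_session() {
    run mkdir -p "$MOUNTPOINT" || return 1
    run tmux new-session -d -s "$SESSION" || return 1
    # Type the mount command into the first pane; restic stays in the foreground there.
    run tmux send-keys -t "$SESSION" "restic -r $REPO mount $MOUNTPOINT" Enter
    # Open a second pane below it for browsing and copying files back.
    run tmux split-window -v -t "$SESSION"
    run tmux attach-session -t "$SESSION"
}
```

When the restore is done, pressing Ctrl-C in the first pane (or unmounting the mount point from the second one) ends the mount, and the pane that is still showing restic is a visible reminder to do so.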
Example: Script used to back up my laptop using Borg and Restic
Let's look at a practical example: the script I use to back up my notebooks. First, I check that the computer is not running on battery power and that the backup server is present (we are not on a different network or down for maintenance). Then, I lock and take a snapshot using DattoBD, redirecting all the output to a file in the temporary directory:
Borg:
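#!/bin/bash
PATH="/usr/local/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11:/usr/pkg/bin:/usr/pkg/sbin"
export PATH

# Backup server
server=192.168.2.254
export server

# Do nothing when the laptop is running on battery power
STATE=$(upower -i /org/freedesktop/UPower/devices/battery_BAT0 | grep state | grep discharging)
export STATE
if [[ $STATE == *'discharging'* ]]; then
    exit
fi

# Do nothing when the backup server does not answer on the SSH port
if nc -w 10 -z "$server" 22 2>/dev/null; then
    echo "$server ✓"
else
    echo "$server ✗"
    exit
fi

# mkdir is atomic, so the directory doubles as a lock against concurrent runs
if mkdir /tmp/backuphappening; then
    echo "Locking succeeded" >&2
else
    echo "Lock failed - exit" >&2
    exit 1
fi

# From here on, send all output (stdout and stderr) to the log file
exec > /tmp/borg_backup_log
exec 2>&1
date

# Take the DattoBD snapshot of / through the UrBackup helper script
/usr/local/share/urbackup/dattobd_create_filesystem_snapshot 1 /

REPOSITORY="user@$server:repo"
TAG=daily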
ionice -c3 borg create -v --progress --compression zlib --stats \
$REPOSITORY::$TAG'-{now:%Y-%m-%dT%H:%M:%S}' \
/mnt/urbackup_snaps/ /boot /boot/efi \
--exclude '*.cache*' \
--exclude '*/home/*/.cache*' \
--exclude '*/home/*/Scaricati*' \
--exclude '*.datto*' \
--exclude '*.overlay*' \
--exclude '*.crdownload' \
--exclude '*.rpm' \
--exclude '*.deb' \
--exclude '*swapfile*' \
--exclude '*/home/*/Virtualbox VMs*' \
--exclude '*/home/*/VirtualBox VMs*' \
--exclude '*/home/*/.vagrant.d*' \
--exclude '*/root/.cache*' \
--exclude '*/var/lib/docker*' \
--exclude '*/tmp'
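# Remove the snapshot now that the copy is complete
/usr/local/share/urbackup/dattobd_remove_filesystem_snapshot 1 /mnt/urbackup_snaps/1

# Thin out old archives according to the retention policy
# (recent Borg releases deprecate --prefix in favor of --glob-archives)
borg prune -v "$REPOSITORY" --stats --prefix "$TAG-" \
    --keep-hourly=12 --keep-daily=60 --keep-weekly=12 --keep-monthly=24

# Release the lock
rm -Rf /tmp/backuphappening
date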
Restic:
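#!/bin/bash
PATH="/usr/local/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11:/usr/pkg/bin:/usr/pkg/sbin"
export PATH

# Backup server (same as in the Borg script; needed by the nc check below)
server=192.168.2.254
export server

# Do nothing when the laptop is running on battery power
STATE=$(upower -i /org/freedesktop/UPower/devices/battery_BAT0 | grep state | grep discharging)
export STATE
if [[ $STATE == *'discharging'* ]]; then
    exit
fi

# Do nothing when the backup server does not answer on the SSH port
if nc -w 10 -z "$server" 22 2>/dev/null; then
    echo "$server ✓"
else
    echo "$server ✗"
    exit
fi

# mkdir is atomic, so the directory doubles as a lock against concurrent runs
if mkdir /tmp/backuphappening; then
    echo "Locking succeeded" >&2
else
    echo "Lock failed - exit" >&2
    exit 1
fi

# From here on, send all output (stdout and stderr) to the log file
exec > /tmp/restic_backup_log
exec 2>&1
date

# Take the DattoBD snapshot of / through the UrBackup helper script
/usr/local/share/urbackup/dattobd_create_filesystem_snapshot 1 /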
ionice -c3 restic -r myrepo \
backup \
--exclude='*/home/*/.cache*' \
--exclude='*.cache*' \
--exclude='*/home/*/Scaricati*' \
--exclude='*.datto*' \
--exclude='*.overlay*' \
--exclude='*.crdownload' \
--exclude='*.rpm' \
--exclude='*.deb' \
--exclude='*swapfile*' \
--exclude='*/home/*/Virtualbox VMs*' \
--exclude='*/home/*/VirtualBox VMs*' \
--exclude='*/home/*/.vagrant.d*' \
--exclude='*/root/.cache*' \
--exclude='*/var/lib/docker*' \
--exclude='/sys' \
--exclude='/proc' \
--exclude='/dev' \
--exclude='*/tmp' \
--exclude='/run' \
/mnt/urbackup_snaps/1/
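# Remove the snapshot now that the copy is complete
/usr/local/share/urbackup/dattobd_remove_filesystem_snapshot 1 /mnt/urbackup_snaps/1

# Mark old snapshots of this host for removal (keep 30 daily, 8 weekly,
# 12 monthly, 1 yearly); the data itself is deleted by a later "prune"
restic forget -d 30 -w 8 -m 12 -y 1 --host myhost -r myrepo

# Release the lock
rm -Rf /tmp/backuphappening
date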
Restic's script is much more primitive: it's just an adaptation of Borg's script to make it run and, as already mentioned, it doesn't launch the actual prune of the repository but only marks the older backups as "to be removed". I recommend reading, or at least having a look at, the manuals of the two tools, so you don't run into the classic copy & paste problems.
And now...
Right now I'm using both Borg and Restic: the former as my main backup system, the latter as the "emerging" one. I'm testing it in detail and it looks really good. The arrival of compression closed much of the gap and, thanks to the more efficient deduplication, allows me to replace Borg. In both cases, at night I rsync the entire repository to remote storage, so that the backups are replicated in multiple places. A Jenkins instance takes care of connecting to the individual machines, running the backups and alerting me in case of problems.
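The nightly replica can be as small as a single rsync call. The sketch below shows the shape of it under stated assumptions: the local path and the remote target are placeholders, and it should run when no backup is writing to the repository. With `DRY_RUN=1` the command is printed instead of executed.

```shell
# Sketch: nightly one-way replica of the backup repositories.
REPO_DIR=${REPO_DIR:-/srv/backups}                             # placeholder local path
REMOTE=${REMOTE:-backup@offsite.example:/srv/backups-replica}  # placeholder remote target

# Print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

replicate_repositories() {
    # Trailing slashes: copy the directory contents, not the directory itself.
    # --delete mirrors removals too, so pruned data also leaves the replica.
    run rsync -a --delete "$REPO_DIR/" "$REMOTE/"
}
```

Because `--delete` also mirrors mistakes, a replica made this way protects against losing the backup server, not against a repository that was damaged before the sync.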