Please note: This article has been automatically translated and adapted. There could be some errors.
Backup: why
Everything can be lost. Let's keep that in mind. Whether by accident or by deliberate action, anything can disappear unexpectedly or be deleted by mistake. How many times have we lost an object and never found it again?
And while losing a physical object takes some effort, deleting a file or any other piece of information from a computer is far easier; sometimes we simply want or need to restore a previous version of a specific file.
Many people think it is enough to make the storage medium redundant and you are covered. WRONG: a RAID undoubtedly helps you avoid losing everything if a disk fails, but what happens when data is accidentally deleted, when it is compromised by a virus or some external attacker, or when the computer (RAID or not) is stolen, catches fire, or suffers any other unexpected failure?
I have collected experiences of all kinds over the years. Just a few examples:
- Flooded server rooms
- Servers destroyed by an earthquake, that is, by collapsing walls
- Ransomware of every kind, which lately strikes far more often than in the past
- Damage caused deliberately by someone with an interest in creating problems (e.g. IT companies that cause damage to create work for themselves. Yes, I have seen this too, and not just once, unfortunately. I'm dealing with a situation exactly like that right now).
- Mistakes made by the administrator (it can happen to anyone)
If we consider servers exposed to the Internet (e.g. e-commerce sites, e-mail servers, etc.), the situation becomes even more critical, because in addition to the integrity of the data it is also important to ensure the operational continuity of the service.
The best solution, therefore, is to always have backups available. But what characteristics should these backups have?
Backup: how to do it
There are many backup tools, each focused on a specific area: from very complete suites like Bacula and Amanda down to small tools built for one specific need.
All in all, it is not easy to find your ideal, perfect tool, open source or proprietary, so the first thing to do is ask yourself a series of questions:
"How much am I willing to risk? What do I want to preserve? How much down can I tolerate in case of data loss? How much and what kind of space do I have?"
The first question is the most delicate, and sometimes it is both a cause and a consequence of technical choices already made. Some people believe it is sufficient to keep a backup copy inside the very machine they want to "back up". The choice may be simple and practical, but what happens if the machine fails? The classic USB drive that stays plugged in, and onto which files are copied every day, is just as exposed to failure as the rest of the hardware. And no, don't tell me that the uninterruptible power supply guarantees there will be no major surges. I have seen UPSes costing thousands of euros burn out and take everything behind them with them, ruining the day (or the week, or the career) of the administrator who felt safe. If you want, you can go ahead and claim the damages from the insurance company: the money will probably arrive, but your precious data certainly will not.
The first step, then, is to always have a management plan: decide beforehand whether the scales should tip more towards safety or towards saving money.
The safest backup, in fact, is the one as far away as possible from the machine you want to secure.
But this creates two problems: the more data we want to keep safe, the greater the need for space and bandwidth. If we want to store the backups on a separate device, it must be connected (via some kind of network) to our main hardware and must be able to hold all the data we need to preserve. If all of this can be done on a LAN, there are no major problems; if we want to keep our backups off our own network, we also have to deal with connectivity. So we might decide to store less data, in order to have higher operational speed both when backing up and, especially, when recovering.
Safer, in fact, does not mean more practical. If I had a 7 Mbit/s connection and 30 GB of data in the backup, how long would it take to recover everything after a failure? Can I afford that much downtime? If we are talking about vacation photos from 2000, probably yes (unless you are strongly nostalgic), but if we are talking about important data that blocks a company's productivity, are we sure we can be so patient? And what if we are talking about a medical record?
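A quick back-of-the-envelope calculation makes the point: 30 GB is roughly 240,000 Mbit, and at 7 Mbit/s that means about 34,000 seconds of transfer, i.e. around nine and a half hours, assuming the line is fully dedicated to the restore and ignoring protocol overhead.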
Exactly for this reason, we have to work out the backup policy best suited to our needs, remembering that there is no "perfect" solution.
Backup the entire disk or individual files?
This is one of the first questions we need to ask ourselves. Both solutions have advantages and disadvantages; I will list the most important ones:
Entire disk (or storage)
Advantages
- Easy full recovery in case of data loss. Just restore the entire backup to the original disk, and everything will be back exactly as before.
- Often the solution integrated into virtualization systems (e.g. Proxmox), easy to manage both from the command line and from the web interface
- Still in the virtualization world, there are products (e.g. Veeam Backup) that also allow the recovery of individual files, giving you the best of both worlds.
Disadvantages
- On physical machines, you will practically always need to turn off the machine to make a backup of this type, interrupting its operation for the entire processing time.
- The space occupied can be quite large, since data we may not care about gets copied as well.
- The operation can be slow, since the disk has to be read bit by bit. Alternatively, with backup programs that analyze the file system to optimize the time, the procedure may fail if the file system layout is not standard (e.g. I had a customer with a disk formatted directly, without a partition table: everything works, but because of that configuration choice Veeam is unable to perform a reliable backup, let alone a recovery).
In many cases this may still be the best solution, or it can serve as a starting point: one complete backup first, followed by lighter ones. One of the tools I use in these cases is the excellent Clonezilla.
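For virtual machines on a hypervisor such as Proxmox, mentioned above, a full backup of this kind can also be driven from the command line with vzdump. A minimal sketch, assuming a VM with ID 100 and a storage named "backup" (both hypothetical; check the vzdump man page for the options available in your version):

```bash
# Full backup of VM 100 while it keeps running, using Proxmox's snapshot mode.
# "backup" is a hypothetical storage defined in /etc/pve/storage.cfg.
vzdump 100 --mode snapshot --storage backup --compress zstd

# The same job can cover every VM on the node:
# vzdump --all --mode snapshot --storage backup --compress zstd
```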
Single files
Here things get more complicated. In theory this would seem the simplest and most convenient solution, but that is not always the case.
Advantages
- Also possible with basic system utilities (tar, cp, rsync, etc.), as shown in the sketch after this list.
- Greater granularity: individual files can be backed up and compared against previous backups.
- The possibility of backing up only deltas, copying just the modified parts of files, which reduces both the storage space and the amount of data to transfer.
- Portability: files can be moved individually from one medium to another.
- Easy partial recovery: You can choose what to restore and where.
- Possibility of compression/deduplication at the file or block level.
- Ability to back up and restore without shutting down the machine.
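As a taste of what the basic utilities mentioned above can already do, here is a minimal sketch of an incremental file-level backup with GNU tar; the paths are hypothetical:

```bash
# Level-0 (full) backup of /home; the snapshot file records what has been copied.
tar --listed-incremental=/var/backups/home.snar \
    -czf /var/backups/home-full.tar.gz /home

# Later runs against the same snapshot file pick up only new or changed files.
tar --listed-incremental=/var/backups/home.snar \
    -czf /var/backups/home-$(date +%F).tar.gz /home
```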
Disadvantages
- The simplest solutions may require a lot of storage space.
- For an efficient full backup, you should take a snapshot of the file system (VSS, in Microsoft terminology) before starting to copy; see the sketch after this list.
- There can be pitfalls, and they may remain hidden until the day you actually need the backup.
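On Linux, an LVM snapshot plays roughly the same role as VSS on Windows: it freezes a point-in-time view of the volume, which can then be copied calmly while the system keeps running. A minimal sketch, assuming a volume group vg0 with a logical volume data (hypothetical names):

```bash
# Create a temporary snapshot of the volume (5 GB reserved for changes made meanwhile).
lvcreate --snapshot --name data_snap --size 5G /dev/vg0/data

# Mount the frozen view read-only and copy from it instead of from the live data.
mkdir -p /mnt/data_snap
mount -o ro /dev/vg0/data_snap /mnt/data_snap
rsync -a /mnt/data_snap/ /backup/data/

# Clean up once the copy is done.
umount /mnt/data_snap
lvremove -f /dev/vg0/data_snap
```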
Backup: how do I do it
Generally speaking, I tend to use both solutions, i.e. backups of complete machines and backups of individual files, depending on needs and situations. My choices over the years have been quite consistent: I believe that having maximum granularity in backups is the best choice, partly because in many situations I have had to recover a handful of files, or a series of e-mails, mistakenly deleted by some distracted customer.
Specifically, I believe that a good backup should have some basic features:
- Near-instant recovery capability, and sufficiently high processing speed.
- It must be external to the machine being protected.
- Security: no, I would never place a backup on Dropbox, Google Drive or the like.
- Efficient space management.
- Compression and deduplication, ideally done off-line or at least reasonably fast.
- It must be as minimally invasive as possible and should not require the installation of too many components.
There are various schools of thought: some say the machine being backed up should have direct access to the backup server, others say it is the backup server that should contact the systems to be secured. Both approaches have their advantages and disadvantages, but in my case I prefer the server to connect to the clients, for two very specific reasons: 1) in my opinion it is easier to keep a single server "hidden" and secure than to leave access ports open to it from all the clients, and 2) this way I can schedule the backups according to a precise logic (e.g. when the first one finishes, move on to the second). Otherwise it becomes more difficult, and there is a risk that too many backups will overlap, saturating the machine's resources.
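Just to make the second point concrete, a pull-based backup server can serialize its jobs with something as simple as a loop; a minimal sketch with hypothetical host names, using rsync over ssh:

```bash
#!/bin/bash
# Runs on the backup server: it pulls each client in turn, so jobs never overlap.
CLIENTS="web01 db01 mail01"   # hypothetical host names
DEST=/backup                  # local backup storage

for host in $CLIENTS; do
    # Each backup starts only when the previous one has finished.
    mkdir -p "${DEST}/${host}"
    rsync -a --delete "root@${host}:/etc" "${DEST}/${host}/"
done
```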
Pure rsync
Historically I used (and for some servers still use) a script of my own based on rsync and hard links. Basically, each backup starts from the previous one: if a file has not changed, a hard link is simply created and no additional space is used; if it has changed, only the difference is transferred (thanks to rsync), but a new file (with the same name) is written to the file system. And so on, day after day, for all servers.
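My script is more elaborate than this, but the core idea can be sketched in a few lines using rsync's --link-dest option (host and paths here are hypothetical):

```bash
#!/bin/bash
# Hard-link-based daily backups: unchanged files become hard links to yesterday's copy.
HOST=web01                      # hypothetical client
DEST=/backup/${HOST}
TODAY=$(date +%F)

mkdir -p "${DEST}/${TODAY}"

# Files identical to those in "latest" are hard-linked instead of being copied again.
rsync -a --delete \
      --link-dest="${DEST}/latest" \
      "root@${HOST}:/etc" "root@${HOST}:/home" \
      "${DEST}/${TODAY}/"

# Point "latest" at the backup just taken, ready for tomorrow's run.
ln -sfn "${TODAY}" "${DEST}/latest"
```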
Advantages
- I always have a complete and immediately usable copy of the files, so the backup is "ready to use": restorable at any time, onto any medium.
- The space used is not the sum of all the backups, but the sum of the first backup plus the size of the modified files. Warning: the full size of those files, NOT just their differences.
- Easy to make
- Requires nothing more than rsync and access to the machine (normally, ssh)
Disadvantages
- If no snapshot system is in place, the files are copied on the fly. For a database under heavy use this makes the backup useless, as the restore will produce inconsistent files that do not work (see the sketch after this list).
- Space inefficient: unless you use a file system with integrated deduplication (e.g. ZFS), any minor change to a file will require storage space equal to the size of the whole file. E.g. if I add one line to a 10 GB database, the next backup will occupy 10 GB more than the previous one, since an entire copy of the database file has to be saved.
- Unless you use a compressed file system, all files are stored uncompressed.
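A common workaround for the database problem mentioned above (apart from file system snapshots) is to dump the database to a file before the file-level backup runs, so that what gets copied is a consistent export. A minimal sketch with a hypothetical PostgreSQL database called shopdb:

```bash
# Produce a consistent, compressed dump that the file-level backup can then pick up.
pg_dump --format=custom --file=/var/backups/shopdb-$(date +%F).dump shopdb

# MySQL/MariaDB equivalent:
# mysqldump --single-transaction shopdb > /var/backups/shopdb-$(date +%F).sql
```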
This is therefore a very good technique if you need to back up a few machines, or machines that are not huge, and if you do not need to keep a long history.
My current favorite choices: Borg, Restic and BURP Backup
When I got to the point of having hundreds of servers (plus my own PCs) to back up every day, some of them even several times a day, I necessarily had to find a more complete and "professional" alternative.
I have already written about Borg and Restic (and I suggest reading that article, as it also contains good hints on how to snapshot a live file system). Yet I am also still using BURP Backup: after more than five years, it has proved to be reliable.
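For reference, the day-to-day use of both tools boils down to a couple of commands; a minimal sketch with hypothetical repository paths (encryption passphrases and retention policies are left out for brevity):

```bash
# Borg: initialize a repository once, then create a dated archive on every run.
borg init --encryption=repokey /backup/borg-repo
borg create --stats --compression lz4 /backup/borg-repo::'{hostname}-{now}' /etc /home

# Restic: the same idea, deduplicated snapshots stored in a repository.
restic -r /backup/restic-repo init
restic -r /backup/restic-repo backup /etc /home
```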
For about a year I used the excellent storeBackup, which I integrated with some scripts of my own to centralize the backups, but I discarded it because it requires backups to be started from the client itself, not from the server. It is very efficient, however, and a solution I can recommend.
I then ran some tests with other products (Obnam (now retired), Attic, etc.), but for various reasons they were discarded, either for the same reason I abandoned storeBackup or for performance reasons. I also started using BackupPC, a solution I still prefer when I want to give a customer a turnkey backup system: convenient to use via the web, and easy to forget about, because in case of problems the customer will contact us and tell us. BackupPC is an excellent solution, and I have been using it for more than 10 years, but it has the drawback of requiring a "full" backup followed by a series of incrementals, so every now and then it has to take another complete full backup. The result is that it can be heavy on the network, or even overload it, so I decided to stop using it for full or remote servers.
BURP, an excellent integrated system, met almost all my expectations, satisfying the requirements I listed above:
- There is a server, which coordinates, and there are clients. The communication keys are generated at the first contact, on the basis of a password, and are then kept.
- Clients can contact the server at any time, but it is the server that decides if and when the backup can take place, and how.
- The software is small and lightweight, and I was able to install it without problems on all my servers and their operating systems, including embedded ones.
- It has an intelligent transfer system: using the same library as rsync (librsync), it copies only the differences within individual files (not whole files). Unlike rsync, however, it is able to store only the "deltas", i.e. the differences, between generations. To return to the earlier example: if you add a line to a 10 GB database, the space occupied by the next backup will be roughly that of the line.
- "Off-line" backup optimization: when the client connects, it sends its file list. The server compares it with what it already has and asks the client to transfer only the differences or the new files. At the end, once the connection is closed, the server optimizes everything and generates all the necessary structures, links, etc. The client, at that point, has already finished its work.
- All data can be compressed and deduplicated: in version 1.x through an external utility shipped with the software (which, in my case, runs once a week); from version 2.x (in development, not yet stable) this will be done automatically at the end of each backup.
- A convenient ncurses interface (and a web one, although I use it less) to keep an eye on everything.
Since I installed BURP, my backups have become very fast and light, with very good granularity. The system manages itself, adding a client is very easy, and I receive convenient e-mails that keep me up to date on the situation.
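To give an idea of how little is involved in adding a client, here is a minimal sketch; the client name, server address and password are hypothetical, and a real setup needs a few more options (TLS/CA settings, include lists), for which the BURP documentation is the reference:

```bash
# On the server: declare a new client (hypothetical name "web01") and its password.
cat > /etc/burp/clientconfdir/web01 <<'EOF'
password = a_long_random_password
EOF

# On the client: the essential lines in /etc/burp/burp.conf
# (the packaged default file contains further required settings):
#   server = backup.example.com
#   cname = web01
#   password = a_long_random_password

# The client then only needs a frequent cron entry; the server decides
# if and when a backup is actually allowed to run:
# */20 * * * * /usr/sbin/burp -a t
```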
Windows support is also excellent: the client creates the VSS snapshots of the drives by itself and, if you want, auto-updates the (Windows) application. Thanks to the guides available on the project site, you can recover an entire Windows installation from a backup and get it to boot again, which is not a given when you decide to back up individual files.
To sum up, there is no such thing as THE perfect backup system, but BURP (along with Borg and Restic) has been, at least for now and for years already, undoubtedly among my favorite choices. It is important to always remember one general rule: better one backup too many than one too few.