For postgres, I used to just have a systemd timer that would `pg_dumpall` and throw it in s3.
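Something along these lines, fired by the timer (the bucket name is made up, and the AWS CLI is assumed to be installed):
# dump every database, compress, and stream straight into s3
pg_dumpall -U postgres | gzip | aws s3 cp - s3://my-db-backups/pg_dumpall-$(date +%F).sql.gz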
Now I use https://github.com/wal-g/wal-g to back up my postgresql databases.
For other local files, I use borg backup for personal files and services I just run for myself, and restic to back up server files to s3.
The operating system's configuration is all stored in git via the magic of NixOS, so I don't have to worry about files in /etc, they all are 100% reproducible from my NixOS configuration.
we have a belt & suspenders approach:
* backups of selected files / database dumps [ via dedicated tools like mysqldump or pg_dumpall ] from within VMs
* backups of whole VMs
backups of whole VMs are done by creating external snapshot files via virsh snapshot-create-as, rsyncing the disk images, then merging back with virsh blockcommit. this provides crash-consistent images of the whole virtual disks. we zero-fill virtual disks before each backup.
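roughly this cycle, in case you haven't seen it (domain name, disk target and paths are made up):
# redirect writes into a temporary overlay, copy the now-stable base image, then merge the overlay back
virsh snapshot-create-as vm1 backup --disk-only --atomic --no-metadata
rsync -a --sparse /var/lib/libvirt/images/vm1.qcow2 /backup/vm1.qcow2
virsh blockcommit vm1 vda --active --pivot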
all types of backups later go to a borg backup[1] deduplicated, compressed repository [ kopia[2] would be fine as well ].
this approach is wasteful in terms of bandwidth and backup size, but gives us peace of mind - even if we forget to take a file-level backup of some folder, we'll still have it in the VM-level backups.
[1]: https://www.borgbackup.org/
[2]: https://kopia.io/
For databases, they all have programs to export their contents, like pg_dump or mysqldump, and the dumps can be compressed. It's good to keep some historic backups (not just the last one) so you can go back to before some not-yet-detected harmful change. How many? It depends on what you use them for; some may be required by law for years, or be totally meaningless to you in a few days. Backing up the transactions that happened since the last backup so you can do a point-in-time recovery, or even having a running replica with that information, is good too.
For files and directories, borg backup is great. You can use a remote server over ssh as the repository, and it gives you very efficient historic backups too.
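A minimal sketch of that, with a made-up host and paths:
# create an encrypted repository on the remote machine, reachable over ssh
borg init --encryption=repokey ssh://backup@backuphost/./borg-repo
# take a deduplicated, compressed archive of the chosen directories
borg create --stats --compression zstd ssh://backup@backuphost/./borg-repo::{hostname}-{now} /etc /home
# keep a rolling history instead of only the last run
borg prune --keep-daily=7 --keep-weekly=4 --keep-monthly=6 ssh://backup@backuphost/./borg-repo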
We also use Nimble for snapshots every 15 minutes on some servers, although not our databases afaik. Pretty effective if a worst-case ransomware attack were to succeed through a user with too many access rights.
These solutions aren't cheap though.
Privately I just use multiple hard drives and for database servers I would use the internal backup functionalities, which are pretty good on most databases.
It only takes a minute to restore if needed, but the problem is that if the OS goes down (which has happened once in the last 10 years due to a disk failure), it takes several hours to reinstall everything from scratch.
Working storage is a pair of n TB SSDs in RAID 0 where the VMs live. Nightly backups are stored there.
A single n*2 TB disk for backups and long term media.
When I lifecycle disks, I move them to my desktop which just has a cronjob to scp from server storage.
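The cronjob is nothing fancy; something like this (hostnames and paths are made up):
# pull the latest nightly backups from the server onto the retired disk
0 4 * * * scp -r backup@server:/srv/backups/nightly/ /mnt/retired-disk/server-backups/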
If you’re set up to quickly provision new VMs, there’s no state other than that to back up in most cases. Files are in s3 as well.
Configuration and other stuff lives outside the VMs, in some git repo, in the form of terraform / saltstack / shell scripts / what have you, that can rebuild the VM from scratch, and does it regularly. Git has very good built-in "backup tools".
What else needs backing up in a DB server?
Sadly, despite being pretty great (deduplicated and compressed incremental backups across a bunch of nodes, with easy browsing and restoring of the backups), the project is more or less abandoned: https://github.com/backuppc/backuppc/issues/518
Nowadays it's mostly just rsync and some compression for archives, sometimes just putting them in an S3 compatible store, or using whatever the VPS provider gives me. I'm not really happy with either approach; they feel insufficient.
It's not like there aren't other options out there: UrBackup, Bacula, Duplicati, BorgBackup, restic, Kopia and others, it's just that I haven't had the time or energy to truly dive into them and compare them all, to make the decision of what I should move to.
I created a mesh VPN with WireGuard between the 2 VMs and my office and home servers (Minisforum PCs with Debian). This is important for my backups and for not opening external ports; the VPNs at home and at the office are provided by 2 Fritz!Box routers.
I just finished re-engineering my backups yesterday, as follows:
- Contabo provides 2 snapshots per VM (on my current plan); I manually create a snapshot every week (in the future I'd like to make this automatic)
- I use both restic and borg for redundancy, as follows:
-- An automysqlbackup cron job creates the encrypted data dumps every day
-- I created two bash scripts that make archives of /etc and /root (which contains the acme.sh ssl certificates). These archives are encrypted with my public GPG key and scheduled via cron (a sketch follows this list)
-- The other directories I back up are /var/vmail (mail files, already compressed and encrypted by vmail) and /var/www (websites and web applications), but I don't create archives of them
-- I created different repositories for /etc, /root, databases, vmail and sites. In case a repository or a single backup fails for any reason, I don't lose everything, only a part of the full backup
-- Restic sends the backups to iDrive e2 S3 storage; I'll replace part of the procedure with resticprofile in the near future
-- Borg sends the same backups to both the home and office servers: I use borgmatic, but with different configurations and cron jobs for each destination; otherwise, if a backup fails on one destination, it fails on both
-- If a backup fails for any reason I receive an alert via Pushover; I'd like to replace it with a self-hosted instance of ntfy.sh
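The core of those archive scripts boils down to roughly this (the recipient key and paths are made up):
# archive /etc and /root, encrypt to my public key, write a dated file for the backup tools to pick up
tar czf - /etc /root | gpg --encrypt --recipient backup@example.org --output /var/backups/etc-root-$(date +%F).tar.gz.gpg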
I have two additional offline external disks that I sync manually every week with the main backup disks using rsync
Still to improve / change:
- automatic snapshot creation (via cron jobs)
- use resticprofile for restic
- use backrest to check and access the restic backups
- use Vorta to check and access the borg backups
- use self hosted ntfy.sh for notifications
A cronjob makes a daily snapshot of the running machines (of the datasets), possibly resulting in slightly corrupted VMs, but every now and then I shut down all VMs, make a scripted snapshot of each one, and restart them.
Then I extract the raw VM files from the snapshots onto an HDD, run an md5sum on each of the source and target files, and get the results sent to me via email.
All of this is automated, except for the shutdown of the VMs and the triggering of the snapshot-making script that follows the shutdowns.
Apparently there are "domfsfreeze" and "domfsthaw", which could help with the possible corruption of the daily snapshots.
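Something like this could wrap the daily snapshot, assuming the guests run the qemu-guest-agent (the domain and dataset names are made up):
# quiesce the guest filesystem, snapshot the dataset backing the VM, then thaw
virsh domfsfreeze vm1
zfs snapshot tank/vms/vm1@daily-$(date +%F)
virsh domfsthaw vm1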
It's all not really a good or solid strategy, but it's better than nothing.
If you can't do that, make a script that creates a .tgz backup, have cron run it every day, and copy it up to an S3 bucket. Have the bucket configured to delete items that are over a certain age.
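As a rough sketch (bucket name and paths are made up), the cron line plus a one-time lifecycle rule could look like:
# daily at 02:00, tar the data directory and stream it straight into s3 (% must be escaped in crontab)
0 2 * * * tar czf - /srv/data | aws s3 cp - s3://my-backup-bucket/data-$(date +\%F).tar.gz
# one-time setup: expire objects automatically after 30 days
aws s3api put-bucket-lifecycle-configuration --bucket my-backup-bucket --lifecycle-configuration '{"Rules":[{"ID":"expire-backups","Status":"Enabled","Filter":{"Prefix":""},"Expiration":{"Days":30}}]}'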
Our local files can either be pulled from git repositories or rebuilt by running an Ansible playbook against the server. Luckily we don't have any user-uploaded files on our servers to worry about.
Consider using a backup SaaS if you want a hassle-free experience; there are a lot of moving parts to making backups reliable.
cd <whatever root you want to back up>
duplicacy init -e <storage-name> <storage-url>
# set password for storage if necessary
duplicacy set -key password -value <storage-password>
# edit .duplicacy/filters to ignore/add files to backup
vim .duplicacy/filters
# check which files will be backed up
duplicacy -d -log backup --enum-only
# run your first backup
duplicacy backup -stats
# add the following command to your crontab
duplicacy -log backup -stats
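Restoring later works roughly like this (the revision number is just a placeholder):
# list revisions in the repository
duplicacy list
# restore a given revision, overwriting local files
duplicacy restore -r 1 -overwrite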
I love Kopia + Backblaze
The advantage of this setup is that the Backup server does not need to know the encryption keys for ZFS. It can pull snapshots (via syncoid[1]) without knowing the keys (i.e. zero trust). The main server also cannot reach the Backup server; only the other way around works (configured in the OPNsense box that connects my offsite backup via IPsec). The backup server is locked down in its own subnet and can only be reached from a few selected points. This is possible because no interaction is needed, thanks to the Shelly Plug S self-starting automation.
ZFS also doesn't care about the filesystems inside - it can incrementally pull VM disks containing ext4/xfs filesystems, without touching individual files or needing per-file hashes (as rsync does).
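A single pull run from the backup box looks roughly like this (host, pool and dataset names are made up; the raw send option keeps the blocks encrypted so the backup side never sees plaintext):
# pull-replicate all VM datasets recursively as raw (still-encrypted) sends
syncoid --recursive --sendoptions=w root@mainserver:tank/vms backuppool/vms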
[1]: https://github.com/jimsalterjrs/sanoid/blob/master/syncoid