How to create incremental backups using rsync on Linux - LinuxConfig.org

In previous articles, we already talked about how we can perform local and remote backups using rsync and how to set up the rsync daemon.


This is a companion discussion topic for the original entry at https://linuxconfig.org/how-to-create-incremental-backups-using-rsync-on-linux

This is such a great article. It shows how simply rsync can be used for backups. I have a couple of questions.

  1. I want to run it so it backs up my complete file system starting with '/', including all users and root (and maybe excluding tmp, cache, etc.). What is the strategy? Should I run it as sudo, or is there some other strategy?
  2. Will all files carry over the same permissions (u, g, o), ACLs and timestamps?
  3. Does --delete remove the files from the older 'snapshots' as well? Basically, I would want the deleted files kept if I go back to an older snapshot.

Thanks

Hi Silocoder,

Welcome to our forums.

1.) Your backup plan depends on your needs and on your recovery plan. Let's say the original machine gets an HDD error and the data on the disk cannot be recovered. You take your latest backup for recovery, and... would you like to restore the whole system, or reinstall a clean OS and restore user data only?
I can give you a personal example: I have a few PostgreSQL databases, small ones, but the data is valuable. So I make regular backups and rsync them to remote locations, because the environment itself isn't that hard to re-create - my backups are small and I can restore them anywhere. I don't need a full backup of the entire filesystem, only this small portion. This is just one use case; it entirely depends on your needs.
2.) Yes, all file permissions and timestamps are carried over.
3.) If you would like to keep old "snapshots", you can always back up to another destination - for example, a new directory created for every backup, perhaps named after the time of the backup.

Hi Silcoder,
I'm really glad you found the article useful. About your questions:

  1. If you want to back up the whole system using rsync, you need to run the program with root privileges; using sudo is usually the recommended way to do it. Creating a backup of a running system, however, is usually not recommended; it depends on what you are using the system for. If there are not a lot of processes writing to the disk very often, for example, it should be OK. Alternatively, you can create a snapshot and back up from it.
  2. The rsync -a option is a shortcut for running the program with the -rlptgoD options. The -p option (short for --perms) preserves standard permissions but not everything: ACLs and extended attributes are not included. To preserve those you should use the -A (--acls) and -X (--xattrs) options. See the example command after this list.
  3. Using --delete causes files which don't exist in the source to be deleted in the destination, in order to create an exact copy. In this context, those files will not be deleted from the directory used as the argument of the --link-dest option; they will simply not be hard linked from it into the new backup.
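
For reference, a minimal sketch of what such a full-system invocation could look like (the destination /mnt/backup and the exclude list are placeholders, adapt them to your setup):

sudo rsync -aAXv --delete \
    --exclude='/dev/*' \
    --exclude='/proc/*' \
    --exclude='/sys/*' \
    --exclude='/tmp/*' \
    --exclude='/run/*' \
    --exclude='/mnt/*' \
    --exclude='/media/*' \
    --exclude='/lost+found' \
    / /mnt/backup/

Here -A and -X take care of ACLs and extended attributes, while the excludes skip pseudo-filesystems and mount points that should not end up in the backup.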

Hi, won't the script ALWAYS create full backups?
The BACKUP_PATH is always different (to the nearest second). Therefore rsync will always be syncing against an empty directory, and thus perform a full backup every time.

It could be an idea to look at rdiff-backup, which is based on rsync.
It handles efficient rolling back to backup #1, 2, 3 relative to "now".
Better than hot water :slight_smile:
Google rdiff-backup.

Jens

SOURCE_DIR -> The directory to back up - the rsync source
LATEST_LINK -> The directory passed as argument to the --link-dest option
BACKUP_PATH -> The path of the new backup directory - the rsync destination

Files in SOURCE_DIR which are unchanged when compared to files in the LATEST_LINK directory are hard linked to BACKUP_PATH.

Files that changed and new files are copied from SOURCE_DIR to BACKUP_PATH. In BACKUP_PATH you will always have all the files, but you will save space since unchanged files will be hard linked from the previous backup.

After each backup is made, the old LATEST_LINK is removed and a new one is created, pointing to the backup that was just made.
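
Putting those pieces together, the logic boils down to something like the following minimal sketch (the paths here are just placeholder examples, not the article's exact values):

#!/bin/bash
# Minimal sketch of the hard-link snapshot logic described above.
# Paths are examples; adjust to your setup.

set -o errexit
set -o nounset
set -o pipefail

readonly SOURCE_DIR="/home/user/data"
readonly BACKUP_DIR="/mnt/backup/data"
readonly DATETIME="$(date '+%Y-%m-%d_%H:%M:%S')"
readonly BACKUP_PATH="${BACKUP_DIR}/${DATETIME}"   # new snapshot - rsync destination
readonly LATEST_LINK="${BACKUP_DIR}/latest"        # symlink used as --link-dest

mkdir -p "${BACKUP_DIR}"

# Unchanged files are hard linked from the previous snapshot (LATEST_LINK);
# changed and new files are copied from SOURCE_DIR. On the very first run
# LATEST_LINK does not exist yet, so rsync simply copies everything.
rsync -av --delete \
  "${SOURCE_DIR}/" \
  --link-dest "${LATEST_LINK}" \
  "${BACKUP_PATH}"

# Point "latest" at the snapshot we just created.
rm -rf "${LATEST_LINK}"
ln -s "${BACKUP_PATH}" "${LATEST_LINK}"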

@EgDoc
All understood thanks to you!

This is very useful, thanks!
But I wasted some time before realizing that FAT32 and exFAT file systems don't support soft links.
If anyone else has the same problem, use NTFS (also supported by Windows) or HFS+ (also supported by MacOS) or ext4.

I have been using this strategy for quite some time, but without fully understanding it. Thanks for your detailed explanation. I do have one question though. How do you copy such a 'backup set' to another machine? I currently have many of these 'backup sets' on a CentOS 7 box and want to move/copy them to a TrueNAS (FreeBSD) box, retaining the exact same structure as the source.

Hi Scottthepotter,

Welcome to our forums.

If you have a dedicated NAS machine, you could configure your CentOS box to mount the remote filesystem, using NFS for example (most NAS devices should support it). That way the remote storage appears in your local directory hierarchy, so all you have to do is change the target directory of rsync to point to the remote filesystem.

To mount the remote exported filesystem, you can check our NFS configuration guide, specifically the client configuration part. To configure the NAS to serve NFS, check the device's manual; if you are unsure, provide us with details about it, such as the version number or the options its management software allows, and I'm sure we can help with that as well.
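
As a rough sketch of that approach (the hostname, export path and mount point below are made up for illustration), note that rsync's -H option is what preserves the hard links between the existing snapshots, so the copied backup sets keep the same space-saving structure:

# Mount the NFS export from the NAS (hostname and paths are examples).
sudo mkdir -p /mnt/nas-backups
sudo mount -t nfs truenas.local:/mnt/pool/backups /mnt/nas-backups

# Copy the existing backup sets. -a preserves permissions, ownership and
# timestamps; -H preserves the hard links between snapshots.
rsync -aH --progress /backup/sets/ /mnt/nas-backups/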

I copied this script because it seems pretty simple and easy to work with, but I'm using it to create an incremental backup to a remote location. The thing is: this is NOT making an incremental backup. It's just making a full backup each time. I'm not sure whether the script is wrong or it's me, and why.

@ffuentes I went through this; it's because the rm and ln commands are local, not remote. I changed it to this:

#!/bin/bash

# A script to perform incremental backups using rsync

set -o errexit
set -o nounset
set -o pipefail

readonly BACKUP_SVR="172.16.0.5"
readonly SOURCE_DIR="/raid/content/images"
readonly BACKUP_DIR="/backup/images"
readonly DATETIME="$(date '+%Y-%m-%d_%H:%M:%S')"
readonly BACKUP_PATH="${BACKUP_DIR}/${DATETIME}"
readonly LATEST_LINK="${BACKUP_DIR}/latest"

ssh "${BACKUP_SVR}" "mkdir -p ${BACKUP_DIR}"

rsync -avW --no-compress --delete \
  "${SOURCE_DIR}/" \
  --link-dest "${LATEST_LINK}" \
  --exclude=".cache" \
  "${BACKUP_SVR}:${BACKUP_PATH}"

ssh "${BACKUP_SVR}" "rm -rf ${LATEST_LINK}"
ssh "${BACKUP_SVR}" "ln -s ${BACKUP_PATH} ${LATEST_LINK}"

Thank you for the guide and script! What would be the proper way to restore, then? And if I wanted to back up my rootfs under "/", would I use the same procedure for backing up and restoring?

Hi Danran,

Welcome to our forums.

I would suggest you don't try to back up the filesystem as a whole; there are special parts of it that you couldn't back up anyway. For example, the /proc subtree holds process information that is quite dynamic, and there is no point in backing it up. The same goes for /dev, where device files live, and also /tmp, where temporary files are located.

Thanks for this article. However, whenever I use a symbolic link as the argument for --link-dest, I get a "does not exist" error. Do you have an idea why this could be the case?

Hi Yagus,

Welcome to our forums.

The issue you describe could be a simple path error. How do you define your --link-dest path? Is it a relative or an absolute path?

Sorry to comment on an older article, but this was a great read and I like the rsync script. I've done some testing with it and I like the use of hard links to maintain versions without taking up more space. I'd like to use it on my home network to back up our data drives: photos, videos, documents, etc. My question is on deleting old backups. Since the very first time you use the script it takes a full backup, and every run after that is an incremental backup, that means the oldest directory is the one that actually contains the bulk of the data, right? What if I want to have a retention period of, say, 2 months, so that files that were deleted on the source could be retrieved for 2 months, but then would be gone after that? I would like to create a cron job that would delete folders older than 2 months; however, the oldest folder will always be the full backup, and I would lose all the data.

So do I need to do like a weekly full backup or something to a different folder somewhere, then delete the older full backup? How can I manage these so that I can set a retention time for these backups, but not lose old family photos that haven't changed in 5 or 10 years?

Edit: Or maybe I could make a copy of the original script and edit it to make a weekly or monthly script that will take a new full backup, then delete the earlier full? Just not quite sure of the best way to go about it. Obviously I'm a noob at good backup procedures.

Hi Paqman,

Welcome to our forums.

You got the logic right; this is how backups are made even in enterprise systems - you create a new full backup and delete the old one, including its incrementals. Then you do incremental backups until it is time for another full backup.

As for how to do it, I would create target directories named after their date of creation; that way I (or my script) can easily tell which one is the most recent and which ones need to be deleted.
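
A hypothetical cleanup along those lines could look like the sketch below; it assumes the snapshot directories are named with the same YYYY-MM-DD_HH:MM:SS pattern as in the article's script, and the path and the 60-day retention are made-up examples:

#!/bin/bash
# Hypothetical retention cleanup (a sketch, not the article's script).
# Assumes snapshot directories are named YYYY-MM-DD_HH:MM:SS under
# BACKUP_DIR. BACKUP_DIR and the 60-day retention are examples.

set -o errexit -o nounset
shopt -s nullglob   # make the glob expand to nothing if there are no snapshots

readonly BACKUP_DIR="/mnt/backup/data"
readonly CUTOFF="$(date -d '60 days ago' '+%Y-%m-%d')"

for snapshot in "${BACKUP_DIR}"/*_*; do
    # The date part of the directory name sorts chronologically, so a plain
    # string comparison against the cutoff date is enough.
    snapshot_date="$(basename "${snapshot}" | cut -d_ -f1)"
    if [[ "${snapshot_date}" < "${CUTOFF}" ]]; then
        rm -rf "${snapshot}"
    fi
done

Whether deleting the oldest directories is safe for your data depends on the full/incremental scheme discussed above, so test something like this on a copy first.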

Thanks for the response! I think I have something set up, and I'm curious what you think of how I'm doing it. I made a slight change and want to make sure what I'm doing really is working. It seems to work in my tests, but with these hard links it's hard to know for sure.

What I've set up is a Daily script, which is basically the base script you've shown here. A Sunday script runs a full backup every Sunday; to accomplish that, I simply took out the --link-dest portion of the script, so it will always run a full. Then on Saturdays, I run the normal script, except I coded in a bit that will also delete the oldest full and all its incrementals. The way I've coded it, I will sometimes have two fulls taking up space, but will never have fewer than 6 days of incrementals in case there is a deletion we need to recover.

My question is on the usage of the Sunday full script. Does taking out the --link-dest portion work as I think it does? I know that it is taking a full backup. But am I correct in assuming that removing the link portion separates it from the previous incrementals and starts a new set? The subsequent incrementals after that will link to that last Sunday full, right?
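
For reference, the Sunday variant described above presumably boils down to the same rsync call with --link-dest removed; the sketch below uses placeholder paths:

#!/bin/bash
# Sunday "full" variant (sketch): the same rsync call as the daily script,
# but without --link-dest, so nothing is hard linked from a previous
# snapshot and every file is copied in full. Paths are placeholders.

SOURCE_DIR="/home/user/data"
BACKUP_PATH="/mnt/backup/data/$(date '+%Y-%m-%d_%H:%M:%S')"

rsync -av --delete "${SOURCE_DIR}/" "${BACKUP_PATH}"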