Monday, 24 June 2019

linux - How can this backup strategy work?


I am trying to script a very simple backup strategy. Here is the general idea.


Daily - Backup the entire filesystem using rsync, overwriting the previous day's backup.


Weekly - Once a week copy the daily backup to a separate folder to keep around for a week, overwriting the previous week's backup.


Monthly - On the first of the month, copy the daily backup to a monthly backup folder to keep around for a month, overwriting last month's backup.


Here is the conundrum: on the day I do the weekly backup, the weekly and daily backups will be the same, so I won't have a backup that is a few days old.


If this day falls on the first of the month, all the backups will be the same, defeating the whole point of having multiple backups.


I am limited on space and three backups is all I have room for. I am backing up VMs and websites so I don't need long term, but I do want backups that go back a while in case an error goes unnoticed for a few days.


Does anyone have ideas for reworking this strategy so that I don't have periods where all the backups are the same?



Answer



I would write a script that checks whether a backup is more than 1, 7 or 31 days old and acts accordingly. You have not said so, but I assume you are using Linux (I added the tag to your question) and that you are backing up to a remote server. The first step is to write a little script that runs your rsync command and also creates a marker file on the remote server when the backup is finished. This file is used both to tell whether a backup is currently running and to check the backup's age (I assume you are keeping the original timestamps when you back up files, so you can't get the backup date from the files themselves):


Rsync script (this assumes you have password-less access to the remote server):


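#!/usr/bin/env bash
## Remove the marker so the remote check knows a backup is in progress
ssh user@remote rm -f /path/to/daily/backup_finished.txt
## -a recurses and preserves timestamps and permissions;
## recreate the marker only if rsync succeeded
rsync -a /path/to/source/ user@remote:/path/to/daily/ &&
ssh user@remote touch /path/to/daily/backup_finished.txt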
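
The marker-file protocol above can be sketched as a small reusable function. This is only an illustration under assumptions: run_with_marker is a hypothetical helper name, and the marker here lives on the local filesystem rather than being managed over ssh.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command and maintain a "finished" marker around it.
# Usage: run_with_marker MARKER_FILE COMMAND [ARGS...]
run_with_marker() {
    local marker="$1"
    shift
    # Remove the marker first so a checker can tell that a run is in progress.
    rm -f -- "$marker"
    # Recreate the marker only if the command succeeded. After a failed run the
    # marker stays absent, so an incomplete backup is never rotated.
    if "$@"; then
        touch -- "$marker"
    else
        return 1
    fi
}
```

Something like `run_with_marker /path/to/daily/backup_finished.txt rsync -a /path/to/source/ /path/to/daily/` would then cover a purely local backup.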

On the local machine, set up a cron job that does daily backups:


@daily rsync_script.sh

On the remote machine, you need to run the script I give below every hour:


@hourly check_backup.sh

The check_backup.sh script:


#!/usr/bin/env bash

daily=/path/to/daily
weekly=/path/to/weekly
monthly=/path/to/monthly

## The dates will be measured in seconds since the UNIX epoch,
## so we need to translate weeks and months (31 days) to seconds.
week=$((60*60*24*7))
month=$((60*60*24*31))

## Make sure no backup is currently running
if [ ! -e "$daily/backup_finished.txt" ]; then
    echo "A backup seems to be running, exiting."
    exit
fi

## Get the necessary dates. If a marker is missing (for example on the
## first run), treat that backup as made at the epoch, i.e. as very old.
weekly_backup_date=$(stat -c %Y "$weekly/backup_finished.txt" 2>/dev/null || echo 0)
monthly_backup_date=$(stat -c %Y "$monthly/backup_finished.txt" 2>/dev/null || echo 0)
now=$(date +%s)
monthly_backup_age=$((now - monthly_backup_date))
weekly_backup_age=$((now - weekly_backup_date))

## Replace the monthly backup if it is older than a month, but only
## if the current $daily is not identical to $weekly.
if [[ "$monthly_backup_age" -gt "$month" ]]; then
    ## diff -r compares the directories recursively. An explicit "if" is
    ## needed because "a || b && c" would run c even when a succeeds.
    if ! diff -r "$daily" "$weekly" > /dev/null 2>&1; then
        ## Delete the previous backup and copy the new one over. The -r flag
        ## makes cp recursive and -p makes it preserve dates and permissions.
        rm -rf "$monthly" && cp -rp "$daily" "$monthly"
    fi
fi

## Replace the weekly backup if it is older than a week, but only
## if the current $daily is not identical to $monthly.
if [[ "$weekly_backup_age" -gt "$week" ]]; then
    if ! diff -r "$daily" "$monthly" > /dev/null 2>&1; then
        rm -rf "$weekly" && cp -rp "$daily" "$weekly"
    fi
fi

So, this script (check_backup.sh) will be run every hour on your backup server. Since it does nothing unless a backup is old enough, it's no problem to run it that often. Whenever the monthly backup is more than 31 days old, the contents of monthly are deleted and replaced with a copy of the current daily backup. The same happens for weekly when the weekly backup is more than 7 days old.
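
The age check described above can be isolated into two small functions. This is a minimal sketch, not part of the original script: marker_age and older_than_days are hypothetical names, and like the script it assumes GNU stat (the -c %Y format).

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the age of a marker file in seconds.
# Relies on GNU stat (-c %Y prints the modification time as epoch seconds).
marker_age() {
    local mtime now
    mtime=$(stat -c %Y -- "$1") || return 1
    now=$(date +%s)
    echo $((now - mtime))
}

# Hypothetical helper: succeed if the marker is older than the given number
# of days. Returns 2 if the marker cannot be read at all.
# Usage: older_than_days MARKER_FILE DAYS
older_than_days() {
    local age
    age=$(marker_age "$1") || return 2
    [ "$age" -gt $(( $2 * 24 * 60 * 60 )) ]
}
```

For example, `older_than_days /path/to/weekly/backup_finished.txt 7` would succeed once the weekly copy is more than a week old.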


I am using diff to compare the backups. This means that daily is copied to weekly if the current weekly is more than a week old, but only if the backup that would be copied (the current daily) is not the same as the existing monthly. Likewise, daily is copied to monthly only if it is not the same as the existing weekly. For example, if the script runs and sees that the current daily backup is the same as the weekly one, it will not overwrite the existing monthly. However, once the daily backup has changed, a later run will replace the monthly one.
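
The compare-then-copy step described above can be factored into a small function. This is a sketch under assumptions: rotate_unless_same is a hypothetical name, and comparing by content with diff -r is assumed to be acceptable for the size of the backups. It uses an explicit if rather than a `cmd1 || cmd2 && cmd3` chain, because `||` and `&&` have equal precedence in the shell and such a chain would run cmd3 even when cmd1 succeeds.

```shell
#!/usr/bin/env bash

# Hypothetical helper: replace DEST with a copy of SRC unless SRC is
# identical to OTHER.
# Usage: rotate_unless_same SRC DEST OTHER
rotate_unless_same() {
    local src="$1" dest="$2" other="$3"
    # diff -r compares the trees recursively; -q only reports whether they
    # differ. A missing OTHER makes diff fail, which counts as "different".
    if diff -rq -- "$src" "$other" > /dev/null 2>&1; then
        return 0    # identical to the other tier, keep DEST as it is
    fi
    # Remove the old copy first so cp does not nest SRC inside DEST.
    rm -rf -- "$dest" && cp -rp -- "$src" "$dest"
}
```

With the directories from the script, the monthly step would be `rotate_unless_same "$daily" "$monthly" "$weekly"` and the weekly step `rotate_unless_same "$daily" "$weekly" "$monthly"`.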


The net result of this is that at any time you should have a minimum of two different backups, and usually you will have three. The worst-case scenario is that something fails and you don't have a week-old backup, just a month-old one or, vice versa, you don't have a month-old one but you do have last week's.
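
To see how the three tiers evolve, the rotation rules can be simulated with plain arithmetic. This is a deliberately simplified model, not the real script: it assumes one backup and one check per day, measures ages in whole days, assumes every daily backup differs from the previous one, and represents each tier by the day number on which its contents were taken. The function name simulate is hypothetical.

```shell
#!/usr/bin/env bash

# Simplified model of the rotation. Prints one line per day:
#   day daily weekly monthly
# where each tier column is the day on which that tier's contents were taken.
simulate() {
    local days="$1" day=1 daily weekly=0 monthly=0
    while [ "$day" -le "$days" ]; do
        daily=$day
        # Refresh monthly if older than 31 days and daily differs from weekly.
        if [ $((day - monthly)) -gt 31 ] && [ "$daily" -ne "$weekly" ]; then
            monthly=$daily
        fi
        # Refresh weekly if older than 7 days and daily differs from monthly.
        if [ $((day - weekly)) -gt 7 ] && [ "$daily" -ne "$monthly" ]; then
            weekly=$daily
        fi
        echo "$day $daily $weekly $monthly"
        day=$((day + 1))
    done
}
```

Under these assumptions, `simulate 34` shows the monthly copy being refreshed on day 32 while the weekly copy is held back until day 33, so weekly and monthly never hold the same snapshot. On the day a tier is refreshed it matches the daily backup until the next daily run, which is why two distinct backups is the minimum.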

