[PLUG] Rsync Backup Solution over SSH
Kyle Hayes
kyle at silverbeach.net
Tue Oct 21 12:18:02 UTC 2003
On Tuesday 21 October 2003 11:21, D. Cooper Stevenson wrote:
> [backups via rsync]
Rsync is fine if you have a lot of files that are in one of the two categories
below:
1) they are small,
2) they are not renamed.
If you have any kind of database or "old-style" log rotation, you fail both
of these conditions, as Keith noted a few days ago.
In a previous life, I dealt with multi-GB databases and found that rsync was
more of a hindrance than a help.
First, only some databases come with support for taking snapshots. Postgres
is particularly annoying in this regard. MySQL isn't much better, but a
combination of a global read lock and "FLUSH TABLES" (a single "FLUSH TABLES
WITH READ LOCK" does both) will get you a version of the DB on disk that is
clean. While the lock is held, you can use LVM with its filesystem snapshot
capability to make a copy, then release the lock. The snapshot took a couple
of seconds at most. Definitely odd, but we had a lot of luck with it. Of
course, we also used MySQL replication to keep live copies on different
systems. I think Postgres has this now too.
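
For anyone who wants to try the lock-then-snapshot trick, here is a rough
sketch in Python. It is not what we ran. It only builds the script you would
feed to the mysql command-line client and executes nothing itself. It uses
FLUSH TABLES WITH READ LOCK, the MySQL statement that flushes and takes a
global read lock in one step. The lock lasts only as long as the session that
took it, so the sketch assumes the client's "system" command is available
(Unix builds) to run lvcreate from inside that session. The volume path,
snapshot name and size are made up.

```python
def build_snapshot_sql(volume, snapshot_name, size="1G"):
    """Return a script for the mysql command-line client.

    The read lock is released when the session ends, so the LVM snapshot
    is started from inside the same session, between lock and unlock.
    """
    # lvcreate long options: --snapshot, --size, --name, then the origin volume.
    lvcreate = f"lvcreate --snapshot --size {size} --name {snapshot_name} {volume}"
    lines = [
        "FLUSH TABLES WITH READ LOCK;",  # flush to disk and block writers
        f"system {lvcreate}",            # assumption: client supports `system`
        "UNLOCK TABLES;",                # let writers continue
    ]
    return "\n".join(lines) + "\n"


# Hypothetical names, for illustration only:
# print(build_snapshot_sql("/dev/vg0/mysql", "mysql_snap"))
```

You would then mount the snapshot, copy the files off at leisure, and remove
the snapshot afterward.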
In doing this, we found that rsync was too slow at deciding whether a file
had changed (some were many GB in size). So, we just checked modification
times and did brute force copies. We lowered our sync time dramatically
doing this.
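
The "check the date, then just copy" idea is simple enough to sketch. This is
a minimal Python illustration, not our original scripts (those were shell).
The function name is made up, and it assumes source and destination are both
reachable as local paths, e.g. over a network mount.

```python
import os
import shutil


def copy_if_newer(src, dst):
    """Copy src over dst when dst is missing or older than src.

    Only modification times are compared. File contents are never read
    for comparison, which is the whole point with multi-GB files.
    """
    if os.path.exists(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
        return False  # destination is already up to date
    # copy2 also copies the mtime, so the next run sees the file as unchanged.
    shutil.copy2(src, dst)
    return True
```

The trade-off is that an unchanged timestamp is trusted blindly, and a changed
one always costs a full copy.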
In going through our backup strategies, we found that the following process
gave us good results:
- first exclude files that come from the OS install, i.e. things that came
on the installation CD-ROM or as updates afterward. These are easier to
reinstall directly from original media or other copies. Things that rsync
does not need to check are faster than things it does!
- copy /etc all the time. It tarred and gzipped faster than we could run
rsync on it remotely, so we just had a local script that would periodically
tar and gzip /etc to save our configs. Then rsync would look in the
directory where the gzip file was and decide whether or not to copy it.
- look carefully at other data files. Segregate those into several
categories:
  - big and change often
  - big and never change/change very infrequently
  - small and change often
  - small and never change/change infrequently
  - files that get renamed at some point (logs etc.).
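
The /etc step in the list above can be sketched like this in Python (we used
a shell script with tar and gzip, so take this as an equivalent, not the
original). The function name and paths are made up. The archive is written
under a temporary name first so that rsync never picks up a half-written
file.

```python
import os
import tarfile


def archive_directory(source_dir, archive_path):
    """Write source_dir into a gzip-compressed tar file and return its path.

    archive_path should live outside source_dir, otherwise the archive
    would try to include itself.
    """
    tmp_path = archive_path + ".tmp"
    top = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(tmp_path, "w:gz") as tar:
        tar.add(source_dir, arcname=top)  # store paths relative to the top dir
    os.replace(tmp_path, archive_path)    # atomic swap on the same filesystem
    return archive_path


# Hypothetical usage, run periodically (e.g. from cron) as root:
# archive_directory("/etc", "/var/backups/etc.tar.gz")
```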
We handled the big files manually. This turned out to be easy because there
were not that many of them and some simple shell scripts doing various things
and LVM snapshots were enough to get us solid, coherent copies when needed.
The smaller files could be handled by rsync if there wasn't a reason to
include them in with some of the bigger files. I.e. we handled all database
files the same way regardless of size.
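
If you want a first pass at sorting files into those buckets, something like
the following Python sketch might help. The size cutoff and the "changed
recently" window are arbitrary assumptions, and a single modification time is
only a crude stand-in for how often a file really changes. It cannot detect
renamed files; you need to compare two listings over time for that.

```python
import os
import time
from collections import defaultdict

BIG_BYTES = 100 * 1024 * 1024  # assumed cutoff for "big"
RECENT_SECONDS = 24 * 60 * 60  # assumed window for "changes often"


def classify(path, now=None, big_bytes=BIG_BYTES, recent_seconds=RECENT_SECONDS):
    """Return a (size, churn) pair for one file."""
    st = os.stat(path)
    now = time.time() if now is None else now
    size = "big" if st.st_size >= big_bytes else "small"
    recent = now - st.st_mtime <= recent_seconds
    return size, "changes often" if recent else "rarely changes"


def survey(root, big_bytes=BIG_BYTES, recent_seconds=RECENT_SECONDS):
    """Walk root and group regular files by their (size, churn) pair."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                key = classify(path, big_bytes=big_bytes,
                               recent_seconds=recent_seconds)
                groups[key].append(path)
    return groups
```

Anything that lands in a "big" bucket is a candidate for manual handling; the
"small" buckets are what rsync is good at.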
We bent over backward to avoid renaming files. This worked fairly well, but
we still had a few that were renamed. Luckily only one was big (the MySQL
full log). We managed to rename that file only once, to
mysql_full_log_<date and time>.log, and thus rsync was not too much of a
problem. We timed the rotation of the log file to be just before rsync would
kick off. That way, the mysql_fulllog.log file was always fairly small and
none of the rotated files had changed (except for the new one). This helped
a lot. We cut several minutes off of rsync times this way.
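
A rename-once rotation along those lines could look like this in Python. It
is a sketch, not our script, and the timestamp format is my assumption. Since
the rotated name never changes again, rsync only ever sees one new file per
rotation.

```python
import os
from datetime import datetime


def rotate_log(live_path, when=None):
    """Rename the live log once, to a name carrying the date and time.

    Returns the new path. The live file is not recreated here; that is
    left to the server when it reopens its logs.
    """
    when = when or datetime.now()
    directory, filename = os.path.split(live_path)
    stem, ext = os.path.splitext(filename)
    rotated = os.path.join(directory, f"{stem}_{when:%Y%m%d_%H%M%S}{ext}")
    os.rename(live_path, rotated)
    return rotated
```

A running server usually keeps writing to the renamed file until it is told
to reopen its logs (for MySQL, FLUSH LOGS does that), so schedule both steps
shortly before the rsync run.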
Rsync is a nice tool, but it is not a "one size fits all" tool. In fact, I've
often found that it simply doesn't do the things I need it to do for most of
my server data. For home directories for users it might be OK. YMMV. Each
system is different and has different requirements.
Best,
Kyle