[PLUG] Rsync Backup Solution over SSH

Kyle Hayes kyle at silverbeach.net
Tue Oct 21 12:18:02 UTC 2003


On Tuesday 21 October 2003 11:21, D. Cooper Stevenson wrote:
> [backups via rsync]

Rsync is fine if you have a lot of files that are in one of the two categories 
below:

1) they are small,

2) they are not renamed.

If you have any kind of database or "old-style" log rotation, you fail both 
of these conditions, as Keith noted a few days ago.

In a previous life, I dealt with multi-GB databases and found that rsync was 
more of a hindrance than a help.

First, only some databases come with support for taking snapshots.  Postgres 
is particularly annoying in this regard.  MySQL isn't much better, but a 
combination of "LOCK TABLES" and "FLUSH TABLES" (or a single "FLUSH TABLES 
WITH READ LOCK") will get you a version of the DB on disk that is clean.  You 
can then use LVM with its filesystem snapshot capability to make a copy while 
the lock is held.  The snapshot took a couple of seconds at most.  It is 
definitely an odd approach, but we had a lot of luck with it.  Of course, we 
also used MySQL replication to keep live copies on different systems.  I 
think Postgres has this now too.

In doing this, we found that rsync was too slow at deciding whether a file 
had changed (some were many GB in size).  So, we just checked modification 
dates and did brute force copies.  We lowered our sync time dramatically 
doing this.
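
A minimal sketch of that "check the date, then just copy" idea, in Python.  
This is not the script we ran; the function names needs_copy and 
brute_force_copy are made up for illustration, and it assumes both paths are 
on mounted filesystems.  It only compares size and modification time, so it 
never reads the file contents to decide.

```python
import os
import shutil


def needs_copy(src, dst):
    """Cheap check: True if dst is missing, a different size, or older than src."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    # Compare whole seconds so filesystems with coarse timestamps still match.
    return s.st_size != d.st_size or int(s.st_mtime) > int(d.st_mtime)


def brute_force_copy(src, dst):
    """Copy the whole file when the cheap check says it changed."""
    if needs_copy(src, dst):
        shutil.copy2(src, dst)  # copy2 also carries over the modification time
        return True
    return False
```

The trade-off is that a changed file is always copied whole, which costs 
bandwidth but skips the expensive comparison pass over a multi-GB file.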

In going through our backup strategies, we found that the following process 
gave us good results:

- first eliminate files that come from the OS install.  I.e. things that 
came on the installation CD-ROM or updates afterward.  These are easier to 
reinstall directly from original media or other copies.  Things that rsync 
does not need to check are faster than things it does!

- copy /etc all the time.  It tarred and gzipped faster than we could run 
rsync on it remotely, so we just had a local script that would periodically 
tar and gzip /etc to save our configs.  Then rsync would look in the 
directory where the gzip file was and decide whether or not to copy it.

- look carefully at other data files.  Segregate those into several 
categories:
	- big and change often
	- big and never change/change very infrequently
	- small and change often
	- small and never change/change infrequently
	- files that get renamed at some point (logs etc.).
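
The periodic tar and gzip of /etc mentioned in the list above could look 
something like the following Python sketch.  The function name archive_tree 
and the ".tmp" suffix are assumptions for illustration, not what our script 
actually used.  It writes to a temporary name first and then renames, so that 
rsync never picks up a half-written archive.

```python
import os
import tarfile


def archive_tree(src_dir, out_path, arcname="etc"):
    """Write a gzip-compressed tar of src_dir to out_path and return out_path.

    out_path should live outside src_dir, or the archive would include itself.
    """
    tmp_path = out_path + ".tmp"  # hypothetical temporary name
    with tarfile.open(tmp_path, "w:gz") as tar:
        tar.add(src_dir, arcname=arcname)  # recurses into subdirectories
    os.replace(tmp_path, out_path)  # rename into place in one step
    return out_path
```

Run from cron as something like archive_tree("/etc", "/var/backups/etc.tar.gz"), 
where the output path is only an example.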

We handled the big files manually.  This turned out to be easy because there 
were not that many of them and some simple shell scripts doing various things 
and LVM snapshots were enough to get us solid, coherent copies when needed.

The smaller files could be handled by rsync if there wasn't a reason to 
group them with some of the bigger files.  I.e. we handled all database 
files the same way regardless of size.
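
As a rough illustration of sorting files into the size and change categories 
above, here is a Python sketch.  The thresholds BIG_BYTES and RECENT_SECONDS 
are arbitrary assumptions, and "modified recently" is only a crude stand-in 
for "changes often".  It cannot spot files that get renamed; those you have 
to know about yourself.

```python
import os
import time

BIG_BYTES = 100 * 1024 * 1024   # assumed cutoff for "big"
RECENT_SECONDS = 7 * 24 * 3600  # assumed window for "changes often"


def classify(size, mtime, now):
    """Label one file by its size and by how recently it was modified."""
    size_label = "big" if size >= BIG_BYTES else "small"
    age = now - mtime
    change_label = "changes often" if age <= RECENT_SECONDS else "rarely changes"
    return size_label + ", " + change_label


def classify_tree(root, now=None):
    """Walk root and group file paths by their category label."""
    now = time.time() if now is None else now
    groups = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat so a dangling symlink does not raise
            label = classify(st.st_size, st.st_mtime, now)
            groups.setdefault(label, []).append(path)
    return groups
```

The "big" groups are the ones worth handling by hand; the "small" groups are 
the candidates for rsync.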

We bent over backward to avoid renaming files.  This worked fairly well, but 
we still had a few that were renamed.  Luckily only one was big (the MySQL 
full log).  We managed to rename that file only once, to 
mysql_full_log_<date and time>.log, so rsync was not too much of a problem.  
We timed the rotation of the log file to be just before rsync would kick off.  
That way, the mysql_fulllog.log file was always fairly small and none of the 
rotated files had changed (except for the new one).  This helped a lot.  We 
cut several minutes off of rsync times this way.
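
The rename-once rotation can be sketched in Python as below.  The function 
name rotate_log and the exact timestamp format are assumptions for 
illustration.  The sketch only does the rename; the server also has to be 
told to reopen its log afterward (for MySQL, something along the lines of 
FLUSH LOGS), which is left out here.

```python
import os
import time


def rotate_log(live_path, now=None):
    """Rename the live log once to <stem>_<date and time><ext>; return the new path."""
    # time.localtime(None) means "the current time".
    stamp = time.strftime("%Y%m%d_%H%M%S", time.localtime(now))
    stem, ext = os.path.splitext(live_path)
    rotated = stem + "_" + stamp + ext
    os.rename(live_path, rotated)
    return rotated
```

Because the rotated file gets its final name immediately and is never renamed 
again, rsync sees it as one new file and leaves all the older ones alone.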

Rsync is a nice tool, but it is not a "one size fits all" tool.  In fact, I've 
often found that it simply doesn't do the things I need it to do for most of 
my server data.  For home directories for users it might be OK.  YMMV.  Each 
system is different and has different requirements.

Best,
Kyle






