[PLUG] Faster giant disk copy

Keith Lofstrom keithl at kl-ic.com
Sun Aug 8 14:34:02 UTC 2004


I do big disk-to-disk backups with rsync and lots of hard links, as
I presented at PLUG in March.

I goofed.  I built one of my 250G backup drives with one inode per 8K
of disk (the default) rather than the recommended one inode per 4K.
After about 150 days of use, dirvish used up all the inodes, leaving
80GB of space on the drive unused.  This is because dirvish/rsync
stores unchanged files as hard links into each new image, so the drive
fills up with an enormous number of small files and directories while
the actual data grows slowly.
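
(For anyone hitting the same wall, the symptom shows up in "df -i",
and the fix at mkfs time is a smaller bytes-per-inode ratio.  Roughly
like this; the device name is just a placeholder:)

    # symptom: IUse% hits 100% while plenty of blocks are still free
    df -i /backup

    # when building the replacement filesystem, ask for one inode per
    # 4K of disk instead of the 8K default
    mke2fs -i 4096 /dev/hdc1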

It is impossible to add inodes to an existing filesystem.  If I want
to keep all those files, I must do a file-level copy to another drive
with enough inodes, using rsync, "cp -a", tar, or pax.
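
(For reference, the invocations would be roughly these; note that
rsync silently expands hard links into separate copies unless it gets
-H.  The mount points are just placeholders.)

    # rsync: -H is needed to preserve hard links
    rsync -aH --numeric-ids /mnt/old/ /mnt/new/

    # cp: -a preserves links, ownership, and timestamps
    cp -a /mnt/old/. /mnt/new/

    # tar pipe: GNU tar also preserves hard links
    (cd /mnt/old && tar -cf - .) | (cd /mnt/new && tar -xpf -)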

The problem is that there are an *enormous* number of little files
on these disks (150 hard links to the same data in some cases), and
none of these copy methods works efficiently with that many files.
Rsync bombs out after using up all the RAM and swap (it needs about
80 bytes per file link).  "cp -a" is running now; after two days it
has moved about 40% of the data, because it traverses every file tree
in a rather unintelligent manner.

Is there a better way, or some non-standard disk copy utility out
there?  For future use, I am thinking about building a perl wrapper
around rsync that moves data in smaller chunks rather than a whole
partition-full at a time.  However, if someone has already done
something like this, why re-invent the wheel?
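
The crude version of the idea would be something like this loop
(untested; it assumes each backup vault is a top-level directory and
that hard links never cross vault boundaries, since links spanning two
chunks would not be preserved):

    # copy one vault at a time so rsync's hard-link table stays small;
    # mount points and layout are placeholders
    for vault in /mnt/old/*/ ; do
        rsync -aH --numeric-ids "$vault" /mnt/new/"$(basename "$vault")"/
    done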

Keith

-- 
Keith Lofstrom           keithl at ieee.org         Voice (503)-520-1993
KLIC --- Keith Lofstrom Integrated Circuits --- "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs



