From: Tim F. <ti...@ni...> - 2011-12-16 14:01:13
On Fri, 2011-12-16 at 07:33 -0600, Les Mikesell wrote:
> On Fri, Dec 16, 2011 at 4:49 AM, Jean Spirat <jea...@sq...> wrote:
> > to my understanding rsync has always seemed to be the most efficient
> > of the two, but I never challenged this "fact" ;p
>
> Rsync working natively is very efficient, but think about what it has
> to do in your case. It will have to read the entire file across NFS
> just so rsync can compare contents and decide not to copy the content
> that already exists in your backup.
>
> > i will have a look at tar and see if i can work with it.
>
> I'd try rsync over ssh first, at least if most of the files do not
> change between runs. If you don't have enough RAM to hold the
> directory listing, or if a large number of files change per run, tar
> might be faster.

The real issue with rsync is the memory usage for the 8 million entries in
the file list. The first thing rsync does is walk the tree, comparing each
entry with the already backed-up files to see whether the timestamp has
changed. This puts memory and disk load on both the backup server and the
backed-up client.

The approach tar uses is simply to walk the directory tree and transfer
everything newer than a timestamp that BackupPC passes to it. This costs
some extra network bandwidth but massively reduces the disk and memory
load needed on both the BackupPC client and server.

The server that I am backing up, with ~7 million files, takes on the order
of 6000 minutes to back up with rsync, and the bulk of that time is spent
by rsync building the tree of files to transfer. The same server takes
about 2500 minutes with tar because of its simpler way of finding files.

Overall, rsync makes better backups because it finds moved and deleted
files and is far, far more efficient with network bandwidth, but if you
understand the drawbacks and need the filesystem efficiency of tar, then
it is still an excellent backup tool.

-- 
Tim Fletcher <ti...@ni...>
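[As a footnote to the thread: the difference between the two strategies can be sketched on a throwaway directory. This assumes GNU tar, rsync, and GNU coreutils; every path below is made up purely for illustration, and BackupPC drives the real commands itself.]

```shell
# Set up a toy source tree: one "old" file that predates the last
# backup, and one "new" file changed since then.
rm -rf /tmp/demo_src /tmp/demo_dst
mkdir -p /tmp/demo_src /tmp/demo_dst
echo old > /tmp/demo_src/old.txt
touch -d '2 days ago' /tmp/demo_src/old.txt
echo new > /tmp/demo_src/new.txt

# tar-style incremental: a single walk of the tree, archiving only
# files whose mtime is newer than the reference timestamp (BackupPC's
# tar method passes such a timestamp in). Directories are always
# included so the archive extracts cleanly.
tar -C /tmp/demo_src -cf /tmp/demo_incr.tar --newer-mtime='yesterday' .

# rsync-style: build the complete file list first (the memory-hungry
# step with 8 million entries), compare metadata on both ends, then
# transfer only what differs.
rsync -a /tmp/demo_src/ /tmp/demo_dst/

# The incremental archive contains only the new file (plus the
# directory entry); the rsync copy holds the full tree.
tar -tf /tmp/demo_incr.tar
ls /tmp/demo_dst
```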