From: John G. <jgo...@co...> - 2009-02-18 17:08:27
|
Hi,

I've been reading docs on BackupPC and I have a few questions about how it works.

First off, I gather that it keeps a hardlinked pool of data, so whenever a file changes on any host, on the backup device it will be hardlinked to a file containing the same data, regardless of the host it came from, right?

So, given that, I don't really understand why there is a distinction between a full and an incremental backup. Shouldn't either one take up the same amount of space? That is, if you've got few changes on the client, then on the server you're mostly just hardlinking things anyway, right? So why is there a choice?

Secondly, I gather that BackupPC mangles filenames. That doesn't bother me, but how is it possible to use rsync in an efficient way with that? rsync wouldn't be able to match up client-side filenames with the server-side names, since the server names are different, so it wouldn't do its efficient transfers. Either that, or you're having to create temporary directory trees on the server, which sounds inefficient. Or am I missing something?

Thanks,

-- John
From: Richard S. <hob...@gm...> - 2010-10-29 13:16:58
|
There is surprisingly little info on how BackupPC really works, at least with the google searches I've tried. I'm just looking for a concise overview of how the different backup methods work and how they are different from one another.

The reason I'm looking for this information is that I will be giving a presentation on BackupPC to my local LUG.

I'm looking for something as close as I can get to:

Rsync:
1. BackupPC does x.
2. Then this.
3. Then this.
4. etc.

Rsyncd:
1. Same as rsync except x.

Tar:
...

Smb:
...

and so forth.

For instance: I can understand how tar over SSH transfers files. But who decides what files to transfer? Does BackupPC crawl the share first and build a file list? Or is there an option for tar that takes care of it for BackupPC?

For SMB, I understand that it would just look like a local filesystem after the share is mapped to the BackupPC server. Then what? Does it use rsync against the network share?

Thanks,
Richard
From: Les M. <les...@gm...> - 2010-10-29 15:25:19
|
On 10/29/2010 8:16 AM, Richard Shaw wrote:
> There is surprisingly little info on how BackupPC really works, at
> least with the google searches I've tried. I'm just looking for a
> concise overview of how the different backup methods work and how they
> are different from one another.
>
> The reason I'm looking for this information is I will be giving a
> presentation of BackupPC to my local LUG.
>
> I'm looking for something as close as I can get to:
>
> Rsync:
> 1. BackupPC does x.
> 2. Then this.
> 3. Then this.
> 4. etc.
>
> Rsyncd:
> 1. Same as rsync except x.
>
> Tar:
> ...
>
> Smb:
> ...
>
> and so forth.

First, backuppc uses a native tool to transfer files. Then it checks the file contents against the pool with a hashing mechanism, and replaces any exact matches with a link to the existing pool copy.

> For instance: I can understand how tar over SSH transfers files. But
> who decides what files to transfer?

Your file include or exclude lists are mapped into the options appropriate for the xfer program, with the 'share' as the top of the tree.

> Does BackupPC crawl the share
> first and build a file list? Or is there an option for tar that takes
> care of it for BackupPC?

With tar and smb, backuppc doesn't know much about the remote side - it just passes the options to the program. Tar runs over ssh entirely on the remote side.

> For SMB, I understand that it would just look like a local filesystem
> after the share is mapped to the BackupPC server. Then what? Does it
> use rsync against the network share?

The smb method actually uses the smbtar program, so it looks more like tar than a mapped file system.

The rsync method runs a native rsync via ssh on the remote side, talking to a perl implementation on the server. Rsyncd is similar, but talks to a standalone rsync running in daemon mode that must be set up on the target. Rsync sends the entire directory tree you request from the remote, then both sides walk the list to find and send differences.

The practical difference is that rsync uses less bandwidth and is better at catching every change in incrementals. The smb and tar methods go strictly by file timestamps and will miss moved or copied files that keep their old timestamps, where rsync will find them with the directory comparison against your previous full tree.

--
Les Mikesell
les...@gm...
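P.S. If it helps for the LUG talk: the method/share/exclude selection is just a few lines in config.pl. A rough sketch (the values here are illustrative, not the shipped defaults - check your own config.pl):

    $Conf{XferMethod}     = 'rsync';            # or 'rsyncd', 'tar', 'smb'
    $Conf{RsyncShareName} = ['/home', '/etc'];  # each 'share' is a top of tree
    $Conf{BackupFilesExclude} = {
        '/home' => ['*/.cache', '*/tmp'],       # mapped into the xfer program's
    };                                          # exclude options per share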
From: Kris L. <kl...@th...> - 2010-10-29 16:50:28
|
This is informative. Comparing Rsync vs Rsyncd, which has less load on the client side? I'm considering moving away from my current implementation of rsync over an autofs-mounted cifs share.

Kris Lou
kl...@th...

On Fri, Oct 29, 2010 at 8:25 AM, Les Mikesell <les...@gm...> wrote:
> On 10/29/2010 8:16 AM, Richard Shaw wrote:
>> There is surprisingly little info on how BackupPC really works, at
>> least with the google searches I've tried. I'm just looking for a
>> concise overview of how the different backup methods work and how they
>> are different from one another.
>>
>> The reason I'm looking for this information is I will be giving a
>> presentation of BackupPC to my local LUG.
>>
>> I'm looking for something as close as I can get to:
>>
>> Rsync:
>> 1. BackupPC does x.
>> 2. Then this.
>> 3. Then this.
>> 4. etc.
>>
>> Rsyncd:
>> 1. Same as rsync except x.
>>
>> Tar:
>> ...
>>
>> Smb:
>> ...
>>
>> and so forth.
>
> First, backuppc uses a native tool to transfer files. Then it checks the
> file contents against the pool with a hashing mechanism, and replaces
> any exact matches with a link to the existing pool copy.
>
>> For instance: I can understand how tar over SSH transfers files. But
>> who decides what files to transfer?
>
> Your file include or exclude lists are mapped into the options
> appropriate for the xfer program, with the 'share' as the top of the tree.
>
>> Does BackupPC crawl the share
>> first and build a file list? Or is there an option for tar that takes
>> care of it for BackupPC?
>
> With tar and smb, backuppc doesn't know much about the remote side - it
> just passes the options to the program. Tar runs over ssh entirely on
> the remote side.
>
>> For SMB, I understand that it would just look like a local filesystem
>> after the share is mapped to the BackupPC server. Then what? Does it
>> use rsync against the network share?
>
> The smb method actually uses the smbtar program, so it looks more like
> tar than a mapped file system.
>
> The rsync method runs a native rsync via ssh on the remote side, talking
> to a perl implementation on the server. Rsyncd is similar, but talks to a
> standalone rsync running in daemon mode that must be set up on the
> target. Rsync sends the entire directory tree you request from the
> remote, then both sides walk the list to find and send differences.
>
> The practical difference is that rsync uses less bandwidth and is better
> at catching every change in incrementals. The smb and tar methods go
> strictly by file timestamps and will miss moved or copied files that
> keep their old timestamps, where rsync will find them with the directory
> comparison against your previous full tree.
>
> --
> Les Mikesell
> les...@gm...
From: Les M. <les...@gm...> - 2010-10-29 17:08:32
|
On 10/29/2010 11:50 AM, Kris Lou wrote:
> This is informative. Comparing Rsync vs Rsyncd, which has less load on
> the client side? I'm considering moving away from my current
> implementation of rsync over an autofs-mounted cifs share.

Rsync over ssh will have some additional load for encryption - but you can minimize that by using blowfish. If you don't need encryption, the choice should be more about which you'd rather set up and whether you'd use ssh for anything else.

In older versions of the windows/cygwin ports of ssh and rsync there was a bug that would make it hang at random when rsync was started by sshd, but that is fixed in the current (1.7.x) versions. It's probably easier to set up a standalone rsyncd on windows, but with ssh you can mix and match windows and linux without much regard for the differences.

--
Les Mikesell
les...@gm...
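P.S. Concretely, the cipher choice is just an edit to the ssh part of the client command in config.pl. Something like this (a sketch from memory of the stock 3.x default - compare against your own config.pl before using it):

    # stock default is roughly:
    #   $Conf{RsyncClientCmd} = '$sshPath -q -x -l root $host $rsyncPath $argList+';
    # adding -c blowfish selects the cheaper cipher:
    $Conf{RsyncClientCmd} = '$sshPath -c blowfish -q -x -l root $host $rsyncPath $argList+';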
From: Richard S. <hob...@gm...> - 2010-10-29 17:53:54
|
On Fri, Oct 29, 2010 at 10:25 AM, Les Mikesell <les...@gm...> wrote:
> First, backuppc uses a native tool to transfer files. Then it checks the
> file contents against the pool with a hashing mechanism, and replaces
> any exact matches with a link to the existing pool copy.
>
>> For instance: I can understand how tar over SSH transfers files. But
>> who decides what files to transfer?
>
> Your file include or exclude lists are mapped into the options
> appropriate for the xfer program, with the 'share' as the top of the tree.

So in effect, BackupPC doesn't do anything directly, but rather indirectly by proxy? (Not the network meaning, but the literal meaning.)

>> Does BackupPC crawl the share
>> first and build a file list? Or is there an option for tar that takes
>> care of it for BackupPC?
>
> With tar and smb, backuppc doesn't know much about the remote side - it
> just passes the options to the program. Tar runs over ssh entirely on
> the remote side.

So with tar/smbtar, for a full backup, all files are transferred? In other words, there's no checksumming on the remote client to see if a file already exists on the server?

>> For SMB, I understand that it would just look like a local filesystem
>> after the share is mapped to the BackupPC server. Then what? Does it
>> use rsync against the network share?
>
> The smb method actually uses the smbtar program, so it looks more like
> tar than a mapped file system.
>
> The rsync method runs a native rsync via ssh on the remote side, talking
> to a perl implementation on the server. Rsyncd is similar, but talks to a
> standalone rsync running in daemon mode that must be set up on the
> target. Rsync sends the entire directory tree you request from the
> remote, then both sides walk the list to find and send differences.

I assume the checksumming that rsync does can create quite a CPU load on the client, which is why on linux clients it is often nice'd (or ionice'd)?

Richard
From: Les M. <les...@gm...> - 2010-10-29 18:12:32
|
On 10/29/2010 12:53 PM, Richard Shaw wrote:
>
>>> For instance: I can understand how tar over SSH transfers files. But
>>> who decides what files to transfer?
>>
>> Your file include or exclude lists are mapped into the options
>> appropriate for the xfer program, with the 'share' as the top of the tree.
>
> So in effect, BackupPC doesn't do anything directly, but rather
> indirectly by proxy? (Not the network meaning, but the literal
> meaning.)

Yes, the program selected by the XferMethod is used to access the client.

>> With tar and smb, backuppc doesn't know much about the remote side - it
>> just passes the options to the program. Tar runs over ssh entirely on
>> the remote side.
>
> So with tar/smbtar, for a full backup, all files are transferred? In
> other words, there's no checksumming on the remote client to see if a
> file already exists on the server?

Yes. Then existing files are discarded on the server and replaced with links to the pooled copies.

>> The rsync method runs a native rsync via ssh on the remote side, talking
>> to a perl implementation on the server. Rsyncd is similar, but talks to a
>> standalone rsync running in daemon mode that must be set up on the
>> target. Rsync sends the entire directory tree you request from the
>> remote, then both sides walk the list to find and send differences.
>
> I assume the checksumming that rsync does can create quite a CPU load
> on the client, which is why on linux clients it is often nice'd (or
> ionice'd)?

It is more of an I/O load than CPU, since disks are much slower than processors. And rsync incrementals quickly skip files where the directory name/length/timestamp match the previous full. Full runs add the --ignore-times option to rsync, so all files are read for the checksum comparison but only the differences are sent over the network.

--
Les Mikesell
les...@gm...
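P.S. If the client-side load ever does become a problem, the usual trick is to wrap the remote rsync in nice/ionice inside the client command. An illustrative sketch, assuming nice and ionice are available on the client:

    # run the remote rsync at lowest CPU priority and in the idle I/O class
    $Conf{RsyncClientCmd} = '$sshPath -q -x -l root $host'
                          . ' nice -n 19 ionice -c3 $rsyncPath $argList+';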
From: Peter V. <use...@se...> - 2010-10-30 09:21:39
|
On 10/29/2010 03:16 PM, Richard Shaw wrote:
> The reason I'm looking for this information is I will be giving a
> presentation of BackupPC to my local LUG.

Could you please publish it here when done?

With your work, maybe my LUG will see a presentation at last .-) (after me talking about it for years...)

yours
Peter

http://www.valug.at if you are in the region of Upper Austria and interested

--
"When I was as small as a finger, my mother knew it full well: this is going to get expensive again. Then, when I was as big as a football, my father knew it too. This is going to get expensive again...." (Die Nuts)

[blog: http://klicklich.at]
From: Richard S. <hob...@gm...> - 2010-11-05 21:35:56
|
On Sat, Oct 30, 2010 at 4:21 AM, Peter Vratny <use...@se...> wrote:
> On 10/29/2010 03:16 PM, Richard Shaw wrote:
>> The reason I'm looking for this information is I will be giving a
>> presentation of BackupPC to my local LUG.
>
> Could you please publish it here when done?
>
> With your work, maybe my LUG will see a presentation at last .-)
> (after me talking about it for years...)

Certainly, but I'm not sure how detailed I'm going to make it... I'm teaching right now during the normal meeting times, so I've got the December meeting slot. I should have the presentation done sometime this month.

Richard
From: Carl W. S. <ch...@re...> - 2010-11-02 18:15:37
|
On 10/29 08:16 , Richard Shaw wrote:
> There is surprisingly little info on how BackupPC really works, at
> least with the google searches I've tried. I'm just looking for a
> concise overview of how the different backup methods work and how they
> are different from one another.

Reading /etc/backuppc/config.pl is remarkably informative. I suggest it to all BackupPC users.

--
Carl Soderstrom
Systems Administrator
Real-Time Enterprises
www.real-time.com
From: Tyler J. W. <ty...@to...> - 2010-11-02 20:21:56
|
On Tue, 2010-11-02 at 13:15 -0500, Carl Wilhelm Soderstrom wrote:
> On 10/29 08:16 , Richard Shaw wrote:
>> There is surprisingly little info on how BackupPC really works, at
>> least with the google searches I've tried. I'm just looking for a
>> concise overview of how the different backup methods work and how they
>> are different from one another.
>
> Reading /etc/backuppc/config.pl is remarkably informative. I suggest it
> to all BackupPC users.

And to be honest, so is the web-based inline documentation that comes with it. Nearly all you need to understand about BackupPC, you can learn by configuring it once. Perhaps this is why it isn't documented elsewhere.

Regards,
Tyler

--
"Freedom of thought is best promoted by the gradual illumination of men's minds, which follows from the advance of science." -- Charles Darwin
From: Tino S. <bac...@ti...> - 2009-02-18 17:33:52
|
Hi John,

On Wed, Feb 18, 2009 at 10:58:14AM -0600, John Goerzen wrote:
> I've been reading docs on BackupPC and I have a few questions about
> how it works.
>
> First off, I gather that it keeps a hardlinked pool of data, so
> whenever a file changes on any host, on the backup device it will be
> hardlinked to a file containing the same data, regardless of the host
> it came from, right?

Right.

> So, given that, I don't really understand why there is a distinction
> between a full and an incremental backup. Shouldn't either one take
> up the same amount of space? That is, if you've got few changes on
> the client, then on the server you're mostly just hardlinking things
> anyway, right? So why is there a choice?

The only difference between incremental and full (for rsync!) is that:

1) all files are completely checksummed, so you detect pool corruption
2) you get the whole directory structure on the server (which is used
   as the base for incremental backups) with all hardlinks to pool files

For an incremental, you only get the directory structure, plus hardlinks into the pool for new/modified files.

> Secondly, I gather that BackupPC mangles filenames. That doesn't
> bother me, but how is it possible to use rsync in an efficient way
> with that? rsync wouldn't be able to match up client-side filenames
> with the server-side names since the server names are different, so it
> wouldn't do its efficient transfers. Either that or you're having to
> create temporary directory trees on the server, which sounds
> inefficient. Or am I missing something?

BackupPC does not use rsync directly on the backup server. It uses a Perl module, File::RsyncP, which speaks the rsync protocol and handles all the checksumming, file comparison, etc. IIRC, the file hash for the pool is built from the first 256k of a file; therefore collisions can occur, and they are handled (by creating new pool files). That way, the pool may be compressed as well.

(I've got the feeling I missed some detail here, but the overall picture should be correct.)

HTH,
Tino.

--
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de
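P.S. In case a code sketch makes the pooling clearer, the idea is roughly the following - a toy illustration only, not BackupPC's exact algorithm (the real digest also folds in the file length, and compressed pool files complicate the comparison):

    use Digest::MD5;

    # Classify a file by hashing only its first 256 KiB, so even huge
    # files are cheap to look up in the pool.
    sub pool_digest {
        my ($path) = @_;
        open my $fh, '<', $path or die "open $path: $!";
        binmode $fh;
        read $fh, my $head, 256 * 1024;
        close $fh;
        return Digest::MD5::md5_hex($head);
    }

    # Because only a prefix is hashed, different files can collide; the
    # pool chains them as <digest>, <digest>_0, <digest>_1, ... and
    # compares actual contents to find a true match.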
From: Adam G. <mai...@we...> - 2009-02-18 17:48:28
|
Tino Schwarze wrote:
> Hi John,
>
> On Wed, Feb 18, 2009 at 10:58:14AM -0600, John Goerzen wrote:
>
>> I've been reading docs on BackupPC and I have a few questions about
>> how it works.
>>
>> First off, I gather that it keeps a hardlinked pool of data, so
>> whenever a file changes on any host, on the backup device it will be
>> hardlinked to a file containing the same data, regardless of the host
>> it came from, right?
>
> Right.

Mostly right... If you have a file with identical content stored on two different hosts (or even two files on the same host):

    host1:/var/log/messages
    host1:/var/log/kernel.log

Let's assume these two files get the exact same log data... They are both backed up onto the server, so each file is transferred in full to the server - no bandwidth savings (basically)...

The next day, both files have changed, but the two new files are identical. The first file is copied to a new file in the backup dir, and rsync transfers only the changed data. The second file is copied to a new file in the backup dir, and rsync transfers only the changed data.

After the backup completes, backuppc runs through all the new files and creates a hardlink between the first file and the pool. When it sees the second file, it will delete it from the backup dir and create a hardlink to the version in the pool. The same applies if the two files were on different hosts.

If the host or path is different, then the changed data will be transferred multiple times (or the entire content, for new files). Worst case is when someone manages to copy their photo library or something on a remote host...

>> So, given that, I don't really understand why there is a distinction
>> between a full and an incremental backup. Shouldn't either one take
>> up the same amount of space? That is, if you've got few changes on
>> the client, then on the server you're mostly just hardlinking things
>> anyway, right? So why is there a choice?
>
> The only difference between incremental and full (for rsync!) is that:
>
> 1) all files are completely checksummed, so you detect pool corruption
> 2) you get the whole directory structure on the server (which is used
>    as the base for incremental backups) with all hardlinks to pool files
>
> For an incremental, you only get the directory structure, plus hardlinks
> into the pool for new/modified files.

Maybe not (1), since there is an option (CSumVerify or something) which is set to 0.01 by default, i.e. it checks 1% of pool files each time.

Basically, an incremental uses less disk IO, CPU, and memory on both client and server, because it doesn't examine the files on the client in as much detail - it looks only at size, path, and modification date/time instead of checksumming as well.

Regards,
Adam
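P.S. The link-into-pool pass at the end is roughly this - a simplified sketch with a hypothetical helper, not the real code (which also walks the collision chain and deals with compressed pool files):

    use File::Compare qw(compare);

    # After the transfer: either adopt the new file into the pool, or
    # replace it with a hardlink to an existing, identical pool file.
    sub link_into_pool {
        my ($new_file, $pool_file) = @_;
        if (-e $pool_file && compare($new_file, $pool_file) == 0) {
            unlink $new_file;              # same content is already pooled
            link $pool_file, $new_file;    # keep just a hardlink in the tree
        } else {
            link $new_file, $pool_file;    # first copy becomes the pool entry
        }
    }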
From: Les M. <les...@gm...> - 2009-02-18 17:47:17
|
John Goerzen wrote:
>
> First off, I gather that it keeps a hardlinked pool of data, so
> whenever a file changes on any host, on the backup device it will be
> hardlinked to a file containing the same data, regardless of the host
> it came from, right?
>
> So, given that, I don't really understand why there is a distinction
> between a full and an incremental backup. Shouldn't either one take
> up the same amount of space? That is, if you've got few changes on
> the client, then on the server you're mostly just hardlinking things
> anyway, right? So why is there a choice?

With the tar and smb backup methods, full runs transfer everything from the remote; incrementals transfer only files with timestamps newer than the last full. With rsync, a full does a block checksum compare of all files, incrementals only files where the timestamp or length differ. On the server side, fulls rebuild a complete tree of links; incrementals only have the differing files.

> Secondly, I gather that BackupPC mangles filenames. That doesn't
> bother me, but how is it possible to use rsync in an efficient way
> with that? rsync wouldn't be able to match up client-side filenames
> with the server-side names since the server names are different, so it
> wouldn't do its efficient transfers. Either that or you're having to
> create temporary directory trees on the server, which sounds
> inefficient. Or am I missing something?

The server doesn't run the stock version of rsync. It has a perl version that understands the filename and compression conventions it uses for storage and can work with a stock rsync on the remote side.

--
Les Mikesell
les...@gm...
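P.S. For the tar method you can see the full/incremental distinction directly in the xfer arguments in config.pl - this is from memory of the 3.x defaults, so verify against your own copy:

    $Conf{TarFullArgs} = '$fileList+';                     # full: take everything
    $Conf{TarIncrArgs} = '--newer=$incrDate+ $fileList+';  # incr: only files newer
                                                           # than the last full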
From: John G. <jgo...@co...> - 2009-02-18 20:08:25
|
On 2009-02-18, Les Mikesell <les...@gm...> wrote:
> John Goerzen wrote:
>> So, given that, I don't really understand why there is a distinction
>> between a full and an incremental backup. Shouldn't either one take
>> up the same amount of space? That is, if you've got few changes on
>> the client, then on the server you're mostly just hardlinking things
>> anyway, right? So why is there a choice?
>
> With the tar and smb backup methods, full runs transfer everything from
> the remote; incrementals transfer only files with timestamps newer than
> the last full. With rsync, a full does a block checksum compare of all
> files, incrementals only files where the timestamp or length differ. On
> the server side, fulls rebuild a complete tree of links; incrementals
> only have the differing files.

So, if I use the rsync method, is there any reason to ever run a full backup after the very first one? It seems like all the info needed would be preserved, even if that very first full backup gets deleted eventually, right?
From: Tino S. <bac...@ti...> - 2009-02-18 20:47:30
|
Hi John,

On Wed, Feb 18, 2009 at 01:24:16PM -0600, John Goerzen wrote:
> >> So, given that, I don't really understand why there is a distinction
> >> between a full and an incremental backup. Shouldn't either one take
> >> up the same amount of space? That is, if you've got few changes on
> >> the client, then on the server you're mostly just hardlinking things
> >> anyway, right? So why is there a choice?
> >
> > With the tar and smb backup methods, full runs transfer everything from
> > the remote; incrementals transfer only files with timestamps newer than
> > the last full. With rsync, a full does a block checksum compare of all
> > files, incrementals only files where the timestamp or length differ. On
> > the server side, fulls rebuild a complete tree of links; incrementals
> > only have the differing files.
>
> So, if I use the rsync method, is there any reason to ever run a full
> backup after the very first one? It seems like all the info needed
> would be preserved, even if that very first full backup gets deleted
> eventually, right?

No, there is still info missing: the incremental has "holes" - unchanged files are not linked into its directory tree, so you'd lose files. Just don't worry about the fulls - have BackupPC take them once in a while.

Bye, Tino.

--
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de
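P.S. The full/incremental cadence is just config.pl settings, so "once in a while" looks something like this (illustrative values, though the shipped defaults are close):

    $Conf{FullPeriod}  = 6.97;   # a full roughly every 7 days
    $Conf{IncrPeriod}  = 0.97;   # incrementals on the days in between
    $Conf{FullKeepCnt} = 2;      # how many fulls to keep
    $Conf{IncrKeepCnt} = 6;      # how many incrementals to keep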
From: Les M. <le...@fu...> - 2009-02-18 23:53:11
|
John Goerzen wrote:
>> With rsync, a full does a block checksum compare of all
>> files, incrementals only files where the timestamp or length differ. On
>> the server side, fulls rebuild a complete tree of links; incrementals
>> only have the differing files.
>
> So, if I use the rsync method, is there any reason to ever run a full
> backup after the very first one? It seems like all the info needed
> would be preserved, even if that very first full backup gets deleted
> eventually, right?

Rsync does its comparison against the previous full, so the incremental runs transfer more and more as changes accumulate. Also, you can't delete a full when you have incrementals that depend on it. The best strategy is to just skew the days when the fulls run (by manually forcing one at an appropriate time).

--
Les Mikesell
les...@gm...
From: Cody D. <cd...@cs...> - 2009-02-19 03:15:47
|
Hi everyone,

I tried to summarize this conversation on the wiki. Please check it over to see if there's anything I got wrong or anything you can add:

http://backuppc.wiki.sourceforge.net/Full+vs.+Incremental+Backups

Cody