From: John R. <rou...@re...> - 2012-11-20 21:07:20
On Tue, Nov 20, 2012 at 03:34:02PM -0500, Bowie Bailey wrote:
> On 11/20/2012 3:13 PM, John Rouillard wrote:
> > On Tue, Nov 20, 2012 at 09:46:33AM -0500, Bowie Bailey wrote:
> >> On 11/19/2012 4:35 PM, John Rouillard wrote:
> >>> What may also work is to use excludes to do your sharding. I have 4
> >>> "hosts" now with different excludes. All of them back up the same share:
> >> That seems a bit overly complex. Wouldn't it be easier to use includes?
> >>
> >> # include subdirectories starting with a, b, or c case insensitive
> >> $Conf{BackupFilesOnly} = {
> >>     '/home1' => [ "/[A-Ca-c]*/**" ],
> >> };
> >>
> >> # include subdirectories starting with d...m case insensitive
> >> $Conf{BackupFilesOnly} = {
> >>     '/home1' => [ "/[D-Md-m]*/**" ],
> >> };
> >>
> >> # include subdirectories starting with n...z case insensitive
> >> $Conf{BackupFilesOnly} = {
> >>     '/home1' => [ "/[N-Zn-z]*/**" ],
> >> };
> >> # exclude your problem case
> >> $Conf{BackupFilesExclude} = {
> >>     '/home1' => [ "- /user/**" ],
> >> };
> > IIRC BackupFilesOnly and BackupFilesExclude interact in very weird
> > ways. I think you can only choose one method.
> >
> >> # back up problem user and other misc directories (non-alphabetic first char)
> >> $Conf{BackupFilesExclude} = {
> >>     '/home1' => [ "+ /user/**", "- /[A-Za-z]*/**" ],
> >> };
> >>
> >> This way, it is much more obvious what is being backed up by each host.
> > That is true but.....
> >
> >> This is off the top of my head and not tested, so it may need to be
> >> tweaked a bit.
> > But what happens when somebody creates a directory starting with '.',
> > '!' or some Unicode character that you didn't put in your include
> > range?
>
> Then they get backed up by the last host which IS done as an exclude.
>
> > With exclusion you will get those directories (multiple times but you
> > will get them).
> > With inclusion you must be sure that there is no
> > chance at all of having a directory created that you have not included
> > otherwise you have no backup. There is also no error that you have no
> > backup. Granted expanding the inclusion to all possible initial
> > characters is possible, but IMO more likely to fail.
> >
> > Also character classes depend on the language settings. While I expect
> > everybody to use LANG=C that is probably a stupid assumption. In
> > Estonian [A-Z] is not the same as it is with LANG=C (see
> > http://en.wikipedia.org/wiki/Estonian_alphabet first listing). While
> > it's unlikely you would trip over it, you still need to realize the
> > issue as it leads to a silent failure to back up data. So you need to
> > test every possible filename that can possibly be included.
> >
> > If you are using exclusion, you still have the same character class
> > issue, but since you are excluding those files, some host will not
> > have that range/character class excluded and the files will get backed
> > up. So it fails safely - the data is backed up.
> >
> > Hence I claim the testing is easier, you just run 10 or so test cases
> > (control character, punctuation, chars > 128, things beginning with T
> > if you are Estonian ...) that are not in any of the exclusion ranges
> > and verify that it gets backed up.
>
> The first set of hosts backs up their specified range (a-c,d-m,n-z).
> The last host backs up everything else (exclude [a-z]). Any strange
> characters, punctuation and such will be picked up by the last host. Since
> the last host only excludes what was included in the other hosts, I
> don't see an issue. Language settings should not matter since any given
> character will either match [a-zA-Z] or fail to match. If it matches,
> it will be picked up by the includes in the first few hosts. If it
> doesn't match, it will be picked up by the last host.

You are assuming that [A-Za-z] is the same as [A-Ca-cD-Md-mN-Zn-z].
You are correct AFAIK in the C locale. I don't feel comfortable making
the same claim in any other locale. E.g. there could be a C with a caret
that sorts after C and before D. It would be matched by the [A-Z] in the
exclude list but by none of the three include ranges, so the last host
would exclude it and no other host would include it.

To work around that, this should work:

  # back up problem user and other misc directories (non-alphabetic first char)
  $Conf{BackupFilesExclude} = {
      '/home1' => [
          "+ /user/**",
          "- /[A-Ca-c]*/**",
          "- /[D-Md-m]*/**",
          "- /[N-Zn-z]*/**",
      ],
  };

The reason I could exclude all alphabetics in my 4th host setup was
because it only needed to back up exactly one directory. The other
'hosts' handled the backup of strange directories/files.

I agree this method looks promising (except for the mixing of
BackupFilesExclude and BackupFilesOnly).

--
				-- rouilj

John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell)
603-643-9300 x 111
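
As a rough illustration of the coverage argument in this message, here is a small Python sketch that checks whether every top-level directory name is picked up by at least one host. It is a sketch under several assumptions. The host names are hypothetical. Python's `re` compares characters by code point, so it behaves like the C locale and does not reproduce the locale-dependent matching discussed above. The `+ /user/**` rule and the `/home1` prefix are left out. The part it models is that the catch-all host reuses the same character classes as the include hosts instead of a separately written `[A-Za-z]`.

```python
import re

# Hypothetical shard names; the character classes come from the thread.
INCLUDE_SHARDS = {
    "host-a-c": "[A-Ca-c]",
    "host-d-m": "[D-Md-m]",
    "host-n-z": "[N-Zn-z]",
}

# The catch-all host excludes exactly the classes the other hosts include,
# rather than a separately written [A-Za-z].
CATCHALL_EXCLUDES = list(INCLUDE_SHARDS.values())


def hosts_for(name):
    """Return the hosts that would back up top-level directory `name`."""
    # re.match anchors at the start, so each class tests the first character.
    hosts = [host for host, cls in INCLUDE_SHARDS.items() if re.match(cls, name)]
    if not any(re.match(cls, name) for cls in CATCHALL_EXCLUDES):
        hosts.append("host-catchall")
    return hosts


def uncovered(names):
    """Return the names that no host would back up."""
    return [name for name in names if not hosts_for(name)]


# Test names along the lines suggested above: punctuation, a control
# character, digits and a non-ASCII initial letter.
samples = ["alice", "Bob", "mallory", "Zed", ".hidden", "!tmp", "\x07bell", "123", "Õun"]
for name in samples:
    print(repr(name), hosts_for(name))
print("uncovered:", uncovered(samples))
```

Because the include and exclude sides use the same class strings, a name either matches one of them (and an include host takes it) or matches none (and the catch-all host takes it). A real check would still need to run against rsync itself under the locale in use on the backup client.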