Les Mikesell <lesmikesell@gmail.com> wrote on 09/17/2012 01:34:33 PM:

> On Mon, Sep 17, 2012 at 11:05 AM, Timothy J Massey
> <tmassey@obscorp.com> wrote:
> >
> >
> > I'm writing a longer reply, but here's a quick in-thread reply:
> >
> > I know exactly what you mean by waiting until after the first full.  Often
> > the second full will be faster -- but only *IF* you are bandwidth limited
> > will you see an improvement.  In this case, neither he nor I are
> > bandwidth limited.  I don't see an improvement.
> The 2nd might even be slower, since the server side has to decompress
> and recompute the checksums.

Interesting possibility.  However, the "big" server is new enough that I can still see its first backup.  The first one took 9987 minutes, the second took 5558, and the third took 5502.  So there *was* a significant "speedup" between the first and second.  But I'm still only getting 442MB/minute, or about 7MB/s.  That server really should be able to get 5 times as much without breaking a sweat.  1100 minutes is still a long time, but manageable.  5500 minutes is nearly four *days*...  :(
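
For anyone who wants to check the arithmetic behind those figures, here is a rough Python sketch.  Only the numbers (9987, 5558, 442, 5500 and the factor of 5) come from the backups described above; the function names are made up purely for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def mb_per_second(mb_per_minute):
    """Convert a throughput in MB/minute to MB/second."""
    return mb_per_minute / 60


def minutes_at_speedup(current_minutes, factor):
    """Estimate the backup duration if throughput improved by `factor`."""
    return current_minutes / factor


def days(minutes):
    """Convert a duration in minutes to days."""
    return minutes / MINUTES_PER_DAY


# Figures quoted in this message; everything else is illustrative.
print(f"first -> second full: {9987 / 5558:.2f}x faster")
print(f"442 MB/minute = {mb_per_second(442):.1f} MB/s")
print(f"5500 minutes at 5x throughput = {minutes_at_speedup(5500, 5):.0f} minutes")
print(f"5500 minutes = {days(5500):.1f} days")
```

This assumes the amount of data per full stays roughly constant, so duration scales inversely with throughput.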

> There's little risk of file-level corruption that would still let the
> checksums cached at the end of the file match unless you have bad RAM
> (which would likely cause crashes) or physical disk block corruption
> which you can check relatively quickly with a 'cat /dev/sd?
> >/dev/null' or a smartctl test run followed by checking the status.

It's disk block corruption caused by (silently) failing drives that I'm most worried about.  It wasn't until a data-loss event a few months ago that I found that smartd was not set up properly -- and I'm not certain that SMART would have actually helped in this case.  (Fortunately, the loss of data was on the backup server alone, no one needed that data at that moment, and my off-site archives were unaffected, but I still didn't like it.)  I *do* scrub the array (the default is weekly, I believe), and now have both SMART and md configured to e-mail alerts.

But I still like the extra protection of *directly* comparing the files.  But not enough to take 4 days to do a backup!  :)

I'm working on investigating each of these possibilities to improve performance.  I will let everyone know what I find.

