On 05/02/13 21:11, Sorin Srbu wrote:
>> Have you done at least two FULL backups since you enabled the
>> checksum-seed option? If not, stop now, and wait until you have.
> I have eighteen full backups online for this particular machine.
Note I said number of full backups since you changed that option, not
just number of fulls. There is a difference!
>> Check the following during an incremental backup:
>> 1) Memory/swap used on both backup server and the backup client. If you
>> are using all available memory, or see memory being paged in/out (use
>> vmstat) then you need to upgrade RAM on that machine, or find a way to
>> backup a smaller number of files (split the client into multiple shares
>> or multiple machines, etc).
> Thanks. Are there any particular limits/numbers I should be aware of, i.e. the
> rule-of-thumb kind?
No, as long as you have enough. The only way to see if you have enough
in your environment is to watch it during a backup and see.
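
If you would rather log this than sit watching vmstat, here is a rough sketch in Python. It assumes a Linux kernel that exposes the pswpin/pswpout counters in /proc/vmstat (the same counters vmstat's si/so columns are derived from); the function names are made up for illustration.

```python
import time

def parse_vmstat(text):
    """Return {name: int} from the contents of /proc/vmstat."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters

def swap_delta(before, after):
    """Pages swapped (in, out) between two /proc/vmstat snapshots."""
    return (after["pswpin"] - before["pswpin"],
            after["pswpout"] - before["pswpout"])

def watch_swap(interval=5, path="/proc/vmstat"):
    """Print swap-in/out page counts every `interval` seconds until Ctrl-C."""
    with open(path) as f:
        prev = parse_vmstat(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            cur = parse_vmstat(f.read())
        print("pages swapped in/out: %d / %d" % swap_delta(prev, cur))
        prev = cur
```

Run watch_swap() on both the server and the client while an incremental is going. Numbers that stay at zero mean no swapping; steadily non-zero numbers mean the machine is paging.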
>> 2) Check disk performance on the backup client. You have a single SATA
>> drive on the client, and this will be slow, you are doing a lot of
>> seeks, not just one big read. Can you enable noatime on the client
>> (probably)? This would decrease the amount of seek and writes on the
>> single SATA drive.
> Noatime on client; Just checked - not set. Setting. Thanks for the reminder!
> As for the single drive, I can't do much about that. It's an instrument
> computer and not really allowed to change the config or the service
> support-people won't be too happy about it.
Understood, sometimes you just don't get a choice.... Even so, you do
need to check this to find out whether it is in fact the bottleneck. If
it is, then there is no point looking at or changing anything else. If
not, then you will need to keep looking.
You can keep an eye on the /sys/block/sda/stat file, in particular watch
the activetime value (the 10th value, io_ticks, counted in milliseconds).
If this is increasing at nearly the same rate as wall clock time, then
your drive is basically 100% busy, and therefore the bottleneck. If it is
increasing much more slowly than wall clock time, then your bottleneck is
elsewhere... Again, you want to watch this during a backup, both during
the first stage while the client is building the list of files to back
up, and again while the 200M of data is being transferred.
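
The comparison against wall clock time can be scripted. This is only a minimal sketch, assuming the device is sda and that the 10th field of the stat file is the busy time in milliseconds; the function names are my own.

```python
import time

def io_ticks_ms(stat_text):
    """Return the 10th field of /sys/block/<dev>/stat (busy time in ms)."""
    return int(stat_text.split()[9])

def busy_fraction(ticks_before, ticks_after, elapsed_seconds):
    """Fraction of wall clock time the drive was busy between two samples."""
    return (ticks_after - ticks_before) / (elapsed_seconds * 1000.0)

def watch_disk(dev="sda", interval=5):
    """Print the busy percentage every `interval` seconds until Ctrl-C."""
    path = "/sys/block/%s/stat" % dev
    with open(path) as f:
        prev_ticks, prev_time = io_ticks_ms(f.read()), time.time()
    while True:
        time.sleep(interval)
        with open(path) as f:
            ticks, now = io_ticks_ms(f.read()), time.time()
        busy = busy_fraction(prev_ticks, ticks, now - prev_time)
        print("%s busy: %.0f%%" % (dev, 100 * busy))
        prev_ticks, prev_time = ticks, now
```

Values that sit near 100% while the file list is being built or the data is being transferred point at the drive; values well below that point elsewhere.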
>> 3) Check disk performance on the backup server
> Any best practices here?
Yes, get as many drives as possible, make each drive as fast as
possible, and combine with a hardware RAID card with a battery backup.
IMHO, also only use RAID level 0, 1 or 10 (ie, no parity-based RAID
levels like 5 or 6).
In reality, you make do with what you have.
BTW, you might also look into changing the filesystem format from ext3
to something more modern which will probably perform better. Personally,
I use reiserfs, but only because I built this system back when it was
the best performing filesystem. I wouldn't suggest it now for a new
system due to its seemingly un-maintained status... Though it has proved
reliable for me.
>> 4) Check CPU on the backup server, if you have compression enabled, this
>> will really slow things down, consider disabling compression (though
>> this will mess with the pool).
> Is this the (GUI) Edit Config/Backup settings/CompressLevel=3 you're referring
Yes, set this to 0 to disable compression, but you might need to delete
all existing backups to really see the effect it will have. Existing
unchanged files will still be stored compressed; only new files will be
stored uncompressed. You will still need to uncompress an old file if it
is updated, but after the first update it will be stored uncompressed.
Also, you will still uncompress a file to do a full comparison (of a
small percentage of files). Watch CPU consumption on the backup server:
if the CPU is busy, then compression is an issue. Remember that
compression will only use a single core even if you have a multi-core
CPU.
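
Because compression only uses one core, an overall CPU average can look low while one core is pegged. A per-core view makes that visible. A rough sketch, assuming the usual Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal); the function names are hypothetical:

```python
def parse_cpu_times(stat_text):
    """Return {cpu_name: (busy_jiffies, total_jiffies)} for each cpuN line."""
    result = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Use user..steal only; guest time is already counted in user.
        values = [int(v) for v in fields[1:9]]
        total = sum(values)
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        result[fields[0]] = (total - idle, total)
    return result

def busy_per_core(before_text, after_text):
    """Busy fraction per core between two /proc/stat snapshots."""
    before = parse_cpu_times(before_text)
    after = parse_cpu_times(after_text)
    out = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before[cpu]
        span = total1 - total0
        out[cpu] = (busy1 - busy0) / span if span else 0.0
    return out
```

Read /proc/stat twice a few seconds apart during a backup and pass both snapshots to busy_per_core(). One core close to 1.0 with the others idle is what a compression bottleneck would look like.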
>> 5) Check bandwidth between the two (least likely to be the culprit, but
>> worth checking).
> user@... ~/ # lftp -e 'pget
> got 193027041 of 4289386496 (4%) 6.99M/s eta:11m
> Speeds varies around 7 M(Mbyte? Mbit?)/s. I guess it's good enough for a
This is not relevant, I meant to watch what the bandwidth usage was
during a backup. BTW, lftp reports bytes, so that is about 7MB/s, which
is fine for a 100Mbps connection (the ceiling there is roughly 12MB/s),
but if you really have a 1000Mbps network, you should see at least
80MB/s transfer speeds. Try testing with iperf if you want to generate
your own load.
BTW, slower speed with the client compared to the server may point to
CPU or network driver issues on the client (ie, old crappy network card,
or slow cpu, etc).
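
To see what the link is actually carrying during a backup, you can sample the interface byte counters. A small sketch, assuming the standard /proc/net/dev layout (receive bytes first, transmit bytes in the ninth column after the interface name); the helper names are made up:

```python
def mbit_to_mbyte_per_s(mbit_per_s):
    """Theoretical ceiling in MB/s for a link speed given in Mbit/s."""
    return mbit_per_s / 8.0

def parse_net_dev(text):
    """Return {iface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters

def mbyte_per_s(bytes_before, bytes_after, elapsed_seconds):
    """Average throughput in MB/s between two counter samples."""
    return (bytes_after - bytes_before) / elapsed_seconds / 1e6
```

Take two samples a few seconds apart while the backup is transferring data and compare the result with mbit_to_mbyte_per_s() for your link speed. Real transfers will land somewhat below that ceiling because of protocol overhead.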
>> BTW, are you sure the backup server has 2TB with 4 drives in RAID0 ?
>> That suggests that any one of those 4 drives fail, and you lose ALL of
>> your backups and pool etc... You might confirm you are using RAID0 and
>> not linear, and also check the stripe size. If you are backing up lots
>> of small files, then you want the stripe size to be about the same size
>> as your file size. If your files are between 1 and 2kB each, then you
>> would want a stripe size of 4k, not the current linux default of 512k.
>> I would suggest RAID10 if you want any sort of resilience....
> I was a bit wrong here I see; three drives and 1,4 TB. All seem active. It
> would seem I also added a drive on the PATA-port, in addition to the
> SATA-ports. I think the reason for using a PATA-drive at the time was the mobo
> only had two SATA-ports and I needed more space, thus adding a slower
> PATA-drive as well. The actual BPC-server was rather old even at the beginning
> when it was converted to a backup-server.
> Maybe PATA would slow things down a bit as well?
So is this 3 x 500G drives?
Try a "hdparm -tT /dev/sd[ab] /dev/hdb" to compare the speeds of them
(while there are no running backups). Maybe you could replace the PATA
drive now with a SATA if you have a spare around? Also might be able to
increase to the 4 x 500G drives you thought you had.... Though none of
this will help if the problem is somewhere up above....
> user@... ~/ # cat /proc/mdstat
> Personalities : [raid0]
> md0 : active raid0 sdb1 sda1 hdb1
> 1465151616 blocks 128k chunks
> I'm assuming the above mentioned 128 kB chunks are the same as stripe sizes
> and can't be changed to the 4 kB size you mention w/o a reformat. Correct?
Correct, can't be changed unless you re-create the raid wiping all the
data in the process.
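
If you want to check the RAID level and chunk size from a script rather than by eye (for example, to confirm raid0 rather than linear), something along these lines should work. It is only a sketch that handles simple /proc/mdstat output like yours; parse_mdstat is a name I made up.

```python
import re

def parse_mdstat(text):
    """Return {array: {"level", "members", "chunk_kib"}} from /proc/mdstat text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\w+)\s*:\s*(.*)$", line)
        if header:
            tokens = header.group(2).split()
            # The level token is "raidN" or "linear"; members follow it.
            level = next((t for t in tokens
                          if t.startswith("raid") or t == "linear"), None)
            members = []
            if level is not None:
                rest = tokens[tokens.index(level) + 1:]
                # Strip role suffixes such as "[0]" or "[1](F)".
                members = [re.sub(r"\[\d+\].*$", "", t) for t in rest]
            current = {"level": level, "members": members, "chunk_kib": None}
            arrays[header.group(1)] = current
            continue
        chunk = re.search(r"(\d+)k chunk", line)
        if chunk and current is not None:
            current["chunk_kib"] = int(chunk.group(1))
    return arrays
```

Fed the output you pasted, this reports md0 as raid0 with members sdb1, sda1 and hdb1 and a 128k chunk.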
> Anyway, I know. It's a calculated risk using raid0.
> I'm figuring as these are just casual backups (users always copy their
> personal data using Winscp to their homefolders, which is being backed up on
> another more resilient BPC), there's no real need for redundancy - just plenty
> of space.
No problem... just pointing it out :) You never know what other people
do or don't know.
If the problem is server side, it may be worthwhile to wipe the data and
use a smaller chunk size on the RAID, format with a different
filesystem, and start the backups again. Just remember, don't bother
timing anything until after the second full backup has finished.