Mark Coetser <> wrote on 09/17/2012 03:08:49 AM:

> Hi
> backuppc                                       3.1.0-9.1
> rsync                                          3.0.7-2
> OK I have a fairly decent spec backup server with 2 gigabit e1000 nics
> bonded together and running in bond mode 0, all working 100%. If I run
> plain rsync between the backup server and a backup client both connected
> on gigabit lan I can get sync speeds of +/- 300mbit/s but using backuppc
> and rsync the max speed I get is 20mbit and the backup is taking
> forever. Currently I have a full backup that's been running for 3461:23
> minutes, whereas the normal rsync would have taken a few hours to complete.
> The data is users' maildirs and it's about 2.6TB. I am not using rsync
> over ssh; I have the rsync daemon running on the client and have set up
> the .pl as follows.

I have several very similar configurations.  Here's an example:

Atom D510 (1.66GHz x 2 Cores)
CentOS 6 64-bit
4 x 2TB Seagate SATA drives in RAID-6 configuration
        I get almost 200 MB/s transfer rate from this array...
2 x Intel e1000 NICs in bonded mode.

In the past, the biggest server I backed up was around 1TB.  Personally, I prefer to keep each server image under 1TB if I can help it.  Everything is easier that way:  not just file-level backups with BackupPC but image-level as well, and there's less downtime (or less time with noticeable slowdown if it stays up) when having to take such images.

With servers <1TB, rsync-based BackupPC full backups are slow, but get done in a reasonable amount of time:  8-12 hours, and I can live with that.  It is usually kind of beneficial:  if I start a backup in the middle of the day it does not hammer the client I'm backing up noticeably.  (Lemons, lemonade...  :) )

However, I have recently inherited a server that is >3TB big, and 97% full, too!  Backups of that system take 3.5 *days* to complete.  I *can't* live with that.  I need better performance.

I was going to write a very similar e-mail to what you wrote as well!  So maybe we can work this together.

All of your configuration looks pretty straightforward to me (except the mounts:  I'm not sure why you have them if you're using rsyncd).  Mine are quite similar.

No matter the size of the system, I seem to top out at about 50GB/hour for full backups.  Here is a perfectly typical example:

Full backup:  769.3 minutes for 675677.3MB of data.  That works out to 878MB/min, or about 15MB/s, on a system with an array that can move 200MB/s and a network that can move at least 70MB/s.

Now, let's look at the "big" server:

Full backup:  5502.8 minutes for 2434613.6MB of data.  That's even worse: 442MB/min.  And 5502.8 minutes is three and a half *DAYS*.
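The arithmetic above is easy to double-check with a few lines of Python (the figures are taken straight from the two backup summaries quoted above):

```python
# Throughput check for the two full backups quoted above.
def throughput(mb, minutes):
    """Return (MB/min, MB/s) for a backup of `mb` megabytes taking `minutes` minutes."""
    per_min = mb / minutes
    return per_min, per_min / 60.0

# Typical server: 675677.3 MB in 769.3 minutes
per_min, per_sec = throughput(675677.3, 769.3)
print(f"typical: {per_min:.0f} MB/min, {per_sec:.1f} MB/s")  # ~878 MB/min, ~14.6 MB/s

# "Big" server: 2434613.6 MB in 5502.8 minutes
per_min, per_sec = throughput(2434613.6, 5502.8)
print(f"big:     {per_min:.0f} MB/min, {per_sec:.1f} MB/s")  # ~442 MB/min, ~7.4 MB/s
```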

First, a quick look at the client will show that we can eliminate it completely.  I have checked the performance of several of them while a backup is running.  The client is not CPU, I/O, or memory bound whatsoever.  Here is a typical example:  a Windows Server 2008 system.  Task Manager shows minimal everything:  between 0% and 20% CPU usage (with most time below 5%), and more than 1GB of 2GB RAM free (with 1300MB of cached memory).  Network utilization is absolutely flatlined!  A quick sanity check of the server's physical drive lights shows that the drive activity comes in brief fits and starts.  This system is *clearly* not being taxed.  By the way, this contrasts with the beginning of the backup, when rsync is building the file list:  the rsync daemon's CPU usage bounces around with peaks over 70%, and the drives are blinking constantly during this process--so the server is perfectly capable of doing something when it's asked to!

The server side, though, shows something completely different.  Here are a few lines from dstat:

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
 33   2  64   1   0   0|  22M   47k|   0     0 |   0     0 |1711   402
 43   3  49   6   0   0|  40M  188k|  35k 1504B|   0     0 |2253   632
 45   4  49   1   0   1|  50M   36k|  38k 1056B|   0     0 |2660   909
 46   4  50   0   0   0|  46M    0 |  55k 1754B|   0     0 |2540   622
 45   4  50   1   0   0|  45M   12k| 120B  314B|   0     0 |2494   708
 43   3  50   3   0   0|  42M    0 |  77k 1584B|   0     0 |2613   958
 41   4  47   8   0   0|  50M  268k| 449B  356B|   0     0 |2333   704
 46   3  50   1   0   0|  42M   36k|  26k 1122B|   0     0 |2583   771
 45   4  50   1   0   0|  40M    0 |  30k  726B|   0     0 |2499   681

It looks like everything is under-utilized.  For example, I'm getting a measly 40-50MB/s of read performance from my array of four drives, and *nothing* is going out over the network.  My physical drive and network lights echo this:  they are *not* busy.  My interrupts are certainly manageable and context switches are very low.  Even my CPU numbers look tremendous:  nearly no time in wait, and about 50% CPU idle!

Ah, but there's a problem with that.  This is a dual-core system.  Any time you see a dual-core system that is stuck at 50% CPU utilization, you can bet big that you have a single process that is using 100% of the CPU of a single core, and the other core is sitting there idle.  That's exactly what's happening here.

Notice what top shows us:

top - 13:21:27 up 49 min,  1 user,  load average: 2.07, 1.85, 1.67
Tasks: 167 total,   2 running, 165 sleeping,   0 stopped,   0 zombie
Cpu(s): 43.7%us,  3.6%sy,  0.0%ni, 50.5%id,  2.1%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   3924444k total,  3774644k used,   149800k free,     9640k buffers
Swap:        0k total,        0k used,        0k free,  3239600k cached

 1731 backuppc  20   0  357m 209m 1192 R 95.1  5.5  35:58.08 BackupPC_dump
 1679 backuppc  20   0  360m 211m 1596 D 92.1  5.5  32:54.18 BackupPC_dump

My load average is 2, and you can see those two processes:  two instances of BackupPC_dump.  *Each* of them is using 100% of the CPU given to it, but they're both using the *same* CPU (core), which is why I have 50% idle!

Mark Coetser, can you see what top shows for the CPU utilization for your system while doing a backup?  Don't just look at the single "idle" or "user" numbers:  look at each BackupPC process as well, and let us know what they are--and how many physical (and hyper-threaded) cores you have.  Additional info can be found in /proc/cpuinfo if you don't know the answers.
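For anyone gathering this info, a couple of generic Linux commands (nothing BackupPC-specific, just standard procps and /proc) will show the core count and which core each dump process is currently sitting on:

```shell
# Count the cores the kernel sees (physical plus hyper-threaded)
grep -c ^processor /proc/cpuinfo

# Show which core (PSR column) each BackupPC_dump process is currently on,
# plus its CPU usage; two rows with the same PSR value are sharing one core.
# (Prints a note instead if no backup happens to be running right now.)
ps -o pid,psr,pcpu,comm -C BackupPC_dump || echo "no BackupPC_dump running"
```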

To everyone:  is there a way to get Perl to allow each of these processes to run on *different* processors?  From my quick Google it seems that the processes must be forked using Perl modules designed for this purpose.  At the moment, this is beyond my capability.  Am I missing an easier way to do this?
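One stopgap I'm considering (untested here, so treat it as a sketch) is to leave Perl alone and just pin the running dump processes to separate cores from the shell with taskset, which ships with util-linux:

```shell
# Sketch of a workaround: pin each running BackupPC_dump process to its
# own core so the scheduler can't stack them on one.  Harmlessly does
# nothing if no backup is running at the moment.
core=0
for pid in $(pgrep -x BackupPC_dump); do
    taskset -cp "$core" "$pid"   # bind this PID to CPU $core
    core=$((core + 1))
done
```

This doesn't make BackupPC itself any more parallel, of course; it only keeps the two existing processes from fighting over the same core.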

And one more request:  for those of you out there using rsync, can you give me some examples where you are getting faster numbers?  Let's say, full backups of 100GB hosts in roughly 30-35 minutes, or 500GB hosts in two or three hours?  That's about four times faster than what I'm seeing, and would work out to be 50-60MB/s, which seems like a much more realistic speed.  If you are seeing such speed, can you give us an idea of your hardware configuration, as well as an idea of the CPU utilization you're seeing during the backups?  Also, are you using compression or checksum caching?  If you need help collecting this info, I'd be happy to help you.

To cover a couple of other frequently suggested items, here's what I've examined to improve this:

Yes, I have noatime.  From fstab:  UUID=<snipped>  /data     ext4    defaults,noatime    1 2
Noatime only makes a difference when you are I/O bound--which ideally a BackupPC server would be.  In my case, it made very little difference.  I'm not I/O bound.

I am using EXT4.  I have gotten very similar performance with EXT3.  Have not tried XFS or JFS, but would *really* prefer to keep my backups on the extremely well-known and supported EXT series.

I am using compression on this BackupPC server.  Obviously, this may contribute to the CPU consumption.  My old servers did not have compression, but had terrible VIA C3 single-core processors, and their backup performance was quite similar.  I figured with the Atom D510 I'd be OK with compression.  But maybe not.  I'll try some testing with smaller hosts without compression and see what happens.

As for checksum caching:  as I mentioned, I think the extra verification you get by leaving it off is very valuable.  But I look forward to seeing the performance others are getting, to learn at what performance cost this protection comes.

Thank you very much for your help!

Timothy J. Massey

Out of the Box Solutions, Inc.
Creative IT Solutions Made Simple!
      22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796