Thread: [BackupPC-users] Hanging rsync backup on /usr/local

Hi all:

Figured I should start a new thread on this as it is a separate
problem from the SIGPIPE issue.

I have a backup hanging until the SIGALRM triggers some 20 hours
later.

The (partial) config is:

  $Conf{XferMethod} = 'rsync';
  $Conf{RsyncClientPath} = '/usr/bin/rsync';
  $Conf{RsyncClientCmd} = '$sshPath -q -x -l backup \
     -o ServerAliveInterval=30 $host sudo $rsyncPath $argList+';
  $Conf{RsyncShareName} = [
    '/etc',
    '/var/bak',
    '/var/log',
    '/usr/local',
  ];
  $Conf{RsyncArgs} = [
            #
            # Do not edit these!
            #
            '--numeric-ids',
            '--perms',
            '--owner',
            '--group',
            '-D',
            '--links',
            '--hard-links',
            '--times',
            '--block-size=2048',
            '--recursive',
            '--one-file-system',

            #
            # Rsync >= 2.6.3 supports the --checksum-seed option
            # which allows rsync checksum caching on the server.
            # Uncomment this to enable rsync checksum caching if
            # you have a recent client rsync version and you want
            # to enable checksum caching.
            #
            '--checksum-seed=32761',

            #
            # Add additional arguments here
            #
  ];

The server side uses perl-File-RsyncP-0.68-1 (built locally with
cpan2rpm), with perl:

  Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
    Platform:
      osname=linux, osvers=2.6.9-42.elsmp,
  archname=i386-linux-thread-multi
      uname='linux build-i386 2.6.9-42.elsmp #1 smp sat aug 12 09:39:11
  cdt 2006 i686 i686 i386 gnulinux '
      config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386
  -mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost
  -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat,
  Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux
  -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads
  -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db
  -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio
  -Dinstallusrbinperl -Ubincompat5005 -Uversiononly
  -Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1
  5.8.0'
      hint=recommended, useposix=true, d_sigaction=define
      usethreads=define use5005threads=undef useithreads=define
  usemultiplicity=define
      useperlio=define d_sfio=undef uselargefiles=define usesocks=undef

and runs CentOS 4.4.

The client-side box runs CentOS release 4.5 (Final) with
rsync-2.6.3-1. The rsync process is run via sudo, which results in ps
output of:

  root     31103  3749  0 14:59 ?        00:00:00 sshd: backup [priv]
  backup   31105 31103  0 14:59 ?        00:00:00 sshd: backup@notty

  root     31106 31105  0 14:59 ?        00:00:00 sesh /usr/bin/rsync
    --server --sender --numeric-ids --perms --owner --group -D --links
    --hard-links --times --block-size=2048 --recursive --one-file-system
    --checksum-seed=32761 --ignore-times . /usr/local/

  root     31107 31106  0 14:59 ?        00:00:00 /usr/bin/rsync
    --server --sender --numeric-ids --perms --owner --group -D --links
    --hard-links --times --block-size=2048 --recursive --one-file-system
    --checksum-seed=32761 --ignore-times . /usr/local/

and an strace of the rsync (pid 31107) looks like it is waiting for
input:

  [rouilj@vpn01 ~]$ sudo strace -p 31107
  Process 31107 attached - interrupt to quit
  select(1, [0], [], NULL, {20, 441000})  = 0 (Timeout)
  select(1, [0], [], NULL, {60, 0})       = 0 (Timeout)
  select(1, [0], [], NULL, {60, 0})       = 0 (Timeout)
  select(1, [0], [], NULL, {60, 0})       = 0 (Timeout)
  select(1, [0], [], NULL, {60, 0})       = 0 (Timeout)
  select(1, [0], [], NULL, {60, 0})       = 0 (Timeout)
  select(1, [0], [], NULL, {60, 0} ...
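
For anyone less used to reading strace output: each
"select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)" line means the
process asked to be woken when fd 0 became readable and the timeout
expired with nothing arriving. Below is a minimal Python sketch of the
same pattern. The local pipe standing in for rsync's stdin, the helper
name wait_readable, and the short timeout are all assumptions made for
illustration, not anything taken from rsync itself.

```python
import os
import select


def wait_readable(fd, timeout):
    """Return True if fd becomes readable within `timeout` seconds."""
    readable, _, _ = select.select([fd], [], [], timeout)
    return fd in readable


# A pipe stands in for the ssh connection feeding rsync's stdin (assumption).
r, w = os.pipe()

# Nothing has been written yet, so select() times out, as in the strace.
idle = wait_readable(r, 0.2)

# Once the peer sends a byte, the same call reports the fd as readable.
os.write(w, b"x")
ready = wait_readable(r, 0.2)

os.close(r)
os.close(w)
print(idle, ready)
```

A process that only ever shows the timeout case is idle and waiting
for its peer, which is what the client-side rsync appears to be doing.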

On the server side I have:

  BackupPC_dump,26148 /tools/BackupPC-3.1.0beta0/bin/BackupPC_dump -f...
    (BackupPC_dump,26355)
    (BackupPC_dump,26874)
    (BackupPC_dump,26915)
    BackupPC_dump,27716 /tools/BackupPC-3.1.0beta0/bin/BackupPC_dump ...
      (ssh,26219)
      (ssh,26855)
      (ssh,26883)
      ssh,27681 -q -x -l backup -o ServerAliveInterval=30 ...

where the processes shown in parentheses are defunct.

Stracing the ssh that should be the server side of the rsync client
above produces:

  Process 27681 attached - interrupt to quit
  select(7, [3 4], [], NULL, {28, 254000}) = 0 (Timeout)
  select(7, [3 4], [3], NULL, {30, 0})    = 1 (out [3], left {30, 0})
  write(3,
    "-\310\375\373\356\377\214\1^\310\335\266\377a\326\31v\260"..., 64) =
    64
  select(7, [3 4], [], NULL, {30, 0})     = 1 (in [3], left {29,
     990000})
  read(3, "GF\314\303\230e\23\272f\372\212#J\204sR\205\30\266\v\201"...,
     8192) = 32
  select(7, [3 4], [], NULL, {30, 0})     = 0 (Timeout)
  select(7, [3 4], [3], NULL, {30, 0})    = 1 (out [3], left {30, 0})
  write(3, "\350\307\213\306\263\6\225\240\32}\247p\32\345f;qo\33h"...,
     64) = 64
  select(7, [3 4], [], NULL, {30, 0})     = 1 (in [3], left {29,
     991000})
  read(3, "\233\217(\244\6\216H\313\263\326\221\317\230\4\337\240"...,
     8192) = 32
  select(7, [3 4], [], NULL, {30, 0}

So it looks like the ssh client is alive and sending/receiving data
(probably server keepalives, but I am not sure).

An strace of the parent BackupPC_dump at pid 27716 shows:

  concord.rouilj [~/develop] 779> sudo strace -p 27716
  Process 27716 attached - interrupt to quit
  select(16, [11], NULL, [11], NULL

and that's it. top shows no activity for the BackupPC_dump or rsync
processes in this tree.

To try to eliminate the remote rsync/sudo and ssh connection as a
culprit, I ran:

  sudo -u backup /usr/bin/rsync --rsync-path 'sudo /usr/bin/rsync' -vv
    --numeric-ids --perms --owner --group -D --links --hard-links --times
    --block-size=2048 --recursive --one-file-system --checksum-seed=32761
    --ignore-times vpn01.psm1:/usr/local/ .

on the server to get an ssh to the client system using the backup user
and starting the rsync on the client (in --server --sender mode) via
sudo. The process tree (partial) is:

  root     22148 22146  2 16:55 ?        00:00:00 sesh /usr/bin/rsync
    --server --sender -vvlHogDtprIx -B2048 --checksum-seed=32761
    --numeric-ids . /usr/local/
  root     22149 22148  2 16:55 ?        00:00:00 /usr/bin/rsync
    --server --sender -vvlHogDtprIx -B2048 --checksum-seed=32761
    --numeric-ids . /usr/local/

The output from the rsync process is:

  opening connection using ssh vpn01.psm1 "sudo /usr/bin/rsync" --server
  --sender -vvlHogDtprIx -B2048 --checksum-seed=32761 --numeric-ids
  . /usr/local/ 
  receiving file list ...
  [sender] expand file_list to 131072 bytes, did move
  done
  delta transmission enabled
  ./
  bin/
  bin/envdir -> /command/envdir
  bin/envuidgid -> /command/envuidgid
    [... more files/dirs etc elided]
  src/fastforward-0.51/wait_pid.o
  src/fastforward-0.51/warn-auto.sh
  total: matches=0  tag_hits=0  false_alarms=0 data=1658996

  sent 6736 bytes  received 1682139 bytes  675550.00 bytes/sec
  total size is 1659279  speedup is 0.98

which completes in a few seconds. So I get a nice fast rsync of
4.2MB of data, using sudo to start the remote rsync, if I use the
rsync program. I ran it 20 times (each time into an empty directory),
and it ran the same every time. The BackupPC-triggered backup,
however, has hung every time.
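
A repeated test like this can be scripted so that a hang is caught
after a short timeout rather than by watching each run. The following
is a rough Python sketch, not what was actually used for the 20 runs
above. The function name run_repeatedly, the 30 second timeout, and
the stand-in command are assumptions for illustration. The real rsync
argument list would replace the stand-in, and the destination
directory would need emptying between runs.

```python
import subprocess
import sys


def run_repeatedly(cmd, runs, timeout):
    """Run cmd `runs` times; classify each run as 'ok', 'failed' or 'hung'."""
    outcomes = []
    for _ in range(runs):
        try:
            proc = subprocess.run(
                cmd,
                timeout=timeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            outcomes.append("ok" if proc.returncode == 0 else "failed")
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child before raising this.
            outcomes.append("hung")
    return outcomes


# Stand-in command (assumption); substitute the real rsync argument list.
demo_cmd = [sys.executable, "-c", "pass"]
print(run_repeatedly(demo_cmd, 3, 30))
```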

So does anybody have an rsync client written using File::RsyncP? Or
should I craft one of my own to try this test and see whether it's a
deadlock condition or something else in File::RsyncP?
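
To make the deadlock idea concrete, here is a small Python sketch of
one way two ends of a connection can both end up idle in select():
each side waits for the other to send first. This is only an
illustration of the general failure mode, not a diagnosis and not the
File::RsyncP or rsync protocol. The socketpair standing in for the ssh
pipe and the names peer and run_deadlock are hypothetical.

```python
import select
import socket
import threading


def peer(sock, results, name, timeout):
    """Wait for the other side to speak first; reply only if it does."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if readable:
        sock.recv(1)
        sock.sendall(b"k")
        results[name] = "got data"
    else:
        results[name] = "timed out"


def run_deadlock(timeout=0.3):
    """Start two peers that each wait for the other; return what each saw."""
    a, b = socket.socketpair()  # stands in for the ssh connection (assumption)
    results = {}
    threads = [
        threading.Thread(target=peer, args=(a, results, "client", timeout)),
        threading.Thread(target=peer, args=(b, results, "server", timeout)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    a.close()
    b.close()
    return results


print(run_deadlock())
```

Neither side ever sends, so both report a timeout. That would be
consistent with straces showing nothing but select() calls on both
ends, though other causes could produce the same picture.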

Other ideas on what could cause this and how to troubleshoot it?

-- 
				-- rouilj

John Rouillard
System Administrator
Renesys Corporation
603-643-9300 x 111
