From: John R. <rou...@re...> - 2007-10-26 17:01:21
Hi all:
Figured I should start a new thread on this as it is a separate
problem from the SIGPIPE issue.
I have a backup that hangs until the SIGALRM fires some 20 hours
later.
The (partial) config is:
$Conf{XferMethod} = 'rsync';
$Conf{RsyncClientPath} = '/usr/bin/rsync';
$Conf{RsyncClientCmd} = '$sshPath -q -x -l backup \
-o ServerAliveInterval=30 $host sudo $rsyncPath $argList+';
$Conf{RsyncShareName} = [
'/etc',
'/var/bak',
'/var/log',
'/usr/local',
];
$Conf{RsyncArgs} = [
#
# Do not edit these!
#
'--numeric-ids',
'--perms',
'--owner',
'--group',
'-D',
'--links',
'--hard-links',
'--times',
'--block-size=2048',
'--recursive',
'--one-file-system',
#
# Rsync >= 2.6.3 supports the --checksum-seed option
# which allows rsync checksum caching on the server.
# Uncomment this to enable rsync checksum caching if
# you have a recent client rsync version and you want
# to enable checksum caching.
#
'--checksum-seed=32761',
#
# Add additional arguments here
#
];
The server side uses perl-File-RsyncP-0.68-1 (built locally with
cpan2rpm), with perl:
Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
Platform:
osname=linux, osvers=2.6.9-42.elsmp,
archname=i386-linux-thread-multi
uname='linux build-i386 2.6.9-42.elsmp #1 smp sat aug 12 09:39:11
cdt 2006 i686 i686 i386 gnulinux '
config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386
-mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost
-Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat,
Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux
-Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads
-Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db
-Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio
-Dinstallusrbinperl -Ubincompat5005 -Uversiononly
-Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1
5.8.0'
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
and runs CentOS 4.4.
The client side box is a CentOS release 4.5 (Final) running
rsync-2.6.3-1 and the rsync process is run via sudo and results in ps
output of:
root 31103 3749 0 14:59 ? 00:00:00 sshd: backup [priv]
backup 31105 31103 0 14:59 ? 00:00:00 sshd: backup@notty
root 31106 31105 0 14:59 ? 00:00:00 sesh /usr/bin/rsync
--server --sender --numeric-ids --perms --owner --group -D --links
--hard-links --times --block-size=2048 --recursive --one-file-system
--checksum-seed=32761 --ignore-times . /usr/local/
root 31107 31106 0 14:59 ? 00:00:00 /usr/bin/rsync
--server --sender --numeric-ids --perms --owner --group -D --links
--hard-links --times --block-size=2048 --recursive --one-file-system
--checksum-seed=32761 --ignore-times . /usr/local/
and an strace of the rsync (pid 31107) looks like it is waiting for
input:
[rouilj@vpn01 ~]$ sudo strace -p 31107
Process 31107 attached - interrupt to quit
select(1, [0], [], NULL, {20, 441000}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0} ...
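The trace above is consistent with a process that polls stdin (fd 0) for readability with a 60-second timeout and gets nothing each time. A minimal Python sketch of that pattern follows. It is not rsync's code; `wait_for_input`, the pipe, and the short timeout are made up for illustration.

```python
import os
import select


def wait_for_input(fd, timeout_s, max_rounds):
    """Poll fd for readability, like the repeated select() calls in the trace.

    Returns how many rounds timed out before data became readable
    (max_rounds if nothing ever arrived).
    """
    timeouts = 0
    for _ in range(max_rounds):
        readable, _, _ = select.select([fd], [], [], timeout_s)
        if readable:
            break
        timeouts += 1
    return timeouts


if __name__ == "__main__":
    # Hypothetical demo: a pipe nobody writes to stands in for the idle stdin.
    r, w = os.pipe()
    print("timed-out rounds with a silent peer:", wait_for_input(r, 0.05, 3))
    os.write(w, b"x")
    print("timed-out rounds once data is pending:", wait_for_input(r, 0.05, 3))
    os.close(r)
    os.close(w)
```

As long as the peer never writes, every round ends in a timeout, which is what the `= 0 (Timeout)` lines show.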
On the server side I have:
BackupPC_dump,26148 /tools/BackupPC-3.1.0beta0/bin/BackupPC_dump -f...
(BackupPC_dump,26355)
(BackupPC_dump,26874)
(BackupPC_dump,26915)
BackupPC_dump,27716 /tools/BackupPC-3.1.0beta0/bin/BackupPC_dump ...
(ssh,26219)
(ssh,26855)
(ssh,26883)
ssh,27681 -q -x -l backup -o ServerAliveInterval=30 ...
where the processes in parentheses are defunct.
Stracing the ssh that should be the server side of the rsync client
above produces:
Process 27681 attached - interrupt to quit
select(7, [3 4], [], NULL, {28, 254000}) = 0 (Timeout)
select(7, [3 4], [3], NULL, {30, 0}) = 1 (out [3], left {30, 0})
write(3,
"-\310\375\373\356\377\214\1^\310\335\266\377a\326\31v\260"..., 64) =
64
select(7, [3 4], [], NULL, {30, 0}) = 1 (in [3], left {29,
990000})
read(3, "GF\314\303\230e\23\272f\372\212#J\204sR\205\30\266\v\201"...,
8192) = 32
select(7, [3 4], [], NULL, {30, 0}) = 0 (Timeout)
select(7, [3 4], [3], NULL, {30, 0}) = 1 (out [3], left {30, 0})
write(3, "\350\307\213\306\263\6\225\240\32}\247p\32\345f;qo\33h"...,
64) = 64
select(7, [3 4], [], NULL, {30, 0}) = 1 (in [3], left {29,
991000})
read(3, "\233\217(\244\6\216H\313\263\326\221\317\230\4\337\240"...,
8192) = 32
select(7, [3 4], [], NULL, {30, 0}
So it looks like the ssh client is alive and sending/receiving data
(probably server keepalives, but I am not sure).
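To check the keepalive guess over a longer capture, the strace output could be tallied by syscall. This is a rough Python sketch, assuming the trace was saved as text in the format shown above; `summarize_strace` is a hypothetical helper, and wrapped continuation lines are simply skipped.

```python
import re
from collections import Counter

# A syscall line starts with the call name followed by "(".
SYSCALL_RE = re.compile(r"^(\w+)\(")


def summarize_strace(lines):
    """Count syscalls per name, with select() timeouts counted separately."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line.strip())
        if not match:
            continue  # attach banner or a wrapped continuation line
        name = match.group(1)
        if name == "select" and "(Timeout)" in line:
            counts["select_timeout"] += 1
        else:
            counts[name] += 1
    return counts
```

If the only reads and writes are small, fixed-size, and spaced about 30 seconds apart, that would fit the ServerAliveInterval=30 setting rather than rsync payload, though the trace alone does not prove it.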
An strace of the parent BackupPC_dump at pid 27716 shows:
concord.rouilj [~/develop] 779> sudo strace -p 27716
Process 27716 attached - interrupt to quit
select(16, [11], NULL, [11], NULL
and that's it. top shows no activity for the BackupPC_dump or rsync
processes in this tree.
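One way to see what BackupPC_dump is blocked on is to look up fd 11 under /proc. This is a small sketch, assuming a Linux /proc filesystem and enough privilege to read the target's entries (root for another user's process); `describe_fd` is a hypothetical helper name.

```python
import os


def describe_fd(pid, fd):
    """Return what /proc says a process's file descriptor points at.

    On Linux the link target is a path for regular files, or a string
    such as "pipe:[inode]" or "socket:[inode]" for pipes and sockets.
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")
```

Calling `describe_fd(27716, 11)` on the server, and comparing the result with the descriptors of the ssh at pid 27681, should show whether the dump process is waiting on the pipe to that ssh.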
To try to eliminate the remote rsync/sudo and ssh connection as a
culprit, I ran:
sudo -u backup /usr/bin/rsync --rsync-path 'sudo /usr/bin/rsync' -vv
--numeric-ids --perms --owner --group -D --links --hard-links --times
--block-size=2048 --recursive --one-file-system --checksum-seed=32761
--ignore-times vpn01.psm1:/usr/local/ .
on the server to get an ssh to the client system using the backup user
and starting the rsync on the client (in --server --sender mode) via
sudo. The process tree (partial) is:
root 22148 22146 2 16:55 ? 00:00:00 sesh /usr/bin/rsync
--server --sender -vvlHogDtprIx -B2048 --checksum-seed=32761
--numeric-ids . /usr/local/
root 22149 22148 2 16:55 ? 00:00:00 /usr/bin/rsync
--server --sender -vvlHogDtprIx -B2048 --checksum-seed=32761
--numeric-ids . /usr/local/
The output from the rsync process is:
opening connection using ssh vpn01.psm1 "sudo /usr/bin/rsync" --server
--sender -vvlHogDtprIx -B2048 --checksum-seed=32761 --numeric-ids
. /usr/local/
receiving file list ...
[sender] expand file_list to 131072 bytes, did move
done
delta transmission enabled
./
bin/
bin/envdir -> /command/envdir
bin/envuidgid -> /command/envuidgid
[... more files/dirs etc elided]
src/fastforward-0.51/wait_pid.o
src/fastforward-0.51/warn-auto.sh
total: matches=0 tag_hits=0 false_alarms=0 data=1658996
sent 6736 bytes received 1682139 bytes 675550.00 bytes/sec
total size is 1659279 speedup is 0.98
which completes in a few seconds. So I get a nice fast rsync of
4.2MB of data, using sudo to start the remote rsync, when I use the
rsync program. I ran it 20 times (in an empty directory) and it ran
the same every time. The BackupPC-triggered backup, however, has hung
every time.
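The repeated-run check could be scripted so that a hang is detected rather than waited out. This is a rough sketch, assuming Python 3 on the server; `count_hangs` and the timeout are my own choices for illustration, and the command list is whatever command is under test.

```python
import subprocess


def count_hangs(cmd, runs, timeout_s):
    """Run cmd `runs` times; count the runs that exceed timeout_s seconds.

    subprocess.run() kills the child and raises TimeoutExpired when the
    timeout passes, so a hung transfer does not block the loop.
    """
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs
```

For the test above, `cmd` would be the rsync command line split into a list, `runs` would be 20, and `timeout_s` something comfortably above the few seconds a good run takes. Exit codes are ignored here; only hangs are counted.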
So does anybody have an rsync client written using File::RsyncP? Or
should I write one of my own to run this test and see whether it's a
deadlock condition or something else in File::RsyncP?
Any other ideas on what could cause this and how to troubleshoot it?
--
-- rouilj
John Rouillard
System Administrator
Renesys Corporation
603-643-9300 x 111