From: John R. <rou...@re...> - 2007-10-26 17:01:21
Hi all:
Figured I should start a new thread on this as it is a separate
problem from the SIGPIPE issue.
I have a backup hanging until the SIGALRM triggers some 20 hours
later.
The (partial) config is:
$Conf{XferMethod} = 'rsync';
$Conf{RsyncClientPath} = '/usr/bin/rsync';
$Conf{RsyncClientCmd} = '$sshPath -q -x -l backup \
-o ServerAliveInterval=30 $host sudo $rsyncPath $argList+';
$Conf{RsyncShareName} = [
'/etc',
'/var/bak',
'/var/log',
'/usr/local',
];
$Conf{RsyncArgs} = [
#
# Do not edit these!
#
'--numeric-ids',
'--perms',
'--owner',
'--group',
'-D',
'--links',
'--hard-links',
'--times',
'--block-size=2048',
'--recursive',
'--one-file-system',
#
# Rsync >= 2.6.3 supports the --checksum-seed option
# which allows rsync checksum caching on the server.
# Uncomment this to enable rsync checksum caching if
# you have a recent client rsync version and you want
# to enable checksum caching.
#
'--checksum-seed=32761',
#
# Add additional arguments here
#
];
The server side uses perl-File-RsyncP-0.68-1 (built locally with
cpan2rpm), with perl:
Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
Platform:
osname=linux, osvers=2.6.9-42.elsmp,
archname=i386-linux-thread-multi
uname='linux build-i386 2.6.9-42.elsmp #1 smp sat aug 12 09:39:11
cdt 2006 i686 i686 i386 gnulinux '
config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386
-mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost
-Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat,
Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux
-Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads
-Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db
-Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio
-Dinstallusrbinperl -Ubincompat5005 -Uversiononly
-Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1
5.8.0'
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
and runs CentOS 4.4.
The client side box is a CentOS release 4.5 (Final) running
rsync-2.6.3-1 and the rsync process is run via sudo and results in ps
output of:
root 31103 3749 0 14:59 ? 00:00:00 sshd: backup [priv]
backup 31105 31103 0 14:59 ? 00:00:00 sshd: backup@notty
root 31106 31105 0 14:59 ? 00:00:00 sesh /usr/bin/rsync
--server --sender --numeric-ids --perms --owner --group -D --links
--hard-links --times --block-size=2048 --recursive --one-file-system
--checksum-seed=32761 --ignore-times . /usr/local/
root 31107 31106 0 14:59 ? 00:00:00 /usr/bin/rsync
--server --sender --numeric-ids --perms --owner --group -D --links
--hard-links --times --block-size=2048 --recursive --one-file-system
--checksum-seed=32761 --ignore-times . /usr/local/
and an strace of the rsync (pid 31107) looks like it is waiting for
input:
[rouilj@vpn01 ~]$ sudo strace -p 31107
Process 31107 attached - interrupt to quit
select(1, [0], [], NULL, {20, 441000}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0} ...
On the server side I have:
BackupPC_dump,26148 /tools/BackupPC-3.1.0beta0/bin/BackupPC_dump -f...
(BackupPC_dump,26355)
(BackupPC_dump,26874)
(BackupPC_dump,26915)
BackupPC_dump,27716 /tools/BackupPC-3.1.0beta0/bin/BackupPC_dump ...
(ssh,26219)
(ssh,26855)
(ssh,26883)
ssh,27681 -q -x -l backup -o ServerAliveInterval=30 ...
where the processes in ()'s are defunct.
Stracing the ssh that should be the server side of the rsync client
above produces:
Process 27681 attached - interrupt to quit
select(7, [3 4], [], NULL, {28, 254000}) = 0 (Timeout)
select(7, [3 4], [3], NULL, {30, 0}) = 1 (out [3], left {30, 0})
write(3,
"-\310\375\373\356\377\214\1^\310\335\266\377a\326\31v\260"..., 64) =
64
select(7, [3 4], [], NULL, {30, 0}) = 1 (in [3], left {29,
990000})
read(3, "GF\314\303\230e\23\272f\372\212#J\204sR\205\30\266\v\201"...,
8192) = 32
select(7, [3 4], [], NULL, {30, 0}) = 0 (Timeout)
select(7, [3 4], [3], NULL, {30, 0}) = 1 (out [3], left {30, 0})
write(3, "\350\307\213\306\263\6\225\240\32}\247p\32\345f;qo\33h"...,
64) = 64
select(7, [3 4], [], NULL, {30, 0}) = 1 (in [3], left {29,
991000})
read(3, "\233\217(\244\6\216H\313\263\326\221\317\230\4\337\240"...,
8192) = 32
select(7, [3 4], [], NULL, {30, 0}
So it looks like the ssh client is alive and sending/receiving data
(probably server keepalives, but I am not sure).
An strace of the parent BackupPC_dump at pid 27716 shows:
concord.rouilj [~/develop] 779> sudo strace -p 27716
Process 27716 attached - interrupt to quit
select(16, [11], NULL, [11], NULL
and that's it. Top shows no activity for the BackupPC_dump or rsync
processes in this tree.
To try to eliminate the remote rsync/sudo and ssh connection as a
culprit, I ran:
sudo -u backup /usr/bin/rsync --rsync-path 'sudo /usr/bin/rsync' -vv
--numeric-ids --perms --owner --group -D --links --hard-links --times
--block-size=2048 --recursive --one-file-system --checksum-seed=32761
--ignore-times vpn01.psm1:/usr/local/ .
on the server to get an ssh to the client system using the backup user
and starting the rsync on the client (in --server --sender mode) via
sudo. The process tree (partial) is:
root 22148 22146 2 16:55 ? 00:00:00 sesh /usr/bin/rsync
--server --sender -vvlHogDtprIx -B2048 --checksum-seed=32761
--numeric-ids . /usr/local/
root 22149 22148 2 16:55 ? 00:00:00 /usr/bin/rsync
--server --sender -vvlHogDtprIx -B2048 --checksum-seed=32761
--numeric-ids . /usr/local/
The output from the rsync process is:
opening connection using ssh vpn01.psm1 "sudo /usr/bin/rsync" --server
--sender -vvlHogDtprIx -B2048 --checksum-seed=32761 --numeric-ids
. /usr/local/
receiving file list ...
[sender] expand file_list to 131072 bytes, did move
done
delta transmission enabled
./
bin/
bin/envdir -> /command/envdir
bin/envuidgid -> /command/envuidgid
[... more files/dirs etc elided]
src/fastforward-0.51/wait_pid.o
src/fastforward-0.51/warn-auto.sh
total: matches=0 tag_hits=0 false_alarms=0 data=1658996
sent 6736 bytes received 1682139 bytes 675550.00 bytes/sec
total size is 1659279 speedup is 0.98
which completes in a few seconds. So I get a nice fast rsync of
4.2MB of data, using sudo to start the remote rsync, if I use the rsync
program. I ran it 20 times (in an empty directory) and it ran the same
every time. The BackupPC-triggered backup, however, has hung every time.
So does anybody have an rsync client written using File::RsyncP? Or
should I craft one of my own to try this test and see if it's a
deadlock condition or something in File::RsyncP?
Other ideas on what could cause this and how to troubleshoot it?
--
-- rouilj
John Rouillard
System Administrator
Renesys Corporation
603-643-9300 x 111
From: John R. <rou...@re...> - 2007-10-26 19:47:09
Trying to split threads here. In a prior discussion on the hang issue,
on Fri, Oct 26, 2007 at 11:13:14AM -0500, Les Mikesell wrote:
> John Rouillard wrote:
> >>> [rouilj]
> >>> $Conf{ClientTimeout} = 72000;
> >>>
> >>>which is 20 hours and the sigpipe is occurring
> >>>before then. You'd see sigalarm instead of sigpipe
> >>>if you had a timeout.
> >
> >Something like this I assume:
> >
> [...]
> > create d 755 0/1 12288 src/fastforward-0.51
> > finish: removing in-process file .
> > Child is aborting
> > Done: 17 files, 283 bytes
> > Got fatal error during xfer (aborted by signal=ALRM)
> > Backup aborted by user signal
>
> Yes, that one is a timeout on the backuppc side.
>
> >Also I straced the rsync process on the remote
> >system while it was hung (I assume on whatever
> >occurred after the src/fastforward-0.51) directory
> >and got:
> >
> > rouilj@vpn01 ~]$ ps -ef | grep 6909
> > root 6909 6908 0 Oct25 ? 00:00:00 /usr/bin/rsync
> > --server --sender --numeric-ids --perms --owner --group -D --links
> > --hard-links --times --block-size=2048 --recursive --one-file-system
> > --checksum-seed=32761 --ignore-times . /usr/local/
> > rouilj 10603 10349 0 05:36 pts/0 00:00:00 grep 6909
> > [rouilj@vpn01 ~]$ strace -p 6909
> > attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
> > [rouilj@vpn01 ~]$ sudo strace -p 6909
> > Process 6909 attached - interrupt to quit
> > select(1, [0], [], NULL, {42, 756000}) = 0 (Timeout)
> > select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> > select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> > select(1, [0], [], NULL, {60, 0} <unfinished ...>
> > Process 6909 detached
> >
> >And similar results on the server side
> >process. Maybe a deadlock somewhere? The ssh pipe
> >appeared open. I set it up to forward traffic and
> >was able to pass traffic from the server to the
> >client.
>
> Are these 2 different scenarios (the sigalarm and
> sigpipe)? I don't think I've ever seen a real
> deadlock on a unix/linux rsync although I always got
> them on windows when trying to run rsync under sshd
> (and I'd appreciate knowing the right versions to use
> if that works now). The sigpipe scenario sounded like
> the remote rsync crashed or quit (perhaps not being
> able to handle files >2gigs). This looks like
> something different. Can you start the remote strace
> before the hang so you have a chance of seeing the
> file and activity in progress when the hang occurs?
Ask and you will receive. Here is part of an strace of the
rsync on the client machine. Continuations of long lines are
indented by 2 spaces, with line numbers in parens.
Starts:
execve("/usr/bin/rsync", ["/usr/bin/rsync", "--server", (line 1)
"--sender", "--numeric-ids", "--perms", "--owner",
"--group", "-D", "--links", "--hard-links", "--times",
"--block-size=2048", "--recursive",
"--one-file-system", "--checksum-seed=32761",
"--ignore-times", ...], [/* 16 vars */]) = 0
uname({sys="Linux", node="vpn01.fp.psm1.renesys.com", ...}) = 0
some initialization and then:
lstat64("/usr/local/.", {st_mode=S_IFDIR|0755, st_size=4096,
...}) = 0
chdir("/usr/local") = 0 (line 63)
stat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
mmap2(NULL, 266240, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ef7000
mmap2(NULL, 135168, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ed6000
open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
getdents64(3, /* 12 entries */, 4096) = 320
lstat64("share", {st_mode=S_IFDIR|0755, st_size=4096, ...})
= 0
open("share", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) =
4
fstat64(4, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
getdents64(4, /* 4 entries */, 4096) = 96
lstat64("share/info", {st_mode=S_IFDIR|0755, st_size=4096,
...}) = 0
open("share/info",
O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 5
fstat64(5, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl64(5, F_SETFD, FD_CLOEXEC) = 0
getdents64(5, /* 2 entries */, 4096) = 48
getdents64(5, /* 0 entries */, 4096) = 0
close(5) = 0
lstat64("share/man", {st_mode=S_IFDIR|0755, st_size=4096,
...}) = 0
open("share/man",
O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 5
fstat64(5, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl64(5, F_SETFD, FD_CLOEXEC) = 0
getdents64(5, /* 12 entries */, 4096) = 288
it starts walking through the directory tree.
[fl]stats all over the place.
Then file writes:
open("/usr/local/bin/addcr", O_RDONLY|O_LARGEFILE) = 3 (line 635)
fstat64(3, {st_mode=S_IFREG|0755, st_size=4264, ...}) = 0
read(3,
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\2\0\3\0\1\0\0\0h\203\4"...,
4264) = 4264
select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left
{60, 0})
write(1,
"\374\17\0\7\2\0\0\0\0\0\0\0\0\10\0\0\2\0\0\0\0\0\0\0\250"...,
4096) = 4096
close(3) = 0
open("/usr/local/bin/argv0", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0755, st_size=9768, ...}) = 0
read(3,
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\2\0\3\0\1\0\0\0$\204\4"...,
9768) = 9768
select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left
{60, 0})
write(1,
"\374\17\0\7\1\0\0\0\3\0\0\0X\233\4\10X\v\0\0\4\0\0\0\0"...,
4096) = 4096
select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left
{60, 0})
write(1,
"\374\17\0\7\1\0\0\203\372$\270l\230\4\10\17\204\7\1\0\0"...,
4096) = 4096
close(3) = 0
...
up to:
open("/usr/local/src/fastforward-0.51/wait.h", (line 2781)
O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=295, ...}) = 0
read(3, "#ifndef WAIT_H\n#define WAIT_H\n\ne"..., 295) = 295
close(3) = 0
open("/usr/local/src/fastforward-0.51/wait_pid.c",
O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=709, ...}) = 0
read(3, "#include <sys/types.h>\n#include "..., 709) = 709
close(3) = 0
open("/usr/local/src/fastforward-0.51/wait_pid.o",
O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=896, ...}) = 0
read(3,
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\1\0\3\0\1\0\0\0\0\0\0\0"...,
896) = 896
select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left
{59, 999000})
write(1, "\374\17\0\7.sa_handler = 0;\n sa.sa_fla"...,
4096) = 4096
close(3) = 0
open("/usr/local/src/fastforward-0.51/warn-auto.sh",
O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=64, ...}) = 0
read(3, "#!/bin/sh\n# WARNING: This file w"..., 64) = 64
close(3) = 0
select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left (line 2799)
{60, 0})
write(1, "\300\0\0\7\0waitpid\0__errno_location\0er"...,
196) = 196
select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left
{60, 0})
write(1, "\4\0\0\7\377\377\377\377", 8) = 8
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
and toast city.
What's weird is the select 2 after the close. The pattern I see
a lot is:
open file,
stat file
read from file on fd 3
(if needed I assume file is different from one on server
select on 2,
write on fd 2
select on fd 2
write on fd 1
)
close fd 3
back to open of file
Before the hang, I see a close, select, write, select,
write, then selects forever (1.5 hours and counting).
Running a sed to eliminate paired open/closes I see only one
other instance of the selects outside of a file open/close
pair. It occurs at line 629 in the trace output:
chdir("/home/backup") = 0 (line 616)
open("/etc/mtab", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=469, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ed5000
read(3, "/dev/mapper/VolGroup00-LogVol01 "..., 4096) = 469
close(3) = 0
munmap(0xb7ed5000, 4096) = 0
open("/proc/meminfo", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ed5000
read(3, "MemTotal: 2074828 kB\nMemFre"..., 1024) = 670
close(3) = 0
munmap(0xb7ed5000, 4096) = 0
select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left (line 629**)
{60, 0})
write(1,
"\374\17\0\7tdio.h\314\3\0\0\206\262a5\1\0\0\0\272\25\n"...,
4096) = 4096
select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left
{60, 0})
write(1, "\1\0\0\7\0", 5) = 5
select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left
{59, 684000})
read(0,
"\2\0\0\0\0\0\0\0\0\10\0\0\2\0\0\0\0\0\0\0\3\0\0\0\0\0\0"...,
8184) = 6724
open("/usr/local/bin/addcr", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0755, st_size=4264, ...}) = 0
read(3,
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\2\0\3\0\1\0\0\0h\203\4"...,
4264) = 4264
select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left
{60, 0})
write(1,
"\374\17\0\7\2\0\0\0\0\0\0\0\0\10\0\0\2\0\0\0\0\0\0\0\250"...,
4096) = 4096
close(3) = 0
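For anyone wanting to repeat that filtering on their own trace, a rough Python sketch follows. It is not the sed command used above (which isn't shown); it assumes an unwrapped strace log with one syscall per line, and the function name selects_outside_open is made up for illustration.

```python
import re

# Successful open(...) = <fd>; failed opens return -1 and do not match.
OPEN_RE = re.compile(r'^open\(.*\)\s*=\s*(\d+)')
CLOSE_RE = re.compile(r'^close\((\d+)\)\s*=\s*0')


def selects_outside_open(lines):
    """Return 1-based line numbers of select() calls made while no
    file opened during the trace is still open."""
    open_fds = set()
    hits = []
    for lineno, line in enumerate(lines, 1):
        line = line.strip()
        m = OPEN_RE.match(line)
        if m:
            open_fds.add(int(m.group(1)))
            continue
        m = CLOSE_RE.match(line)
        if m:
            open_fds.discard(int(m.group(1)))
            continue
        if line.startswith("select(") and not open_fds:
            hits.append(lineno)
    return hits


sample = [
    'open("/usr/local/bin/addcr", O_RDONLY|O_LARGEFILE) = 3',
    'select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left {60, 0})',
    'close(3) = 0',
    'select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left {60, 0})',
    'select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)',
]
print(selects_outside_open(sample))  # [4, 5]
```

The select on sample line 2 is skipped because fd 3 is still open; the two after the close are reported, which is the shape seen just before the hang.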
I can make the whole file available on the web if you or anybody else
wants it. Contact me off list; no sense spamming people.
--
-- rouilj
John Rouillard
System Administrator
Renesys Corporation
603-643-9300 x 111
From: Les M. <le...@fu...> - 2007-10-26 20:27:34
John Rouillard wrote:
>
> open("/usr/local/src/fastforward-0.51/warn-auto.sh",
> O_RDONLY|O_LARGEFILE) = 3
> fstat64(3, {st_mode=S_IFREG|0644, st_size=64, ...}) = 0
> read(3, "#!/bin/sh\n# WARNING: This file w"..., 64) = 64
> close(3) = 0
> select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left (line 2799)
> {60, 0})
> write(1, "\300\0\0\7\0waitpid\0__errno_location\0er"...,
> 196) = 196
> select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left
> {60, 0})
> write(1, "\4\0\0\7\377\377\377\377", 8) = 8
> select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
>
> and toast city.
>
> What's weird is the select 2 after the close.
There are earlier selects on fd2 that aren't followed by a write. The
real problem is the select on fd1 (stdout) that tells you that a write
would block.
> I can make the whole file available on the web if you or anybody else
> wants it. Contact me off list, no sense spamming people.
I don't think it would help. The question is, why can't you write to
stdout? It should be connected to sshd which should be passing stuff to
the invoking ssh and perl should be consuming it.
--
Les Mikesell
les...@gm...
From: John R. <rou...@re...> - 2007-10-26 22:20:44
On Fri, Oct 26, 2007 at 03:28:28PM -0500, Les Mikesell wrote:
> John Rouillard wrote:
> >
> >open("/usr/local/src/fastforward-0.51/warn-auto.sh",
> > O_RDONLY|O_LARGEFILE) = 3
> >fstat64(3, {st_mode=S_IFREG|0644, st_size=64, ...}) = 0
> >read(3, "#!/bin/sh\n# WARNING: This file w"..., 64) = 64
> >close(3) = 0
> >select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left (line 2799)
> > {60, 0})
> >write(1, "\300\0\0\7\0waitpid\0__errno_location\0er"...,
> > 196) = 196
> >select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left
> > {60, 0})
> >write(1, "\4\0\0\7\377\377\377\377", 8) = 8
> >select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> >select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> >select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> >select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> >select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> >select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> >select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> >select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> >
> >and toast city.
> >
> >What's weird is the select 2 after the close.
>
> There are earlier selects on fd2 that aren't followed by a write.
Correct, but they occur before the input file closes.
> The real problem is the select on fd1 (stdout) that tells you that
> a write would block.
>
> >I can make the whole file available on the web if you or anybody else
> >wants it. Contact me off list, no sense spamming people.
>
> I don't think it would help. The question is, why can't you write to
> stdout? It should be connected to sshd which should be passing stuff to
> the invoking ssh and perl should be consuming it.
Do you mean this select?
> >select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left (line 2799)
> > {60, 0})
My C is rusty, but I think that means:
look at no fd's for reading and fd 1 for writing and no fd's for
errors. Time out in 60.000 seconds.
What I am not sure of is why the first argument is 2. I would expect
that if the [1] was [1, 2] with two fd's. Since there is
only one fd in the set (namely fd 1), I would expect the 2 to be 1.
In any case, the select call returns 1, meaning that there is one
file descriptor ready for writing and it waited 0 seconds to
determine the write handle was ready to be written to. Then the write
occurs:
> >write(1, "\300\0\0\7\0waitpid\0__errno_location\0er"...,
> > 196) = 196
writing 196 bytes.
> >select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left
> > {60, 0})
Again, this indicates that fd 1 is available for writing. An 8-byte write is
done, then fd 0 is checked to see if there is anything to read:
> >write(1, "\4\0\0\7\377\377\377\377", 8) = 8
> >select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
and there never is anything to read.
So by that point it is waiting for data/info from the server and there
is no data forthcoming. Can you point out where my analysis is wrong?
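The two select outcomes in that reading can be reproduced with an ordinary pipe. The sketch below is only an illustration of the semantics, not of rsync or ssh themselves; the helper name probe_pipe and the 0.2 second timeout are arbitrary choices.

```python
import os
import select


def probe_pipe(timeout=0.2):
    """Return (write_ready, read_ready) for a fresh, empty pipe."""
    r, w = os.pipe()
    try:
        # Counterpart of select(2, NULL, [1], NULL, ...): the pipe buffer
        # is empty, so the write end is reported ready immediately.
        _, writable, _ = select.select([], [w], [], timeout)
        # Counterpart of select(1, [0], [], NULL, ...): nothing has been
        # written, so select returns no descriptors once the timeout expires.
        readable, _, _ = select.select([r], [], [], timeout)
        return (w in writable, r in readable)
    finally:
        os.close(r)
        os.close(w)


print(probe_pipe())  # (True, False)
```

A writable result only says the local buffer has room; it says nothing about whether the peer at the other end ever consumes the data or answers.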
--
-- rouilj
John Rouillard
System Administrator
Renesys Corporation
603-643-9300 x 111
From: Les M. <le...@fu...> - 2007-10-26 23:00:21
John Rouillard wrote:
> Do you mean this select?
>
>>> select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left (line 2799)
>>> {60, 0})
>
> My C is rusty, but I think that means:
>
> look at no fd's for reading and fd 1 for writing and no fd's for
> errors. Time out in 60.000 seconds.
>
> What I am not sure of is why the first argument is 2. I would expect
> that if the [1] was [1, 2] with two fd's. Since there is
> only one fd in the set (namely fd 1), I would expect the 2 to be 1.
I guess I had that backwards - the first argument is really the highest
numbered fd to consider plus 1.
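That rule can be checked against the select calls quoted earlier in this thread. The helper below is a hypothetical illustration (min_nfds is not a real API); it computes the smallest first argument that covers every descriptor in the three sets.

```python
def min_nfds(readfds=(), writefds=(), exceptfds=()):
    """Smallest legal first argument to select(2): the highest
    descriptor in any of the three sets, plus one (0 if all are empty)."""
    fds = [*readfds, *writefds, *exceptfds]
    return max(fds) + 1 if fds else 0


# (nfds seen in the strace output, read set, write set, except set)
calls = [
    (2, [], [1], []),       # rsync waiting to write to stdout
    (1, [0], [], []),       # rsync waiting to read from stdin
    (7, [3, 4], [], []),    # ssh on the BackupPC server
    (16, [11], [], [11]),   # BackupPC_dump
]

for nfds, r, w, e in calls:
    # The value passed may be larger than strictly needed, never smaller.
    assert nfds >= min_nfds(r, w, e)

print(min_nfds([], [1], []))  # 2
```

So the 2 in select(2, NULL, [1], ...) covers descriptors 0 and 1, and only fd 1 is actually in the write set. The ssh and BackupPC_dump calls pass a larger value than the minimum, which is allowed.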
> again indicates that fd 1 is available for writing. an 8 byte write is
> done then fd 0 is checked to see if there is anything to read
>
>>> write(1, "\4\0\0\7\377\377\377\377", 8) = 8
>>> select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
>
> and there never is anything to read.
>
> So by that point it is waiting for data/info from the server and there
> is no data forthcoming. Can you point out where my analysis is wrong?
Yes, I think that is right. I wonder if that 8-byte write is sitting in
a buffer somewhere. Did this break on previously working machines or
have you always had this problem?
--
Les Mikesell
le...@fu...