Thread: [Filterproxy-devel] Re: FilterProxy BUG
|
From: Bob M. <mce...@dr...> - 2002-05-02 15:58:03
|
Well that's interesting...
This should never happen and is probably a bug in HTTP::Daemon or
IO::Socket. The $daemon->accept method should wait forever.
This may seem unrelated...but the only thing I can think of is if you're
exhausting the available sockets or something. Somehow the socket is
getting closed forcibly. (maybe exhausting a per-user process or socket
limit?) This seems reasonable if you reload a page with lots of images
many times -- lots of FilterProxy activity.
Try adding this somewhere inside data_handler in FilterProxy.pl:
    if(!defined $client->connected()) {
        die("Client aborted download");
    }
This should close outgoing connections when you hit "stop". Do you hit
reload repeatedly, and wait for the page to finish each time? Or do you
hit reload without waiting for the page to finish? (the above patch
might help in the latter case)
Can you tell me how many FilterProxy processes there are just before it
crashes?
ps aux | grep FilterProxy
and network connections?
netstat -anp | grep FilterProxy
And what is the version of LWP and HTTP::Daemon that you have installed?
My only other suggestion is to upgrade your perl, as this could be
caused by a perl bug (and let me know if that works...).
Cheers,
-- Bob
Guillaume Morin [gui...@mo...] wrote:
> Hi Bob,
>
> First, I'd like to thank you for FilterProxy, which is a pretty cool piece
> of software. I've found a bug in FilterProxy; here is a debug log:
>
> [10389 Thu May 2 15:55:44 2002] [ERROR] Exiting outside main loop
> (BUG!)
> [11027 Thu May 2 15:55:44 2002] [Perl WARNING] Use of uninitialized
> value at /usr/lib/perl5/5.00503/i386-linux/Socket.pm line 295.
> [11027 Thu May 2 15:55:44 2002] Use of uninitialized value at
> /usr/lib/perl5/5.00503/i386-linux/Socket.pm line 295.
> FilterProxy::__ANON__('Use of uninitialized value at
> /usr/lib/perl5/5.00503/i386-linux/...') called at
> /usr/lib/perl5/5.00503/i386-linux/Socket.pm line 295
> Socket::sockaddr_in(undef) called at ./FilterProxy.pl line 534
> FilterProxy::handler('HTTP::Daemon::ClientConn=GLOB(0x85b56a0)')
> called at ./FilterProxy.pl line 330
> [11027 Thu May 2 15:55:44 2002] [Perl ERROR] Bad arg length for
> Socket::unpack_sockaddr_in, length is 0, should be 16 at
> /usr/lib/perl5/5.00503/i386-linux/Socket.pm line 295.
> [11027 Thu May 2 15:55:44 2002] Bad arg length for
> Socket::unpack_sockaddr_in, length is 0, should be 16 at
> /usr/lib/perl5/5.00503/i386-linux/Socket.pm line 295.
> FilterProxy::__ANON__('Bad arg length for
> Socket::unpack_sockaddr_in, length is 0, shou...') called at
> /usr/lib/perl5/5.00503/i386-linux/Socket.pm line 295
> Socket::sockaddr_in(undef) called at ./FilterProxy.pl line 534
> FilterProxy::handler('HTTP::Daemon::ClientConn=GLOB(0x85b56a0)')
> called at ./FilterProxy.pl line 330
>
> - There are no other mentions of 11027 in the log
> - 10389 is of course the main process
>
> I think the problem in 11027 causes the BUG in the parent (reported
> first, but I guess it is a race condition);
>
> The problem can be triggered only with IE; you just have to load a page
> with images several times.
>
> The patch I suggest, untested unfortunately, is
>
> 332: my($peername) = getpeername($client);
> + next if (! $peername);
>
> I will be able to test it tomorrow. I'll report if it works.
> If you think this patch is wrong and/or you have another idea, please
> write to me.
>
> TIA. Regards,
>
> --
> Guillaume Morin <gui...@mo...>
>
> Support the Debian Project (http://www.debian.org/)
-- Bob
Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
|
|
From: Guillaume M. <gui...@mo...> - 2002-05-06 10:05:20
|
In a message of 02 May at 10:57, Bob McElrath wrote:
> This may seem unrelated...but the only thing I can think of is if you're
> exhausting the available sockets or something. Somehow the socket is
> getting closed forcibly. (maybe exhausting a per-user process or socket
> limit?) This seems reasonable if you reload a page with lots of images
> many times -- lots of FilterProxy activity.
I don't think so. I've done a strace of the main process. I saw no
signals or errors related to limits. The only error was getpeername
returning ENOTCONN, just as Perl reports.
> Try adding this somewhere inside data_handler in FilterProxy.pl:
>
> if(!defined $client->connected()) {
> die("Client aborted download");
> }
>
> This should close outgoing connections when you hit "stop".
I tried this. It happens sometimes (I've added a logger call), but the
main process still dies.
> Do you hit
> reload repeatedly, and wait for the page to finish each time? Or do you
> hit reload without waiting for the page to finish? (the above patch
> might help in the latter case)
The crash happens when you load a page and click stop.
> Can you tell me how many FilterProxy processes there are just before it
> crashes?
> ps aux | grep FilterProxy
> and network connections?
> netstat -anp | grep FilterProxy
The results vary widely, from about 5 processes or connections to 40.
> And what is the version of LWP and HTTP::Daemon that you have installed?
It is LWP 1.00 and libwww 5.43 (I tried to upgrade to LWP 5.47, but it
did not help)
> My only other suggestion is to upgrade your perl, as this could be
> caused by a perl bug (and let me know if that works...).
Unfortunately, I cannot do that. I must keep that version atm :-(.
Thanks for your help.
Regards,
--
Guillaume Morin <gui...@mo...>
|
|
From: Guillaume M. <gui...@mo...> - 2002-05-06 12:39:44
|
Hi,
I've tried to upgrade most related modules and to remove the hook on
$client->connected. It still crashes but I can't
see any errors in the debug log. I only get the "[ERROR] Exiting outside
main loop (BUG!)" message. But no Perl errors. This is very weird.
The strace shows this:
connect(8, {sin_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("127.0.0.1")}}, 16) = 0
send(8, "\204>\1\0\0\1\0\0\0\0\0\0\00212\0010\00216\003172\7in-"..., 42,
0) = 42
time(NULL) = 1020688418
poll([{fd=8, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(8,
"\204>\205\203\0\1\0\0\0\1\0\0\00212\0010\00216\003172\7"..., 1024, 0,
{sin_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("127.0.0.1")}}, [16]) = 90
close(8) = 0
fork() = 16231
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
close(7) = 0
munmap(0x401ec000, 4096) = 0
close(7) = -1 EBADF (Bad file descriptor)
munmap(0x401eb000, 4096) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
accept(6, 0xbffff714, [16]) = -1 ECONNRESET (Connection
reset by peer)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
time([1020688418]) = 1020688418
write(5, "[14919 Mon May 6 14:33:38 2002]"..., 74) = 74
write(1, "[14919 Mon May 6 14:33:38 2002]"..., 74) = 74
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
close(6) = 0
munmap(0x401c7000, 4096) = 0
close(6) = -1 EBADF (Bad file descriptor)
munmap(0x401c6000, 4096) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
close(4) = 0
munmap(0x40194000, 4096) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
munmap(0x401c8000, 4096) = 0
munmap(0x401c5000, 4096) = 0
_exit(0) = ?
The written text is the BUG message. As you can see, the problem seems to
be triggered by an unsuccessful call to accept.
Thanks for your help.
--
Guillaume Morin <gui...@mo...>
A friend in need is a friend indeed. A friend who bleeds is better.
My friend confessed, she passed the test. We will never sever.
(Placebo)
|
|
From: Guillaume M. <gui...@mo...> - 2002-05-07 10:59:41
|
Hi Bob,
I tried to upgrade perl to 5.6.1. It did not change the problem.
getpeername still returns undef in some cases and this triggers an EPIPE
in accept which kills the main loop.
If you have any other ideas, please tell me.
--
Guillaume Morin <gui...@mo...>
People get the operating system they deserve.
|
|
From: Bob M. <mce...@dr...> - 2002-05-07 14:48:32
|
Guillaume Morin [gui...@mo...] wrote:
> Hi Bob,
>
> I tried to upgrade perl to 5.6.1. It did not change the problem.
> getpeername still returns undef in some cases and this triggers an EPIPE
> in accept which kills the main loop.
>
> If you have any other ideas, please tell me.
Well I'm stumped. And bugs I can't reproduce are very hard to debug. :(
But here are a few suggestions:
1) If it's dying because getpeername is returning undef, why is
getpeername returning undef? getpeername is a system call
(/usr/include/sys/socket.h) What operating system/libc are you
using?
2) $daemon->accept is returning false or undef (line 295).
IO::Socket (parent of HTTP::Daemon) is supposed to return undef upon
'failure'. The only failure I can see is if the port it is binding
to is closed, or timeout, and clearly the latter is not happening.
Under some circumstances (timeout) IO::Socket::accept returns an
error code in $@. Try printing out $@ in the "Exiting outside main
loop (BUG!)" message.
3) It should be possible to work around this bug by re-initializing
the HTTP::Daemon object. (But I'd rather fix it! ;) Add a big
while loop around the main loop and do this at the end of it:
    $daemon = undef;
    $daemon = new HTTP::Daemon LocalAddr => $HOSTNAME,
                               LocalPort => $LISTEN_PORT,
                               Reuse => 1, Listen => 40
        or croak "HTTP::Daemon failed to initialize: $!\n"
               . "Is $HOSTNAME:$LISTEN_PORT correct?";
The latter is copied from around line 189.
4) I just noticed the 'Listen => 40' parameter above. The man page
says:
    Listen Queue size for listen
Try increasing it?
5) The man page also says that the 'Reuse' parameter is deprecated
in favor of 'ReuseAddr' and 'ReusePort'. Try setting those two to
one instead, and drop the 'Reuse => 1' parameter?
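For illustration, the re-initialization in 3) combined with the option change in 5) might look roughly like the sketch below. It is only a sketch under assumptions: it uses plain IO::Socket::INET rather than HTTP::Daemon so that it runs with core modules only, open_listener is a hypothetical helper that does not exist in FilterProxy.pl, and ReusePort is left out because not every platform supports it.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper (not from FilterProxy.pl): open a fresh listen socket
# with the non-deprecated ReuseAddr option instead of 'Reuse => 1'.
sub open_listener {
    my ($host, $port) = @_;
    my $sock = IO::Socket::INET->new(
        LocalAddr => $host,
        LocalPort => $port,
        Proto     => 'tcp',
        ReuseAddr => 1,     # sets SO_REUSEADDR before bind
        Listen    => 40,    # queue size for listen(2)
    ) or die "listen socket failed to initialize: $@\n";
    return $sock;
}
```

Calling open_listener($HOSTNAME, $LISTEN_PORT) after closing the old socket would be the workaround described in 3); whether that actually clears the error state is exactly what is in question here.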
Good Luck,
-- Bob
Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
|
|
From: Guillaume M. <gui...@mo...> - 2002-05-07 15:45:03
|
In a message of 07 May at 9:48, Bob McElrath wrote:
> Well I'm stumped. And bugs I can't reproduce are very hard to debug. :(
I know; even though I can reproduce it, I can't fix it :-)
> But here are a few suggestions:
> 1) If it's dying because getpeername is returning undef, why is
> getpeername returning undef? getpeername is a system call
> (/usr/include/sys/socket.h) What operating system/libc are you
> using?
It is Linux 2.2.14 (also tried with .19) and glibc 2.1.3. getpeername can
return undef if the syscall fails (here with ENOTCONN). If the peer has
shut down the connection, that looks logical to me.
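A minimal sketch of guarding against that case, assuming the crash comes from passing an undefined peer name to sockaddr_in as in the backtrace. peer_address is a hypothetical helper, not a function in FilterProxy.pl:

```perl
use strict;
use warnings;
use Socket qw(sockaddr_in inet_ntoa);

# Hypothetical helper: return "ip:port" for a connected socket, or undef
# when getpeername fails (for example with ENOTCONN after the client has
# already gone away).
sub peer_address {
    my ($sock) = @_;
    my $packed = getpeername($sock);
    return unless defined $packed;          # never hand undef to sockaddr_in
    my ($port, $ip) = sockaddr_in($packed); # list context unpacks the address
    return inet_ntoa($ip) . ":$port";
}
```

A handler could then skip the client when peer_address returns undef instead of dying inside Socket.pm.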
> 2) $daemon->accept is returning false or undef (line 295).
> IO::Socket (parent of HTTP::Daemon) is supposed to return undef upon
> 'failure'. The only failure I can see is if the port it is binding
> to is closed, or timeout, and clearly the latter is not happening.
> Under some circumstances (timeout) IO::Socket::accept returns an
> error code in $@. Try printing out $@ in the "Exiting outside main
> loop (BUG!)" message.
Yes, but accept(2) can fail for other reasons that you should test for,
such as ECONNRESET. After the getpeername error, I get EPIPE; when I tried
to launch accept again, it seemed to trigger endless EPIPE errors.
I do not know if you should ignore that one... But there is surely a
list of accept(2) errors that you want to ignore.
> 3) It should be possible to work around this bug by re-initializing
> the HTTP::Daemon object. (But I'd rather fix it! ;) Add a big
> while loop around the main loop and do this at the end of it:
> $daemon = undef;
> $daemon = new HTTP::Daemon LocalAddr => $HOSTNAME,
> LocalPort => $LISTEN_PORT,
> Reuse => 1, Listen => 40
> or croak "HTTP::Daemon failed to initialize: $!\n"
> . "Is $HOSTNAME:$LISTEN_PORT correct?";
> The latter is copied from around line 189.
Yes I'd prefer to fix it too. Unfortunately, this does not work since
the port is already bound. I tried to do a $daemon->close before that,
but it did not help.
> 4) I just noticed the 'Listen => 40' parameter above. The man page
> says:
> Listen Queue size for listen
> try increasing it?
Unfortunately, it does not change the behavior :-(
> 5) The man page also says that the 'Reuse' parameter is deprecated,
> in favor of 'ReuseAddr' and 'ReusePort'. Try setting those two to
> one instead, and drop the 'Reuse => 1' parameter?
I tried, but since Linux does not have the SO_REUSEPORT socket option,
nothing changes :-(
Anyway, thanks a lot for your help.
Regards,
--
Guillaume Morin <gui...@mo...>
I am the saddest kid in grade number two
(Lisa Simpsons)
|
|
From: Bob M. <mce...@dr...> - 2002-05-07 16:10:49
|
Guillaume Morin [gui...@mo...] wrote:
> In a message of 07 May at 9:48, Bob McElrath wrote:
> > Well I'm stumped. And bugs I can't reproduce are very hard to debug. :(
>
> I know; even though I can reproduce it, I can't fix it :-)
>
> > But here are a few suggestions:
> > 1) If it's dying because getpeername is returning undef, why is
> > getpeername returning undef? getpeername is a system call
> > (/usr/include/sys/socket.h) What operating system/libc are you
> > using?
>
> It is Linux 2.2.14 (also tried with .19) and glibc 2.1.3. getpeername can
> return undef if the syscall fails (here with ENOTCONN). If the peer has
> shut down the connection, that looks logical to me.
The peer cannot shut down the connection. This is a "listen" socket,
and should stay open when there are no connections.
The syscalls getpeername and accept are part of glibc.
> > 2) $daemon->accept is returning false or undef (line 295).
> > IO::Socket (parent of HTTP::Daemon) is supposed to return undef upon
> > 'failure'. The only failure I can see is if the port it is binding
> > to is closed, or timeout, and clearly the latter is not happening.
> > Under some circumstances (timeout) IO::Socket::accept returns an
> > error code in $@. Try printing out $@ in the "Exiting outside main
> > loop (BUG!)" message.
>
> Yes, but accept(2) can fail for other reasons that you should test for,
> such as ECONNRESET. After the getpeername error, I get EPIPE; when I tried
> to launch accept again, it seemed to trigger endless EPIPE errors.
This should not be possible.
The secondary connections (the HTTP::Daemon::ClientConn object returned
by accept()) should be resettable. But the listen socket should not be
resettable.
> I do not know if you should ignore that one ... But there is surely a
> list of accept(2) errors that you want to ignore.
Hmmm good point. I don't see how to get the linux error in perl. All
the perl man pages just say it returns undef.
Do you know where to get this error code? Maybe try printing out '$!'?
> > 3) It should be possible to work around this bug by re-initializing
> > the HTTP::Daemon object. (But I'd rather fix it! ;) Add a big
> > while loop around the main loop and do this at the end of it:
> > $daemon = undef;
> > $daemon = new HTTP::Daemon LocalAddr => $HOSTNAME,
> > LocalPort => $LISTEN_PORT,
> > Reuse => 1, Listen => 40
> > or croak "HTTP::Daemon failed to initialize: $!\n"
> > . "Is $HOSTNAME:$LISTEN_PORT correct?";
> > The latter is copied from around line 189.
>
> Yes I'd prefer to fix it too. Unfortunately, this does not work since
> the port is already bound. I tried to do a $daemon->close before that,
> but it did not help.
The accept(2) man page says to retry when it gives odd errors. Maybe
rearrange that main loop:
    while(1) {
        my $client = $daemon->accept;
        next unless(defined $client);
        ...
    }
Maybe throw a sleep() in there so it doesn't consume 100% of the cpu if
the network goes down (or something).
Cheers,
-- Bob
Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
|
|
From: Guillaume M. <gui...@mo...> - 2002-05-07 16:44:22
|
In a message of 07 May at 11:09, Bob McElrath wrote:
> > It is Linux 2.2.14 (also tried with .19) and glibc 2.1.3. getpeername can
> > return undef if the syscall fails (here with ENOTCONN). If the peer has
> > shut down the connection, that looks logical to me.
>
> The peer cannot shut down the connection. This is a "listen" socket,
> and should stay open when there are no connections.
I do not see how the listen socket is concerned. You call getpeername on
connected sockets, which are no longer listening sockets. Shutting down
the connection would trigger ENOTCONN; I do not see what is surprising.
> The syscalls getpeername and accept are part of glibc.
I do not see your point. Sure, glibc contains wrappers for the
syscalls, but the syscall code itself is definitely in the kernel.
> > Yes, but accept(2) can fail for other reasons that you should test for,
> > such as ECONNRESET. After the getpeername error, I get EPIPE; when I tried
> > to launch accept again, it seemed to trigger endless EPIPE errors.
>
> This should not be possible.
I thought so, but it just happens here :-(
> > I do not know if you should ignore that one ... But there is surely a
> > list of accept(2) errors that you want to ignore.
>
> Hmmm good point. I don't see how to get the linux error in perl. All
> the perl man pages just say it returns undef.
>
> Do you know where to get this error code? Maybe try printing out '$!'?
Yes: use $! and compare it to the constants defined in POSIX qw/:errno_h/.
I think you should restart after at least ECONNRESET, EAGAIN, EINTR.
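As a rough sketch of that idea (not the actual FilterProxy main loop): accept_should_retry and accept_loop are hypothetical names, and $daemon is assumed to be any IO::Socket-style object whose accept() returns undef on failure and leaves the error in $!.

```perl
use strict;
use warnings;
use POSIX qw(:errno_h);

# Hypothetical helper: true when a failed accept() left one of the errno
# values suggested above, i.e. the call should simply be restarted.
sub accept_should_retry {
    my ($errno) = @_;
    return scalar grep { $errno == $_ } (ECONNRESET, EAGAIN, EINTR);
}

# Hypothetical main loop: retry on the errors above, give up on anything else.
sub accept_loop {
    my ($daemon, $handle_client) = @_;
    while (1) {
        my $client = $daemon->accept;
        unless (defined $client) {
            my $errno = $! + 0;          # numeric value of $!
            die "accept failed: $!\n" unless accept_should_retry($errno);
            sleep 1;                     # avoid spinning at 100% CPU
            next;
        }
        $handle_client->($client);
    }
}
```

Errors outside the list (EPIPE, for instance) still end the loop here, so the sketch does not by itself address the repeated EPIPE case.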
> The accept(2) man pages says to retry when it gives odd errors. Maybe
> rearrange that main loop:
> while(1) {
> my $client = $daemon->accept;
> next unless(defined $client);
> ...
> }
>
> Maybe throw a sleep() in there so it doesn't consume 100% of the cpu if
> the network goes down (or something).
I've basically done this. But after the first EPIPE error, all successive
calls were EPIPE (I tried this for one minute or so).
--
Guillaume Morin <gui...@mo...>
Why critize what you don't understand ? (Sepultura)
|
|
From: Bob M. <mce...@dr...> - 2002-05-07 18:32:03
|
Guillaume Morin [gui...@mo...] wrote:
> In a message of 07 May at 11:09, Bob McElrath wrote:
> > > It is Linux 2.2.14 (also tried with .19) and glibc 2.1.3. getpeername can
> > > return undef if the syscall fails (here with ENOTCONN). If the peer has
> > > shut down the connection, that looks logical to me.
> >
> > The peer cannot shut down the connection. This is a "listen" socket,
> > and should stay open when there are no connections.
>
> I do not see how the listen socket is concerned. You call getpeername on
> connected socket which are not listen anymore. Shutting down the
> connection would trigger ENOTCONN, I do not see what is surprising.
Well, if accept() is returning undef (which is what causes the main loop
to exit), then it's the listen socket that has failed. I believe the
getpeername() that is failing is not the one in FilterProxy.pl, but a
getpeername called inside the accept() method somewhere (which is buried
in perl XS). This then causes accept() to return undef, which causes
the main loop to exit. The getpeername() in FilterProxy.pl cannot cause
the main loop to exit.
BTW I didn't see any getpeername syscalls in the strace you sent me. It
just looked like accept returned ECONNRESET.
> > > I do not know if you should ignore that one ... But there is surely a
> > > list of accept(2) errors that you want to ignore.
> >
> > Hmmm good point. I don't see how to get the linux error in perl. All
> > the perl man pages just say it returns undef.
> >
> > Do you know where to get this error code? Maybe try printing out '$!'?
>
> Yes: use $! and compare it to the constants defined in POSIX qw/:errno_h/.
> I think you should restart after at least ECONNRESET, EAGAIN, EINTR.
Ok, I'll add this. But I wish it fixed your problem.
> > The accept(2) man pages says to retry when it gives odd errors. Maybe
> > rearrange that main loop:
> > while(1) {
> > my $client = $daemon->accept;
> > next unless(defined $client);
> > ...
> > }
> >
> > Maybe throw a sleep() in there so it doesn't consume 100% of the cpu if
> > the network goes down (or something).
>
> I've basically done this. But after the first EPIPE error, all successive
> calls were EPIPE (I tried this for one minute or so).
Then it seems the logical solution would be to close the Listen socket
and start over (which you tried unsuccessfully). Perhaps close() does
not relinquish the listening socket. Try $daemon->shutdown()? From the
perlfunc man page:
This is useful with sockets when you want to tell the other side
you're done writing but not done reading, or vice versa. It's also a
more insistent form of close because it also disables the file
descriptor in any forked copies in other processes.
Note that for this to work consistently, FilterProxy should kill any
forked children. The shutdown() will probably forcibly close all
sockets to all clients.
Cheers,
-- Bob
Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
|
|
From: Guillaume M. <gui...@mo...> - 2002-05-08 09:32:54
|
Hi Bob,
In a message of 07 May at 13:31, Bob McElrath wrote:
> Well, if accept() is returning undef (which is what causes the main loop
> to exit), then it's the listen socket that has failed. I believe the
> getpeername() that is failing is not the one in FilterProxy.pl, but a
> getpeername called inside the accept() method somewhere (which is buried
> in perl XS). This then causes accept() to return undef, which causes
> the main loop to exit. The getpeername() in FilterProxy.pl cannot cause
> the main loop to exit.
Nope, according to the backtraces that FilterProxy prints, the failing
getpeername call is the one in FilterProxy.pl. Even if I kill the child
when getpeername returns undef, the following accept call will return
EPIPE. I do not know why. If you ignore it and just restart accept, at
some point accept will always return EPIPE.
> BTW I didn't see any getpeername syscalls in the strace you sent me. It
> just looked like accept returned ECONNRESET.
Yes, indeed it happened sometimes. This seems to have been fixed when I
upgraded to a more recent libwww-perl.
> Ok, I'll add this. But I wish it fixed your problem.
Same here :-)
> Then it seems the logical solution would be to close the Listen socket
> and start over (which you tried unsuccessfully). Perhaps close() does
> not relinquish the listening socket. Try $daemon->shutdown()? From the
> perlfunc man page:
>
> This is useful with sockets when you want to tell the other side
> you're done writing but not done reading, or vice versa. It's also a
> more insistent form of close because it also disables the file
> descriptor in any forked copies in other processes.
>
> Note that for this to work consistently, FilterProxy should kill any
> forked children. The shutdown() will probably forcibly close all
> sockets to all clients.
Hmm, I missed that method in the Perl manual. I'll try that. I've
basically implemented this in a shell script. I check if there still is
a listen socket, if not I kill all children with a killall and restart
FilterProxy.
Thanks again for your help.
Regards,
--
Guillaume Morin <gui...@mo...>
<Overfiend> canard: thanks
|
|
From: Yann D. <yd...@fr...> - 2002-05-14 11:04:01
|
Hi,
I'm looking at this problem while Guillaume is away.
On Tue, May 07, 2002 at 11:09:59AM -0500, Bob McElrath wrote:
> > > 2) $daemon->accept is returning false or undef (line 295).
> > > IO::Socket (parent of HTTP::Daemon) is supposed to return undef upon
> > > 'failure'. The only failure I can see is if the port it is binding
> > > to is closed, or timeout, and clearly the latter is not happening.
> > > Under some circumstances (timeout) IO::Socket::accept returns an
> > > error code in $@. Try printing out $@ in the "Exiting outside main
> > > loop (BUG!)" message.
> >
> > Yes, but accept(2) can fail for other reasons that you should test for,
> > such as ECONNRESET. After the getpeername error, I get EPIPE; when I tried
> > to launch accept again, it seemed to trigger endless EPIPE errors.
>
> This should not be possible.
>
> The secondary connections (the HTTP::Daemon::ClientConn object returned
> by accept()) should be resettable. But the listen socket should not be
> resettable.
A Google search on "accept ECONNRESET" told me that BIND 9 had handled
a similar problem. A comment in the source (BIND 9.2.1) says:
* Try to accept the new connection. If the accept fails with
* EAGAIN or EINTR, simply poke the watcher to watch this socket
* again. Also ignore ECONNRESET, which has been reported to
* be spuriously returned on Linux 2.2.19 although it is not
* a documented error for accept().
A quick search in the linux-kernel and linux-net archives did not give
anything useful.
About errors to be "ignored", the accept(2) manpage says:
ERROR HANDLING
Linux accept passes already-pending network errors on the
new socket as an error code from accept. This behaviour
differs from other BSD socket implementations. For reliable
operation the application should detect the network
errors defined for the protocol after accept and treat
them like EAGAIN by retrying. In case of TCP/IP these are
ENETDOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET,
EHOSTUNREACH, EOPNOTSUPP, and ENETUNREACH.
I suppose that perl's accept() call takes care of this (or should) for
portability.
The EPIPE looks even more abnormal, and might be a result of perl's
accept() *not* handling this poorly-documented ECONNRESET case. It's
probably useful to seek confirmation of this in perl lists.
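If the application has to do this classification itself, one possible shape is sketched below. It assumes the manpage list quoted above plus EAGAIN, EINTR and the ECONNRESET case from the BIND comment; is_transient_accept_error is a hypothetical helper, and names that Errno does not know on a given platform are simply skipped.

```perl
use strict;
use warnings;
use Errno ();

# Error names taken from the accept(2) excerpt above, plus EAGAIN, EINTR
# and the undocumented ECONNRESET case mentioned in the BIND comment.
my @TRANSIENT = qw(
    EAGAIN EINTR ECONNRESET
    ENETDOWN EPROTO ENOPROTOOPT EHOSTDOWN ENONET
    EHOSTUNREACH EOPNOTSUPP ENETUNREACH
);

# Hypothetical helper: true if the numeric errno is one of the errors that
# accept(2) callers are advised to treat like EAGAIN.
sub is_transient_accept_error {
    my ($errno) = @_;
    local $! = $errno;    # %! compares each name against the current $!
    return scalar grep { exists $!{$_} && $!{$_} } @TRANSIENT;
}
```

EPIPE is deliberately not in the list, since nothing quoted here documents it as a retryable result of accept().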
--
Yann Dirson <Yan...@fr...> http://www.alcove.com/
Technical support manager Responsable de l'assistance technique
Senior Free-Software Consultant Consultant senior en Logiciels Libres
Debian developer (di...@de...) Développeur Debian
|