From: <don...@is...> - 2008-12-24 19:11:30
I've now seen a real error and the packet sequence looks similar to
what I sent before. My debug code also crashes with this amusing
result:
ignore error
UNIX error 104 (ECONNRESET): Connection reset by peer
*** - PRINT: Despite *PRINT-READABLY*, #<SYSTEM::SIMPLE-OS-ERROR #x20CA5D56>
cannot be printed readably.
> how about you compile clisp with DEBUG_OS_ERROR defined - this way
(defined where?)
> you will see which line in which file has signaled the error. or,
> better yet, configure --with-debug and run clisp under gdb, setting
$ ./configure --with-module=rawsock --cbc --with-debug build-dir
seems to work
$ gdb ../build-dir/clisp
...
warning: not using untrusted file ".gdbinit"
(I had hoped to use the one in src, so I did cd src before running gdb.
I guess that didn't work.)
(gdb) break prepare_error
Function "prepare_error" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
I hope that's ok.
Breakpoint 1 (prepare_error) pending.
(gdb) run
Starting program: /root/clisp-2.47/build-dir/clisp
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0x6dc000
STACK depth: 98206 [0xaf3f00 0xa94088]
I can't tell whether the break is now in effect.
...
[1]> (load "/home/devel/http-forward-test")
...
[../src/stream.d:6143]
ignore error
UNIX error 104 (ECONNRESET): Connection reset by peer
I guess the above is the file and line of an error that was caught and
ignored -- which is good, since I don't want to have to continue
past every intentionally ignored error that arrives before the one
of interest.
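For reference, here is roughly how a "[file:line]" tag like the one above
can be produced in C. This is only a sketch of the general technique, not
CLISP's actual DEBUG_OS_ERROR code; format_error_site and
REPORT_ERROR_SITE are names I made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper (not a CLISP function): write "[file:line]" into buf.
   Returns the number of characters snprintf would have written. */
int format_error_site(char *buf, size_t size, const char *file, int line)
{
    return snprintf(buf, size, "[%s:%d]", file, line);
}

/* Hypothetical macro: tag the place in the source where it is expanded,
   using the standard __FILE__ and __LINE__ macros. */
#define REPORT_ERROR_SITE(buf, size) \
    format_error_site((buf), (size), __FILE__, __LINE__)
```

Calling format_error_site(buf, sizeof buf, "../src/stream.d", 6143) fills
buf with the same kind of tag that appears in the transcript.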
So now I wait for a break, I guess, and then do something like
(gdb) bt
if I understand correctly?
> a break in prepare_error and send the backtrace here. I suspect
> that the error is signaled by listen_char() which is called by
> socket-status to ensure that a whole unicode char is actually
> available if a byte is.
> the code is:
>
> if (FD_ISSET(in_sock,readfds) || (stream_isbuffered(sock) & bit(1)))
> rd = (char_p ? listen_char(sock) : listen_byte(sock));
>
> there is no error on in_sock, that has been checked, so, apparently, there is a
> race condition here: an error (ECONNRESET) arrives _after_ select() but
> _before_ listen_char() could finish its work.
This seems very plausible. I see no obvious difference between many
packet traces that don't cause the error and the few that do.
The timestamps reported by tcpdump have microsecond resolution. I see
many cases where the time between the ack creating the connection
and the reset closing it is ~1ms with no error, one of ~0.2 sec (also no
error), and the one with the error:
08:35:20.354589 for the ack packet opening the connection
08:35:20.359225 for the reset packet closing it
i.e., about 4.6ms.
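The interval between the two timestamps can be checked with a few lines
of C. This is just a sketch; tcpdump_time_to_usec is a made-up helper
that assumes the "HH:MM:SS.uuuuuu" format shown above with exactly six
fractional digits, and it does not handle a capture that crosses midnight.

```c
#include <stdio.h>

/* Hypothetical helper: convert "HH:MM:SS.uuuuuu" to microseconds since
   midnight. Returns -1 if the string does not match that shape. */
long long tcpdump_time_to_usec(const char *s)
{
    int h, m, sec;
    long usec;

    if (sscanf(s, "%d:%d:%d.%6ld", &h, &m, &sec, &usec) != 4)
        return -1;
    return ((h * 60LL + m) * 60LL + sec) * 1000000LL + usec;
}
```

For the two packets above, tcpdump_time_to_usec("08:35:20.359225") minus
tcpdump_time_to_usec("08:35:20.354589") gives 4636 microseconds, i.e.
roughly 4.6ms between the ack and the reset.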