From: Jeremy F. <je...@go...> - 2005-03-17 16:21:58
|
Julian Seward wrote:
>But it exits normally when run natively, and it also works fine on 2.2.0.
>
>
That doesn't preclude a race or some timing problem in their code. Does
it happen with other tools?
>Is FC3 LinuxThreads or NPTL ? If NPTL, do you have a LinuxThreads
>system you can try reproducing this on? Is setting LD_ASSUME_KERNEL=2.4.1
>really exactly the same as running it on a LinuxThreads-only system?
>
>
Well, it uses the same libpthread.so. The big difference is that FC3 is
running a 2.6 kernel; I assume SuSE 9.1 is using some 2.4 kernel.
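A quick way to confirm which threading library a process actually ends up with is to ask glibc at run time. `confstr(_CS_GNU_LIBPTHREAD_VERSION)` returns a string naming the implementation, and `getconf GNU_LIBPTHREAD_VERSION` prints the same thing from the shell. Running the check once normally and once under LD_ASSUME_KERNEL=2.4.1 should show whether the loader really switched libraries. This is a minimal sketch. The helper name `thread_library` is hypothetical, and the assumption that the string starts with "NPTL" or "linuxthreads" is based on glibc's usual values.

```c
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: report which pthread implementation glibc says is
 * in use. Fills buf with the raw string (e.g. "NPTL 2.x") and returns
 * 1 for NPTL, 0 for LinuxThreads, -1 if it cannot tell (assumed prefixes). */
int thread_library(char *buf, size_t len)
{
    if (len == 0)
        return -1;
    buf[0] = '\0';
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    /* confstr() returns 0 if the name has no value; a return larger than
     * len only means the copy was truncated, which is fine for a prefix test. */
    if (confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len) == 0)
        return -1;
    if (strncmp(buf, "NPTL", 4) == 0)
        return 1;
    if (strncmp(buf, "linuxthreads", 12) == 0)
        return 0;
#endif
    return -1; /* not glibc, or an unrecognised string */
}
```

To use it, call it from a small main() and print `buf`. Do this once plainly and once with LD_ASSUME_KERNEL=2.4.1 set in the environment.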
>>> Hard to tell, really. To be honest, it looks like an application bug.
>>> Two threads remain; thread 2 is the LinuxThreads manager thread, so it
>>> isn't going to go away until the other thread, thread 5, dies. Thread 5
>>> seems to just be looping waiting for an FD which never happens. It
>>> would be interesting to attach strace to thread 5 in this state to see
>>> what FD it is actually waiting on.
>>
>>
>
>It looks like fd 12.
>
> poll([{fd=12, events=POLLIN}], 1, 1000) = 0
> rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
> gettid() = 10505
> read(1017, "T", 2) = 1
> getpid() = 10505
> write(1016, "SYSCALL[10505,5](168) --> 0 (0x0"..., 34) = 34
> getpid() = 10505
> write(1016, "SYSCALL[10505,5]( 78):", 22) = 22
> write(1016, "sys_gettimeofday ( 0xAEFFFA04, 0"..., 36) = 36
> gettimeofday({1111060117, 380070}, NULL) = 0
> write(1016, " --> 0 (0x0)\n", 13) = 13
> getpid() = 10505
> write(1016, "SYSCALL[10505,5](168) mayBlock:", 31) = 31
> write(1016, "sys_poll ( 0xAEFFF94C, 1, 1000 )"..., 33) = 33
> write(1016, " --> ...\n", 9) = 9
> gettid() = 10505
> write(1018, "T", 1) = 1
> rt_sigprocmask(SIG_SETMASK, [RTMIN RT_31], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
> poll(
>
>In the (non-truncated version of the) log I sent yesterday,
>fd 12 is used many times (open, mmap, close). The last
>place it appears to have been *created* is
>
>SYSCALL[4719,1](102) mayBlock:sys_socketcall ( 1, 0xAFEFCD40 ) --> ...
>SYSCALL[4719,1](102) --> 12 (0xC)
>
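On i386, all the BSD socket calls go through one multiplexed system call, socketcall (number 102, as in the `SYSCALL[...](102)` lines). Its first argument selects the operation, which is why fd 12 shows up as the result of `sys_socketcall ( 1, ... )`, i.e. socket(). Reading the rest of the log is easier with a lookup table. The table below uses the 2005-era `SYS_*` values from `<linux/net.h>`, SYS_SOCKET = 1 through SYS_RECVMSG = 17. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Operation names indexed by socketcall() number, following the SYS_*
 * constants in <linux/net.h> (SYS_SOCKET = 1 ... SYS_RECVMSG = 17). */
static const char *const socketcall_names[] = {
    NULL,          /* 0 is unused */
    "socket",      "bind",        "connect",     "listen",
    "accept",      "getsockname", "getpeername", "socketpair",
    "send",        "recv",        "sendto",      "recvfrom",
    "shutdown",    "setsockopt",  "getsockopt",  "sendmsg",
    "recvmsg",
};

#define N_SOCKETCALLS (sizeof socketcall_names / sizeof socketcall_names[0])

/* Map the first argument of sys_socketcall to an operation name. */
const char *socketcall_name(int call)
{
    if (call <= 0 || (size_t)call >= N_SOCKETCALLS)
        return "unknown";
    return socketcall_names[call];
}

/* Reverse lookup, handy for grepping the log: "connect" -> 3, else -1. */
int socketcall_number(const char *name)
{
    for (size_t i = 1; i < N_SOCKETCALLS; i++)
        if (strcmp(socketcall_names[i], name) == 0)
            return (int)i;
    return -1;
}
```

For example, `socketcall_number("connect")` gives 3. You can then search the log for `sys_socketcall ( 3,` lines issued right after the one that returned 12.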
>The last place I can see it referenced is:
>
>SYSCALL[4743,5]( 54) mayBlock:sys_ioctl ( 12, 0x541B (type=54, nr=1B, size=0), 0x2D ) --> ...
>SYSCALL[4743,5]( 54) --> 0 (0x0)
>SYSCALL[4743,5]( 3) mayBlock:sys_read ( 12, 0xAEFFF0F4, 128 ) --> ...
>SYSCALL[4743,5]( 3) --> 128 (0x80)
>SYSCALL[4743,5]( 54) mayBlock:sys_ioctl ( 12, 0x541B (type=54, nr=1B, size=0), 0x2D ) --> ...
>SYSCALL[4743,5]( 54) --> 0 (0x0)
>
>So I'm none the wiser.
>
>Ioctl 0x541B is FIONREAD.
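The syscalls on fd 12 follow a pattern: poll() with a 1000 ms timeout, then FIONREAD, then a read of up to 128 bytes. That suggests a reader loop roughly like the one below. This is a hypothetical reconstruction from the trace alone, not the application's code. The helper name `read_available` and the EOF handling are assumptions. The sketch shows why such a thread never finishes on its own. As long as the peer keeps the socket open and sends nothing, every poll() just times out and returns 0, exactly as in the strace above.

```c
#include <errno.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable,
 * then read whatever is already queued (at most cap bytes).
 * Returns bytes read, 0 on timeout, -1 on error, -2 on EOF. */
ssize_t read_available(int fd, char *buf, size_t cap, int timeout_ms)
{
    if (cap == 0)
        return -1;

    struct pollfd p = { .fd = fd, .events = POLLIN };
    int r = poll(&p, 1, timeout_ms);
    if (r < 0)
        return errno == EINTR ? 0 : -1;  /* treat a signal like a timeout */
    if (r == 0)
        return 0;                        /* the "poll(...) = 0" case */

    int avail = 0;                       /* FIONREAD is ioctl 0x541B on Linux */
    if (ioctl(fd, FIONREAD, &avail) < 0)
        return -1;

    /* Readable with nothing queued usually means the peer hung up;
     * a one-byte read distinguishes EOF (0) from late-arriving data. */
    size_t want = avail > 0 ? (size_t)avail : 1;
    if (want > cap)
        want = cap;

    ssize_t n = read(fd, buf, want);
    if (n == 0)
        return -2;
    return n;
}
```

A caller that loops on this until it sees -2 will spin indefinitely if the other end never closes. That would match thread 5's behaviour if the peer process is still alive (or the peer is the same process).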
>
>
I guess we need to find out what's at the other end of the socket. Just
after its creation, what other socket operations happen on it? It would
be useful to have strace output, since it decodes the syscall arguments
in more detail.
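Once fd 12 is known to be a socket, getpeername() on it reports what it is connected to. This could be called from inside the process or from a debugger. Below is a minimal sketch that assumes only AF_UNIX and AF_INET peers matter here. The helper name `describe_peer` and its output format are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: write a short description of fd's peer into out.
 * Returns 0 on success, -1 if getpeername() fails (e.g. ENOTSOCK for a
 * non-socket, ENOTCONN for an unconnected socket). */
int describe_peer(int fd, char *out, size_t len)
{
    struct sockaddr_storage ss;
    socklen_t sl = sizeof ss;
    memset(&ss, 0, sizeof ss);
    if (getpeername(fd, (struct sockaddr *)&ss, &sl) < 0)
        return -1;

    if (ss.ss_family == AF_UNIX) {
        const struct sockaddr_un *un = (const struct sockaddr_un *)&ss;
        size_t off = offsetof(struct sockaddr_un, sun_path);
        size_t path_len = sl > off ? sl - off : 0;
        if (path_len == 0)                      /* e.g. a socketpair() end */
            snprintf(out, len, "unix:(unnamed)");
        else if (un->sun_path[0] == '\0')       /* Linux abstract namespace */
            snprintf(out, len, "unix:@%.*s", (int)(path_len - 1), un->sun_path + 1);
        else
            snprintf(out, len, "unix:%.*s", (int)path_len, un->sun_path);
    } else if (ss.ss_family == AF_INET) {
        const struct sockaddr_in *in = (const struct sockaddr_in *)&ss;
        char ip[INET_ADDRSTRLEN];
        inet_ntop(AF_INET, &in->sin_addr, ip, sizeof ip);
        snprintf(out, len, "inet:%s:%u", ip, (unsigned)ntohs(in->sin_port));
    } else {
        snprintf(out, len, "family %d", (int)ss.ss_family);
    }
    return 0;
}
```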
Are there any other processes sitting around which it might be trying to
talk to? "lsof" might give a clue.
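If lsof isn't handy, /proc on Linux gives the same starting point. The link /proc/<pid>/fd/<n> names what the descriptor refers to. For a socket, the link reads `socket:[<inode>]`. That inode can be looked up in the Inode column of /proc/net/unix or /proc/net/tcp, or matched against other processes' fd links, to find the other end. Below is a minimal sketch; the helper names `fd_target` and `socket_inode` are hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolve /proc/<pid>/fd/<fd> into buf (NUL-terminated).
 * Returns the target length, or -1 on error (no such fd, no permission). */
ssize_t fd_target(pid_t pid, int fd, char *buf, size_t len)
{
    char path[64];
    if (len == 0)
        return -1;
    snprintf(path, sizeof path, "/proc/%ld/fd/%d", (long)pid, fd);
    /* readlink() does not NUL-terminate, so leave room for the terminator. */
    ssize_t n = readlink(path, buf, len - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}

/* Extract the inode from a "socket:[12345]" link target;
 * returns 0 if the target is not a socket. */
unsigned long socket_inode(const char *target)
{
    unsigned long ino = 0;
    if (sscanf(target, "socket:[%lu]", &ino) != 1)
        return 0;
    return ino;
}
```

For the hung process, this would be `fd_target(<pid>, 12, ...)`, using the pid from the `SYSCALL[pid,tid]` prefix.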
>I'd prefer not to ship 2.4.0 with this bug in, if we can resolve it
>relatively quickly. What else can I do to help you repro it?
>
>
This doesn't strike me as a showstopper bug: it seems easy to work
around (you can just ^C the process, yes?), it doesn't crash and it
doesn't seem to be affecting many people. If we can track it down in
the near future and the fix is a one-liner then OK, but otherwise I
think 2.4.0 is cooked.
J