From: Leif M. <lei...@ta...> - 2010-11-24 08:27:04
|
Christopher & Johannes,
Thank you for the detailed report. This is not a problem that I was
aware of until now. We set up a 2-CPU CentOS VM today and are running
some tests locally, but so far we have not been able to reproduce the
problem.
From what you said, it sounds like this problem only happens on
shutdown and can be reproduced easily?
Could you please try to reproduce this with the following settings
added to wrapper.conf and then send me the resulting wrapper.log file
directly? It will likely be too large for the list.
wrapper.debug=true
wrapper.state_output=TRUE
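Independently of the Wrapper, a small standalone check could show whether
nanosleep by itself stalls on that VM. The following is only a sketch under
stated assumptions, not Wrapper code: the function name count_slow_sleeps()
and the idea of counting sleeps that exceed a threshold are hypothetical
choices for illustration. It times each nanosleep() call with
CLOCK_MONOTONIC; on an older glibc such as the one in CentOS 5,
clock_gettime() requires linking with -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Milliseconds between two CLOCK_MONOTONIC readings. */
static double elapsed_ms(const struct timespec *start,
                         const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1000.0
         + (double)(end->tv_nsec - start->tv_nsec) / 1e6;
}

/* Hypothetical diagnostic, not part of the Wrapper: sleep `iterations`
 * times for `interval_ns` nanoseconds each (must be below 1,000,000,000)
 * and return how many sleeps took longer than `threshold_ms`. With a
 * threshold well above the interval, a healthy system returns 0; a
 * nanosleep that never returns makes this function hang instead. */
int count_slow_sleeps(int iterations, long interval_ns, double threshold_ms)
{
    const struct timespec interval = { .tv_sec = 0, .tv_nsec = interval_ns };
    int slow = 0;

    for (int i = 0; i < iterations; i++) {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        nanosleep(&interval, NULL);
        clock_gettime(CLOCK_MONOTONIC, &end);
        if (elapsed_ms(&start, &end) > threshold_ms)
            slow++;
    }
    return slow;
}
```

For example, count_slow_sleeps(600, 100000000L, 1000.0) would repeat the
100 ms sleep seen in the strace output for roughly a minute and count every
sleep that took longer than one second. If it hangs instead, the process can
be inspected with strace or pstack in the same way as the Wrapper.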
It sounds like the nanosleep call is getting stuck for some reason. It
is interesting that the interruption caused by a signal wakes it up
again. We do not currently have a way to switch to usleep without
recompiling.
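For illustration only, a compile-time switch between nanosleep() and
usleep() could look like the sketch below. It is not the Wrapper's actual
wrapperSleep() implementation; the function example_sleep_ms() and the
EXAMPLE_USE_USLEEP macro are hypothetical names. The nanosleep() path also
shows the usual way of resuming after a signal: when nanosleep() fails with
EINTR, it is called again with the remaining time it reported.

```c
#define _XOPEN_SOURCE 500   /* exposes usleep() as well as nanosleep() */
#include <errno.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not the Wrapper's wrapperSleep(): sleep for `ms`
 * milliseconds. Build with -DEXAMPLE_USE_USLEEP to select usleep()
 * instead of nanosleep(). Returns 0 on success, -1 on error. */
int example_sleep_ms(long ms)
{
#ifdef EXAMPLE_USE_USLEEP
    /* Some systems reject usleep() arguments of 1,000,000 or more,
     * so sleep in chunks of less than one second. */
    while (ms > 0) {
        long chunk = ms > 999 ? 999 : ms;
        /* An interrupted chunk is not resumed; it simply ends early. */
        if (usleep((useconds_t)(chunk * 1000)) != 0 && errno != EINTR)
            return -1;
        ms -= chunk;
    }
    return 0;
#else
    struct timespec req = { .tv_sec = ms / 1000,
                            .tv_nsec = (ms % 1000) * 1000000L };
    struct timespec rem;

    /* A signal interrupts nanosleep() with EINTR; resume with the
     * remaining time so the full interval is still honoured. */
    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    return 0;
#endif
}
```

Building with -DEXAMPLE_USE_USLEEP selects the usleep() branch; without it,
nanosleep() is used. usleep() was marked obsolescent in POSIX.1-2001 and
removed in POSIX.1-2008, which is why the sketch defines _XOPEN_SOURCE 500.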
Cheers,
Leif
On Tue, Nov 23, 2010 at 7:31 PM, Christopher Taylor <ct...@co...> wrote:
> Hi all,
>
> I’m forwarding this on behalf of Johannes, who seems to be having problems
> with his list subscription. I apologize if this is a double post; I forgot
> to add the subject to the first mail.
>
> --- snip ---
>
> I am using ServiceWrapper (stable version 3.5.6) under 32-bit CentOS 5 in a
> VM with 2 CPUs.
>
> Whenever I stop a Java program with the wrapper (with different
> configurations), the wrapper stops pinging the JVM after a few seconds,
> which eventually causes the JVM to shut itself down because it no longer
> receives ping packets from the wrapper.
>
> According to strace, the last system call, which never returns, is always
> nanosleep:
>
> gettimeofday({1290172014, 465559}, NULL) = 0
> read(5, 0x8895070, 1024) = -1 EAGAIN (Resource temporarily unavailable)
> gettimeofday({1290172014, 465610}, NULL) = 0
> recv(7, 0xbffa98cb, 1, 0) = -1 EAGAIN (Resource temporarily unavailable)
> waitpid(13065, 0xbffa98a8, WNOHANG|WSTOPPED) = 0
> nanosleep({0, 100000000}, NULL) = 0
> gettimeofday({1290172014, 566505}, NULL) = 0
> read(5, 0x8895070, 1024) = -1 EAGAIN (Resource temporarily unavailable)
> gettimeofday({1290172014, 566552}, NULL) = 0
> recv(7, 0xbffa98cb, 1, 0) = -1 EAGAIN (Resource temporarily unavailable)
> waitpid(13065, 0xbffa98a8, WNOHANG|WSTOPPED) = 0
> nanosleep({0, 100000000},
>
> If I run pstack on the wrapper, it shows that both threads are hanging in
> nanosleep:
>
>
> Thread 2 (Thread -1210287216 (LWP 13115)):
>
> #0 0x00994410 in __kernel_vsyscall ()
>
> #1 0x00d22506 in __nanosleep_nocancel () from /lib/libpthread.so.0
>
> #2 0x0805b0d0 in wrapperSleep ()
>
> #3 0x0805b420 in timerRunner ()
>
> #4 0x00d1b2db in start_thread () from /lib/libpthread.so.0
>
> #5 0x00c7512e in clone () from /lib/libc.so.6
>
> Thread 1 (Thread -1208169792 (LWP 13114)):
>
> #0 0x00994410 in __kernel_vsyscall ()
>
> #1 0x00d22506 in __nanosleep_nocancel () from /lib/libpthread.so.0
>
> #2 0x0805b0d0 in wrapperSleep ()
>
> #3 0x08059f6c in wrapperEventLoop ()
>
> #4 0x08056628 in wrapperRunConsole ()
>
> #5 0x0805cce3 in main ()
>
> #0 0x00994410 in __kernel_vsyscall ()
>
>
>
> If I send a signal to the wrapper, it starts responding again; until then,
> it hangs in nanosleep indefinitely.
>
> Is this a known problem with ServiceWrapper running on multiple CPUs?
>
> Do I have to recompile with usleep support instead, or is there an option
> to always use usleep?
>
> Best,
> --Christopher (on behalf of Johannes)