Re: [Wrapper-user] ServiceWrapper starts to hang in wrapperSleep after some seconds

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Johannes,
When you get the hang, are you consistently seeing that both of the
threads are getting stuck in the nanosleep calls?

We have a low level debug mode that can be enabled to show calls to
sleep.  This was originally added to debug problems with usleep many
years ago.  It is not possible to go back to usleep because it has
many problems with multithreads and interrupts.   Please try setting
the following and reproducing the problem.

wrapper.sleep_output=TRUE

It will only give us new information if any nanosleep calls are
failing for any reason however.  Otherwise it will simply say when the
calls start and stop.

We are continuing to look into this.

Cheers,
Leif

On Fri, Nov 26, 2010 at 11:58 AM, Christian Mueller
<chr...@ta...> wrote:
> Christopher & Johannes,
>
> nanosleep is kind of very low-level system function, for which I can't
> see any problems like this to be common.
>
> There are some hooks in your mail I'd like to get closer, you said you
> are running a 2CPU instance and on vmware. So i'm wondering whether
> sth. with this configuration is not working as expected.
>
> could you please run "cat /proc/<pid>/status" on the machine where
> <pid> is the and send us also the output from there?
> Furthermore, is NTP activated on the host?
>
> I did some tests on our virtual machines but were not able to
> reproduce the behavior you described yet...
>
> Cheers,
> Christian
>
>
>
> On Wed, Nov 24, 2010 at 5:26 PM, Leif Mortenson
> <lei...@ta...> wrote:
>> Christopher & Johannes,
>> Thank you for the detailed report.  This is not a problem which I am
>> aware of to date.  We set up a 2 CPU VM today with cent OS and are
>> running some tests locally.  So far we have not been able to reproduce
>> the problem.
>>
>> >From what you said, it sounds like this problem only happens on
>> shutdown, and can be reproduced easily?
>>
>> Could you please try and reproduce this with the following settings
>> included in the wrapper.conf and then send me the resulting
>> wrapper.log file directly?  It would be a bit large for the list.
>> wrapper.debug=true
>> wrapper.state_output=TRUE
>>
>> It sounds like the nanosleep implementation is getting stuck for some
>> reason.  It is interesting that an interrupt caused by a signal is
>> waking it up. We do not currently have a way to change to usleep
>> without recompiling.
>>
>> Cheers,
>> Leif
>>
>>
>> On Tue, Nov 23, 2010 at 7:31 PM, Christopher Taylor <ct...@co...> wrote:
>>> Hi all,
>>>
>>>
>>>
>>> I’m forwarding this on behalf of Johannes, he seems to be having problems
>>> with his list subscription. I apologize if this is a double post, I forgot
>>> to add the subject on the first mail.
>>>
>>>
>>>
>>> --- snip ---
>>>
>>>
>>>
>>> I am using ServiceWrapper (stable version 3.5.6) under 32bit CentOS 5 in a
>>> VM with 2 CPUs.
>>>
>>>
>>>
>>> Whenever I stopped a Java program with the wrapper (different
>>> configurations), the wrapper stopped pinging the JVM after some seconds
>>> (which eventually leads to the JVM ending itself because it does not receive
>>> ping packages from the wrapper any more).
>>>
>>>
>>>
>>> The last system call that never returns according to strace is always
>>> nanosleep:
>>>
>>>
>>>
>>> gettimeofday({1290172014, 465559}, NULL) = 0
>>>
>>> read(5, 0x8895070, 1024)                = -1 EAGAIN (Resource temporarily
>>> unavailable)
>>>
>>> gettimeofday({1290172014, 465610}, NULL) = 0
>>>
>>> recv(7, 0xbffa98cb, 1, 0)               = -1 EAGAIN (Resource temporarily
>>> unavailable)
>>>
>>> waitpid(13065, 0xbffa98a8, WNOHANG|WSTOPPED) = 0
>>>
>>> nanosleep({0, 100000000}, NULL)         = 0
>>>
>>> gettimeofday({1290172014, 566505}, NULL) = 0
>>>
>>> read(5, 0x8895070, 1024)                = -1 EAGAIN (Resource temporarily
>>> unavailable)
>>>
>>> gettimeofday({1290172014, 566552}, NULL) = 0
>>>
>>> recv(7, 0xbffa98cb, 1, 0)               = -1 EAGAIN (Resource temporarily
>>> unavailable)
>>>
>>> waitpid(13065, 0xbffa98a8, WNOHANG|WSTOPPED) = 0
>>>
>>> nanosleep({0, 100000000},
>>>
>>>
>>>
>>> If I call pstack on the wrapper, you can see that two threads currently hang
>>> in the nanosleep method:
>>>
>>>
>>>
>>> Thread 2 (Thread -1210287216 (LWP 13115)):
>>>
>>> #0  0x00994410 in __kernel_vsyscall ()
>>>
>>> #1  0x00d22506 in __nanosleep_nocancel () from /lib/libpthread.so.0
>>>
>>> #2  0x0805b0d0 in wrapperSleep ()
>>>
>>> #3  0x0805b420 in timerRunner ()
>>>
>>> #4  0x00d1b2db in start_thread () from /lib/libpthread.so.0
>>>
>>> #5  0x00c7512e in clone () from /lib/libc.so.6
>>>
>>> Thread 1 (Thread -1208169792 (LWP 13114)):
>>>
>>> #0  0x00994410 in __kernel_vsyscall ()
>>>
>>> #1  0x00d22506 in __nanosleep_nocancel () from /lib/libpthread.so.0
>>>
>>> #2  0x0805b0d0 in wrapperSleep ()
>>>
>>> #3  0x08059f6c in wrapperEventLoop ()
>>>
>>> #4  0x08056628 in wrapperRunConsole ()
>>>
>>> #5  0x0805cce3 in main ()
>>>
>>> #0  0x00994410 in __kernel_vsyscall ()
>>>
>>>
>>>
>>> If I send a signal to the wrapper, it reacts again, but before, it hangs
>>> forever in the nanosleep method.
>>>
>>> Is this a known problem of ServiceWrapper  running with multiple CPUs?
>>>
>>> Do I have to recompile with usleep support instead or is there an option to
>>> always use usleep?
>>>
>>>
>>>
>>> Best,