From: Richard E. <rem...@ed...> - 2004-03-04 16:40:33

I get the following in the Wrapper logs on Linux (but not on Windows or Solaris):

  JVM appears hung: Timed out waiting for signal from JVM.
  JVM did not exit on request, terminated

Looking through the code in wrapper_unix.c, I see the following in the
function wrapperKillProcess():

  kill(jvmPid, SIGKILL);
  log_printf(WRAPPER_SOURCE_WRAPPER, LEVEL_ERROR, "JVM did not exit on request, terminated");

What does the message "JVM did not exit on request, terminated" mean?
There is no test after kill() is called to tell whether the JVM actually
exited, so how can one say that the JVM did not exit?

Thanks.
Richard
From: Leif M. <le...@ta...> - 2004-03-06 15:01:20

Richard,

When are you seeing this message? Does it happen shortly after your
application was launched (within 30 seconds), or after your app has been
running for a while?

The first message, "JVM appears hung: Timed out waiting for signal from
JVM.", is displayed when the Wrapper process first decides that there is a
problem with the JVM. It then attempts to ask the JVM to shut itself down
cleanly by sending it a stop command. This gives the JVM a chance to do any
cleanup before it exits.

If the JVM fails to shut down on its own, the Wrapper gives up and
forcibly kills it with a "kill -9" (SIGKILL). That is the message you are
seeing: "JVM did not exit on request, terminated". It is printed right
after the kill command because, at that point, the JVM has already failed
to exit on request. The kill command will always work because it is
killing a child process.
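
For readers wondering how a parent could confirm the outcome of the kill itself (Richard's original question), the usual POSIX approach is to reap the child with waitpid(). A minimal sketch, assuming the target is a direct child of the caller; kill_child_and_reap() is a hypothetical helper and not the Wrapper's wrapperKillProcess():

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the Wrapper's code): send SIGKILL to a child and
 * block until waitpid() reports that it is gone.
 * Returns  1 if the child was terminated by SIGKILL,
 *          0 if it had already exited some other way,
 *         -1 on error (e.g. ESRCH: not a live child of ours).
 */
int kill_child_and_reap(pid_t child)
{
    int status;
    pid_t r;

    if (kill(child, SIGKILL) != 0) {
        return -1;
    }
    /* SIGKILL cannot be caught or ignored, so this wait will finish;
     * retry only if our own wait is interrupted by a signal. */
    do {
        r = waitpid(child, &status, 0);
    } while (r == -1 && errno == EINTR);

    if (r != child) {
        return -1;
    }
    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) ? 1 : 0;
}
```

Because SIGKILL cannot be caught or ignored, the blocking waitpid() returns once the kernel has removed the child. A result of 0 means the child had already exited by the time the signal was sent.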

I assume that you are digging into the code because you are having a
problem. Could you describe it?

Cheers,
Leif

Richard Emberson wrote:
> I get the following in wrapper logs on linux (not windows or solaris):
> [...]

From: Richard E. <rem...@ed...> - 2004-03-08 15:30:41

Leif,

Thank you for the explanation. So the message "JVM did not exit on
request, terminated" refers to the fact that a previous request to stop the
process failed, and the process will now be shut down with a SIGKILL.

We run load tests every night on multiple machine types. Every couple of
days, on one of our Linux boxes, after running for a couple of hours, we get
the twin messages

  JVM appears hung: Timed out waiting for signal from JVM.
  JVM did not exit on request, terminated

one right after the other, within milliseconds.

The problem that arises after sending a SIGKILL to the process controlled by
the Wrapper is that this primary process has spawned secondary processes
(not child processes), so killing the primary with a SIGKILL does not kill
the secondary processes. Shutdown hooks are registered, but Java will not
execute them when a SIGKILL is received.

In wrapper_unix.c, in the function wrapperKillProcess(), how about first
signaling with a SIGTERM, waiting a while, and then sending a SIGKILL? That
way the primary process's shutdown hook might run.

Richard
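
Richard's proposal is the usual "polite first, forceful second" escalation. A minimal POSIX sketch, assuming the target is a direct child of the caller; terminate_then_kill(), sleep_ms() and the 10 ms polling step are hypothetical, and this is not the code that was later added to the Wrapper:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sleep for about ms milliseconds, resuming if interrupted by a signal. */
static void sleep_ms(long ms)
{
    struct timespec req, rem;
    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;
    while (nanosleep(&req, &rem) == -1 && errno == EINTR) {
        req = rem;
    }
}

/*
 * Hypothetical helper: ask a child to exit with SIGTERM (giving JVM shutdown
 * hooks a chance to run), poll for up to grace_ms milliseconds, then fall
 * back to SIGKILL.
 * Returns 0 if the child exited after SIGTERM, 1 if SIGKILL was needed,
 * -1 on error.
 */
int terminate_then_kill(pid_t child, long grace_ms)
{
    const long step_ms = 10;
    long waited;
    int status;
    pid_t r;

    if (kill(child, SIGTERM) != 0) {
        return -1;
    }
    for (waited = 0; waited < grace_ms; waited += step_ms) {
        r = waitpid(child, &status, WNOHANG);
        if (r == child) {
            return 0;               /* exited within the grace period */
        }
        if (r == -1 && errno != EINTR) {
            return -1;
        }
        sleep_ms(step_ms);
    }
    /* Grace period expired: SIGKILL cannot be caught or ignored. */
    if (kill(child, SIGKILL) != 0) {
        return -1;
    }
    do {
        r = waitpid(child, &status, 0);
    } while (r == -1 && errno == EINTR);
    return (r == child) ? 1 : -1;
}
```

The grace period is a trade-off: a longer one gives shutdown hooks time to run, but delays the restart of a JVM that is genuinely frozen and will never react to SIGTERM.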
From: Leif M. <le...@ta...> - 2004-03-08 15:49:15

Richard,

> In the file wrapper_unix.c in the function wrapperKillProcess() how
> about first signaling with a SIGTERM, wait a while and then a SIGKILL.
> That way the primary process' shutdown hook might run?

The Wrapper does not send a SIGTERM to the JVM, but it does attempt to get
the JVM to shut down cleanly. The wrapperKillProcess() function is only
called once a clean shutdown of the JVM has failed. At that point, the JVM
is most likely not listening for SIGTERM or any other signals. The SIGKILL
is a last resort to get rid of it.

Most likely your application is frozen at this point, which is why it has
not responded to the exit requests.

If you are able to reproduce this so easily, could you try turning on debug
output with

  wrapper.debug=true

and then letting it run until the JVM is restarted? The debug output will
show exactly why the Wrapper decides that it is time to kill the JVM
process. Most likely the JVM has stopped responding to ping requests.

Cheers,
Leif
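
As a rough illustration of the ping-based hang detection Leif describes, a watchdog only needs to remember when the last ping response arrived and compare that against a timeout. The function and parameter names below are hypothetical; the thread does not show how the Wrapper actually structures this check.

```c
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical watchdog check: treat the JVM as hung once more than
 * timeout_secs seconds have passed since the last ping response.
 */
bool ping_timed_out(time_t last_response, time_t now, int timeout_secs)
{
    return difftime(now, last_response) > (double)timeout_secs;
}
```

A watchdog loop would call this each time it wakes up, for example with now = time(NULL), and start the shutdown/kill sequence once it returns true.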
From: Richard E. <rem...@ed...> - 2004-03-08 16:31:51

Leif,

See below.

Richard

Leif Mortenson wrote:
> The Wrapper does not send a SIGTERM to the JVM, but it does attempt to
> get the JVM to shut down cleanly. The wrapperKillProcess function is only
> called when a clean shutdown of the JVM has failed. At that point, the
> JVM is most likely not listening for SIGTERM or any other signals. The
> SIGKILL is a last resort to get rid of it.
>
> Most likely your application is frozen at this point which is why it had
> not responded to the exit requests.

Looking through the code in wrapper.c, the message

  "JVM appears hung: Timed out waiting for signal from JVM."

is printed when the primary process has not responded to a ping.
Immediately after this, the function wrapperKillProcess() is called. As far
as I can see, there were no "exit requests" prior to the SIGKILL.
Obviously, you are much more familiar with the code than I am; could you
point out where a previous exit request was sent after the ping timeout, so
I can better understand the flow of control? Thanks.

If there was no previous exit request, would there be any harm in first
sending the process a SIGTERM followed (100 ms later or so) by a SIGKILL?

As an aside, using the "unsupported" classes sun.misc.Signal and
sun.misc.SignalHandler, one can register signal handlers. At least on Linux
using Java 1.4.2_03, registering to catch TERM works, but registering to
catch KILL results in a nice core dump at the point of registration :-)
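
The same restriction exists at the operating-system level, which is why no process, JVM or otherwise, can run cleanup code on SIGKILL. A small POSIX C sketch, independent of the JVM: installing a handler for SIGTERM succeeds, while sigaction() rejects SIGKILL with EINVAL. The core dump Richard saw is JVM-specific behaviour and is not reproduced here; the helper names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
volatile sig_atomic_t got_term = 0;

static void on_term(int sig)
{
    (void)sig;
    got_term = 1;   /* real cleanup would be deferred to the main loop */
}

/*
 * Try to install on_term() as the handler for sig.
 * Returns 0 on success, or the errno value reported by sigaction().
 */
int try_install_handler(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_term;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) != 0) {
        return errno;   /* EINVAL for SIGKILL and SIGSTOP */
    }
    return 0;
}
```

SIGSTOP is the only other signal with the same restriction; every other standard signal, including SIGTERM, can be handled.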
From: Leif M. <le...@ta...> - 2004-03-14 14:30:16

Richard,

> Looking through the code in wrapper.c, the message:
>
> "JVM appears hung: Timed out waiting for signal from JVM."
>
> is printed when the primary process has not responded to a ping.
> Immediately after this the function wrapperKillProcess() is called.
> As far as I can see there were no "exit requests" prior to the SIGKILL.
> Obviously, you are much more familiar than I am with the code, could
> you point out where a previous exit request was sent after
> the ping timeout so I can better understand the flow of control. Thanks.

The JVM was never asked to exit using a signal, but the Wrapper does
maintain a socket which is used for all such communications. In my
experience, when that communication link has failed, the Wrapper can assume
that the JVM is frozen, or at least in a very bad state. This is one of the
points at which the Wrapper will attempt to kill the JVM.

> If there was no previous exit request, would there be any harm in
> first sending the process a SIGTERM followed (100 ms later or so)
> by a SIGKILL?

I went ahead and added this. If the JVM is truly frozen, it will not have
any effect, as the JVM will not be able to respond to the SIGTERM. This is
why I was going straight to the SIGKILL once I was convinced that the JVM
was dead.

As I understand it, the kill function does not send the signal to child
processes, so I am not sure whether this change will make any difference
for the problem you are having with them. It may be necessary to loop over
any child processes of the JVM, sending the SIGTERM and then SIGKILL
signals to each of them. (I need to look into whether or not this is even
possible.)

The additional debug information associated with this will be useful in
detecting whether the JVM is actually frozen or not. It has the drawback of
adding up to 5 seconds to the time that it takes to kill and then restart a
frozen JVM.

It takes 24 hours for the public CVS archive to be synced with the
development archive, but I would appreciate it if you could check out the
CVS code, build it, and then test this fix with your application. I am
interested to find out whether it makes any difference for you.

> As an aside, using the "unsupported" classes
> sun.misc.Signal and sun.misc.SignalHandler one can register
> signal handlers. At least on Linux using java 1.4.2_03 registering
> to catch TERM works but registering to catch KILL results in a
> nice core dump at the point of registration :-)

That is interesting to know about. The Wrapper already supports this when
you use integration method #3: you can receive any and all signals sent to
the JVM through the WrapperListener.controlEvent method.

Cheers,
Leif
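
On POSIX systems, one way to reach a child's descendants without looping over them is a process group: if the JVM is started as the leader of its own group, kill() with a negative PID signals every process still in that group. This is a sketch under the assumption that the secondary processes stay in the JVM's group; anything that calls setsid() or setpgid() to detach itself (as daemons often do) will escape it, and the thread does not say how Richard's secondary processes were started. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), used by the caller to reap the leader */
#include <unistd.h>

/*
 * Start argv[0] in a new process group whose group ID equals the child's
 * PID. Processes it spawns inherit the group unless they move themselves
 * out with setsid() or setpgid().
 */
pid_t spawn_in_new_group(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        setpgid(0, 0);          /* child: become leader of a new group */
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }
    if (pid > 0) {
        /* Parent does the same, so the group exists before we return,
         * whichever process runs first; an EACCES after the child has
         * already exec'd is harmless and ignored. */
        setpgid(pid, pid);
    }
    return pid;
}

/* Send sig to every process in the group led by pgid (note the minus sign). */
int signal_group(pid_t pgid, int sig)
{
    return kill(-pgid, sig);
}
```

The same SIGTERM-then-SIGKILL escalation can be applied to the whole group by calling signal_group() twice with a grace period in between; the caller still reaps the leader with waitpid().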
From: Richard E. <rem...@ed...> - 2004-03-15 14:53:35

Leif,

You are correct: if the JVM is really hung, then sending it a SIGTERM will
do no good.

I have a process that the Wrapper manages, and that process has a
subprocess. The process has a shutdown hook which kills the subprocess when
the process receives a SIGTERM; the process's shutdown hooks are not
invoked when it receives a SIGKILL ... nothing new here.

What I've now done is this: when the Wrapper launches the process, if the
process detects that there is a subprocess still running (from a previous
run), it tells the old subprocess to die before it creates its new
subprocess. For my situation this works because I have only one
well-known subprocess with a well-known communication port.

Thanks,
Richard
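
Richard's workaround relies on a well-known port. A common alternative on Unix, shown here only as a hedged sketch, is to record the subprocess's PID (for example in a PID file, which is an assumption and not something described in the thread) and, at start-up, check whether that PID still exists using kill() with signal 0, which performs the existence and permission checks without delivering a signal. process_exists() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Return true if a process with this PID currently exists.
 * Signal 0 runs kill()'s existence and permission checks without
 * delivering anything.
 */
bool process_exists(pid_t pid)
{
    if (pid <= 0) {
        return false;       /* 0 and negative values do not name one process */
    }
    if (kill(pid, 0) == 0) {
        return true;
    }
    return errno == EPERM;  /* exists, but we may not signal it */
}
```

If it returns true, the launcher could send SIGTERM to the old PID before creating the new subprocess. Because PIDs are recycled, a match does not prove it is the same leftover process, which is one reason a well-known port like Richard's can be the more specific check.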