|
From: Leif M. <lei...@ta...> - 2010-04-22 08:23:00
|
Tomer, The Wrapper is designed to monitor the JVMs state to make sure that it is running in a normal state. As one of its many checks, it sends a ping every 5 seconds to the Java process. The Java process reads and then responds to this ping, letting the Wrapper know that the JVM is still alive. The wrapper.ping.timeout is the maximum amount of time that the Wrapper will allow without a ping reply before giving up and killing the JVM. There is another issue that is especially important on Windows machines, but also on UNIX. If the Wrapper process should ever crash, then the Java process would be left running in the background as a zombie without any way of terminating it. On Windows when running in the service space, this means that the system would need to be rebooted to kill the JVM. To avoid this case, the Java side of the Wrapper keeps track of when the last ping was received from the Wrapper. After the wrapper.ping.timeout has expired, the JVM will request that the Wrapper restart it. If that still fails for another 90 seconds then the JVM will assume that the Wrapper is dead and exit. The Java side of the Wrapper also checks to make sure that all of this only happens if the Java-side tick timer has been progressing to make sure that it is not a matter of the JVM being starved of CPU. This is a feature that has been in the Wrapper without modification for several years without modification. This is the first case I have seen where it causes any problems. When the Java side of the Wrapper decides that the Wrapper is gone, it calls System.exit to shut itself down. In your case however, the Wrapper is still there. It sees that the JVM requested that it exit and complies. This is why the JVM and then Wrapper are shutting down. I would really like to take a look at your unmodified wrapper.log as well as the wrapper.conf file so I can make sense of exactly what is happening on your system. As I wait, we are doing some more tests on our end to try and figure out what combination of events would make this kind of failure possible. I look forward to your response. Cheers, Leif On Wed, Apr 21, 2010 at 10:02 PM, Tomer B <tom...@gm...> wrote: > Hi, > I'm checking about sending the logs. > > But what do you think of the following analysis (and question I raise). > As I know that if I reservate some cpu processing to the vm that runs the > wrapper+jvm then this issue does not happen - there is no shutdown, that > means that maybe whats happening is that without cpu reservation the vmware > decides not to give cpu to the wrapper, therefore the wrapper does not ping > my jvm, and therefore the jvm asks the wrapper to restart itself (as seen > from logs), in that case the wrapper manager is not confused because it did > not get any pings from the wrapper (in the logs I don't see that the wrapper > is sending pings at that time). However my question here is assuming this > is what is happening, how come when the wrapper shuts down the jvm and not > restarts it - the wrapper says: > >> DEBUG | wrapper | 2010/04/20 22:26:24 | Signal trapped. Details: >> DEBUG | wrapper | 2010/04/20 22:26:24 | signal number=18 (SIGCHLD), >> source="unknown" >> DEBUG | wrapper | 2010/04/20 22:26:24 | Received SIGCHLD, checking JVM >> process status. >> DEBUG | wrapper | 2010/04/20 22:26:24 | JVM process exited with a code >> of >> 1, setting the wrapper exit code to 1. >> DEBUG | wrapper | 2010/04/20 22:26:24 | JVM exited normally. >> STATUS | wrapper | 2010/04/20 22:26:24 | <-- Wrapper Stopped > > Which looks like saying the wrapper is stopping the application, shouldn't > it just restart the jvm in case of ping timeouts? it looks like when my jvm > shuts down (as it requested the wrapper to shut itself down because it did > not recieve pings from wrapper) at that point when it finishes shut down the > wrapper receives SIGCHLD and decides to terminate the jvm and then also > shuts down itself - what do you think of that? why is that happening? why is > there a SIGCHLD? isn't that normal? > > Thanks > Tomer > > > On Wed, Apr 21, 2010 at 9:52 AM, Leif Mortenson <le...@ta...> > wrote: >> >> Tomer, >> Sorry for the trouble. Looking at subset of the log file, I am having >> a hard time seeing exactly what is happening. Would it be possible >> for you to send the full unedited wrapper.log file along with your >> wrapper.conf to me directly off list? >> >> Something is causing read timeouts like the following. Are you aware >> of any system time shifts taking place. The one below was about 5 >> minutes. >> INFO | jvm 1 | 2010/04/20 22:21:13 | WrapperManager Debug: Read >> Timed out. (Last Ping was 298400 milliseconds ago) >> >> The Wrapper's tick timer was designed to handle such time shifts >> without problem. >> >> It looks like the WrapperManager is confused and thinks that the >> Wrapper has not pinged it for 90 seconds more than the ping timeout. >> This should only happen if the Wrapper process is dead, which is >> obviously not true in your case. >> >> I will go back over the code and try to figure out exactly what is >> going on, but the full log will be very helpful. >> >> As a note to that old thread. The problem they were having with the >> EAR file needing to be reset can be easily handled with in the >> Professional Edition with the ability to execute shell scripts or >> other external commands prior to launching a new Java instance: >> http://wrapper.tanukisoftware.org/doc/english/props-event.html#command >> >> Cheers, >> Leif >> >> On Wed, Apr 21, 2010 at 6:05 AM, Tomer B <tom...@gm...> wrote: >> > Hi, >> > >> > I am currently testing wrapper 3.4.0 on solaris 64 bit machine. >> > Under regular "physical" machine its already working for days and is >> > fine >> > (tried that on a couple of machines). >> > >> > However if i run the wrapper under vm instances (of vmware) i get >> > abnormal >> > shutdowns always both to my server and to the wrapper (after about 2 >> > hours >> > both process are down). >> > >> > I'm attaching relevant logs: >> > Note that I tested a configured vmware for reservation of cpu to one of >> > the >> > vm instances then on that vm instance it seems that I don't have this >> > shutdown, i have multiple vm virtual machines on the same physical >> > machine >> > and the shutdown happens on all of them (except for that instance with >> > that >> > cpu reservation). I would like vm configuration to be working as well >> > because I'm thinking of having such a configuration possibly in future >> > in >> > production. >> > I also saw this over the web: >> > >> > http://old.nabble.com/JVM-appears-hung:-VMware,-verbose-garbage-collection-td13470076.html >> > >> > Here are relevant logs if you could provide me with information of why >> > these >> > abnormal shutdowns are happening: >> > >> > INFO | jvm 1 | 2010/04/20 19:43:30 | WrapperManager Debug: Java VM >> > Vendor : Sun Microsystems Inc. >> > INFO | jvm 1 | 2010/04/20 19:43:30 | WrapperManager Debug: OS >> > Name : SunOS >> > INFO | jvm 1 | 2010/04/20 19:43:30 | WrapperManager Debug: OS >> > Arch : amd64 >> > INFO | jvm 1 | 2010/04/20 19:43:30 | WrapperManager Debug: >> > INFO | jvm 1 | 2010/04/20 19:43:30 | [Loaded >> > org.tanukisoftware.wrapper.WrapperManager$5 from xxxxxxxxxx (i made >> > this.. >> > xxxxxx)] >> > >> > >> > DEBUG | wrapperp | 2010/04/20 22:16:09 | read a packet PING : ping >> > DEBUG | wrapperp | 2010/04/20 22:16:13 | send a packet PING : ping >> > INFO | jvm 1 | 2010/04/20 22:16:13 | WrapperManager Debug: Received >> > a >> > packet PING : ping >> > INFO | jvm 1 | 2010/04/20 22:16:13 | WrapperManager Debug: Send a >> > packet PING : ping >> > DEBUG | wrapperp | 2010/04/20 22:16:13 | read a packet PING : ping >> > >> > . >> > . >> > . >> > >> > INFO | jvm 1 | 2010/04/20 22:21:13 | WrapperManager Debug: Read >> > Timed >> > out. (Last Ping was 298400 milliseconds ago) >> > >> > . >> > . >> > . >> > >> > INFO | jvm 1 | 2010/04/20 22:26:13 | WrapperManager Debug: Read >> > Timed >> > out. (Last Ping was 596700 milliseconds ago) >> > INFO | jvm 1 | 2010/04/20 22:26:13 | WrapperManager: JVM did not >> > exit. >> > Give up. >> > >> > . >> > . >> > . >> > >> > INFO | jvm 1 | 2010/04/20 22:26:13 | WrapperManager Debug: >> > ShutdownHook >> > started >> > INFO | jvm 1 | 2010/04/20 22:26:13 | WrapperManager Debug: >> > WrapperManager.stop(0) called by thread: Wrapper-Shutdown-Hook >> > INFO | jvm 1 | 2010/04/20 22:26:13 | WrapperManager Debug: Send a >> > packet STOP : 0 >> > INFO | jvm 1 | 2010/04/20 22:26:13 | WrapperManager Debug: Stopped >> > checking for control events. >> > DEBUG | wrapperp | 2010/04/20 22:26:13 | read a packet STOP : 0 >> > DEBUG | wrapper | 2010/04/20 22:26:13 | JVM requested a shutdown. (0) >> > DEBUG | wrapper | 2010/04/20 22:26:13 | wrapperStopProcess(0) called. >> > DEBUG | wrapper | 2010/04/20 22:26:13 | Sending stop signal to JVM >> > DEBUG | wrapperp | 2010/04/20 22:26:13 | send a packet STOP : NULL >> > >> > . >> > . >> > . >> > >> > INFO | jvm 1 | 2010/04/20 22:26:14 | WrapperManager Debug: Thread, >> > Wrapper-Shutdown-Hook, handling the shutdown process. >> > INFO | jvm 1 | 2010/04/20 22:26:14 | WrapperManager Debug: calling >> > listener.stop() >> > INFO | jvm 1 | 2010/04/20 22:26:14 | WrapperManager Debug: Send a >> > packet STOP_PENDING : 180000 >> > >> > . >> > . >> > . >> > >> > >> > DEBUG | wrapperp | 2010/04/20 22:26:14 | read a packet STOP_PENDING : >> > 180000 >> > DEBUG | wrapper | 2010/04/20 22:26:14 | JVM signalled a stop pending >> > with >> > waitHint of 180000 millis. >> > >> > INFO | jvm 1 | 2010/04/20 22:26:14 | Start method stopped >> > >> > INFO | jvm 1 | 2010/04/20 22:26:22 | WrapperManager Debug: returned >> > from listener.stop() -> 0 >> > INFO | jvm 1 | 2010/04/20 22:26:22 | WrapperManager Debug: >> > shutdownJVM(0) Thread:Wrapper-Shutdown-Hook >> > INFO | jvm 1 | 2010/04/20 22:26:22 | WrapperManager Debug: wait for >> > 0 >> > shutdown locs to be released. >> > INFO | jvm 1 | 2010/04/20 22:26:22 | WrapperManager Debug: Send a >> > packet STOPPED : 0 >> > >> > DEBUG | wrapperp | 2010/04/20 22:26:22 | read a packet STOPPED : 0 >> > DEBUG | wrapper | 2010/04/20 22:26:22 | JVM signalled that it was >> > stopped. >> > INFO | jvm 1 | 2010/04/20 22:26:22 | WrapperManager Debug: Closing >> > socket. >> > DEBUG | wrapperp | 2010/04/20 22:26:22 | socket read no code (closed?). >> > >> > INFO | jvm 1 | 2010/04/20 22:26:23 | WrapperManager Debug: >> > ShutdownHook >> > complete >> > DEBUG | wrapper | 2010/04/20 22:26:24 | Signal trapped. Details: >> > DEBUG | wrapper | 2010/04/20 22:26:24 | signal number=18 (SIGCHLD), >> > source="unknown" >> > DEBUG | wrapper | 2010/04/20 22:26:24 | Received SIGCHLD, checking JVM >> > process status. >> > DEBUG | wrapper | 2010/04/20 22:26:24 | JVM process exited with a code >> > of >> > 1, setting the wrapper exit code to 1. >> > DEBUG | wrapper | 2010/04/20 22:26:24 | JVM exited normally. >> > STATUS | wrapper | 2010/04/20 22:26:24 | <-- Wrapper Stopped >> > >> > >> > Note that at this stage i have both wrapper & my server processes down. >> > >> > Thanks for the help.. |