From: Christian B. <chr...@be...> - 2002-10-16 06:54:49
Hi.

While overall I am very pleased with the functionality and stability of the Wrapper, I just cannot seem to figure out one problem. I went through the docs and the mailing list archives and still have no idea. So here goes:

The basic problem is that the Wrapper will restart my application if it takes too long to garbage collect (this is based on observation). Consider the following Wrapper log:

    INFO   | jvm 1   | 2002/10/14 13:43:49 | Memory Status: 112 MB Used, 127 MB Total
    INFO   | jvm 1   | 2002/10/14 13:43:56 | [Full GC
    ERROR  | wrapper | 2002/10/14 13:46:02 | JVM appears hung: Timed out waiting for signal from JVM.
    ERROR  | wrapper | 2002/10/14 13:46:02 | Java Virtual Machine did not exit on request, terminated
    STATUS | wrapper | 2002/10/14 13:46:08 | Launching a JVM...

As can be seen, I am hitting the heap limit, at which point a full GC kicks in (the JVM is running in verbose mode). The machine the JVM is running on is short on main memory and hits the page file at this point, swapping for about two minutes (see timestamps). Finally, the Wrapper decides the JVM is dead and restarts it.

While I don't want to argue the fact that 2 minute GCs are pretty bad ;) - I still think the Wrapper should not have restarted the JVM.

I have gone through most of the docs and tuned some of the advanced parameters, namely:

    wrapper.ping.timeout=120

From my understanding, this will make the Wrapper wait for 120 seconds if no pings get answered by the JVM, then decide it's dead. From the above log, it looks like this timeout is working (death comes after some 2 minutes).

I also looked into the

    wrapper.cpu.timeout=10

parameter and left it at the default. My understanding is that once the Wrapper detects that either itself or the JVM is stalled, the ping timeout is extended. From the docs, it looks like this functionality is there to address the problem I am describing above. However, in my case, it looks like the decision to restart is made before the stalled process detection kicks in. Is this a bug or the expected behaviour?

I'd appreciate any help in figuring this out. Oh, and I've seen this on (underpowered) Windows and Solaris boxen.

thanks,
chr.
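
For reference, the "some 2 minutes" estimate can be checked against the configured ping timeout. The following is only a minimal Python sketch: it uses the two timestamps from the log above and the wrapper.ping.timeout value quoted in the message, and all variable names are illustrative rather than anything the Wrapper itself provides.

```python
from datetime import datetime

# Timestamp layout used in the Wrapper log lines quoted above.
LOG_FORMAT = "%Y/%m/%d %H:%M:%S"

# Last line printed by the JVM ("[Full GC") and the Wrapper's timeout message.
last_jvm_output = datetime.strptime("2002/10/14 13:43:56", LOG_FORMAT)
wrapper_gave_up = datetime.strptime("2002/10/14 13:46:02", LOG_FORMAT)

# Value of wrapper.ping.timeout from the configuration in the message.
PING_TIMEOUT_SECONDS = 120

# Seconds between the last JVM output and the Wrapper declaring the JVM hung.
stall_seconds = int((wrapper_gave_up - last_jvm_output).total_seconds())
overshoot_seconds = stall_seconds - PING_TIMEOUT_SECONDS

print(f"stall: {stall_seconds}s, timeout: {PING_TIMEOUT_SECONDS}s, "
      f"overshoot: {overshoot_seconds}s")
```

The log does not show when the last ping was actually answered, so this only bounds the stall from the last visible JVM output; it is consistent with the 120 second ping timeout firing without any extension.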