From: Leif M. <le...@ta...> - 2004-02-17 04:03:17
Paul,

Paul Casanova wrote:

>The JVM for the main process of our application (-Xmx1600M) was restarted
>by the Java Service Wrapper today after pinging with no response from the
>JVM for 10 minutes (as configured).
>
>I can't for the life of me work out why it was hung though - there were no
>exceptions in either the JSW log file or the application's log file.
>

See more below, but if the JVM does lock up, there will usually not be any stack traces or errors. Displaying such errors requires that the JVM still be running.

>Moreover, when the JSW tried to get a thread dump on exit, it failed.
>Here's a snippet from the log file:
>ERROR | wrapper | 2004/02/17 12:02:55 | JVM appears hung: Timed out waiting for signal from JVM.
>STATUS | wrapper | 2004/02/17 12:02:55 | Dumping JVM state.
>DEBUG | wrapper | 2004/02/17 12:02:55 | Sending BREAK event to process group 336.
>ERROR | wrapper | 2004/02/17 12:02:55 | Unable to send BREAK event to JVM process. Err(6 : The handle is invalid. (0x6))
>

This is a bug that has been fixed in 3.1.0. It was not previously possible to invoke a thread dump on exit when running as an NT service, because there is no console.
http://sourceforge.net/tracker/index.php?func=detail&aid=831775&group_id=39428&atid=425187
Thread dumps invoked from within the JVM have always worked.

>ERROR | wrapper | 2004/02/17 12:02:58 | Java Virtual Machine did not exit on request, terminated
>
>Before this there were just 10 minutes of pinging without a response from the JVM.
>I know that no one can tell me what happened, but does anyone have some
>ideas on where to start looking (i.e. make the haystack smaller so that the
>needle is more obvious!).
>

If you could see pings being sent to the JVM but no replies coming back, then the problem most likely lies with the JVM itself. During those 10 minutes, do you know whether or not your application was responsive? Do you know what the CPU usage of the machine was during this time?
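On the point above that thread dumps invoked from within the JVM have always worked, here is a minimal sketch of how an application could capture its own thread dump using the standard `Thread.getAllStackTraces()` method. This is not the Wrapper's own mechanism. Note that this method requires Java 5 or later, and the class name `InJvmThreadDump` is hypothetical. Like any in-process dump, it only helps while the JVM is still scheduling threads. A truly hung JVM will never run it.

```java
import java.util.Map;

// Hypothetical helper: builds a plain-text thread dump from inside the JVM.
// Requires Java 5+ (Thread.getAllStackTraces and Thread.getState).
public class InJvmThreadDump {

    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        // Snapshot of every live platform thread and its current stack.
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> e : traces.entrySet()) {
            Thread t = e.getKey();
            // Header line: quoted thread name, daemon flag and thread state.
            sb.append('"').append(t.getName()).append('"')
              .append(t.isDaemon() ? " daemon" : "")
              .append(" state=").append(t.getState())
              .append('\n');
            // One "at ..." line per stack frame, similar to a JVM thread dump.
            for (StackTraceElement frame : e.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
    }
}
```

One possible use, again only an assumption about how you might wire it in, is to write the result to the application's log from a background timer. The last dump before a freeze would then survive even when the Wrapper cannot get one on exit.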
If your app was unresponsive and CPU usage was low, then the JVM most likely froze up, and the Wrapper did its job.

Are you sure that all 1600MB of the JVM fits entirely in real memory? If part of it is being swapped to disk, the JVM may simply have been unresponsive while it was being swapped. I have never run an app quite this large, but I have seen a JVM freeze for up to 2 minutes while doing a GC sweep when there was not enough memory. That was a JVM using around 200MB, so it seems entirely possible that such a sweep could take 10 minutes for 1600MB. This would happen even without the Wrapper.

Are you able to reproduce this? If so, try running in a console so that the dump-on-exit feature works. Version 3.1.0 also fixes a timeout problem with very large dumps, so it may be worth testing with the prerelease version. You can build from CVS, or I can get you a snapshot build if you need one. Note, however, that if the JVM is truly hung, thread dumps will not work.

I always like to learn as much as possible about these problems so that the Wrapper can be improved, where possible, to make their root causes as obvious as possible.

Cheers,
Leif