From: Paul C. <cas...@au...> - 2004-02-17 05:05:08
Thanks Leif.

The server is Windows 2000 Advanced Server with 6.3GB of physical RAM. According to the Task Manager, the highest peak that I've ever seen was 5.1 GB, so there is always over a GB of physical RAM left. So no, unfortunately I can't put it down to disk swapping.

We inherited support of this application, which was developed by another organization around 5 years ago. The logging is pretty ordinary (hooray for the extra logging flexibility provided by the Wrapper!), and the logs showed nothing for around half an hour prior to the ping timeout - BUT the task table in the database showed that reports were being run and successfully completing right up until around 30 seconds before the JVM failed to respond to pings. Sounds like a coincidence, but the report completed normally.

It's possible that the logging system for our application needs an overhaul - there's a HUGE difference in output between the normal and debug levels for the application, with nothing in between. System.out and System.err get output to the Wrapper logs when the log level is INFO, correct? I was thinking of supplementing the current behaviour with a printStackTrace(System.err) whenever an exception occurs, and adding other informative output to System.out so that we can at least know when tasks are being scheduled and by whom.

Is the JVM for 1.4.2_03 less prone to lock up than 1.3.1_09 by any chance? Our client intends to update the JDK, but it probably won't be for a few months.

I'm not sure what the CPU was doing, but when I logged on the Wrapper had just restarted the JVM, so I could see the "blip" in the RAM usage - meaning that the process was still using its full allocation of RAM while it was hung.

I have no doubt that the Wrapper did as it should have - I'm just not sure whereabouts in the haystack I should start looking for the needle. That's why I'm looking for any suggestions and/or advice available :).
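The logging idea above (a printStackTrace(System.err) on every exception, plus System.out notes on who scheduled which task) could be sketched roughly as follows. This is only an illustration: the class name `TaskAuditLog`, its methods, and the task and user values are hypothetical and do not come from the application, and it assumes, as asked above, that the Wrapper captures the JVM's System.out and System.err in its log.

```java
import java.io.PrintStream;
import java.text.SimpleDateFormat;
import java.util.Date;

/**
 * Hypothetical helper: none of these names come from the application
 * discussed in this thread.
 */
final class TaskAuditLog {
    private TaskAuditLog() {}

    /** Builds one log line: timestamp, task name, user, and a short event text. */
    static String formatEvent(Date when, String task, String user, String event) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
        return fmt.format(when) + " | task=" + task + " user=" + user + " | " + event;
    }

    /** Records a normal event (scheduled, started, completed) on the given stream. */
    static void event(PrintStream out, String task, String user, String event) {
        out.println(formatEvent(new Date(), task, user, event));
        out.flush();
    }

    /** Records a failure: one summary line followed by the full stack trace. */
    static void failure(PrintStream err, String task, String user, Throwable t) {
        err.println(formatEvent(new Date(), task, user, "FAILED: " + t));
        t.printStackTrace(err);
        err.flush();
    }

    public static void main(String[] args) {
        // Illustrative values only.
        event(System.out, "monthly-report", "paul", "scheduled");
        try {
            throw new IllegalStateException("simulated failure");
        } catch (IllegalStateException e) {
            failure(System.err, "monthly-report", "paul", e);
        }
    }
}
```

Flushing after every line is a deliberate choice here: if the JVM later hangs, the last event written before the hang is more likely to have reached the log.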
Sorry, I don't quite understand this:

"This is a bug that has been fixed in 3.1.0. It was not previously possible to invoke a thread dump on exit when running as an NT service due to the lack of a console."

Can I still run my app as a service (not as a console) and get a thread dump now with V 3.1.0? Because our application only does this once every couple of months, I'd rather not run it as a console if it means that it can't be shut down easily (as it can as a service) via a call to the Wrapper - or can it?

Thanks for your help.

Paul Casanova

Leif Mortenson <le...@ta...>
Sent by: wra...@li...ceforge.net
17/02/2004 02:59 PM
Please respond to wrapper-user

To: wra...@li...
cc:
Subject: Re: [Wrapper-user] JVM hang causes

Paul,

Paul Casanova wrote:
>The JVM for the main process of our application (-Xmx1600M) was restarted
>by the Java Service Wrapper today after pinging with no response from the
>JVM for 10 minutes (as configured).
>
>I can't for the life of me work out why it was hung though - there were no
>exceptions in either the JSW log file nor the application's log file.
>

See more below, but if the JVM does lock up, there will usually not be any stack traces or errors. The display of such errors requires that the JVM still be running.

>Moreover, when the JSW tried to get a thread dump on exit, it failed.
>Here's a snippet from the log file:
>ERROR | wrapper | 2004/02/17 12:02:55 | JVM appears hung: Timed out
>waiting for signal from JVM.
>STATUS | wrapper | 2004/02/17 12:02:55 | Dumping JVM state.
>DEBUG | wrapper | 2004/02/17 12:02:55 | Sending BREAK event to process
>group 336.
>ERROR | wrapper | 2004/02/17 12:02:55 | Unable to send BREAK event to JVM
>process. Err(6 : The handle is invalid. (0x6))
>

This is a bug that has been fixed in 3.1.0. It was not previously possible to invoke a thread dump on exit when running as an NT service due to the lack of a console.
http://sourceforge.net/tracker/index.php?func=detail&aid=831775&group_id=39428&atid=425187
Thread dumps invoked from within the JVM had always worked.

>ERROR | wrapper | 2004/02/17 12:02:58 | Java Virtual Machine did not exit
>on request, terminated
>
>Before this was just 10 minutes of pinging without response from the JVM.
>I know that no one can tell me what happened, but does anyone have some
>ideas on where to start looking (ie make the haystack smaller so that the
>needle is more obvious!).
>

If you were able to see pings being sent to the JVM but no replies, then the problem is most likely with the JVM. During those 10 minutes, do you know whether or not your application was responsive? Do you know what the CPU usage of the machine was during this time? If your app was unresponsive and CPU usage was low, then the problem was most likely that the JVM froze up, and the Wrapper did its job.

Are you sure that all 1600MB of the JVM is able to fit entirely in real memory? If the memory is being swapped to disk, it is possible that the JVM was simply unresponsive as it was being swapped. I have never run an app quite this large, but I have seen a JVM freeze up for up to 2 minutes as it attempts to do a GC sweep in cases where there is not enough memory. That was for a JVM using around 200MB. So it seems entirely possible that such a sweep could take 10 minutes for 1600MB. This will happen even without using the Wrapper.

Are you able to reproduce this? If so, try running in a console so the dump on exit feature works.
Version 3.1.0 also fixes a timeout problem with very large dumps, so it may be worth testing with the prerelease version. You can build from CVS, or I could get you a snapshot build if you need one. Note, however, that if the JVM is truly hung then thread dumps will not work.

I always like to learn as much as possible about these problems so the Wrapper can be improved, where possible, to make their root causes as obvious as possible.

Cheers,
Leif

_______________________________________________
Wrapper-user mailing list
Wra...@li...
https://lists.sourceforge.net/lists/listinfo/wrapper-user
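One way to narrow down whether long pauses of the kind described above (a JVM frozen for minutes during a GC sweep) are actually occurring is a small in-process watchdog thread. The sketch below is an assumption-laden illustration, not part of the Wrapper or of the application: the class name `PauseDetector`, the interval, and the threshold are all hypothetical. A daemon thread sleeps for a fixed interval and reports to System.err whenever it wakes up much later than requested, which suggests the whole JVM was stalled in between.

```java
/**
 * Hypothetical watchdog: the class name, interval, and threshold are
 * illustrative assumptions, not part of the Wrapper or the application.
 */
final class PauseDetector implements Runnable {
    private final long intervalMs;
    private final long thresholdMs;

    PauseDetector(long intervalMs, long thresholdMs) {
        this.intervalMs = intervalMs;
        this.thresholdMs = thresholdMs;
    }

    /** Extra delay beyond the requested sleep interval, never negative. */
    static long stallMs(long intervalMs, long elapsedMs) {
        return Math.max(0L, elapsedMs - intervalMs);
    }

    public void run() {
        long last = System.currentTimeMillis();
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                return; // stop quietly when interrupted
            }
            long now = System.currentTimeMillis();
            long stall = stallMs(intervalMs, now - last);
            if (stall >= thresholdMs) {
                // Woke up far later than requested: the JVM was probably paused.
                System.err.println("PauseDetector: JVM stalled for about " + stall + " ms");
            }
            last = now;
        }
    }

    /** Starts the detector on a daemon thread so it never blocks JVM shutdown. */
    static Thread start(long intervalMs, long thresholdMs) {
        Thread t = new Thread(new PauseDetector(intervalMs, thresholdMs), "pause-detector");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Calling something like `PauseDetector.start(1000L, 5000L)` at application startup would report stalls of five seconds or more. Two caveats: a JVM that never recovers will never print the message, so this only reveals pauses that end, and because it uses wall-clock time, a system clock adjustment can produce a false report.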