Re: [Wrapper-user] Occasional wrapper restart under light load

Bill,
    That is a big log (20 MB). I asked for it, though. :-) I found a
single restart in the logs. You originally started the application at
2003/10/08 17:30:04, and it ran fine until 2003/10/14 10:43:03, when it
was restarted because of a ping timeout. The service was then stopped
manually at 2003/10/15 07:49:06. You are correct: that is a long time
to wait to reproduce the problem.

    Scanning through the logs, it looks like garbage collection was most
frequent right before the JVM was restarted. Each individual GC sweep
was very short, but there were a lot of them.

    Looking at the log, the last successful ping was at 2003/10/14
10:39:57, 186 seconds before the Wrapper timed out waiting for a ping.
The previous pings had all been completing like clockwork, once every
6 seconds.
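The size of that gap is easy to double-check from the two log timestamps. This is a small sketch, assuming a Java 8 or later JVM for java.time (not the 1.4.2 JVM on the device) and the yyyy/MM/dd HH:mm:ss timestamp format seen in the log; PingGap is a hypothetical helper name:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: measures the silence between two Wrapper log timestamps.
class PingGap {
    // Assumed timestamp format, matching entries such as "2003/10/14 10:39:57".
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss");

    /** Seconds elapsed between two log timestamps. */
    static long secondsBetween(String earlier, String later) {
        LocalDateTime start = LocalDateTime.parse(earlier, LOG_TIME);
        LocalDateTime end = LocalDateTime.parse(later, LOG_TIME);
        return Duration.between(start, end).getSeconds();
    }

    /** Number of whole ping intervals that fit into the gap. */
    static long missedPings(long gapSeconds, long intervalSeconds) {
        return gapSeconds / intervalSeconds;
    }

    public static void main(String[] args) {
        long gap = secondsBetween("2003/10/14 10:39:57", "2003/10/14 10:43:03");
        System.out.println("gap = " + gap + " s");                       // prints gap = 186 s
        System.out.println("missed intervals = " + missedPings(gap, 6)); // prints missed intervals = 31
    }
}
```

With pings every 6 seconds, that works out to 31 ping intervals passing without a successful response.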

    One thing I noticed is that there is no Java-side output in the log
except immediately after the JVM is launched. Is your application
redirecting this output? If so, would it be possible for you to send me
that as well? It might give me some additional clues, especially about
whether or not the JVM is receiving the final ping request.

    From the log so far, I do not have a lot of ideas. Everything runs
fine, and then the JVM stops receiving or responding to pings. I have a
Wrapper-controlled app running on Win2k at home that has been up for
about 7 weeks, so I don't think it is a problem with elapsed run time.

    I'll try and think of other ideas.

>>    The problem is that before the JVM is restarted, there are no
>>messages from the JVM about having received any packets.
>
>I will go back through the logs and see when the wrapper behavior
>changed and will see if it correlates with any events on the application
>side.
>
Great, let me know what you find out.

>>collection by adding -Xincgc. I was not sure what the
>>-XX:+UseConcMarkSweepGC option does.
>
>A couple of months ago, we had some major memory/garbage collection
>issues. After investigation we have found that for our application:
>
>1. When using the default garbage collector, if a major collection is
>performed while some of the JVM is sitting in the paging file, the GC
>times can increase by up to 2 orders of magnitude. We were getting some
>80-90 second garbage collections! Doubling the RAM solved this problem.
>
>2. We made further improvements in our GC times by using a GC strategy
>that is new to 1.4.2, the Concurrent Low Pause collector. There is lots
>of information out there about the new GC strategies. One of the better
>ones is here: http://java.sun.com/docs/hotspot/gc1.4.2/. From that web
>page, it says to: "Use the concurrent low pause collector if your
>application would benefit from shorter garbage collector pauses and can
>afford to share processor resources with the garbage collector when the
>application is running." I could be wrong, but I am pretty sure that
>time in GC is not the issue here.
>
    Thanks, there are always more things to study up on, and thanks for
the link.
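One way to test the assumption that time in GC is not the issue is to have the JVM report how often, and for how long, each collector has run. This is a minimal sketch, assuming Java 5 or later (java.lang.management does not exist on 1.4.2, where -verbose:gc output is the usual alternative); GcSnapshot is a hypothetical helper, and the collector names it prints depend on which GC options are in use:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports per-collector GC statistics for the running JVM.
class GcSnapshot {
    /** Total collections across all collectors; collectors reporting -1 (undefined) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Total accumulated collection time in milliseconds, skipping undefined (-1) values. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector, e.g. the young and old generation collectors.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("total: " + totalCollections() + " collections, "
                + totalCollectionMillis() + " ms");
    }
}
```

Logging these totals periodically would show whether the burst of short collections before the restart adds up to anything close to the ping timeout.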

>
>>Also try extending your wrapper.ping.timeout to around 300 (5
>>minutes). If the problem is GC related, that will hopefully be long
>>enough to make the problem go away. Note, however, that if the problem
>>is GC related, your application would also be unresponsive to its
>>clients, not just to the Wrapper, during this time. Have you seen
>>such problems?
>
>I would rather not do that right now. It feels to me like there is some
>problem between the wrapper and the application. The application is not
>working hard and I don't think it is experiencing major GC pauses.
>Because it happens so infrequently, I would like to keep that as a last
>resort; I won't be confident that the issue is fixed until a long
>while has passed.
>
Ok, I'll try to think of some other causes. That is the only thing in
the logs right now, so it is what first comes to mind...

>>    I can't think of anything offhand that I have fixed since version
>>3.0.2 that would affect this, but there have been lots of improvements
>>to the Wrapper. You may want to consider upgrading to version 3.0.5.
>
>We can upgrade in a future version; however, the application is part of a
>medical device that has tight FDA constraints. We could change the
>version, but it would be a lot of work. There would be documentation to
>change, and even worse, we would have to rerun many tests. If we knew it
>would fix the problem, then we would go ahead and do it. Otherwise, I
>don't want to change.
>
Ok, go ahead and stick with 3.0.2 for now. I don't think there were any
changes that would affect this anyway.

You can play with the ping timeout in your version. If you use the
latest version, you can also change the actual ping interval, which may
be useful.
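How the ping interval, the ping timeout, and a period in which the JVM cannot answer interact can be modeled in a few lines. This is a hypothetical simplification, not the Wrapper's actual implementation: the controller is assumed to ping every intervalSec seconds and to check the timeout only at ping times, and the JVM is assumed to answer instantly except inside a stall window [stallStart, stallEnd):

```java
// Hypothetical model of a ping watchdog; NOT the Wrapper's actual source code.
class PingWatchdogModel {
    /**
     * Returns the time (in seconds) at which the controller would restart the JVM,
     * or -1 if no restart happens up to horizonSec.
     * Pings go out every intervalSec seconds starting at t = 0. The JVM answers
     * instantly unless t falls inside the stall window [stallStart, stallEnd).
     * The timeout is only checked at ping times (a simplification).
     */
    static long restartTime(long intervalSec, long timeoutSec,
                            long stallStart, long stallEnd, long horizonSec) {
        long lastAnswer = 0;
        for (long t = 0; t <= horizonSec; t += intervalSec) {
            boolean stalled = t >= stallStart && t < stallEnd;
            if (!stalled) {
                lastAnswer = t; // ping answered, so the timeout clock resets
            } else if (t - lastAnswer > timeoutSec) {
                return t; // no answer for longer than the timeout: restart
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 6 s pings and a 60 s stall starting at t = 60
        System.out.println(restartTime(6, 30, 60, 120, 600));  // prints 90
        System.out.println(restartTime(6, 300, 60, 120, 600)); // prints -1
    }
}
```

Under these assumptions, a 60-second stall triggers a restart with a 30-second timeout but not with a 300-second one. The trade-off of a longer timeout is that a JVM that is genuinely hung takes longer to be detected and restarted.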

Cheers,
Leif