From: Leif M. <le...@ta...> - 2006-10-12 14:45:26
Heather and all.
To keep you updated on this: I think I have gotten it fixed. There was a
synchronization problem introduced in 3.2.0 which made it possible for
two threads to access the backend socket at the same time.
This would only happen on Windows when running as a service, and then
only if the service received an INTERROGATE or other signal from the
service manager. If that signal was received at the same time as a ping
was scheduled, then those two packets would be sent to the JVM at the
same time.
In most cases, this would result in the packets being mixed and thus
appearing corrupted. But in some cases, it would cause the wrapper to crash.
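For illustration only, here is a minimal Python sketch of the general idea behind this kind of fix: serializing writes to a shared channel with a lock so that two threads cannot interleave their packets. It is not the Wrapper's actual code. The class name, the packet codes and the packet layout (one code byte, a payload, a NUL terminator) are all assumptions made up for the example.

```python
import threading

# Hypothetical packet codes, chosen for this sketch only.
PING = 1
SERVICE_CONTROL = 2


class BackendChannel:
    """Toy stand-in for the backend socket: it records every byte written."""

    def __init__(self):
        self.wire = bytearray()
        self._lock = threading.Lock()

    def send_packet(self, code, payload):
        # Assumed layout: one code byte, the payload, then a NUL terminator.
        packet = bytes([code]) + payload + b"\x00"
        # The lock makes the whole packet one atomic write, so bytes from
        # another thread's packet cannot land in the middle of this one.
        with self._lock:
            for byte in packet:
                self.wire.append(byte)


def run_demo(count=200):
    """Send `count` packets from each of two threads; return the packets seen."""
    channel = BackendChannel()

    def pinger():
        for _ in range(count):
            channel.send_packet(PING, b"ping")

    def service_handler():
        for _ in range(count):
            channel.send_packet(SERVICE_CONTROL, b"interrogate")

    threads = [threading.Thread(target=pinger),
               threading.Thread(target=service_handler)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # The wire ends with a terminator, so the last split element is empty.
    return [bytes(p) for p in channel.wire.split(b"\x00")[:-1]]


if __name__ == "__main__":
    packets = run_demo()
    intact = all(p in (b"\x01ping", b"\x02interrogate") for p in packets)
    print(len(packets), "packets received, all intact:", intact)
```

Without the `with self._lock:` line, the two threads would be free to interleave their bytes on the shared buffer, which is the kind of mixing described above.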
Unfortunately, this makes 3.2.0 and 3.2.1 unreliable when run as a service
under Windows. I am trying to get everything tied up and tested to get the
3.2.2 release out as quickly as possible.
Versions before 3.2.0 were not affected by this bug.
As many users are running mission-critical applications which would suffer
from going down, I wanted to mention a couple of things to watch out for.
Not all systems appear to have many INTERROGATE signals being sent
around. But those that do appear to get them at a fairly high frequency.
I assume they are being triggered by administration or monitoring tools.
Systems which are at high risk of a crash will have messages like the
following in their wrapper.log. These are caused by the overlapping
packets.
---
INFO | jvm 1 | 2006/09/29 18:46:20 | Wrapper code received an unknown packet type: 110
---
The 110 can be anything. If you are seeing this then you will want
to keep an eye on the wrapper.
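As a rough aid for checking a log, the following Python sketch pulls the packet type numbers out of any "unknown packet type" messages. It assumes only the message wording shown above; the function name and the `wrapper.log` path in the usage section are placeholders, and the pattern tolerates the message being wrapped across two lines.

```python
import re

# Matches the message text quoted above; \s+ allows for a line wrap
# between "received an" and "unknown packet type".
UNKNOWN_PACKET = re.compile(
    r"Wrapper code received an\s+unknown packet type:\s*(\d+)"
)


def find_unknown_packet_types(log_text):
    """Return the reported packet type numbers, in order of appearance."""
    return [int(m.group(1)) for m in UNKNOWN_PACKET.finditer(log_text)]


if __name__ == "__main__":
    # Placeholder path: point this at your own wrapper.log.
    with open("wrapper.log", encoding="utf-8", errors="replace") as f:
        hits = find_unknown_packet_types(f.read())
    print(len(hits), "unknown packet messages found:", hits)
```

Any non-zero count would suggest the system is seeing the overlapping packets.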
I got pretty good at reproducing this over the last couple of days by
artificially increasing the number of calls to the unsynchronized function.
In all cases I got several messages like the one above before having the
wrapper crash. But I see no reason why a crash on the first occurrence
would not be possible.
1) If possible, running as a console app rather than a service would remove
the risk of a crash altogether.
2) Reduce the likelihood of a collision by increasing the ping interval, so
that pings are sent less often. This will hinder the wrapper's ability to
recover from a JVM crash promptly, so you will want to be sure to restore
these to their original values as soon as you upgrade to 3.2.2.
---
wrapper.ping.interval=300
wrapper.ping.timeout=630
---
I'll try to get the release out as soon as things are tested.
Let me know if you have any questions about this.
Cheers,
Leif
Leif Mortenson wrote:
> Heather,
> Another user just reported a similar problem. They seem to be able to
> reproduce it more easily. I have requested more information. But you
> might want to monitor the following bug issue.
> https://sourceforge.net/tracker/?func=detail&atid=425187&aid=1574537&group_id=39428
>
> Cheers,
> Leif
>
> Heather Leonard wrote:
>
>> Leif,
>>
>> Version: 3.2.0
>> Platform: Windows Server 2003
>>
>> I have verified that wrapper.jar, wrapper.exe and wrapper.dll are all
>> from the same version.
>>
>> The error occurred at least 11 days after the JVM was launched, so
>> definitely not at startup. After the crash was discovered, the service
>> was restarted without error (a couple of days later). Unfortunately, I
>> have been unable to reproduce the error. Since it occurred in the
>> customer's production system, I have copied their installation on my
>> machine to try to reproduce the error. In this way, I can reproduce the
>> exact conditions when the error occurred. However, it is not exactly
>> the same system. Since the application receives data over a socket, is
>> it possible for the client to send something that would cause this
>> error?
>>
>> Thanks,
>> Heather
>>
>> -----Original Message-----
>> From: wra...@li...
>> [mailto:wra...@li...] On Behalf Of Leif
>> Mortenson
>> Sent: Thursday, October 05, 2006 11:05 AM
>> To: wra...@li...
>> Subject: Re: [Wrapper-user] Fatal error in Wrapper
>>
>> Heather,
>> You somehow figured out a way to crash the Wrapper process itself.
>> Unfortunately, this is a state that the Wrapper is not able to recover
>> from on its own. I would definitely like to figure out a way to
>> reproduce this as this is the first I have heard of this problem.
>>
>> Could you tell me what version of the Wrapper you are using as well
>> as the platform? Are you sure that you are using the wrapper.jar,
>> wrapper.exe, and wrapper.dll all from the same version?
>>
>> The message about packet type 110 is perplexing. Packet Id #110 is
>> the access key. I double checked the code, but this is only sent from
>> the JVM to the Wrapper process and then only once at startup. Your
>> error log shows that this packet Id 110 was received by the JVM from
>> the Wrapper.
>>
>> How long after the JVM was launched did you encounter this error?
>>
>> Cheers,
>> Leif
>>
>> Heather Leonard wrote:
>>
>>
>>> Hi,
>>>
>>> My application crashed and the following error was in the wrapper log.
>>>
>>> INFO | jvm 1 | 2006/09/29 18:46:20 | Wrapper code received an unknown packet type: 110
>>> FATAL | wrapper | 2006/09/29 18:46:25 | encountered a fatal error in Wrapper
>>> FATAL | wrapper | 2006/09/29 18:46:25 | exceptionCode = EXCEPTION_ACCESS_VIOLATION
>>> FATAL | wrapper | 2006/09/29 18:46:25 | exceptionFlag = EXCEPTION_NONCONTINUABLE_EXCEPTION
>>> FATAL | wrapper | 2006/09/29 18:46:25 | exceptionAddress = 7C82F527
>>> FATAL | wrapper | 2006/09/29 18:46:25 | Read access exception from 0061FE7D
>>> FATAL | wrapper | 2006/09/29 18:46:25 | <-- Wrapper Stopping due to error in service main.
>>>
>>> I have since been unable to reproduce the crash. I have two
>>> questions. Can anyone explain this error and why it would occur?
>>> Also, I would like the application to restart when an error like this
>>> occurs. Will setting the wrapper.on_exit.default property to RESTART
>>> cause a restart in this case?
>>>
>>> Thanks,
>>> Heather Leonard
>>>