|
From: Christoph S. <csc...@am...> - 2019-01-09 09:11:32
|
CONFIDENTIAL & RESTRICTED
Hello Leif,
Thanks for your response.
Easy one first, the version we use:
[scheck@muctxp5b scheck_unix4]$ ./wrapper --version
Java Service Wrapper Community Edition 64-bit 3.5.30
Copyright (C) 1999-2016 Tanuki Software, Ltd. All Rights Reserved.
http://wrapper.tanukisoftware.com
concerning the forced kill, I think I have seen once on another instance and time in the wrapper log something like "..JVM received sigkill (9)..".
In the case I looked at, the JVM process owned by the wrapper was gone, which suits the DOWN_CLEAN as you explained.
I'll turn on debug output on a few of them in case it happens again.
As I interpret it, the configuration as such is OK, as well as the normal behaviour: when I i.e. kill the JVM manually, the wrapper brings it back online. And due to the OOM situation - more precisely, wrapper and JVM were limited by 1024 processes max in ulimits - the wrapper was not able i.e. to fork a command and that could explain why recovery stalled. Likely is that other wrapper / JVM tandems on the same machine (20-30 tandems) faced the same trouble and tried to recover, which would mean sometimes the ceiling was reached, sometimes not (i.e. when yet another jvm with many threads was killed or die). Does this makes sense to you?
Should I look into updating my script to interpret the output of "app.sh status" concerning certain Java:__ states and kill the wrapper ?
(in such a case the veritas cluster would consider the resource being offline and start the wrapper again).
If that is a good idea depends on the amount of states to consider and for how long such a state can be tolerated. Maybe it is paranoid, since our box is very big, we should be fine concerning OOM unless we screw up settings again. We're newbies on Linux, used windows for years.
Cheers,
Christoph
From: Leif Mortenson [mailto:lei...@ta...]
Sent: 09 January 2019 03:10
To: Wrapper User List <wra...@li...>
Subject: [EXT] Re: [Wrapper-user] no JVM running (state: DOWN_CLEAN) on linux after OOM
Christoph
1) Could you please send me the wrapper.log file with debug output enabled (wrapper.debug=true) that shows what is happening when the Wrapper is failing to restart the JVM?
Please include the part of the log showing the last few moments of the JVM that runs out of memory as well.
2) What version of the Wrapper are you running?
The following issue was fixed in 3.5.16 and sounds like it might be what you are seeing.
https://wrapper.tanukisoftware.com/doc/english/release-notes.html#3.5.16<https://clicktime.symantec.com/a/1/s_mZYlanJcqYJWQ55URpsksoMfAB69FuqpCaCaHcFZI=?d=lq4ISNO68RaA_a5U2L38JAI42nrP-Lj0_jQA4RKR0ryTRdGXvAEAfHiDUn-vKdryduqkwm-zX0YYsOECXFXDc6niuyt7Ae837n0-wWAZ8u99Nabj6hxgw76Xg8rXhtHV8FEA0rrzVL_1TAZuUAMX2ztmAkWA0qdhQO1XYUkMswad3bsnlUv2XxQZ09Oc1lbfNAXv0DNlGOaVnU6lrHEJobFamicDkAhsG_GVSZVC9oI_NjgxAcJ-M7XOvhLaol54ep5LiB5j_uxRx-67kzXJbZT0fZIK8-9mNXr7t7qXXF3EHeUiqKaJuWdkuTfMfI_ZzmE2QhUiHCnSvJmRfKZPZ8K_jzVJBlUz0PDGfOAqzIOVsQmLsYSkVxRtrXkK_DwR_O_u91EdthCtNsTLDOxUBJzYFmWLI6CrV_jpYReCYAthEio3DegMb4kU9fCvs37XzsrCLlh41tLw87m9neyQHU9F5aZIyZY1&u=https%3A%2F%2Fwrapper.tanukisoftware.com%2Fdoc%2Fenglish%2Frelease-notes.html%233.5.16>
---
Fix a problem where a JVM process was not stopped completely on a UNIX platform and stayed defunct after a forced kill until the Wrapper process itself stopped. This was especially noticeable if the JVM is frozen and the JVM is being killed forcibly.
---
Are you seeing a zombie Java process still running?
This bug meant that the JVM was being left around in the background when the Wrapper thought it was gone.
If you are out of memory then the next JVM would not have enough memory to launch.
If the first JVM is not actually frozen, it would shut itself down after losing its backend connection to the Wrapper. But that might be happening too late and result in what you are seeing.
3) The DOWN_CLEAN state means that the Wrapper has completely shutdown the JVM and cleaned up any associated resources.
We will take a look at the documentation on the following page as you are correct that it is missing some information.
https://wrapper.tanukisoftware.com/doc/english/prop-java-statusfile.html<https://clicktime.symantec.com/a/1/pFsMh63Y_XDBdbw7xETbo40_Uhah2ByBR7xqKlm8s8w=?d=lq4ISNO68RaA_a5U2L38JAI42nrP-Lj0_jQA4RKR0ryTRdGXvAEAfHiDUn-vKdryduqkwm-zX0YYsOECXFXDc6niuyt7Ae837n0-wWAZ8u99Nabj6hxgw76Xg8rXhtHV8FEA0rrzVL_1TAZuUAMX2ztmAkWA0qdhQO1XYUkMswad3bsnlUv2XxQZ09Oc1lbfNAXv0DNlGOaVnU6lrHEJobFamicDkAhsG_GVSZVC9oI_NjgxAcJ-M7XOvhLaol54ep5LiB5j_uxRx-67kzXJbZT0fZIK8-9mNXr7t7qXXF3EHeUiqKaJuWdkuTfMfI_ZzmE2QhUiHCnSvJmRfKZPZ8K_jzVJBlUz0PDGfOAqzIOVsQmLsYSkVxRtrXkK_DwR_O_u91EdthCtNsTLDOxUBJzYFmWLI6CrV_jpYReCYAthEio3DegMb4kU9fCvs37XzsrCLlh41tLw87m9neyQHU9F5aZIyZY1&u=https%3A%2F%2Fwrapper.tanukisoftware.com%2Fdoc%2Fenglish%2Fprop-java-statusfile.html>
Cheers,
Leif
On Tue, Jan 8, 2019 at 8:32 PM Christoph SCHWAIGER <csc...@am...<mailto:csc...@am...>> wrote:
CONFIDENTIAL & RESTRICTED
Hello Leif,
Thanks for the information about the subscription. I did so.
We have been using the wrapper on windows for many years, since a couple of years we have a standard support version.
Our problem is on linux RH. After an out of memory situation (the jvm exited) it is not restarted and remains down indefinitely, the status script exits with status zero, so all looks up for the cluster. (integrated into veritas cluster). The OOM was bad: not related to JVM, but caused by overly optimistic ulimits of the user - that has been corrected.
STATUS | wrapper | 2019/01/07 13:38:41 | Launching a JVM...
INFO | jvm 1 | 2019/01/07 13:38:43 | WrapperManager: Initializing...
INFO | jvm 1 | 2019/01/07 13:38:45 | S-Check version 3.0.4 Monte Rosa from 12-Sep-2018 08:02 by cschwaiger
INFO | jvm 1 | 2019/01/07 13:38:45 | Scheck is starting on server MUCTXP5B
INFO | jvm 1 | 2019/01/07 13:38:52 | parsed 1 xml files and created 0 service records.
INFO | jvm 1 | 2019/01/07 15:02:11 | Exception in thread "InactivityMonitor WriteCheck" java.lang.OutOfMemoryError: unable to create new native thread
STATUS | wrapper | 2019/01/07 15:02:11 | The JVM has run out of memory. Restarting JVM.
INFO | jvm 1 | 2019/01/07 15:02:11 | at java.lang.Thread.start0(Native Method)
INFO | jvm 1 | 2019/01/07 15:02:11 | at java.lang.Thread.start(Thread.java:717)
INFO | jvm 1 | 2019/01/07 15:02:11 | at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
INFO | jvm 1 | 2019/01/07 15:02:11 | at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
INFO | jvm 1 | 2019/01/07 15:02:11 | at org.apache.activemq.transport.InactivityMonitor.writeCheck(InactivityMonitor.java:147)
INFO | jvm 1 | 2019/01/07 15:02:11 | at org.apache.activemq.transport.InactivityMonitor$2.run(InactivityMonitor.java:113)
INFO | jvm 1 | 2019/01/07 15:02:11 | at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
INFO | jvm 1 | 2019/01/07 15:02:11 | at java.util.TimerThread.mainLoop(Timer.java:555)
INFO | jvm 1 | 2019/01/07 15:02:11 | at java.util.TimerThread.run(Timer.java:505)
ERROR | wrapper | 2019/01/07 15:02:45 | Shutdown failed: Timed out waiting for signal from JVM.
ERROR | wrapper | 2019/01/07 15:02:46 | JVM did not exit on request, termination requested.
STATUS | wrapper | 2019/01/07 15:02:46 | JVM received a signal SIGKILL (9).
STATUS | wrapper | 2019/01/07 15:02:46 | JVM process is gone.
STATUS | wrapper | 2019/01/07 15:02:46 | JVM exited after being requested to terminate.
STATUS | wrapper | 2019/01/07 15:02:50 | Reloading Wrapper configuration...
STATUS | wrapper | 2019/01/07 15:02:50 | Launching a JVM...
[scheck@muctxp5b scheck_unix11]$ ./scheck.sh status
Service check monitoring instance (not installed) is running: PID:56766, Wrapper:STARTED, Java:DOWN_CLEAN
I could not find the DOWN_CLEAN state documented - looked at: https://wrapper.tanukisoftware.com/doc/english/prop-java-statusfile.html<https://clicktime.symantec.com/a/1/pFsMh63Y_XDBdbw7xETbo40_Uhah2ByBR7xqKlm8s8w=?d=lq4ISNO68RaA_a5U2L38JAI42nrP-Lj0_jQA4RKR0ryTRdGXvAEAfHiDUn-vKdryduqkwm-zX0YYsOECXFXDc6niuyt7Ae837n0-wWAZ8u99Nabj6hxgw76Xg8rXhtHV8FEA0rrzVL_1TAZuUAMX2ztmAkWA0qdhQO1XYUkMswad3bsnlUv2XxQZ09Oc1lbfNAXv0DNlGOaVnU6lrHEJobFamicDkAhsG_GVSZVC9oI_NjgxAcJ-M7XOvhLaol54ep5LiB5j_uxRx-67kzXJbZT0fZIK8-9mNXr7t7qXXF3EHeUiqKaJuWdkuTfMfI_ZzmE2QhUiHCnSvJmRfKZPZ8K_jzVJBlUz0PDGfOAqzIOVsQmLsYSkVxRtrXkK_DwR_O_u91EdthCtNsTLDOxUBJzYFmWLI6CrV_jpYReCYAthEio3DegMb4kU9fCvs37XzsrCLlh41tLw87m9neyQHU9F5aZIyZY1&u=https%3A%2F%2Fwrapper.tanukisoftware.com%2Fdoc%2Fenglish%2Fprop-java-statusfile.html>
"scheck.sh stop" fails - indefinitely waits for wrapper to stop. A simple kill <pid> terminates it.
Any recommendations - i.e. measures to avoid hanging in the "looks good = status zero, but down" state?
Below/attached is the information about os version and configuration.
Thanks in advance,
Christoph
Linux muctxp5b 2.6.32-754.3.5.el6.x86_64 #1 SMP Thu Aug 9 11:56:22 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
|