From: Maxime <ma...@ta...> - 2019-04-11 03:54:17
|
Dear Milind Rao,

Thank you for your email.

Normally the Wrapper should only launch org.springframework.boot.loader.JarLauncher. Spring Boot will then in turn scan the manifest and launch com.example.PaymentBridge with the dependencies specified in the manifest (inside BOOT-INF/lib/). I am wondering why Spring Boot is failing to do this... It would be useful to see the full log file to better understand what is happening. If possible, can you send it to su...@ta...? You may also set wrapper.debug=TRUE in your configuration file to get detailed output.

Did you try to run your application with Spring Boot 2 without the Wrapper (without extracting the Jar and editing the classpath)? Do you encounter the same issue?

Best Regards,

Maxime
|
|
From: Milind R. <mi...@gm...> - 2019-04-10 20:01:39
|
I'm using Spring Boot to create an executable jar with an embedded
Tomcat container.
This is the relevant information from the MANIFEST.MF file
Spring-Boot-Version: 2.0.5.RELEASE
Main-Class: org.springframework.boot.loader.JarLauncher
Start-Class: com.example.PaymentBridge
Spring-Boot-Classes: BOOT-INF/classes/
Spring-Boot-Lib: BOOT-INF/lib/
Created-By: Apache Maven 3.0.5
Build-Jdk: 1.8.0_191
I can run the jar file on Linux with no problem.
java -jar example.jar
I used method 4 to wrap the jar file and when I run it, I get an error
Caused by: java.lang.RuntimeException: XPathFactory#newInstance()
failed to create an XPathFactory for the default object model:
http://java.sun.com/jaxp/xpath/dom with the
XPathFactoryConfigurationException:
javax.xml.xpath.XPathFactoryConfigurationException:
java.util.ServiceConfigurationError: javax.xml.xpath.XPathFactory:
Provider com.saxonica.config.EnterpriseXPathFactory could not be
instantiated
This is because it couldn't find the Saxon-EE-9.5.1.9.jar file which is
in the BOOT-INF/lib/ directory of the spring boot jar file.
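For context, the failing call is the standard JAXP factory lookup, which scans the classpath for registered providers and falls back to the JDK's built-in implementation. A minimal sketch of that lookup succeeding (unrelated to Saxon; uses only the JDK):

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathLookupDemo {
    public static void main(String[] args) throws Exception {
        // ServiceLoader-based lookup: consults
        // META-INF/services/javax.xml.xpath.XPathFactory entries on the
        // classpath; if a listed provider class cannot be loaded (as with
        // the missing Saxon jar above), newInstance() fails instead of
        // falling back silently.
        XPathFactory factory = XPathFactory.newInstance();
        System.out.println(factory.getClass().getName());

        // Evaluate a trivial expression to show the factory works.
        String result = factory.newXPath()
                .evaluate("1+2", new InputSource(new StringReader("<r/>")));
        System.out.println(result); // "3"
    }
}
```

This is only meant to illustrate why a broken provider entry surfaces as an XPathFactoryConfigurationException rather than an ordinary ClassNotFoundException at use time.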
As a test, I pulled all the jars out of the BOOT-INF/lib directory and
copied them into a lib2 directory, then added the following property to
the wrapper.conf file:
wrapper.java.classpath.2=../lib2/*.jar
and it worked.
Clearly I don't want to do that for every jar by hand, and I don't want
to have to repeat it every time my dependencies change.
How can I get the wrapper to add all the embedded jars to the classpath?
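For reference, the brute-force workaround described above corresponds to a wrapper.conf fragment roughly like the following (the paths and the entry-1 jar name are illustrative, not taken from a real configuration):

```ini
# Classpath entry 1: the Spring Boot jar itself (illustrative path)
wrapper.java.classpath.1=../lib/example-bridge-0.1.0-SNAPSHOT.jar
# Classpath entry 2: the manually extracted copies of the BOOT-INF/lib jars
wrapper.java.classpath.2=../lib2/*.jar
```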
On startup, I do see this line.
Application started with classpath: [
jar:file:/opt/example/0.1.0/bin/../lib/example-bridge-0.1.0-SNAPSHOT.jar!/BOOT-INF/classes!/,
jar:file:/opt/example/0.1.0/bin/../lib/example-bridge-0.1.0-SNAPSHOT.jar!/BOOT-INF/lib/spring-boot-starter-web-2.0.5.RELEASE.jar!/,
jar:file:/opt/example/0.1.0/bin/../lib/example-bridge-0.1.0-SNAPSHOT.jar!/BOOT-INF/lib/spring-boot-starter-2.0.5.RELEASE.jar!/,
...,
jar:file:/opt/example/0.1.0/bin/../lib/example-bridge-0.1.0-SNAPSHOT.jar!/BOOT-INF/lib/Saxon-EE-9.5.1.9.jar!/,
jar:file:/opt/example/0.1.0/bin/../lib/example-bridge-0.1.0-SNAPSHOT.jar!/BOOT-INF/lib/xercesImpl-2.10.0.jar!
On Linux the classpath is separated by ':'; I'm not sure if the comma
between the jars is causing a problem or what.
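As an aside: the comma-separated form in that startup line is most likely just how Java prints a collection of URLs, not the literal classpath string passed to the JVM. The separator Java actually uses is exposed as File.pathSeparator (a small sketch to illustrate; not part of the Wrapper):

```java
import java.io.File;

public class PathSeparatorDemo {
    public static void main(String[] args) {
        // ':' on Linux/macOS, ';' on Windows.
        System.out.println("separator: " + File.pathSeparator);

        // The same value is also available as a system property.
        System.out.println(File.pathSeparator
                .equals(System.getProperty("path.separator"))); // true
    }
}
```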
Any help would be appreciated.
|
|
From: Maxime <ma...@ta...> - 2019-04-08 06:22:09
|
Etienne,

We just released version 3.5.38 of the Wrapper, which adds a set of properties to change the group of the files created by the Wrapper. The group can be set globally for all files (with the wrapper.group property), or individually for each file (log file, PID file, anchor file, etc.). Please check the following page for details: https://wrapper.tanukisoftware.com/doc/english/prop-group.html

This new version contains several other improvements, which are listed in our release notes: https://wrapper.tanukisoftware.com/doc/english/release-notes.html

It can be downloaded from the following link: https://wrapper.tanukisoftware.com/doc/english/download.jsp

Best Regards,

The Java Service Wrapper Team

On Wed, Sep 12, 2018 at 9:33 PM Etienne Jouvin <lap...@gm...> wrote:
> And finally.
>
> What I did, because I work on a Linux platform.
> Log folder is /var/log/jenkins
>
> And I execute the following command:
> chmod g+s /var/log/jenkins
>
> Regards
>
> Etienne Jouvin
>
> On Wed, Sep 12, 2018 at 14:10, Etienne Jouvin <lap...@gm...> wrote:
>> Ok fine.
>>
>> I was "just" wondering, and I have the answer.
>>
>> I will manage it in another way.
>>
>> Regards
>>
>> Etienne
>>
>> On Wed, Sep 12, 2018 at 04:16, Maxime Andrighetto <max...@ta...> wrote:
>>> Etienne
>>>
>>> Sorry, I was mistaken. It is possible to change the group if your user
>>> is the owner of the file and also belongs to the group.
>>> However, there are currently no configuration properties to change the
>>> group. This is something that we will add in a future version,
>>> similar to the umask properties.
>>>
>>> In the meantime, I can suggest the following workaround, even though
>>> it is not straightforward:
>>>
>>> When the Wrapper starts and whenever the log file changes, a
>>> notification is sent to the JVM. In response, the WrapperManager will
>>> raise a Java event which you could subscribe to.
>>> For this you need to have a class that implements the >>> WrapperEventListener interface. Basically you need to have a fired() method >>> which receives a WrapperEvent instance, check that this instance is of type >>> WrapperLogFileChangedEvent, and then execute your code to change the group >>> of the log file. >>> >>> If you use the professional edition, you can trigger a User event from >>> the Java code and execute a shell script in response which would update the >>> group of the log file whenever it changes. Alternatively, you could also >>> use timers to regularly ensure that the log file has the correct group and >>> update it if needed. >>> >>> Please let me know if you need further details on one of the above >>> methods. >>> >>> Best Regards, >>> >>> Maxime >>> >>> >>> On Wed, Sep 12, 2018 at 9:16 AM, Maxime Andrighetto < >>> max...@ta...> wrote: >>> >>>> Etienne >>>> >>>> Thank you for your reply. >>>> >>>> Unfortunately it is not possible to change the group of the log file >>>> without having the root/sudo permission. >>>> So this is not possible when the Wrapper is running with the jenkins >>>> user. >>>> >>>> You will have to edit the ownership of your file with a linux command >>>> or manually, using the root user. >>>> >>>> Best Regards, >>>> >>>> Maxime >>>> >>>> On Tue, Sep 11, 2018 at 7:37 PM, Etienne Jouvin < >>>> lap...@gm...> wrote: >>>> >>>>> Hello. >>>>> >>>>> In fact, I am using Wrapper with the projet Jenkins Runner: >>>>> https://github.com/mnadeem/JenkinsRunner >>>>> >>>>> The service is run as a specific user, let's say "jenkins". >>>>> >>>>> As I am using it under "Ubuntu", I wanted to centralize logs as it is >>>>> done. >>>>> So I created a folder /var/log/jenkins, and logs are created with name >>>>> like jenkins.log. 
>>>>> >>>>> What I wanted, is to have permissions for owner jenkins, and group >>>>> adm, as if I did something like this : >>>>> mkdir /var/log/jenkins >>>>> chown jenkins:adm /var/log/jenkins >>>>> >>>>> But when log files are created, the ownership is something like >>>>> jenkins:jenkins. Group may comes from the default group for user jenkins. >>>>> But I do not want to put user jenkins in group adm by default, because this >>>>> is not an administrator. >>>>> >>>>> So in fact, this is not a matter of changing the owner (my bad for the >>>>> description), but more changing the group. >>>>> >>>>> If not possible, I will find a way to do it with configuration on the >>>>> LInux system. >>>>> >>>>> Regards >>>>> >>>>> Etienne Jouvin >>>>> >>>>> >>>>> >>>>> Le mar. 11 sept. 2018 à 04:17, Maxime <ma...@ta...> a >>>>> écrit : >>>>> >>>>>> Etienne >>>>>> >>>>>> Thank you for your email. >>>>>> >>>>>> Are you running the Wrapper as root? >>>>>> The Wrapper can change the permissions of the log file because it is >>>>>> owner of it, but it cannot change the ownership (this would require running >>>>>> itself as root anyway). >>>>>> The Wrapper creates the log file, writes in it and rolls it if >>>>>> needed, so usually the user of the Wrapper process is also the owner of the >>>>>> log file. >>>>>> For this reason there is currently no property to change the owner of >>>>>> the log file. >>>>>> >>>>>> May I ask the use case in which you need to have the owner of the log >>>>>> file different than the user of the Wrapper? >>>>>> >>>>>> Best Regards, >>>>>> >>>>>> Maxime >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Wrapper-user mailing list >>>>> Wra...@li... >>>>> https://lists.sourceforge.net/lists/listinfo/wrapper-user >>>>> >>>>> >>>> >>>> >>>> -- >>>> Maxime Andrighetto >>>> Tanuki Software Ltd. 
>>>> 6-18-10-4F Nishi-Kasai, Edogawa-ku >>>> Tokyo 134-0088 Japan >>>> Tel: +81-3-3878-3211 >>>> Fax: +81-3-3878-0313 >>>> http://www.tanukisoftware.com >>>> >>> >>> >>> >>> -- >>> Maxime Andrighetto >>> Tanuki Software Ltd. >>> 6-18-10-4F Nishi-Kasai, Edogawa-ku >>> Tokyo 134-0088 Japan >>> Tel: +81-3-3878-3211 >>> Fax: +81-3-3878-0313 >>> http://www.tanukisoftware.com >>> _______________________________________________ >>> Wrapper-user mailing list >>> Wra...@li... >>> https://lists.sourceforge.net/lists/listinfo/wrapper-user >>> >> _______________________________________________ > Wrapper-user mailing list > Wra...@li... > https://lists.sourceforge.net/lists/listinfo/wrapper-user > |
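Translated into configuration, the announcement above amounts to something like the fragment below. The global wrapper.group property is named in the announcement itself; the per-file property name is an assumption, so verify the exact spelling against the prop-group documentation page linked above:

```ini
# Group for all files created by the Wrapper (log, PID, anchor, status...)
wrapper.group=adm
# Assumed per-file override for the log file (check prop-group docs)
wrapper.logfile.group=adm
```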
|
From: Maxime <ma...@ta...> - 2019-04-08 03:30:25
|
Christoph,

We just released a new version of the Wrapper (3.5.38), which fixes the problem where the Wrapper could hang trying to read from the output pipe of a JVM if the fork to launch the JVM process failed.

This new version contains several other improvements, which are listed in our release notes: https://wrapper.tanukisoftware.com/doc/english/release-notes.html

Since it has just been released, it is currently marked as the "Latest Release" on our website, and will be advised for production after we get enough feedback that it is running without problems (generally after 2 weeks). https://wrapper.tanukisoftware.com/doc/english/download.jsp

We will be happy to hear any feedback you may have.

Best Regards,

The Java Service Wrapper Team

On Mon, Jan 28, 2019 at 10:33 PM Christoph SCHWAIGER <csc...@am...> wrote:
> CONFIDENTIAL & RESTRICTED
>
> Hello Leif,
>
> Yes, the error message is in the wrapper log. Indeed it is only in the
> two logs of the resources which caused problems when stopped. I was
> mistaken that it is not OOM related - processes had been missed for
> restart since the OOM problem. On my side I added a cleanup script to
> the cluster setup, which will kill hanging processes if something goes
> wrong.
>
> Thanks for your help and the fix!
>
> I admit, running into OOM is by far the biggest problem.
>
> Cheers
>
> Christoph
>
> *From:* Leif Mortenson [mailto:lei...@ta...]
> *Sent:* 28 January 2019 08:23
> *To:* Wrapper User List <wra...@li...>
> *Subject:* Re: [Wrapper-user] [EXT] Re: no JVM running (state: DOWN_CLEAN) on linux after OOM
>
> Christoph
>
> Thank you for the very detailed analysis.
> This is in line with what we found and are testing a fix for.
>
> If we play with ulimit to allow exactly enough processes to be able to
> launch the Wrapper, but not the JVM, then the Wrapper will fail to fork
> the JVM and fall into its error code.
> > Prior to the fork, the wrapper was opening a set of pipes whose opposite > end are normally used by the child process. > > The problem was that the error code on failed fork was not correctly > closing down those pipes. > > > > Then later on in the main loop, the Wrapper was attempting to read child > output from those pipes. > > The system calls appear to block on those reads even when non-blocking > mode is set in the case that the child has not yet connected. > > The fix is to simply close those pipes in the error code. > > > > This is actually a very old bug. It has not been seen before because most > errors involve a successful fork followed by an error launching the JVM. > > That error code was working. So this problem is very specific to this > exact low resource state. > > > > Never the less this is a fairly critical problem as it could affect anyone > in this situation. > > When it happens though, all the Wrapper can really do is shutdown anyway. > So even when the bug is fixed, there are still going to be resource > problems that must be resolved. > > The Wrapper will of course shutdown cleanly rather than hanging. > > > > The only thing we were not sure about is that in our tests, we were always > seeing a FATAL error "Could not spawn JVM process" in the log file. > > This was not in the log that you sent. Can you confirm whether or not you > are seeing that? > > > > Assuming all tests go well, this will be in the upcoming 3.5.38. > > > > Unfortunately there is not currently a workaround for this other than > making sure that there are enough free processes to launch the Wrapper and > fork. > > > > Cheers, > > Leif > > > > On Fri, Jan 25, 2019 at 11:35 PM Christoph SCHWAIGER < > csc...@am...> wrote: > > CONFIDENTIAL & RESTRICTED > > > > Hello Leif, > > > > Thanks for answering that – could have checked myself. > > > > But that is not why I contact you again. 
The problem occurred again while > the infrastructure was in a healthy state, I looked a bit deeper. > > > > Background is that the cluster wanted to switch over resources to the > other server and performed local ./scheck.sh (the name of our wrapper > script) stop, but the stop never finished: I had a look to the shell > processes still lingering around. > > > > The status output is as it was last time: > > [scheck@muctxp5b scheck_tcpbatch]$ ./scheck.sh status > > Service check monitoring instance (not installed) is running: PID:9414, > Wrapper:STOPPING, Java:DOWN_CLEAN > > [scheck@muctxp5b scheck_ tcpbatch]$ echo $? > > 0 > > > > Here is the wrapper and the tree of the hanging commands (parent first) > performing the stop: > > [scheck@muctxp5b scheck]$ ps -elf | grep tcpbatch > > 1 S scheck 9411 1 0 80 0 - 29194 pipe_w Jan04 ? 00:05:40 > /opt/scheck/muctxp5j/scheck_tcpbatch/./wrapper > /opt/scheck/muctxp5j/scheck_tcpbatch/conf/wrapper.conf > wrapper.syslog.ident=scheck wrapper.pidfile=/opt/scheck/muctxp5 > > j/scheck_tcpbatch/./scheck.pid wrapper.daemonize=TRUE wrapper.name > <https://clicktime.symantec.com/3RrHqwUUzytgyMHF2iCct2v6H2?u=http%3A%2F%2Fwrapper.name>=scheck > wrapper.displayname=Service check monitoring instance > wrapper.statusfile=/opt/scheck/muctxp5j/scheck_tcpbatch/./scheck.status > wrapper.java.statusfile=/opt/scheck/muctxp5j/scheck_tcpbatch/./scheck.java.status > wrapper.script.version=3.5.30 > > 0 S scheck 13900 40818 0 80 0 - 25832 pipe_w 14:41 pts/0 00:00:00 > grep tcpbatch > > 4 S scheck 23388 1 0 80 0 - 26529 do_wai Jan23 ? 00:00:00 > bash -c USER=scheck; export USER; LOGNAME=sch > > eck; export LOGNAME; HOME=/home/scheck; export HOME; > /opt/scheck/resources.sh muctxp5j scheck_tcpbatch stop > > 0 S scheck 23394 23388 0 80 0 - 26529 do_wai Jan23 ? 00:00:00 > /bin/bash /opt/scheck/resources.sh muctxp5j scheck_tcpbatch stop > > 0 S scheck 23398 23394 0 80 0 - 26758 do_wai Jan23 ? 
00:02:31 > /bin/sh /opt/scheck/muctxp5j/scheck_tcpbatch/scheck.sh stop > > > > I attached to the scheck.sh script and see it looping, every second > performing some syscalls, which I guess is this function: > > waitforwrapperstop() { > > getpid > > while [ "X$pid" != "X" ] ; do > > sleep 1 > > getpid > > done > > } > > > > So I had a look to the wrapper process 9411, which did not want to vanish, > with strace: > > Process 9411 attached > > 15:31:10 read(5, > > > > And see file descriptor 5 is pipe # 813574553: > > scheck@muctxp5b 9411]$ ls -lr fd > > total 0 > > l-wx------ 1 scheck scheck 64 Jan 7 13:25 6 -> pipe:[813574553] (this > is shown in red color - assume since pipe is broken) > > lr-x------ 1 scheck scheck 64 Jan 4 12:48 5 -> pipe:[813574553] (this > is shown in red color - assume since pipe is broken) > > lrwx------ 1 scheck scheck 64 Jan 4 12:48 4 -> > /opt/scheck/muctxp5j/scheck_tcpbatch/log/wrapper_donotmonitor.log > > lrwx------ 1 scheck scheck 64 Jan 4 12:49 3 -> socket:[813555625] (this > is shown in red color - assume since pipe is broken) > > lrwx------ 1 scheck scheck 64 Jan 4 12:48 2 -> /dev/null > > lrwx------ 1 scheck scheck 64 Jan 4 12:48 1 -> /dev/null > > lrwx------ 1 scheck scheck 64 Jan 4 12:48 0 -> /dev/null > > > > concerning stackoverflow, a grep of the # from lsof should list both sides > of the pipe, but it shows only read/write of one side. Unless I > misunderstood. > > [scheck@muctxp5b 9411]$ lsof | grep 813574553 > > wrapper 9411 scheck 5r FIFO 0,8 0t0 > 813574553 pipe > > wrapper 9411 scheck 6w FIFO 0,8 0t0 > 813574553 pipe > > > > Is the process on other side of the pipe the JVM? Which was, as the status > output indicates, down already. Then for some reason the wrapper process > was still waiting in a read to the pipe to a stopped process – and maybe > because of this remained up. > > > > I didn’t try further, since I have no debugger on the system and no source. 
> > > > When I detected the problem, it was about 10hours after the stop attempt. > > > > As last time, scheck.sh top immediately came back and stopped with wrapper > process! > > > > Hopefully this helps investigating. > > > > Cheers > > Christoph > > *From:* Leif Mortenson [mailto:lei...@ta...] > *Sent:* 10 January 2019 17:27 > *To:* Wrapper User List <wra...@li...> > *Subject:* Re: [Wrapper-user] [EXT] Re: no JVM running (state: > DOWN_CLEAN) on linux after OOM > > > > Christoph > > > > Yes, the following configuration is what causes the Wrapper do restart > based on the text in the console output: > > --- > > wrapper.filter.trigger.1001=java.lang.OutOfMemoryError > > wrapper.filter.action.1001=RESTART > > wrapper.filter.message.1001=The JVM has run out of memory. > > --- > > > > It sounds like you are on the right track. We will see if we can > reproduce something here as well. > > > > Cheers, > > Leif > > > > > > On Wed, Jan 9, 2019 at 10:16 PM Christoph SCHWAIGER < > csc...@am...> wrote: > > CONFIDENTIAL & RESTRICTED > > > > > > Hello Leif, > > > > Those were the very last entries in the wrapper log: > > ERROR | wrapper | 2019/01/07 15:02:45 | Shutdown failed: Timed out > waiting for signal from JVM. > > ERROR | wrapper | 2019/01/07 15:02:46 | JVM did not exit on request, > termination requested. > > STATUS | wrapper | 2019/01/07 15:02:46 | JVM received a signal SIGKILL > (9). > > STATUS | wrapper | 2019/01/07 15:02:46 | JVM process is gone. > > STATUS | wrapper | 2019/01/07 15:02:46 | JVM exited after being requested > to terminate. > > STATUS | wrapper | 2019/01/07 15:02:50 | Reloading Wrapper > configuration... > > STATUS | wrapper | 2019/01/07 15:02:50 | Launching a JVM... > > Time of last entry was the timestamp of java.status file. I had noticed > that one day later. > > > > I didn’t notice this log entry before: > > STATUS | wrapper | 2019/01/07 15:02:11 | The JVM has run out of memory. > Restarting JVM. 
> > …to me it sounds like the wrapper was using the exception (in first email > it the complete log section) to consider JVM should best be restarted due > to running out of memory. Which, in this situation was not the case. I > don’t know why for the JVM heap depletion is the culprit by default when a > thread cannot be spawned. Definitely there are other reasons for failure. > > > > The user was limited to 1k processes max, but for what I read, threads are > counted. In stackoverflow I found this command to count the # of threads > for a user: > > > > ps -eo euser,nlwp | grep scheck | awk '{print $2}' | awk '{ num_threads > += $1 } END { print num_threads }' > > > > Currently it shows 7579 - the JVMs are heavily multithreaded. > > > > Unfortunately I don’t even got a test box to simulate this. It gets time > to get one. > > > > Cheers, > > Christoph > > *From:* Leif Mortenson [mailto:lei...@ta...] > *Sent:* 09 January 2019 10:50 > *To:* Wrapper User List <wra...@li...> > *Subject:* Re: [Wrapper-user] [EXT] Re: no JVM running (state: > DOWN_CLEAN) on linux after OOM > > > > Christoph > > Ok. So you are using a newer version of the Wrapper, so ignore the issue > I mentioned about failing to kill the JVM. That was an old problem. > > > > Please send the debug output if you get it again. > > > > We will play around with the ulimits here as well and make sure the > Wrapper behaves correctly. > > > > I am maybe not understanding the exact problem. > > After you get the OOM and the wrapper tries to restart, is the Wrapper > just failing to start the next JVM and exiting? Or is it getting stuck. > > The later would be bad, and something we will want to get to the bottom of. > > > > It does not sound like this is easily reproduceable. But so, then the > following will output detailed information about the state. It is a LOT of > output though so not realistic unless you are testing. 
> > wrapper.state_output=TRUE > > > > Cheers, > > Leif > > > > On Wed, Jan 9, 2019 at 6:11 PM Christoph SCHWAIGER <csc...@am...> > wrote: > > CONFIDENTIAL & RESTRICTED > > > > Hello Leif, > > > > Thanks for your response. > > > > Easy one first, the version we use: > > [scheck@muctxp5b scheck_unix4]$ ./wrapper --version > > Java Service Wrapper Community Edition 64-bit 3.5.30 > > Copyright (C) 1999-2016 Tanuki Software, Ltd. All Rights Reserved. > > http://wrapper.tanukisoftware.com > <https://clicktime.symantec.com/32PRRMdpdoTcCTFhKkEQTZ96H2?u=http%3A%2F%2Fwrapper.tanukisoftware.com> > > > > concerning the forced kill, I think I have seen once on another instance > and time in the wrapper log something like “..JVM received sigkill (9)..”. > > > > In the case I looked at, the JVM process owned by the wrapper was gone, > which suits the DOWN_CLEAN as you explained. > > > > I’ll turn on debug output on a few of them in case it happens again. > > > > As I interpret it, the configuration as such is OK, as well as the normal > behaviour: when I i.e. kill the JVM manually, the wrapper brings it back > online. And due to the OOM situation – more precisely, wrapper and JVM were > limited by 1024 processes max in ulimits – the wrapper was not able i.e. to > fork a command and that could explain why recovery stalled. Likely is that > other wrapper / JVM tandems on the same machine (20-30 tandems) faced the > same trouble and tried to recover, which would mean sometimes the ceiling > was reached, sometimes not (i.e. when yet another jvm with many threads was > killed or die). Does this makes sense to you? > > > > Should I look into updating my script to interpret the output of “app.sh > status” concerning certain Java:__ states and kill the wrapper ? > > (in such a case the veritas cluster would consider the resource being > offline and start the wrapper again). 
> > If that is a good idea depends on the amount of states to consider and for > how long such a state can be tolerated. Maybe it is paranoid, since our > box is very big, we should be fine concerning OOM unless we screw up > settings again. We’re newbies on Linux, used windows for years. > > > > Cheers, > > Christoph > > > > *From:* Leif Mortenson [mailto:lei...@ta...] > *Sent:* 09 January 2019 03:10 > *To:* Wrapper User List <wra...@li...> > *Subject:* [EXT] Re: [Wrapper-user] no JVM running (state: DOWN_CLEAN) on > linux after OOM > > > > Christoph > > > > 1) Could you please send me the wrapper.log file with debug output enabled > (wrapper.debug=true) that shows what is happening when the Wrapper is > failing to restart the JVM? > > Please include the part of the log showing the last few moments of the JVM > that runs out of memory as well. > > > > 2) What version of the Wrapper are you running? > > The following issue was fixed in 3.5.16 and sounds like it might be what > you are seeing. > > https://wrapper.tanukisoftware.com/doc/english/release-notes.html#3.5.16 > <https://clicktime.symantec.com/a/1/s_mZYlanJcqYJWQ55URpsksoMfAB69FuqpCaCaHcFZI=?d=lq4ISNO68RaA_a5U2L38JAI42nrP-Lj0_jQA4RKR0ryTRdGXvAEAfHiDUn-vKdryduqkwm-zX0YYsOECXFXDc6niuyt7Ae837n0-wWAZ8u99Nabj6hxgw76Xg8rXhtHV8FEA0rrzVL_1TAZuUAMX2ztmAkWA0qdhQO1XYUkMswad3bsnlUv2XxQZ09Oc1lbfNAXv0DNlGOaVnU6lrHEJobFamicDkAhsG_GVSZVC9oI_NjgxAcJ-M7XOvhLaol54ep5LiB5j_uxRx-67kzXJbZT0fZIK8-9mNXr7t7qXXF3EHeUiqKaJuWdkuTfMfI_ZzmE2QhUiHCnSvJmRfKZPZ8K_jzVJBlUz0PDGfOAqzIOVsQmLsYSkVxRtrXkK_DwR_O_u91EdthCtNsTLDOxUBJzYFmWLI6CrV_jpYReCYAthEio3DegMb4kU9fCvs37XzsrCLlh41tLw87m9neyQHU9F5aZIyZY1&u=https%3A%2F%2Fwrapper.tanukisoftware.com%2Fdoc%2Fenglish%2Frelease-notes.html%233.5.16> > > --- > > Fix a problem where a JVM process was not stopped completely on a UNIX > platform and stayed defunct after a forced kill until the Wrapper process > itself stopped. 
This was especially noticeable if the JVM is frozen and the > JVM is being killed forcibly. > > --- > > Are you seeing a zombie Java process still running? > > This bug meant that the JVM was being left around in the background when > the Wrapper thought it was gone. > > If you are out of memory then the next JVM would not have enough memory to > launch. > > If the first JVM is not actually frozen, it would shut itself down after > losing its backend connection to the Wrapper. But that might be happening > too late and result in what you are seeing. > > > > 3) The DOWN_CLEAN state means that the Wrapper has completely shutdown the > JVM and cleaned up any associated resources. > > We will take a look at the documentation on the following page as you are > correct that it is missing some information. > > https://wrapper.tanukisoftware.com/doc/english/prop-java-statusfile.html > <https://clicktime.symantec.com/a/1/pFsMh63Y_XDBdbw7xETbo40_Uhah2ByBR7xqKlm8s8w=?d=lq4ISNO68RaA_a5U2L38JAI42nrP-Lj0_jQA4RKR0ryTRdGXvAEAfHiDUn-vKdryduqkwm-zX0YYsOECXFXDc6niuyt7Ae837n0-wWAZ8u99Nabj6hxgw76Xg8rXhtHV8FEA0rrzVL_1TAZuUAMX2ztmAkWA0qdhQO1XYUkMswad3bsnlUv2XxQZ09Oc1lbfNAXv0DNlGOaVnU6lrHEJobFamicDkAhsG_GVSZVC9oI_NjgxAcJ-M7XOvhLaol54ep5LiB5j_uxRx-67kzXJbZT0fZIK8-9mNXr7t7qXXF3EHeUiqKaJuWdkuTfMfI_ZzmE2QhUiHCnSvJmRfKZPZ8K_jzVJBlUz0PDGfOAqzIOVsQmLsYSkVxRtrXkK_DwR_O_u91EdthCtNsTLDOxUBJzYFmWLI6CrV_jpYReCYAthEio3DegMb4kU9fCvs37XzsrCLlh41tLw87m9neyQHU9F5aZIyZY1&u=https%3A%2F%2Fwrapper.tanukisoftware.com%2Fdoc%2Fenglish%2Fprop-java-statusfile.html> > > > > Cheers, > > Leif > > > > On Tue, Jan 8, 2019 at 8:32 PM Christoph SCHWAIGER <csc...@am...> > wrote: > > CONFIDENTIAL & RESTRICTED > > > > Hello Leif, > > > > Thanks for the information about the subscription. I did so. > > > > We have been using the wrapper on windows for many years, since a couple > of years we have a standard support version. > > > > Our problem is on linux RH. 
*After an out of memory situation (the jvm > exited) it is not restarted and remains down indefinitely, the status > script exits with status zero*, so all looks up for the cluster. > (integrated into veritas cluster). The OOM was bad: not related to JVM, but > caused by overly optimistic ulimits of the user - that has been corrected. > > > > STATUS | wrapper | 2019/01/07 13:38:41 | Launching a JVM... > > INFO | jvm 1 | 2019/01/07 13:38:43 | WrapperManager: Initializing... > > INFO | jvm 1 | 2019/01/07 13:38:45 | S-Check version 3.0.4 Monte Rosa > from 12-Sep-2018 08:02 by cschwaiger > > INFO | jvm 1 | 2019/01/07 13:38:45 | Scheck is starting on server > MUCTXP5B > > INFO | jvm 1 | 2019/01/07 13:38:52 | parsed 1 xml files and created 0 > service records. > > INFO | jvm 1 | 2019/01/07 15:02:11 | Exception in thread > "InactivityMonitor WriteCheck" java.lang.OutOfMemoryError: unable to create > new native thread > > STATUS | wrapper | 2019/01/07 15:02:11 | The JVM has run out of memory. > Restarting JVM. 
> > INFO | jvm 1 | 2019/01/07 15:02:11 | at > java.lang.Thread.start0(Native Method) > > INFO | jvm 1 | 2019/01/07 15:02:11 | at > java.lang.Thread.start(Thread.java:717) > > INFO | jvm 1 | 2019/01/07 15:02:11 | at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > > INFO | jvm 1 | 2019/01/07 15:02:11 | at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) > > INFO | jvm 1 | 2019/01/07 15:02:11 | at > org.apache.activemq.transport.InactivityMonitor.writeCheck(InactivityMonitor.java:147) > > INFO | jvm 1 | 2019/01/07 15:02:11 | at > org.apache.activemq.transport.InactivityMonitor$2.run(InactivityMonitor.java:113) > > INFO | jvm 1 | 2019/01/07 15:02:11 | at > org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33) > > INFO | jvm 1 | 2019/01/07 15:02:11 | at > java.util.TimerThread.mainLoop(Timer.java:555) > > INFO | jvm 1 | 2019/01/07 15:02:11 | at > java.util.TimerThread.run(Timer.java:505) > > ERROR | wrapper | 2019/01/07 15:02:45 | Shutdown failed: Timed out > waiting for signal from JVM. > > ERROR | wrapper | 2019/01/07 15:02:46 | JVM did not exit on request, > termination requested. > > STATUS | wrapper | 2019/01/07 15:02:46 | JVM received a signal SIGKILL > (9). > > STATUS | wrapper | 2019/01/07 15:02:46 | JVM process is gone. > > STATUS | wrapper | 2019/01/07 15:02:46 | JVM exited after being requested > to terminate. > > STATUS | wrapper | 2019/01/07 15:02:50 | Reloading Wrapper > configuration... > > STATUS | wrapper | 2019/01/07 15:02:50 | Launching a JVM... 
> > > > [scheck@muctxp5b scheck_unix11]$ ./scheck.sh status > > *Service check monitoring instance (not installed) is running: PID:56766, > Wrapper:STARTED, Java:DOWN_CLEAN* > > > > I could not find the DOWN_CLEAN state documented – looked at: > https://wrapper.tanukisoftware.com/doc/english/prop-java-statusfile.html > <https://clicktime.symantec.com/a/1/pFsMh63Y_XDBdbw7xETbo40_Uhah2ByBR7xqKlm8s8w=?d=lq4ISNO68RaA_a5U2L38JAI42nrP-Lj0_jQA4RKR0ryTRdGXvAEAfHiDUn-vKdryduqkwm-zX0YYsOECXFXDc6niuyt7Ae837n0-wWAZ8u99Nabj6hxgw76Xg8rXhtHV8FEA0rrzVL_1TAZuUAMX2ztmAkWA0qdhQO1XYUkMswad3bsnlUv2XxQZ09Oc1lbfNAXv0DNlGOaVnU6lrHEJobFamicDkAhsG_GVSZVC9oI_NjgxAcJ-M7XOvhLaol54ep5LiB5j_uxRx-67kzXJbZT0fZIK8-9mNXr7t7qXXF3EHeUiqKaJuWdkuTfMfI_ZzmE2QhUiHCnSvJmRfKZPZ8K_jzVJBlUz0PDGfOAqzIOVsQmLsYSkVxRtrXkK_DwR_O_u91EdthCtNsTLDOxUBJzYFmWLI6CrV_jpYReCYAthEio3DegMb4kU9fCvs37XzsrCLlh41tLw87m9neyQHU9F5aZIyZY1&u=https%3A%2F%2Fwrapper.tanukisoftware.com%2Fdoc%2Fenglish%2Fprop-java-statusfile.html> > > > > ”scheck.sh stop” fails – indefinitely waits for wrapper to stop. A simple > kill <pid> terminates it. > > > > Any recommendations – i.e. measures to avoid hanging in the “looks good = > status zero, but down” state? > > > > Below/attached is the information about os version and configuration. > > > > Thanks in advance, > > Christoph > > > > Linux muctxp5b 2.6.32-754.3.5.el6.x86_64 #1 SMP Thu Aug 9 11:56:22 EDT > 2018 x86_64 x86_64 x86_64 GNU/Linux > > _______________________________________________ > Wrapper-user mailing list > Wra...@li... > https://lists.sourceforge.net/lists/listinfo/wrapper-user > |
|
From: Maxime <ma...@ta...> - 2019-04-08 02:45:36
|
Hello everyone,

We are proud to announce the release of version 3.5.38 of the Java Service Wrapper.

http://wrapper.tanukisoftware.org/doc/english/download.jsp

This version includes several bug fixes and improvements, and new distributions for the 64-bit ARMHF platforms (aarch64)!

You can review the release notes for a full list of changes.

http://wrapper.tanukisoftware.org/doc/english/release-notes.html

Please let us know if you have any questions about the release.

Sincerely,
Java Service Wrapper Team
Tanuki Software, Ltd. |
|
From: Christoph S. <csc...@am...> - 2019-01-28 13:33:10
|
CONFIDENTIAL & RESTRICTED Hello Leif, Yes, the error message in in the wrapper log. Indeed it is only in the two logs of the resources which caused problems when stopped. I was mistaken that it is not OOM related - processes were missed to be restarted since the OOM problem. On my side I added a cleanup script to the cluster setup, which will kill hanging processes if something goes wrong. Thanks for your help and the fix! I admit, running into OOM is by far the biggest problem. Cheers Christoph From: Leif Mortenson [mailto:lei...@ta...] Sent: 28 January 2019 08:23 To: Wrapper User List <wra...@li...> Subject: Re: [Wrapper-user] [EXT] Re: no JVM running (state: DOWN_CLEAN) on linux after OOM Christoph Thank you for the very detailed analysis. This is in line with what we found and are testing a fix for. If we play with ulimit to allow exactly enough processes to be able to launch the Wrapper, but not the JVM then the Wrapper will fail to fork the JVM and fall into error code. Prior to the fork, the wrapper was opening a set of pipes whose opposite end are normally used by the child process. The problem was that the error code on failed fork was not correctly closing down those pipes. Then later on in the main loop, the Wrapper was attempting to read child output from those pipes. The system calls appear to block on those reads even when non-blocking mode is set in the case that the child has not yet connected. The fix is to simply close those pipes in the error code. This is actually a very old bug. It has not been seen before because most errors involve a successful fork followed by an error launching the JVM. That error code was working. So this problem is very specific to this exact low resource state. Never the less this is a fairly critical problem as it could affect anyone in this situation. When it happens though, all the Wrapper can really do is shutdown anyway. 
So even when the bug is fixed, there are still going to be resource problems that must be resolved. The Wrapper will of course shutdown cleanly rather than hanging. The only thing we were not sure about is that in our tests, we were always seeing a FATAL error "Could not spawn JVM process" in the log file. This was not in the log that you sent. Can you confirm whether or not you are seeing that? Assuming all tests go well, this will be in the upcoming 3.5.38. Unfortunately there is not currently a workaround for this other than making sure that there are enough free processes to launch the Wrapper and fork. Cheers, Leif On Fri, Jan 25, 2019 at 11:35 PM Christoph SCHWAIGER <csc...@am...<mailto:csc...@am...>> wrote: CONFIDENTIAL & RESTRICTED Hello Leif, Thanks for answering that - could have checked myself. But that is not why I contact you again. The problem occurred again while the infrastructure was in a healthy state, I looked a bit deeper. Background is that the cluster wanted to switch over resources to the other server and performed local ./scheck.sh (the name of our wrapper script) stop, but the stop never finished: I had a look to the shell processes still lingering around. The status output is as it was last time: [scheck@muctxp5b scheck_tcpbatch]$ ./scheck.sh status Service check monitoring instance (not installed) is running: PID:9414, Wrapper:STOPPING, Java:DOWN_CLEAN [scheck@muctxp5b scheck_ tcpbatch]$ echo $? 0 Here is the wrapper and the tree of the hanging commands (parent first) performing the stop: [scheck@muctxp5b scheck]$ ps -elf | grep tcpbatch 1 S scheck 9411 1 0 80 0 - 29194 pipe_w Jan04 ? 
00:05:40 /opt/scheck/muctxp5j/scheck_tcpbatch/./wrapper /opt/scheck/muctxp5j/scheck_tcpbatch/conf/wrapper.conf wrapper.syslog.ident=scheck wrapper.pidfile=/opt/scheck/muctxp5 j/scheck_tcpbatch/./scheck.pid wrapper.daemonize=TRUE wrapper.name<https://clicktime.symantec.com/3RrHqwUUzytgyMHF2iCct2v6H2?u=http%3A%2F%2Fwrapper.name>=scheck wrapper.displayname=Service check monitoring instance wrapper.statusfile=/opt/scheck/muctxp5j/scheck_tcpbatch/./scheck.status wrapper.java.statusfile=/opt/scheck/muctxp5j/scheck_tcpbatch/./scheck.java.status wrapper.script.version=3.5.30 0 S scheck 13900 40818 0 80 0 - 25832 pipe_w 14:41 pts/0 00:00:00 grep tcpbatch 4 S scheck 23388 1 0 80 0 - 26529 do_wai Jan23 ? 00:00:00 bash -c USER=scheck; export USER; LOGNAME=sch eck; export LOGNAME; HOME=/home/scheck; export HOME; /opt/scheck/resources.sh muctxp5j scheck_tcpbatch stop 0 S scheck 23394 23388 0 80 0 - 26529 do_wai Jan23 ? 00:00:00 /bin/bash /opt/scheck/resources.sh muctxp5j scheck_tcpbatch stop 0 S scheck 23398 23394 0 80 0 - 26758 do_wai Jan23 ? 
00:02:31 /bin/sh /opt/scheck/muctxp5j/scheck_tcpbatch/scheck.sh stop I attached to the scheck.sh script and see it looping, every second performing some syscalls, which I guess is this function: waitforwrapperstop() { getpid while [ "X$pid" != "X" ] ; do sleep 1 getpid done } So I had a look to the wrapper process 9411, which did not want to vanish, with strace: Process 9411 attached 15:31:10 read(5, And see file descriptor 5 is pipe # 813574553: scheck@muctxp5b 9411]$ ls -lr fd total 0 l-wx------ 1 scheck scheck 64 Jan 7 13:25 6 -> pipe:[813574553] (this is shown in red color - assume since pipe is broken) lr-x------ 1 scheck scheck 64 Jan 4 12:48 5 -> pipe:[813574553] (this is shown in red color - assume since pipe is broken) lrwx------ 1 scheck scheck 64 Jan 4 12:48 4 -> /opt/scheck/muctxp5j/scheck_tcpbatch/log/wrapper_donotmonitor.log lrwx------ 1 scheck scheck 64 Jan 4 12:49 3 -> socket:[813555625] (this is shown in red color - assume since pipe is broken) lrwx------ 1 scheck scheck 64 Jan 4 12:48 2 -> /dev/null lrwx------ 1 scheck scheck 64 Jan 4 12:48 1 -> /dev/null lrwx------ 1 scheck scheck 64 Jan 4 12:48 0 -> /dev/null concerning stackoverflow, a grep of the # from lsof should list both sides of the pipe, but it shows only read/write of one side. Unless I misunderstood. [scheck@muctxp5b 9411]$ lsof | grep 813574553 wrapper 9411 scheck 5r FIFO 0,8 0t0 813574553 pipe wrapper 9411 scheck 6w FIFO 0,8 0t0 813574553 pipe Is the process on other side of the pipe the JVM? Which was, as the status output indicates, down already. Then for some reason the wrapper process was still waiting in a read to the pipe to a stopped process - and maybe because of this remained up. I didn't try further, since I have no debugger on the system and no source. When I detected the problem, it was about 10hours after the stop attempt. As last time, scheck.sh top immediately came back and stopped with wrapper process! Hopefully this helps investigating. 
Cheers Christoph From: Leif Mortenson [mailto:lei...@ta...<mailto:lei...@ta...>] Sent: 10 January 2019 17:27 To: Wrapper User List <wra...@li...<mailto:wra...@li...>> Subject: Re: [Wrapper-user] [EXT] Re: no JVM running (state: DOWN_CLEAN) on linux after OOM Christoph Yes, the following configuration is what causes the Wrapper do restart based on the text in the console output: --- wrapper.filter.trigger.1001=java.lang.OutOfMemoryError wrapper.filter.action.1001=RESTART wrapper.filter.message.1001=The JVM has run out of memory. --- It sounds like you are on the right track. We will see if we can reproduce something here as well. Cheers, Leif On Wed, Jan 9, 2019 at 10:16 PM Christoph SCHWAIGER <csc...@am...<mailto:csc...@am...>> wrote: CONFIDENTIAL & RESTRICTED Hello Leif, Those were the very last entries in the wrapper log: ERROR | wrapper | 2019/01/07 15:02:45 | Shutdown failed: Timed out waiting for signal from JVM. ERROR | wrapper | 2019/01/07 15:02:46 | JVM did not exit on request, termination requested. STATUS | wrapper | 2019/01/07 15:02:46 | JVM received a signal SIGKILL (9). STATUS | wrapper | 2019/01/07 15:02:46 | JVM process is gone. STATUS | wrapper | 2019/01/07 15:02:46 | JVM exited after being requested to terminate. STATUS | wrapper | 2019/01/07 15:02:50 | Reloading Wrapper configuration... STATUS | wrapper | 2019/01/07 15:02:50 | Launching a JVM... Time of last entry was the timestamp of java.status file. I had noticed that one day later. I didn't notice this log entry before: STATUS | wrapper | 2019/01/07 15:02:11 | The JVM has run out of memory. Restarting JVM. ...to me it sounds like the wrapper was using the exception (in first email it the complete log section) to consider JVM should best be restarted due to running out of memory. Which, in this situation was not the case. I don't know why for the JVM heap depletion is the culprit by default when a thread cannot be spawned. Definitely there are other reasons for failure. 
The user was limited to 1k processes max, but for what I read, threads are counted. In stackoverflow I found this command to count the # of threads for a user: ps -eo euser,nlwp | grep scheck | awk '{print $2}' | awk '{ num_threads += $1 } END { print num_threads }' Currently it shows 7579 - the JVMs are heavily multithreaded. Unfortunately I don't even got a test box to simulate this. It gets time to get one. Cheers, Christoph From: Leif Mortenson [mailto:lei...@ta...<mailto:lei...@ta...>] Sent: 09 January 2019 10:50 To: Wrapper User List <wra...@li...<mailto:wra...@li...>> Subject: Re: [Wrapper-user] [EXT] Re: no JVM running (state: DOWN_CLEAN) on linux after OOM Christoph Ok. So you are using a newer version of the Wrapper, so ignore the issue I mentioned about failing to kill the JVM. That was an old problem. Please send the debug output if you get it again. We will play around with the ulimits here as well and make sure the Wrapper behaves correctly. I am maybe not understanding the exact problem. After you get the OOM and the wrapper tries to restart, is the Wrapper just failing to start the next JVM and exiting? Or is it getting stuck. The later would be bad, and something we will want to get to the bottom of. It does not sound like this is easily reproduceable. But so, then the following will output detailed information about the state. It is a LOT of output though so not realistic unless you are testing. wrapper.state_output=TRUE Cheers, Leif On Wed, Jan 9, 2019 at 6:11 PM Christoph SCHWAIGER <csc...@am...<mailto:csc...@am...>> wrote: CONFIDENTIAL & RESTRICTED Hello Leif, Thanks for your response. Easy one first, the version we use: [scheck@muctxp5b scheck_unix4]$ ./wrapper --version Java Service Wrapper Community Edition 64-bit 3.5.30 Copyright (C) 1999-2016 Tanuki Software, Ltd. All Rights Reserved. 
http://wrapper.tanukisoftware.com<https://clicktime.symantec.com/32PRRMdpdoTcCTFhKkEQTZ96H2?u=http%3A%2F%2Fwrapper.tanukisoftware.com> concerning the forced kill, I think I have seen once on another instance and time in the wrapper log something like "..JVM received sigkill (9)..". In the case I looked at, the JVM process owned by the wrapper was gone, which suits the DOWN_CLEAN as you explained. I'll turn on debug output on a few of them in case it happens again. As I interpret it, the configuration as such is OK, as well as the normal behaviour: when I i.e. kill the JVM manually, the wrapper brings it back online. And due to the OOM situation - more precisely, wrapper and JVM were limited by 1024 processes max in ulimits - the wrapper was not able i.e. to fork a command and that could explain why recovery stalled. Likely is that other wrapper / JVM tandems on the same machine (20-30 tandems) faced the same trouble and tried to recover, which would mean sometimes the ceiling was reached, sometimes not (i.e. when yet another jvm with many threads was killed or die). Does this makes sense to you? Should I look into updating my script to interpret the output of "app.sh status" concerning certain Java:__ states and kill the wrapper ? (in such a case the veritas cluster would consider the resource being offline and start the wrapper again). If that is a good idea depends on the amount of states to consider and for how long such a state can be tolerated. Maybe it is paranoid, since our box is very big, we should be fine concerning OOM unless we screw up settings again. We're newbies on Linux, used windows for years. 
Cheers, Christoph From: Leif Mortenson [mailto:lei...@ta...<mailto:lei...@ta...>] Sent: 09 January 2019 03:10 To: Wrapper User List <wra...@li...<mailto:wra...@li...>> Subject: [EXT] Re: [Wrapper-user] no JVM running (state: DOWN_CLEAN) on linux after OOM Christoph 1) Could you please send me the wrapper.log file with debug output enabled (wrapper.debug=true) that shows what is happening when the Wrapper is failing to restart the JVM? Please include the part of the log showing the last few moments of the JVM that runs out of memory as well. 2) What version of the Wrapper are you running? The following issue was fixed in 3.5.16 and sounds like it might be what you are seeing. https://wrapper.tanukisoftware.com/doc/english/release-notes.html#3.5.16<https://clicktime.symantec.com/a/1/s_mZYlanJcqYJWQ55URpsksoMfAB69FuqpCaCaHcFZI=?d=lq4ISNO68RaA_a5U2L38JAI42nrP-Lj0_jQA4RKR0ryTRdGXvAEAfHiDUn-vKdryduqkwm-zX0YYsOECXFXDc6niuyt7Ae837n0-wWAZ8u99Nabj6hxgw76Xg8rXhtHV8FEA0rrzVL_1TAZuUAMX2ztmAkWA0qdhQO1XYUkMswad3bsnlUv2XxQZ09Oc1lbfNAXv0DNlGOaVnU6lrHEJobFamicDkAhsG_GVSZVC9oI_NjgxAcJ-M7XOvhLaol54ep5LiB5j_uxRx-67kzXJbZT0fZIK8-9mNXr7t7qXXF3EHeUiqKaJuWdkuTfMfI_ZzmE2QhUiHCnSvJmRfKZPZ8K_jzVJBlUz0PDGfOAqzIOVsQmLsYSkVxRtrXkK_DwR_O_u91EdthCtNsTLDOxUBJzYFmWLI6CrV_jpYReCYAthEio3DegMb4kU9fCvs37XzsrCLlh41tLw87m9neyQHU9F5aZIyZY1&u=https%3A%2F%2Fwrapper.tanukisoftware.com%2Fdoc%2Fenglish%2Frelease-notes.html%233.5.16> --- Fix a problem where a JVM process was not stopped completely on a UNIX platform and stayed defunct after a forced kill until the Wrapper process itself stopped. This was especially noticeable if the JVM is frozen and the JVM is being killed forcibly. --- Are you seeing a zombie Java process still running? This bug meant that the JVM was being left around in the background when the Wrapper thought it was gone. If you are out of memory then the next JVM would not have enough memory to launch. 
If the first JVM is not actually frozen, it would shut itself down after losing its backend connection to the Wrapper. But that might be happening too late and result in what you are seeing. 3) The DOWN_CLEAN state means that the Wrapper has completely shutdown the JVM and cleaned up any associated resources. We will take a look at the documentation on the following page as you are correct that it is missing some information. https://wrapper.tanukisoftware.com/doc/english/prop-java-statusfile.html<https://clicktime.symantec.com/a/1/pFsMh63Y_XDBdbw7xETbo40_Uhah2ByBR7xqKlm8s8w=?d=lq4ISNO68RaA_a5U2L38JAI42nrP-Lj0_jQA4RKR0ryTRdGXvAEAfHiDUn-vKdryduqkwm-zX0YYsOECXFXDc6niuyt7Ae837n0-wWAZ8u99Nabj6hxgw76Xg8rXhtHV8FEA0rrzVL_1TAZuUAMX2ztmAkWA0qdhQO1XYUkMswad3bsnlUv2XxQZ09Oc1lbfNAXv0DNlGOaVnU6lrHEJobFamicDkAhsG_GVSZVC9oI_NjgxAcJ-M7XOvhLaol54ep5LiB5j_uxRx-67kzXJbZT0fZIK8-9mNXr7t7qXXF3EHeUiqKaJuWdkuTfMfI_ZzmE2QhUiHCnSvJmRfKZPZ8K_jzVJBlUz0PDGfOAqzIOVsQmLsYSkVxRtrXkK_DwR_O_u91EdthCtNsTLDOxUBJzYFmWLI6CrV_jpYReCYAthEio3DegMb4kU9fCvs37XzsrCLlh41tLw87m9neyQHU9F5aZIyZY1&u=https%3A%2F%2Fwrapper.tanukisoftware.com%2Fdoc%2Fenglish%2Fprop-java-statusfile.html> Cheers, Leif On Tue, Jan 8, 2019 at 8:32 PM Christoph SCHWAIGER <csc...@am...<mailto:csc...@am...>> wrote: CONFIDENTIAL & RESTRICTED Hello Leif, Thanks for the information about the subscription. I did so. We have been using the wrapper on windows for many years, since a couple of years we have a standard support version. Our problem is on linux RH. After an out of memory situation (the jvm exited) it is not restarted and remains down indefinitely, the status script exits with status zero, so all looks up for the cluster. (integrated into veritas cluster). The OOM was bad: not related to JVM, but caused by overly optimistic ulimits of the user - that has been corrected. STATUS | wrapper | 2019/01/07 13:38:41 | Launching a JVM... INFO | jvm 1 | 2019/01/07 13:38:43 | WrapperManager: Initializing... 
INFO | jvm 1 | 2019/01/07 13:38:45 | S-Check version 3.0.4 Monte Rosa from 12-Sep-2018 08:02 by cschwaiger INFO | jvm 1 | 2019/01/07 13:38:45 | Scheck is starting on server MUCTXP5B INFO | jvm 1 | 2019/01/07 13:38:52 | parsed 1 xml files and created 0 service records. INFO | jvm 1 | 2019/01/07 15:02:11 | Exception in thread "InactivityMonitor WriteCheck" java.lang.OutOfMemoryError: unable to create new native thread STATUS | wrapper | 2019/01/07 15:02:11 | The JVM has run out of memory. Restarting JVM. INFO | jvm 1 | 2019/01/07 15:02:11 | at java.lang.Thread.start0(Native Method) INFO | jvm 1 | 2019/01/07 15:02:11 | at java.lang.Thread.start(Thread.java:717) INFO | jvm 1 | 2019/01/07 15:02:11 | at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) INFO | jvm 1 | 2019/01/07 15:02:11 | at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) INFO | jvm 1 | 2019/01/07 15:02:11 | at org.apache.activemq.transport.InactivityMonitor.writeCheck(InactivityMonitor.java:147) INFO | jvm 1 | 2019/01/07 15:02:11 | at org.apache.activemq.transport.InactivityMonitor$2.run(InactivityMonitor.java:113) INFO | jvm 1 | 2019/01/07 15:02:11 | at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33) INFO | jvm 1 | 2019/01/07 15:02:11 | at java.util.TimerThread.mainLoop(Timer.java:555) INFO | jvm 1 | 2019/01/07 15:02:11 | at java.util.TimerThread.run(Timer.java:505) ERROR | wrapper | 2019/01/07 15:02:45 | Shutdown failed: Timed out waiting for signal from JVM. ERROR | wrapper | 2019/01/07 15:02:46 | JVM did not exit on request, termination requested. STATUS | wrapper | 2019/01/07 15:02:46 | JVM received a signal SIGKILL (9). STATUS | wrapper | 2019/01/07 15:02:46 | JVM process is gone. STATUS | wrapper | 2019/01/07 15:02:46 | JVM exited after being requested to terminate. STATUS | wrapper | 2019/01/07 15:02:50 | Reloading Wrapper configuration... STATUS | wrapper | 2019/01/07 15:02:50 | Launching a JVM... 
[scheck@muctxp5b scheck_unix11]$ ./scheck.sh status Service check monitoring instance (not installed) is running: PID:56766, Wrapper:STARTED, Java:DOWN_CLEAN I could not find the DOWN_CLEAN state documented - looked at: https://wrapper.tanukisoftware.com/doc/english/prop-java-statusfile.html<https://clicktime.symantec.com/a/1/pFsMh63Y_XDBdbw7xETbo40_Uhah2ByBR7xqKlm8s8w=?d=lq4ISNO68RaA_a5U2L38JAI42nrP-Lj0_jQA4RKR0ryTRdGXvAEAfHiDUn-vKdryduqkwm-zX0YYsOECXFXDc6niuyt7Ae837n0-wWAZ8u99Nabj6hxgw76Xg8rXhtHV8FEA0rrzVL_1TAZuUAMX2ztmAkWA0qdhQO1XYUkMswad3bsnlUv2XxQZ09Oc1lbfNAXv0DNlGOaVnU6lrHEJobFamicDkAhsG_GVSZVC9oI_NjgxAcJ-M7XOvhLaol54ep5LiB5j_uxRx-67kzXJbZT0fZIK8-9mNXr7t7qXXF3EHeUiqKaJuWdkuTfMfI_ZzmE2QhUiHCnSvJmRfKZPZ8K_jzVJBlUz0PDGfOAqzIOVsQmLsYSkVxRtrXkK_DwR_O_u91EdthCtNsTLDOxUBJzYFmWLI6CrV_jpYReCYAthEio3DegMb4kU9fCvs37XzsrCLlh41tLw87m9neyQHU9F5aZIyZY1&u=https%3A%2F%2Fwrapper.tanukisoftware.com%2Fdoc%2Fenglish%2Fprop-java-statusfile.html> "scheck.sh stop" fails - indefinitely waits for wrapper to stop. A simple kill <pid> terminates it. Any recommendations - i.e. measures to avoid hanging in the "looks good = status zero, but down" state? Below/attached is the information about os version and configuration. Thanks in advance, Christoph Linux muctxp5b 2.6.32-754.3.5.el6.x86_64 #1 SMP Thu Aug 9 11:56:22 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux |
|
From: Leif M. <lei...@ta...> - 2019-01-28 07:53:59
|
Christoph
Thank you for the very detailed analysis.
This is in line with what we found and are testing a fix for.
If we adjust ulimit to allow just enough processes to launch the Wrapper,
but not the JVM, then the Wrapper fails to fork the JVM and falls into its
error-handling code.
Prior to the fork, the Wrapper was opening a set of pipes whose opposite
ends are normally used by the child process.
The problem was that the error path on a failed fork was not correctly
closing down those pipes.
Then later on in the main loop, the Wrapper was attempting to read child
output from those pipes.
The system calls appear to block on those reads, even with non-blocking
mode set, when the child has never connected.
The fix is simply to close those pipes in the error path.
This is actually a very old bug. It has not been seen before because most
errors involve a successful fork followed by an error launching the JVM.
That error path was working. So this problem is very specific to this
exact low-resource state.
Nevertheless, this is a fairly critical problem, as it could affect anyone
in this situation.
When it happens though, all the Wrapper can really do is shut down anyway.
So even when the bug is fixed, there are still going to be resource
problems that must be resolved.
The Wrapper will of course shut down cleanly rather than hanging.
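For readers following the bug description above: this is only a Python sketch of the pattern, not the Wrapper's actual source (the Wrapper is native code and the function name here is made up), but it shows why the error path must close both pipe ends when fork() fails, otherwise a later read on the orphaned pipe blocks forever.

```python
import os

def spawn_child(argv):
    """Open the parent<->child pipe, then fork; on fork failure, close
    both pipe ends before propagating the error.

    Sketch of the fix described above: leaving read_fd/write_fd open
    after a failed fork is what later made the main loop block reading
    a pipe that has no writer.
    """
    read_fd, write_fd = os.pipe()   # carries child stdout back to the parent
    try:
        pid = os.fork()             # fails with EAGAIN when the process limit is hit
    except OSError:
        os.close(read_fd)           # the fix: tear the pipe down again
        os.close(write_fd)
        raise
    if pid == 0:                    # child: wire stdout to the pipe and exec
        os.close(read_fd)
        os.dup2(write_fd, 1)
        os.execvp(argv[0], argv)
    os.close(write_fd)              # parent keeps only the read end
    return pid, read_fd
```

On a successful fork the parent reads the child's output from `read_fd`; the failure branch is the part that was missing.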
The only thing we were not sure about is that in our tests, we were always
seeing a FATAL error "Could not spawn JVM process" in the log file.
This was not in the log that you sent. Can you confirm whether or not you
are seeing that?
Assuming all tests go well, this will be in the upcoming 3.5.38.
Unfortunately there is not currently a workaround for this other than
making sure that there are enough free process slots to launch the Wrapper
and fork the JVM.
Cheers,
Leif
On Fri, Jan 25, 2019 at 11:35 PM Christoph SCHWAIGER <csc...@am...>
wrote:
> CONFIDENTIAL & RESTRICTED
>
>
>
> Hello Leif,
>
>
>
> Thanks for answering that – could have checked myself.
>
>
>
> But that is not why I am contacting you again. The problem occurred again
> while the infrastructure was in a healthy state, so I looked a bit deeper.
>
>
>
> Background: the cluster wanted to switch resources over to the other
> server and ran a local ./scheck.sh stop (scheck.sh is our wrapper
> script), but the stop never finished. I had a look at the shell
> processes still lingering around.
>
>
>
> The status output is as it was last time:
>
> [scheck@muctxp5b scheck_tcpbatch]$ ./scheck.sh status
>
> Service check monitoring instance (not installed) is running: PID:9414,
> Wrapper:STOPPING, Java:DOWN_CLEAN
>
> [scheck@muctxp5b scheck_tcpbatch]$ echo $?
>
> 0
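Since the status script's exit code is 0 even in this hung state, a cluster agent that needs a reliable up/down answer could parse the Wrapper:/Java: states out of the status text instead of trusting `$?`. A minimal sketch, assuming the output format shown above (the helper name and the choice of "up" states are illustrative, not part of any scheck.sh API):

```python
import re
import subprocess

def resource_is_up(status_cmd):
    """Return True only if the status line reports Wrapper:STARTED and a
    Java state other than DOWN_CLEAN.

    Sketch only: the status script exited 0 in the hung case above, so
    the text itself has to be inspected.
    """
    out = subprocess.run(status_cmd, capture_output=True, text=True).stdout
    m = re.search(r"Wrapper:(\w+), Java:(\w+)", out)
    if m is None:
        return False                      # unparseable output: treat as down
    wrapper_state, java_state = m.groups()
    return wrapper_state == "STARTED" and java_state != "DOWN_CLEAN"
```

A cluster framework could then restart the resource whenever this returns False, regardless of the script's exit code.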
>
>
>
> Here is the wrapper and the tree of the hanging commands (parent first)
> performing the stop:
>
> [scheck@muctxp5b scheck]$ ps -elf | grep tcpbatch
>
> 1 S scheck 9411 1 0 80 0 - 29194 pipe_w Jan04 ? 00:05:40
> /opt/scheck/muctxp5j/scheck_tcpbatch/./wrapper
> /opt/scheck/muctxp5j/scheck_tcpbatch/conf/wrapper.conf
> wrapper.syslog.ident=scheck wrapper.pidfile=/opt/scheck/muctxp5
>
> j/scheck_tcpbatch/./scheck.pid wrapper.daemonize=TRUE wrapper.name=scheck
> wrapper.displayname=Service check monitoring instance
> wrapper.statusfile=/opt/scheck/muctxp5j/scheck_tcpbatch/./scheck.status
> wrapper.java.statusfile=/opt/scheck/muctxp5j/scheck_tcpbatch/./scheck.java.status
> wrapper.script.version=3.5.30
>
> 0 S scheck 13900 40818 0 80 0 - 25832 pipe_w 14:41 pts/0 00:00:00
> grep tcpbatch
>
> 4 S scheck 23388 1 0 80 0 - 26529 do_wai Jan23 ? 00:00:00
> bash -c USER=scheck; export USER; LOGNAME=sch
>
> eck; export LOGNAME; HOME=/home/scheck; export HOME;
> /opt/scheck/resources.sh muctxp5j scheck_tcpbatch stop
>
> 0 S scheck 23394 23388 0 80 0 - 26529 do_wai Jan23 ? 00:00:00
> /bin/bash /opt/scheck/resources.sh muctxp5j scheck_tcpbatch stop
>
> 0 S scheck 23398 23394 0 80 0 - 26758 do_wai Jan23 ? 00:02:31
> /bin/sh /opt/scheck/muctxp5j/scheck_tcpbatch/scheck.sh stop
>
>
>
> I attached to the scheck.sh script and saw it looping, performing some
> syscalls every second, which I guess is this function:
>
> waitforwrapperstop() {
>
> getpid
>
> while [ "X$pid" != "X" ] ; do
>
> sleep 1
>
> getpid
>
> done
>
> }
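The quoted loop has no upper bound, which is why the stop hung for days. A sketch of the same wait with an escalation timeout, in Python for brevity (the function name and pidfile handling are illustrative, not the actual script):

```python
import os
import time

def wait_for_wrapper_stop(pidfile, timeout=60):
    """Poll until the process named in the wrapper's pidfile is gone.

    Unlike the quoted shell loop, this gives up after `timeout` seconds
    and returns False, so a caller (e.g. a cluster agent) can escalate
    with a hard kill instead of waiting forever.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with open(pidfile) as f:
                pid = int(f.read().strip())
            os.kill(pid, 0)          # signal 0: existence check only
        except (OSError, ValueError):
            return True              # pidfile gone or process dead
        time.sleep(1)
    return False
```

The caller decides what "escalate" means; the point is that the wait itself must be bounded.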
>
>
>
> So I had a look at the wrapper process 9411, which did not want to
> vanish, using strace:
>
> Process 9411 attached
>
> 15:31:10 read(5,
>
>
>
> And I see that file descriptor 5 is pipe #813574553:
>
> scheck@muctxp5b 9411]$ ls -lr fd
>
> total 0
>
> l-wx------ 1 scheck scheck 64 Jan 7 13:25 6 -> pipe:[813574553] (shown
> in red; I assume because the pipe is broken)
>
> lr-x------ 1 scheck scheck 64 Jan 4 12:48 5 -> pipe:[813574553] (shown
> in red; I assume because the pipe is broken)
>
> lrwx------ 1 scheck scheck 64 Jan 4 12:48 4 ->
> /opt/scheck/muctxp5j/scheck_tcpbatch/log/wrapper_donotmonitor.log
>
> lrwx------ 1 scheck scheck 64 Jan 4 12:49 3 -> socket:[813555625] (shown
> in red; I assume because it is broken)
>
> lrwx------ 1 scheck scheck 64 Jan 4 12:48 2 -> /dev/null
>
> lrwx------ 1 scheck scheck 64 Jan 4 12:48 1 -> /dev/null
>
> lrwx------ 1 scheck scheck 64 Jan 4 12:48 0 -> /dev/null
>
>
>
> Concerning Stack Overflow: grepping the inode number from the lsof output
> should list both sides of the pipe, but it shows only the read and write
> descriptors of one side. Unless I misunderstood.
>
> [scheck@muctxp5b 9411]$ lsof | grep 813574553
>
> wrapper 9411 scheck 5r FIFO 0,8 0t0
> 813574553 pipe
>
> wrapper 9411 scheck 6w FIFO 0,8 0t0
> 813574553 pipe
>
>
>
> Is the process on the other side of the pipe the JVM? It was, as the
> status output indicates, already down. Then for some reason the wrapper
> process was still waiting in a read on the pipe to a stopped process,
> and maybe because of this it remained up.
>
>
>
> I didn’t try further, since I have no debugger on the system and no source.
>
>
>
> When I detected the problem, it was about 10 hours after the stop attempt.
>
>
>
> As last time, scheck.sh stop immediately came back and exited together
> with the wrapper process!
>
>
>
> Hopefully this helps investigating.
>
>
>
> Cheers
>
> Christoph
>
> *From:* Leif Mortenson [mailto:lei...@ta...]
> *Sent:* 10 January 2019 17:27
> *To:* Wrapper User List <wra...@li...>
> *Subject:* Re: [Wrapper-user] [EXT] Re: no JVM running (state:
> DOWN_CLEAN) on linux after OOM
>
>
>
> Christoph
>
>
>
> Yes, the following configuration is what causes the Wrapper to restart
> based on that text in the console output:
>
> ---
>
> wrapper.filter.trigger.1001=java.lang.OutOfMemoryError
>
> wrapper.filter.action.1001=RESTART
>
> wrapper.filter.message.1001=The JVM has run out of memory.
>
> ---
>
>
>
> It sounds like you are on the right track. We will see if we can
> reproduce something here as well.
>
>
>
> Cheers,
>
> Leif
>
>
>
>
>
> On Wed, Jan 9, 2019 at 10:16 PM Christoph SCHWAIGER <
> csc...@am...> wrote:
>
> CONFIDENTIAL & RESTRICTED
>
>
>
>
>
> Hello Leif,
>
>
>
> Those were the very last entries in the wrapper log:
>
> ERROR | wrapper | 2019/01/07 15:02:45 | Shutdown failed: Timed out
> waiting for signal from JVM.
>
> ERROR | wrapper | 2019/01/07 15:02:46 | JVM did not exit on request,
> termination requested.
>
> STATUS | wrapper | 2019/01/07 15:02:46 | JVM received a signal SIGKILL
> (9).
>
> STATUS | wrapper | 2019/01/07 15:02:46 | JVM process is gone.
>
> STATUS | wrapper | 2019/01/07 15:02:46 | JVM exited after being requested
> to terminate.
>
> STATUS | wrapper | 2019/01/07 15:02:50 | Reloading Wrapper
> configuration...
>
> STATUS | wrapper | 2019/01/07 15:02:50 | Launching a JVM...
>
> The time of the last entry matched the timestamp of the java.status
> file. I noticed that one day later.
>
>
>
> I didn’t notice this log entry before:
>
> STATUS | wrapper | 2019/01/07 15:02:11 | The JVM has run out of memory.
> Restarting JVM.
>
> …to me it sounds like the wrapper was using the exception (the complete
> log section is in my first email) to decide that the JVM should be
> restarted due to running out of memory. Which, in this situation, was not
> the case. I don't know why heap depletion is blamed by default when the
> JVM cannot spawn a thread. Definitely there are other reasons for failure.
>
>
>
> The user was limited to 1k processes max, but from what I read, threads
> are counted toward that limit. On Stack Overflow I found this command to
> count the number of threads for a user:
>
>
>
> ps -eo euser,nlwp | grep scheck | awk '{print $2}' | awk '{ num_threads
> += $1 } END { print num_threads }'
>
>
>
> Currently it shows 7579 - the JVMs are heavily multithreaded.
>
>
>
> Unfortunately I don’t even have a test box to simulate this. It is time
> to get one.
>
>
>
> Cheers,
>
> Christoph
>
> *From:* Leif Mortenson [mailto:lei...@ta...]
> *Sent:* 09 January 2019 10:50
> *To:* Wrapper User List <wra...@li...>
> *Subject:* Re: [Wrapper-user] [EXT] Re: no JVM running (state:
> DOWN_CLEAN) on linux after OOM
>
>
>
> Christoph
>
> Ok. So you are using a newer version of the Wrapper, so ignore the issue
> I mentioned about failing to kill the JVM. That was an old problem.
>
>
>
> Please send the debug output if you get it again.
>
>
>
> We will play around with the ulimits here as well and make sure the
> Wrapper behaves correctly.
>
>
>
> Maybe I am not understanding the exact problem.
>
> After you get the OOM and the Wrapper tries to restart, is the Wrapper
> just failing to start the next JVM and exiting? Or is it getting stuck?
>
> The latter would be bad, and something we will want to get to the bottom of.
>
>
>
> It does not sound like this is easily reproducible. If it is, though, the
> following will output detailed information about the state. It is a LOT of
> output, so it is not realistic unless you are testing.
>
> wrapper.state_output=TRUE
>
>
>
> Cheers,
>
> Leif
>
>
>
> On Wed, Jan 9, 2019 at 6:11 PM Christoph SCHWAIGER <csc...@am...>
> wrote:
>
> CONFIDENTIAL & RESTRICTED
>
>
>
> Hello Leif,
>
>
>
> Thanks for your response.
>
>
>
> Easy one first, the version we use:
>
> [scheck@muctxp5b scheck_unix4]$ ./wrapper --version
>
> Java Service Wrapper Community Edition 64-bit 3.5.30
>
> Copyright (C) 1999-2016 Tanuki Software, Ltd. All Rights Reserved.
>
> http://wrapper.tanukisoftware.com
>
>
>
> Concerning the forced kill, I think I once saw, on another instance at
> another time, something like “..JVM received sigkill (9)..” in the wrapper
> log.
>
>
>
> In the case I looked at, the JVM process owned by the wrapper was gone,
> which suits the DOWN_CLEAN as you explained.
>
>
>
> I’ll turn on debug output on a few of them in case it happens again.
>
>
>
> As I interpret it, the configuration as such is OK, as is the normal
> behaviour: when I e.g. kill the JVM manually, the wrapper brings it back
> online. And due to the OOM situation – more precisely, wrapper and JVM were
> limited to 1024 processes max in ulimits – the wrapper was e.g. not able to
> fork a command, which could explain why recovery stalled. Likely the other
> wrapper / JVM tandems on the same machine (20-30 tandems) faced the same
> trouble and tried to recover, which would mean the ceiling was sometimes
> reached and sometimes not (e.g. when yet another JVM with many threads was
> killed or died). Does this make sense to you?
>
>
>
> Should I look into updating my script to interpret the output of “app.sh
> status” with respect to certain Java:__ states and kill the wrapper?
>
> (in such a case the veritas cluster would consider the resource being
> offline and start the wrapper again).
>
> Whether that is a good idea depends on the number of states to consider
> and on how long such a state can be tolerated. Maybe it is paranoid: since
> our box is very big, we should be fine concerning OOM unless we screw up
> the settings again. We’re newbies on Linux; we used Windows for years.
>
>
>
> Cheers,
>
> Christoph
>
>
>
> *From:* Leif Mortenson [mailto:lei...@ta...]
> *Sent:* 09 January 2019 03:10
> *To:* Wrapper User List <wra...@li...>
> *Subject:* [EXT] Re: [Wrapper-user] no JVM running (state: DOWN_CLEAN) on
> linux after OOM
>
>
>
> Christoph
>
>
>
> 1) Could you please send me the wrapper.log file with debug output enabled
> (wrapper.debug=true) that shows what is happening when the Wrapper is
> failing to restart the JVM?
>
> Please include the part of the log showing the last few moments of the JVM
> that runs out of memory as well.
>
>
>
> 2) What version of the Wrapper are you running?
>
> The following issue was fixed in 3.5.16 and sounds like it might be what
> you are seeing.
>
> https://wrapper.tanukisoftware.com/doc/english/release-notes.html#3.5.16
>
> ---
>
> Fix a problem where a JVM process was not stopped completely on a UNIX
> platform and stayed defunct after a forced kill until the Wrapper process
> itself stopped. This was especially noticeable if the JVM is frozen and the
> JVM is being killed forcibly.
>
> ---
>
> Are you seeing a zombie Java process still running?
>
> This bug meant that the JVM was being left around in the background when
> the Wrapper thought it was gone.
>
> If you are out of memory then the next JVM would not have enough memory to
> launch.
>
> If the first JVM is not actually frozen, it would shut itself down after
> losing its backend connection to the Wrapper. But that might be happening
> too late and result in what you are seeing.
>
>
>
> 3) The DOWN_CLEAN state means that the Wrapper has completely shut down the
> JVM and cleaned up any associated resources.
>
> We will take a look at the documentation on the following page as you are
> correct that it is missing some information.
>
> https://wrapper.tanukisoftware.com/doc/english/prop-java-statusfile.html
>
>
>
> Cheers,
>
> Leif
>
>
>
> On Tue, Jan 8, 2019 at 8:32 PM Christoph SCHWAIGER <csc...@am...>
> wrote:
>
> CONFIDENTIAL & RESTRICTED
>
>
>
> Hello Leif,
>
>
>
> Thanks for the information about the subscription. I did so.
>
>
>
> We have been using the wrapper on Windows for many years; for the last
> couple of years we have had a standard support version.
>
>
>
> Our problem is on Linux (RHEL). *After an out-of-memory situation (the JVM
> exited) it is not restarted and remains down indefinitely, while the status
> script exits with status zero*, so everything looks up for the cluster
> (we are integrated into a Veritas cluster). The OOM itself was bad: not
> related to the JVM, but caused by overly optimistic ulimits for the user -
> that has since been corrected.
>
>
>
> STATUS | wrapper | 2019/01/07 13:38:41 | Launching a JVM...
>
> INFO | jvm 1 | 2019/01/07 13:38:43 | WrapperManager: Initializing...
>
> INFO | jvm 1 | 2019/01/07 13:38:45 | S-Check version 3.0.4 Monte Rosa
> from 12-Sep-2018 08:02 by cschwaiger
>
> INFO | jvm 1 | 2019/01/07 13:38:45 | Scheck is starting on server
> MUCTXP5B
>
> INFO | jvm 1 | 2019/01/07 13:38:52 | parsed 1 xml files and created 0
> service records.
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | Exception in thread
> "InactivityMonitor WriteCheck" java.lang.OutOfMemoryError: unable to create
> new native thread
>
> STATUS | wrapper | 2019/01/07 15:02:11 | The JVM has run out of memory.
> Restarting JVM.
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> java.lang.Thread.start0(Native Method)
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> java.lang.Thread.start(Thread.java:717)
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> org.apache.activemq.transport.InactivityMonitor.writeCheck(InactivityMonitor.java:147)
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> org.apache.activemq.transport.InactivityMonitor$2.run(InactivityMonitor.java:113)
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> java.util.TimerThread.mainLoop(Timer.java:555)
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> java.util.TimerThread.run(Timer.java:505)
>
> ERROR | wrapper | 2019/01/07 15:02:45 | Shutdown failed: Timed out
> waiting for signal from JVM.
>
> ERROR | wrapper | 2019/01/07 15:02:46 | JVM did not exit on request,
> termination requested.
>
> STATUS | wrapper | 2019/01/07 15:02:46 | JVM received a signal SIGKILL
> (9).
>
> STATUS | wrapper | 2019/01/07 15:02:46 | JVM process is gone.
>
> STATUS | wrapper | 2019/01/07 15:02:46 | JVM exited after being requested
> to terminate.
>
> STATUS | wrapper | 2019/01/07 15:02:50 | Reloading Wrapper
> configuration...
>
> STATUS | wrapper | 2019/01/07 15:02:50 | Launching a JVM...
>
>
>
> [scheck@muctxp5b scheck_unix11]$ ./scheck.sh status
>
> *Service check monitoring instance (not installed) is running: PID:56766,
> Wrapper:STARTED, Java:DOWN_CLEAN*
>
>
>
> I could not find the DOWN_CLEAN state documented – looked at:
> https://wrapper.tanukisoftware.com/doc/english/prop-java-statusfile.html
>
>
>
> “scheck.sh stop” fails – it waits indefinitely for the wrapper to stop. A
> simple kill <pid> terminates it.
>
>
>
> Any recommendations – e.g. measures to avoid hanging in the “looks good =
> status zero, but down” state?
>
>
>
> Below/attached is the information about os version and configuration.
>
>
>
> Thanks in advance,
>
> Christoph
>
>
>
> Linux muctxp5b 2.6.32-754.3.5.el6.x86_64 #1 SMP Thu Aug 9 11:56:22 EDT
> 2018 x86_64 x86_64 x86_64 GNU/Linux
>
>
|
|
From: Christoph S. <csc...@am...> - 2019-01-25 14:34:50
|
CONFIDENTIAL & RESTRICTED
Hello Leif,
Thanks for answering that - I could have checked that myself.
But that is not why I am contacting you again. The problem occurred again while the infrastructure was in a healthy state, so I looked a bit deeper.
Background: the cluster wanted to switch resources over to the other server and performed a local ./scheck.sh stop (scheck.sh is the name of our wrapper script), but the stop never finished. I had a look at the shell processes still lingering around.
The status output is as it was last time:
[scheck@muctxp5b scheck_tcpbatch]$ ./scheck.sh status
Service check monitoring instance (not installed) is running: PID:9414, Wrapper:STOPPING, Java:DOWN_CLEAN
[scheck@muctxp5b scheck_tcpbatch]$ echo $?
0
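One way the status check could be hardened against this state (a minimal sketch; the set of states treated as bad, and the function name, are my assumptions, not from the Wrapper docs):

```shell
# Sketch: classify a scheck.sh status line so the cluster sees DOWN_* Java
# states (and a stuck STOPPING wrapper) as offline, even though the script
# itself exits 0. The state patterns here are assumptions, not an official
# list of Wrapper states.
check_status() {
    out=$1   # e.g. the output of ./scheck.sh status
    case "$out" in
        *Java:DOWN*|*Wrapper:STOPPING*) return 1 ;;  # down, or stuck stopping
        *Wrapper:STARTED*Java:STARTED*) return 0 ;;  # healthy
        *) return 1 ;;                               # anything unexpected
    esac
}
```

The cluster agent could then call something like check_status "$(./scheck.sh status)" and only report the resource online on exit code 0.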
Here is the wrapper and the tree of the hanging commands (parent first) performing the stop:
[scheck@muctxp5b scheck]$ ps -elf | grep tcpbatch
1 S scheck 9411 1 0 80 0 - 29194 pipe_w Jan04 ? 00:05:40 /opt/scheck/muctxp5j/scheck_tcpbatch/./wrapper /opt/scheck/muctxp5j/scheck_tcpbatch/conf/wrapper.conf wrapper.syslog.ident=scheck wrapper.pidfile=/opt/scheck/muctxp5
j/scheck_tcpbatch/./scheck.pid wrapper.daemonize=TRUE wrapper.name=scheck wrapper.displayname=Service check monitoring instance wrapper.statusfile=/opt/scheck/muctxp5j/scheck_tcpbatch/./scheck.status wrapper.java.statusfile=/opt/scheck/muctxp5j/scheck_tcpbatch/./scheck.java.status wrapper.script.version=3.5.30
0 S scheck 13900 40818 0 80 0 - 25832 pipe_w 14:41 pts/0 00:00:00 grep tcpbatch
4 S scheck 23388 1 0 80 0 - 26529 do_wai Jan23 ? 00:00:00 bash -c USER=scheck; export USER; LOGNAME=sch
eck; export LOGNAME; HOME=/home/scheck; export HOME; /opt/scheck/resources.sh muctxp5j scheck_tcpbatch stop
0 S scheck 23394 23388 0 80 0 - 26529 do_wai Jan23 ? 00:00:00 /bin/bash /opt/scheck/resources.sh muctxp5j scheck_tcpbatch stop
0 S scheck 23398 23394 0 80 0 - 26758 do_wai Jan23 ? 00:02:31 /bin/sh /opt/scheck/muctxp5j/scheck_tcpbatch/scheck.sh stop
I attached to the scheck.sh script and saw it looping, performing some syscalls every second, which I guess is this function:
waitforwrapperstop() {
    getpid
    while [ "X$pid" != "X" ] ; do
        sleep 1
        getpid
    done
}
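A bounded variant of that wait (a sketch; the function name, the use of kill -0, and the timeout value are my assumptions, not taken from scheck.sh) would let the stop path fail loudly instead of hanging forever:

```shell
# Sketch: wait for a process to disappear, but give up after a timeout
# instead of looping forever. On timeout the caller can escalate, e.g.
# with kill -9, or report the stop as failed.
wait_for_wrapper_stop() {
    pid=$1
    timeout=${2:-60}   # seconds to wait before giving up
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # process is gone
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1   # still running after the timeout
}
```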
So I had a look at wrapper process 9411, which did not want to vanish, using strace:
Process 9411 attached
15:31:10 read(5,
And I see that file descriptor 5 is pipe #813574553:
[scheck@muctxp5b 9411]$ ls -lr fd
total 0
l-wx------ 1 scheck scheck 64 Jan 7 13:25 6 -> pipe:[813574553] (shown in red - I assume because the pipe is broken)
lr-x------ 1 scheck scheck 64 Jan 4 12:48 5 -> pipe:[813574553] (shown in red - I assume because the pipe is broken)
lrwx------ 1 scheck scheck 64 Jan 4 12:48 4 -> /opt/scheck/muctxp5j/scheck_tcpbatch/log/wrapper_donotmonitor.log
lrwx------ 1 scheck scheck 64 Jan 4 12:49 3 -> socket:[813555625] (shown in red - I assume because the socket is broken)
lrwx------ 1 scheck scheck 64 Jan 4 12:48 2 -> /dev/null
lrwx------ 1 scheck scheck 64 Jan 4 12:48 1 -> /dev/null
lrwx------ 1 scheck scheck 64 Jan 4 12:48 0 -> /dev/null
According to Stack Overflow, grepping lsof output for the pipe's inode number should list both sides of the pipe, but it shows only the read and write ends of one side. Unless I misunderstood.
[scheck@muctxp5b 9411]$ lsof | grep 813574553
wrapper 9411 scheck 5r FIFO 0,8 0t0 813574553 pipe
wrapper 9411 scheck 6w FIFO 0,8 0t0 813574553 pipe
Is the process on the other side of the pipe the JVM? That was, as the status output indicates, already down. Then for some reason the wrapper process was still waiting in a read on a pipe to a stopped process - and maybe because of this it remained up.
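To double-check that guess, the holders of a pipe inode can also be listed directly from /proc, without lsof (a sketch; Linux-only, and it can only see fd directories the current user is allowed to read):

```shell
# Sketch: print every /proc/<pid>/fd entry that points at pipe:[<inode>].
# If only one process shows up, the other end of the pipe has been closed.
find_pipe_holders() {
    inode=$1
    for fd in /proc/[0-9]*/fd/*; do
        if [ "$(readlink "$fd" 2>/dev/null)" = "pipe:[$inode]" ]; then
            echo "$fd"
        fi
    done
}
```

For the case above, find_pipe_holders 813574553 showing only the wrapper's own fds 5 and 6 would confirm that no other process holds the pipe.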
I didn't try further, since I have no debugger on the system and no source.
When I detected the problem, it was about 10hours after the stop attempt.
As last time, scheck.sh stop immediately came back and stopped with the wrapper process!
Hopefully this helps investigating.
Cheers
Christoph
|
|
From: Leif M. <lei...@ta...> - 2019-01-10 16:27:05
|
Christoph
Yes, the following configuration is what causes the Wrapper to restart
based on the text in the console output:
---
wrapper.filter.trigger.1001=java.lang.OutOfMemoryError
wrapper.filter.action.1001=RESTART
wrapper.filter.message.1001=The JVM has run out of memory.
---
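Filter triggers are substring matches against the console output, so the generic trigger above also fires for "unable to create new native thread". Assuming filters are evaluated in the order they appear in the configuration file, a more specific trigger listed first could distinguish the two cases (a sketch; the property numbers and messages are only illustrative):

```properties
# Sketch: a more specific trigger for native-thread exhaustion, listed
# before the generic heap trigger so it matches first.
wrapper.filter.trigger.1000=java.lang.OutOfMemoryError: unable to create new native thread
wrapper.filter.action.1000=RESTART
wrapper.filter.message.1000=Could not create a native thread (check ulimit -u). Restarting JVM.
# The generic heap-exhaustion trigger remains as a fallback:
wrapper.filter.trigger.1001=java.lang.OutOfMemoryError
wrapper.filter.action.1001=RESTART
wrapper.filter.message.1001=The JVM has run out of memory.
```

A different action could also be chosen for the native-thread case if a restart is known to stall under process-limit pressure.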
It sounds like you are on the right track. We will see if we can reproduce
something here as well.
Cheers,
Leif
On Wed, Jan 9, 2019 at 10:16 PM Christoph SCHWAIGER <csc...@am...>
wrote:
> CONFIDENTIAL & RESTRICTED
>
>
>
> Hello Leif,
>
>
>
> Those were the very last entries in the wrapper log:
>
> ERROR | wrapper | 2019/01/07 15:02:45 | Shutdown failed: Timed out
> waiting for signal from JVM.
>
> ERROR | wrapper | 2019/01/07 15:02:46 | JVM did not exit on request,
> termination requested.
>
> STATUS | wrapper | 2019/01/07 15:02:46 | JVM received a signal SIGKILL
> (9).
>
> STATUS | wrapper | 2019/01/07 15:02:46 | JVM process is gone.
>
> STATUS | wrapper | 2019/01/07 15:02:46 | JVM exited after being requested
> to terminate.
>
> STATUS | wrapper | 2019/01/07 15:02:50 | Reloading Wrapper
> configuration...
>
> STATUS | wrapper | 2019/01/07 15:02:50 | Launching a JVM...
>
> Time of last entry was the timestamp of java.status file. I had noticed
> that one day later.
>
>
>
> I didn’t notice this log entry before:
>
> STATUS | wrapper | 2019/01/07 15:02:11 | The JVM has run out of memory.
> Restarting JVM.
>
> …to me it sounds like the wrapper was using the exception (in first email
> it the complete log section) to consider JVM should best be restarted due
> to running out of memory. Which, in this situation was not the case. I
> don’t know why for the JVM heap depletion is the culprit by default when a
> thread cannot be spawned. Definitely there are other reasons for failure.
>
>
>
> The user was limited to 1k processes max, but for what I read, threads are
> counted. In stackoverflow I found this command to count the # of threads
> for a user:
>
>
>
> ps -eo euser,nlwp | grep scheck | awk '{print $2}' | awk '{ num_threads
> += $1 } END { print num_threads }'
>
>
>
> Currently it shows 7579 - the JVMs are heavily multithreaded.
>
>
>
> Unfortunately I don’t even got a test box to simulate this. It gets time
> to get one.
>
>
>
> Cheers,
>
> Christoph
>
> *From:* Leif Mortenson [mailto:lei...@ta...]
> *Sent:* 09 January 2019 10:50
> *To:* Wrapper User List <wra...@li...>
> *Subject:* Re: [Wrapper-user] [EXT] Re: no JVM running (state:
> DOWN_CLEAN) on linux after OOM
>
>
>
> Christoph
>
> Ok. So you are using a newer version of the Wrapper, so ignore the issue
> I mentioned about failing to kill the JVM. That was an old problem.
>
>
>
> Please send the debug output if you get it again.
>
>
>
> We will play around with the ulimits here as well and make sure the
> Wrapper behaves correctly.
>
>
>
> I am maybe not understanding the exact problem.
>
> After you get the OOM and the wrapper tries to restart, is the Wrapper
> just failing to start the next JVM and exiting? Or is it getting stuck.
>
> The later would be bad, and something we will want to get to the bottom of.
>
>
>
> It does not sound like this is easily reproduceable. But so, then the
> following will output detailed information about the state. It is a LOT of
> output though so not realistic unless you are testing.
>
> wrapper.state_output=TRUE
>
>
>
> Cheers,
>
> Leif
>
>
>
> On Wed, Jan 9, 2019 at 6:11 PM Christoph SCHWAIGER <csc...@am...>
> wrote:
>
> CONFIDENTIAL & RESTRICTED
>
>
>
> Hello Leif,
>
>
>
> Thanks for your response.
>
>
>
> Easy one first, the version we use:
>
> [scheck@muctxp5b scheck_unix4]$ ./wrapper --version
>
> Java Service Wrapper Community Edition 64-bit 3.5.30
>
> Copyright (C) 1999-2016 Tanuki Software, Ltd. All Rights Reserved.
>
> http://wrapper.tanukisoftware.com
> <https://clicktime.symantec.com/32PRRMdpdoTcCTFhKkEQTZ96H2?u=http%3A%2F%2Fwrapper.tanukisoftware.com>
>
>
>
> Concerning the forced kill: I think I have seen, once on another instance at
> another time, something like “..JVM received sigkill (9)..” in the wrapper log.
>
>
>
> In the case I looked at, the JVM process owned by the wrapper was gone,
> which fits the DOWN_CLEAN state as you explained.
>
>
>
> I’ll turn on debug output on a few of them in case it happens again.
>
>
>
> As I interpret it, the configuration as such is OK, as is the normal
> behaviour: when I kill the JVM manually, for example, the wrapper brings it
> back online. And due to the OOM situation – more precisely, the wrapper and
> JVM were limited to 1024 processes max by ulimits – the wrapper was not able
> to fork a command, for example, and that could explain why recovery stalled.
> It is likely that other wrapper / JVM tandems on the same machine (20-30
> tandems) faced the same trouble and tried to recover, which would mean the
> ceiling was sometimes reached and sometimes not (e.g. when yet another JVM
> with many threads was killed or died). Does this make sense to you?
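On Linux the nproc limit is enforced per real user ID and counts threads, not just processes, which is why heavily multithreaded JVMs can exhaust it. A quick sanity check, as a sketch; the limits.conf lines are an illustrative assumption, not a recommendation:

```shell
# Show the effective max-user-processes limit for the current user
# (run this as the service account). Prints a number or "unlimited".
ulimit -u
# A persistent raise for a service account would typically go in
# /etc/security/limits.conf, along the lines of:
#   scheck  soft  nproc  16384
#   scheck  hard  nproc  16384
```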
>
>
>
> Should I look into updating my script to interpret the output of “app.sh
> status” concerning certain Java:__ states and kill the wrapper?
>
> (in such a case the veritas cluster would consider the resource being
> offline and start the wrapper again).
>
> Whether that is a good idea depends on the number of states to consider and
> on how long such a state can be tolerated. Maybe it is paranoid; since our
> box is very big, we should be fine concerning OOM unless we screw up
> settings again. We’re newbies on Linux; we used Windows for years.
>
>
>
> Cheers,
>
> Christoph
>
>
>
> *From:* Leif Mortenson [mailto:lei...@ta...]
> *Sent:* 09 January 2019 03:10
> *To:* Wrapper User List <wra...@li...>
> *Subject:* [EXT] Re: [Wrapper-user] no JVM running (state: DOWN_CLEAN) on
> linux after OOM
>
>
>
> Christoph
>
>
>
> 1) Could you please send me the wrapper.log file with debug output enabled
> (wrapper.debug=true) that shows what is happening when the Wrapper is
> failing to restart the JVM?
>
> Please include the part of the log showing the last few moments of the JVM
> that runs out of memory as well.
>
>
>
> 2) What version of the Wrapper are you running?
>
> The following issue was fixed in 3.5.16 and sounds like it might be what
> you are seeing.
>
> https://wrapper.tanukisoftware.com/doc/english/release-notes.html#3.5.16
>
> ---
>
> Fix a problem where a JVM process was not stopped completely on a UNIX
> platform and stayed defunct after a forced kill until the Wrapper process
> itself stopped. This was especially noticeable if the JVM is frozen and the
> JVM is being killed forcibly.
>
> ---
>
> Are you seeing a zombie Java process still running?
>
> This bug meant that the JVM was being left around in the background when
> the Wrapper thought it was gone.
>
> If you are out of memory then the next JVM would not have enough memory to
> launch.
>
> If the first JVM is not actually frozen, it would shut itself down after
> losing its backend connection to the Wrapper. But that might be happening
> too late and result in what you are seeing.
>
>
>
> 3) The DOWN_CLEAN state means that the Wrapper has completely shut down the
> JVM and cleaned up any associated resources.
>
> We will take a look at the documentation on the following page as you are
> correct that it is missing some information.
>
> https://wrapper.tanukisoftware.com/doc/english/prop-java-statusfile.html
>
>
>
> Cheers,
>
> Leif
>
>
>
> On Tue, Jan 8, 2019 at 8:32 PM Christoph SCHWAIGER <csc...@am...>
> wrote:
>
> CONFIDENTIAL & RESTRICTED
>
>
>
> Hello Leif,
>
>
>
> Thanks for the information about the subscription. I did so.
>
>
>
> We have been using the wrapper on Windows for many years; for the past
> couple of years we have had a standard support version.
>
>
>
> Our problem is on Linux (RHEL). *After an out-of-memory situation (the JVM
> exited), it is not restarted and remains down indefinitely, and the status
> script exits with status zero*, so everything looks up for the cluster
> (integrated into a Veritas cluster). The OOM was bad: not related to the
> JVM, but caused by overly optimistic ulimits for the user - that has been
> corrected.
>
>
>
> STATUS | wrapper | 2019/01/07 13:38:41 | Launching a JVM...
>
> INFO | jvm 1 | 2019/01/07 13:38:43 | WrapperManager: Initializing...
>
> INFO | jvm 1 | 2019/01/07 13:38:45 | S-Check version 3.0.4 Monte Rosa
> from 12-Sep-2018 08:02 by cschwaiger
>
> INFO | jvm 1 | 2019/01/07 13:38:45 | Scheck is starting on server
> MUCTXP5B
>
> INFO | jvm 1 | 2019/01/07 13:38:52 | parsed 1 xml files and created 0
> service records.
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | Exception in thread
> "InactivityMonitor WriteCheck" java.lang.OutOfMemoryError: unable to create
> new native thread
>
> STATUS | wrapper | 2019/01/07 15:02:11 | The JVM has run out of memory.
> Restarting JVM.
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> java.lang.Thread.start0(Native Method)
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> java.lang.Thread.start(Thread.java:717)
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> org.apache.activemq.transport.InactivityMonitor.writeCheck(InactivityMonitor.java:147)
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> org.apache.activemq.transport.InactivityMonitor$2.run(InactivityMonitor.java:113)
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> java.util.TimerThread.mainLoop(Timer.java:555)
>
> INFO | jvm 1 | 2019/01/07 15:02:11 | at
> java.util.TimerThread.run(Timer.java:505)
>
> ERROR | wrapper | 2019/01/07 15:02:45 | Shutdown failed: Timed out
> waiting for signal from JVM.
>
> ERROR | wrapper | 2019/01/07 15:02:46 | JVM did not exit on request,
> termination requested.
>
> STATUS | wrapper | 2019/01/07 15:02:46 | JVM received a signal SIGKILL
> (9).
>
> STATUS | wrapper | 2019/01/07 15:02:46 | JVM process is gone.
>
> STATUS | wrapper | 2019/01/07 15:02:46 | JVM exited after being requested
> to terminate.
>
> STATUS | wrapper | 2019/01/07 15:02:50 | Reloading Wrapper
> configuration...
>
> STATUS | wrapper | 2019/01/07 15:02:50 | Launching a JVM...
>
>
>
> [scheck@muctxp5b scheck_unix11]$ ./scheck.sh status
>
> *Service check monitoring instance (not installed) is running: PID:56766,
> Wrapper:STARTED, Java:DOWN_CLEAN*
>
>
>
> I could not find the DOWN_CLEAN state documented – looked at:
> https://wrapper.tanukisoftware.com/doc/english/prop-java-statusfile.html
>
>
>
> “scheck.sh stop” fails – it waits indefinitely for the wrapper to stop. A
> simple kill <pid> terminates it.
>
>
>
> Any recommendations – e.g. measures to avoid hanging in the “looks good =
> status zero, but actually down” state?
>
>
>
> Below/attached is the information about os version and configuration.
>
>
>
> Thanks in advance,
>
> Christoph
>
>
>
> Linux muctxp5b 2.6.32-754.3.5.el6.x86_64 #1 SMP Thu Aug 9 11:56:22 EDT
> 2018 x86_64 x86_64 x86_64 GNU/Linux
>
>
>
>
From: Christoph S. <csc...@am...> - 2019-01-09 13:16:18
CONFIDENTIAL & RESTRICTED
Hello Leif,
Those were the very last entries in the wrapper log:
ERROR | wrapper | 2019/01/07 15:02:45 | Shutdown failed: Timed out waiting for signal from JVM.
ERROR | wrapper | 2019/01/07 15:02:46 | JVM did not exit on request, termination requested.
STATUS | wrapper | 2019/01/07 15:02:46 | JVM received a signal SIGKILL (9).
STATUS | wrapper | 2019/01/07 15:02:46 | JVM process is gone.
STATUS | wrapper | 2019/01/07 15:02:46 | JVM exited after being requested to terminate.
STATUS | wrapper | 2019/01/07 15:02:50 | Reloading Wrapper configuration...
STATUS | wrapper | 2019/01/07 15:02:50 | Launching a JVM...
The time of the last entry matched the timestamp of the java.status file. I noticed that one day later.
I didn't notice this log entry before:
STATUS | wrapper | 2019/01/07 15:02:11 | The JVM has run out of memory. Restarting JVM.
To me it sounds like the wrapper saw the exception (the complete log section is in my first email) and decided that the JVM should best be restarted because it had run out of memory. Which, in this situation, was not the case. I don't know why heap depletion is assumed by default to be the culprit when a thread cannot be spawned; there are definitely other reasons for that failure.
Cheers,
Christoph
From: Leif M. <lei...@ta...> - 2019-01-09 09:50:13
From: Christoph S. <csc...@am...> - 2019-01-09 09:11:32
|
CONFIDENTIAL & RESTRICTED
Hello Leif,
Thanks for your response.
Easy one first, the version we use:
[scheck@muctxp5b scheck_unix4]$ ./wrapper --version
Java Service Wrapper Community Edition 64-bit 3.5.30
Copyright (C) 1999-2016 Tanuki Software, Ltd. All Rights Reserved.
http://wrapper.tanukisoftware.com
Concerning the forced kill: I think I once saw, on another instance at another time, something like "..JVM received SIGKILL (9).." in the wrapper log.
In the case I looked at, the JVM process owned by the wrapper was gone, which fits the DOWN_CLEAN state as you explained.
I'll turn on debug output on a few of them in case it happens again.
As I interpret it, the configuration itself is OK, as is the normal behaviour: when I kill the JVM manually, for example, the wrapper brings it back online. And because of the OOM situation - more precisely, the wrapper and JVM were limited to 1024 processes by ulimit - the wrapper was probably unable to fork a command, which could explain why recovery stalled. Likely the other wrapper/JVM tandems on the same machine (20-30 tandems) faced the same trouble and tried to recover, which would mean the ceiling was sometimes reached and sometimes not (e.g. when yet another JVM with many threads was killed or died). Does this make sense to you?
Should I update my script to interpret the output of "app.sh status" with respect to certain Java:__ states and kill the wrapper?
(In such a case the Veritas cluster would consider the resource offline and start the wrapper again.)
Whether that is a good idea depends on how many states would have to be considered and on how long such a state can be tolerated. Maybe it is paranoid; since our box is very big, we should be fine concerning OOM unless we screw up the settings again. We're newbies on Linux - we used Windows for years.
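For anyone scripting around the same gap, here is a minimal sketch of such a status check. The script and state names (`scheck.sh`, `Wrapper:STARTED`, `Java:DOWN_CLEAN`) are taken from the status line quoted in this thread; everything else is an assumption to adapt, not a confirmed recipe:

```shell
#!/bin/sh
# Hypothetical monitor helper: treat "wrapper up but no JVM" as offline so
# the cluster restarts the resource instead of trusting exit status zero.
is_offline() {
    # $1: one status line, e.g.
    # "... is running: PID:56766, Wrapper:STARTED, Java:DOWN_CLEAN"
    case "$1" in
        *Wrapper:STARTED*Java:DOWN_CLEAN*) return 0 ;;  # JVM gone, wrapper idle
        *) return 1 ;;
    esac
}

status_line=$(./scheck.sh status 2>/dev/null)
if is_offline "$status_line"; then
    # A non-zero exit (or killing the wrapper here) lets the cluster act.
    echo "offline"
    exit 1
fi
echo "online"
```

The same pattern extends to other `Java:` states if more of them turn out to need handling.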
Cheers,
Christoph
From: Leif Mortenson [mailto:lei...@ta...]
Sent: 09 January 2019 03:10
To: Wrapper User List <wra...@li...>
Subject: [EXT] Re: [Wrapper-user] no JVM running (state: DOWN_CLEAN) on linux after OOM
Christoph
1) Could you please send me the wrapper.log file with debug output enabled (wrapper.debug=true) that shows what is happening when the Wrapper is failing to restart the JVM?
Please include the part of the log showing the last few moments of the JVM that runs out of memory as well.
2) What version of the Wrapper are you running?
The following issue was fixed in 3.5.16 and sounds like it might be what you are seeing.
https://wrapper.tanukisoftware.com/doc/english/release-notes.html#3.5.16
---
Fix a problem where a JVM process was not stopped completely on a UNIX platform and stayed defunct after a forced kill until the Wrapper process itself stopped. This was especially noticeable if the JVM is frozen and the JVM is being killed forcibly.
---
Are you seeing a zombie Java process still running?
This bug meant that the JVM was being left around in the background when the Wrapper thought it was gone.
If you are out of memory then the next JVM would not have enough memory to launch.
If the first JVM is not actually frozen, it would shut itself down after losing its backend connection to the Wrapper. But that might be happening too late and result in what you are seeing.
3) The DOWN_CLEAN state means that the Wrapper has completely shut down the JVM and cleaned up any associated resources.
We will take a look at the documentation on the following page as you are correct that it is missing some information.
https://wrapper.tanukisoftware.com/doc/english/prop-java-statusfile.html
Cheers,
Leif
On Tue, Jan 8, 2019 at 8:32 PM Christoph SCHWAIGER <csc...@am...<mailto:csc...@am...>> wrote:
CONFIDENTIAL & RESTRICTED
Hello Leif,
Thanks for the information about the subscription. I did so.
We have been using the wrapper on Windows for many years, and for a couple of years now we have had a standard support version.
Our problem is on Linux (RHEL). After an out-of-memory situation (the JVM exited), it is not restarted and remains down indefinitely, and the status script exits with status zero, so everything looks up to the cluster (it is integrated into a Veritas cluster). The OOM was bad: not related to the JVM, but caused by overly optimistic ulimits for the user - that has been corrected.
STATUS | wrapper | 2019/01/07 13:38:41 | Launching a JVM...
INFO | jvm 1 | 2019/01/07 13:38:43 | WrapperManager: Initializing...
INFO | jvm 1 | 2019/01/07 13:38:45 | S-Check version 3.0.4 Monte Rosa from 12-Sep-2018 08:02 by cschwaiger
INFO | jvm 1 | 2019/01/07 13:38:45 | Scheck is starting on server MUCTXP5B
INFO | jvm 1 | 2019/01/07 13:38:52 | parsed 1 xml files and created 0 service records.
INFO | jvm 1 | 2019/01/07 15:02:11 | Exception in thread "InactivityMonitor WriteCheck" java.lang.OutOfMemoryError: unable to create new native thread
STATUS | wrapper | 2019/01/07 15:02:11 | The JVM has run out of memory. Restarting JVM.
INFO | jvm 1 | 2019/01/07 15:02:11 | at java.lang.Thread.start0(Native Method)
INFO | jvm 1 | 2019/01/07 15:02:11 | at java.lang.Thread.start(Thread.java:717)
INFO | jvm 1 | 2019/01/07 15:02:11 | at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
INFO | jvm 1 | 2019/01/07 15:02:11 | at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
INFO | jvm 1 | 2019/01/07 15:02:11 | at org.apache.activemq.transport.InactivityMonitor.writeCheck(InactivityMonitor.java:147)
INFO | jvm 1 | 2019/01/07 15:02:11 | at org.apache.activemq.transport.InactivityMonitor$2.run(InactivityMonitor.java:113)
INFO | jvm 1 | 2019/01/07 15:02:11 | at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
INFO | jvm 1 | 2019/01/07 15:02:11 | at java.util.TimerThread.mainLoop(Timer.java:555)
INFO | jvm 1 | 2019/01/07 15:02:11 | at java.util.TimerThread.run(Timer.java:505)
ERROR | wrapper | 2019/01/07 15:02:45 | Shutdown failed: Timed out waiting for signal from JVM.
ERROR | wrapper | 2019/01/07 15:02:46 | JVM did not exit on request, termination requested.
STATUS | wrapper | 2019/01/07 15:02:46 | JVM received a signal SIGKILL (9).
STATUS | wrapper | 2019/01/07 15:02:46 | JVM process is gone.
STATUS | wrapper | 2019/01/07 15:02:46 | JVM exited after being requested to terminate.
STATUS | wrapper | 2019/01/07 15:02:50 | Reloading Wrapper configuration...
STATUS | wrapper | 2019/01/07 15:02:50 | Launching a JVM...
[scheck@muctxp5b scheck_unix11]$ ./scheck.sh status
Service check monitoring instance (not installed) is running: PID:56766, Wrapper:STARTED, Java:DOWN_CLEAN
I could not find the DOWN_CLEAN state documented - looked at: https://wrapper.tanukisoftware.com/doc/english/prop-java-statusfile.html
"scheck.sh stop" fails - indefinitely waits for wrapper to stop. A simple kill <pid> terminates it.
Any recommendations - e.g. measures to avoid hanging in the "looks good = status zero, but down" state?
Below/attached is the information about os version and configuration.
Thanks in advance,
Christoph
Linux muctxp5b 2.6.32-754.3.5.el6.x86_64 #1 SMP Thu Aug 9 11:56:22 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
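The ulimit ceiling blamed above can be inspected directly. A small sketch (the 1024 figure is this thread's reported setting, not a constant; `unable to create new native thread` on Linux usually means the per-user process limit, which every Java thread counts against, not heap exhaustion):

```shell
#!/bin/sh
# Show the per-user process/thread ceiling for the current user.
limit=$(ulimit -u)
echo "max user processes: $limit"

# Count threads currently owned by this uid across all processes; when
# this approaches the limit, Thread.start() and fork() both begin to fail.
threads=$(ps -eLo uid= | grep -cw "$(id -u)")
echo "current threads for uid $(id -u): $threads"
```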
|
|
From: Leif M. <lei...@ta...> - 2019-01-09 02:39:08
|
Christoph 1) Could you please send me the wrapper.log file with debug output enabled (wrapper.debug=true) that shows what is happening when the Wrapper is failing to restart the JVM? Please include the part of the log showing the last few moments of the JVM that runs out of memory as well. 2) What version of the Wrapper are you running? The following issue was fixed in 3.5.16 and sounds like it might be what you are seeing. https://wrapper.tanukisoftware.com/doc/english/release-notes.html#3.5.16 --- Fix a problem where a JVM process was not stopped completely on a UNIX platform and stayed defunct after a forced kill until the Wrapper process itself stopped. This was especially noticeable if the JVM is frozen and the JVM is being killed forcibly. --- Are you seeing a zombie Java process still running? This bug meant that the JVM was being left around in the background when the Wrapper thought it was gone. If you are out of memory then the next JVM would not have enough memory to launch. If the first JVM is not actually frozen, it would shut itself down after losing its backend connection to the Wrapper. But that might be happening too late and result in what you are seeing. 3) The DOWN_CLEAN state means that the Wrapper has completely shutdown the JVM and cleaned up any associated resources. We will take a look at the documentation on the following page as you are correct that it is missing some information. https://wrapper.tanukisoftware.com/doc/english/prop-java-statusfile.html Cheers, Leif On Tue, Jan 8, 2019 at 8:32 PM Christoph SCHWAIGER <csc...@am...> wrote: > CONFIDENTIAL & RESTRICTED > > Hello Leif, > > > > Thanks for the information about the subscription. I did so. > > > > We have been using the wrapper on windows for many years, since a couple > of years we have a standard support version. > > > > Our problem is on linux RH. 
*After an out of memory situation (the jvm > exited) it is not restarted and remains down indefinitely, the status > script exits with status zero*, so all looks up for the cluster. > (integrated into veritas cluster). The OOM was bad: not related to JVM, but > caused by overly optimistic ulimits of the user - that has been corrected. > > > > STATUS | wrapper | 2019/01/07 13:38:41 | Launching a JVM... > > INFO | jvm 1 | 2019/01/07 13:38:43 | WrapperManager: Initializing... > > INFO | jvm 1 | 2019/01/07 13:38:45 | S-Check version 3.0.4 Monte Rosa > from 12-Sep-2018 08:02 by cschwaiger > > INFO | jvm 1 | 2019/01/07 13:38:45 | Scheck is starting on server > MUCTXP5B > > INFO | jvm 1 | 2019/01/07 13:38:52 | parsed 1 xml files and created 0 > service records. > > INFO | jvm 1 | 2019/01/07 15:02:11 | Exception in thread > "InactivityMonitor WriteCheck" java.lang.OutOfMemoryError: unable to create > new native thread > > STATUS | wrapper | 2019/01/07 15:02:11 | The JVM has run out of memory. > Restarting JVM. 
> > INFO | jvm 1 | 2019/01/07 15:02:11 | at > java.lang.Thread.start0(Native Method) > > INFO | jvm 1 | 2019/01/07 15:02:11 | at > java.lang.Thread.start(Thread.java:717) > > INFO | jvm 1 | 2019/01/07 15:02:11 | at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > > INFO | jvm 1 | 2019/01/07 15:02:11 | at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) > > INFO | jvm 1 | 2019/01/07 15:02:11 | at > org.apache.activemq.transport.InactivityMonitor.writeCheck(InactivityMonitor.java:147) > > INFO | jvm 1 | 2019/01/07 15:02:11 | at > org.apache.activemq.transport.InactivityMonitor$2.run(InactivityMonitor.java:113) > > INFO | jvm 1 | 2019/01/07 15:02:11 | at > org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33) > > INFO | jvm 1 | 2019/01/07 15:02:11 | at > java.util.TimerThread.mainLoop(Timer.java:555) > > INFO | jvm 1 | 2019/01/07 15:02:11 | at > java.util.TimerThread.run(Timer.java:505) > > ERROR | wrapper | 2019/01/07 15:02:45 | Shutdown failed: Timed out > waiting for signal from JVM. > > ERROR | wrapper | 2019/01/07 15:02:46 | JVM did not exit on request, > termination requested. > > STATUS | wrapper | 2019/01/07 15:02:46 | JVM received a signal SIGKILL > (9). > > STATUS | wrapper | 2019/01/07 15:02:46 | JVM process is gone. > > STATUS | wrapper | 2019/01/07 15:02:46 | JVM exited after being requested > to terminate. > > STATUS | wrapper | 2019/01/07 15:02:50 | Reloading Wrapper > configuration... > > STATUS | wrapper | 2019/01/07 15:02:50 | Launching a JVM... > > > > [scheck@muctxp5b scheck_unix11]$ ./scheck.sh status > > *Service check monitoring instance (not installed) is running: PID:56766, > Wrapper:STARTED, Java:DOWN_CLEAN* > > > > I could not find the DOWN_CLEAN state documented – looked at: > https://wrapper.tanukisoftware.com/doc/english/prop-java-statusfile.html > > > > ”scheck.sh stop” fails – indefinitely waits for wrapper to stop. 
A simple > kill <pid> terminates it. > > > > Any recommendations – i.e. measures to avoid hanging in the “looks good = > status zero, but down” state? > > > > Below/attached is the information about os version and configuration. > > > > Thanks in advance, > > Christoph > > > > Linux muctxp5b 2.6.32-754.3.5.el6.x86_64 #1 SMP Thu Aug 9 11:56:22 EDT > 2018 x86_64 x86_64 x86_64 GNU/Linux > > > |
|
From: Christoph S. <csc...@am...> - 2019-01-08 11:32:25
|
CONFIDENTIAL & RESTRICTED Hello Leif, Thanks for the information about the subscription. I did so. We have been using the wrapper on windows for many years, since a couple of years we have a standard support version. Our problem is on linux RH. After an out of memory situation (the jvm exited) it is not restarted and remains down indefinitely, the status script exits with status zero, so all looks up for the cluster. (integrated into veritas cluster). The OOM was bad: not related to JVM, but caused by overly optimistic ulimits of the user - that has been corrected. STATUS | wrapper | 2019/01/07 13:38:41 | Launching a JVM... INFO | jvm 1 | 2019/01/07 13:38:43 | WrapperManager: Initializing... INFO | jvm 1 | 2019/01/07 13:38:45 | S-Check version 3.0.4 Monte Rosa from 12-Sep-2018 08:02 by cschwaiger INFO | jvm 1 | 2019/01/07 13:38:45 | Scheck is starting on server MUCTXP5B INFO | jvm 1 | 2019/01/07 13:38:52 | parsed 1 xml files and created 0 service records. INFO | jvm 1 | 2019/01/07 15:02:11 | Exception in thread "InactivityMonitor WriteCheck" java.lang.OutOfMemoryError: unable to create new native thread STATUS | wrapper | 2019/01/07 15:02:11 | The JVM has run out of memory. Restarting JVM. 
INFO | jvm 1 | 2019/01/07 15:02:11 | at java.lang.Thread.start0(Native Method) INFO | jvm 1 | 2019/01/07 15:02:11 | at java.lang.Thread.start(Thread.java:717) INFO | jvm 1 | 2019/01/07 15:02:11 | at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) INFO | jvm 1 | 2019/01/07 15:02:11 | at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) INFO | jvm 1 | 2019/01/07 15:02:11 | at org.apache.activemq.transport.InactivityMonitor.writeCheck(InactivityMonitor.java:147) INFO | jvm 1 | 2019/01/07 15:02:11 | at org.apache.activemq.transport.InactivityMonitor$2.run(InactivityMonitor.java:113) INFO | jvm 1 | 2019/01/07 15:02:11 | at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33) INFO | jvm 1 | 2019/01/07 15:02:11 | at java.util.TimerThread.mainLoop(Timer.java:555) INFO | jvm 1 | 2019/01/07 15:02:11 | at java.util.TimerThread.run(Timer.java:505) ERROR | wrapper | 2019/01/07 15:02:45 | Shutdown failed: Timed out waiting for signal from JVM. ERROR | wrapper | 2019/01/07 15:02:46 | JVM did not exit on request, termination requested. STATUS | wrapper | 2019/01/07 15:02:46 | JVM received a signal SIGKILL (9). STATUS | wrapper | 2019/01/07 15:02:46 | JVM process is gone. STATUS | wrapper | 2019/01/07 15:02:46 | JVM exited after being requested to terminate. STATUS | wrapper | 2019/01/07 15:02:50 | Reloading Wrapper configuration... STATUS | wrapper | 2019/01/07 15:02:50 | Launching a JVM... [scheck@muctxp5b scheck_unix11]$ ./scheck.sh status Service check monitoring instance (not installed) is running: PID:56766, Wrapper:STARTED, Java:DOWN_CLEAN I could not find the DOWN_CLEAN state documented - looked at: https://wrapper.tanukisoftware.com/doc/english/prop-java-statusfile.html "scheck.sh stop" fails - indefinitely waits for wrapper to stop. A simple kill <pid> terminates it. Any recommendations - i.e. measures to avoid hanging in the "looks good = status zero, but down" state? 
Below/attached is the information about os version and configuration. Thanks in advance, Christoph Linux muctxp5b 2.6.32-754.3.5.el6.x86_64 #1 SMP Thu Aug 9 11:56:22 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux |
|
From: Maxime <ma...@ta...> - 2018-12-18 01:39:50
|
Hello everyone, We are proud to announce the release of version 3.5.37 of the Java Service Wrapper. http://wrapper.tanukisoftware.org/doc/english/download.jsp This version includes several bug fixes and improvements. You can review the release notes for a full list of changes. http://wrapper.tanukisoftware.org/doc/english/release-notes.html Please let us know if you have any questions about the release. Sincerely, Java Service Wrapper Team Tanuki Software, Ltd. |
|
From: Pravin S. K. <pka...@tr...> - 2018-11-28 13:53:07
|
Dear Tanuki Wrapper Support Team,
We were using the Tanuki Wrapper (version 3.5.25) with Spring Boot 1 without any issue.
With Spring Boot 2, I am getting the error below.
STATUS | wrapper | Launching a JVM...
INFO | jvm 4 | Error: Could not find or load main class com.mycompany.project.App
ERROR | wrapper | JVM exited while loading the application.
Main class in the wrapper conf file:
wrapper.java.mainclass=com.mycompany.project.App
With Spring Boot 2, my application's jar structure has changed from the previous one, which causes the error above.
Spring Boot 2 jar structure (throwing the error above):
example.war
|
+-META-INF
| +-MANIFEST.MF
+-org
| +-springframework
| +-boot
| +-loader
| +-<spring boot loader classes>
+-BOOT-INF
+-classes
| +-com
| +-mycompany
| +-project
| +-App.class
+-lib
| +-dependency1.jar
| +-dependency2.jar
Spring Boot 1 jar structure (working well with Tanuki Wrapper 3.5.25):
example.war
|
+-META-INF
| +-MANIFEST.MF
+-org
| +-springframework
| +-boot
| +-loader
| +-<spring boot loader classes>
+-com
| +-mycompany
| +-project
| +-App.class
+-lib
+-dependency1.jar
+-dependency2.jar
Could you please provide your inputs to resolve the issue?
Thanks,
Pravin
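One possible direction, offered here only as a hedged sketch (the launcher class names are Spring Boot's standard ones; the archive name comes from the structure above; none of this is a confirmed fix from the list): since the application class now sits under BOOT-INF/classes, it is not visible on a plain classpath, so the Wrapper can be pointed at Spring Boot's own launcher, which knows the BOOT-INF layout:

```properties
# wrapper.conf sketch: put only the packaged archive on the classpath and
# let Spring Boot's launcher resolve BOOT-INF/classes and BOOT-INF/lib.
wrapper.java.classpath.1=example.war
# WarLauncher (JarLauncher for .jar files) reads Start-Class from
# MANIFEST.MF, so wrapper.java.mainclass no longer names the app class:
wrapper.java.mainclass=org.springframework.boot.loader.WarLauncher
```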
|
|
From: Maxime <ma...@ta...> - 2018-10-02 04:00:52
|
Michael Thank you for your email. Using the -fPIC should be fine. We use it on most of the platforms, including FreeBSD 64-bit. -fPIC makes reverse engineering harder so we prefer it for security. Although using position independent code will in theory make a program slightly slower to execute, this should not be an issue with the Wrapper which doesn't really require the small gain of performance of not using it. Note that you could also try the -fpic (lowercase option) which has some platform-dependent limitations but generates smaller and faster code. We will consider adding this option in the make of the FreeBSD 32-bit for the next release. Please let me know if you have any other questions. Regards, Maxime On Sat, Sep 29, 2018 at 4:47 PM, Michael Osipov <198...@gm...> wrote: > Hi folks, > > FreeBSD is evaluting LLVM ld (lld) as its default linker on i386 and I am > the port maintainer of JSW. It is a bit stricter and GNU ld and does not > allow dynamic relocations in the readonly segment: > > [exec] /usr/bin/ld: error: can't create dynamic relocation R_386_PC32 >> against symbol: strcmp in readonly segment; recompile object files with >> -fPIC >> [exec] >>> defined in /lib/libc.so.7 >> [exec] >>> referenced by wrapper_i18n.c >> [exec] >>> wrapper_i18n.o:(multiByteToWideChar) >> [exec] >> [exec] /usr/bin/ld: error: can't create dynamic relocation R_386_32 >> against symbol: .L.str in readonly segment; recompile object files with >> -fPIC >> [exec] >>> defined in wrapper_i18n.o >> [exec] >>> referenced by wrapper_i18n.c >> [exec] >>> wrapper_i18n.o:(multiByteToWideChar) >> [exec] >> [exec] /usr/bin/ld: error: can't create dynamic relocation >> R_386_PC32 against symbol: strcmp in readonly segment; recompile object >> files with -fPIC >> [exec] >>> defined in /lib/libc.so.7 >> [exec] >>> referenced by wrapper_i18n.c >> [exec] >>> wrapper_i18n.o:(multiByteToWideChar) >> [exec] >> > >... 
> > There are now two approaches to solve this, either compile as > position-independent code (-fPIC) or restore the previous behavior with > "LDFLAGS=-Wl,-znotext". > > Since I cannot properly evaluate the impliciations, can you tell what > would be the right choice here? I know that the amd64 target is already > position-independent. > > See also the reference bug: https://bugs.freebsd.org/bugzi > lla/show_bug.cgi?id=214864 > > Regards, > > Michael > > > _______________________________________________ > Wrapper-user mailing list > Wra...@li... > https://lists.sourceforge.net/lists/listinfo/wrapper-user > |
|
From: Michael O. <198...@gm...> - 2018-09-29 07:47:18
|
Hi folks, FreeBSD is evaluating LLVM ld (lld) as its default linker on i386 and I am the port maintainer of JSW. It is a bit stricter than GNU ld and does not allow dynamic relocations in the readonly segment: > [exec] /usr/bin/ld: error: can't create dynamic relocation R_386_PC32 against symbol: strcmp in readonly segment; recompile object files with -fPIC > [exec] >>> defined in /lib/libc.so.7 > [exec] >>> referenced by wrapper_i18n.c > [exec] >>> wrapper_i18n.o:(multiByteToWideChar) > [exec] > [exec] /usr/bin/ld: error: can't create dynamic relocation R_386_32 against symbol: .L.str in readonly segment; recompile object files with -fPIC > [exec] >>> defined in wrapper_i18n.o > [exec] >>> referenced by wrapper_i18n.c > [exec] >>> wrapper_i18n.o:(multiByteToWideChar) > [exec] > [exec] /usr/bin/ld: error: can't create dynamic relocation R_386_PC32 against symbol: strcmp in readonly segment; recompile object files with -fPIC > [exec] >>> defined in /lib/libc.so.7 > [exec] >>> referenced by wrapper_i18n.c > [exec] >>> wrapper_i18n.o:(multiByteToWideChar) > [exec] >... There are now two approaches to solve this, either compile as position-independent code (-fPIC) or restore the previous behavior with "LDFLAGS=-Wl,-znotext". Since I cannot properly evaluate the implications, can you tell what would be the right choice here? I know that the amd64 target is already position-independent. See also the reference bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=214864 Regards, Michael |
|
From: Maxime <ma...@ta...> - 2018-09-25 02:01:00
|
Hello everyone, We are proud to announce the release of version 3.5.36 of the Java Service Wrapper. http://wrapper.tanukisoftware.org/doc/english/download.jsp This version includes several bug fixes and improvements. You can review the release notes for a full list of changes. http://wrapper.tanukisoftware.org/doc/english/release-notes.html Please let us know if you have any questions about the release. Sincerely, Java Service Wrapper Team Tanuki Software, Ltd. |
|
From: Etienne J. <lap...@gm...> - 2018-09-12 12:32:48
|
And finally. What I did, because I work on Linux plateform. Log folder is /var/log/jenkins And I execute the following command chmod g+s /var/log/jenkins Regards Etienne Jouvin Le mer. 12 sept. 2018 à 14:10, Etienne Jouvin <lap...@gm...> a écrit : > Ok fine. > > I was "just" wondering and I have the answer. > > I will manage in other way. > > Regards > > Etienne > > > Le mer. 12 sept. 2018 à 04:16, Maxime Andrighetto < > max...@ta...> a écrit : > >> Etienne >> >> Sorry, I was mistaken. It is possible to change the group if your user is >> the owner of the file and also belong to the group. >> However there is currently no configuration properties to change the >> group for now. This is something that we will add in a future version, >> similar to the umask properties. >> >> In the meantime, I can suggest you the following workaround even though >> it is not straightforward: >> >> When the Wrapper starts and whenever the log file changes, a notification >> is sent to the JVM. In response, the WrapperManager will raise a Java event >> which you could subscribe to. >> For this you need to have a class that implements the >> WrapperEventListener interface. Basically you need to have a fired() method >> which receives a WrapperEvent instance, check that this instance is of type >> WrapperLogFileChangedEvent, and then execute your code to change the group >> of the log file. >> >> If you use the professional edition, you can trigger a User event from >> the Java code and execute a shell script in response which would update the >> group of the log file whenever it changes. Alternatively, you could also >> use timers to regularly ensure that the log file has the correct group and >> update it if needed. >> >> Please let me know if you need further details on one of the above >> methods. >> >> Best Regards, >> >> Maxime >> >> >> On Wed, Sep 12, 2018 at 9:16 AM, Maxime Andrighetto < >> max...@ta...> wrote: >> >>> Etienne >>> >>> Thank you for your reply. 
>>> >>> Unfortunately it is not possible to change the group of the log file >>> without having the root/sudo permission. >>> So this is not possible when the Wrapper is running with the jenkins >>> user. >>> >>> You will have to edit the ownership of your file with a linux command or >>> manually, using the root user. >>> >>> Best Regards, >>> >>> Maxime >>> >>> On Tue, Sep 11, 2018 at 7:37 PM, Etienne Jouvin <lap...@gm... >>> > wrote: >>> >>>> Hello. >>>> >>>> In fact, I am using Wrapper with the projet Jenkins Runner: >>>> https://github.com/mnadeem/JenkinsRunner >>>> >>>> The service is run as a specific user, let's say "jenkins". >>>> >>>> As I am using it under "Ubuntu", I wanted to centralize logs as it is >>>> done. >>>> So I created a folder /var/log/jenkins, and logs are created with name >>>> like jenkins.log. >>>> >>>> What I wanted, is to have permissions for owner jenkins, and group adm, >>>> as if I did something like this : >>>> mkdir /var/log/jenkins >>>> chown jenkins:adm /var/log/jenkins >>>> >>>> But when log files are created, the ownership is something like >>>> jenkins:jenkins. Group may comes from the default group for user jenkins. >>>> But I do not want to put user jenkins in group adm by default, because this >>>> is not an administrator. >>>> >>>> So in fact, this is not a matter of changing the owner (my bad for the >>>> description), but more changing the group. >>>> >>>> If not possible, I will find a way to do it with configuration on the >>>> LInux system. >>>> >>>> Regards >>>> >>>> Etienne Jouvin >>>> >>>> >>>> >>>> Le mar. 11 sept. 2018 à 04:17, Maxime <ma...@ta...> a >>>> écrit : >>>> >>>>> Etienne >>>>> >>>>> Thank you for your email. >>>>> >>>>> Are you running the Wrapper as root? >>>>> The Wrapper can change the permissions of the log file because it is >>>>> owner of it, but it cannot change the ownership (this would require running >>>>> itself as root anyway). 
>>>>> The Wrapper creates the log file, writes in it and rolls it if needed, >>>>> so usually the user of the Wrapper process is also the owner of the log >>>>> file. >>>>> For this reason there is currently no property to change the owner of >>>>> the log file. >>>>> >>>>> May I ask the use case in which you need to have the owner of the log >>>>> file different than the user of the Wrapper? >>>>> >>>>> Best Regards, >>>>> >>>>> Maxime >>>>> >>>>> >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Wrapper-user mailing list >>>> Wra...@li... >>>> https://lists.sourceforge.net/lists/listinfo/wrapper-user >>>> >>>> >>> >>> >>> -- >>> Maxime Andrighetto >>> Tanuki Software Ltd. >>> 6-18-10-4F Nishi-Kasai, Edogawa-ku >>> Tokyo 134-0088 Japan >>> Tel: +81-3-3878-3211 >>> Fax: +81-3-3878-0313 >>> http://www.tanukisoftware.com >>> >> >> >> >> -- >> Maxime Andrighetto >> Tanuki Software Ltd. >> 6-18-10-4F Nishi-Kasai, Edogawa-ku >> Tokyo 134-0088 Japan >> Tel: +81-3-3878-3211 >> Fax: +81-3-3878-0313 >> http://www.tanukisoftware.com >> _______________________________________________ >> Wrapper-user mailing list >> Wra...@li... >> https://lists.sourceforge.net/lists/listinfo/wrapper-user >> > |
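The `chmod g+s /var/log/jenkins` fix at the top of this thread works because of the setgid bit on directories: files created inside a setgid directory inherit the directory's group, so a log directory owned by `jenkins:adm` yields `jenkins:adm` log files without putting the jenkins user in group adm. A small self-contained demonstration on a temporary directory (no root or adm group needed):

```shell
#!/bin/sh
# Demonstrate the setgid-directory behaviour behind the chmod g+s fix.
dir=$(mktemp -d)
chmod 2775 "$dir"              # leading 2 = setgid bit, shown as "s"
touch "$dir/app.log"           # new file inherits the directory's group
perms=$(ls -ld "$dir" | cut -c1-10)
echo "$perms"                  # drwxrwsr-x
rm -rf "$dir"
```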
|
From: Etienne J. <lap...@gm...> - 2018-09-12 12:10:41
|
Ok fine. I was just wondering, and I have my answer. I will manage it another way.

Regards

Etienne
From: Maxime A. <max...@ta...> - 2018-09-12 02:16:32
Etienne

Sorry, I was mistaken. It is possible to change the group if your user is the owner of the file and also belongs to the group. However, there is currently no configuration property to change the group. This is something we will add in a future version, similar to the umask properties.

In the meantime, I can suggest the following workaround, even though it is not straightforward:

When the Wrapper starts, and whenever the log file changes, a notification is sent to the JVM. In response, the WrapperManager raises a Java event which you can subscribe to. For this you need a class that implements the WrapperEventListener interface. Basically, you need a fired() method which receives a WrapperEvent instance; check that this instance is of type WrapperLogFileChangedEvent, and then execute your code to change the group of the log file.

If you use the Professional Edition, you can trigger a User event from the Java code and execute a shell script in response, which would update the group of the log file whenever it changes. Alternatively, you could use timers to regularly ensure that the log file has the correct group and update it if needed.

Please let me know if you need further details on any of the above methods.

Best Regards,

Maxime

--
Maxime Andrighetto
Tanuki Software Ltd.
6-18-10-4F Nishi-Kasai, Edogawa-ku
Tokyo 134-0088 Japan
Tel: +81-3-3878-3211
Fax: +81-3-3878-0313
http://www.tanukisoftware.com
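[Editorial note: the group-changing step of the workaround above can be sketched roughly as follows. The helper uses only the standard java.nio.file API; the Wrapper classes named in the thread (WrapperEventListener, WrapperEvent, WrapperLogFileChangedEvent) are not reimplemented here, so the fired() wiring is shown only as a comment, and the event accessor name in that comment is an assumption to be checked against the Wrapper Javadoc.]

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.GroupPrincipal;
import java.nio.file.attribute.PosixFileAttributeView;

public class LogGroupFixer {

    // Change the group of a file (e.g. the Wrapper log file) to the named
    // group. As explained in the thread, this only succeeds if the process
    // owns the file and its user is a member of the target group.
    public static void setGroup(Path file, String groupName) throws IOException {
        GroupPrincipal group = file.getFileSystem()
                .getUserPrincipalLookupService()
                .lookupPrincipalByGroupName(groupName);
        PosixFileAttributeView view =
                Files.getFileAttributeView(file, PosixFileAttributeView.class);
        view.setGroup(group);
    }

    // Inside a WrapperEventListener (not reproduced here), the call site
    // would look roughly like this; getLogFile() is an assumed accessor:
    //
    //   public void fired(WrapperEvent event) {
    //       if (event instanceof WrapperLogFileChangedEvent) {
    //           try {
    //               setGroup(((WrapperLogFileChangedEvent) event)
    //                       .getLogFile().toPath(), "adm");
    //           } catch (IOException e) {
    //               e.printStackTrace();
    //           }
    //       }
    //   }
}
```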
From: Maxime A. <max...@ta...> - 2018-09-12 00:17:32
Etienne

Thank you for your reply.

Unfortunately, it is not possible to change the group of the log file without root/sudo permission, so this is not possible when the Wrapper is running as the jenkins user.

You will have to edit the ownership of the file with a Linux command, or manually as the root user.

Best Regards,

Maxime
From: Etienne J. <lap...@gm...> - 2018-09-11 10:38:13
Hello.

In fact, I am using the Wrapper with the Jenkins Runner project: https://github.com/mnadeem/JenkinsRunner

The service is run as a specific user, let's say "jenkins".

As I am using it under Ubuntu, I wanted to centralize the logs as is usually done, so I created a folder /var/log/jenkins, and the logs are created with names like jenkins.log.

What I wanted is to have permissions for owner jenkins and group adm, as if I had done something like this:

mkdir /var/log/jenkins
chown jenkins:adm /var/log/jenkins

But when the log files are created, the ownership is something like jenkins:jenkins. The group may come from the default group of the jenkins user. But I do not want to put the jenkins user in the adm group by default, because it is not an administrator.

So in fact this is not a matter of changing the owner (my bad for the description), but rather of changing the group.

If it is not possible, I will find a way to do it with configuration on the Linux system.

Regards

Etienne Jouvin
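[Editorial note: for the Linux-side configuration Etienne mentions as a fallback, one common approach (an assumption of this note, not something suggested in the thread) is to set the setgid bit on the log directory. On Linux, files created inside a setgid directory inherit the directory's group rather than the creator's primary group, so the Wrapper running as jenkins could produce jenkins:adm log files without the jenkins user being a member of adm.]

```shell
# One-time setup as root; paths match the layout described in the thread.
mkdir -p /var/log/jenkins
chown jenkins:adm /var/log/jenkins
# The leading "2" sets the setgid bit: files created in this directory
# inherit its group (adm) instead of the creator's primary group.
chmod 2750 /var/log/jenkins
```

After this, any log file the Wrapper creates or rolls inside /var/log/jenkins should come out with group adm automatically.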
From: Maxime <ma...@ta...> - 2018-09-11 02:17:42
Etienne

Thank you for your email.

Are you running the Wrapper as root? The Wrapper can change the permissions of the log file because it is the owner of it, but it cannot change the ownership (this would require running as root anyway). The Wrapper creates the log file, writes to it, and rolls it if needed, so usually the user of the Wrapper process is also the owner of the log file. For this reason there is currently no property to change the owner of the log file.

May I ask about the use case in which you need the owner of the log file to be different from the user of the Wrapper?

Best Regards,

Maxime

On Tue, Sep 11, 2018 at 6:40 AM, Etienne Jouvin <lap...@gm...> wrote:

> Version: 3.2.3
> OS: Ubuntu 18.04.1 LTS
> Linux my-server 4.15.0-33-generic #36-Ubuntu SMP Wed Aug 15 16:00:05 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> Hello.
>
> I want to configure the log file properties. I saw that it is possible to set up the permissions on the file, but I would also like to set the owner/group on the file, as I can do with chown.
>
> Regards
>
> Etienne Jouvin
>
> _______________________________________________
> Wrapper-user mailing list
> Wra...@li...
> https://lists.sourceforge.net/lists/listinfo/wrapper-user