Edward R. Cheslek reported an issue starting STAFProc on a RHEL 7.3 PPC64 Big Endian system when the STAF.cfg file tries to register the STAX and Event Java services. STAFProc starts fine with the Java services commented out.
The STAFProc output contains:
Error on Service definition line:
SERVICE STAX LIBRARY JSTAF EXECUTE /usr/local/staf/services/stax/STAX.jar
PARMS "EVENTGENERATION Disabled LOGTCNUMSTARTS Disabled LOGTCELAPSEDTIME Disabled LOGTCSTARTSTOP Disabled NUMTHREADS 10" OPTION JVMName=STAFJVM1 OPTION J2=-Xmx1024m
Error code: 27
Reason : Error constructing service, JSTAF, Result: Unable to connect to JVM
Error in configuration file: /usr/local/staf/bin/STAF.cfg
Running "java -version" takes almost 2 minutes to complete:
# java -version
java version "1.6.0"
Java(TM) SE Runtime Environment (build pxp6460sr16-20140418_01(SR16))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux ppc64-64 jvmxp6460sr16-20140416_196573 (JIT disabled, AOT disabled)
J9VM - 20140416_196573
GC - GA24_Java6_SR16_20140416_1614_B196573)
JCL - 20140406_01
The /usr/local/staf/data/STAF/lang/java/jvm/STAFJVM1/JVMLog.1 file doesn't contain additional information about the error:
******************************************************************************
*** 20161004-12:44:55 - Start of Log for JVMName: STAFJVM1
*** JVM Executable: java
*** JVM Options : -Xmx1024m
*** JVM Version : java version "1.6.0"
Java(TM) SE Runtime Environment (build pxp6460sr16-20140418_01(SR16))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux ppc64-64 jvmxp6460sr16-20140416_196573 (JIT enabled, AOT enabled)
J9VM - 20140416_196573
JIT - r9_20130920_46510ifx5
GC - GA24_Java6_SR16_20140416_1614_B196573)
JCL - 20140406_01
*** JVM PID : 69795
******************************************************************************
******************************************************************************
*** 20161004-12:53:25 - Start of Log for JVMName: STAFJVM1
*** JVM Executable: java
*** JVM Options : -Xmx1024m
*** JVM Version : java version "1.6.0"
Java(TM) SE Runtime Environment (build pxp6460sr16-20140418_01(SR16))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux ppc64-64 jvmxp6460sr16-20140416_196573 (JIT enabled, AOT enabled)
J9VM - 20140416_196573
JIT - r9_20130920_46510ifx5
GC - GA24_Java6_SR16_20140416_1614_B196573)
JCL - 20140406_01
Here's information about the version of STAF installed on this power8 system with RHEL 7.3 big endian installed:
[root@thymelp3 STAFJVM1]# cat /usr/local/staf/install.properties
version=3.4.18
platform=linux-ppc64-64
architecture=64-bit
installer=STAFInst
file=STAF3418-linux-ppc64-64.tar
osname=Linux
osversion=*
osarch=ppc64
Installing the latest fixpack version of Java 6 for Linux PPC64 big endian by downloading it from the IBM Java Information Manager did not resolve the problem. It failed in the same manner.
[root@thymelp3 bin]# pwd
/opt/ibm/java-ppc64-60/jre/bin
[root@thymelp3 bin]# ./java -version
java version "1.6.0"
Java(TM) SE Runtime Environment (build pxp6460sr16fp30-20160726_01(SR16fp30))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux ppc64-64 jvmxp6460sr16fp30-20160725_312906 (JIT disabled, AOT disabled)
J9VM - 20160725_312906
GC - GA24_Java6_SR16_20160725_1417_B312906)
JCL - 20160719_01
[root@thymelp3 bin]#
Running a "java -version" command with the latest fixpack version of Java 6 also takes a long time to complete: about 1 minute and 48 seconds.
A workaround for this issue is to comment out the registration of any Java services (e.g. STAX, Event) in the STAF.cfg file, start STAFProc, and then register the Java services via the STAF SERVICE service's ADD request.
Here's what I did to successfully start STAFProc and register the STAX and Event services:
1) Start STAFProc using the STAF.cfg file with the STAX/Event Java services commented out:
I waited about 5 seconds for STAFProc to start and then looked at the STAFProc output file and submitted a STAF PING request to verify STAFProc had started fine.
2) Use the STAF SERVICE ADD request to register the STAX and Event services. Registering the first Java service (STAX) took about 1m 48s to complete (a very long time).
First, make sure that a STAF jvm is not still running:
If so, kill the STAF jvm:
Register the STAX service:
Register the Event service:
Last edit: Sharon Lucas 2016-10-05
When STAFProc registers a Java service from the STAF.cfg file that uses a new JVM, it limits how long it waits for the JVM to be ready: a loop checks up to 30 times whether the JVM is ready and, if it isn't, waits 1 second before retrying. If the JVM is still not ready after 30 iterations, it exits and terminates STAFProc with RC 27 (kSTAFServiceConfigurationError) and the error message "Unable to connect to JVM". This wait loop takes place in the STAFServiceConstruct() method in lang/java/STAFJavaService.java. On this system, the initialization of the JVM is very slow (about 1 minute 48 seconds), so it exceeds the maximum time that STAFProc waits for the JVM to be ready. Apparently, a SERVICE ADD request does not have this same time limitation.
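To illustrate the kind of bounded wait loop described above, here is a minimal sketch in Java. It is not the actual STAF code: the class name JvmReadyWait, the methods waitUntilReady and readyOnCheck, and the simulated readiness check are all hypothetical, and the delay is passed in as a parameter so the demo can use milliseconds instead of the 1-second wait.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class JvmReadyWait {

    // Hypothetical helper (not STAF's implementation): checks isReady up to
    // maxAttempts times, sleeping delayMillis after each failed check.
    // Returns true as soon as a check succeeds, false if every check fails.
    public static boolean waitUntilReady(BooleanSupplier isReady,
                                         int maxAttempts,
                                         long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isReady.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    // Simulated JVM (assumption for the demo): reports ready on its n-th check.
    public static BooleanSupplier readyOnCheck(int n) {
        AtomicInteger checks = new AtomicInteger();
        return () -> checks.incrementAndGet() >= n;
    }

    public static void main(String[] args) {
        // A "slow JVM" that needs 40 checks is missed by a 30-check limit.
        if (!waitUntilReady(readyOnCheck(40), 30, 1L)) {
            System.out.println("Timed out waiting to connect to the JVM");
        }
        // The same slow start succeeds when more checks are allowed.
        if (waitUntilReady(readyOnCheck(40), 50, 1L)) {
            System.out.println("JVM ready");
        }
    }
}
```

With a 1-second delay, a limit of 30 checks gives the JVM roughly 30 seconds to become ready, which is why a JVM whose startup takes longer than that fails during STAFProc startup.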
I will continue to investigate this issue and see if I can update STAFJavaService.java to wait longer for a JVM to be ready, either by increasing the number of times (30) that it checks if a JVM is ready, by increasing the wait time within the loop, or by adding the ability to configure the maximum number of times STAFJavaService checks if a JVM is ready. Also, we may want to improve the error message from "Unable to connect to JVM" to something like "Timed out waiting to connect to the JVM".
Edward agreed that I could use his Linux PPC64 system to help debug (as I cannot reproduce this on any of my systems). I will need to use a different instance of STAFProc so that he can continue to use STAF in /usr/local/staf on this system with the workaround in the meantime. This system usually gets re-installed on Tuesday or Wednesday each week. Also, as I don't have a working Linux PPC64 big endian STAF build system, I'll need to set up his system to perform a STAF Linux PPC64 big endian build.
Increased the number of times STAF checks whether a JVM is ready from 30 to 50, improved the error message, and now log the error message in the JVM log in addition to returning it.
Verified that this fixed the problem seen on Edward's Linux PPC64 system.
Here's a cvs diff of the changes:
Last edit: Sharon Lucas 2016-12-31
This fix will be in STAF V3.4.26, which will be released by the end of December 2016. I still need to build STAF V3.4.26 for Linux PPC64 Big Endian, as I don't have a working Linux PPC64 Big Endian system. I could check whether I can use Edward's RHEL 7.3 PPC64 Big Endian system, but I would have to build manually as his system is in a private network.