Menu

#1586 STAFProc fails to start due to Java service registration taking almost 2m

All
open
STAFProc (180)
5
2016-12-31
2016-10-04
No

Edward R. Cheslek reported an issue starting STAFProc on a RHEL 7.3 PPC64 Big Endian system when the STAF.cfg file tries to register the STAX and Event Java services. STAFProc starts fine with the Java services commented out.

The STAFProc output contains:

Error on Service definition line:

SERVICE STAX LIBRARY JSTAF EXECUTE /usr/local/staf/services/stax/STAX.jar
PARMS "EVENTGENERATION Disabled LOGTCNUMSTARTS Disabled LOGTCELAPSEDTIME   Disabled LOGTCSTARTSTOP Disabled NUMTHREADS 10" OPTION JVMName=STAFJVM1 OPTION J2=-Xmx1024m

Error code: 27
Reason    : Error constructing service, JSTAF, Result: Unable to connect to JVM

Error in configuration file: /usr/local/staf/bin/STAF.cfg

Running "java -version" takes almost 2 minutes to complete:

# java -version
java version "1.6.0"
Java(TM) SE Runtime Environment (build pxp6460sr16-20140418_01(SR16))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux ppc64-64 jvmxp6460sr16-20140416_196573 (JIT disabled, AOT disabled)
J9VM - 20140416_196573
GC   - GA24_Java6_SR16_20140416_1614_B196573)
JCL  - 20140406_01

The /usr/local/staf/data/STAF/lang/java/jvm/STAFJVM1/JVMLog.1 file doesn't contain additional information about the error:

******************************************************************************
*** 20161004-12:44:55 - Start of Log for JVMName: STAFJVM1
*** JVM Executable: java
*** JVM Options   : -Xmx1024m
*** JVM Version   : java version "1.6.0"
Java(TM) SE Runtime Environment (build pxp6460sr16-20140418_01(SR16))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux ppc64-64 jvmxp6460sr16-20140416_196573 (JIT enabled, AOT enabled)
J9VM - 20140416_196573
JIT  - r9_20130920_46510ifx5
GC   - GA24_Java6_SR16_20140416_1614_B196573)
JCL  - 20140406_01
*** JVM PID       : 69795
******************************************************************************

******************************************************************************
*** 20161004-12:53:25 - Start of Log for JVMName: STAFJVM1
*** JVM Executable: java
*** JVM Options   : -Xmx1024m
*** JVM Version   : java version "1.6.0"
Java(TM) SE Runtime Environment (build pxp6460sr16-20140418_01(SR16))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux ppc64-64 jvmxp6460sr16-20140416_196573 (JIT enabled, AOT enabled)
J9VM - 20140416_196573
JIT  - r9_20130920_46510ifx5
GC   - GA24_Java6_SR16_20140416_1614_B196573)
JCL  - 20140406_01

Here's information about the version of STAF installed on this power8 system with RHEL 7.3 big endian installed:

[root@thymelp3 STAFJVM1]# cat /usr/local/staf/install.properties
version=3.4.18
platform=linux-ppc64-64
architecture=64-bit
installer=STAFInst
file=STAF3418-linux-ppc64-64.tar
osname=Linux
osversion=*
osarch=ppc64

Installing the latest fixpack version of Java 6 for Linux PPC64 big endian by downloading it from the IBM Java Information Manager did not resolve the problem. It failed in the same manner.

[root@thymelp3 bin]# pwd
/opt/ibm/java-ppc64-60/jre/bin
[root@thymelp3 bin]# ./java -version
java version "1.6.0"
Java(TM) SE Runtime Environment (build pxp6460sr16fp30-20160726_01(SR16fp30))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux ppc64-64 jvmxp6460sr16fp30-2016
0725_312906 (JIT disabled, AOT disabled)
J9VM - 20160725_312906
GC   - GA24_Java6_SR16_20160725_1417_B312906)
JCL  - 20160719_01
[root@thymelp3 bin]#

Running a "java -version" command with the latest fixpack version of Java 6 takes a long time to complete as well. About 1 minute and 48 seconds.

Discussion

  • Sharon Lucas

    Sharon Lucas - 2016-10-05

    A workaround for this issue is to comment out the registration of any Java services (e.g. STAX, Event) in the STAF.cfg file and start STAFProc and then register these the Java services via the STAF SERVICE service's ADD request.

    #SERVICE STAX LIBRARY JSTAF EXECUTE /usr/local/staf/services/stax/STAX.jar \
    #  PARMS "EVENTGENERATION Disabled LOGTCNUMSTARTS Disabled LOGTCELAPSEDTIME Disa
    bled LOGTCSTARTSTOP Disabled NUMTHREADS 10" OPTION JVMName=STAFJVM1 OPTION "JVM=
    /opt/ibm/java-ppc64-60/jre/bin/java" OPTION J2=-Xmx1024m
    
    #SERVICE EVENT LIBRARY JSTAF EXECUTE /usr/local/staf/services/stax/STAFEvent.jar
    

    Here's what I did to successfully start STAFProc and register the STAX and Event services:

    1) Start STAFProc using the STAF.cfg file with the STAX/Event Java services commented out:

    [root@thymelp3 bin]# nohup /usr/local/staf/bin/STAFProc /usr/local/staf/bin/STAF_noJavaServices.cfg >/tmp/STAFProc.out 2>&1 &
    [1] 17151
    [root@thymelp3 bin]#
    

    I waited about 5 seconds for STAFProc to start and then looked at the STAFProc output file and submitted a STAF PING request to verify STAFProc had started fine.

    [root@thymelp3 bin]# tail /tmp/STAFProc.out
    nohup: ignoring input
    
    Machine          : thymelp3.isst.aus.stglabs.ibm.com
    Machine nickname : thymelp3.isst.aus.stglabs.ibm.com
    Startup time     : 20161004-16:35:13
    
    STAFProc version 3.4.18 initialized
    [root@thymelp3 bin]# STAF local PING PING
    Response
    --------
    PONG
    

    2) Use the STAF SERVICE ADD request to register the STAX and Event services. Registering the first Java service (STAX) took about 1m 48s to complete (a very long time).

    First, make sure that a STAF jvm is not still running:

    [root@thymelp3 staf]# ps -eaf | grep -i staf
    root      19954      1 99 17:12 pts/1    00:18:12 /opt/ibm/java-ppc64-60/jre/bin/java -Xmx1024m com.ibm.staf.service.STAFServiceHelper STAFJVM1
    

    If so, kill the STAF jvm:

    [root@thymelp3 staf]# kill -9 19954
    

    Register the STAX service:

    [root@thymelp3 staf]# STAF local SERVICE ADD SERVICE STAX LIBRARY JSTAF EXECUTE /usr/local/staf/services/stax/STAX.jar PARMS "EVENTGENERATION Disabled LOGTCNUMSTARTS Disabled LOGTCELAPSEDTIME Disabled LOGTCSTARTSTOP Disabled NUMTHREADS 10" OPTION JVMName=STAFJVM1 OPTION "JVM=/opt/ibm/java-ppc64-60/jre/bin/java" OPTION J2=-Xmx1024m
    Response
    --------
    
    [root@thymelp3 bin]# STAF local STAX VERSION
    Response
    --------
    3.5.10
    

    Register the Event service:

    [root@thymelp3 bin]# STAF local SERVICE ADD SERVICE Event LIBRARY JSTAF EXECUTE /usr/local/staf/services/stax/STAFEvent.jar
    Response
    --------
    
    [root@thymelp3 bin]# STAF local Event VERSION
    Response
    --------
    3.1.5
    [root@thymelp3 bin]#
    
     

    Last edit: Sharon Lucas 2016-10-05
  • Sharon Lucas

    Sharon Lucas - 2016-10-05

    When STAFProc registers a Java service in the STAF.cfg file that is uses a new JVM, it has a time limit on how long it waits before for the JVM to be ready -- basically, a loop that checks up to 30 times if the JVM is ready and if it isn't, waits 1 second before retrying. If the JVM is still not ready after 30 iterations, it exits and terminates STAFProc with RC 27 (kSTAFServiceConfigurationError) and error message "Unable to connect to JVM". This wait loop takes place in the STAFServiceConstruct() method in lang/java/STAFJavaService.java. On this system, the initialization of the JVM is very slow (about 1 minute 48 seconds) so it exceeds the maximum time that it waits for the JVM to be ready. Apparently, a SERVICE ADD request does not have this same time limitation.

    I will continue to investigate this issue more and see can update STAFJavaService.java to wait a longer time for a JVM to be ready perhaps by either increasing the number of times (30) that it checks if a JVM to be ready, and/or increasing the wait time within the loop, or to add the ability to configure the maximum times the STAFJavaSerivce checks if a JVM is ready. Also, may want to improve the error message from "Unable to connect to JVM" to something like "Timed out waiting to connect to the JVM".

    Edward agree that I could use his Linux PPC64 system to help debug (as I cannot reproduce this on any of my systems). I will need to use a different instance of STAFProc so that he can continue to use STAF in /usr/local/staf on this system using the workaround in the mean time. This system gets re-installed usually on Tuesday or Wednesday each week. Also, as I don't have a working Linux PPC64 big endian STAF build system, I'll need to set up his system to perform a STAF Linux PPC64 big endian build.

     
  • Sharon Lucas

    Sharon Lucas - 2016-12-31

    Increased the number of times STAF tries to see if a JVM is ready from 30 to 50 and improved the error message and logged the error message in the JVM log in addition to returning it.

    Verified that this fixed the problem seen on Edward's Linux PPC64 system.

    Here's a cvs diff of the changes:

    Index: lang/java/STAFJavaService.cpp
    ===================================================================
    RCS file: /cvsroot/staf/src/staf/lang/java/STAFJavaService.cpp,v
    retrieving revision 1.41
    diff -r1.41 STAFJavaService.cpp
    535a536,540
    >
    >             // XXX: Add a optional JVMMAXCONNECTATTEMPTS option when registering
    >             // a Java service and use it instead of hardcoding it to 50.
    >             int jvmMaxConnectAttempts = 50;
    >             int jvmConnectAttempt = 0;
    537c542,543
    <             for (int i = 0; (i < 30) & !jvmReady; ++i)
    ---
    >             for (; (jvmConnectAttempt < jvmMaxConnectAttempts) & !jvmReady;
    >                jvmConnectAttempt++)
    567c573,576
    <                     // XXX: This should really be more specific
    ---
    >                     // XXX: This should be more specific about what STAFException
    >                     //      is thrown.
    >                     // A STAFException is thrown while the JVM is in the
    >                     // process of starting up and we cannot connect to it yet.
    578,579c587,613
    <                 *pErrorBuffer =
    <                     STAFString("Unable to connect to JVM").adoptImpl();
    ---
    >                 // XXX: If add a new JVMMAXCONNECTATTEMPTS option can update
    >                 // error message to say: You can increase the number of times
    >                 // STAF attempts to connect to the JVM by adding a
    >                 // JVMMAXCONNECTATTEMPTS option when registering the Java service.
    >
    >                 // Log and return error message about failure connecting to the JVM.
    >                 try
    >                 {
    >                     fstream outfile(logName.toCurrentCodePage()->buffer(),
    >                                     ios::out | ios::app);
    >                     STAFString line = STAFString(
    >                         STAFTimestamp().asString() +
    >                         " ERROR: Unable to connect to JVM. Timed out after ") +
    >                         jvmConnectAttempt + " attempts.";
    >                     outfile << endl << line.toCurrentCodePage()->buffer() << endl;
    >                     outfile.close();
    >                 }
    >                 catch (...)
    >                 {
    >                   // Ignore error writing the JVM connect attempts
    >                   // to the JVM log file so can return an error message.
    >                 }
    >
    >                 STAFString errorMsg = STAFString(
    >                     "Unable to connect to JVM. Timed out after ") +
    >                     jvmConnectAttempt + " attempts.";
    >                 *pErrorBuffer = errorMsg.adoptImpl();
    584a619,640
    >
    >             // Log the JVM connect attempt information in the JVM log file.
    >             try
    >             {
    >                 fstream outfile(logName.toCurrentCodePage()->buffer(),
    >                                 ios::out | ios::app);
    >
    >                 STAFString line = STAFString(
    >                     STAFTimestamp().asString() + " JVM connected on attempt ") +
    >                     jvmConnectAttempt + " out of " + jvmMaxConnectAttempts;
    >                 outfile << endl << line.toCurrentCodePage()->buffer() << endl;
    >                 outfile.close();
    >             }
    >             catch (...)
    >             {
    >               // Just in case trying to write the JVM connect attempts to the
    >               // JVM log file raised an exception, we don't want logging this
    >               // to prevent loading the service.
    >             }
    
     

    Last edit: Sharon Lucas 2016-12-31
  • Sharon Lucas

    Sharon Lucas - 2016-12-31

    This fix will be in STAF V3.4.26 which will be released by the end of December 2016. Still need to build STAF V3.4.26 for Linux PPC64 Big Endian as I don't have a working Linux PPC64 Big Endian system. Could check if can use Edward's RHEL 7.3 PPC64 Big Endian system but would have to build manually as his system is in a private network.

     

Log in to post a comment.

MongoDB Logo MongoDB