#1366 STAFProc crashes on LA - std::bad_alloc error in STAX JVMLog

Unix::Linux
open
5
2010-08-24
2010-08-24
No

Starting on 8/12 we began to see STAF crash on our Lotus Automator (LA). It has happened a number of times since then. I have not been able to identify any LA or OS changes around the time the crashes started. The load running at the time of the crash is low, < 5 tasks. Much higher loads were executing successfully for many months prior to the recent crashes. The errors I see in the STAFProc log are:

20100824-06:00:43;139598736;00000100;Caught STAFException in JSTAF.STAFServiceAcceptRequest(), Endpoint: local://local, Service: STAX, Request: list jobs, Exception: STAFConnectionProviderException, Location: /opt/dev/autobuild/build/src/staf/stafif/STAFConnectionProviderInlImpl.cpp(219), Text: STAFConnectionProviderConnect: Error connecting to endpoint: connect() RC=111, Error code: 22
20100824-06:00:44;139598736;00000100;Caught STAFException in JSTAF.STAFServiceAcceptRequest(), Endpoint: local://local, Service: STAX, Request: list jobs, Exception: STAFConnectionProviderException, Location: /opt/dev/autobuild/build/src/staf/stafif/STAFConnectionProviderInlImpl.cpp(219), Text: STAFConnectionProviderConnect: Error connecting to endpoint: connect() RC=111, Error code: 22
STAFProc ending normally
20100824-06:00:45;8406736;00000100;Caught STAFException in JSTAF.STAFServiceTerm(), Service: STAX, Exception: STAFConnectionProviderException, Location: /opt/dev/autobuild/build/src/staf/stafif/STAFConnectionProviderInlImpl.cpp(219), Text: STAFConnectionProviderConnect: Error connecting to endpoint: connect() RC=111, Error code: 22
20100824-06:00:45;8406736;00000100;Error terminating service, STAX, RC: 6, Result: Error terminating service, JSTAF, Result:
20100824-06:01:34;8406736;00000100;Received signal 15 (SIGTERM)

What can cause that error to be raised? Any suggestions on how to investigate this further?

Discussion

  • Emmet Clifford

    Emmet Clifford - 2010-08-24

    /usr/local/staf/data/STAF/lang/java/jvm/STAFJVM/JVMLog.1

     
  • Sharon Lucas

    Sharon Lucas - 2010-08-24

    Note that when the JVM used by STAX crashes, then you'll often see errors
    like "Caught STAFException in JSTAFSH.HandeRequest (or in JSTAF.STAFServiceAcceptRequest)" in the STAFProc output. But the cause of the problem is usually the JVM running out of memory.
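
As a rough aid for reading STAFProc output like this, the sketch below pulls the connect() RC value out of a log line and maps it to an errno name. It assumes the RC reported by STAFConnectionProviderConnect is the errno from connect(); on Linux, errno 111 is ECONNREFUSED, which would be consistent with the STAX JVM process no longer listening. The helper name and regular expression are hypothetical and not part of STAF, and the sample line is abbreviated from the log in this ticket.

```python
import errno
import re

# Hypothetical helper, not part of STAF: extract the connect() RC from a
# STAFProc log line and, assuming it is an errno value, give it a name.
_RC_PATTERN = re.compile(r"connect\(\) RC=(\d+)")


def describe_connect_rc(log_line):
    """Return (rc, errno_name), or None if the line has no connect() RC."""
    match = _RC_PATTERN.search(log_line)
    if match is None:
        return None
    rc = int(match.group(1))
    # errno.errorcode is platform specific; on Linux 111 is ECONNREFUSED.
    return rc, errno.errorcode.get(rc, "UNKNOWN")


# Abbreviated copy of a line from the STAFProc log in this ticket.
line = ("20100824-06:00:43;139598736;00000100;Caught STAFException in "
        "JSTAF.STAFServiceAcceptRequest(), Text: STAFConnectionProviderConnect: "
        "Error connecting to endpoint: connect() RC=111, Error code: 22")

print(describe_connect_rc(line))
```

The errno name is looked up at run time rather than hard-coded because the numeric values differ between operating systems.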

    I see the STAX service is currently using Sun Java 1.6.0_21 (as of 2010-08-19) and was using IBM Java 1.6.0 SR7 before that (presumably the switch was made to see if it would help resolve this problem).

    In looking at the STAX JVM log on your stsvtla1 machine (in /usr/local/staf/data/STAF/lang/java/jvm/STAFJVM/JVMLog.1), a memory allocation error is often being logged. For example:

    ******************************************************************************
    *** 20100823-10:00:37 - Start of Log for JVMName: STAFJVM
    *** JVM Executable: java
    *** JVM Options : -Xmx2560m
    *** JVM Version : java version "1.6.0_21"
    Java(TM) SE Runtime Environment (build 1.6.0_21-b07)
    Java HotSpot(TM) Server VM (build 17.0-b17, mixed mode)
    *** JVM PID : 17297
    ******************************************************************************

    Registered Extensions for STAX Version 3.3.7:

    terminate called after throwing an instance of 'std::bad_alloc'
    what(): St9bad_alloc

    ******************************************************************************

    So, it appears that the STAX JVM is running out of some kind of memory.
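
For context, std::bad_alloc is the C++ exception thrown when a native allocation fails, so it is raised by native code in the JVM process rather than by Java code, whereas java.lang.OutOfMemoryError is reported by the Java runtime itself. The sketch below is one possible way to triage a JVM log excerpt by which of the two it contains. The function name and labels are hypothetical and not part of STAF or STAX; the log cannot show which native component made the failing allocation.

```python
# Hypothetical triage helper, not part of STAF or STAX.
NATIVE_MARKERS = ("std::bad_alloc", "St9bad_alloc")
JAVA_MARKER = "java.lang.OutOfMemoryError"


def classify_jvm_log(text):
    """Label a JVM log excerpt by the kind of memory failure it shows."""
    # Native C++ allocation failures abort the process via terminate().
    if any(marker in text for marker in NATIVE_MARKERS):
        return "native allocation failure"
    # Failures detected and reported by the Java runtime.
    if JAVA_MARKER in text:
        return "java OutOfMemoryError"
    return "no memory error found"


# Excerpt copied from the JVMLog.1 content quoted in this ticket.
excerpt = """Registered Extensions for STAX Version 3.3.7:

terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc
"""

print(classify_jvm_log(excerpt))  # prints: native allocation failure
```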

    I suggest that you add the following JVM options when registering the STAX service to increase the MaxPermSize and initial PermSize for the JVM to see if that resolves the problem:

    OPTION J2=-XX:PermSize=256m OPTION J2=-XX:MaxPermSize=256m

    Entry "4.1.8 Why is the STAX JVM crashing with a
    java.lang.OutOfMemoryError logged in the STAX JVM log?" in the STAF/STAX
    FAQ at http://staf.sourceforge.net/current/STAFFAQ.htm#d0e2258 says:

    If the STAX JVM crashes and the STAX JVM log contains an error like
    "java.lang.OutOfMemoryError: requested <size> bytes for <reason>. Out of
    swap space?" in the STAX JVM log, try tuning the JVM by increasing the
    maximum size (and possibly the initial size) of the permanent generation
    space used by the JVM. The permanent generation is the area of the heap
    where class and method objects are stored. If an application loads a very
    large number of classes, then the maximum size of the permanent generation
    space might need to be increased using the -XX:MaxPermSize JVM option when
    registering the STAX service. You may also want to increase the initial
    size of the permanent generation space by using the -XX:PermSize JVM
    option. For example, to increase the maximum and initial sizes of the
    permanent generation space to 256m (and to increase the maximum heap size
    to 1024m), register the STAX service as follows in the STAF.cfg file:

    SERVICE STAX LIBRARY JSTAF EXECUTE C:/STAF/services/stax/STAX.jar \
        OPTION JVMName=STAX \
        OPTION "J2=-Xmx2560m -XX:MaxPermSize=256m -XX:PermSize=256m"

    Increasing the Java PermSize seems to be required more often when Sun Java
    Version 6 is being used (as it is in your case).
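
In Sun Java 6 the permanent generation is sized separately from the -Xmx heap, so the two maximums add up when estimating how much memory the JVM may reserve. The sketch below totals them for an option string like the one above. It is only an illustration: the function name is hypothetical, it ignores thread stacks and other native memory, and whether the total matters on this machine depends on details the ticket does not give (for example, whether the JVM is a 32-bit process).

```python
import re

# Multipliers that convert a JVM size suffix to MiB.
_SIZE_UNITS = {"k": 1 / 1024, "m": 1, "g": 1024}


def reserved_mib(jvm_options):
    """Sum -Xmx and -XX:MaxPermSize from an option string, in MiB."""
    total = 0.0
    for flag in ("-Xmx", "-XX:MaxPermSize="):
        # Match the flag followed by a number and a k/m/g suffix.
        match = re.search(re.escape(flag) + r"(\d+)([kKmMgG])", jvm_options)
        if match:
            total += int(match.group(1)) * _SIZE_UNITS[match.group(2).lower()]
    return total


# Options taken from the J2 setting shown in this ticket.
print(reserved_mib("-Xmx2560m -XX:MaxPermSize=256m -XX:PermSize=256m"))  # 2816.0
```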

     
  • Sharon Lucas

    Sharon Lucas - 2010-08-24
    • milestone: --> Unix::Linux
    • assigned_to: nobody --> slucas
     
  • Sharon Lucas

    Sharon Lucas - 2010-08-24
    • summary: STAF on our LA crashing - STAFConnectionProviderConnect: Err --> STAFProc crashes on LA - std::bad_alloc error in STAX JVMLog
     
  • Sharon Lucas

    Sharon Lucas - 2010-08-27

    Increasing the MaxPermSize did not help.

    Emmet said that the only thing that had changed before the STAX service started running out of memory was that some Windows OS updates were applied to 6 driver client machines. So, he backed out these Windows OS updates on the 6 drivers being used and now his LA tasks are working again (they have been running > 40 hours).

    I don't understand how having the Windows OS updates on his client machines is causing the LA STAX service machine to run out of memory.

    He's running with some STAF tracing enabled now to see if that gives any clues.

     
