STAFProc memory leaks

Nixon
2012-03-03
2013-06-12
  • Nixon

    Nixon - 2012-03-03

    I tried to file a defect, but it looks like it wasn't submitted successfully. I have valgrind output but am not sure where to upload it.

    We are seeing STAF crash with 'out of memory' when the tests run for several days. Based on our investigation, STAF is leaking memory: the VIRT field in the 'top' command output grows by hundreds of MB. The only way to clear the leaks is to shut down STAF and restart it.

    # staf local misc version
    Response


    3.4.8

    # cat /etc/redhat-release
    Red Hat Enterprise Linux Server release 5.3 (Tikanga)
    # uname -a
    Linux RH-Linux-53-84 2.6.18-128.el5PAE #1 SMP Wed Dec 17 12:02:33 EST 2008 i686 i686 i386 GNU/Linux

    I wrote a small script to reproduce the memory leak and was able to reproduce it some of the time. (I had to run the loop below several times. If it does not reproduce, please try increasing the process count at 'set process 60'.)

    # for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30;  do top -n1 -b |grep STAF; tclsh test.tcl; done
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:06.55 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:06.62 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:06.70 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:06.77 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:06.85 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:06.92 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:06.99 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:07.06 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:07.13 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:07.20 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:07.27 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:07.36 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:07.43 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:07.50 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:07.58 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:07.66 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:07.74 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:07.81 STAFProc
    23090 root      25   0 90408 5180 3632 S  0.0  0.3   0:07.87 STAFProc
    23090 root      25   0 94508 5192 3632 S  0.0  0.3   0:07.95 STAFProc
    23090 root      25   0 94508 5192 3632 S  0.0  0.3   0:08.03 STAFProc
    23090 root      25   0 98608 5204 3632 S  0.0  0.3   0:08.11 STAFProc
    23090 root      25   0  100m 5216 3632 S  0.0  0.3   0:08.18 STAFProc
    23090 root      25   0  104m 5228 3632 S  0.0  0.3   0:08.27 STAFProc
    23090 root      25   0  104m 5228 3632 S  0.0  0.3   0:08.34 STAFProc
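
    To put a number on the growth in the samples above, the VIRT column can be compared between the first and last line. The following is a minimal shell sketch, not part of the original test; the function name virt_growth is made up for illustration, and it assumes batch-mode top lines where field 5 is VIRT, given in KiB unless suffixed with m (MiB) or g (GiB).

```shell
# Hypothetical helper: read batch-mode top lines on stdin and print the
# VIRT growth in KiB between the first and the last sample.
# Assumes field 5 is VIRT, in KiB unless suffixed with "m" or "g".
virt_growth() {
    awk '
        # Convert a VIRT value such as "90408", "104m" or "1.2g" to KiB.
        function kib(v) {
            if (v ~ /g$/) return substr(v, 1, length(v) - 1) * 1024 * 1024
            if (v ~ /m$/) return substr(v, 1, length(v) - 1) * 1024
            return v + 0
        }
        NR == 1 { first = kib($5) }   # remember the first sample
        { last = kib($5) }            # keep overwriting with the latest one
        END { if (NR > 0) printf "%d\n", last - first }
    '
}
```

    Fed the first and last samples above (90408 and 104m), it prints 16088, i.e. about 15.7 MiB of growth in VIRT, while RES only moves from 5180 to 5228 KiB.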

    Content of test.tcl:
    package require STAF

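    # NOTE: the registration call was missing from the post as extracted;
    # the handle name below is a placeholder, not the original one.
    if {[STAF::Register "staf-mem-leak-test"] != $STAF::kOk} {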
            TMLog "Error registering with STAF, RC: $STAF::RC" $TM_ERROR
            exit $TM_ERROR
    }

    set process 60

    for {set i 0 } { $i < $process } { incr i } {
            STAF::Submit local process "start command tclsh parms /tmp/staf-mem-leak-test/testProc.tcl $i "
    }

    after 20000

    STAF::Submit local process "free all"

    STAF::UnRegister

    exit 0

    #
    # Content of /tmp/staf-mem-leak-test/testProc.tcl
    #
    puts "process: $argv"
    after 10000
    exit 0

    I also tried valgrind (on another instance). Please find the logs attached.

     
  • Sharon Lucas

    Sharon Lucas - 2012-03-05

    Note that in your test you are not using the WAIT option on your PROCESS START request, so you are starting the processes asynchronously. When processes are started asynchronously, STAFProc stores each process's termination timestamp and return code in memory for later retrieval. To free this data after a process has completed, you must use the PROCESS FREE command. It appears you are not freeing any process completion data until after 20,000 processes have been run. So yes, top will show STAFProc's memory increasing if you haven't freed the process completion data. If this is what your tests are doing, check them to make sure the memory growth isn't actually being caused by your tests rather than by STAF.

    See section "8.12 Process Service" in the STAF User's Guide at http://staf.sourceforge.net/current/STAFUG.htm#HDRPROCSRV for more information on the STAF PROCESS service.
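
    As a rough illustration of the two options described above, the sketch below wraps the staf command line in two shell helpers. The helper names run_and_wait and free_handle are hypothetical, and the request strings assume the PROCESS service syntax from the User's Guide section referenced above (START ... WAIT and FREE HANDLE <handle>); check them against your STAF version.

```shell
# Hypothetical helper: start a command through the local PROCESS service
# and block until it finishes (the WAIT option mentioned above).
# $1 = command, $2 = parameters passed to it.
run_and_wait() {
    staf local process start command "$1" parms "$2" wait
}

# Hypothetical helper: free the completion data kept for one process that
# was started asynchronously. $1 = the handle returned by PROCESS START.
free_handle() {
    staf local process free handle "$1"
}
```

    free_handle takes the handle number returned by an asynchronous START; test.tcl issues a single 'free all' request instead.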

    If you still think there is a problem, explain why and try again to submit a bug via https://sourceforge.net/tracker/?func=add&group_id=33142&atid=407381 so that you can attach your logs. Note that you must be logged in to SourceForge to submit a bug.

     
  • Nixon

    Nixon - 2012-03-05

    No. I think the 20000 you are referring to is the sleep time of 20 seconds. Only 60 processes were started, and all of them are freed with a 'free all'. I submitted a bug but didn't get a defect number, and I'm not sure whether it was added.

     
  • Nixon

    Nixon - 2012-03-07

    It looks like the defect filing page has a problem. When I click 'Add Artifact', it doesn't give an Artifact ID but shows the 'Add new' page again. The search also doesn't show the defect I filed.

     
