#1445 STAFProc memory leaks

Unix::Linux
open
STAFProc (180)
5
2014-08-23
2012-03-09
Nixon
No

We are seeing STAF crash with 'out of memory' when the tests run for several days. Based on our investigation, STAF is leaking memory: the VIRT field in the 'top' output grows by hundreds of MB. The only way to clear the leaks is to shut down STAF and restart it.

# staf local misc version
Response
--------
3.4.8

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)
# uname -a
Linux RH-Linux-53-84 2.6.18-128.el5PAE #1 SMP Wed Dec 17 12:02:33 EST 2008 i686 i686 i386 GNU/Linux

I wrote a small script to reproduce the leak and was able to reproduce it some of the time. (I had to run the loop below several times. If it does not reproduce, please try increasing the process count at 'set process 60'.)

[root@RH-Linux-53-84 staf-mem-leak-test]# for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30; do top -n1 -b |grep STAF; tclsh test.tcl; done
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:06.55 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:06.62 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:06.70 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:06.77 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:06.85 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:06.92 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:06.99 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.06 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.13 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.20 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.27 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.36 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.43 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.50 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.58 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.66 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.74 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.81 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.87 STAFProc
23090 root 25 0 94508 5192 3632 S 0.0 0.3 0:07.95 STAFProc
23090 root 25 0 94508 5192 3632 S 0.0 0.3 0:08.03 STAFProc
23090 root 25 0 98608 5204 3632 S 0.0 0.3 0:08.11 STAFProc
23090 root 25 0 100m 5216 3632 S 0.0 0.3 0:08.18 STAFProc
23090 root 25 0 104m 5228 3632 S 0.0 0.3 0:08.27 STAFProc
23090 root 25 0 104m 5228 3632 S 0.0 0.3 0:08.34 STAFProc
...
...
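
The top samples above mix two notations for VIRT: plain KiB (98608) and an "m" suffix once the value grows (100m). A minimal sketch for turning such samples into a single growth figure, assuming VIRT is the fifth column of the `top -b` process line as in the output above; the function name `virt_growth` is made up for illustration:

```shell
# Read "top -b" process lines on stdin (VIRT assumed to be column 5) and
# print the first sample, the last sample, and the growth, all in KiB.
virt_growth() {
    awk '
        # Normalize a VIRT value: plain numbers are KiB, "m" is MiB, "g" is GiB.
        function kib(v) {
            if (v ~ /[mM]$/) { sub(/[mM]$/, "", v); return int(v * 1024) }
            if (v ~ /[gG]$/) { sub(/[gG]$/, "", v); return int(v * 1024 * 1024) }
            return int(v)
        }
        NF >= 5 {
            cur = kib($5)
            if (!seen) { first = cur; seen = 1 }
            last = cur
        }
        END {
            if (seen) printf "first=%d last=%d growth=%d\n", first, last, last - first
        }'
}
```

It could be fed from a saved log of the loop output, for example `grep STAFProc top.log | virt_growth`, where `top.log` is a hypothetical file name.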

Content of test.tcl:
package require STAF

# Register with STAF; report the return code and exit non-zero on failure.
if {[STAF::Register "mem-leak-test"] != $STAF::kOk} {
    puts stderr "Error registering with STAF, RC: $STAF::RC"
    exit 1
}

# Number of child processes to start through the PROCESS service.
set process 60

for {set i 0} {$i < $process} {incr i} {
    STAF::Submit local process "start command tclsh parms /tmp/staf-mem-leak-test/testProc.tcl $i"
}

# Wait 20 seconds so the children (10 seconds each) have finished.
after 20000

STAF::Submit local process "free all"

STAF::UnRegister

exit 0

#
# Content of /tmp/staf-mem-leak-test/testProc.tcl
#
puts "process: $argv"
after 10000
exit 0

I ran valgrind; the logs are attached.

Discussion

  • Nixon

    Nixon - 2012-03-09

    valgrind log

     
  • Nixon

    Nixon - 2012-03-13

    Would you have a chance to look into this issue?

     
  • Sharon Lucas

    Sharon Lucas - 2012-03-13
    • assigned_to: nobody --> slucas
     
  • Sharon Lucas

    Sharon Lucas - 2012-03-13

    I am investigating.

     
  • Nixon

    Nixon - 2012-04-09

    Any findings on this issue? We keep seeing these leaks.

     
  • Sharon Lucas

    Sharon Lucas - 2012-04-09

    One of the minor memory leaks has already been fixed via Bug #3467922 "Memory leak in unix local connection provider" at https://sourceforge.net/tracker/?func=detail&aid=3467922&group_id=33142&atid=407381 and is contained in STAF V3.4.9 (released March 29, 2012) so you should probably upgrade to STAF V3.4.9 if you haven't already.

    I'm still investigating the other memory leaks.

     
  • Nixon

    Nixon - 2012-04-09

    Thanks. I will upgrade to 3.4.9 and let you know if there is any improvement.

     
  • Nixon

    Nixon - 2012-04-10

    I upgraded to 3.4.9. I still see virtual memory going up within a few hours.

    # staf local misc version
    Response
    --------
    3.4.9
    # top -n1 -b |egrep "PID|STAF"
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    21596 root 17 0 112m 6040 3724 S 0.0 0.3 0:27.56 STAFProc
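
Since the growth here shows up over hours, sampling `/proc/<pid>/status` on a timer is one way to log it without parsing top. A minimal sketch, assuming a Linux `/proc` filesystem; the helper name `mem_from_status` and the sampling loop in the comments are illustrative and not part of STAF:

```shell
# Print "VmSize VmRSS" (both in kB) from a Linux /proc/<pid>/status file.
mem_from_status() {
    awk '
        $1 == "VmSize:" { size = $2 }
        $1 == "VmRSS:"  { rss = $2 }
        END { print size, rss }' "$1"
}

# Hypothetical usage: log one sample per minute until STAFProc exits.
# Assumes a single STAFProc process, so pidof returns exactly one PID.
#   pid=$(pidof STAFProc)
#   while [ -r "/proc/$pid/status" ]; do
#       echo "$(date '+%F %T') $(mem_from_status "/proc/$pid/status")"
#       sleep 60
#   done
```

VmSize corresponds to the VIRT column and VmRSS to the RES column, so a log like this can be compared directly with the top samples in this ticket.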

     
  • Nixon

    Nixon - 2012-06-26

    I filed a new bug, 3538007. I believe it is a side effect of this memory leak issue. Could you kindly look into this issue and provide a fix?

     
  • Nixon

    Nixon - 2012-08-16

    simple script to repro STAF mem leaks

     
  • Nixon

    Nixon - 2012-08-16

    Could you kindly look into this issue? I have uploaded a very simple repro script. Please untar memleak-script.tar.gz and run ./test.sh on a Linux machine.

    This is what I see:

    # uname -a
    Linux RH-Linux-53-84 2.6.18-128.el5PAE #1 SMP Wed Dec 17 12:02:33 EST 2008 i686 i686 i386 GNU/Linux
    # staf local misc version
    Response
    --------
    3.4.10

    # ./test.sh
    15316 root 25 0 89776 5196 3676 S 0.0 0.3 0:00.07 STAFProc
    15316 root 25 0 94900 5348 3680 S 0.0 0.3 0:00.17 STAFProc
    15316 root 25 0 99000 5384 3680 S 0.0 0.3 0:00.27 STAFProc
    15316 root 25 0 97.7m 5404 3680 S 0.0 0.3 0:00.35 STAFProc
    15316 root 25 0 97.7m 5412 3680 S 0.0 0.3 0:00.43 STAFProc
    15316 root 25 0 97.7m 5412 3680 S 0.0 0.3 0:00.52 STAFProc
    15316 root 25 0 97.7m 5412 3680 S 0.0 0.3 0:00.61 STAFProc
    15316 root 25 0 97.7m 5472 3680 S 0.0 0.3 0:00.70 STAFProc
    15316 root 25 0 105m 5496 3680 S 0.0 0.3 0:00.79 STAFProc

     
  • Sharon Lucas

    Sharon Lucas - 2012-08-16

    I ran your test script on a Linux system and did not see a memory leak in STAFProc, either with 'set process 60' or after increasing the process count to 120 in test.tcl. Here are my results:

    [root@staf4g staf]# staf local misc version
    Response
    --------
    3.4.10

    [root@staf4g staf]# uname -a
    Linux staf4g.austin.ibm.com 2.6.18-308.8.2.el5 #1 SMP Tue May 29 11:54:17 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

    [root@staf4g staf]# ./test.sh
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:04.98 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:05.16 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:05.35 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:05.54 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:05.74 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:05.92 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:06.12 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:06.25 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:06.40 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:06.56 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:06.71 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:06.87 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:07.01 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:07.17 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:07.32 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:07.47 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:07.62 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:07.78 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:07.99 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:08.19 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:08.37 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:08.55 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:08.75 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:08.94 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:09.15 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:09.34 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:09.51 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:09.70 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:09.88 STAFProc
    20601 root 18 0 241m 5864 3572 S 0.0 0.1 0:10.07 STAFProc

    [root@staf4g staf]#

     
  • Nixon

    Nixon - 2013-02-13

    Thanks, Lucas, for the update. Sorry for not responding; I thought no one was working on this. (On my side, I put in a workaround that restarts STAF when it crashes, which solves the problem to a large extent.) I continue to see memory growth on every version, including 3.4.12.

    [root@RH-Linux-53-86 staf]# uname -a
    Linux RH-Linux-53-86 2.6.18-128.el5PAE #1 SMP Wed Dec 17 12:02:33 EST 2008 i686 i686 i386 GNU/Linux
    [root@RH-Linux-53-86 staf]# staf local misc version
    Response
    --------
    3.4.12

    I ran the same sample script I uploaded earlier under valgrind. Here is output showing that the memory keeps going up (when running under valgrind, top reports the process as memcheck):

    [root@RH-Linux-53-86 staf]# ./test.sh
    3800 root 16 0 166m 51m 4448 S 100.6 2.5 0:56.26 memcheck
    3800 root 16 0 209m 54m 4552 S 100.5 2.7 1:20.15 memcheck
    3800 root 16 0 214m 54m 4552 S 100.1 2.7 1:43.95 memcheck
    3800 root 16 0 218m 55m 4552 S 100.1 2.7 2:07.76 memcheck
    3800 root 16 0 218m 55m 4552 S 100.4 2.7 2:31.58 memcheck
    3800 root 16 0 222m 55m 4552 S 98.0 2.7 2:55.45 memcheck
    3800 root 16 0 222m 55m 4552 S 98.5 2.7 3:19.26 memcheck
    3800 root 16 0 226m 55m 4552 S 100.5 2.7 3:43.13 memcheck
    3800 root 16 0 226m 55m 4556 S 102.3 2.7 4:07.01 memcheck
    [root@RH-Linux-53-86 staf]#

    Then I ran 'staf local shutdown shutdown' to get the report from valgrind. This is what I am seeing:

    Script started on Wed 13 Feb 2013 06:09:54 PM GMT
    [root@RH-Linux-53-86 ~]#
    [root@RH-Linux-53-86 ~]# valgrind -v --tool=memcheck --leak-check=yes /usr/local/staf/bin/STAFProc
    ==3800== Memcheck, a memory error detector.
    ...
    ...
    ==3800== LEAK SUMMARY:
    ==3800== definitely lost: 1,264 bytes in 7 blocks.
    ==3800== indirectly lost: 13,707 bytes in 538 blocks.
    ==3800== possibly lost: 23,342 bytes in 900 blocks.
    ==3800== still reachable: 50,472 bytes in 2,656 blocks.
    ==3800== suppressed: 0 bytes in 0 blocks.
    ==3800== Reachable blocks (those to which a pointer was found) are not shown.
    ==3800== To see them, rerun with: --show-reachable=yes
    --3800-- memcheck: sanity checks: 15471 cheap, 619 expensive
    --3800-- memcheck: auxmaps: 0 auxmap entries (0k, 0M) in use
    --3800-- memcheck: auxmaps: 0 searches, 0 comparisons
    --3800-- memcheck: SMs: n_issued = 231 (3696k, 3M)
    --3800-- memcheck: SMs: n_deissued = 0 (0k, 0M)
    --3800-- memcheck: SMs: max_noaccess = 65535 (1048560k, 1023M)
    --3800-- memcheck: SMs: max_undefined = 0 (0k, 0M)
    --3800-- memcheck: SMs: max_defined = 2405 (38480k, 37M)
    --3800-- memcheck: SMs: max_non_DSM = 231 (3696k, 3M)
    --3800-- memcheck: max sec V bit nodes: 1 (0k, 0M)
    --3800-- memcheck: set_sec_vbits8 calls: 1 (new: 1, updates: 0)
    --3800-- memcheck: max shadow mem size: 4000k, 3M
    --3800-- translate: fast SP updates identified: 51,926 ( 93.7%)
    --3800-- translate: generic_known SP updates identified: 2,407 ( 4.3%)
    --3800-- translate: generic_unknown SP updates identified: 1,035 ( 1.8%)
    --3800-- tt/tc: 8,413,008 tt lookups requiring 11,064,710 probes
    --3800-- tt/tc: 8,413,008 fast-cache updates, 8 flushes
    --3800-- transtab: new 36,948 (934,404 -> 12,803,395; ratio 137:10) [0 scs]
    --3800-- transtab: dumped 0 (0 -> ??)
    --3800-- transtab: discarded 4,159 (109,390 -> ??)
    --3800-- scheduler: 1,320,554,840 jumps (bb entries).
    --3800-- scheduler: 15,471/46,925,260 major/minor sched events.
    --3800-- sanity: 15472 cheap, 619 expensive checks.
    --3800-- exectx: 30,011 lists, 136,122 contexts (avg 4 per list)
    --3800-- exectx: 33,424,728 searches, 57,362,871 full compares (1,716 per 1000)
    --3800-- exectx: 10,981 cmp2, 761 cmp4, 0 cmpAll
    [root@RH-Linux-53-86 ~]# exit
    exit

    I will attach the entire valgrind output
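
To compare the Memcheck report with the VIRT growth seen in top, the "definitely lost" and "indirectly lost" figures can be added up. A minimal sketch, assuming the summary lines have the format shown above; the function name `leaked_bytes` is made up for illustration, and it strips the thousands separators and the carriage returns that a `script` capture leaves behind:

```shell
# Sum the "definitely lost" and "indirectly lost" byte counts from a
# Memcheck LEAK SUMMARY read on stdin. Commas and CRs are removed first.
leaked_bytes() {
    tr -d '\r,' | awk '
        /definitely lost:|indirectly lost:/ {
            # The byte count is the field right after the word "lost:".
            for (i = 1; i <= NF; i++)
                if ($i == "lost:") total += $(i + 1)
        }
        END { printf "%d\n", total }'
}
```

For the summary above the two figures add up to 14,971 bytes, which is small next to the VIRT growth in the top samples, so these leak records may not account for all of the growth on their own.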

     
  • Nixon

    Nixon - 2013-02-13

    Memleak with STAF 3.4.12 on Feb 13, 2013

     
  • Nixon

    Nixon - 2013-02-26

    Hi Sharon, can you please look into this issue?

     
  • Nixon

    Nixon - 2013-08-07

    Hi Sharon, can you please look into this memory leak issue?

     
