I tried to file a defect, but it looks like it wasn't submitted successfully. I have valgrind output but am not sure where to upload it.
We are seeing STAF crash with 'out of memory' when our tests run for several days. Based on our investigation, STAF is leaking memory: the VIRT field in the 'top' command output grows by hundreds of MB. The only way to reclaim the memory is to shut down STAF and restart it.
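For a multi-day run it may be easier to log the growth than to watch 'top'. A minimal sketch, assuming Linux, where the VmSize line in /proc/<pid>/status should report the same virtual size (in KiB) that top shows as VIRT. The helper names vsz_kb and log_vsz, the 300-second default interval and the log path in the usage line are hypothetical:

```shell
# Hypothetical helpers for tracking a process's virtual size over time.

# Print the virtual size (VmSize, in KiB) of a PID; returns non-zero if the
# process is gone or has no address space (e.g. a zombie).
vsz_kb() {
    local status_file="/proc/$1/status"
    [ -r "$status_file" ] || return 1
    awk '/^VmSize:/ { print $2; found = 1 } END { exit !found }' "$status_file"
}

# Append "<epoch seconds> <VmSize KiB>" to a log file every interval seconds
# (default 300) until the process exits.
log_vsz() {
    local pid=$1 logfile=$2 interval=${3:-300} vsz
    while vsz=$(vsz_kb "$pid"); do
        echo "$(date +%s) $vsz" >> "$logfile"
        sleep "$interval"
    done
}
```

Assuming a single STAFProc instance, it could be started in the background with: log_vsz "$(pidof STAFProc)" /tmp/stafproc-vsz.log 300 &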
# staf local misc version
Response
3.4.8
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)
# uname -a
Linux RH-Linux-53-84 2.6.18-128.el5PAE #1 SMP Wed Dec 17 12:02:33 EST 2008 i686 i686 i386 GNU/Linux
I wrote a small script to reproduce the memory growth and was able to reproduce it some of the time. (I had to run the loop below several times. If it does not reproduce, try increasing the process count in 'set process 60'.)
# for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30; do top -n1 -b |grep STAF; tclsh test.tcl; done
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:06.55 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:06.62 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:06.70 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:06.77 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:06.85 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:06.92 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:06.99 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.06 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.13 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.20 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.27 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.36 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.43 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.50 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.58 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.66 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.74 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.81 STAFProc
23090 root 25 0 90408 5180 3632 S 0.0 0.3 0:07.87 STAFProc
23090 root 25 0 94508 5192 3632 S 0.0 0.3 0:07.95 STAFProc
23090 root 25 0 94508 5192 3632 S 0.0 0.3 0:08.03 STAFProc
23090 root 25 0 98608 5204 3632 S 0.0 0.3 0:08.11 STAFProc
23090 root 25 0 100m 5216 3632 S 0.0 0.3 0:08.18 STAFProc
23090 root 25 0 104m 5228 3632 S 0.0 0.3 0:08.27 STAFProc
23090 root 25 0 104m 5228 3632 S 0.0 0.3 0:08.34 STAFProc
…
…
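To put a number on the growth in the output above, the VIRT column (field 5 in top's default layout: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) can be converted to KiB and compared between the first and last sample. A rough sketch: virt_growth is a hypothetical helper, and it assumes top's 'm' and 'g' suffixes mean MiB and GiB, so a value such as '104m' is already rounded by top and the result is approximate.

```shell
# Hypothetical helper: read top batch-mode lines on stdin and report the
# VIRT growth (in KiB) between the first and last process line.
virt_growth() {
    awk '
        # Convert a top memory field to KiB: plain numbers are KiB,
        # an "m" suffix is MiB and a "g" suffix is GiB.
        function kib(v,   n, u) {
            n = v + 0
            u = substr(v, length(v), 1)
            if (u == "m") n *= 1024
            else if (u == "g") n *= 1048576
            return n
        }
        # Only process lines (numeric PID, full column set); skips headers and "..."
        NF >= 12 && $1 ~ /^[0-9]+$/ {
            v = kib($5)
            if (count == 0) first = v
            last = v
            count++
        }
        END {
            if (count == 0) exit 1
            printf "samples=%d first=%d last=%d growth=%d KiB\n", count, first, last, last - first
        }'
}
```

For example, 'virt_growth < top.log', where top.log holds the lines printed by the loop above. Given only the samples 90408, 94508 and 104m from that output, it reports a growth of 16088 KiB.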
Content of test.tcl:
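package require STAF
# Register with STAF (the handle name is illustrative)
set rc [STAF::Register "staf-mem-leak-test"]
if {$rc != $STAF::kOk} {
    puts stderr "Error registering with STAF, RC: $rc"
    exit 1
}
set process 60
# Start the test processes asynchronously (no WAIT option)
for {set i 0} {$i < $process} {incr i} {
    STAF::Submit local process "start command tclsh parms /tmp/staf-mem-leak-test/testProc.tcl $i"
}
# Wait 20000 ms (20 s); each testProc.tcl sleeps for 10 s
after 20000
# Free the completion data of all finished processes
STAF::Submit local process "free all"
STAF::UnRegister
exit 0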
#
# Content of /tmp/staf-mem-leak-test/testProc.tcl
#
puts "process: $argv"
after 10000
exit 0
I also tried running STAF under valgrind (in another instance); please find the logs attached.
However, note that in your test you are not using the WAIT option on your PROCESS START request, so you are starting the processes asynchronously. When processes are STARTed asynchronously, STAFProc stores each process's termination timestamp and return code in memory for later retrieval. To free this data after a process has completed, you must use the PROCESS FREE command. It appears you are not freeing any process completion data until after 20,000 processes have been run. So, yes, top is going to show STAFProc memory increasing if you haven't freed the process completion data. If this is what your tests are doing, check them to make sure the memory growth is actually caused by STAF and not by your tests.
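The two ways of starting processes described in this reply can be sketched with the staf command-line executable. This is a hedged example, not a STAF-provided script: run_sync, run_async_batch and wrap are hypothetical helpers, the fixed sleep mirrors the 'after 20000' in test.tcl, and values are passed in STAF's :<length>:<value> form so that a PARMS value containing spaces stays a single option value. Setting STAF=echo prints the requests instead of submitting them.

```shell
# Hypothetical helpers; set STAF=echo for a dry run.
STAF=${STAF:-staf}

# Emit a value in STAF's :<length>:<value> format (ASCII values assumed).
wrap() {
    printf ':%d:%s' "${#1}" "$1"
}

# Synchronous start: WAIT blocks until the process completes.
run_sync() {
    "$STAF" local process start command "$(wrap "$1")" parms "$(wrap "$2")" wait
}

# Asynchronous starts, followed by an explicit FREE of the completion data
# (end timestamp and return code) that STAFProc keeps for each process.
run_async_batch() {
    local cmd=$1 script=$2 count=$3 wait_secs=${4:-20} i=0
    while [ "$i" -lt "$count" ]; do
        "$STAF" local process start command "$(wrap "$cmd")" parms "$(wrap "$script $i")"
        i=$((i + 1))
    done
    # Give the processes time to finish; FREE ALL only frees completed ones
    sleep "$wait_secs"
    "$STAF" local process free all
}
```

With STAF=echo set, 'run_async_batch tclsh /tmp/staf-mem-leak-test/testProc.tcl 2 0' prints two START requests followed by the FREE ALL request instead of submitting them.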
See section "8.12 Process Service" in the STAF User's Guide at http://staf.sourceforge.net/current/STAFUG.htm#HDRPROCSRV for more information on the STAF PROCESS service.
If you still think there is a problem, explain why and try again to submit a bug via https://sourceforge.net/tracker/?func=add&group_id=33142&atid=407381 so that you can attach your logs. Note that you must be logged in to SourceForge to submit a bug.
No. I think the 20000 you are referring to is the sleep time: 'after 20000' waits 20,000 milliseconds, i.e. 20 seconds. Only 60 processes were started, and all of them are freed with a 'free all'. I submitted a bug but didn't get a defect number, so I'm not sure whether it was added.
The defect filing page seems to have a problem. When I click 'Add Artifact', it doesn't return an Artifact ID but shows the 'Add new' page again. Searching also doesn't show the defect I filed.
I had no problem opening a bug, so I opened one for you (Bug #3499015) at http://sourceforge.net/tracker/?func=detail&aid=3499015&group_id=33142&atid=407381.
See if you can attach your log files to this bug.