I am using STAX as the execution engine and the respool service is utilized to allocate available test systems. For e.g.
<call function="'STAF'">
[STAXController, 'RESPOOL', 'REQUEST POOL WinXP']
</call>
When the test is finished, the XML file calls RELEASE the pool entry.
However, I have intermittently encountered a problem that respool assigns the same entry to multiple job requests. I wish I could provide more detail but I cannot find any detailed trace info.
What version of STAF are you using on the STAXController machine?
Note that an entry in a resource pool can only be owned by one handle at a time so it is unlikely that there's a bug in the RESPOOL service regarding this. The RESPOOL service has been around a long time and has been used extensively by many groups, including us, with no problems like this.
I need more information on exactly how you're using the RESPOOL service. Please attach your STAX xml file that is submitting RESPOOL REQUEST requests and I can look at it to see if I notice anything obviously wrong.
Also, when you think that the same resource pool entry has been assigned to multiple STAX jobs, you should query the resource pool to see who is the owner (and there can only be one owner for each entry). For example:
STAF yourSTAXMachine RESPOOL QUERY POOL WinXP
Perhaps that will provide you with more information on what you're doing wrong in your STAX job.
I am using STAF 3.3.3.
I agree that a resource is only assigned to only one resource at a time. However, intermittently, the resource is re-assigned to next job request, even though the current job request is still running and no RELEASE entry has been called.
My XML file is kind of complex. But here is the command I used to REQUEST and RELEASE entry.
<call function="'STAF'">
[STAXController, 'RESPOOL', 'REQUEST POOL WinXP']
</call>
<script>testMachine = STAXResult [1] </>
<call function="'STAF'">
[STAXController, 'RESPOOL', 'RELEASE POOL WinXP ENTRY %s FORCE' % testMachine
</call>
In the example you showed of how you're requesting an entry from resource pool WinXP, you're not checking if the return code is 0 so perhaps the request is failing intermittently for some reason. You should always check if a STAF service request was successful instead of assuming it was successful. For example:
<call function="'STAF'">
[STAXController, 'RESPOOL', 'REQUEST POOL WinXP']
</call>
<script>[rc, result] = STAXResult</script>
<if expr="rc != STAFRC.Ok">
<log message="1">
'ERROR: RESPOOL REQUEST POOL WinXP failed with RC=%s Result=%s' % (rc, result)
</log>
<else>
<script>testMachine = result</script>
</else
</if>
It's hard to say what the problem is (something wrong in your job or a bug in the RESPOOL service) since you haven't provided your STAX XML file. We need you to provide a recreation scenario.
I have turned on all trace points for RESPOOL service. And intermittently, I did reproduce the problem. Here is the problem: two STAX jobs (13 & 23) obtained same respool (TAFSys05), almost in the same time. Note that even though I didn't specify to 'RELEASE POOL ENTRY', a deletePooling is logged almost immediately after JOB 13 obtained the pool entry.
Running commands 'RESPOOL QUERY POOL' showed that the entry TAFSys05 is granted to 23.
20090611-12:07:15;3080;00000001;RESPOOL Service Request - Client: local://local, Handle: 1, Process: STAF_Process, Request: STAF_CALLBACK HANDLEDELETED HANDLE 25 MACHINE :50:tcp://US01-TAF-TestController01.us01.savi.com@6500 UUID :32:FB4E314AA00500000A07018E65723031 KEY :12:ResPoolEntry
20090611-12:07:15;2960;00000002;RESPOOL Service Result (0) - Client: tcp://US01-TAF-TestController01.us01.savi.com@6500, Handle: 27, Process: STAX/Job/13, Request: REQUEST POOL NFR3, Result: TAFSys05
20090611-12:07:15;3080;00000002;RESPOOL Service Result (0) - Client: local://local, Handle: 1, Process: STAF_Process, Request: STAF_CALLBACK HANDLEDELETED HANDLE 25 MACHINE :50:tcp://US01-TAF-TestController01.us01.savi.com@6500 UUID :32:FB4E314AA00500000A07018E65723031 KEY :12:ResPoolEntry, Result: <No Result>
20090611-12:07:15;2996;00000002;Remote RESPOOL Service Result (0) - To: us01-TAF-TestController01.us01.savi.com, from Handle: 27, Process: STAX/Job/13, Request: REQUEST POOL NFR3, Result: TAFSys05
20090611-12:07:15;1512;00002000;deletePolling(26, tcp://US01-TAF-TestController01.us01.savi.com@6500, FB4E314AA00500000A07018E65723031, RESPOOL, ResPoolEntry)
20090611-12:07:15;3104;00000001;RESPOOL Service Request - Client: local://local, Handle: 1, Process: STAF_Process, Request: STAF_CALLBACK HANDLEDELETED HANDLE 27 MACHINE :50:tcp://US01-TAF-TestController01.us01.savi.com@6500 UUID :32:FB4E314AA00500000A07018E65723031 KEY :12:ResPoolEntry
20090611-12:07:15;2988;00000001;VAR Service Request - Client: local://local, Handle: 4, Process: STAF/Service/LOG, Request: RESOLVE REQUEST 6575 STRING :23:{STAF/Service/LOG/Mask}
20090611-12:07:15;3172;00000002;RESPOOL Service Result (0) - Client: tcp://US01-TAF-TestController01.us01.savi.com@6500, Handle: 37, Process: STAX/Job/23, Request: REQUEST POOL NFR3, Result: TAFSys05
20090611-12:07:15;3104;00000002;RESPOOL Service Result (0) - Client: local://local, Handle: 1, Process: STAF_Process, Request: STAF_CALLBACK HANDLEDELETED HANDLE 27 MACHINE :50:tcp://US01-TAF-TestController01.us01.savi.com@6500 UUID :32:FB4E314AA00500000A07018E65723031 KEY :12:ResPoolEntry, Result: <No Result>
20090611-12:07:15;3180;00000002;Remote RESPOOL Service Result (0) - To: us01-TAF-TestController01.us01.savi.com, from Handle: 37, Process: STAX/Job/23, Request: REQUEST POOL NFR3, Result: TAFSys05
The trace results show that STAX/Job/13 that has handle 27 requested a resource from pool NFR3 and got resource entry TAFSys05. The trace results then show a "STAF_CALLBACK HANDLEDELETE HANDLE 27 ..." request was submitted to the RESPOOL service. Note that this request is only submitted if handle 27 has been deleted. This would apparently mean that STAX job 13 has completed, which means its STAF handle was deleted, which means that garbage collection would occur for any respool entries owned by STAF handle 27. Garbage collection occurs, releasing any respool entries owned by STAF handle 27 (since this handle no longer exists). So, now resource entry TAFSys05 is available. So the next pending request (which was submitted by STAX Job 23 with handle 37 gets resource entry TAFSys05.
Note that STAF performs garbage collection by default to automatically release RESPOOL entries that are owned by handles that no longer exist (have been deleted) since you did not specify the "GARBAGECOLLECT No" option on your RESPOOL REQUEST requests. See the STAF User's Guide for more information on garbage collection.
Did you query the STAX_Job_13 log to see what time this job completed to verify that this is what is happening? Perhaps you didn't realize that STAF performs automatic garbage collection so you thought that the only way a resource pool entry would be released is if a RELEASE request is submitted? If you require that a resource continue to be held by a STAF handle that no longer exists, then you must specify the "GARBAGECOLLECT No" option on the RESPOOL REQUEST request.
Hi Sharon,
I intentionally leave the the GARBAGECOLLECT at YES because I understand if the test job is terminated, then the respool entry is automatically released back to the pool.
In this case, both JOB 13 & 23 are still running, I don't know why handle 27 is deleted. Here is the query results:
H:\>staf local stax query job 13
Response
--------
{
Job ID : 13
Job Name : <None>
Start Date-Time : 20090611-11:47:12
XML File Name : C:\RunTest.xml
File Machine : 10.7.1.142
Function : RunTest
Arguments : {'script': 'CP_RegressionTestController.bat MASTER_testsc
ript_iso.txt', 'build': 'FDF', 'version': '1.4.1.20090611', 'os': 'WinXP', 'emai
l': 'DVETAF@savi.com', 'chamber': 'NFR3', 'java': 'JRE6', 'timeout': '4h', 'scre
enshot': 'NA'}
Scripts : []
Script Files : []
Script Machine : <None>
Source Machine : tcp://FKONG-D620@6500
Notify OnEnd : No
Clear Logs : Disabled
Log TC Elapsed Time: Enabled
Log TC Num Starts : Enabled
Log TC Start/Stop : Disabled
Python Output : JobUserLog
Python Log Level : Info
Blocks Running : 1
Blocks Held : 0
Blocks Unknown : 0
}
H:\>staf local stax query job 23
Response
--------
{
Job ID : 23
Job Name : <None>
Start Date-Time : 20090611-11:47:37
XML File Name : C:\RunTest.xml
File Machine : 10.7.1.142
Function : RunTest
Arguments : {'script': 'CP_RegressionTestController.bat MASTER_testsc
ript_reader.txt', 'build': 'FDF', 'version': '1.4.1.20090611', 'os': 'Win2003',
'email': 'DVETAF@savi.com', 'chamber': 'NFR3', 'java': 'JRE6', 'timeout': '2h',
'screenshot': 'NA'}
Scripts : []
Script Files : []
Script Machine : <None>
Source Machine : tcp://FKONG-D620@6500
Notify OnEnd : No
Clear Logs : Disabled
Log TC Elapsed Time: Enabled
Log TC Num Starts : Enabled
Log TC Start/Stop : Disabled
Python Output : JobUserLog
Python Log Level : Info
Blocks Running : 1
Blocks Held : 0
Blocks Unknown : 0
}
Can you list handles or query handle 27 on your STAX service machine to see if it has been deleted?
STAF staxmachine HANDLE LIST HANDLES
STAF staxmachine HANDLE QUERY HANDLE 27
Can you try to recreate the problem with STAF tracing including tracepoint Debug (which you didn't have on before) along with Error, Warning, ServiceRequest, ServiceResult.
Also, enable tracing for the HANDLE service along with the RESPOOL service.
Then recreate the problem and provide the STAF tracing as maybe that will provide more information on what's going on. I'd still like to see your STAX job too.
Are you still seeing this problem? If so, please provide the debug information that I requested in June. Thanks.
Closing this bug as invalid as you haven't responded to our request for more information from over 6 months ago. If you are still seeing this problem, please provide the STAF tracing information requested.