Menu

#1264 Same respool entry is assigned to multiple job requests

Win32
closed-invalid
5
2010-01-22
2009-06-03
sfresher
No

I am using STAX as the execution engine and the respool service is utilized to allocate available test systems. For e.g.

<call function="'STAF'">
[STAXController, 'RESPOOL', 'REQUEST POOL WinXP']
</call>

When the test is finished, the XML file calls RELEASE the pool entry.

However, I have intermittently encountered a problem that respool assigns the same entry to multiple job requests. I wish I could provide more detail but I cannot find any detailed trace info.

Discussion

  • Sharon Lucas

    Sharon Lucas - 2009-06-03

    What version of STAF are you using on the STAXController machine?

    Note that an entry in a resource pool can only be owned by one handle at a time so it is unlikely that there's a bug in the RESPOOL service regarding this. The RESPOOL service has been around a long time and has been used extensively by many groups, including us, with no problems like this.

    I need more information on exactly how you're using the RESPOOL service. Please attach your STAX xml file that is submitting RESPOOL REQUEST requests and I can look at it to see if I notice anything obviously wrong.

    Also, when you think that the same resource pool entry has been assigned to multiple STAX jobs, you should query the resource pool to see who is the owner (and there can only be one owner for each entry). For example:

    STAF yourSTAXMachine RESPOOL QUERY POOL WinXP

    Perhaps that will provide you with more information on what you're doing wrong in your STAX job.

     
  • Sharon Lucas

    Sharon Lucas - 2009-06-03
    • assigned_to: nobody --> slucas
     
  • sfresher

    sfresher - 2009-06-03

    I am using STAF 3.3.3.

    I agree that a resource is only assigned to only one resource at a time. However, intermittently, the resource is re-assigned to next job request, even though the current job request is still running and no RELEASE entry has been called.

    My XML file is kind of complex. But here is the command I used to REQUEST and RELEASE entry.

    <call function="'STAF'">
    [STAXController, 'RESPOOL', 'REQUEST POOL WinXP']
    </call>
    <script>testMachine = STAXResult [1] </>

    <call function="'STAF'">
    [STAXController, 'RESPOOL', 'RELEASE POOL WinXP ENTRY %s FORCE' % testMachine
    </call>

     
  • Sharon Lucas

    Sharon Lucas - 2009-06-04

    In the example you showed of how you're requesting an entry from resource pool WinXP, you're not checking if the return code is 0 so perhaps the request is failing intermittently for some reason. You should always check if a STAF service request was successful instead of assuming it was successful. For example:

    <call function="'STAF'">
    [STAXController, 'RESPOOL', 'REQUEST POOL WinXP']
    </call>

    <script>[rc, result] = STAXResult</script>

    <if expr="rc != STAFRC.Ok">
    <log message="1">
    'ERROR: RESPOOL REQUEST POOL WinXP failed with RC=%s Result=%s' % (rc, result)
    </log>
    <else>
    <script>testMachine = result</script>
    </else
    </if>

    It's hard to say what the problem is (something wrong in your job or a bug in the RESPOOL service) since you haven't provided your STAX XML file. We need you to provide a recreation scenario.

     
  • sfresher

    sfresher - 2009-06-11

    I have turned on all trace points for RESPOOL service. And intermittently, I did reproduce the problem. Here is the problem: two STAX jobs (13 & 23) obtained same respool (TAFSys05), almost in the same time. Note that even though I didn't specify to 'RELEASE POOL ENTRY', a deletePooling is logged almost immediately after JOB 13 obtained the pool entry.

    Running commands 'RESPOOL QUERY POOL' showed that the entry TAFSys05 is granted to 23.

    20090611-12:07:15;3080;00000001;RESPOOL Service Request - Client: local://local, Handle: 1, Process: STAF_Process, Request: STAF_CALLBACK HANDLEDELETED HANDLE 25 MACHINE :50:tcp://US01-TAF-TestController01.us01.savi.com@6500 UUID :32:FB4E314AA00500000A07018E65723031 KEY :12:ResPoolEntry
    20090611-12:07:15;2960;00000002;RESPOOL Service Result (0) - Client: tcp://US01-TAF-TestController01.us01.savi.com@6500, Handle: 27, Process: STAX/Job/13, Request: REQUEST POOL NFR3, Result: TAFSys05
    20090611-12:07:15;3080;00000002;RESPOOL Service Result (0) - Client: local://local, Handle: 1, Process: STAF_Process, Request: STAF_CALLBACK HANDLEDELETED HANDLE 25 MACHINE :50:tcp://US01-TAF-TestController01.us01.savi.com@6500 UUID :32:FB4E314AA00500000A07018E65723031 KEY :12:ResPoolEntry, Result: <No Result>
    20090611-12:07:15;2996;00000002;Remote RESPOOL Service Result (0) - To: us01-TAF-TestController01.us01.savi.com, from Handle: 27, Process: STAX/Job/13, Request: REQUEST POOL NFR3, Result: TAFSys05
    20090611-12:07:15;1512;00002000;deletePolling(26, tcp://US01-TAF-TestController01.us01.savi.com@6500, FB4E314AA00500000A07018E65723031, RESPOOL, ResPoolEntry)

    20090611-12:07:15;3104;00000001;RESPOOL Service Request - Client: local://local, Handle: 1, Process: STAF_Process, Request: STAF_CALLBACK HANDLEDELETED HANDLE 27 MACHINE :50:tcp://US01-TAF-TestController01.us01.savi.com@6500 UUID :32:FB4E314AA00500000A07018E65723031 KEY :12:ResPoolEntry
    20090611-12:07:15;2988;00000001;VAR Service Request - Client: local://local, Handle: 4, Process: STAF/Service/LOG, Request: RESOLVE REQUEST 6575 STRING :23:{STAF/Service/LOG/Mask}
    20090611-12:07:15;3172;00000002;RESPOOL Service Result (0) - Client: tcp://US01-TAF-TestController01.us01.savi.com@6500, Handle: 37, Process: STAX/Job/23, Request: REQUEST POOL NFR3, Result: TAFSys05
    20090611-12:07:15;3104;00000002;RESPOOL Service Result (0) - Client: local://local, Handle: 1, Process: STAF_Process, Request: STAF_CALLBACK HANDLEDELETED HANDLE 27 MACHINE :50:tcp://US01-TAF-TestController01.us01.savi.com@6500 UUID :32:FB4E314AA00500000A07018E65723031 KEY :12:ResPoolEntry, Result: <No Result>
    20090611-12:07:15;3180;00000002;Remote RESPOOL Service Result (0) - To: us01-TAF-TestController01.us01.savi.com, from Handle: 37, Process: STAX/Job/23, Request: REQUEST POOL NFR3, Result: TAFSys05

     
  • Sharon Lucas

    Sharon Lucas - 2009-06-11

    The trace results show that STAX/Job/13 that has handle 27 requested a resource from pool NFR3 and got resource entry TAFSys05. The trace results then show a "STAF_CALLBACK HANDLEDELETE HANDLE 27 ..." request was submitted to the RESPOOL service. Note that this request is only submitted if handle 27 has been deleted. This would apparently mean that STAX job 13 has completed, which means its STAF handle was deleted, which means that garbage collection would occur for any respool entries owned by STAF handle 27. Garbage collection occurs, releasing any respool entries owned by STAF handle 27 (since this handle no longer exists). So, now resource entry TAFSys05 is available. So the next pending request (which was submitted by STAX Job 23 with handle 37 gets resource entry TAFSys05.

    Note that STAF performs garbage collection by default to automatically release RESPOOL entries that are owned by handles that no longer exist (have been deleted) since you did not specify the "GARBAGECOLLECT No" option on your RESPOOL REQUEST requests. See the STAF User's Guide for more information on garbage collection.

    Did you query the STAX_Job_13 log to see what time this job completed to verify that this is what is happening? Perhaps you didn't realize that STAF performs automatic garbage collection so you thought that the only way a resource pool entry would be released is if a RELEASE request is submitted? If you require that a resource continue to be held by a STAF handle that no longer exists, then you must specify the "GARBAGECOLLECT No" option on the RESPOOL REQUEST request.

     
  • sfresher

    sfresher - 2009-06-12

    Hi Sharon,

    I intentionally leave the the GARBAGECOLLECT at YES because I understand if the test job is terminated, then the respool entry is automatically released back to the pool.

    In this case, both JOB 13 & 23 are still running, I don't know why handle 27 is deleted. Here is the query results:

    H:\&gt;staf local stax query job 13
    Response
    --------
    {
    Job ID : 13
    Job Name : <None>
    Start Date-Time : 20090611-11:47:12
    XML File Name : C:\RunTest.xml
    File Machine : 10.7.1.142
    Function : RunTest
    Arguments : {'script': 'CP_RegressionTestController.bat MASTER_testsc
    ript_iso.txt', 'build': 'FDF', 'version': '1.4.1.20090611', 'os': 'WinXP', 'emai
    l': 'DVETAF@savi.com', 'chamber': 'NFR3', 'java': 'JRE6', 'timeout': '4h', 'scre
    enshot': 'NA'}
    Scripts : []
    Script Files : []
    Script Machine : <None>
    Source Machine : tcp://FKONG-D620@6500
    Notify OnEnd : No
    Clear Logs : Disabled
    Log TC Elapsed Time: Enabled
    Log TC Num Starts : Enabled
    Log TC Start/Stop : Disabled
    Python Output : JobUserLog
    Python Log Level : Info
    Blocks Running : 1
    Blocks Held : 0
    Blocks Unknown : 0
    }

    H:\&gt;staf local stax query job 23
    Response
    --------
    {
    Job ID : 23
    Job Name : <None>
    Start Date-Time : 20090611-11:47:37
    XML File Name : C:\RunTest.xml
    File Machine : 10.7.1.142
    Function : RunTest
    Arguments : {'script': 'CP_RegressionTestController.bat MASTER_testsc
    ript_reader.txt', 'build': 'FDF', 'version': '1.4.1.20090611', 'os': 'Win2003',
    'email': 'DVETAF@savi.com', 'chamber': 'NFR3', 'java': 'JRE6', 'timeout': '2h',
    'screenshot': 'NA'}
    Scripts : []
    Script Files : []
    Script Machine : <None>
    Source Machine : tcp://FKONG-D620@6500
    Notify OnEnd : No
    Clear Logs : Disabled
    Log TC Elapsed Time: Enabled
    Log TC Num Starts : Enabled
    Log TC Start/Stop : Disabled
    Python Output : JobUserLog
    Python Log Level : Info
    Blocks Running : 1
    Blocks Held : 0
    Blocks Unknown : 0
    }

     
  • Sharon Lucas

    Sharon Lucas - 2009-06-12

    Can you list handles or query handle 27 on your STAX service machine to see if it has been deleted?

    STAF staxmachine HANDLE LIST HANDLES
    STAF staxmachine HANDLE QUERY HANDLE 27

    Can you try to recreate the problem with STAF tracing including tracepoint Debug (which you didn't have on before) along with Error, Warning, ServiceRequest, ServiceResult.

    Also, enable tracing for the HANDLE service along with the RESPOOL service.

    Then recreate the problem and provide the STAF tracing as maybe that will provide more information on what's going on. I'd still like to see your STAX job too.

     
  • Sharon Lucas

    Sharon Lucas - 2009-07-24

    Are you still seeing this problem? If so, please provide the debug information that I requested in June. Thanks.

     
  • Sharon Lucas

    Sharon Lucas - 2010-01-22

    Closing this bug as invalid as you haven't responded to our request for more information from over 6 months ago. If you are still seeing this problem, please provide the STAF tracing information requested.

     
  • Sharon Lucas

    Sharon Lucas - 2010-01-22
    • milestone: --> Win32
    • status: open --> closed-invalid
     

Log in to post a comment.