
#807 AMF returns TRYAGAIN for saAmfRegister

4.3.3
fixed
None
defect
amf
-
4.3
major
2014-06-18
2014-03-06
Hans Feldt
No

Use case: a node lock followed by a node lock-instantiation. During the node lock, a component causes an SU failover, followed by repair of the component, which means instantiation.

So we have a component in INSTANTIATING state when the SU terminate request arrives. AMF then (silently) escalates this to a cleanup of the already-instantiating component and changes its state to TERMINATING. Due to timing, the cleanup script does not find any process to kill and returns 0. At the same time, the instantiate script starts a process that calls saAmfRegister, which returns TRYAGAIN because the component is in TERMINATING state.

Suggestions:
- AMF should probably return BAD-OPERATION in this case (is there any valid case where it should return TRYAGAIN?)
- the reason for escalating to cleanup should be logged

Finally, should AMF really start a second CLC-CLI script while it knows one is already running? This implies that the cleanup script must be able to kill the instantiate script, which is not stated in the specification; besides, I don't have any such script. So maybe AMF should kill the child process executing the instantiate script before it starts cleanup. This is a change in "core", since there is no interface to do this.
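The last suggestion, killing the child that runs the instantiate script before cleanup starts, could look roughly like the POSIX sketch below. It is not taken from the amfnd code: the helper name `kill_and_reap` is hypothetical, and it assumes AMF still holds the PID of the forked child. It only covers the direct child; a process that the script has already detached would need something more, such as a process group.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: kill a direct child process and reap it.
 * Returns the signal that terminated the child, 0 if the child had
 * already exited on its own, or -1 on error.
 */
int kill_and_reap(pid_t child)
{
    int status;

    if (kill(child, SIGKILL) == -1)
        return -1;

    /* Retry waitpid() if it is interrupted by a signal. */
    while (waitpid(child, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling this before the cleanup script is launched would ensure that only one CLC-CLI script is running at a time for the component.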

Related

Tickets: #807
Wiki: ChangeLog-4.3.3
Wiki: ChangeLog-4.4.1

Discussion

  • Hans Feldt

    Hans Feldt - 2014-03-26

    As discussed, the director should return TRYAGAIN when an SU is in INSTANTIATING or RESTARTING presence state.

    Check logic for register and TRYAGAIN

     
  • Nagendra Kumar

    Nagendra Kumar - 2014-03-27
    • status: unassigned --> accepted
    • assigned_to: Nagendra Kumar
     
  • Nagendra Kumar

    Nagendra Kumar - 2014-04-02

    Below is the specification's description of when saAmfComponentRegister() returns BAD_OPERATION; it doesn't match the ticket's use case:
    SA_AIS_ERR_BAD_OPERATION:
    The proxy component which is identified by the
    name referred to by proxyCompName and which is registering a proxied component
    has not been assigned the proxy CSI with the active HA state through which the proxied
    component being registered is supposed to be proxied.

    The most suitable return code is SA_AIS_ERR_TRY_AGAIN, which is what is currently returned.

     
  • Nagendra Kumar

    Nagendra Kumar - 2014-04-02
    • status: accepted --> review
     
  • Nagendra Kumar

    Nagendra Kumar - 2014-04-03

    Hi Hans, any comments?

     
  • Hans Feldt

    Hans Feldt - 2014-04-03

    I think the patch you have sent is good (will review) but I think there is a race that can only be addressed by changing amfnd.

    If the LOCK-IN admin command is received when an SU is UNINSTANTIATED but the repair order has already been sent from amfd to amfnd, we have the same problem again! The SU terminate order is received by amfnd, but now the SU is in INSTANTIATING, which escalates to cleanup of the component. If the cleanup script in this case succeeds without killing a previously started process, e.g. because there is no PID file, there will be a detached component process started that tries to register.

    So if you are worried about the return code BAD-OPERATION (which I am not), then we could use ERR_LIBRARY, which unarguably would (should) cause this program to exit.
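The point about return codes matters because of how a component typically reacts to them: TRY_AGAIN invites another attempt, while any other error should make the detached process give up and exit. A minimal sketch of that caller-side logic follows. The `reg_status` values are local stand-ins for the SaAisErrorT codes named in this ticket, and `register_fn` and `fake_register` are hypothetical; none of this is the real AMF API.

```c
#include <stddef.h>

/* Local stand-ins for the SaAisErrorT codes discussed in this ticket. */
typedef enum { REG_OK, REG_TRY_AGAIN, REG_BAD_OPERATION, REG_LIBRARY } reg_status;

/* Hypothetical callback that wraps the real register call. */
typedef reg_status (*register_fn)(void *ctx);

/*
 * Call fn until it returns something other than REG_TRY_AGAIN, at most
 * max_attempts times. The number of calls made is stored in *attempts_used
 * when that pointer is not NULL.
 */
reg_status register_with_retry(register_fn fn, void *ctx,
                               int max_attempts, int *attempts_used)
{
    reg_status rc = REG_TRY_AGAIN;
    int n = 0;

    while (n < max_attempts) {
        rc = fn(ctx);
        n++;
        if (rc != REG_TRY_AGAIN)
            break;
        /* A real component would sleep here before retrying. */
    }
    if (attempts_used != NULL)
        *attempts_used = n;
    return rc;
}

/* Demo stub: answers TRY_AGAIN busy_calls times, then final_status. */
struct fake_amf { int busy_calls; reg_status final_status; };

reg_status fake_register(void *ctx)
{
    struct fake_amf *f = ctx;

    if (f->busy_calls > 0) {
        f->busy_calls--;
        return REG_TRY_AGAIN;
    }
    return f->final_status;
}
```

With TRY_AGAIN, a process started by an abandoned instantiate script keeps looping until it runs out of attempts; with BAD_OPERATION or ERR_LIBRARY, it stops after the first call.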

     
  • Nagendra Kumar

    Nagendra Kumar - 2014-04-03

    Agreed, a race condition still exists.
    Let me check it again.

     
  • Nagendra Kumar

    Nagendra Kumar - 2014-04-04

    OK, sent the patch again, incorporating the BAD_OP return.

     
  • Nagendra Kumar

    Nagendra Kumar - 2014-04-08

    changeset: 5116:f3705e8f90fc
    branch: opensaf-4.3.x
    parent: 5113:dab0d4067b90
    user: Nagendra Kumar <nagendra.k@oracle.com>
    date: Tue Apr 08 13:19:02 2014 +0530
    summary: amfd: return TRY_AGAIN for su/node lockin op if su pres state is not appropriate [#807]

    changeset: 5117:24c5651c6639
    branch: opensaf-4.4.x
    parent: 5114:9dbafd1322b9
    user: Nagendra Kumar <nagendra.k@oracle.com>
    date: Tue Apr 08 13:30:57 2014 +0530
    summary: amfd: return TRY_AGAIN for su/node lockin op if su pres state is not appropriate [#807]

    changeset: 5118:be04892c8ef3
    tag: tip
    parent: 5115:257088744782
    user: Nagendra Kumar <nagendra.k@oracle.com>
    date: Tue Apr 08 13:31:20 2014 +0530
    summary: amfd: return TRY_AGAIN for su/node lockin op if su pres state is not appropriate [#807]

    [staging:f3705e]
    [staging:24c565]
    [staging:be0489]

     

    Related

    Commit: [24c565]
    Commit: [be0489]
    Commit: [f3705e]
    Tickets: #807

  • Nagendra Kumar

    Nagendra Kumar - 2014-04-08
    • status: review --> fixed
     
  • Nagendra Kumar

    Nagendra Kumar - 2014-04-08
    • Milestone: future --> 4.3.3
     
  • Nagendra Kumar

    Nagendra Kumar - 2014-06-13

    The following should be added in 2.2.5 Implementation Notes:
    When a component calls the register API while its presence state is SA_AMF_PRESENCE_UNINSTANTIATED, SA_AMF_PRESENCE_TERMINATING, SA_AMF_PRESENCE_INSTANTIATION_FAILED or SA_AMF_PRESENCE_TERMINATION_FAILED, AMF returns SA_AIS_ERR_BAD_OPERATION.
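The rule in this note can be read as a simple predicate on the component's presence state. The sketch below only restates the note; the enum `comp_pres_state` is a local stand-in for the SA_AMF_PRESENCE_* values, the function name is hypothetical, and it is not the code from the patch.

```c
/* Local stand-ins for the SA_AMF_PRESENCE_* states. */
typedef enum {
    PRES_UNINSTANTIATED,
    PRES_INSTANTIATING,
    PRES_INSTANTIATED,
    PRES_TERMINATING,
    PRES_RESTARTING,
    PRES_INSTANTIATION_FAILED,
    PRES_TERMINATION_FAILED
} comp_pres_state;

/*
 * Returns 1 if, per the implementation note, a register call made in
 * this presence state is answered with BAD_OPERATION, otherwise 0.
 */
int register_is_bad_operation(comp_pres_state state)
{
    switch (state) {
    case PRES_UNINSTANTIATED:
    case PRES_TERMINATING:
    case PRES_INSTANTIATION_FAILED:
    case PRES_TERMINATION_FAILED:
        return 1;
    default:
        return 0;
    }
}
```

In the use case from this ticket the component is in TERMINATING state when it tries to register, so the predicate is true and the process gets BAD_OPERATION instead of TRY_AGAIN.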

     
  • Nagendra Kumar

    Nagendra Kumar - 2014-06-17

    Any comments? I am going to document it today if there are no objections.

     
  • Nagendra Kumar

    Nagendra Kumar - 2014-06-18

    The following has been added in the document:
    7.6.1 saAmfComponentRegister(): Yes, Partly. When a component calls the register API while its presence state is SA_AMF_PRESENCE_UNINSTANTIATED, SA_AMF_PRESENCE_TERMINATING, SA_AMF_PRESENCE_INSTANTIATION_FAILED or SA_AMF_PRESENCE_TERMINATION_FAILED, AMF returns SA_AIS_ERR_BAD_OPERATION.

    changeset: 105:989eabc3151d
    branch: opensaf-4.3.x
    parent: 103:8b6ab33b700f
    user: Nagendra Kumar <nagendra.k@oracle.com>
    date: Wed Jun 18 10:12:33 2014 +0530
    summary: amf: add deviation for saAmfComponentRegister [#807]

    changeset: 106:8f0b440af2f9
    branch: opensaf-4.4.x
    parent: 102:522d968ac27d
    user: Nagendra Kumar <nagendra.k@oracle.com>
    date: Wed Jun 18 10:17:36 2014 +0530
    summary: amf: add deviation for saAmfComponentRegister [#807]

    changeset: 107:2c7c528009af
    tag: tip
    parent: 104:09ba3a2ec82c
    user: Nagendra Kumar <nagendra.k@oracle.com>
    date: Wed Jun 18 10:31:17 2014 +0530
    summary: amf: add deviation for saAmfComponentRegister [#807]

     

    Related

    Tickets: #807

