
#973 SMF to retry CCB failures when executing AU lock activation step

4.3.3
fixed
None
defect
smf
d
4.3.x
major
2014-08-11
2014-07-31
No

After a cluster reboot, a CCB can be aborted after the create operation but before the apply operation because of a sync request, for example one originating from a payload.
That is, while SMF is modifying the information model and setting the maintenance status during "Executing AU lock activation step", there is a chance that the CCB is aborted after it has been created but before it has been applied.

SMF should retry the whole CCB in the above scenario.

Related

Tickets: #973
Wiki: ChangeLog-4.3.3
Wiki: ChangeLog-4.4.1

Discussion

  • Mathi Naickan

    Mathi Naickan - 2014-08-07

    The desired fix is for SMF to be able to retry the CCBs.
    However, for an optimal solution SMF would need an indication (callback) that a CCB was aborted, with the possibility of also providing the reason for the abort. In the absence of such an indication, SMF has to rely on a solution based on indirect interpretation of the error codes received for AdminOwnerSet and the CCB APIs.

    I shall prepare a patch that correlates the ERR_NOT_EXIST and ERR_TIMEOUT error codes that are returned to AdminOwnerSet() and the CCB APIs when a CCB is aborted.
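As an illustration of the correlation idea only, the sketch below treats the two codes named above as "the CCB was probably aborted". The `ImmError` enum, its members other than those two codes, and `looks_like_ccb_abort` are hypothetical names, and the enum values are placeholders rather than the real SA_AIS numeric codes. It is not taken from the patch.

```python
from enum import Enum, auto


class ImmError(Enum):
    # Placeholder values: these are NOT the real SA_AIS_* numeric codes.
    OK = auto()
    ERR_TIMEOUT = auto()
    ERR_NOT_EXIST = auto()
    ERR_BAD_HANDLE = auto()


# Assumption: only the two codes mentioned in the comment are read as a
# possible CCB abort; any other code is handled as an ordinary failure.
POSSIBLE_CCB_ABORT = frozenset({ImmError.ERR_NOT_EXIST, ImmError.ERR_TIMEOUT})


def looks_like_ccb_abort(rc: ImmError) -> bool:
    """Return True if the return code may indicate an aborted CCB."""
    return rc in POSSIBLE_CCB_ABORT
```

Because the interpretation is indirect, a caller would treat a True result as a hint to retry, not as proof that an abort happened.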

    This is possible either by modifying the modifyInformationModel() method to retry immUtil.doImmOperations(), say, 3 times, or by breaking modifyInformationModel() down into two functions, or by some other approach.
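The first option, a bounded retry of the whole operation, can be sketched roughly as follows. This is a language-neutral illustration in Python, not the SMF code: `CcbAborted` and `retry_whole_ccb` are hypothetical names, and `do_imm_operations` stands in for a call that performs create and apply as one unit, so that each retry starts again from create.

```python
import time


class CcbAborted(Exception):
    """Hypothetical signal that the CCB was lost between create and apply."""


def retry_whole_ccb(do_imm_operations, max_attempts=3, delay_s=0.0):
    """Run do_imm_operations() up to max_attempts times.

    Returns the number of attempts used on success. If every attempt
    raises CcbAborted, the last exception is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            do_imm_operations()  # create + apply, always from the start
            return attempt
        except CcbAborted:
            if attempt == max_attempts:
                raise
            time.sleep(delay_s)  # optional pause before the next attempt
```

Only the abort case is retried here; any other exception propagates immediately, so unrelated failures are not masked by the retry loop.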

    Having said that, in my opinion the long-term solution is for SMF and other services to start providing service (including modifying the information model) only after they receive a CLM indication, post cluster restart, that the node/cluster is ready and the cluster information is synced.
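The long-term idea of holding back work until a readiness indication arrives could look roughly like the sketch below. `ReadinessGate`, `on_cluster_ready` and `run_when_ready` are hypothetical names. How the indication would actually be delivered by CLM is not specified in this ticket, so the callback here is just an ordinary method call.

```python
import threading


class ReadinessGate:
    """Hold back operations until a readiness indication has been received."""

    def __init__(self):
        self._ready = threading.Event()

    def on_cluster_ready(self):
        # Assumption: invoked from whatever callback reports node/cluster readiness.
        self._ready.set()

    def run_when_ready(self, operation, timeout_s=None):
        """Wait for readiness, then run operation() and return its result.

        Raises TimeoutError if readiness is not signalled within timeout_s.
        """
        if not self._ready.wait(timeout_s):
            raise TimeoutError("cluster not ready")
        return operation()
```

With such a gate in front of the information model changes, the CCB would not be started while the post-restart sync is still in progress, which is the window in which the abort described in this ticket occurs.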

     
  • Mathi Naickan

    Mathi Naickan - 2014-08-07

    Attached is a patch that retries when modifyInformationModel() fails.

    Note: This scenario can alternatively be handled by making SMF use the IMM cluster startup-time related variables (i.e. wait for all the nodes to be loaded at the same time).

     
  • Mathi Naickan

    Mathi Naickan - 2014-08-11
    • status: assigned --> fixed
    • Version: --> 4.3.x
    • Milestone: 4.4.1 --> 4.3.3
     
  • Mathi Naickan

    Mathi Naickan - 2014-08-11

    [Commit:8edafe]
    [Commit:e186b2]
    [Commit:b9995f]

    changeset: 5548:b9995f897521
    tag: tip
    parent: 5545:5bb62a8b4a26
    user: Mathivanan N.P. <mathi.naickan@oracle.com>
    date: Mon Aug 11 18:52:27 2014 -0400
    summary: smf: retry modify information model upon CCB abort post cluster restart [#973]

    changeset: 5547:e186b2cdc460
    branch: opensaf-4.4.x
    parent: 5540:830e6a0b6834
    user: Mathivanan N.P. <mathi.naickan@oracle.com>
    date: Mon Aug 11 18:49:28 2014 -0400
    summary: smf: retry modify information model upon CCB abort post cluster restart [#973]

    changeset: 5546:8edafee4cb70
    branch: opensaf-4.3.x
    parent: 5531:019649b3b1b2
    user: Mathivanan N.P. <mathi.naickan@oracle.com>
    date: Mon Aug 11 18:48:09 2014 -0400
    summary: smf: retry modify information model upon CCB abort post cluster restart [#973]

     


