After a cluster reboot, a CCB can be aborted after the create operation but before the apply operation because of a sync request, for example one originating from a payload.
That is, during "modify information model and set maintenance status", while in "Executing AU lock activation step", there is a chance that the CCB is aborted after create but before being applied.
In this scenario SMF should retry the whole CCB, so the desired fix is for SMF to be able to retry CCBs.
However, for an optimal solution SMF would require an indication (callback) that a CCB was aborted, with the possibility of also providing the reason for the abort. In the absence of such an indication, SMF has to rely on a solution based on indirect interpretation of the error codes received from AdminOwnerSet and the CCB APIs.
I shall prepare a patch that correlates the ERR_NOT_EXIST and ERR_TIMEOUT error codes that are returned by AdminOwnerSet() and the CCB APIs when a CCB is aborted.
This is possible either by modifying the modifyInformationModel() method to retry immUtil.doImmOperations(), say 3 times, or by breaking modifyInformationModel() down into two functions, among other options.
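The retry option can be sketched roughly as follows. This is only an illustration of the idea, not the attached patch: `ImmResult`, `looksLikeCcbAbort()` and `retryWholeCcb()` are hypothetical names, the enum stands in for the real SA Forum error codes, and `runCcb` represents whatever runs the complete CCB (for example the call to immUtil.doImmOperations()).

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for the error codes discussed in this ticket.
// These are NOT the real SaAisErrorT definitions.
enum class ImmResult { Ok, ErrTimeout, ErrNotExist, ErrOther };

// Assumption: ERR_TIMEOUT or ERR_NOT_EXIST is read as "the CCB may have
// been aborted", since no explicit abort callback is available.
inline bool looksLikeCcbAbort(ImmResult rc) {
  return rc == ImmResult::ErrTimeout || rc == ImmResult::ErrNotExist;
}

// Runs the whole CCB (create through apply) via `runCcb` and starts it over
// from scratch when the result looks like an abort. Returns the result of
// the last attempt, or ErrOther if maxAttempts is not positive.
inline ImmResult retryWholeCcb(
    const std::function<ImmResult()>& runCcb,
    int maxAttempts = 3,
    std::chrono::milliseconds delay = std::chrono::milliseconds(0)) {
  ImmResult rc = ImmResult::ErrOther;
  for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
    rc = runCcb();
    if (!looksLikeCcbAbort(rc)) {
      break;  // success, or an error that a retry is not expected to fix
    }
    if (attempt < maxAttempts) {
      std::this_thread::sleep_for(delay);  // give the sync time to finish
    }
  }
  return rc;
}
```

The point of retrying the whole operation, rather than only the apply step, is that an aborted CCB has lost its earlier create operations, so they have to be issued again.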
Having said that, in my opinion the long-term solution is for SMF and other services to start providing service (including modifying the information model) only after they receive a CLM indication, post cluster restart, that the node/cluster is ready (and cluster information is synced).
Attached is a patch that retries when the modifyInformationModel() fails.
Note: This scenario can alternatively be handled by making SMF use the IMM cluster startup time related variables (i.e. wait for all the nodes to be loaded at the same time).
[Commit:8edafe]
[Commit:e186b2]
[Commit:b9995f]
changeset: 5548:b9995f897521
tag: tip
parent: 5545:5bb62a8b4a26
user: Mathivanan N.P. <mathi.naickan@oracle.com>
date: Mon Aug 11 18:52:27 2014 -0400
summary: smf: retry modify information model upon CCB abort post cluster restart [#973]

changeset: 5547:e186b2cdc460
branch: opensaf-4.4.x
parent: 5540:830e6a0b6834
user: Mathivanan N.P. <mathi.naickan@oracle.com>
date: Mon Aug 11 18:49:28 2014 -0400
summary: smf: retry modify information model upon CCB abort post cluster restart [#973]

changeset: 5546:8edafee4cb70
branch: opensaf-4.3.x
parent: 5531:019649b3b1b2
user: Mathivanan N.P. <mathi.naickan@oracle.com>
date: Mon Aug 11 18:48:09 2014 -0400
summary: smf: retry modify information model upon CCB abort post cluster restart [#973]