Changeset : 4325
Model : TWON
Configuration: 1 SG, 5 SUs with 3 components each, 5 SIs with 3 CSIs each.
Initially: 5-node cluster, SU1 mapped to SC-1, SU2 to SC-2, SU3 to PL-3, SU4 and SU5 to PL-4
SU3 was active and SU4 standby
SI-SI dependencies configured as SI1<-SI2<-SI3<-SI4
Recovery on Error = COMP_RESTART
Testcase:
Lock the active SU3. The component that receives the active callback calls saAmfFinalize(). Likewise SU4, followed by SU1 and then SU5, each got the active callback and called saAmfFinalize(). No recovery was triggered in any case.
SU3 later got the active assignments, but no other SU got a standby assignment.
Moreover SU2, which is an UNINSTANTIATED spare SU, did not get instantiated, which is fair as PrefInserviceSUs=4.
SU states:
safSu=SU2,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=UNINSTANTIATED(1)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
safSu=SU4,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
safSu=SU5,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
safSu=SU1,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
safSu=SU3,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)
syslog on PL-3:
This is the time of test case start
Jul 9 17:20:57 PL-3 osafamfnd[3811]: NO Assigning 'safSi=TWONSI4,safApp=TWONAPP' QUIESCED to 'safSu=SU3,safSg=SGONE,safApp=TWONAPP'
Jul 9 17:20:57 PL-3 osafamfnd[3811]: NO Assigning 'safSi=TWONSI5,safApp=TWONAPP' QUIESCED to 'safSu=SU3,safSg=SGONE,safApp=TWONAPP'
Jul 9 17:20:57 PL-3 osafamfnd[3811]: NO Assigned 'safSi=TWONSI4,safApp=TWONAPP' QUIESCED to 'safSu=SU3,safSg=SGONE,safApp=TWONAPP'
Jul 9 17:20:57 PL-3 osafamfnd[3811]: NO Assigned 'safSi=TWONSI5,safApp=TWONAPP' QUIESCED to 'safSu=SU3,safSg=SGONE,safApp=TWONAPP'
Jul 9 17:20:57 PL-3 osafamfnd[3811]: NO Assigning 'safSi=TWONSI3,safApp=TWONAPP' QUIESCED to 'safSu=SU3,safSg=SGONE,safApp=TWONAPP'
Jul 9 17:20:57 PL-3 osafamfnd[3811]: NO Assigned 'safSi=TWONSI3,safApp=TWONAPP' QUIESCED to 'safSu=SU3,safSg=SGONE,safApp=TWONAPP'
SI states:
safSi=TWONSI4,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=TWONSI1,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=TWONSI3,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=TWONSI5,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=TWONSI2,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
Analysis:
When saAmfFinalize() is invoked, AMFND disables the SU and unregisters the component, since the same handle was used for its registration. AMFD removes the assignment from each new active SU in which finalize was called. However, as part of unregistration, AMFND does not mark the SU as failed. Because of this, no repair of the SUs was performed and they remained in the DISABLED, INSTANTIATED state.
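The sequence described above can be sketched as a toy state model. This is only an illustration of the observed behavior, not OpenSAF code: the `SU` class, the `failed` flag and the function names are hypothetical, and the model assumes that repair is triggered only for an SU marked as failed.

```python
from dataclasses import dataclass


@dataclass
class SU:
    """Toy model of a service unit (hypothetical, not an OpenSAF type)."""
    name: str
    oper_state: str = "ENABLED"
    presence_state: str = "INSTANTIATED"
    has_active: bool = False
    failed: bool = False  # assumption: repair is attempted only when this is set


def finalize_on_active(su: SU) -> None:
    """Model what the ticket observes when a component in `su` finalizes its handle."""
    su.oper_state = "DISABLED"  # AMFND disables the SU
    su.has_active = False       # AMFD removes the active assignment
    # su.failed is deliberately left False: the SU is not marked as failed


def needs_repair(su: SU) -> bool:
    return su.failed


# SU4, SU1 and SU5 receive the active assignment in turn and finalize.
sus = [SU("SU4"), SU("SU1"), SU("SU5")]
for su in sus:
    su.has_active = True
    finalize_on_active(su)

# SUs that are disabled but will never be repaired in this model.
stuck = [su.name for su in sus if su.oper_state == "DISABLED" and not needs_repair(su)]
print(stuck)
```

In this model every SU that finalized ends up DISABLED and INSTANTIATED with no repair pending, which matches the SU states listed in the report.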
Still, this behavior cannot be considered a problem, because saAmfComponentUnregister() has been removed in the AMF B.04.01 spec and components are unregistered as part of saAmfFinalize(). At the same time, B.04.01 says (section 7.1.1, page 231):
"The Availability Management Framework also unregisters all components that are
still registered with a particular handle when that handle is finalized explicitly by
invoking the saAmfFinalize() function or implicitly when the process that initialized
the handle exits. However, if an SA-aware component finalizes a handle that still
has some registered components associated to it, the Availability Management
Framework treats this finalization as an error of the SA-aware component. An SA-aware
component should only finalize a handle when the previously associated registered
components have automatically been unregistered by the Availability Management
Framework, as indicated above."
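The rule in the quoted paragraph can be illustrated with a minimal sketch. It is a toy stand-in, not the real AMF API: `AmfHandleModel`, `finalization_is_error` and the component name are hypothetical and only encode the quoted text, where finalizing a handle that still has registered components counts as an error of the component.

```python
class AmfHandleModel:
    """Toy stand-in for an AMF handle; not the real SaAmfHandleT API."""

    def __init__(self):
        self.registered = set()
        self.finalized = False

    def register(self, comp_name):
        if self.finalized:
            raise RuntimeError("handle already finalized")
        self.registered.add(comp_name)

    def finalize(self):
        """Finalize the handle and return the components that were still registered."""
        leftover = sorted(self.registered)
        self.registered.clear()  # the framework unregisters them on finalization
        self.finalized = True
        return leftover


def finalization_is_error(leftover):
    # Per the quoted text: components still registered at finalize time
    # make the finalization an error of the SA-aware component.
    return len(leftover) > 0


handle = AmfHandleModel()
handle.register("safComp=COMP1,safSu=SU4")  # hypothetical component name
leftover = handle.finalize()
print(leftover, finalization_is_error(leftover))
```

Under this reading, a handle finalized with nothing registered is not an error, while the test case in this ticket (finalize with the component still registered) would be.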
I think the components in this issue have been modeled with only one process, which itself finalizes the handle. So AMF cannot consider it a fault of the component, and thus AMF is behaving correctly. But in such a case, what should the behavior of AMF be from a recovery perspective? Or is no recovery required, since the component itself decided to finalize?
Related ticket: https://sourceforge.net/p/opensaf/tickets/322/
I am not able to find any valid use case for this ticket. Reporting an error could be a better way to report a component-specific problem.