Setup
Version: 4.6 FC
Model: 2N
Configuration: 1 app, 1 SG, 2 SUs with 4 components each, 4 SIs with 1 CSI each
SU1 is mapped to PL-3 and SU2 to PL-4.
Initial state
All AMF entities of the application are unlocked. All SIs are fully assigned. SU1 is the standby SU and SU2 is the active SU.
Steps performed:
-> Ran the command "/etc/init.d/opensafd stop" on the PL-3 node.
Mar 20 15:34:47 SYSTEST-PLD-1 opensafd: Stopping OpenSAF Services
Mar 20 15:34:47 SYSTEST-PLD-1 osafamfnd[6835]: NO Shutdown initiated
SU2 on PL-4 now has the active assignments.
-> Started OpenSAF on the PL-3 node.
Mar 20 15:36:43 SYSTEST-PLD-1 opensafd: Starting OpenSAF Services (Using TIPC)
Mar 20 15:36:45 SYSTEST-PLD-1 opensafd: OpenSAF(4.6.FC - ) services successfully started
Now SU2 on PL-4 is active and SU1 on PL-3 is standby.
-> Stopped OpenSAF on the PL-4 node.
Mar 20 15:36:50 SYSTEST-PLD-1 osafamfnd[16251]: NO Assigned 'all SIs' ACTIVE of 'safSu=SU1,safSg=SG,safApp=test2nApp'
Mar 20 15:36:53 SYSTEST-PLD-1 kernel: [14611.120045] TIPC: Resetting link <1.1.3:eth2-1.1.4:eth2>, peer not responding
Mar 20 15:36:53 SYSTEST-PLD-1 kernel: [14611.120051] TIPC: Lost link <1.1.3:eth2-1.1.4:eth2> on network plane A
Mar 20 15:36:53 SYSTEST-PLD-1 kernel: [14611.120056] TIPC: Lost contact with <1.1.4>
Mar 20 15:37:08 SYSTEST-PLD-1 kernel: [14626.188976] TIPC: Established link <1.1.3:eth2-1.1.4:eth2> on network plane A
SU1 on PL-3 is now active and SU2 is in the unassigned state.
-> Started OpenSAF on the PL-4 node.
Mar 20 15:37:08 SYSTEST-PLD-1 kernel: [14626.188976] TIPC: Established link <1.1.3:eth2-1.1.4:eth2> on network plane A
Now the assignment state of all SIs is shown as partially assigned (saAmfSIAssignmentState = 3), because saAmfSINumCurrStandbyAssignments is set to 2, which is invalid for the 2N model: an SI in a 2N SG can have at most one standby assignment.
The component callbacks are correct; only the IMM attribute is updated incorrectly by AMF.
saAmfSIPrefStandbyAssignments SA_UINT32_T 1 (0x1)
saAmfSIPrefActiveAssignments SA_UINT32_T 1 (0x1)
saAmfSINumCurrStandbyAssignments SA_UINT32_T 2 (0x2)
saAmfSINumCurrActiveAssignments SA_UINT32_T 1 (0x1)
saAmfSIAssignmentState SA_UINT32_T 3 (0x3)
Locking the SI resulted in the following values:
saAmfSIPrefStandbyAssignments SA_UINT32_T 1 (0x1)
saAmfSIPrefActiveAssignments SA_UINT32_T 1 (0x1)
saAmfSINumCurrStandbyAssignments SA_UINT32_T 1 (0x1)
saAmfSINumCurrActiveAssignments SA_UINT32_T 0 (0x0)
Log analysis shows that an SI swap was issued (op=7 is SA_AMF_ADMIN_SI_SWAP):
Mar 20 15:07:50.825598 osafamfd [2353:si.cc:0821] >> si_admin_op_cb: safSi=SI3,safApp=test2nApp op=7
However, the component timed out while transitioning from quiesced to standby, and an SU failover was triggered:
Mar 20 15:08:10 SYSTEST-PLD-1 osafamfnd[6835]: NO Performing failover of 'safSu=SU1,safSg=SG,safApp=test2nApp' (SU failover count: 5)
Mar 20 15:08:10 SYSTEST-PLD-1 osafamfnd[6835]: NO 'safComp=COMP1,safSu=SU1,safSg=SG,safApp=test2nApp' recovery action escalated from 'componentFailover' to 'suFailover'
Mar 20 15:08:10 SYSTEST-PLD-1 osafamfnd[6835]: NO 'safComp=COMP1,safSu=SU1,safSg=SG,safApp=test2nApp' faulted due to 'csiSetcallbackTimeout' : Recovery is 'suFailover'
It is reproducible with the following steps:
1. Configure SU failover for the AMF demo app, unlock both SUs, and perform an SI swap.
2. Using gdb, hold the component so that the CSI set callback times out while it transitions from quiesced to standby.
3. Run immlist on the SI.
saAmfSINumCurrStandbyAssignments is still 1, although it should be 0:
saAmfSINumCurrStandbyAssignments SA_UINT32_T 1 (0x1)
A defect cannot have a future release as its milestone.
If the defect exists on the 4.6 branch, it should be fixed on the 4.6
branch and on all later branches.
changeset: 6568:532573afb8da
branch: opensaf-4.5.x
parent: 6562:e0acc354bd06
user: Nagendra Kumar <nagendra.k@oracle.com>
date: Fri May 22 15:10:55 2015 +0530
summary: amfd: adjust saAmfSINumCurrStandbyAssignments during HA state change [#1276]
changeset: 6569:2b5372f7166b
branch: opensaf-4.6.x
parent: 6566:9bff9230b284
user: Nagendra Kumar <nagendra.k@oracle.com>
date: Fri May 22 15:11:39 2015 +0530
summary: amfd: adjust saAmfSINumCurrStandbyAssignments during HA state change [#1276]
changeset: 6570:5720e8f398e3
tag: tip
parent: 6567:89b2c6789acf
user: Nagendra Kumar <nagendra.k@oracle.com>
date: Fri May 22 15:11:47 2015 +0530
summary: amfd: adjust saAmfSINumCurrStandbyAssignments during HA state change [#1276]
[staging:532573]
[staging:2b5372]
[staging:5720e8]
Related tickets: #1276
Commit: [2b5372]
Commit: [532573]
Commit: [5720e8]