Changeset : 6901
Setup : NPM application with 4 SUs hosted on PL-3 & PL-4, and 4 SIs
SU1 & SU3 hosted on PL-3, SU2 & SU4 hosted on PL-4
Steps :
After a series of operations on the NPM application, the state of the assignments is as below
            | TestApp_SI1 | TestApp_SI2 | TestApp_SI3 | TestApp_SI4
TestApp_SU1 | ACTIVE      | ACTIVE      |             |
TestApp_SU2 |             |             | ACTIVE      | ACTIVE
TestApp_SU3 | STANDBY     | STANDBY     | STANDBY     |
TestApp_SU4 |             |             |             | STANDBY
After opensafd is stopped on PL-3, below are the assignments
            | TestApp_SI1 | TestApp_SI2 | TestApp_SI3 | TestApp_SI4
TestApp_SU1 |             |             |             |
TestApp_SU2 |             |             | ACTIVE      | ACTIVE
TestApp_SU3 |             |             |             |
TestApp_SU4 | STANDBY     | STANDBY     |             | STANDBY
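The problem in the state above is that some SIs end up with a STANDBY assignment but no ACTIVE one. A minimal sketch of that check follows. It is only an illustration and not AMFD code: the dictionary layout and the function name standby_without_active are assumptions made for this example, and the SU4 column placement is taken from the syslog below (SI1 and SI2 newly assigned, SI4 already present).

```python
# Assignments after opensafd is stopped on PL-3, keyed by SU and then by SI.
# Illustrative data structure only; AMFD does not store assignments this way.
after_stop = {
    "TestApp_SU2": {"TestApp_SI3": "ACTIVE", "TestApp_SI4": "ACTIVE"},
    "TestApp_SU4": {
        "TestApp_SI1": "STANDBY",
        "TestApp_SI2": "STANDBY",
        "TestApp_SI4": "STANDBY",
    },
}


def standby_without_active(assignments):
    """Return the sorted SIs that have a STANDBY assignment but no ACTIVE one."""
    active, standby = set(), set()
    for su_assignments in assignments.values():
        for si, ha_state in su_assignments.items():
            if ha_state == "ACTIVE":
                active.add(si)
            elif ha_state == "STANDBY":
                standby.add(si)
    return sorted(standby - active)


print(standby_without_active(after_stop))
```

For the state above this reports TestApp_SI1 and TestApp_SI2, the two SIs whose active and standby SUs were both on PL-3.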
Corresponding log in syslog on PL-4 :
Oct 23 19:00:29 PAYLOAD-2 osafimmnd[8101]: NO Implementer disconnected 40 <0, 2010f> (MsgQueueService131855)
Oct 23 19:00:29 PAYLOAD-2 osafamfnd[8120]: NO Assigning 'safSi=TestApp_SI1,safApp=TestApp_Npm' STANDBY to 'safSu=TestApp_SU4,safSg=TestApp_SG1,safApp=TestApp_Npm'
Oct 23 19:00:29 PAYLOAD-2 osafamfnd[8120]: NO Assigning 'safSi=TestApp_SI2,safApp=TestApp_Npm' STANDBY to 'safSu=TestApp_SU4,safSg=TestApp_SG1,safApp=TestApp_Npm'
Oct 23 19:00:29 PAYLOAD-2 osafamfnd[8120]: NO Assigned 'safSi=TestApp_SI2,safApp=TestApp_Npm' STANDBY to 'safSu=TestApp_SU4,safSg=TestApp_SG1,safApp=TestApp_Npm'
Oct 23 19:00:29 PAYLOAD-2 osafamfnd[8120]: NO Assigned 'safSi=TestApp_SI1,safApp=TestApp_Npm' STANDBY to 'safSu=TestApp_SU4,safSg=TestApp_SG1,safApp=TestApp_Npm'
Oct 23 19:00:32 PAYLOAD-2 kernel: [ 7785.128227] TIPC: Resetting link <1.1.4:eth3-1.1.3:eth3>, peer not responding
Attached are amfd.state and amfd traces from the active controller, the amfnd trace from the payload hosting SU2 & SU4, and the NPM configuration.
Analysis:
In the reported problem, the stopped node hosts both the active and the standby SUs for two SIs.
When this node is stopped, AMFD tries to fail over the SUs. For the active SU, it deletes
the SUSIs for both SIs. Failover of this SU is not possible because the standby assignments
also reside on the stopped node (their SUSIs are yet to be deleted), so AMFD runs the new assignment logic.
Since no SU is available for an active assignment for any SI, AMFD tries fresh standby assignments.
In the logic for assigning the standby HA state, AMFD checks whether an active assignment exists by checking
list_of_susi, and also checks whether a standby assignment is already present. Note that AMFD
checks only whether list_of_susi is non-null, not the HA state of its entries; it assumes that a non-null
list_of_susi means an active assignment exists. In the reported problem, both the active and the standby
assignments reside on the stopped node, so AMFD still finds list_of_susi non-null: it contains the SUSI for the
standby HA state, which will be deleted later in the node failover logic. Since only one SUSI is present
(and it is assumed to be active), AMFD goes ahead with standby assignments.
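The difference between the two checks can be sketched as below. This is a simplified model and not the AMFD source or the actual patch: only the name list_of_susi comes from the analysis above, while the Susi and Si classes, the pending_delete flag, and both function names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Susi:
    su: str
    ha_state: str                 # "ACTIVE" or "STANDBY"
    pending_delete: bool = False  # hypothetical: SUSI still awaiting deletion in node failover


@dataclass
class Si:
    name: str
    list_of_susi: list = field(default_factory=list)


def has_active_naive(si):
    """Models the reported behaviour: any SUSI in the list is taken as an active assignment."""
    return len(si.list_of_susi) > 0


def has_active_checked(si):
    """Looks at the HA state and ignores SUSIs that are about to be deleted."""
    return any(
        s.ha_state == "ACTIVE" and not s.pending_delete for s in si.list_of_susi
    )


# SI1 after the active SUSI on SU1 has been deleted: only the standby SUSI on
# SU3 (also on the stopped node) remains, and it is still waiting to be deleted.
si1 = Si("TestApp_SI1", [Susi("TestApp_SU3", "STANDBY", pending_delete=True)])

print(has_active_naive(si1))    # True: a standby assignment would wrongly be allowed
print(has_active_checked(si1))  # False: no active assignment exists, so no standby
```

With the naive check, the leftover standby SUSI is mistaken for an active assignment, which matches the STANDBY assignments of SI1 and SI2 to SU4 seen in the syslog.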
https://sourceforge.net/p/opensaf/mailman/message/34810751/
changeset: 7682:b2a10ccc7909
branch: opensaf-4.7.x
user: praveen.malviya@oracle.com
date: Fri May 27 14:59:50 2016 +0530
summary: amfd: fix assignment of standby HA state without active HA state, NPM model [#1562]
changeset: 7683:1cb66b924c14
branch: opensaf-5.0.x
parent: 7677:3d261f31dec7
user: praveen.malviya@oracle.com
date: Fri May 27 15:00:27 2016 +0530
summary: amfd: fix assignment of standby HA state without active HA state, NPM model [#1562]
changeset: 7684:4390ef55f5ad
tag: tip
parent: 7680:eda44b129c47
user: praveen.malviya@oracle.com
date: Fri May 27 15:00:50 2016 +0530
summary: amfd: fix assignment of standby HA state without active HA state, NPM model [#1562]