In a 2N application configuration where the active SU is hosted on a controller and the standby SU is hosted on a payload, stopping both SCs can generate a su_si assignment message towards the standby SU to change its HA state to active.
Testing on similar application configurations continues; any problems found will be added in the comments.
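The changeset summaries in this ticket mention ignoring susi_assign_evh while the active amfd is down. A minimal Python sketch of that kind of guard follows. It is a toy model, not the amfnd code: `NodeDirector`, `amfd_up`, and `on_susi_assign` are hypothetical names, and it assumes amfnd can tell whether an active amfd is currently present.

```python
class NodeDirector:
    """Toy stand-in for amfnd; all names here are hypothetical."""

    def __init__(self):
        self.amfd_up = True   # assumed flag: is an active amfd present?
        self.ha_state = {}    # SU name -> HA state held on this node

    def on_susi_assign(self, su, new_state):
        """Apply an assignment message only while an active amfd exists."""
        if not self.amfd_up:
            return False      # message ignored, HA state left unchanged
        self.ha_state[su] = new_state
        return True


nd = NodeDirector()
nd.on_susi_assign("SU2", "standby")            # normal operation: applied
nd.amfd_up = False                             # both SCs have been stopped
applied = nd.on_susi_assign("SU2", "active")   # stray message while headless
print(applied, nd.ha_state)                    # False {'SU2': 'standby'}
```

With the guard in place, the standby SU keeps its HA state until an amfd is back and can drive the state change itself.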
Another failure in recovery after SC absence is seen with the attached model. The model is a 2N application with 5 SUs, one hosted on each node of a 5-node cluster.
Initially, SU1 (on SC1) and SU2 (on SC2) have the active and standby assignments. After SC1 and SC2 are abruptly stopped, SU3 (on PL-3) appears to have a standby assignment.
When an SC comes back, amfd reads assignments from IMM and from amfnd on PL-3:
- amfd receives SU3's assignment as sync info sent by amfnd on PL-3.
- amfd reads from IMM that SU1 and SU2 still have the active and standby assignments, and that SU3 has no assignment.
When amfd performs recovery, two SUs (SU2 and SU3) have a standby assignment and SU1 has the active assignment. This 2N assignment state is not valid, and the SG FSM node_fail() cannot act as a recovery method. Only SU2 is failed over; SU1 still has an absent assignment, so its readiness state remains OUT_OF_SERVICE.
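The invalid state described above can be reproduced with a small Python model. This is only an illustration under assumptions: `merge_views` and `check_2n` are hypothetical helpers, not amfd functions, and the check covers only the rule that a 2N SG has at most one active and at most one standby SU.

```python
from collections import defaultdict


def merge_views(imm_view, amfnd_sync):
    """Overlay HA states reported by amfnd on top of those read from IMM."""
    merged = dict(imm_view)
    merged.update(amfnd_sync)
    return merged


def check_2n(assignments):
    """Return violations of 'at most one active and one standby SU'."""
    by_state = defaultdict(list)
    for su, state in sorted(assignments.items()):
        by_state[state].append(su)
    return [
        f"{len(sus)} SUs are {state}: {', '.join(sus)}"
        for state, sus in sorted(by_state.items())
        if len(sus) > 1
    ]


imm_view = {"SU1": "active", "SU2": "standby"}   # stale objects read from IMM
amfnd_sync = {"SU3": "standby"}                  # sync info from amfnd on PL-3
violations = check_2n(merge_views(imm_view, amfnd_sync))
print(violations)   # ['2 SUs are standby: SU2, SU3']
```

A single node failover step cannot repair a view like this, because more than one assignment is stale at the same time.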
The problem seems to be in avd_create_susi_in_imm() and avd_delete_siassignment_from_imm(): the creation and deletion of susi assignment objects are queued up and therefore not performed immediately.
As a result, the IMM objects and the assignment objects in amfnd are far from consistent.
Logs and traces are attached for more details.
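The difference between queued and immediate IMM updates can be sketched in Python as follows. Everything here is a hypothetical model: `QueuedWriter`, `SyncWriter`, and `fail_over` are illustrative names, a plain `set` stands in for IMM, and the failover sequence is an assumed simplification of what happens when SC1 stops.

```python
from collections import deque


class QueuedWriter:
    """Defers IMM updates to a job queue (model of the suspected problem)."""

    def __init__(self, imm):
        self.imm = imm
        self.jobs = deque()

    def create(self, dn):
        self.jobs.append(("create", dn))

    def delete(self, dn):
        self.jobs.append(("delete", dn))

    def drain(self):
        """Apply all pending jobs to the IMM stand-in."""
        while self.jobs:
            op, dn = self.jobs.popleft()
            if op == "create":
                self.imm.add(dn)
            else:
                self.imm.discard(dn)


class SyncWriter:
    """Applies each IMM update before returning to the caller."""

    def __init__(self, imm):
        self.imm = imm

    def create(self, dn):
        self.imm.add(dn)

    def delete(self, dn):
        self.imm.discard(dn)


def fail_over(writer):
    """Assumed sequence after SC1 stops: SU2 goes active, SU3 goes standby."""
    writer.delete("SU1:active")
    writer.delete("SU2:standby")
    writer.create("SU2:active")
    writer.create("SU3:standby")


initial = {"SU1:active", "SU2:standby"}
queued_imm, synced_imm = set(initial), set(initial)
fail_over(QueuedWriter(queued_imm))   # last controller stops before drain()
fail_over(SyncWriter(synced_imm))
print(sorted(queued_imm))   # ['SU1:active', 'SU2:standby']
print(sorted(synced_imm))   # ['SU2:active', 'SU3:standby']
```

In the queued case the stand-in for IMM still holds the pre-failover objects, which matches the stale view amfd reads back after the SC absence.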
Changed the ticket title to be more general, covering the two problems found:
- Processing su_si assignment when amfd is down (as in description)
- Problem of creation and deletion of susi assignment object (as in previous comment)
Hi Nagu,
The log/trace is in the previous comment. I think you will see the issue with creation/deletion of IMM assignment objects in the trace file. 2N assignment failover works as normal, except that the creation/deletion of IMM assignment objects is queued up, so they are not written to IMM when the last controller goes down.
thanks,
Minh
develop
commit 8c09ce778f01cd0b202a2b7b9fd51dbc14648674
Author: Minh Chau minh.chau@dektech.com.au
Date: Mon May 15 16:48:27 2017 +1000
commit 7e94d931c1e3e91cdcdb81e20099105e12af2fab
Author: Minh Chau minh.chau@dektech.com.au
Date: Mon May 15 16:47:31 2017 +1000
commit c2a0c02205635cfaaa04cf20f60592dca2a58021
Author: Minh Chau minh.chau@dektech.com.au
Date: Mon May 15 16:44:13 2017 +1000
release
commit 567ea33f1cce63b2a022a157d2e87b68c0d8eb8c
Author: Minh Chau minh.chau@dektech.com.au
Date: Mon May 15 16:48:27 2017 +1000
commit f2224398f96b3bd62dc60c14dd1c220a0e5c4faa
Author: Minh Chau minh.chau@dektech.com.au
Date: Mon May 15 16:47:31 2017 +1000
commit bd8591ddb0965f10e65c79b6b4003a508b7c8971
Author: Minh Chau minh.chau@dektech.com.au
Date: Mon May 15 16:44:13 2017 +1000
hg default
changeset: 8797:cd58177b7eee
tag: tip
user: Minh Hon Chau minh.chau@dektech.com.au
date: Mon May 15 22:12:50 2017 +1000
summary: amfnd: Ignore susi_assign_evh while active amfd is down [#2416]
changeset: 8796:48822f9b2dc5
user: Minh Hon Chau minh.chau@dektech.com.au
date: Mon May 15 22:07:36 2017 +1000
summary: amfd: Make creation and deletion of assignment object as IMM synced call [#2416]
changeset: 8795:56dbe63d12fb
parent: 8793:55c2a1420b3b
user: Minh Hon Chau minh.chau@dektech.com.au
date: Mon May 15 22:04:26 2017 +1000
summary: amfd: Add iteration to failover all absent assignments [#2416]
Related
Tickets:
#2416
There is a problem in OpenSAF 2N switchover with the SC Absence feature enabled. Reopening the ticket; an additional correction patch has been sent for review.
hg:default
changeset: 8799:99c99e6d8a34
tag: tip
user: Minh Hon Chau minh.chau@dektech.com.au
date: Thu May 18 01:07:17 2017 +1000
summary: amfd: Check IMM service status before use IMM call [#2416]
git:develop
commit fb0a5e1d27a9eeac0aec4be18aaf4c1f648f6a6a
Author: Minh Chau minh.chau@dektech.com.au
Date: Wed May 17 07:25:37 2017 +1000
amfd: Check IMM service status before use IMM call [#2416]
git:release
commit 4ba33fbdb1d28e47905772239bf85a0d23b0c66a
Author: Minh Chau minh.chau@dektech.com.au
Date: Wed May 17 07:25:37 2017 +1000
amfd: Check IMM service status before use IMM call [#2416]