
#2416 amf: Problem of assignment failover during stop of both SCs (SC Absence)

5.17.07
fixed
None
defect
amf
-
major
False
2017-07-27
2017-04-10
No

In a 2N application whose active SU is hosted on a controller and whose standby SU is hosted on a payload, stopping both SCs can generate a su_si assignment message towards the standby SU to change its HA state to active.

  • If this su_si assignment message is buffered and arrives before MDSNCS_DOWN, the node is rebooted.
  • If MDSNCS_DOWN arrives before the su_si assignment, amfnd currently does not ignore the assignment. amfnd should ignore this su_si assignment message, as it does other messages such as su_pres and su_reg.
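
The second case can be pictured with a minimal sketch. It is not amfnd code: `NodeDirectorState`, `on_director_down` and `handle_director_msg` are hypothetical names introduced only for illustration, and the sketch assumes a single flag records whether the director-down event (MDSNCS_DOWN in this ticket) has been seen.

```cpp
// Hypothetical message kinds; the names follow the ticket text, not amfnd's real types.
enum class MsgKind { SuPres, SuReg, SuSiAssign };

enum class Outcome { Processed, Ignored };

// Hypothetical stand-in for the node director's view of the active amfd.
struct NodeDirectorState {
  bool amfd_up = true;  // cleared once the director-down event has been delivered
};

// Called when the director-down event arrives.
void on_director_down(NodeDirectorState& st) { st.amfd_up = false; }

// Every director-originated message that arrives after the down event is
// dropped, so su_si assignments are treated like su_pres and su_reg.
Outcome handle_director_msg(const NodeDirectorState& st, MsgKind kind) {
  (void)kind;  // the sketch applies the same rule to all kinds
  if (!st.amfd_up) return Outcome::Ignored;
  return Outcome::Processed;
}
```

With this shape, a late su_si assignment cannot promote the standby SU once the node knows the director is gone.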

Testing on similar application configurations continues; any problems found will be added in the comments.

Related

Tickets: #2416
Tickets: #2477
Wiki: ChangeLog-5.17.07

Discussion

  • Minh Hon Chau

    Minh Hon Chau - 2017-04-10
    • status: unassigned --> accepted
    • assigned_to: Minh Hon Chau
     
  • Minh Hon Chau

    Minh Hon Chau - 2017-04-28

    Another failure in recovery after SC absence is seen with the attached model. The model is a 2N application with 5 SUs, one hosted on each node of a 5-node cluster.
    Initially, SU1 (on SC1) and SU2 (on SC2) have the active and standby assignments. After SC1 and SC2 are stopped abruptly, SU3 (PL-3) appears to have a standby assignment.
    When an SC comes back, amfd reads the assignments from IMM and from amfnd on PL3:

    amfd receives SU3's assignment as sync info sent by amfnd in PL3

    Apr 28 10:35:25.174806 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> avd_susi_create: safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon state=2
    Apr 28 10:35:25.174839 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> avd_susi_create: safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon state=2
    Apr 28 10:35:25.174873 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> avd_susi_create: safSu=SU3,safSg=AmfDemoTwon,safApp=AmfDemoTwon safSi=AmfDemoTwon,safApp=AmfDemoTwon state=2
    

    amfd reads from IMM that SU1 and SU2 still have the active and standby assignments, and that SU3 has no assignment.

    Apr 28 10:35:25.176649 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> avd_susi_create: safSu=SU2,safSg=AmfDemoTwon,safApp=AmfDemoTwon safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon state=2
    Apr 28 10:35:25.176814 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> avd_susi_create: safSu=SU2,safSg=AmfDemoTwon,safApp=AmfDemoTwon safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon state=2
    Apr 28 10:35:25.176984 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> avd_susi_create: safSu=SU2,safSg=AmfDemoTwon,safApp=AmfDemoTwon safSi=AmfDemoTwon,safApp=AmfDemoTwon state=2
    Apr 28 10:35:25.177222 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> avd_susi_create: safSu=SU1,safSg=AmfDemoTwon,safApp=AmfDemoTwon safSi=AmfDemoTwon,safApp=AmfDemoTwon state=1
    Apr 28 10:35:25.177413 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> avd_susi_create: safSu=SU1,safSg=AmfDemoTwon,safApp=AmfDemoTwon safSi=AmfDemoTwonDep1,safApp=AmfDemoTwon state=1
    Apr 28 10:35:25.177608 osafamfd [474:474:src/amf/amfd/siass.cc:0438] >> avd_susi_create: safSu=SU1,safSg=AmfDemoTwon,safApp=AmfDemoTwon safSi=AmfDemoTwonDep2,safApp=AmfDemoTwon state=1
    

    When amfd performs recovery, two SUs have standby assignments (SU2, SU3) and SU1 has the active assignment. This 2N assignment state is not valid, and the SG FSM node_fail() cannot act as a recovery method. Only SU2 is failed over; SU1 still has an absent assignment, so its readiness state remains OUT_OF_SERVICE.

    Apr 28 10:35:37.392562 osafamfd [474:474:src/amf/amfd/sg_2n_fsm.cc:3379] >> node_fail: 'safSu=SU2,safSg=AmfDemoTwon,safApp=AmfDemoTwon', 0
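
As a rough illustration of why this state is invalid, the sketch below checks the 2N rule on a list of assignments. It is not amfd code: `Assignment` and `is_valid_2n` are hypothetical names, and the rule is assumed to be "at most one SU holds active assignments and at most one SU holds standby assignments". The numeric states follow the traces above (1 = active, 2 = standby).

```cpp
#include <set>
#include <string>
#include <vector>

// HA state values as printed in the traces: 1 = active, 2 = standby.
enum class HaState { Active = 1, Standby = 2 };

// Hypothetical flattened view of one SU-SI assignment.
struct Assignment {
  std::string su;
  std::string si;
  HaState state;
};

// Assumed 2N rule: at most one SU carries active assignments and
// at most one SU carries standby assignments.
bool is_valid_2n(const std::vector<Assignment>& all) {
  std::set<std::string> active_sus;
  std::set<std::string> standby_sus;
  for (const auto& a : all) {
    if (a.state == HaState::Active) {
      active_sus.insert(a.su);
    } else {
      standby_sus.insert(a.su);
    }
  }
  return active_sus.size() <= 1 && standby_sus.size() <= 1;
}
```

Feeding it the merged view from this comment (SU1 active, SU2 and SU3 standby) returns false, while SU1 active with SU2 standby alone returns true.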
    

    The problem seems to be in avd_create_susi_in_imm() and avd_delete_siassignment_from_imm(): the creation and deletion of susi assignment objects are queued up and so are not performed immediately.
    As a result, the IMM objects and the assignment objects in amfnd are far from consistent.
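
The effect of queuing can be shown with a small simulation. It is only a sketch under stated assumptions: `Store` stands in for IMM, and `QueuedWriter` and `SyncWriter` are hypothetical classes, not the amfd implementation. The object names are simplified labels, not real DNs.

```cpp
#include <deque>
#include <set>
#include <string>

// Hypothetical persistent store standing in for IMM.
struct Store {
  std::set<std::string> objects;
};

// Queued writer: updates reach the store only when drain() runs.
class QueuedWriter {
 public:
  explicit QueuedWriter(Store& s) : store_(s) {}
  void create(const std::string& name) { pending_.push_back({true, name}); }
  void remove(const std::string& name) { pending_.push_back({false, name}); }
  // Applies all pending updates in order; never called if the process dies first.
  void drain() {
    for (const auto& op : pending_) {
      if (op.is_create) {
        store_.objects.insert(op.name);
      } else {
        store_.objects.erase(op.name);
      }
    }
    pending_.clear();
  }

 private:
  struct Op {
    bool is_create;
    std::string name;
  };
  Store& store_;
  std::deque<Op> pending_;
};

// Synchronous writer: each update is applied before the call returns.
class SyncWriter {
 public:
  explicit SyncWriter(Store& s) : store_(s) {}
  void create(const std::string& name) { store_.objects.insert(name); }
  void remove(const std::string& name) { store_.objects.erase(name); }

 private:
  Store& store_;
};
```

If the writer goes away between `remove("SU2:standby")`, `create("SU3:standby")` and `drain()`, the store still shows SU2 as standby and knows nothing about SU3, which matches the mismatch seen in the traces. The synchronous variant leaves the store up to date after each call.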

    Log and traces are attached for more details.

     
  • Minh Hon Chau

    Minh Hon Chau - 2017-04-28
    • summary: amfnd: su_si assignment message could be processed during SC absence stages --> amf: Problem of assignment failover during stop of both SCs (SC Absence)
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -2,3 +2,5 @@
    
     - In case this su_si assignment message is buffered and comes before MDSNCS_DOWN, node is rebooted
     - In other cases where MDSNCS_DOWN comes before su_si assignment, currently amfnd does not ignore this su_si assignment. amfnd should ignore this su_si assignment message as similiar to other messages like su_pres, su_reg
    +
    +Testing on similar application's configuration continues, problem found will be added in comments
    
    • Part: nd --> -
    • Blocker: --> False
    • Milestone: 5.1.1 --> 5.17.06
     
  • Minh Hon Chau

    Minh Hon Chau - 2017-04-28

    Changed the ticket title to be more general, covering the two problems found:
    - Processing of su_si assignment when amfd is down (as in the description)
    - Problem with creation and deletion of the susi assignment object (as in the previous comment)

     
  • Minh Hon Chau

    Minh Hon Chau - 2017-05-09
    • status: accepted --> review
     
  • Nagendra Kumar

    Nagendra Kumar - 2017-05-11

    Initially, SU1 (in SC1) and SU2 (in SC2) have active and standby assignment. Abruptly stop SC1 and SC2, SU3 (PL-3) appears to have standby assignment.
    Does this happen because amfd on SC-1 sees that SC-2 is going down, so it sends a standby assignment to SU3 (PL-3)? But this can happen only when PL-3 down has been received and processed by amfd on SC-1, which then sent standby to SU3, and all the RTA updates would have been missed because SC-1 also went down. This results in SU1 being active and two standby SUs, SU2 and SU3. Am I right?
    Do you have traces and the time at which it happened? I just want to analyse it.

     
  • Minh Hon Chau

    Minh Hon Chau - 2017-05-11

    Hi Nagu,

    The log/trace is in the previous comment. I think you will see the issue with creation/deletion of the IMM assignment objects in the trace file. 2N assignment failover works as normal, except that the creation/deletion of IMM assignment objects is queued up, so they are not written to IMM when the last controller goes down.

    thanks,
    Minh

     
  • Minh Hon Chau

    Minh Hon Chau - 2017-05-15
    • status: review --> fixed
    • assigned_to: Minh Hon Chau --> nobody
     
  • Minh Hon Chau

    Minh Hon Chau - 2017-05-15

    develop

    commit 8c09ce778f01cd0b202a2b7b9fd51dbc14648674
    Author: Minh Chau minh.chau@dektech.com.au
    Date: Mon May 15 16:48:27 2017 +1000

    commit 7e94d931c1e3e91cdcdb81e20099105e12af2fab
    Author: Minh Chau minh.chau@dektech.com.au
    Date: Mon May 15 16:47:31 2017 +1000

    commit c2a0c02205635cfaaa04cf20f60592dca2a58021
    Author: Minh Chau minh.chau@dektech.com.au
    Date: Mon May 15 16:44:13 2017 +1000

    release

    commit 567ea33f1cce63b2a022a157d2e87b68c0d8eb8c
    Author: Minh Chau minh.chau@dektech.com.au
    Date: Mon May 15 16:48:27 2017 +1000

    commit f2224398f96b3bd62dc60c14dd1c220a0e5c4faa
    Author: Minh Chau minh.chau@dektech.com.au
    Date: Mon May 15 16:47:31 2017 +1000

    commit bd8591ddb0965f10e65c79b6b4003a508b7c8971
    Author: Minh Chau minh.chau@dektech.com.au
    Date: Mon May 15 16:44:13 2017 +1000

     
  • Minh Hon Chau

    Minh Hon Chau - 2017-05-15

    hg default

    changeset: 8797:cd58177b7eee
    tag: tip
    user: Minh Hon Chau minh.chau@dektech.com.au
    date: Mon May 15 22:12:50 2017 +1000
    summary: amfnd: Ignore susi_assign_evh while active amfd is down [#2416]

    changeset: 8796:48822f9b2dc5
    user: Minh Hon Chau minh.chau@dektech.com.au
    date: Mon May 15 22:07:36 2017 +1000
    summary: amfd: Make creation and deletion of assignment object as IMM synced call [#2416]

    changeset: 8795:56dbe63d12fb
    parent: 8793:55c2a1420b3b
    user: Minh Hon Chau minh.chau@dektech.com.au
    date: Mon May 15 22:04:26 2017 +1000
    summary: amfd: Add iteration to failover all absent assignments [#2416]

     


  • Minh Hon Chau

    Minh Hon Chau - 2017-05-17
    • status: fixed --> review
    • assigned_to: Minh Hon Chau
     
  • Minh Hon Chau

    Minh Hon Chau - 2017-05-17

    There is a problem with OpenSAF 2N switchover when the SC Absence feature is enabled. Reopening the ticket; an additional correction patch has been sent for review.

     
  • Minh Hon Chau

    Minh Hon Chau - 2017-05-17

    hg:default
    changeset: 8799:99c99e6d8a34
    tag: tip
    user: Minh Hon Chau minh.chau@dektech.com.au
    date: Thu May 18 01:07:17 2017 +1000
    summary: amfd: Check IMM service status before use IMM call [#2416]

    git:develop
    commit fb0a5e1d27a9eeac0aec4be18aaf4c1f648f6a6a
    Author: Minh Chau minh.chau@dektech.com.au
    Date: Wed May 17 07:25:37 2017 +1000
    amfd: Check IMM service status before use IMM call [#2416]

    git:release
    commit 4ba33fbdb1d28e47905772239bf85a0d23b0c66a
    Author: Minh Chau minh.chau@dektech.com.au
    Date: Wed May 17 07:25:37 2017 +1000
    amfd: Check IMM service status before use IMM call [#2416]

     


  • Minh Hon Chau

    Minh Hon Chau - 2017-05-17
    • status: review --> fixed
     
  • Anders Widell

    Anders Widell - 2017-07-01
    • Milestone: 5.17.06 --> 5.17.08
     
