
#3308 amf: unexpected reboot due to mismatch msg id

5.22.06
fixed
None
defect
amf
nd
5
major
False
2022-06-01
2022-02-21
No

This issue was first noticed in ticket #3040, where the solution was to reboot the node for recovery.
The following error messages appear in syslog.
On the SC that changes from STANDBY to ACTIVE:

2022-02-19T14:06:28.439+01:00 SC-2 osafrded[26947]: NO Got peer info response from node 0x2010f with role STANDBY
2022-02-19T14:06:32.366+01:00 SC-2 osafamfd[27233]: NO Switching StandBy --> Active State
2022-02-19T14:06:32.489+01:00 SC-2 osafamfd[27233]: NO Active controller set to SC-2
2022-02-19T14:06:32.491+01:00 SC-2 osafamfnd[27248]: EM AVND record not found, after failover, snd_msg_id = 381, receive id = 380
2022-02-19T14:06:32.492+01:00 SC-2 osafamfnd[27248]: Rebooting OpenSAF NodeId = 2020f EE Name = , Reason: AVND record not found, after failover, OwnNodeId = 2020f, SupervisionTime = 60
2022-02-19T14:06:32.492+01:00 SC-2 osafamfnd[27248]: NO AVD NEW_ACTIVE, adest:1
2022-02-19T14:06:32.516+01:00 SC-2 osafrded[26947]: NO RDE role set to ACTIVE

On the SC that changes from ACTIVE to STANDBY:

2022-02-19T14:06:29.589+01:00 SC-1 osafamfd[3531]: NO ROLE SWITCH Active --> Quiesced
2022-02-19T14:06:29.817+01:00 SC-1 osafrded[3297]: NO New active controller notification from consensus service
2022-02-19T14:06:30.662+01:00 SC-1 osafsmfd[3735]: NO SA_AMF_ADMIN_SI_SWAP [rc=1] successfully initiated
2022-02-19T14:06:30.664+01:00 SC-1 osafsmfd[3735]: NO Campaign thread terminated after SA_AMF_ADMIN_SI_SWAP
2022-02-19T14:06:32.495+01:00 SC-1 osafamfnd[3565]: NO AVD NEW_ACTIVE, adest:1
2022-02-19T14:06:32.497+01:00 SC-1 osafamfnd[3565]: EM AVND record not found, after failover, snd_msg_id = 596, receive id = 594
2022-02-19T14:06:32.499+01:00 SC-1 osafamfnd[3565]: Rebooting OpenSAF NodeId = 2010f EE Name = , Reason: AVND record not found, after failover, OwnNodeId = 2010f, SupervisionTime = 60
2022-02-19T14:06:32.523+01:00 SC-1 osafamfd[3531]: NO Switching Quiesced --> StandBy
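
To spot this condition in a larger syslog capture, a small script can pull out the two counters from each osafamfnd error line. This is a minimal sketch: the regular expression is derived only from the lines quoted above, and the function name `find_mismatches` is hypothetical, not part of OpenSAF.

```python
import re

# Pattern derived from the osafamfnd lines quoted in this ticket;
# other OpenSAF versions may word the message differently.
MISMATCH_RE = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) osafamfnd\[\d+\]: "
    r"EM AVND record not found, after failover, "
    r"snd_msg_id = (?P<snd>\d+), receive id = (?P<rcv>\d+)"
)


def find_mismatches(lines):
    """Return (host, snd_msg_id, receive_id, gap) for each matching line."""
    found = []
    for line in lines:
        m = MISMATCH_RE.match(line.strip())
        if m:
            snd, rcv = int(m["snd"]), int(m["rcv"])
            # gap = how far the two logged counters are apart
            found.append((m["host"], snd, rcv, snd - rcv))
    return found
```

Feeding it the two EM lines from this ticket shows that the logged counters differ by 1 on SC-2 and by 2 on SC-1.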

To prevent the unexpected reboot, the fix aligns the AMFND message id counts with those of the new active AMFD.
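
The idea can be illustrated with a toy model. This is not the OpenSAF implementation: the class and function names are hypothetical, and two points are assumptions. First, that the two logged values are the node director's send count and the count known to the new active director. Second, that alignment means the node adopts the director's count.

```python
from dataclasses import dataclass


@dataclass
class NodeCounters:
    """Hypothetical stand-in for the counters a node director keeps."""
    snd_msg_id: int  # id of the last message the node director sent


class MsgIdMismatch(Exception):
    """Raised when the counters disagree and alignment is disabled."""


def on_new_active(node: NodeCounters, director_rcv_id: int, align: bool) -> int:
    """Reconcile the node's send count with the new active director's view.

    Returns the snd_msg_id the node continues with.
    """
    if node.snd_msg_id == director_rcv_id:
        return node.snd_msg_id
    if not align:
        # Models the behaviour in the logs: a mismatch is treated as fatal.
        raise MsgIdMismatch(
            f"snd_msg_id = {node.snd_msg_id}, receive id = {director_rcv_id}"
        )
    # Assumed direction of alignment: adopt the new active director's count.
    node.snd_msg_id = director_rcv_id
    return node.snd_msg_id
```

With the SC-2 values from the log (381 and 380), calling `on_new_active` with `align=False` raises `MsgIdMismatch`, while `align=True` continues with 380 instead of failing.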

Related

Wiki: ChangeLog-5.22.06

Discussion

  • Thang Duc Nguyen

    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -4,7 +4,6 @@
     ~~~
     2022-02-19T14:06:28.439+01:00 SC-2 osafrded[26947]: NO Got peer info response from node 0x2010f with role STANDBY
     2022-02-19T14:06:32.366+01:00 SC-2 osafamfd[27233]: NO Switching StandBy --> Active State
    -2022-02-19T14:06:32.482+01:00 SC-2 arbitration.plugin: obtained lock at arbitrator
     2022-02-19T14:06:32.489+01:00 SC-2 osafamfd[27233]: NO Active controller set to SC-2
     2022-02-19T14:06:32.491+01:00 SC-2 osafamfnd[27248]: EM AVND record not found, after failover, snd_msg_id = 381, receive id = 380
     2022-02-19T14:06:32.492+01:00 SC-2 osafamfnd[27248]: Rebooting OpenSAF NodeId = 2020f EE Name = , Reason: AVND record not found, after failover, OwnNodeId = 2020f, SupervisionTime = 60
    
     
  • Thang Duc Nguyen

    commit 4062588fae381ecf46b91ee7b7a5e4ab2e776210 (HEAD -> develop, origin/develop, ticket-3308)
    Author: thang.d.nguyen thang.d.nguyen@dektech.com.au
    Date: Mon Feb 21 08:53:32 2022 +0700

    amf: fix unexpected node reboot during failover [#3308]
    
    During SC failover, message sent on ACTIVE AMFD can not be
    checked point to AMFD on STANDBY SC. But the AMFND still
    increase receive/send msg id count. Then STANDBY SC takes
    ACTIVE and mismatch message id b/w AMFND and new active AMFD.
    Solution is to make msg id count alignment b/w AMFD/AMFND
    in this case.
    
     
  • Thang Duc Nguyen

    • status: assigned --> fixed
     
