
#3241 amf: cluster stuck unhealthy when SCs brutal reboot

5.21.03
fixed
None
defect
amf
nd
minor
False
2020-12-18
2020-12-01
Thuan Tran
No

The cluster remains stuck in an unhealthy state after the SCs are abruptly rebooted.

2020-11-26 06:58:45.011 SC-2 osafamfd[247]: NO Received node_up from 2010f: msg_id 1
2020-11-26 06:58:45.012 SC-2 osafamfd[247]: NO Node 'SC-1' joined the cluster
2020-11-26 06:58:48.240 SC-2 systemd-sysctl[35]: Couldn't write '4 4 1 7' to 'kernel/printk', ignoring: Read-only file system
2020-11-26 06:58:48.252 SC-2 systemd-sysctl[35]: Couldn't write '1' to 'kernel/kptr_restrict', ignoring: Read-only file system


2020-11-26 06:58:45.512 SC-1 osafamfnd[260]: NO Assigning 'safSi=NoRed1,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF'
2020-11-26 06:58:45.518 SC-1 osafamfnd[260]: NO Assigned 'safSi=NoRed1,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF'
2020-11-26 06:58:46.425 SC-1 osafdtmd[126]: NO Lost contact with 'SC-2'
2020-11-26 06:58:46.428 SC-1 osafamfnd[260]: WA AMF director unexpectedly crashed
2020-11-26 06:58:46.428 SC-1 osafamfnd[260]: NO Checking 'safSu=SC-1,safSg=2N,safApp=OpenSAF' for pending messages
2020-11-26 06:58:46.428 SC-1 osafamfnd[260]: NO Checking 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' for pending messages
2020-11-26 06:58:46.436 SC-1 osafamfnd[260]: NO 'safSu=SC-1,safSg=2N,safApp=OpenSAF' Presence State INSTANTIATING => INSTANTIATED

SC-2 is powered off just after SC-1 has come up (SC-1 is not yet standby).
SC-1 then enters headless state and promotes itself to active (like a roaming SC).
AMFND then fails to add the SU-SI record because it already exists.
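
The failure described above can be illustrated with a toy model. This is only a sketch and not OpenSAF code: `NodeDirector`, `add_su_si`, and `DuplicateSuSi` are hypothetical names, and it assumes that the SU-SI record created before SC-2 was lost is still held by AMFND when the same assignment arrives again after SC-1 becomes active.

```python
class DuplicateSuSi(Exception):
    """Raised when an SU-SI record already exists (hypothetical name)."""


class NodeDirector:
    """Toy stand-in for AMFND's SU-SI bookkeeping; not the real data structure."""

    def __init__(self):
        self.su_si = {}  # (su, si) -> HA state

    def add_su_si(self, su, si, ha_state):
        key = (su, si)
        if key in self.su_si:
            # Mirrors the wording of the CR log line in this ticket.
            raise DuplicateSuSi(f"SU-SI record addition failed, SU={su} : SI={si}")
        self.su_si[key] = ha_state


SU = "safSu=SC-1,safSg=NoRed,safApp=OpenSAF"
SI = "safSi=NoRed1,safApp=OpenSAF"

amfnd = NodeDirector()
amfnd.add_su_si(SU, SI, "ACTIVE")  # first assignment, while SC-2 was still up

try:
    # Assumed: the same assignment is issued again once SC-1 is active.
    amfnd.add_su_si(SU, SI, "ACTIVE")
    failed = False
except DuplicateSuSi as err:
    failed = True
    print(err)
```

In this model the second call fails because the record from the first assignment was never reported to, or cleared for, the new active director.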

2020-11-26 06:58:49.365 SC-1 osafamfnd[260]: NO AVD NEW_ACTIVE, adest:1
2020-11-26 06:58:49.442 SC-1 osafamfnd[260]: NO saClmDispatch BAD_HANDLE
2020-11-26 06:58:49.442 SC-1 osafamfnd[260]: NO Sending node up due to NCSMDS_NEW_ACTIVE
2020-11-26 06:58:56.028 SC-1 osafamfnd[260]: CR SU-SI record addition failed, SU= safSu=SC-1,safSg=NoRed,safApp=OpenSAF : SI=safSi=NoRed1,safApp=OpenSAF
2020-11-26 06:58:56.038 SC-1 osafamfnd[260]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
2020-11-26 06:58:56.073 SC-1 osafamfnd[260]: NO Assigned 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'

2020-11-26 06:58:56.700 SC-1 osafamfd[247]: NO Received node_up from 2020f: msg_id 1
2020-11-26 06:58:57.086 SC-1 osafamfd[247]: NO Received node_up from 2050f: msg_id 1
2020-11-26 06:58:57.087 SC-1 osafamfd[247]: NO Received node_up from 2030f: msg_id 1
2020-11-26 06:58:57.090 SC-1 osafamfd[247]: NO Received node_up from 2040f: msg_id 1

<143>1 2020-11-26T06:59:02.179518+01:00 SC-1 osafamfd 247 osafamfd [meta sequenceId="18992"] 247:amf/amfd/ndfsm.cc:373 TR invalid init state (2), node 2020f
<143>1 2020-11-26T06:59:02.579492+01:00 SC-1 osafamfd 247 osafamfd [meta sequenceId="19025"] 247:amf/amfd/ndfsm.cc:373 TR invalid init state (2), node 2030f
<143>1 2020-11-26T06:59:02.579642+01:00 SC-1 osafamfd 247 osafamfd [meta sequenceId="19040"] 247:amf/amfd/ndfsm.cc:373 TR invalid init state (2), node 2040f
<143>1 2020-11-26T06:59:02.579795+01:00 SC-1 osafamfd 247 osafamfd [meta sequenceId="19055"] 247:amf/amfd/ndfsm.cc:373 TR invalid init state (2), node 2050f
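
When triaging a similar trace file, it can help to list which nodes were rejected with `invalid init state`. The following is a minimal sketch that assumes the trace line format shown above; `rejected_nodes` and `PATTERN` are illustrative names, not part of any OpenSAF tool.

```python
import re

# Matches the tail of the trace lines quoted in this ticket.
PATTERN = re.compile(r"invalid init state \((\d+)\), node (\w+)")


def rejected_nodes(lines):
    """Return (node_id, state) pairs for every matching trace line."""
    result = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            result.append((match.group(2), int(match.group(1))))
    return result


sample = [
    '<143>1 2020-11-26T06:59:02.179518+01:00 SC-1 osafamfd 247 osafamfd '
    '[meta sequenceId="18992"] 247:amf/amfd/ndfsm.cc:373 TR invalid init state (2), node 2020f',
    '2020-11-26 06:58:56.700 SC-1 osafamfd[247]: NO Received node_up from 2020f: msg_id 1',
    '<143>1 2020-11-26T06:59:02.579492+01:00 SC-1 osafamfd 247 osafamfd '
    '[meta sequenceId="19025"] 247:amf/amfd/ndfsm.cc:373 TR invalid init state (2), node 2030f',
]

print(rejected_nodes(sample))  # [('2020f', 2), ('2030f', 2)]
```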

Related

Wiki: ChangeLog-5.21.03

Discussion

  • Thuan Tran - 2020-12-01
    • status: assigned --> review
     
  • Thuan Tran - 2020-12-18
    • status: review --> fixed
     
  • Thuan Tran - 2020-12-18

    commit 501241653d25bc2beffad7a25ea6a281d66c0c6f (HEAD -> develop, origin/develop)
    Author: thuan.tran thuan.tran@dektech.com.au
    Date: Tue Dec 1 17:06:31 2020 +0700

    amf: fix cluster stuck unhealthy when SCs brutal reboot [#3241]
    
    When see AMFD UP/NEW_ACTIVE in AMFD down state TRUE, AMFND should
    send sync info if any assigned NCS SUs. After msg node_up acked,
    resend buffered headless msg for NCS SUs.
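
The behaviour described in the commit message can be sketched as a small event handler. This is an illustration under stated assumptions, not the actual patch: `NodeDirectorSketch`, its method names, the message tuples, and the `su_oper_state` message name are all hypothetical, and the ordering of the sync info relative to node_up as well as the point where the down flag is reset are assumptions.

```python
class NodeDirectorSketch:
    """Toy model of AMFND reacting to a new active AMFD; not OpenSAF code."""

    def __init__(self, assigned_ncs_sus, buffered_headless_msgs):
        self.amfd_down = True  # AMFD was lost earlier (headless state)
        self.assigned_ncs_sus = list(assigned_ncs_sus)
        self.buffered = list(buffered_headless_msgs)
        self.sent = []  # messages "sent" to the director, in order

    def on_amfd_up(self):
        """Handle AMFD UP / NEW_ACTIVE."""
        if self.amfd_down and self.assigned_ncs_sus:
            # Tell the new director about NCS SUs that are already assigned.
            self.sent.append(("sync_info", tuple(self.assigned_ncs_sus)))
        self.amfd_down = False  # assumed reset point
        self.sent.append(("node_up",))

    def on_node_up_ack(self):
        """After node_up is acked, resend messages buffered while headless."""
        for msg in self.buffered:
            self.sent.append(("resend", msg))
        self.buffered.clear()


nd = NodeDirectorSketch(
    assigned_ncs_sus=["safSu=SC-1,safSg=NoRed,safApp=OpenSAF"],
    buffered_headless_msgs=["su_oper_state"],  # hypothetical message name
)
nd.on_amfd_up()
nd.on_node_up_ack()
for message in nd.sent:
    print(message)
```

In this sketch the director learns about the existing assignment before it processes node_up, so it has no reason to assign the same SI a second time.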
    
     
