
#3230 fm: unexpected node reboot

5.20.11
fixed
None
defect
fm
d
minor
False
2020-11-13
2020-11-04
Thuan Tran
No

FM unexpectedly reboots the node.

2020-10-27 14:36:54.200 SC-1 osafrded[156]: NO Lost connectivity to consensus service
2020-10-27 14:36:54.200 SC-1 osafrded[156]: Quick local node rebooting, Reason: Lost connectivity to consensus service. Rebooting this node

2020-10-27 14:36:58.488 SC-2 osaffmd[168]: ER Unable to set active controller in consensus service
2020-10-27 14:36:58.488 SC-2 osaffmd[168]: Quick local node rebooting, Reason: Unable to set active controller in consensus service
2020-10-27 14:36:58.529 SC-2 opensaf_reboot: Do quick local node reboot
2020-10-27 14:36:58.871 SC-2 osafdtmd[126]: NO Established contact with 'PL-3'
2020-10-27 14:36:58.876 SC-2 osafdtmd[126]: NO Established contact with 'SC-1'
2020-10-27 14:36:58.877 SC-2 osafrded[156]: NO Peer up on node 0x2010f
2020-10-27 14:36:59.842 SC-2 osafrded[156]: NO Got peer info response from node 0x2010f with role Undefined
2020-10-27 14:36:59.844 SC-2 osafrded[156]: NO Peer down on node 0x2010f
2020-10-27 14:37:00.783 SC-2 osafrded[156]: NO Peer up on node 0x2010f
2020-10-27 14:37:00.785 SC-2 osafrded[156]: NO Got peer info response from node 0x2010f with role ACTIVE

Both SCs reboot because they lost the connection to the consensus service (arbitrator), but for some reason SC-2 reboots a few seconds later than SC-1.
SC-1 comes back up and is promoted to ACTIVE, but FM then reboots the node unexpectedly.

2020-10-27 14:36:59.841 SC-1 osafrded[163]: NO Peer up on node 0x2020f
2020-10-27 14:36:59.842 SC-1 osafrded[163]: NO Got peer info response from node 0x2020f with role STANDBY
2020-10-27 14:36:59.843 SC-1 osafrded[163]: NO RDE role set to QUIESCED
2020-10-27 14:36:59.844 SC-1 osafrded[163]: NO Giving up election against 0x2020f with role STANDBY. My role is now QUIESCED
2020-10-27 14:37:00.084 SC-1 /tcp.plugin: obtained lock at arbitrator
2020-10-27 14:37:00.098 SC-1 osafrded[163]: NO Active controller set to SC-1
2020-10-27 14:37:00.098 SC-1 osafrded[163]: NO Running '/usr/local/lib/opensaf/opensaf_sc_active' with 0 argument(s)
2020-10-27 14:37:00.781 SC-1 osafrded[163]: NO Switched to ACTIVE from QUIESCED
2020-10-27 14:37:02.893 SC-1 osaffmd[175]: NO AVD down on: 2020f
2020-10-27 14:37:02.893 SC-1 osaffmd[175]: NO AMFND down on: 2020f
2020-10-27 14:37:02.893 SC-1 osaffmd[175]: NO FM down on: 2020f
2020-10-27 14:37:02.893 SC-1 osaffmd[175]: NO IMMD down on: 2020f
2020-10-27 14:37:02.893 SC-1 osaffmd[175]: NO IMMND down on: 2020f
2020-10-27 14:37:02.893 SC-1 osaffmd[175]: NO Core services went down on node_id: 2020f
2020-10-27 14:37:02.893 SC-1 osaffmd[175]: NO Current role: ACTIVE
2020-10-27 14:37:02.895 SC-1 osaffmd[175]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Failover occurred, but this node is not yet ready, OwnNodeId = 131343, SupervisionTime = 60
2020-10-27 14:37:02.895 SC-1 osafrded[163]: NO Peer down on node 0x2020f
2020-10-27 14:37:02.895 SC-1 osafimmd[188]: NO MDS event from svc_id 25 (change:4, dest:568511936069789)
2020-10-27 14:37:02.895 SC-1 osafimmd[188]: NO MDS event from svc_id 25 (change:4, dest:567412424442013)
2020-10-27 14:37:02.910 SC-1 opensaf_reboot: Rebooting local node; timeout=60
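
When reading the log above, note that the FM reboot line prints OwnNodeId in decimal (131343) while osafrded prints peer node ids in hex (0x2010f, 0x2020f). A minimal sketch for converting between the two notations; `NodeIdToHex` is a hypothetical helper name used only for illustration, not an OpenSAF function:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of OpenSAF): format a decimal node id
// in the lowercase hex notation used by the "Peer up on node ..." lines.
std::string NodeIdToHex(std::uint32_t node_id) {
  char buf[16];
  std::snprintf(buf, sizeof(buf), "0x%x", static_cast<unsigned>(node_id));
  return std::string(buf);
}
```

With this, 131343 formats as 0x2010f, the id that SC-2 logs for its peer, so the node being rebooted at 14:37:02.895 is SC-1 itself and the node reported down (2020f) is SC-2.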

Related

Wiki: ChangeLog-5.20.11

Discussion

  • Thuan Tran

    Thuan Tran - 2020-11-04
    • status: assigned --> review
     
  • Thuan Tran

    Thuan Tran - 2020-11-13
    • status: review --> fixed
     
  • Thuan Tran

    Thuan Tran - 2020-11-13

    commit 4de5722f578966da5828340df9f9c0a8cb856ab7
    Author: thuan.tran thuan.tran@dektech.com.au
    Date: Wed Nov 4 17:45:19 2020 +0700

    fm: fix unexpected node reboot [#3230]
    
    - Only reboot if RDE role is not ACTIVE because
    there is a case that node just promote to ACTIVE.
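
The guard described in the commit message can be sketched roughly as follows. This is an illustration only, not the actual FM code: `RdeRole`, `PeerDownEvent`, `ShouldRebootLocalNode` and their fields are hypothetical names, and the surrounding conditions are assumptions inferred from the log line "Failover occurred, but this node is not yet ready".

```cpp
// Hypothetical stand-ins for illustration; these are not OpenSAF types.
enum class RdeRole { kUndefined, kActive, kStandby, kQuiesced };

struct PeerDownEvent {
  bool core_services_down;  // assumed: peer's core services all reported down
  bool local_node_ready;    // assumed: this node has finished coming up
};

// Returns true if the local node should be rebooted after the peer went down.
bool ShouldRebootLocalNode(const PeerDownEvent& ev, RdeRole rde_role) {
  if (!ev.core_services_down) return false;  // nothing to fail over from
  if (ev.local_node_ready) return false;     // normal failover, no reboot
  // Failover occurred while this node is not yet ready. Per the commit
  // message: only reboot if the RDE role is not ACTIVE, because the node
  // may have just been promoted to ACTIVE.
  return rde_role != RdeRole::kActive;
}
```

In the scenario from the ticket, SC-1 has already switched to ACTIVE when SC-2's services go down, so under this sketch the call returns false and SC-1 keeps running.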
    
     
