#1866 Cluster reset happened because of CLMNA healthcheck timeout in headless state

future
unassigned
nobody
None
defect
clm
-
5.0.GA
major
2016-09-20
2016-06-08
Ritu Raj
No

Setup:
Version: OpenSAF 5.0.GA
6-node cluster (SC-1: Active, SC-2: Standby, SC-3: Spare; PL-4, PL-5 and PL-6: Payloads)

Issue Observed:
A cluster reset happened because of a CLMNA healthcheck timeout in headless state.

Steps Performed:
(1). Started OpenSAF on a 6-node cluster with Active, Standby, Spare and 3 Payloads.
(2). Performed failover operations in sequence, killing the Active controller first, followed by the Standby and Spare controllers.
(3). After a few successful failovers, CLMNA crashed because of a healthcheck timeout and a cluster reset happened.

Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: WA saClmInitialize_4 returned 31
Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: NO SU failover probation timer started (timeout: 1200000000000 ns)
Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: NO Performing failover of 'safSu=PL-6,safSg=NoRed,safApp=OpenSAF' (SU failover count: 1)
Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: NO 'safComp=CLMNA,safSu=PL-6,safSg=NoRed,safApp=OpenSAF' recovery action escalated from 'componentFailover' to 'suFailover'
Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: NO 'safComp=CLMNA,safSu=PL-6,safSg=NoRed,safApp=OpenSAF' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'suFailover'
Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: ER safComp=CLMNA,safSu=PL-6,safSg=NoRed,safApp=OpenSAF Faulted due to:healthCheckcallbackTimeout Recovery is:suFailover
Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: Rebooting OpenSAF NodeId = 132623 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 132623, SupervisionTime = 60
Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: WA saClmInitialize_4 returned 31
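
When going through the attached syslogs, it may help to pull out only the amfnd component-fault notices. The following is a minimal sketch, assuming the line format is exactly as in the excerpt above; the names `FAULT_RE` and `find_faults` are hypothetical helpers, not part of OpenSAF.

```python
import re

# Pattern for the amfnd "faulted due to" notice, modelled on the excerpt above.
# Assumption: other nodes log the notice in the same format.
FAULT_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) osafamfnd\[\d+\]: "
    r"NO '(?P<comp>[^']+)' faulted due to '(?P<reason>[^']+)' : "
    r"Recovery is '(?P<recovery>[^']+)'"
)

def find_faults(lines):
    """Return one dict per component-fault notice found in the given syslog lines."""
    return [m.groupdict() for line in lines if (m := FAULT_RE.match(line))]

# Two lines copied from the PL-6 syslog excerpt.
sample = [
    "Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: WA saClmInitialize_4 returned 31",
    "Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: NO "
    "'safComp=CLMNA,safSu=PL-6,safSg=NoRed,safApp=OpenSAF' faulted due to "
    "'healthCheckcallbackTimeout' : Recovery is 'suFailover'",
]

faults = find_faults(sample)
for f in faults:
    print(f["ts"], f["host"], f["comp"], f["reason"], f["recovery"])
```

To scan a whole file, pass the open file object instead of `sample`; the path to the syslog is site-specific.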

Syslogs of PL-6 and of the Active, Standby and Spare controllers are attached.
clmd and amfnd traces of the controllers are attached.
amfnd trace of PL-6 is attached.

  • Timestamp on PL-6 at which the issue was observed:
    Jun 16 16:26:18

Note:
There is a time gap between the clocks of all systems:
SC-1: Wed Jun 15 13:51:39 IST 2016
SC-2: Fri Jun 10 18:51:43 IST 2016
SC-3: Thu Jun 16 19:38:05 IST 2016
PL-6: Thu Jun 16 19:41:51 IST 2016
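
Because the clocks differ, the PL-6 fault time has to be shifted before searching the controller logs. The sketch below is one way to do that arithmetic. It assumes the four readings above were taken at roughly the same moment and that the clocks did not drift afterwards, neither of which is confirmed in this ticket; `to_node_time` is a hypothetical helper.

```python
from datetime import datetime

# Clock readings per node, copied from the note above.
# The "IST" token is dropped since every node reports the same zone.
node_clock = {
    "SC-1": "Wed Jun 15 13:51:39 2016",
    "SC-2": "Fri Jun 10 18:51:43 2016",
    "SC-3": "Thu Jun 16 19:38:05 2016",
    "PL-6": "Thu Jun 16 19:41:51 2016",
}
FMT = "%a %b %d %H:%M:%S %Y"
parsed = {node: datetime.strptime(text, FMT) for node, text in node_clock.items()}

def to_node_time(pl6_time, node):
    """Translate a PL-6 timestamp to the approximate local time on another node."""
    return pl6_time + (parsed[node] - parsed["PL-6"])

# Fault time as logged in the PL-6 syslog.
event = datetime(2016, 6, 16, 16, 26, 18)

for node in ("SC-1", "SC-2", "SC-3"):
    print(node, to_node_time(event, node).strftime("%b %d %H:%M:%S"))
```

The printed times are only approximate starting points for locating the matching entries in each controller's syslog and traces.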

4 Attachments

Discussion

  • Anders Widell - 2016-09-20
    • Milestone: 4.7.2 --> 5.0.2

  • Anders Widell - 2017-04-03
    • Milestone: 5.0.2 --> future
