#2040 clmd seg faulted on active controller during switchover

future
unassigned
nobody
None
defect
clm
d
major
2016-09-20
2016-09-16
Ritu Raj
No

Environment details

OS : SUSE 64-bit
Changeset : 7997 (5.1.FC)
Setup : 4 nodes (2 controllers and 2 payloads, headless feature disabled, 1PBE with 30K objects)

Summary

clmd seg faulted on active controller during controller switchover

Steps followed & Observed behaviour

  1. Invoked a controller switchover (SC-1 was the active controller).
  2. During the role change, clmd crashed on SC-1 and the node rebooted because 'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown'.
  3. After the active controller rebooted, NTFD crashed on the standby controller and a cluster reset occurred. A ticket for the NTFD crash already exists: https://sourceforge.net/p/opensaf/tickets/1999/

*Syslog:

Sep 16 15:33:30 sofo-s1 osafamfnd[2162]: NO 'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Sep 16 15:33:30 sofo-s1 osafamfnd[2162]: ER safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Sep 16 15:33:30 sofo-s1 osafamfnd[2162]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60

*Backtrace (bt):

0 0x00007f3db18e1b55 in raise () from /lib64/libc.so.6
1 0x00007f3db18e3131 in abort () from /lib64/libc.so.6
2 0x00007f3db191ec2f in __libc_message () from /lib64/libc.so.6
3 0x00007f3db1924358 in malloc_printerr () from /lib64/libc.so.6
4 0x00007f3db19292fc in free () from /lib64/libc.so.6
5 0x00007f3db223db52 in timer_delete@@GLIBC_2.3.3 () from /lib64/librt.so.1
6 0x00000000004055b3 in amf_quiesced_state_handler (cb=0x633820 <_clms_cb>, invocation=4288675847) at clms_amf.c:123
7 0x0000000000405795 in clms_amf_csi_set_callback (invocation=4288675847, compName=0x6bac88, new_haState=SA_AMF_HA_QUIESCED, csiDescriptor=...) at clms_amf.c:223
8 0x00007f3db332e1f1 in ava_hdl_cbk_rec_prc (info=0x6bac70, reg_cbk=0x7fff3e5fafe0) at ava_hdl.cc:645
9 0x00007f3db332d896 in ava_hdl_cbk_dispatch_all (cb=0x7fff3e5fb0b0, hdl_rec=0x7fff3e5fb0b8) at ava_hdl.cc:446
10 0x00007f3db332d376 in ava_hdl_cbk_dispatch (cb=0x7fff3e5fb0b0, hdl_rec=0x7fff3e5fb0b8, flags=SA_DISPATCH_ALL) at ava_hdl.cc:320
11 0x00007f3db3325a49 in AmfAgent::Dispatch (hdl=4285530114, flags=SA_DISPATCH_ALL) at amf_agent.cc:283
12 0x00007f3db332588e in saAmfDispatch (hdl=4285530114, flags=SA_DISPATCH_ALL) at amf_agent.cc:244
13 0x0000000000413966 in main (argc=2, argv=0x7fff3e5fb208) at clms_main.c:515
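
In the backtrace above, glibc aborts inside free() while timer_delete() is being called from amf_quiesced_state_handler(). That pattern is consistent with the same timer being deleted twice, although heap corruption elsewhere could produce the same abort, and the trace alone does not prove either cause. As a minimal sketch of how a double delete can be guarded against, assuming that is the cause: the struct guarded_timer type and the guarded_timer_create()/guarded_timer_delete() helpers below are hypothetical illustrations and are not taken from clms_amf.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical wrapper (not OpenSAF code): pairs a POSIX timer id with a
 * flag that records whether the id currently refers to a live timer. */
struct guarded_timer {
	timer_t id;
	bool armed; /* true only between a successful create and the delete */
};

/* Create a timer that raises no signal on expiry. Returns 0 on success. */
int guarded_timer_create(struct guarded_timer *t)
{
	struct sigevent sev = {0};

	sev.sigev_notify = SIGEV_NONE;
	if (timer_create(CLOCK_MONOTONIC, &sev, &t->id) != 0)
		return -1;
	t->armed = true;
	return 0;
}

/* Delete the timer at most once.
 * Returns 1 if the timer was deleted, 0 if there was nothing to delete,
 * and -1 if timer_delete() itself failed. */
int guarded_timer_delete(struct guarded_timer *t)
{
	if (!t->armed)
		return 0;
	/* Clear the flag first so a repeated call never reaches timer_delete(). */
	t->armed = false;
	return timer_delete(t->id) == 0 ? 1 : -1;
}
```

With this guard, a second call on the same timer (for example, if the quiesced-state path ran twice) returns 0 instead of passing an already released id to timer_delete(). The flag is not thread-safe; the sketch assumes all calls come from a single dispatch thread. On older glibc versions the timer functions require linking with -lrt.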

*Notes:

  1. The issue is intermittent
  2. Syslog, clmd trace and bt file attached
2 Attachments

Discussion

  • Anders Widell - 2016-09-20
    • Milestone: 4.7.2 --> 5.0.2

  • Anders Widell - 2017-04-03
    • Milestone: 5.0.2 --> future
