With the fix for #3208, ntfd sometimes crashes during cluster shutdown.
The backtrace is as follows:
Thread 1 (Thread 0x7fc0a9b4a100 (LWP 276)):
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007fc0a80bd8b1 in __GI_abort () at abort.c:79
#2 0x00007fc0a8106907 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fc0a8233dfa "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007fc0a810d97a in malloc_printerr (str=str@entry=0x7fc0a823206e "malloc(): memory corruption") at malloc.c:5350
#4 0x00007fc0a8111a04 in _int_malloc (av=av@entry=0x7fc0a8468c40 <main_arena>, bytes=bytes@entry=59) at malloc.c:3738
#5 0x00007fc0a8117121 in __libc_calloc (n=n@entry=1, elem_size=elem_size@entry=59) at malloc.c:3436
#6 0x00007fc0a8c9b40c in mds_mdtm_send_tipc (req=0x7ffc9f16ec60) at src/mds/mds_dt_tipc.c:2736
#7 0x00007fc0a8c88f07 in mcm_msg_encode_full_or_flat_and_send (to=to@entry=2 '\002', to_msg=to_msg@entry=0x7ffc9f16ef50, to_svc_id=to_svc_id@entry=29, svc_cb=svc_cb@entry=0x5568f4afcea0, adest=adest@entry=564114769357041, dest_vdest_id=dest_vdest_id@entry=65535, snd_type=4, xch_id=116, pri=MDS_SEND_PRIORITY_HIGH) at src/mds/mds_c_sndrcv.c:1774
#8 0x00007fc0a8c8a5b7 in mds_mcm_send_msg_enc (to=<optimized out>, svc_cb=svc_cb@entry=0x5568f4afcea0, to_msg=to_msg@entry=0x7ffc9f16ef50, to_svc_id=to_svc_id@entry=29, dest_vdest_id=dest_vdest_id@entry=65535, req=req@entry=0x7ffc9f16eff0, xch_id=116, dest=564114769357041, pri=MDS_SEND_PRIORITY_HIGH) at src/mds/mds_c_sndrcv.c:1255
#9 0x00007fc0a8c8ac30 in mcm_pvt_red_snd_process_common (env_hdl=env_hdl@entry=65550, fr_svc_id=fr_svc_id@entry=28, to_msg=..., to_dest=to_dest@entry=564114769357041, to_svc_id=to_svc_id@entry=29, req=req@entry=0x7ffc9f16eff0, pri=pri@entry=MDS_SEND_PRIORITY_HIGH, xch_id=116, anchor=<optimized out>) at src/mds/mds_c_sndrcv.c:2664
#10 0x00007fc0a8c8dba3 in mcm_pvt_normal_svc_snd_rsp (pri=MDS_SEND_PRIORITY_HIGH, req=0x7ffc9f16eff0, to_svc_id=29, to_dest=564114769357041, msg=<optimized out>, fr_svc_id=28, env_hdl=65550) at src/mds/mds_c_sndrcv.c:3699
#11 mds_mcm_send (info=0x1d) at src/mds/mds_c_sndrcv.c:835
#12 mds_send (info=info@entry=0x7ffc9f16f0a0) at src/mds/mds_c_sndrcv.c:458
#13 0x00007fc0a8c9636c in ncsmds_api (svc_to_mds_info=svc_to_mds_info@entry=0x7ffc9f16f0a0) at src/mds/mds_papi.c:165
#14 0x00005568f2e7598f in ntfs_mds_msg_send (cb=<optimized out>, msg=msg@entry=0x7ffc9f16f130, dest=dest@entry=0x7ffc9f16f128, mds_ctxt=mds_ctxt@entry=0x7fc09c01278c, prio=prio@entry=MDS_SEND_PRIORITY_HIGH) at src/ntf/ntfd/ntfs_mds.c:1310
#15 0x00005568f2e75f68 in notfication_result_lib (error=error@entry=SA_AIS_OK, notificationId=182, mdsCtxt=0x7fc09c01278c, frDest=<optimized out>) at src/ntf/ntfd/ntfs_com.c:181
#16 0x00005568f2e809da in NtfClient::confirmNtfNotification (this=this@entry=0x5568f4afc440, notificationId=<optimized out>, mdsCtxt=mdsCtxt@entry=0x7fc09c01278c, mdsDest=mdsDest@entry=564114769357041) at src/ntf/ntfd/NtfClient.cc:341
#17 0x00005568f2e80c47 in NtfClient::notificationReceived (this=0x5568f4afc440, clientId=clientId@entry=2, notification=std::tr1::shared_ptr<NtfNotification> (use count 2, weak count 0) = {...}, mdsCtxt=mdsCtxt@entry=0x7fc09c01278c) at src/ntf/ntfd/NtfClient.cc:146
#18 0x00005568f2e86c32 in NtfAdmin::processNotification (this=this@entry=0x5568f4afb6a0, clientId=clientId@entry=2, notificationType=notificationType@entry=SA_NTF_TYPE_STATE_CHANGE, sendNotInfo=sendNotInfo@entry=0x7fc09c010bb0, mdsCtxt=mdsCtxt@entry=0x7fc09c01278c, notificationId=<optimized out>) at src/ntf/ntfd/NtfAdmin.cc:211
#19 0x00005568f2e86ec1 in NtfAdmin::notificationReceived (this=0x5568f4afb6a0, clientId=2, notificationType=SA_NTF_TYPE_STATE_CHANGE, sendNotInfo=sendNotInfo@entry=0x7fc09c010bb0, mdsCtxt=0x7fc09c01278c) at src/ntf/ntfd/NtfAdmin.cc:262
#20 0x00005568f2e86f52 in notificationReceived (clientId=<optimized out>, notificationType=<optimized out>, sendNotInfo=sendNotInfo@entry=0x7fc09c010bb0, mdsCtxt=mdsCtxt@entry=0x7fc09c01278c) at src/ntf/ntfd/NtfAdmin.cc:1127
#21 0x00005568f2e7086a in proc_send_not_msg (cb=<optimized out>, evt=0x7fc09c012780) at src/ntf/ntfd/ntfs_evt.c:474
#22 0x00005568f2e7033e in process_api_evt (evt=0x7fc09c012780) at src/ntf/ntfd/ntfs_evt.c:673
#23 0x00005568f2e70f19 in ntfs_process_mbx (mbx=<optimized out>) at src/ntf/ntfd/ntfs_evt.c:708
#24 0x00005568f2e6ebad in main (argc=<optimized out>, argv=<optimized out>) at src/ntf/ntfd/ntfs_main.c:400
The problem is that the #3208 fix makes MBCSV free a buffer that MDS has already freed:
<139>1 2020-09-08T16:16:48.284822+02:00 SC-1 osafntfd 276 mds.log [meta sequenceId="80"] MDTM: Failed to send message err :No route to host
<139>1 2020-09-08T16:16:48.284842+02:00 SC-1 osafntfd 276 mds.log [meta sequenceId="81"] MDTM: Unable to send the msg thru TIPC
<139>1 2020-09-08T16:16:48.284866+02:00 SC-1 osafntfd 276 mds.log [meta sequenceId="82"] MDS_SND_RCV: RED sndrsp message SEND Failed from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Part of the #3208 solution needs to be updated to solve this issue.
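
For illustration only, the ownership rule behind this kind of double free can be sketched in C as below. This is not the OpenSAF code: transport_send, caller_send, buf_free and free_count are hypothetical names, and the sketch assumes a transport layer that always releases the buffer it is handed, even when the send fails. Under that assumption the caller must not release the buffer again on the error path; here every release goes through one helper that clears the pointer, so a second release is a no-op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Counts real releases so the behaviour can be checked (hypothetical). */
static int free_count = 0;

/* Release a buffer at most once: clear the pointer after freeing it. */
static void buf_free(char **buf) {
  assert(buf != NULL);
  if (*buf != NULL) {
    free(*buf);
    *buf = NULL;
    ++free_count;
  }
}

/* Hypothetical transport: assumed to take ownership of the buffer and
 * release it whether or not the send succeeds. */
static bool transport_send(char **buf, bool route_ok) {
  buf_free(buf);
  return route_ok; /* false models "No route to host" */
}

/* Hypothetical caller: allocates a message and hands it to the transport. */
static bool caller_send(const char *payload, bool route_ok) {
  char *buf = malloc(strlen(payload) + 1);
  if (buf == NULL) return false;
  strcpy(buf, payload);

  bool ok = transport_send(&buf, route_ok);

  /* Error-path cleanup. A plain free() on a stale copy of the pointer
   * here would be a double free; buf_free() sees NULL and does nothing. */
  buf_free(&buf);
  return ok;
}
```

Calling caller_send("ckpt", false) returns false and releases the buffer exactly once, which is the behaviour the failed-send path needs.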
commit 0e1a6847c264ad5e34ca8413307b118066ae03eb (HEAD -> develop, origin/develop)
Author: thuan.tran thuan.tran@dektech.com.au
Date: Wed Sep 9 12:43:56 2020 +0700