#3217 mbc: agent crash if mds sendto() error

5.20.11
fixed
None
defect
mbc
lib
minor
False
2020-09-14
2020-09-09
Thuan Tran
No

With the #3208 fix applied, ntfd sometimes crashes during cluster shutdown.
The backtrace is as follows:

Thread 1 (Thread 0x7fc0a9b4a100 (LWP 276)):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fc0a80bd8b1 in __GI_abort () at abort.c:79
#2  0x00007fc0a8106907 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fc0a8233dfa "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007fc0a810d97a in malloc_printerr (str=str@entry=0x7fc0a823206e "malloc(): memory corruption") at malloc.c:5350
#4  0x00007fc0a8111a04 in _int_malloc (av=av@entry=0x7fc0a8468c40 <main_arena>, bytes=bytes@entry=59) at malloc.c:3738
#5  0x00007fc0a8117121 in __libc_calloc (n=n@entry=1, elem_size=elem_size@entry=59) at malloc.c:3436
#6  0x00007fc0a8c9b40c in mds_mdtm_send_tipc (req=0x7ffc9f16ec60) at src/mds/mds_dt_tipc.c:2736
#7  0x00007fc0a8c88f07 in mcm_msg_encode_full_or_flat_and_send (to=to@entry=2 '\002', to_msg=to_msg@entry=0x7ffc9f16ef50, to_svc_id=to_svc_id@entry=29, svc_cb=svc_cb@entry=0x5568f4afcea0, adest=adest@entry=564114769357041, dest_vdest_id=dest_vdest_id@entry=65535, snd_type=4, xch_id=116, pri=MDS_SEND_PRIORITY_HIGH) at src/mds/mds_c_sndrcv.c:1774
#8  0x00007fc0a8c8a5b7 in mds_mcm_send_msg_enc (to=<optimized out>, svc_cb=svc_cb@entry=0x5568f4afcea0, to_msg=to_msg@entry=0x7ffc9f16ef50, to_svc_id=to_svc_id@entry=29, dest_vdest_id=dest_vdest_id@entry=65535, req=req@entry=0x7ffc9f16eff0, xch_id=116, dest=564114769357041, pri=MDS_SEND_PRIORITY_HIGH) at src/mds/mds_c_sndrcv.c:1255
#9  0x00007fc0a8c8ac30 in mcm_pvt_red_snd_process_common (env_hdl=env_hdl@entry=65550, fr_svc_id=fr_svc_id@entry=28, to_msg=..., to_dest=to_dest@entry=564114769357041, to_svc_id=to_svc_id@entry=29, req=req@entry=0x7ffc9f16eff0, pri=pri@entry=MDS_SEND_PRIORITY_HIGH, xch_id=116, anchor=<optimized out>) at src/mds/mds_c_sndrcv.c:2664
#10 0x00007fc0a8c8dba3 in mcm_pvt_normal_svc_snd_rsp (pri=MDS_SEND_PRIORITY_HIGH, req=0x7ffc9f16eff0, to_svc_id=29, to_dest=564114769357041, msg=<optimized out>, fr_svc_id=28, env_hdl=65550) at src/mds/mds_c_sndrcv.c:3699
#11 mds_mcm_send (info=0x1d) at src/mds/mds_c_sndrcv.c:835
#12 mds_send (info=info@entry=0x7ffc9f16f0a0) at src/mds/mds_c_sndrcv.c:458
#13 0x00007fc0a8c9636c in ncsmds_api (svc_to_mds_info=svc_to_mds_info@entry=0x7ffc9f16f0a0) at src/mds/mds_papi.c:165
#14 0x00005568f2e7598f in ntfs_mds_msg_send (cb=<optimized out>, msg=msg@entry=0x7ffc9f16f130, dest=dest@entry=0x7ffc9f16f128, mds_ctxt=mds_ctxt@entry=0x7fc09c01278c, prio=prio@entry=MDS_SEND_PRIORITY_HIGH) at src/ntf/ntfd/ntfs_mds.c:1310
#15 0x00005568f2e75f68 in notfication_result_lib (error=error@entry=SA_AIS_OK, notificationId=182, mdsCtxt=0x7fc09c01278c, frDest=<optimized out>) at src/ntf/ntfd/ntfs_com.c:181
#16 0x00005568f2e809da in NtfClient::confirmNtfNotification (this=this@entry=0x5568f4afc440, notificationId=<optimized out>, mdsCtxt=mdsCtxt@entry=0x7fc09c01278c, mdsDest=mdsDest@entry=564114769357041) at src/ntf/ntfd/NtfClient.cc:341
#17 0x00005568f2e80c47 in NtfClient::notificationReceived (this=0x5568f4afc440, clientId=clientId@entry=2, notification=std::tr1::shared_ptr<NtfNotification> (use count 2, weak count 0) = {...}, mdsCtxt=mdsCtxt@entry=0x7fc09c01278c) at src/ntf/ntfd/NtfClient.cc:146
#18 0x00005568f2e86c32 in NtfAdmin::processNotification (this=this@entry=0x5568f4afb6a0, clientId=clientId@entry=2, notificationType=notificationType@entry=SA_NTF_TYPE_STATE_CHANGE, sendNotInfo=sendNotInfo@entry=0x7fc09c010bb0, mdsCtxt=mdsCtxt@entry=0x7fc09c01278c, notificationId=<optimized out>) at src/ntf/ntfd/NtfAdmin.cc:211
#19 0x00005568f2e86ec1 in NtfAdmin::notificationReceived (this=0x5568f4afb6a0, clientId=2, notificationType=SA_NTF_TYPE_STATE_CHANGE, sendNotInfo=sendNotInfo@entry=0x7fc09c010bb0, mdsCtxt=0x7fc09c01278c) at src/ntf/ntfd/NtfAdmin.cc:262
#20 0x00005568f2e86f52 in notificationReceived (clientId=<optimized out>, notificationType=<optimized out>, sendNotInfo=sendNotInfo@entry=0x7fc09c010bb0, mdsCtxt=mdsCtxt@entry=0x7fc09c01278c) at src/ntf/ntfd/NtfAdmin.cc:1127
#21 0x00005568f2e7086a in proc_send_not_msg (cb=<optimized out>, evt=0x7fc09c012780) at src/ntf/ntfd/ntfs_evt.c:474
#22 0x00005568f2e7033e in process_api_evt (evt=0x7fc09c012780) at src/ntf/ntfd/ntfs_evt.c:673
#23 0x00005568f2e70f19 in ntfs_process_mbx (mbx=<optimized out>) at src/ntf/ntfd/ntfs_evt.c:708
#24 0x00005568f2e6ebad in main (argc=<optimized out>, argv=<optimized out>) at src/ntf/ntfd/ntfs_main.c:400

The problem is that, with the #3208 fix, MBC frees a buffer that MDS has already freed. The MDS log shows the send failure that precedes the crash:

<139>1 2020-09-08T16:16:48.284822+02:00 SC-1 osafntfd 276 mds.log [meta sequenceId="80"] MDTM: Failed to send message err :No route to host
<139>1 2020-09-08T16:16:48.284842+02:00 SC-1 osafntfd 276 mds.log [meta sequenceId="81"] MDTM: Unable to send the msg thru TIPC
<139>1 2020-09-08T16:16:48.284866+02:00 SC-1 osafntfd 276 mds.log [meta sequenceId="82"] MDS_SND_RCV: RED sndrsp message SEND Failed from svc_id = MBCSV(19), to svc_id = MBCSV(19)

Part of the #3208 solution needs to be updated to solve this issue.
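
The ownership problem described above can be illustrated with a minimal sketch. This is not OpenSAF code: every name in it (demo_msg, demo_transport_send, demo_send, the encode_done flag, the fail_mode values) is hypothetical and only models the idea that once the transport has run its encode step it releases the buffer itself, so the caller should free only when the send failed before that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical message type for illustration; not an OpenSAF structure. */
struct demo_msg {
    char *buf;         /* payload buffer */
    bool encode_done;  /* set once the transport's encode step has run */
};

/* Where the simulated send fails, if at all. */
enum fail_mode { FAIL_NONE, FAIL_BEFORE_ENCODE, FAIL_AFTER_ENCODE };

static int free_count = 0; /* number of buffer releases performed */

static void demo_free(struct demo_msg *m)
{
    assert(m->buf != NULL); /* trips on a second release of the same buffer */
    free(m->buf);
    m->buf = NULL;
    free_count++;
}

/* Stand-in for the transport layer. Assumption: after the encode step
 * it owns the buffer and frees it, even if the actual send then fails. */
static int demo_transport_send(struct demo_msg *m, enum fail_mode mode)
{
    if (mode == FAIL_BEFORE_ENCODE)
        return -1;            /* buffer untouched, still owned by caller */

    m->encode_done = true;    /* ownership moves to the transport here */
    demo_free(m);             /* transport releases the buffer itself */

    return (mode == FAIL_AFTER_ENCODE) ? -1 : 0;
}

/* Caller side: on error, free only if the transport never took ownership. */
static int demo_send(struct demo_msg *m, enum fail_mode mode)
{
    int rc = demo_transport_send(m, mode);

    if (rc != 0 && !m->encode_done)
        demo_free(m);         /* avoids the leak without a double free */

    return rc;
}
```

In this sketch, a failure before encoding leaves the buffer with the caller, which frees it (avoiding a leak), while a failure after encoding leaves nothing for the caller to free. Dropping the encode_done check from demo_send would release the buffer a second time in the FAIL_AFTER_ENCODE case.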

Related

Wiki: ChangeLog-5.20.11

Discussion

  • Thuan Tran - 2020-09-10
    • status: assigned --> review

  • Thuan Tran - 2020-09-14
    • status: review --> fixed

  • Thuan Tran - 2020-09-14

    commit 0e1a6847c264ad5e34ca8413307b118066ae03eb (HEAD -> develop, origin/develop)
    Author: thuan.tran <thuan.tran@dektech.com.au>
    Date: Wed Sep 9 12:43:56 2020 +0700

    mbc: fix agent crash if mds sendto() error [#3217]
    
    - The #3208 fix for the MBC memory leak causes an agent crash if
    MDS sendto() returns an error.
    - Update part of the #3208 fix to check whether the MDS encode callback
    is done; if so, do not free the memory, as MDS has already freed it.
    
     
