Menu

#2941 mds: Miss to send MDSDOWN after split brain detection

5.19.01
fixed
nobody
None
defect
mds
lib
major
False
2019-01-09
2018-10-16
No

Configuration:
- Cluster with 2SCs, 6 PLs with MDS/TCP
- Split cluster to two partitions: #1 {SC1, PL3, PL4}, #2 {SC2, PL5, PL6, PL7, PL8)
- Network merge, both SC go reboot dueto split-brain detection

Observation:
PL3, PL4 did not get NCSMDS_DOWN vdest, thuse PL3, PL4 did not get headless and resync afterward.

Outlined Logs:
After network merge:

<143>1 2018-10-16T22:53:42.184459+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9668"] >> mds_mcm_svc_up
<143>1 2018-10-16T22:53:42.184463+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9669"] MCM:API: LOCAL SVC INFO : svc_id = AVND(13) | PWE id = 1 | VDEST id = 65535 |
<143>1 2018-10-16T22:53:42.184467+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9670"] MCM:API: REMOTE SVC INFO : svc_id = AVD(12) | PWE id = 1 | VDEST id = 1 | POLICY = 2 | SCOPE = 4 | ROLE = 1 | MY_PCON = 0 |
<143>1 2018-10-16T22:53:42.184471+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9671"] >> mds_svc_tbl_query
<143>1 2018-10-16T22:53:42.184475+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9672"] << mds_svc_tbl_query
<143>1 2018-10-16T22:53:42.184478+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9673"] >> mds_subtn_tbl_get_details
<143>1 2018-10-16T22:53:42.184482+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9674"] << mds_subtn_tbl_get_details
<143>1 2018-10-16T22:53:42.184485+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9675"] >> mds_mcm_validate_scope
<143>1 2018-10-16T22:53:42.184489+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9676"] << mds_mcm_validate_scope
<143>1 2018-10-16T22:53:42.184492+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9677"] >> mds_get_subtn_res_tbl_by_adest
<143>1 2018-10-16T22:53:42.184496+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9678"] MDS:DB: Subscription Result not present
<143>1 2018-10-16T22:53:42.184499+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9679"] << mds_get_subtn_res_tbl_by_adest
<143>1 2018-10-16T22:53:42.184502+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9680"] >> mds_subtn_res_tbl_get
<143>1 2018-10-16T22:53:42.184506+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9681"] << mds_subtn_res_tbl_get
<143>1 2018-10-16T22:53:42.184509+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9682"] >> mds_subtn_res_tbl_add
<143>1 2018-10-16T22:53:42.184515+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9683"] MDS:DB: adest_details: <rem_node<span>[0x2020f]:dest_pid[253]>
<143>1 2018-10-16T22:53:42.184519+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9684"] << get_subtn_adest_details
<143>1 2018-10-16T22:53:42.184524+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9685"] MDS:DB: sub_adest_details: <rem_node<span>[0x2020f]:dest_pid[253]>
<143>1 2018-10-16T22:53:42.184527+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9686"] << mds_subtn_res_tbl_add
<143>1 2018-10-16T22:53:42.184531+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9687"] << mds_mcm_svc_up
...
<143>1 2018-10-16T22:53:45.3608+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9837"] >> mds_mcm_svc_down
<143>1 2018-10-16T22:53:45.360804+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9838"] MCM:API: LOCAL SVC INFO : svc_id = AVND(13) | PWE id = 1 | VDEST id = 65535 |
<143>1 2018-10-16T22:53:45.360809+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9839"] MCM:API: REMOTE SVC INFO : svc_id = AVD(12) | PWE id = 1 | VDEST id = 1 | POLICY = 2 | SCOPE = 4 | ROLE = 1 | MY_PCON = 0 | adest = 565213401186557
<143>1 2018-10-16T22:53:45.360814+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9840"] >> mds_svc_tbl_query
<143>1 2018-10-16T22:53:45.360818+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9841"] << mds_svc_tbl_query
<143>1 2018-10-16T22:53:45.360822+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9842"] >> mds_subtn_tbl_get_details
<143>1 2018-10-16T22:53:45.360826+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9843"] << mds_subtn_tbl_get_details
<143>1 2018-10-16T22:53:45.360829+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9844"] >> mds_mcm_validate_scope
<143>1 2018-10-16T22:53:45.360833+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9845"] << mds_mcm_validate_scope
<143>1 2018-10-16T22:53:45.360836+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9846"] >> mds_get_subtn_res_tbl_by_adest
<143>1 2018-10-16T22:53:45.360841+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9847"] << mds_get_subtn_res_tbl_by_adest
<143>1 2018-10-16T22:53:45.360844+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9848"] >> mds_subtn_res_tbl_get_by_adest
<143>1 2018-10-16T22:53:45.360847+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9849"] << mds_subtn_res_tbl_get_by_adest
<143>1 2018-10-16T22:53:45.36085+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9850"] >> mds_subtn_res_tbl_get
<143>1 2018-10-16T22:53:45.360854+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9851"] << mds_subtn_res_tbl_get
<142>1 2018-10-16T22:53:45.360857+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9852"] active adest: 2020f000000fd
<143>1 2018-10-16T22:53:45.360861+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9853"] >> mds_subtn_res_tbl_del
<143>1 2018-10-16T22:53:45.360866+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9854"] << mds_subtn_res_tbl_del
<143>1 2018-10-16T22:53:45.360872+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9855"] >> mds_subtn_res_tbl_remove_active
<143>1 2018-10-16T22:53:45.360879+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9856"] MCM_DB:RemoveActive:TimerStart:AwaitActiveTmri:Hdl=0xff800004:SvcHdl=562945658454029:sbscr-svcid=AVD(12),vdest=1
<143>1 2018-10-16T22:53:45.360884+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9857"] << mds_subtn_res_tbl_remove_active
<143>1 2018-10-16T22:53:45.360888+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9858"] >> mds_mcm_user_event_callback
<143>1 2018-10-16T22:53:45.360891+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9859"] >> mds_svc_tbl_get
<143>1 2018-10-16T22:53:45.360895+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9860"] << mds_svc_tbl_get
<143>1 2018-10-16T22:53:45.360899+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9861"] >> mds_subtn_tbl_get
<143>1 2018-10-16T22:53:45.360903+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9862"] << mds_subtn_tbl_get
<143>1 2018-10-16T22:53:45.360907+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9863"] << mds_mcm_user_event_callback
<142>1 2018-10-16T22:53:45.36091+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9864"] MCM:API: svc_down : svc_id = AVND(13) on DEST id = 65535 got NO_ACTIVE for svc_id = AVD(12) on Vdest id = 1 Adest = , rem_svc_pvt_ver=7
<143>1 2018-10-16T22:53:45.360915+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9865"] >> mds_subtn_res_tbl_getnext_any
<143>1 2018-10-16T22:53:45.36092+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9866"] << mds_subtn_res_tbl_getnext_any
<143>1 2018-10-16T22:53:45.360924+11:00 PL-3 osafamfnd 193 mds.log [meta sequenceId="9867"] << mds_mcm_svc_down</rem_node<span></rem_node<span>

=> At this point, mds-PL3 should send NCSMDS_DOWN (vdest) to amfnd, but the mds_subtn_res_tbl_getnext_any() returned with another existing adest, thus mds did not send MDS_DOWN. The existing adest was added at the time network merge back, because mds saw the other being up.

Looking at PL-5's log as below, mds sent MDS_DOWN (vdest) to amfnd-PL5, because there is no other adest existed.

<143>1 2018-10-16T22:53:45.366186+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9903"] >> mds_mcm_svc_down
<143>1 2018-10-16T22:53:45.36619+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9904"] MCM:API: LOCAL SVC INFO : svc_id = AVND(13) | PWE id = 1 | VDEST id = 65535 |
<143>1 2018-10-16T22:53:45.366194+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9905"] MCM:API: REMOTE SVC INFO : svc_id = AVD(12) | PWE id = 1 | VDEST id = 1 | POLICY = 2 | SCOPE = 4 | ROLE = 1 | MY_PCON = 0 | adest = 565213401186557
<143>1 2018-10-16T22:53:45.366199+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9906"] >> mds_svc_tbl_query
<143>1 2018-10-16T22:53:45.366203+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9907"] << mds_svc_tbl_query
<143>1 2018-10-16T22:53:45.366207+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9908"] >> mds_subtn_tbl_get_details
<143>1 2018-10-16T22:53:45.36621+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9909"] << mds_subtn_tbl_get_details
<143>1 2018-10-16T22:53:45.366214+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9910"] >> mds_mcm_validate_scope
<143>1 2018-10-16T22:53:45.366216+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9911"] << mds_mcm_validate_scope
<143>1 2018-10-16T22:53:45.366219+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9912"] >> mds_get_subtn_res_tbl_by_adest
<143>1 2018-10-16T22:53:45.366223+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9913"] << mds_get_subtn_res_tbl_by_adest
<143>1 2018-10-16T22:53:45.366226+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9914"] >> mds_subtn_res_tbl_get_by_adest
<143>1 2018-10-16T22:53:45.366229+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9915"] << mds_subtn_res_tbl_get_by_adest
<143>1 2018-10-16T22:53:45.366232+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9916"] >> mds_subtn_res_tbl_get
<143>1 2018-10-16T22:53:45.366237+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9917"] << mds_subtn_res_tbl_get
<142>1 2018-10-16T22:53:45.366241+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9918"] active adest: 2020f000000fd
<143>1 2018-10-16T22:53:45.366245+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9919"] >> mds_subtn_res_tbl_del
<143>1 2018-10-16T22:53:45.366248+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9920"] << mds_subtn_res_tbl_del
<143>1 2018-10-16T22:53:45.366252+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9921"] >> mds_subtn_res_tbl_remove_active
<143>1 2018-10-16T22:53:45.366258+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9922"] MCM_DB:RemoveActive:TimerStart:AwaitActiveTmri:Hdl=0xff900004:SvcHdl=562945658454029:sbscr-svcid=AVD(12),vdest=1
<143>1 2018-10-16T22:53:45.366262+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9923"] << mds_subtn_res_tbl_remove_active
<143>1 2018-10-16T22:53:45.366266+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9924"] >> mds_mcm_user_event_callback
<143>1 2018-10-16T22:53:45.366269+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9925"] >> mds_svc_tbl_get
<143>1 2018-10-16T22:53:45.366272+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9926"] << mds_svc_tbl_get
<143>1 2018-10-16T22:53:45.366276+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9927"] >> mds_subtn_tbl_get
<143>1 2018-10-16T22:53:45.366279+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9928"] << mds_subtn_tbl_get
<143>1 2018-10-16T22:53:45.366283+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9929"] << mds_mcm_user_event_callback
<142>1 2018-10-16T22:53:45.366287+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9930"] MCM:API: svc_down : svc_id = AVND(13) on DEST id = 65535 got NO_ACTIVE for svc_id = AVD(12) on Vdest id = 1 Adest = , rem_svc_pvt_ver=7
<143>1 2018-10-16T22:53:45.366291+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9931"] >> mds_subtn_res_tbl_getnext_any
<143>1 2018-10-16T22:53:45.366296+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9932"] << mds_subtn_res_tbl_getnext_any
<143>1 2018-10-16T22:53:45.366301+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9933"] >> mds_subtn_res_tbl_getnext_any
<143>1 2018-10-16T22:53:45.366313+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9934"] MDS:DB: Subscription Result not present
<143>1 2018-10-16T22:53:45.366316+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9935"] << mds_subtn_res_tbl_getnext_any
<143>1 2018-10-16T22:53:45.36632+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9936"] >> mds_mcm_user_event_callback
<143>1 2018-10-16T22:53:45.366323+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9937"] >> mds_svc_tbl_get
<143>1 2018-10-16T22:53:45.366327+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9938"] << mds_svc_tbl_get
<143>1 2018-10-16T22:53:45.36633+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9939"] >> mds_subtn_tbl_get
<143>1 2018-10-16T22:53:45.366333+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9940"] << mds_subtn_tbl_get
<143>1 2018-10-16T22:53:45.366343+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9941"] << mds_mcm_user_event_callback
<143>1 2018-10-16T22:53:45.366347+11:00 PL-5 osafamfnd 193 mds.log [meta sequenceId="9942"] << mds_mcm_svc_down

4 Attachments

Related

Wiki: ChangeLog-5.19.01

Discussion

  • Minh Hon Chau

    Minh Hon Chau - 2018-10-16
    • assigned_to: Minh Hon Chau --> nobody
     
  • Minh Hon Chau

    Minh Hon Chau - 2018-10-17
    • status: unassigned --> assigned
    • assigned_to: Minh Hon Chau
     
  • Minh Hon Chau

    Minh Hon Chau - 2018-10-22
    • status: assigned --> accepted
     
  • Minh Hon Chau

    Minh Hon Chau - 2018-10-25
    • status: accepted --> review
     
  • Minh Hon Chau

    Minh Hon Chau - 2018-11-05
    • status: review --> fixed
    • assigned_to: Minh Hon Chau --> nobody
     
  • Minh Hon Chau

    Minh Hon Chau - 2018-11-05

    [develop]
    commit 7194f743c08fa336e143949b39fbb19d4723457f
    Author: Minh Chau minh.chau@dektech.com.au
    Date: Mon Nov 5 12:08:59 2018 +1100

    mds: Send NCSMDS_DOWN with vdest if there is no any adest [#2941]
    
     

Log in to post a comment.