#2899 mbc: mbcsv loop forever while it is being dispatch ALL

5.18.09
fixed
nobody
None
defect
mbc
-
major
False
2018-08-30
2018-07-23
Canh Truong
No

Short summary of the cause of the issue:

Currently SC2 is Standby and SC1 is Active.

1/ Reboot SC2. The log service initializes mbcsv. mbcsv is installed and subscribes with MDS_RED_SUBSCRIBE.
2/ On the active SC1, MDS gets a RED_UP event because mbcsv on SC1 is subscribed. The MDS callback "mbcsv_mds_evt()" runs and adds the peer anchor to the peer list via mbcsv_add_new_pwe_anc().
3/ When SC2 comes up as STANDBY (after restarting), mbcsv changes role in "mbcsv_process_chg_role()" and then sends an "MBCSV_PEER_UP_MSG" msg to mbcsv on SC1.
4/ On SC1, the log service dispatches ALL on mbcsv in main. mbcsv processes the received msgs, including the "MBCSV_PEER_UP_MSG" msg, in mbcsv_process_peer_up_info(). If it finds the peer anchor in the peer list, it completes adding the new peer. Otherwise it loops and blocks there until it finds the peer anchor that the MDS callback added to the peer list in step 2.

5/ For some reason systemd stops opensaf on SC2 (see below trace). On SC1, MDS then gets a RED_DOWN event, and the MDS callback "mbcsv_mds_evt()" runs and removes the peer anchor from the peer list via mbcsv_rmv_pwe_anc_entry().

==> If step 5 happens before step 4, step 4 can never find the peer anchor in the peer list, so the osaflogd process (log server) on SC1 stays blocked in the WHILE loop in mbcsv.
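
The ordering problem in steps 2, 4 and 5 can be illustrated with a small Python model. This is only a sketch of the race, not the OpenSAF C code: `PeerList`, `process_peer_up`, `max_polls` and the anchor value are hypothetical names introduced here. The loop described in step 4 has no upper bound; the bound in the sketch exists only so that the example terminates.

```python
from dataclasses import dataclass, field


@dataclass
class PeerList:
    """Toy stand-in for the mbcsv peer list; anchors are plain ints here."""
    anchors: set = field(default_factory=set)

    def on_red_up(self, anchor: int) -> None:
        # Step 2: the MDS callback adds the peer anchor.
        self.anchors.add(anchor)

    def on_red_down(self, anchor: int) -> None:
        # Step 5: the MDS callback removes the peer anchor.
        self.anchors.discard(anchor)


def process_peer_up(peers: PeerList, anchor: int, max_polls: int = 5) -> bool:
    """Step 4: poll the peer list for the anchor of the PEER_UP sender.

    Returns True if the anchor was found, False if polling gave up.
    max_polls is an artificial limit (assumption of this sketch); the
    behaviour reported in the ticket keeps polling without any limit.
    """
    for _ in range(max_polls):
        if anchor in peers.anchors:
            return True
    return False


SC2_ANCHOR = 0x2020F  # hypothetical anchor value for SC2

# Expected order: step 2, then step 4.
peers = PeerList()
peers.on_red_up(SC2_ANCHOR)
normal_order_found = process_peer_up(peers, SC2_ANCHOR)

# Racy order: step 2, then step 5, then step 4.
peers = PeerList()
peers.on_red_up(SC2_ANCHOR)
peers.on_red_down(SC2_ANCHOR)
racy_order_found = process_peer_up(peers, SC2_ANCHOR)

print(normal_order_found, racy_order_found)  # prints: True False
```

In the racy order nothing ever puts the anchor back, so an unbounded version of the polling loop would never return.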

1 Attachments

Related

Wiki: ChangeLog-5.18.09

Discussion

  • Canh Truong

    Canh Truong - 2018-07-23
    • summary: mbc: mbcsv loop forever while it is being diapatch ALL --> mbc: mbcsv loop forever while it is being dispatch ALL
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -5,7 +5,7 @@
     1/ Reboot SC2. Log service initialize mbcsv. The mbcsv is install and subscribe with MDS_RED_SUBSCRIBE.
     2/ From Active SC1. mds get RED_UP event when mbcsv in SC1 is subscribed. the mds callback "mbcsv_mds_evt()" is process and add peer anchor to the peer list mbcsv_add_new_pwe_anc().
     3/ When SC2 up to STANDBY (after restarting), mbcsv change role "mbcsv_process_chg_role()" then send "MBCSV_PEER_UP_MSG" msg to mbcsv in SC1
    -4/ In SC1, log service dispatch ALL mbcsv in main. And mbcsv process the received msg include "MBCSV_PEER_UP_MSG" msg by mbcsv_process_peer_up_info(). If it found the peer anchor in the peer list, then it complete the adding new peer. Otherwise it will loop and block here until found the peer anchor in peer list that was added by mds callback in step 2.
    +4/ In SC1, log service dispatch ALL mbcsv in main. And mbcsv process the received msg include "MBCSV_PEER_UP_MSG" msg by mbcsv_process_peer_up_info(). If it found the peer anchor in the peer list, then it complete the adding new peer. **Otherwise it will loop and block here **until found the peer anchor in peer list that was added by mds callback in step 2.
    
     5/ Somehow systemd stop opensaf  in SC2 (see below trace). And From SC1, mds get RED_DOWN event, then mds callback "mbcsv_mds_evt()" is process and remove peer anchor from the peer list by mbcsv_rmv_pwe_anc_entry().
    
    • assigned_to: Canh Truong
     
  • Canh Truong

    Canh Truong - 2018-07-23
    • status: unassigned --> accepted
     
  • Canh Truong

    Canh Truong - 2018-07-31
    • status: accepted --> review
     
  • Canh Truong

    Canh Truong - 2018-08-23
    • status: review --> fixed
    • assigned_to: Canh Truong --> nobody
     
  • Canh Truong

    Canh Truong - 2018-08-23

    commit 998c5b0f0aa14900c9408bc703ea3c2e99b854e8 (HEAD -> develop, origin/develop)
    Author: Canh Truong <canh.v.truong@dektech.com.au>
    Date: Thu Aug 23 11:38:04 2018 +1000

    mbc: fix mbcsv loop forever while it is being dispatch ALL [#2899]
    
    When processing "MBCSV_PEER_UP_MSG" msg in case dispatch all from
    user, the msg may be too old. The msg may be come from the node
    that already rebooted. This reason cause mbcsv loop forever in WHILE
    loop, because the active node cannot find the adest of peer msg.
    
    The solution is that find entity in peer list and compare if the peer
    node id that already exist. If the peer node id exist in the entity
    (get node id from peer anchor), this mean that mbcsv already processed
    the "MBCSV_PEER_UP_MSG" for that peer node.
    
     
  • Canh Truong

    Canh Truong - 2018-08-23

    commit 67c86bfc9c58928a901a510ec33b9670682486e9 (HEAD, origin/release)
    Author: Canh Truong <canh.v.truong@dektech.com.au>
    Date: Thu Aug 23 11:38:04 2018 +1000

    mbc: fix mbcsv loop forever while it is being dispatch ALL [#2899]
    
    When processing "MBCSV_PEER_UP_MSG" msg in case dispatch all from
    user, the msg may be too old. The msg may be come from the node
    that already rebooted. This reason cause mbcsv loop forever in WHILE
    loop, because the active node cannot find the adest of peer msg.
    
    The solution is that find entity in peer list and compare if the peer
    node id that already exist. If the peer node id exist in the entity
    (get node id from peer anchor), this mean that mbcsv already processed
    the "MBCSV_PEER_UP_MSG" for that peer node.
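
One possible reading of the check described in the commit message, sketched in Python rather than taken from the actual patch: before waiting for the sender's anchor, look for any peer that belongs to the same node. The helper names and anchor values are hypothetical, and the sketch assumes that the node id occupies the upper 32 bits of a 64-bit anchor.

```python
def node_id_of(anchor: int) -> int:
    # Assumption of this sketch: the node id is the upper 32 bits of the anchor.
    return (anchor >> 32) & 0xFFFFFFFF


def peer_up_already_processed(peer_anchors, msg_anchor: int) -> bool:
    """Return True if a peer from the sender's node is already in the list.

    Per the commit message, a match means the PEER_UP msg for that node has
    already been handled, so the queued msg can be skipped instead of
    waiting in the WHILE loop for its anchor.
    """
    wanted = node_id_of(msg_anchor)
    return any(node_id_of(a) == wanted for a in peer_anchors)


# Hypothetical anchors: the same node before and after a reboot.
OLD_ANCHOR = (0x2020F << 32) | 0x1111
NEW_ANCHOR = (0x2020F << 32) | 0x2222
OTHER_NODE_ANCHOR = (0x2030F << 32) | 0x3333

peer_list = [NEW_ANCHOR]

stale = peer_up_already_processed(peer_list, OLD_ANCHOR)
unknown = peer_up_already_processed(peer_list, OTHER_NODE_ANCHOR)
print(stale, unknown)  # prints: True False
```

Here the stale msg carrying the pre-reboot anchor is recognised because the rebooted node is already present under its new anchor, while a msg from a node with no entry in the list is not treated as processed.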
    
     
