#2124 amfd: handle early node_up from amfnd

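Milestone: 5.0.2
Status: fixed
Owner: Gary Lee
Labels: None
Type: defect
Component: amf
Part: d
Severity: major
Updated: 2016-10-26
Created: 2016-10-18
Creator: Gary Lee
Private: No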

Sometimes amfd fails to send NODE_UP to an amfnd because MDS cannot find a route to reach that amfnd. This is probably because MDS uses separate sockets for data and for discovery events, so a data message can sometimes arrive before the sender's addressing information has been received. When this happens, amfnd reboots the node (see the "Message ID mismatch" entry in the log below). amfd should handle this case better.

Oct 6 14:35:01 SC-2-1 osafamfd[15151]: NO Received node_up from 2020f: msg_id 1
Oct 6 14:35:01 SC-2-1 osafamfd[15151]: NO Node 'SC-2' joined the cluster
Oct 6 14:35:01 SC-2-1 osafamfd[15151]: ER avd_d2n_msg_dequeue: ncsmds_api failed 2
Oct 6 14:35:01 SC-2-1 osafamfd[15151]: ER avd_d2n_msg_dequeue: ncsmds_api failed 2
Oct 6 14:35:01 SC-2-1 osafamfd[15151]: ER avd_d2n_msg_dequeue: ncsmds_api failed 2
Oct 6 14:35:02 SC-2-1 osafamfd[15151]: NO Received node_up from 2020f: msg_id 1

--

Oct 6 14:35:01 SC-2-2 osafamfnd[4119]: NO Sending node up due to NCSMDS_UP
Oct 6 14:35:02 SC-2-2 osafamfnd[4119]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Message ID mismatch, rec 2, expected 1, OwnNodeId = 131599, SupervisionTime = 60
Oct 6 14:35:02 SC-2-2 opensaf_reboot: Rebooting local node; timeout=60


mds.log on SC-2-1

Oct 6 14:35:01.266236 osafamfd[15151] ERR |MDS_SND_RCV:No Route Found from svc_id = AVD(12), to svc_id = AVND(13) on Adest = <0x0002020f, 3393331664>
Oct 6 14:35:01.266302 osafamfd[15151] ERR |MDS_SND_RCV: Normal send Message sent Failed from svc_id = AVD(12), to svc_id = AVND(13)
Oct 6 14:35:01.266336 osafamfd[15151] ERR |MDS_SND_RCV: Adest=<0x0002020f,3393331664>
Oct 6 14:35:01.266391 osafamfd[15151] ERR |MDS_SND_RCV:No Route Found from svc_id = AVD(12), to svc_id = AVND(13) on Adest = <0x0002020f, 3393331664>
Oct 6 14:35:01.266425 osafamfd[15151] ERR |MDS_SND_RCV: Normal send Message sent Failed from svc_id = AVD(12), to svc_id = AVND(13)
Oct 6 14:35:01.266454 osafamfd[15151] ERR |MDS_SND_RCV: Adest=<0x0002020f,3393331664>
Oct 6 14:35:01.266563 osafamfd[15151] ERR |MDS_SND_RCV:No Route Found from svc_id = AVD(12), to svc_id = AVND(13) on Adest = <0x0002020f, 3393331664>
Oct 6 14:35:01.266597 osafamfd[15151] ERR |MDS_SND_RCV: Normal send Message sent Failed from svc_id = AVD(12), to svc_id = AVND(13)
Oct 6 14:35:01.266626 osafamfd[15151] ERR |MDS_SND_RCV: Adest=<0x0002020f,3393331664>
Oct 6 14:35:01.266714 osafamfd[15151] NOTIFY |MDTM: svc up event for svc_id = AVND(13), subscri. by svc_id = AVD(12) pwe_id=1 Adest = <rem_node[2]:dest_tipc_id_ref[3393331664]>
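
The changesets on this ticket summarise the fix as "ignore node_up until the mds event amfnd up has been received". A minimal sketch of that idea follows. It is not the actual amfd code: `NodeUpGate`, `NodeUpAction` and the method names are hypothetical, and it assumes amfnd sends node_up again when it gets no reply (as the second "Received node_up from 2020f: msg_id 1" line in the log above suggests).

```cpp
#include <cstdint>
#include <unordered_set>

// Hypothetical illustration only; names do not come from the amfd sources.
enum class NodeUpAction { Process, Ignore };

class NodeUpGate {
 public:
  // Call when the MDS discovery event reports that amfnd on node_id is up,
  // i.e. a route to that amfnd now exists.
  void on_mds_up(std::uint32_t node_id) { reachable_.insert(node_id); }

  // Call when MDS reports that amfnd on node_id is down.
  void on_mds_down(std::uint32_t node_id) { reachable_.erase(node_id); }

  // Call when a node_up message arrives. If no MDS up event has been seen
  // for the sender yet, any reply would fail with "No Route Found", so the
  // message is ignored and a later node_up is awaited instead.
  NodeUpAction on_node_up(std::uint32_t node_id) const {
    return reachable_.count(node_id) != 0 ? NodeUpAction::Process
                                          : NodeUpAction::Ignore;
  }

 private:
  std::unordered_set<std::uint32_t> reachable_;  // nodes with amfnd up in MDS
};
```

With this gate, an early node_up from node 0x2020f would be dropped without sending anything, and the same message would be processed once the "svc up event" for AVND has been delivered.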

Related

Tickets: #2124
Tickets: #2180
Wiki: ChangeLog-5.0.2
Wiki: ChangeLog-5.1.1

Discussion

  • Gary Lee - 2016-10-18
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,4 +1,4 @@
    -Sometimes amfd fails to send NODE_UP to an amfnd, because MDS cannot find a route to reach the nd. This is probably due to the use of separate sockets by MDS to send data and discovery events. AMFD should handle this better.
    +Sometimes amfd fails to send NODE_UP to an amfnd, because MDS cannot find a route to reach the nd. This is probably due to the use of separate sockets by MDS to send data and discovery events. Therefore data events may sometimes arrive before addressing is received. AMFD should handle this better.
    
     Oct  6 14:35:01 SC-2-1 osafamfd[15151]: NO Received node_up from 2020f: msg_id 1
     Oct  6 14:35:01 SC-2-1 osafamfd[15151]: NO Node 'SC-2' joined the cluster
    
     
  • Gary Lee - 2016-10-19
    • status: unassigned --> review
    • assigned_to: Gary Lee
     
  • Gary Lee - 2016-10-19
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,4 +1,4 @@
    -Sometimes amfd fails to send NODE_UP to an amfnd, because MDS cannot find a route to reach the nd. This is probably due to the use of separate sockets by MDS to send data and discovery events. Therefore data events may sometimes arrive before addressing is received. AMFD should handle this better.
    +Sometimes amfd fails to send NODE_UP to an amfnd, because MDS cannot find a route to reach the nd. This is probably due to the use of separate sockets by MDS to send data and discovery events. Therefore data events may sometimes arrive before addressing is received. This causes amfnd to reboot the node. AMFD should handle this better.
    
     Oct  6 14:35:01 SC-2-1 osafamfd[15151]: NO Received node_up from 2020f: msg_id 1
     Oct  6 14:35:01 SC-2-1 osafamfd[15151]: NO Node 'SC-2' joined the cluster
    
     
  • Gary Lee - 2016-10-26
    • status: review --> fixed
     
  • Gary Lee - 2016-10-26

    changeset: 8262:b5ead5296149
    branch: opensaf-5.0.x
    tag: tip
    parent: 8259:1b8f7e298cfb
    user: Gary Lee gary.lee@dektech.com.au
    date: Wed Oct 26 13:35:27 2016 +1100
    summary: amfd: ignore node_up until the mds event amfnd up has been received [#2124]

    changeset: 8261:e6c6e7786392
    parent: 8256:b88de404e0ae
    user: Gary Lee gary.lee@dektech.com.au
    date: Wed Oct 26 13:29:04 2016 +1100
    summary: amfd: ignore node_up until the mds event amfnd up has been received [#2124]

    changeset: 8260:137b006c9c97
    branch: opensaf-5.1.x
    parent: 8257:6f09e098918c
    user: Gary Lee gary.lee@dektech.com.au
    date: Wed Oct 26 13:28:37 2016 +1100
    summary: amfd: ignore node_up until the mds event amfnd up has been received [#2124]


