#2952 amfd: reset message ID in LostFound state

Milestone: 5.19.01

Status: fixed

Owner: Gary Lee

Labels: None

Type: defect

Component: amf

Part: d

Version:

Priority: major

Blocker: False

Updated: 2019-01-09

Created: 2018-11-01

Creator: Gary Lee

Private: No

[#2918] adds support for delaying node failover due to network disturbances.

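If a PL rejoins the main network partition before the node failover timer expires, AMFD tells it to reboot. By then, AMFND on that PL has assumed it is headless and reset rcv_msg_id to 0, while AMFD still holds the old snd_msg_id for the node. The reboot msg from AMFD therefore fails AMFND's message ID check, and AMFND logs a mismatch instead of the reboot order: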

Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Message ID mismatch, rec xx, expected 1, OwnNodeId = xx, SupervisionTime = 60

We can avoid this by resetting snd_msg_id for this PL in AMFD in state LostFound, before the reboot msg is sent.

diff --git a/src/amf/amfd/node_state.cc b/src/amf/amfd/node_state.cc
index a8659dc..9077c07 100644
--- a/src/amf/amfd/node_state.cc
+++ b/src/amf/amfd/node_state.cc
@@ -125,6 +125,8 @@ void LostFound::TimerExpired() {
   LOG_WA("Lost node '%s' has reappeared after network separation",
           node->node_name.c_str());

+  node->snd_msg_id = 0;
+
   if (fsm_->Active() == true) {
     LOG_WA("Sending node reboot order");
     avd_d2n_reboot_snd(node);

With this change, AMFND logs the expected message:

Received reboot order, ordering reboot now!
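
The mismatch and the effect of the reset can be modelled with a small standalone sketch. It is not OpenSAF code: the structs and functions below are hypothetical, and only the field names snd_msg_id and rcv_msg_id come from this ticket. It assumes that AMFD increments snd_msg_id before each send and that AMFND accepts only rcv_msg_id + 1, which is consistent with the "expected 1" in the log above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for AMFD's per-node bookkeeping.
struct DirectorNodeState {
  uint32_t snd_msg_id = 0;
};

// Hypothetical stand-in for AMFND's receive-side bookkeeping.
struct NodeDirectorState {
  uint32_t rcv_msg_id = 0;
};

// AMFD side (assumed behaviour): bump the counter, return the ID sent.
uint32_t SendRebootOrder(DirectorNodeState& node) {
  return ++node.snd_msg_id;
}

// AMFND side (assumed behaviour): accept only the next ID in sequence.
std::string ReceiveRebootOrder(NodeDirectorState& amfnd, uint32_t msg_id) {
  const uint32_t expected = amfnd.rcv_msg_id + 1;
  if (msg_id != expected) {
    return "Message ID mismatch, rec " + std::to_string(msg_id) +
           ", expected " + std::to_string(expected);
  }
  amfnd.rcv_msg_id = msg_id;
  return "Received reboot order, ordering reboot now!";
}

// Replays the scenario: five messages before the separation, AMFND
// resets its counter when it believes it is headless, then the node
// reappears and AMFD sends the reboot order.
std::string ReplayRejoin(bool reset_snd_msg_id_on_rejoin) {
  DirectorNodeState amfd_view;
  NodeDirectorState amfnd;
  for (int i = 0; i < 5; ++i) {
    amfnd.rcv_msg_id = ++amfd_view.snd_msg_id;  // normal in-sequence traffic
  }
  amfnd.rcv_msg_id = 0;  // AMFND thinks it has become headless
  if (reset_snd_msg_id_on_rejoin) {
    amfd_view.snd_msg_id = 0;  // the change proposed in this ticket
  }
  return ReceiveRebootOrder(amfnd, SendRebootOrder(amfd_view));
}
```

In this model, ReplayRejoin(false) yields the mismatch text and ReplayRejoin(true) yields the reboot order text, because the reset makes the next ID sent by the director equal to the 1 that the node director expects.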

Gary Lee - 2018-11-02

status: accepted --> review

Gary Lee - 2018-11-03

status: review --> fixed

commit 35d44ff686df8c4f15b372581a00d7c1a7c734a6
Author: Gary Lee <gary.lee@dektech.com.au>
Date: Sat Nov 3 07:43:55 2018 +0000

amfd: reset snd_msg_id in LostFound state [#2952]

If a PL rejoins the main network partition before the node failover timer expires,
it is told to reboot by AMFD. AMFND thinks it has become headless and
resets rcv_msg_id to 0, and shows this when it receives the reboot msg from AMFD:

Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Message ID mismatch, rec xx, expected 1, OwnNodeId = xx, SupervisionTime = 60

We can avoid this by resetting snd_msg_id for this PL in AMFD in state LostFound,
before the reboot msg is sent.
