OpenSAF / Tickets / #2499 SMF: 20 seconds timeout in getting node destination is not enough

Rafael Odzakow - 2017-06-19

waitForNodeDestination already uses smfRebootTimeout. Is it still timing out or was getNodeDestination called without the waitFor wrapper?


Tai Dinh - 2017-06-20

Hi Rafael,

It's under SmfCliCommandAction::execute() => getNodeDestination(n, &nodeDest, NULL, -1).
-1 was passed as maxWaitTime, which means the default 20-second timeout is used.
For a rolling upgrade procedure this should be OK, since we already wait for the node, but for a cluster reboot procedure nothing similar happens.
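
As a rough illustration of the fallback described above, a negative maxWaitTime can be resolved to a default like this. This is only a sketch: `kDefaultMaxWait` and `effectiveMaxWait` are hypothetical names, not taken from the SMF source, and only the 20-second value comes from this ticket.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical constant: the 20-second default mentioned in this ticket.
constexpr std::chrono::seconds kDefaultMaxWait{20};

// Hypothetical helper: a negative maxWaitTime (such as -1) selects the
// default; any other value is used as the timeout in seconds.
inline std::chrono::seconds effectiveMaxWait(int maxWaitTimeSeconds) {
  if (maxWaitTimeSeconds < 0) {
    return kDefaultMaxWait;
  }
  return std::chrono::seconds{maxWaitTimeSeconds};
}
```

With this shape, a caller that passes -1 after a cluster reboot waits 20 seconds at most, regardless of how long the node actually needs to come back.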

/Tai

- Rafael Odzakow - 2017-06-20
  
  If you have the logs please send them my way.
  
- Rafael Odzakow - 2017-06-20
  
  It should be enough to wrap getNodeDestination in waitForNodeDestination in SmfCliCommandAction::execute(). The other getNodeDestination calls either do not need to wait for nodes or already have custom retry code.
  
- Rafael Odzakow - 2017-06-20
  
  Going for a short vacation, here is the untested patch. Use rebootTimeout to increase the timeout for it.
  
  commit 2ffbd1c5cd3f4193fd631130eef60b17c92892e6 (HEAD -> ticket-2499)
  Author: Rafael Odzakow <rafael.odzakow@ericsson.com>
  Date:   Tue Jun 20 16:10:12 2017 +0200
  
      smf: 20 seconds timeout in getting node destination is not enough [#2499]
  
  diff --git a/src/smf/smfd/SmfUpgradeStep.cc b/src/smf/smfd/SmfUpgradeStep.cc
  index 2ffeab110..a99c7661a 100644
  --- a/src/smf/smfd/SmfUpgradeStep.cc
  +++ b/src/smf/smfd/SmfUpgradeStep.cc
  @@ -1966,7 +1966,7 @@ bool SmfUpgradeStep::callActivationCmd() {
       TRACE("Get node destination for %s", getSwNode().c_str());
       uint32_t rc;
   
  -    if (!getNodeDestination(getSwNode(), &nodeDest, NULL, -1)) {
  +    if (!waitForNodeDestination(getSwNode(), &nodeDest)) {
         LOG_NO("no node destination found for node %s", getSwNode().c_str());
         result = false;
         goto done;
  

Tai Dinh - 2017-06-21

Thanks Rafael,

This is what I expected.

/Tai


Rafael Odzakow - 2017-06-28

This issue is, as far as I can see, a bug. In other campaign sequences SMF waits up to rebootTimeout before doing any operation after a reboot. In this campaign sequence the first operation after the reboot was to run a CLI command on a payload node. This timed out because the CLI command is not wrapped in a retry that uses SMF's rebootTimeout.

SMF does not keep track of all nodes after a cluster reboot. Therefore the mechanism for handling a cluster reboot is to wrap every operation that may be executed after the reboot in a retry loop.
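
The retry-loop idea can be sketched roughly as below. This is an illustration under assumptions, not the SMF code: `retryUntil` and its parameters are hypothetical, and in SMF the timeout would presumably be derived from rebootTimeout while `attempt` would be a call such as getNodeDestination.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: call `attempt` until it returns true or until another
// pause of `interval` would overshoot `timeout`. The first attempt is always
// made, even with a zero timeout.
inline bool retryUntil(const std::function<bool()>& attempt,
                       std::chrono::milliseconds timeout,
                       std::chrono::milliseconds interval) {
  assert(interval.count() > 0);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  for (;;) {
    if (attempt()) {
      return true;  // e.g. the node answered
    }
    if (std::chrono::steady_clock::now() + interval > deadline) {
      return false;  // no time left for another pause and retry
    }
    std::this_thread::sleep_for(interval);
  }
}
```

Wrapping the first post-reboot operation this way gives it the same reboot budget that the other campaign sequences already get, instead of a single short attempt.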

Last edit: Rafael Odzakow 2017-06-29


Rafael Odzakow - 2017-06-30

status: unassigned --> fixed

assigned_to: Rafael Odzakow

Rafael Odzakow - 2017-06-30

fixed in
commit 3e1d1091270fa83cb8efe5458d6050b56f41f001
Author: Rafael Odzakow <rafael.odzakow@ericsson.com>
Date: Fri Jun 30 10:57:36 2017 +0200

smf: 20 seconds timeout in getting node destination is not enough [#2499]