When PLM is enabled, node down entries are created on the standby and are never removed.
The cause is a race condition between two checkpoint messages from the active controller: the NODE_DOWN checkpoint message, which removes the node on the standby, and a second checkpoint message that the active sends when it receives the PLM readiness state callback in the COMPLETED state for the EE of the node that went down.
If the standby receives NCSMDS_NODE_DOWN before the checkpoint that the active CLM sends for the PLM readiness state callback, things work correctly. If the standby receives that checkpoint before it receives NCSMDS_NODE_DOWN, the node gets re-added to the node_down list and is never removed.
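The ordering dependence can be illustrated with a toy model. This is only a sketch, not OpenSAF code: the class, the handler names and the node name "PL-3" are hypothetical, and it assumes (as a simplification of the description above) that NCSMDS_NODE_DOWN adds an entry to the standby's node_down list and that the checkpoint from the active clears it.

```python
class StandbyModel:
    """Toy stand-in for the standby's node_down bookkeeping (hypothetical)."""

    def __init__(self):
        self.node_down = set()

    def on_mds_node_down(self, node):
        # Assumption: the NCSMDS_NODE_DOWN event records the node as pending.
        self.node_down.add(node)

    def on_checkpoint(self, node):
        # Assumption: the checkpoint from the active clears a pending entry,
        # and is a no-op if no entry exists yet.
        self.node_down.discard(node)


def replay(events, node="PL-3"):
    """Apply the events in order and return the resulting node_down list."""
    standby = StandbyModel()
    handlers = {
        "mds_node_down": standby.on_mds_node_down,
        "checkpoint": standby.on_checkpoint,
    }
    for event in events:
        handlers[event](node)
    return standby.node_down


# NCSMDS_NODE_DOWN first: the later checkpoint clears the entry.
good = replay(["mds_node_down", "checkpoint"])

# Checkpoint first: nothing to clear yet, so the later entry is stranded.
bad = replay(["checkpoint", "mds_node_down"])

print("NCSMDS_NODE_DOWN first:", sorted(good))
print("checkpoint first:      ", sorted(bad))
```

Under these assumptions, the first ordering leaves the list empty, while the second leaves a stale entry because no further message arrives to remove it.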
[staging:f21510]
[staging:3c94fe]
[staging:49cfb0]
changeset: 6702:f21510ad5604
branch: opensaf-4.5.x
parent: 6697:2c31c871f702
user: Alex Jones ajones@genband.com
date: Mon Aug 03 20:29:53 2015 +0530
summary: clm: don't always checkpoint nodes in plm readiness callback completed state [#1416]
changeset: 6703:3c94fe5485ed
branch: opensaf-4.6.x
parent: 6698:404d8d3b1245
user: Alex Jones ajones@genband.com
date: Mon Aug 03 20:29:53 2015 +0530
summary: clm: don't always checkpoint nodes in plm readiness callback completed state [#1416]
changeset: 6704:49cfb0fda068
tag: tip
parent: 6701:9b0f5096f597
user: Alex Jones ajones@genband.com
date: Mon Aug 03 20:29:53 2015 +0530
summary: clm: don't always checkpoint nodes in plm readiness callback completed state [#1416]
Related
Tickets: #1416
Commit: [3c94fe]
Commit: [49cfb0]
Commit: [f21510]