Scenario:
1) smfd orders an SI-SWAP to continue the campaign execution on the other controller.
2) Before the swap is executed, an IMM object "SmfRestartIndicator" is created to signal to the smf on the new controller that the campaign restart was initiated by smf (spontaneous restarts will always fail a campaign in executing state).
3) When the new controller comes up, smf checks for the existence of the object. If it is present, the campaign continues; if not, the campaign fails. If present, the "SmfRestartIndicator" object is then removed.
4) In this case the new controller fails to start very early, before smf was started. Smf never has a chance to remove the object.
5) AMF orders a switchback to the first controller.
6) Smf starts up on the "old" controller once again. Since the "SmfRestartIndicator" is still there, smf thinks the restart was ordered by smf and tries to continue the campaign execution, which fails (in the wrong way, e.g. with a core dump).
Todo: find a mechanism that lets smf detect that the "SmfRestartIndicator" is an old one and treat this case as if the object does not exist. Make sure the new solution is backward compatible.
The campaign continues at:
file: SmfUpgradeCampaign.cc, method: SmfUpgradeCampaign::continueExec()
The restart indicator is handled in:
file: SmfUpgradeCampaign.cc, method: SmfUpgradeCampaign::checkSmfRestartIndicator()
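The failure path in the scenario above can be illustrated with a small, self-contained model. This is only a sketch and not the smfd code: FakeImm, kIndicatorDn, orderSiSwap() and onSmfStartup() are hypothetical names, the DN string is a placeholder, and a std::set stands in for IMM.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the IMM object store; this is not the IMM API.
using FakeImm = std::set<std::string>;

// Illustrative DN only; the DN that smfd really uses may differ.
const std::string kIndicatorDn = "smfRestartIndicator=smf";

enum class StartupAction { kContinueCampaign, kFailCampaign };

// Step 2: smfd creates the indicator just before ordering the SI-SWAP.
void orderSiSwap(FakeImm& imm) { imm.insert(kIndicatorDn); }

// Steps 3 and 6: on startup, the mere existence of the object decides.
// The object is removed when it is found.
StartupAction onSmfStartup(FakeImm& imm) {
  if (imm.erase(kIndicatorDn) > 0) {
    return StartupAction::kContinueCampaign;
  }
  return StartupAction::kFailCampaign;
}
```

If orderSiSwap() runs on the first controller and the second controller dies before its smfd ever calls onSmfStartup(), the next call to onSmfStartup() happens on the first controller after the switchback. It still finds the object and returns kContinueCampaign, which is the wrong decision described in step 6.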
I see two possible options for this:
(a) Change the DN format and append the node_name as a suffix to the SmfRestartIndicator DN.
(b) Change SMF's ImplementerName string by appending the node_name as a suffix. Before becoming implementer, if this object exists, call AccessorGet() for the SmfRestartIndicator PRTO object and read the 'SaImmAttrImplementerName' attribute.
Either way, this will help determine whether the object was originally created by the smfd running on the same node (the old controller) or by the one on the peer controller node.
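A rough sketch of how option (a) could separate a valid indicator from a stale one is shown here. Everything in it is an assumption made for illustration: the DN layout, the "_" separator, the node names, and the helpers makeIndicatorDn() and classifyIndicator() are hypothetical, and a std::set replaces the real IMM calls.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the IMM object store; this is not the IMM API.
using FakeImm = std::set<std::string>;

// Builds an indicator DN with the creating node's name as a suffix.
// The format is an assumption, not the one used by smfd.
std::string makeIndicatorDn(const std::string& node_name) {
  return "smfRestartIndicator=smf_" + node_name;
}

enum class IndicatorState { kAbsent, kFromPeer, kStale };

// Called at smfd startup on node `self`. An indicator created by the peer
// means the SI-SWAP went through. An indicator created by `self` means the
// swap never completed and we are back after a switchback, so it is stale.
// Whichever object is found is removed.
IndicatorState classifyIndicator(FakeImm& imm, const std::string& self,
                                 const std::string& peer) {
  if (imm.erase(makeIndicatorDn(peer)) > 0) {
    return IndicatorState::kFromPeer;
  }
  if (imm.erase(makeIndicatorDn(self)) > 0) {
    return IndicatorState::kStale;
  }
  return IndicatorState::kAbsent;
}
```

With this layout, smfd restarting on the node that created the object gets kStale and can treat the case as if no indicator exists, while the peer gets kFromPeer and continues the campaign.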
Just as a note, I tried option (b) and it seems to work. (Also, SMF doesn't perform a classimplementerset() on this, but the implementerName is still available for reading because it's a PRTO.)
The question would then be: should this change be adopted in older releases too?
Thanks,
Mathi.
From: Mathi Naickan [mailto:mathi-naickan@users.sf.net]
Sent: Tuesday, December 15, 2015 3:31 PM
To: opensaf-tickets@lists.sourceforge.net
Subject: [tickets] [opensaf:tickets] #1605 smfd: campaign not correctly terminated after failed SI-SWAP
[tickets:#1605] smfd: campaign not correctly terminated after failed SI-SWAP (http://sourceforge.net/p/opensaf/tickets/1605/)
Status: unassigned
Milestone: 5.0.FC
Created: Thu Nov 19, 2015 12:26 PM UTC by Ingvar Bergström
Last Updated: Thu Nov 19, 2015 12:26 PM UTC
Owner: nobody
Very important: will it also work during an upgrade, i.e. at the first swap, when the "new" code has to handle a restart indicator created by the "old" code?
Since this is regarded as a fault, all maintained versions shall be updated.
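To make the upgrade concern concrete, here is one way the "new" code might treat an indicator written by the "old" code. It is a sketch under stated assumptions, not the behaviour of any patch on this ticket: kLegacyDn, the "_<node>" suffix format, parseIndicatorDn() and shouldContinueCampaign() are all hypothetical.

```cpp
#include <string>

// Assumed legacy DN (written by old code, no node name). Placeholder value.
const std::string kLegacyDn = "smfRestartIndicator=smf";

enum class Origin { kNotIndicator, kLegacy, kNodeQualified };

struct ParsedIndicator {
  Origin origin;
  std::string creator_node;  // empty unless origin == kNodeQualified
};

// Recognises both the assumed legacy DN and the assumed new
// "<legacy>_<node>" form.
ParsedIndicator parseIndicatorDn(const std::string& dn) {
  if (dn == kLegacyDn) {
    return {Origin::kLegacy, ""};
  }
  const std::string prefix = kLegacyDn + "_";
  if (dn.size() > prefix.size() && dn.compare(0, prefix.size(), prefix) == 0) {
    return {Origin::kNodeQualified, dn.substr(prefix.size())};
  }
  return {Origin::kNotIndicator, ""};
}

// Decision taken by new code at startup on node `self_node`.
bool shouldContinueCampaign(const std::string& dn,
                            const std::string& self_node) {
  const ParsedIndicator p = parseIndicatorDn(dn);
  switch (p.origin) {
    case Origin::kLegacy:
      return true;  // old code created it: keep the old behaviour
    case Origin::kNodeQualified:
      return p.creator_node != self_node;  // own indicator means stale
    case Origin::kNotIndicator:
      return false;
  }
  return false;
}
```

The trade-off in this sketch is that a legacy indicator is always trusted, so the stale-indicator case stays undetected for the first swap of an upgrade and is only caught once both controllers run the new code.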
Option (b), changing the implementer name, will not work!
The attached patch changes the DN of the restart indicator object.
The scenario described in this ticket can be tested only after the first SI-SWAP succeeds, i.e. after this fix (or a similar one) gets into the code.
Also, I just realised that there is a headless-cluster-related scenario in the background of this ticket. That makes it very difficult for me to reproduce the issue myself as well.
changeset: 8200:721e05de1401
tag: tip
parent: 8197:69f86b9bddb0
user: Alex Jones ajones@genband.com
date: Mon Oct 10 15:30:37 2016 -0400
summary: smfd: handle failed middleware si-swap [#1605]
changeset: 8199:a37e89393eea
branch: opensaf-5.1.x
parent: 8196:88f6b4d6e234
user: Alex Jones ajones@genband.com
date: Mon Oct 10 15:30:37 2016 -0400
summary: smfd: handle failed middleware si-swap [#1605]
changeset: 8198:6f89139a3134
branch: opensaf-5.0.x
parent: 8195:967e479b7c42
user: Alex Jones ajones@genband.com
date: Mon Oct 10 15:30:37 2016 -0400
summary: smfd: handle failed middleware si-swap [#1605]