#2419 smf: when fixing ticket #2145 a NBC problem was introduced

5.2.0
wontfix
nobody
None
defect
smf
-
5.2
major
2017-04-24
2017-04-10
elunlen
No

Previous behavior:
Previously, a failure to activate a component was ignored unless a secondary fault occurred. This meant that it was possible, for example, to complete a campaign even if a component failed to start, and to fix the problem after committing. No action to resume the campaign was needed.

After [#2145]:
The campaign always suspends when a component fails, and a resume must be requested for the campaign to continue.
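
The difference between the two behaviors can be sketched as a small decision function. This is an illustrative Python model only, not SMF code; the names `Behavior` and `failure_is_ignored` are hypothetical and do not come from the ticket.

```python
from enum import Enum


class Behavior(Enum):
    LEGACY = "legacy"          # behavior before #2145
    AFTER_2145 = "after_2145"  # behavior after #2145


def failure_is_ignored(behavior: Behavior, secondary_fault: bool) -> bool:
    """Return True if a component activation failure does not stop the campaign."""
    if behavior is Behavior.AFTER_2145:
        # After #2145 the campaign always suspends and must be resumed.
        return False
    # Legacy: the failure is ignored unless a secondary fault happened.
    return not secondary_fault
```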

NBC:
The behavior has changed in such a way that it must be regarded as non-backward compatible (NBC). Ticket #2145 corrects SMF behavior with respect to AIS, but it is still NBC because the previous behavior is the legacy behavior in earlier releases.

Proposal 1: Fix if the setting does not need to be changed at runtime, e.g. during an upgrade
Add a new configuration attribute to the SMF configuration class that makes it possible to select whether the behavior after #2145 shall be used or not. The default setting must be the previous behavior.
The setting must have the following properties:
- If the attribute does not exist (old model): legacy behavior
- If the attribute value is not changed from the default: legacy behavior
- If the attribute value is <empty> or invalid: legacy behavior
- If the attribute value is a valid “ON” setting: new behavior
- A request to change the attribute at runtime shall always be rejected
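
The properties listed for Proposal 1 could be modeled roughly as follows. This is a minimal Python sketch under stated assumptions: the attribute name `smfSuspendOnComponentFailure`, the "ON" value `"1"`, and both function names are hypothetical, since the ticket does not define them.

```python
from enum import Enum
from typing import Mapping, Optional

# Hypothetical attribute name and "ON" value; the ticket does not define them.
ATTR_NAME = "smfSuspendOnComponentFailure"
VALID_ON_VALUES = {"1"}


class Behavior(Enum):
    LEGACY = "legacy"
    AFTER_2145 = "after_2145"


def resolve_behavior(config: Mapping[str, Optional[str]]) -> Behavior:
    """Map the SMF configuration object to the behavior to use."""
    # get() yields None both for a missing attribute (old model) and for <empty>.
    value = config.get(ATTR_NAME)
    if value in VALID_ON_VALUES:
        return Behavior.AFTER_2145
    # Missing, default, empty, and invalid values all fall back to legacy.
    return Behavior.LEGACY


def runtime_change_accepted(attr_name: str) -> bool:
    """Proposal 1: a runtime change of this attribute is always rejected."""
    return attr_name != ATTR_NAME
```

Treating every value other than a valid "ON" as legacy keeps the default safe: an old model or a mistyped value never enables the new behavior by accident.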

Proposal 2: Fix if the setting has to be changed during an upgrade
Add a new configuration attribute to the SMF configuration class that makes it possible to select whether the behavior after #2145 shall be used or not. The default setting must be the previous behavior.
The setting must have the following properties:
- If the attribute does not exist (old model): legacy behavior
- If the attribute value is not changed from the default: legacy behavior
- If the attribute value is <empty> or invalid: legacy behavior
- If the attribute value is a valid “ON” setting: new behavior
- It must be possible to change the attribute value at runtime in the “idle” state (no campaign is executing)
- It must be possible to change the attribute value at runtime in the campaign init state. Note that if it is changed here, the new setting must be used for the rest of the campaign
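
The runtime-change rules of Proposal 2 could look roughly like this. It is a hedged Python sketch, not SMF code: the state names in `CampaignState`, the `Campaign` class, and the functions `change_allowed` and `leave_init` are all hypothetical simplifications of the real campaign state machine.

```python
from enum import Enum
from typing import Optional


class CampaignState(Enum):
    IDLE = "idle"            # no campaign is executing
    INIT = "init"            # campaign is in its init state
    EXECUTING = "executing"  # stands in for every later campaign state


def change_allowed(state: CampaignState) -> bool:
    """Proposal 2: the attribute may change only in the idle or init state."""
    return state in (CampaignState.IDLE, CampaignState.INIT)


class Campaign:
    def __init__(self) -> None:
        self.state = CampaignState.INIT
        # Not fixed yet; the attribute may still change during init.
        self.use_new_behavior: Optional[bool] = None

    def leave_init(self, configured_on: bool) -> None:
        # Latch the setting so it is used for the rest of the campaign.
        self.use_new_behavior = configured_on
        self.state = CampaignState.EXECUTING
```

Latching the value when the campaign leaves init is one way to satisfy the note that a setting changed in the init state must apply to the rest of the campaign.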

Related

Tickets: #2145

Discussion

  • elunlen

    elunlen - 2017-04-10

    Reference to an earlier mail conversation about this:
    http://sourceforge.net/p/opensaf/mailman/message/35490515/

     
  • Alex Jones

    Alex Jones - 2017-04-10

    This means that it was for example possible to complete a campaign even if a component failed to start and fix this problem after committing. No action to resume the campaign was needed.

    Just playing devil's advocate here... If the admin was prepared to restart a component after the campaign was committed, why is it a big deal to do this during the campaign?

    SMF already supports the "suspended by error detected" state for other problems (e.g. node failing to come back after reboot for a rolling upgrade w/reboot campaign). So, it is already possible for the admin to restart a campaign before it has been committed.

     
    • Rafael Odzakow

      Rafael Odzakow - 2017-04-12

      Hi, valid question. In the case that we looked at, the component recovered automatically without manual repair. It is a weird component...

      It is true that SMF supports "suspended by error detected", but this is for errors dealing with SW add/remove, not AMF state.

       

      Last edit: Rafael Odzakow 2017-04-13
  • Rafael Odzakow

    Rafael Odzakow - 2017-04-12
    • Component: unknown --> smf
     
  • Rafael Odzakow

    Rafael Odzakow - 2017-04-13

    The suggestion is to disable setting the maintenance campaign attribute on the AMF object by default, and to have a setting that enables setting this attribute. This would avoid a possible AMF NBC as well as the SMF NBC for #2144 and #2145.

     

    Last edit: Rafael Odzakow 2017-04-13
  • elunlen

    elunlen - 2017-04-21
    • status: unassigned --> wontfix
     
  • elunlen

    elunlen - 2017-04-21

    No configuration is needed in SMF. The NBC behaviour is triggered when AMF sends a state change notification with SA_AMF_MAINTENANCE_CAMPAIGN_DN set in the additional info. This, together with not doing a repair on a failing unit, is NBC behaviour in AMF. A fix is done in AMF, adding a new configuration that has to be set to activate the behaviour according to [#2144]. For more information about the AMF fix, see [#2435].

     

    Related

    Tickets: #2144
    Tickets: #2435

  • Alex Jones

    Alex Jones - 2017-04-22

    I'm not sure this is the right thing to do.

    If this problem is handled in AMF, someone can now turn off the behavior in the middle of a campaign. AMF doesn't know if a campaign is executing, and would just blindly enable/disable the auto repair, which could be dangerous.

    I think Rafael's suggestion from 4/13 is the right thing to do. Then if someone wants to turn the behavior on or off, SMF can reject it in the middle of a campaign.

     
    • Rafael Odzakow

      Rafael Odzakow - 2017-04-24

      It is possible to do it both ways, but I prefer to do this in AMF because it appears that the campaign DN was set on the objects before #2144 and #2145 were introduced. It was set by SMF, and most likely the attribute was never used, but I can't say for sure. The safe solution is to keep setting it just as before.

      As for turning this on/off during a campaign: if someone external decides to change things in IMM during an upgrade, then we cannot guarantee that the campaign will be successful. This is normal, and it is understood that changes to the system configuration should not happen during a campaign (except by the campaign itself). It is left to the user to implement some kind of "maintenance lock" that could be taken at campaign start.

       
  • Alex Jones

    Alex Jones - 2017-04-24

    The maintenance attribute was definitely never used by AMF. I did the implementation. And SMF was setting it and unsetting it in the correct places.

    I won't argue with you if you want to do it in AMF, but I think SMF is really the right place to do it because it is much cleaner.

     
    • Rafael Odzakow

      Rafael Odzakow - 2017-04-25

      I consider the AMF objects as an interface and some external code outside of OpenSAF might be reading that campaignDN attribute.

       
    • elunlen

      elunlen - 2017-05-03

      Yes, you are right, it would be cleaner to have a configuration in SMF, but there is a problem: AMF is also NBC. It is NBC to not do repair based on the saAmfSUMaintenanceCampaign attribute. SMF always sets this attribute, so the NBC behavior in AMF will be triggered. This is why AMF must be configurable. The NBC behavior in SMF will be triggered by a notification with “infoId = SA_AMF_MAINTENANCE_CAMPAIGN_DN and its infoValue is the DN of the upgrade campaign”. If the new behavior in AMF is not switched on, no such notification will be sent, so no configuration is needed in SMF. However, the PR documents for both AMF and SMF should be updated to explain this behavior/dependency.
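
      The trigger condition described above can be sketched as a simple check on the notification's additional info. This is an illustrative Python model, not the SMF implementation: the `AdditionalInfo` type, the `triggers_new_behavior` function, and the string used for `SA_AMF_MAINTENANCE_CAMPAIGN_DN` are assumptions, and the real AMF identifier is an enum value in the C API.

```python
from dataclasses import dataclass
from typing import Iterable

# Placeholder for the AMF additional-info identifier named in the ticket;
# the string used here is illustrative, not the real enum value.
SA_AMF_MAINTENANCE_CAMPAIGN_DN = "SA_AMF_MAINTENANCE_CAMPAIGN_DN"


@dataclass(frozen=True)
class AdditionalInfo:
    info_id: str
    info_value: str


def triggers_new_behavior(additional_info: Iterable[AdditionalInfo],
                          campaign_dn: str) -> bool:
    """True if the notification carries the DN of the running campaign."""
    return any(
        item.info_id == SA_AMF_MAINTENANCE_CAMPAIGN_DN
        and item.info_value == campaign_dn
        for item in additional_info
    )
```

      If AMF never sends such a notification (new AMF behavior switched off), the check is never true, which matches the point that no separate SMF configuration is needed.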

       
