Steps to reproduce:
* Load the attached model
* Change the clc-cli script of the component to always return 1 for cleanup
* Kill the component
Result: The Component Termination Failed alarm is raised and then cleared immediately, and the presence state of the failed component is UNINSTANTIATED instead of TERMINATION-FAILED.
root@PL-3:~# ntfread
=== Sep 10 15:51:42 - Alarm ===
eventType = SA_NTF_ALARM_PROCESSING
notificationObject = "safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.3 (0x3)
additionalText = "Cleanup of Component safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1 failed"
- additionalInfo: 0 -
infoId = 1
infoType = 10
infoValue = "safAmfNode=PL-3,safAmfCluster=myAmfCluster"
probableCause = SA_NTF_SOFTWARE_ERROR
perceivedSeverity = SA_NTF_SEVERITY_MAJOR
=== Sep 10 15:51:42 - Alarm ===
eventType = SA_NTF_ALARM_PROCESSING
notificationObject = "safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.3 (0x3)
additionalText = "Previous raised alarm of safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1 is now cleared"
probableCause = SA_NTF_SOFTWARE_ERROR
perceivedSeverity = SA_NTF_SEVERITY_CLEARED
root@PL-3:~# amf-state su all safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=TERMINATION-FAILED(7)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
root@PL-3:~#
root@PL-3:~# amf-state comp all safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
saAmfCompOperState=DISABLED(2)
saAmfCompPresenceState=UNINSTANTIATED(1)
saAmfCompReadinessState=OUT-OF-SERVICE(1)
root@PL-3:~#
I reproduced the issue using the attached configuration. The only change I made was hosting the SUs on the controllers.
There is a case in which amfd clears the alarm raised on a comp in term_failed state. When all components are cleaned up, amfnd sends an su-failover request to amfd. In this case the alarm is cleared and the component is marked uninstantiated, even when saAmfNodeFailfastOnTerminationFailure is false on the node hosting the term_failed comp.
Amfd should not mark the comp uninstantiated, and hence should not clear the alarm, when saAmfNodeFailfastOnTerminationFailure is false and admin repair is pending.
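The intended rule can be sketched as a small decision function. This is only an illustration of the condition described above, not the actual amfd code: the type and function names (PresenceState, CompFailoverContext, FailoverOutcome, on_su_failover) are hypothetical, and only the two numeric presence-state values are taken from the amf-state output in this ticket.

```cpp
// Hypothetical sketch; none of these names are real OpenSAF amfd symbols.

// Numeric values match the amf-state output shown in this ticket.
enum class PresenceState { Uninstantiated = 1, TerminationFailed = 7 };

// Inputs amfd would need when it handles an su-failover request (assumed).
struct CompFailoverContext {
  PresenceState comp_presence;         // presence state currently held for the comp
  bool node_failfast_on_term_failure;  // saAmfNodeFailfastOnTerminationFailure of the hosting node
  bool admin_repair_pending;           // the comp has not been repaired by the admin yet
};

// What amfd should do with the comp as a result.
struct FailoverOutcome {
  PresenceState comp_presence;  // presence state after handling the request
  bool clear_alarm;             // whether the Termination Failed alarm is cleared
};

constexpr FailoverOutcome on_su_failover(const CompFailoverContext& ctx) {
  // A comp that is not in term_failed state has no such alarm to clear.
  if (ctx.comp_presence != PresenceState::TerminationFailed) {
    return {PresenceState::Uninstantiated, false};
  }
  // Fixed behaviour: without node failfast and with repair still pending,
  // keep the comp in term_failed state and leave the alarm raised.
  if (!ctx.node_failfast_on_term_failure && ctx.admin_repair_pending) {
    return {PresenceState::TerminationFailed, false};
  }
  // Otherwise the comp may be marked uninstantiated and the alarm cleared.
  return {PresenceState::Uninstantiated, true};
}
```

With the values from this ticket (comp in term_failed state, failfast false, repair pending), the sketch keeps the comp in TERMINATION-FAILED and does not clear the alarm, which is the behaviour requested here.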
Attached is the amfd trace.
changeset: 6953:29c8dd3b7608
tag: tip
parent: 6950:592a8a586c96
user: praveen.malviya@oracle.com
date: Wed Sep 30 10:43:44 2015 +0530
summary: amfd: fix comp term_failed state alarm [#1473]
changeset: 6952:a1626e6464f1
branch: opensaf-4.6.x
parent: 6949:98d5f79c002e
user: praveen.malviya@oracle.com
date: Wed Sep 30 10:43:17 2015 +0530
summary: amfd: fix comp term_failed state alarm [#1473]
changeset: 6951:5fd74d9fa974
branch: opensaf-4.5.x
parent: 6948:eeb53120b0fd
user: praveen.malviya@oracle.com
date: Wed Sep 30 10:42:55 2015 +0530
summary: amfd: fix comp term_failed state alarm [#1473]
https://sourceforge.net/p/opensaf/mailman/message/34468055/
Related Tickets: #1473