When a failover is triggered, the immlist tool has problems for a short time (possibly because the implementer is disconnected). During that window immlist either cannot read attribute values or returns an empty value (e.g. saAmfCSICompHAState=<Empty>), and amf-state does not handle these empty values well:
/usr/bin/amf-state: line 62: [: <Empty>: integer expression expected
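The error above is what the shell prints when a non-numeric string such as `<Empty>` reaches an integer test like `[ "$val" -eq 1 ]`. A minimal sketch of one way to guard such a test follows; the helper names `is_uint` and `state_equals` are hypothetical and are not taken from amf-state or from the committed patch.

```shell
#!/bin/sh
# is_uint VALUE: succeed only if VALUE is a non-empty string of decimal digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, "<Empty>", or anything non-numeric
        *) return 0 ;;
    esac
}

# state_equals VALUE EXPECTED (hypothetical helper): compare an attribute value
# against an expected integer. A non-numeric VALUE counts as "no match", so the
# integer test is never reached and no error is printed.
state_equals() {
    is_uint "$1" && [ "$1" -eq "$2" ]
}

# Example: a valid value matches, "<Empty>" is skipped silently.
state_equals "1" 1 && echo "match"
state_equals "<Empty>" 1 || echo "skipped: value not numeric"
```

The check happens before the `-eq` comparison, so the behaviour for valid values is unchanged and only the error path is suppressed.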
From IMM perspective:
When the implementer is detached or IMMND goes down, the cached attributes no longer have any significance, so immlist shows the value as empty.
The same is present in the IMM spec:
saImmOiClassImplementerRelease/saImmOiObjectImplementerRelease
If the operation succeeds, the IMM Service removes the
SA_IMM_ATTR_IMPLEMENTER_NAME attribute as well as all non-persistent cached
runtime attributes from all objects of that class.
In 4.6 an enhancement (#1156) was added:
OM clients performing a read (iteration or accessor-get) that fetches a cached
runtime attribute, will not immediately see the attribute as empty when/if the
OI detaches. Instead, for a grace period of 6 seconds, the latest set value is shown. This allows for failover or switchover or process restart of OI to occur without OM clients seeing the "glitch" in the value of the cached runtime attribute.
Good analysis by Neelakanta.
So the component for this ticket should be AMF, and the part is already set
correctly to tools.
Hi Thuan,
Please specify OpenSAF version.
Thanks
-Nagu
Hi Thuan,
What is the expectation from this ticket? Do you think amf-state should not throw the error (integer expression expected) and should just print NULL? Please clarify.
As suggested by Neel, since the 4.6 release this problem will not occur (if the failover time is < 6 seconds).
Thanks
-Nagu
We expect no error printout.
This problem still occurs on 4.6, maybe because the failover takes > 6 s.
Hi Thuan,
Thanks for the clarification. I need some more:
saAmfCSICompHAState may have valid values such as SA_AMF_HA_ACTIVE = 1, SA_AMF_HA_STANDBY = 2, etc. During failover, the value of saAmfCSICompHAState may be returned as NULL/zero, which is not a valid value for this attribute. So what should be printed in that case (after we suppress the error, of course)? Do you think we should skip the non-persistent cached attributes when their values are returned as NULL/zero and print only the other attributes?
Thanks
-Nagu
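One possible answer to the question above, sketched under assumptions: map the numeric HA state to a name and fall back to a placeholder when the value is empty or not a known state. The function name `ha_state_name` and the `UNKNOWN` placeholder are hypothetical; values 1 and 2 are the ones quoted in this thread, and 3 and 4 follow the SaAmfHAStateT enum in the AMF specification.

```shell
#!/bin/sh
# ha_state_name VALUE (hypothetical helper): print a readable name for an
# saAmfCSICompHAState value. Anything that is not a known state, including an
# empty string, "<Empty>", or 0, is reported with a placeholder.
ha_state_name() {
    case "$1" in
        1) echo "ACTIVE" ;;
        2) echo "STANDBY" ;;
        3) echo "QUIESCED" ;;
        4) echo "QUIESCING" ;;
        *) echo "UNKNOWN" ;;       # assumption: placeholder text is illustrative
    esac
}

# Example: a normal value and the value seen during failover.
ha_state_name 2
ha_state_name "<Empty>"
```

Because the mapping uses string matching in `case` rather than an integer test, an empty value never triggers the "integer expression expected" error.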
If you have a fail-over time longer than 6 seconds then you are likely going to
get more serious problems than an ugly printout.
To be precise, it is the time to fail over services plus the time for the new active
AMFD to attach. The registration of the new active AMFD as OI is visible in
the syslogs.
Could be that there is some deeper issue that needs resolving here.
Perhaps the TRY_AGAIN loop in the AMFD for re-attaching as OI has a
sleep that is too large or a back-off that adds time too quickly.
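To illustrate why the sleep and back-off matter against a 6-second grace period, here is a small arithmetic sketch. It does not reflect the actual AMFD code; the function name `backoff_attempts_within` and all delay values are made up for illustration.

```shell
#!/bin/sh
# backoff_attempts_within BUDGET_MS FIRST_MS FACTOR (hypothetical helper):
# print how many retry sleeps fit inside BUDGET_MS when the first sleep is
# FIRST_MS and each following sleep is multiplied by FACTOR (1 = fixed sleep).
backoff_attempts_within() {
    budget=$1
    delay=$2
    factor=$3
    total=0
    n=0
    while [ $((total + delay)) -le "$budget" ]; do
        total=$((total + delay))
        n=$((n + 1))
        delay=$((delay * factor))
    done
    echo "$n"
}

# Illustrative numbers only: 6000 ms budget, 100 ms first sleep.
backoff_attempts_within 6000 100 2   # doubling back-off
backoff_attempts_within 6000 100 1   # fixed sleep
```

With these assumed numbers, a doubling back-off leaves room for only 5 sleeps before the 6000 ms budget is exceeded, while a fixed 100 ms sleep allows 60, which is the kind of difference the comment above is pointing at.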
Update from Thuan:
I think it should not print the heading (<== Here) when no object satisfies the condition.
amf_state()
{
   echo
   echo "Service Instances UNLOCKED and UNASSIGNED:"   # <== Here
   si_dns=`immfind -c SaAmfSI`
   for si_dn in $si_dns; do
      adm_state_val=`immlist -a "saAmfSIAdminState" $si_dn | cut -d = -f2`
      ass_state_val=`immlist -a "saAmfSIAssignmentState" $si_dn | cut -d = -f2`
      # ... (rest of the loop body omitted in this excerpt)
   done
}
Best regards,
Thuan
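Thuan's suggestion above could be sketched roughly as follows: print a heading only when at least one object matches. This is an illustration only, not the committed change; the function name `print_section` and the sample DN are hypothetical, and the matching DNs are fed on stdin so that the sketch runs without immfind or immlist.

```shell
#!/bin/sh
# print_section HEADER (hypothetical helper): read matching DNs from stdin, one
# per line. Print a blank line and HEADER before the first DN, then each DN
# indented. If stdin contains no DNs, print nothing at all.
print_section() {
    header=$1
    printed=0
    while IFS= read -r dn; do
        [ -n "$dn" ] || continue          # ignore blank lines
        if [ "$printed" -eq 0 ]; then
            echo
            echo "$header"
            printed=1
        fi
        echo "   $dn"
    done
}

# Example with a made-up DN: the first call prints a section, the second
# prints nothing because no object matched.
printf '%s\n' "safSi=Example,safApp=Demo" |
    print_section "Service Instances UNLOCKED and UNASSIGNED:"
printf '' | print_section "Service Units UNLOCKED and DISABLED:"
```

The heading is emitted lazily inside the loop, so the caller does not need to count matches beforehand.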
With the patch, the following headings are still printed, but the error no longer appears:
Service Instances UNLOCKED and UNASSIGNED:
Service Units UNLOCKED and DISABLED:
Nodes UNLOCKED and DISABLED:
changeset: 6619:36a139f2515c
branch: opensaf-4.5.x
parent: 6616:f44eceb03a3f
user: Nagendra Kumar <nagendra.k@oracle.com>
date: Thu Jun 18 12:12:02 2015 +0530
summary: tools/amf: fix errors during switchover/failover [#1384]
changeset: 6620:818a5aea5793
branch: opensaf-4.6.x
parent: 6617:0a44d9c5b951
user: Nagendra Kumar <nagendra.k@oracle.com>
date: Thu Jun 18 12:12:20 2015 +0530
summary: tools/amf: fix errors during switchover/failover [#1384]
changeset: 6621:5330d6936a50
tag: tip
parent: 6618:8c5c122f5edd
user: Nagendra Kumar <nagendra.k@oracle.com>
date: Thu Jun 18 12:12:32 2015 +0530
summary: tools/amf: fix errors during switchover/failover [#1384]
[staging:36a139]
[staging:818a5a]
[staging:5330d6]