I was running the following script to alternate continuously between failover and switch-over,
and observed the issue below:
#!/bin/bash
# Alternate between a failover (opensafd restart) and a switch-over (si-swap).
NUM=2
for (( i = 0; i <= 100; i++ ))
do
    (( EVEN = NUM % 2 ))
    if [ "$EVEN" -eq 0 ]; then
        echo "Starting opensafd restart "
        /etc/init.d/opensafd restart
    else
        echo "Starting opensafd si-swap "
        amf-adm si-swap safSi=SC-2N,safApp=OpenSAF
    fi
    (( NUM = NUM + 1 ))
done
============================================
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Implementer disconnected 100 <0, 2010f> (safLogService)
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Implementer disconnected 104 <0, 2010f> (safClmService)
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Implementer disconnected 103 <0, 2010f> (safEvtService)
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Implementer disconnected 101 <0, 2010f> (safCheckPointService)
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Implementer disconnected 99 <0, 2010f> (safMsgGrpService)
Dec 18 10:43:30 SC-2 osafimmd[27287]: WA IMMD lost contact with peer IMMD (NCSMDS_RED_DOWN)
Dec 18 10:43:30 SC-2 osafimmnd[27298]: WA DISCARD DUPLICATE FEVS message:18220
Dec 18 10:43:30 SC-2 osafimmnd[27298]: WA Error code 2 returned for message type 82 - ignoring
Dec 18 10:43:30 SC-2 osafimmnd[27298]: WA DISCARD DUPLICATE FEVS message:18221
Dec 18 10:43:30 SC-2 osafimmnd[27298]: WA Error code 2 returned for message type 82 - ignoring
Dec 18 10:43:30 SC-2 osafimmd[27287]: WA IMMND DOWN on active controller f1 detected at standby immd!! f2. Possible failover
Dec 18 10:43:30 SC-2 osafimmd[27287]: NO Skipping re-send of fevs message 18220 since it has recently been resent.
Dec 18 10:43:30 SC-2 osafimmd[27287]: NO Skipping re-send of fevs message 18221 since it has recently been resent.
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Global discard node received for nodeId:2010f pid:6737
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Implementer disconnected 98 <0, 2010f(down)> (MsgQueueService131343)
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Implementer disconnected 102 <0, 2010f(down)> (safLckService)
Dec 18 10:43:30 SC-2 osafimmnd[27298]: NO Implementer disconnected 97 <0, 2010f(down)> (@safAmfService2010f)
Dec 18 10:43:31 SC-2 kernel: [68517.295788] tipc: Resetting link <1.1.2:eth2-1.1.1:eth1>, changeover initiated by peer
Dec 18 10:43:31 SC-2 kernel: [68517.295794] tipc: Lost link <1.1.2:eth2-1.1.1:eth1> on network plane A
Dec 18 10:43:31 SC-2 kernel: [68517.354991] tipc: Duplicate <1.1.1> using eth(08:00:27:3b:a5:86) seen on <eth:eth2>
Dec 18 10:43:40 SC-2 osafamfd[27348]: ER FAILOVER Active --> Quiesced FAILED, ImplementerClear failed 5
Dec 18 10:43:40 SC-2 osafimmnd[27298]: WA ERR_BAD_HANDLE: Handle use is blocked by pending reply on syncronous call
Dec 18 10:43:40 SC-2 osafimmnd[27298]: NO Implementer locally disconnected. Marking it as doomed 90 <9, 2020f> (safAmfService)
Dec 18 10:43:40 SC-2 osafamfd[27348]: ER FAILOVER Active --> Quiesced FAILED, ImplementerClear failed 9
Dec 18 10:43:40 SC-2 osafamfd[27348]: NO Re-initializing with IMM
Dec 18 10:43:40 SC-2 osafimmnd[27298]: WA IMMND - Client Node Get Failed for cli_hdl 38654837263
Dec 18 10:43:50 SC-2 osafamfd[27348]: ER saImmOiImplementerSet failed 5
Dec 18 10:43:50 SC-2 osafamfd[27348]: ER exiting since avd_imm_applier_set failed
Dec 18 10:43:50 SC-2 osafamfnd[27362]: ER AMF director unexpectedly crashed
Dec 18 10:43:50 SC-2 osafamfnd[27362]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60
Dec 18 10:43:50 SC-2 opensaf_reboot: Rebooting local node; timeout=60
Dec 18 10:49:17 SC-2 syslog-ng[1193]: syslog-ng starting up; version='2.0.9'
============================================
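This fault appears to be caused by amfd not handling SA_AIS_ERR_TIMEOUT in avd_imm_reinit_bg_thread: saImmOiImplementerSet fails with error 5 (SA_AIS_ERR_TIMEOUT) and amfd exits, which makes amfnd reboot the node. See also [#1607]. Handling ERR_TIMEOUT the same way as TRY_AGAIN is safe if the operation is idempotent, and saImmOiImplementerSet is idempotent.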
Related
Tickets: #1607
Last edit: Hans Nordebäck 2016-01-07