Changeset: 8634 5.2.FC
Setup: 2 controllers with 3 payloads (headless feature enabled)
AMF application: 2N application, 2 SUs, 4 SIs (SI-SI dependencies disabled)
Steps performed:
-> Initially brought up 5 nodes.
-> Deployed the attached configuration.
-> Performed admin operations on the SG, coupled with 2 headless operations.
-> Later performed a shutdown operation on the SG, which left the SG in an unstable state.
Attached logs:
-> syslog, amfd and amfnd traces from both controllers and PL-3.
-> AMF application
According to the safLog, the issue occurred on Mar 14:
11139 18:08:34 03/14/2017 NO safApp=safAmfService "Admin op invocation: 5471788335253, err: 'SG not in STABLE state (safSg=TestApp_SG1,safApp=TestApp_TwoN)'"
The amfd trace does not cover this time; it starts on Mar 15:
Mar 15 7:03:12.095487 osafamfd [3250:src/amf/amfd/main.cc:0502] >> initialize
Please upload amfd traces covering the period on or before Mar 14 18:08.
In the safLog, the first "SG unstable" error appears at:
5166 17:14:29 03/13/2017 NO safApp=safAmfService "Admin op "LOCK" initiated for 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN', invocation: 1421634174977"
5167 17:14:29 03/13/2017 NO safApp=safAmfService "Admin op invocation: 1421634174977, err: 'SG state is not stable'"
The admin operation preceding the headless period was a component restart.
One potential cause is the AVD_SU::surestart flag.
Perhaps this flag should be checkpointed to the standby and also written to IMM as a runtime attribute (RTA)?
Last edit: Minh Hon Chau 2017-03-27
Mar 13 01:12:00 SUSE-S1-C2 osafamfd[2192]: NO got si :safSi=TestApp_SI1,safApp=TestApp_TwoN
Mar 13 01:12:00 SUSE-S1-C2 osafamfd[2192]: NO got si :safSi=TestApp_SI1,safApp=TestApp_TwoN
Mar 13 01:12:00 SUSE-S1-C2 osafamfd[2192]: NO got si :safSi=TestApp_SI1,safApp=TestApp_TwoN
The syslog does not show whether the cluster was reloaded without SI dependencies around Mar 13 17:14. Could you confirm whether SI dependencies were configured during the test?
The application does not have any SI-SI dependencies configured. The issue is the same as the one observed in #2105, where the application responded during the link-loss period.
Closing this ticket as a duplicate of #2105.