Migrated from http://devel.opensaf.org/ticket/2912.
Model : TwoN
changeset : 3796
Configuration : SG, 3 SUs with 3 comps each, 5 SIs with 3 CSIs each.
SI-SI deps configured. SI1<-SI2<-SI3<-SI4
CSI-CSI deps configured as :
CSI1/SI1 <- CSI2/SI1 <- CSI3/SI1
CSI1/SI5 <- CSI2/SI5 <- CSI3/SI5
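The SI-SI dependency chain above can be written down as a small graph to make the implied ordering explicit. This is only an illustrative sketch: it assumes that "SI1<-SI2" means SI2 depends on SI1, and the names SI_DEPS and activation_order are hypothetical, not part of AMF. It needs Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter

# Assumption: "SI1<-SI2" means SI2 depends on SI1.
# Each key maps to the set of SIs it depends on.
SI_DEPS = {
    "SI1": set(),
    "SI2": {"SI1"},
    "SI3": {"SI2"},
    "SI4": {"SI3"},
    "SI5": set(),  # no SI-SI dependency is listed for SI5
}


def activation_order(deps):
    """Return one order in which every SI comes after the SIs it depends on."""
    return list(TopologicalSorter(deps).static_order())


order = activation_order(SI_DEPS)
print(order)
```

The position of SI5 in the result is arbitrary, since nothing in the configuration ties it to the other SIs.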
Scenario:
AMF restarted comp3 3 times, then escalated to SU restart. As part of the SU restart it restarted all 3 comps and moved SU1 through RESTARTING -> INSTANTIATED. But comp3 did not instantiate, as it crashed again.
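The escalation described in the scenario can be pictured with a small counter. This is a rough sketch and not the amfnd implementation: the class RestartEscalation, its fields, and the threshold of 3 (taken from "3 times" in the report) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RestartEscalation:
    """Hypothetical per-component restart counter (not amfnd code)."""

    comp_restart_max: int = 3  # assumed threshold, from "3 times" in the report
    restarts: int = 0

    def on_fault(self) -> str:
        """Return the recovery to apply for the next fault of this component."""
        if self.restarts < self.comp_restart_max:
            self.restarts += 1
            return "componentRestart"
        # Restart budget used up: escalate to restarting the whole SU.
        self.restarts = 0
        return "suRestart"


esc = RestartEscalation()
recoveries = [esc.on_fault() for _ in range(4)]
print(recoveries)
```

With these assumed values the fourth consecutive fault is the one that escalates to SU restart.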
Finally the assignments are :
safSISU=safSu=SU1\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI1,safApp=TWONAPP
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU2\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI1,safApp=TWONAPP
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU1\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI2,safApp=TWONAPP
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU2\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI2,safApp=TWONAPP
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU1\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI3,safApp=TWONAPP
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU2\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI3,safApp=TWONAPP
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU1\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI4,safApp=TWONAPP
saAmfSISUHAState=QUIESCED(3)
safSISU=safSu=SU2\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI4,safApp=TWONAPP
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU1\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI5,safApp=TWONAPP
saAmfSISUHAState=QUIESCED(3)
safSISU=safSu=SU2\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI5,safApp=TWONAPP
saAmfSISUHAState=STANDBY(2)
Note that SI4 and SI5 are stuck in the QUIESCED state for SU1.
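Spotting the stuck assignments by eye is error-prone in a long listing, so a small filter can help. The sketch below assumes the listing keeps the two-line shape shown above (a safSISU DN line followed by a saAmfSISUHAState line). LISTING is a shortened copy of that output, and the function name parse_assignments is hypothetical.

```python
import re

# Shortened copy of the listing above (raw string keeps the "\," escapes).
LISTING = r"""
safSISU=safSu=SU1\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI3,safApp=TWONAPP
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU1\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI4,safApp=TWONAPP
saAmfSISUHAState=QUIESCED(3)
safSISU=safSu=SU2\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI4,safApp=TWONAPP
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU1\,safSg=SGONE\,safApp=TWONAPP,safSi=TWONSI5,safApp=TWONAPP
saAmfSISUHAState=QUIESCED(3)
"""

DN_RE = re.compile(r"^safSISU=safSu=(?P<su>[^\\,]+)\\,.*?,safSi=(?P<si>[^,]+),")
STATE_RE = re.compile(r"^saAmfSISUHAState=(?P<state>[A-Z]+)\(\d+\)")


def parse_assignments(text):
    """Yield (su, si, ha_state) from alternating DN / HA-state lines."""
    current = None
    for line in text.splitlines():
        line = line.strip()
        dn = DN_RE.match(line)
        if dn:
            current = (dn["su"], dn["si"])
            continue
        state = STATE_RE.match(line)
        if state and current:
            yield (*current, state["state"])
            current = None


quiesced = [(su, si) for su, si, ha in parse_assignments(LISTING) if ha == "QUIESCED"]
print(quiesced)
```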
/var/log/messages shows:
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: Assigning 'safSi=TWONSI4,safApp=TWONAPP' QUIESCED to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: Assigning 'safSi=TWONSI5,safApp=TWONAPP' QUIESCED to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
CMD is cleanup
APPDN is
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: 'safComp=COMP3SU1TWONAPP,safSu=SU1,safSg=SGONE,safApp=TWONAPP' faulted due to 'avaDown(8)' : Recovery is 'componentRestart(2)'
some thing is spawnd cleanup for safComp=COMP3SU1TWONAPP,safSu=SU1,safSg=SGONE,safApp=TWONAPP
CMD is instantiate
APPDN is SYSAPPCOMPDN
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer locally disconnected. Marking it as doomed 10 <721, 2010f> (COMP3SU1TWONAPP)
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer locally disconnected. Marking it as doomed 11 <722, 2010f> (@COMP3SU1TWONAPP)
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer disconnected 10 <721, 2010f> (COMP3SU1TWONAPP)
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer disconnected 11 <722, 2010f> (@COMP3SU1TWONAPP)
some thing is spawnd instantiate for safComp=COMP3SU1TWONAPP,safSu=SU1,safSg=SGONE,safApp=TWONAPP
Nov 20 10:44:35 linux-xc76 sysAppComp: logtrace: trace enabled to file /tmp/sysAppImmTrace, mask=0xffffffff
Nov 20 10:44:35 linux-xc76 sysAppComp: IMMA library TRACE initialize done pid:4164 svid:26 file:/tmp/sysAppImmTrace
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer connected: 40 (COMP3SU1TWONAPP) <1161, 2010f>
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer (applier) connected: 41 (@COMP3SU1TWONAPP) <1162, 2010f>
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: Assigned 'safSi=TWONSI5,safApp=TWONAPP' QUIESCED to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
CMD is cleanup
APPDN is
some thing is spawnd cleanup for safComp=COMP3SU1TWONAPP,safSu=SU1,safSg=SGONE,safApp=TWONAPP
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: 'safComp=COMP3SU1TWONAPP,safSu=SU1,safSg=SGONE,safApp=TWONAPP' faulted due to 'avaDown(8)' : Recovery is 'componentRestart(2)'
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer locally disconnected. Marking it as doomed 40 <1161, 2010f> (COMP3SU1TWONAPP)
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer locally disconnected. Marking it as doomed 41 <1162, 2010f> (@COMP3SU1TWONAPP)
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer disconnected 40 <1161, 2010f> (COMP3SU1TWONAPP)
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer disconnected 41 <1162, 2010f> (@COMP3SU1TWONAPP)
CMD is instantiate
APPDN is SYSAPPCOMPDN
some thing is spawnd instantiate for safComp=COMP3SU1TWONAPP,safSu=SU1,safSg=SGONE,safApp=TWONAPP
Nov 20 10:44:35 linux-xc76 sysAppComp: logtrace: trace enabled to file /tmp/sysAppImmTrace, mask=0xffffffff
Nov 20 10:44:35 linux-xc76 sysAppComp: IMMA library TRACE initialize done pid:4187 svid:26 file:/tmp/sysAppImmTrace
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer connected: 42 (COMP3SU1TWONAPP) <1178, 2010f>
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer (applier) connected: 43 (@COMP3SU1TWONAPP) <1179, 2010f>
CMD is cleanup
APPDN is
some thing is spawnd cleanup for safComp=COMP3SU1TWONAPP,safSu=SU1,safSg=SGONE,safApp=TWONAPP
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: 'safComp=COMP3SU1TWONAPP,safSu=SU1,safSg=SGONE,safApp=TWONAPP' faulted due to 'avaDown(8)' : Recovery is 'suRestart(10)'
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: 'safSu=SU1,safSg=SGONE,safApp=TWONAPP' Presence State INSTANTIATED => RESTARTING
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer locally disconnected. Marking it as doomed 42 <1178, 2010f> (COMP3SU1TWONAPP)
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer locally disconnected. Marking it as doomed 43 <1179, 2010f> (@COMP3SU1TWONAPP)
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer disconnected 42 <1178, 2010f> (COMP3SU1TWONAPP)
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer disconnected 43 <1179, 2010f> (@COMP3SU1TWONAPP)
CMD is instantiate
APPDN is SYSAPPCOMPDN
some thing is spawnd instantiate for safComp=COMP3SU1TWONAPP,safSu=SU1,safSg=SGONE,safApp=TWONAPP
Nov 20 10:44:35 linux-xc76 sysAppComp: logtrace: trace enabled to file /tmp/sysAppImmTrace, mask=0xffffffff
Nov 20 10:44:35 linux-xc76 sysAppComp: IMMA library TRACE initialize done pid:4210 svid:26 file:/tmp/sysAppImmTrace
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Delete of PERSISTENT runtime object 'COMP2SU1TWONAPP' by Impl: COMP2SU1TWONAPP
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer disconnected 12 <732, 2010f> (COMP2SU1TWONAPP)
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer disconnected 13 <733, 2010f> (@COMP2SU1TWONAPP)
CMD is instantiate
APPDN is SYSAPPCOMPDN
some thing is spawnd instantiate for safComp=COMP2SU1TWONAPP,safSu=SU1,safSg=SGONE,safApp=TWONAPP
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer connected: 44 (COMP3SU1TWONAPP) <1195, 2010f>
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer (applier) connected: 45 (@COMP3SU1TWONAPP) <1196, 2010f>
Nov 20 10:44:35 linux-xc76 sysAppComp: logtrace: trace enabled to file /tmp/sysAppImmTrace, mask=0xffffffff
Nov 20 10:44:35 linux-xc76 sysAppComp: IMMA library TRACE initialize done pid:4229 svid:26 file:/tmp/sysAppImmTrace
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Delete of PERSISTENT runtime object 'COMP1SU1TWONAPP' by Impl: COMP1SU1TWONAPP
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer disconnected 14 <743, 2010f> (COMP1SU1TWONAPP)
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer disconnected 15 <744, 2010f> (@COMP1SU1TWONAPP)
CMD is instantiate
APPDN is SYSAPPCOMPDN
some thing is spawnd instantiate for safComp=COMP1SU1TWONAPP,safSu=SU1,safSg=SGONE,safApp=TWONAPP
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer connected: 46 (COMP2SU1TWONAPP) <1206, 2010f>
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Implementer (applier) connected: 47 (@COMP2SU1TWONAPP) <1207, 2010f>
Nov 20 10:44:35 linux-xc76 osafimmnd[2885]: Create of PERSISTENT runtime object 'COMP2SU1TWONAPP' by Impl COMP2SU1TWONAPP
Nov 20 10:44:35 linux-xc76 sysAppComp: logtrace: trace enabled to file /tmp/sysAppImmTrace, mask=0xffffffff
Nov 20 10:44:35 linux-xc76 sysAppComp: IMMA library TRACE initialize done pid:4248 svid:26 file:/tmp/sysAppImmTrace
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: 'safSu=SU1,safSg=SGONE,safApp=TWONAPP' Presence State RESTARTING => INSTANTIATED
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: Assigning 'safSi=TWONSI1,safApp=TWONAPP' ACTIVE to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: Assigning 'safSi=TWONSI2,safApp=TWONAPP' ACTIVE to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: Assigning 'safSi=TWONSI3,safApp=TWONAPP' ACTIVE to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: Assigning 'safSi=TWONSI4,safApp=TWONAPP' QUIESCED to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: Assigning 'safSi=TWONSI5,safApp=TWONAPP' QUIESCED to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
Nov 20 10:44:36 linux-xc76 osafimmnd[2885]: Implementer connected: 48 (COMP1SU1TWONAPP) <1229, 2010f>
Nov 20 10:44:36 linux-xc76 osafimmnd[2885]: Implementer (applier) connected: 49 (@COMP1SU1TWONAPP) <1230, 2010f>
Nov 20 10:44:36 linux-xc76 osafimmnd[2885]: Create of PERSISTENT runtime object 'COMP1SU1TWONAPP' by Impl COMP1SU1TWONAPP
Nov 20 10:44:36 linux-xc76 osafamfnd[2951]: Assigned 'safSi=TWONSI2,safApp=TWONAPP' ACTIVE to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
Nov 20 10:44:36 linux-xc76 osafamfnd[2951]: Assigned 'safSi=TWONSI3,safApp=TWONAPP' ACTIVE to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
Nov 20 10:44:36 linux-xc76 osafamfnd[2951]: Assigned 'safSi=TWONSI1,safApp=TWONAPP' ACTIVE to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
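One way to read the log above is to pair every "Assigning" line with its matching "Assigned" line and see what is left over. The sketch below does that for the tail of the excerpt (the lines after the SU returns to INSTANTIATED). The regular expression is an assumption based only on the message format visible here, and pending_assignments is a hypothetical helper.

```python
import re

# Tail of the excerpt above, after "RESTARTING => INSTANTIATED".
LOG = """\
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: Assigning 'safSi=TWONSI1,safApp=TWONAPP' ACTIVE to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: Assigning 'safSi=TWONSI2,safApp=TWONAPP' ACTIVE to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: Assigning 'safSi=TWONSI3,safApp=TWONAPP' ACTIVE to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: Assigning 'safSi=TWONSI4,safApp=TWONAPP' QUIESCED to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
Nov 20 10:44:35 linux-xc76 osafamfnd[2951]: Assigning 'safSi=TWONSI5,safApp=TWONAPP' QUIESCED to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
Nov 20 10:44:36 linux-xc76 osafamfnd[2951]: Assigned 'safSi=TWONSI2,safApp=TWONAPP' ACTIVE to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
Nov 20 10:44:36 linux-xc76 osafamfnd[2951]: Assigned 'safSi=TWONSI3,safApp=TWONAPP' ACTIVE to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
Nov 20 10:44:36 linux-xc76 osafamfnd[2951]: Assigned 'safSi=TWONSI1,safApp=TWONAPP' ACTIVE to 'safSu=SU1,safSg=SGONE,safApp=TWONAPP'
"""

EVENT_RE = re.compile(
    r"osafamfnd\[\d+\]: (Assigning|Assigned) '([^']+)' (\w+) to '([^']+)'"
)


def pending_assignments(log_text):
    """Return (si, ha_state, su) tuples that were started but never completed."""
    pending = set()
    for line in log_text.splitlines():
        m = EVENT_RE.search(line)
        if not m:
            continue
        verb, si, ha_state, su = m.groups()
        key = (si, ha_state, su)
        if verb == "Assigning":
            pending.add(key)
        else:
            pending.discard(key)
    return pending


stuck = sorted(pending_assignments(LOG))
for si, ha_state, su in stuck:
    print(f"{si}: {ha_state} on {su} never completed")
```

On this excerpt the leftovers are the QUIESCED assignments of TWONSI4 and TWONSI5 to SU1, matching the stuck entries in the listing.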
The logs are too large to attach. The states of the entities are attached.
States of AMF entities.
The issue is reproducible on the latest changeset, 4527:387d506494df (default branch), even without
configuring SI dependencies.
Attached 403.tgz contains AMF traces and 403.xml for reproducing the problem
Attached 403_simple.xml, AMF configuration to reproduce the problem.
A big problem here is that restarted components are assigned QUIESCED. That is an illegal state transition (unassigned -> QUIESCED). Instead, if a component fails during the QUIESCED assignment (e.g. SU lock), the SI needs to be failed over to another SU.
I will share a patch based on Praveen's initial one.
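The rule argued in this comment can be written as a small check. This is a sketch of the position stated here only, not of the spec or of amfnd: the set INITIAL_HA_STATES and the function action_after_restart are hypothetical, and which transitions the spec really allows is exactly what the rest of the ticket discusses.

```python
# HA states that may be given to a component with no current assignment.
# Assumption drawn from the comment above: QUIESCED is not among them.
INITIAL_HA_STATES = {"ACTIVE", "STANDBY"}


def action_after_restart(previous_ha_state: str) -> str:
    """Pick what to do with an SI whose component was just restarted."""
    if previous_ha_state in INITIAL_HA_STATES:
        # unassigned -> ACTIVE/STANDBY is an ordinary first assignment.
        return f"reassign {previous_ha_state}"
    # unassigned -> QUIESCED would be illegal under this reading.
    return "fail over SI to another SU"


for state in ("ACTIVE", "STANDBY", "QUIESCED"):
    print(state, "->", action_after_restart(state))
```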
Hi,
But a restartable component is not considered unassigned (its oper state
is not disabled), and AMF reassigns it. This should be done as long as the
component is restartable.
So AMF should fail over the comp/SU when the component restart count
is reached. The floated patch performs SU/comp failover in such a
situation.
I think one such change was made as part of #3083, in which, for a fault
in the quiesced state, SU failover was changed to comp restart.
Thanks,
Praveen
On 21-Oct-13 11:28 AM, Hans Feldt wrote:
Related Tickets: #403
Praveen, it is a good point, but I am not sure it is valid. At least it is not according to Picture 3 on page 83: there is no ADD arrow from the initial state to QUIESCED.
Where in the spec do you find proof that such a state change is allowed?
On 21-Oct-13 2:14 PM, Hans Feldt wrote:
Section 3.2.2.1 (page 73) of the spec mentions reassignment of the same HA
state after a component restart.
Thanks,
Praveen
OK, maybe we can close #403 and track this issue some other way. I will comment on the patch.
changeset: 4559:eef492babea8
branch: opensaf-4.2.x
parent: 4556:41790a11f954
user: praveen.malviya@oracle.com
date: Tue Oct 22 11:09:21 2013 +0530
summary: amfnd : reset assignment flag when no assignment pending on SU [#403]
changeset: 4560:a5e3629dc90d
branch: opensaf-4.3.x
parent: 4557:1f45451f4032
user: praveen.malviya@oracle.com
date: Tue Oct 22 11:09:59 2013 +0530
summary: amfnd : reset assignment flag when no assignment pending on SU [#403]
changeset: 4561:dee8d4ec28e7
tag: tip
parent: 4558:50a2cc960abf
user: praveen.malviya@oracle.com
date: Tue Oct 22 11:10:58 2013 +0530
summary: amfnd : reset assignment flag when no assignment pending on SU [#403]