Menu

#1557 Comp fails in INSTANTIATION_FAILED because comp crashes after compRegistration timeout

4.6.2
fixed
defect
amf
nd
minor
2015-11-04
2015-10-23
No

Steps reproduce:
. Apply amf_demo.diff and build amf_demo, using attached amf_demo_script as clc script
. Run commands:
. immcfg -f app3_twon2su1si.xml
. echo 1 > /root/hu23992
. amf-adm unlock-in safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon

Logs:
Oct 23 12:47:19 PL-4 osafamfnd[421]: NO 'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' Presence State UNINSTANTIATED => INSTANTIATING
Oct 23 12:47:19 PL-4 amf_demo_script: CLC-START: safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon
Oct 23 12:47:22 PL-4 amf_demo[585]: 'safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' started
Oct 23 12:47:26 PL-4 osafamfnd[421]: NO Instantiation of 'safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' failed
Oct 23 12:47:26 PL-4 osafamfnd[421]: NO Reason: component registration timer expired
Oct 23 12:47:26 PL-4 amf_demo_script: CLC-STOP: safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon
Oct 23 12:47:27 PL-4 amf_demo[585]: Registered with AMF and HC started
Oct 23 12:47:27 PL-4 amf_demo[585]: Health check 1
Oct 23 12:47:29 PL-4 amf_demo[585]: exiting (caught term signal)
Oct 23 12:47:29 PL-4 osafamfnd[421]: NO 'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' component restart probation timer started (timeout: 10000000000 ns)
Oct 23 12:47:29 PL-4 osafamfnd[421]: NO Restarting a component of 'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' (comp restart count: 1)
Oct 23 12:47:29 PL-4 osafamfnd[421]: NO 'safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' faulted due to 'avaDown' : Recovery is 'componentRestart'
Oct 23 12:47:29 PL-4 amf_demo_script: CLC-STOP: safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon
Oct 23 12:47:29 PL-4 amf_demo_script: CLC-START: safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon
Oct 23 12:47:32 PL-4 amf_demo[628]: 'safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' started
Oct 23 12:47:32 PL-4 amf_demo[628]: exiting (caught term signal)
Oct 23 12:47:32 PL-4 osafamfnd[421]: WA 'safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' Presence State INSTANTIATING => INSTANTIATION_FAILED
Oct 23 12:47:32 PL-4 osafamfnd[421]: NO 'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' Presence State INSTANTIATING => INSTANTIATION_FAILED

Trace is also attached.

Initial analysis:
. After comp timeout in component_registration phase, amfnd enters instantiating_fail, thus cleanup clc is called
. Then comp crashed, amfnd receives ava_mds_down, amfnd also enters instantiating_fail for component, another cleanup clc is called.
. Eventually, at the returns of two cleanup clc, amfnd will enters cleanup_success twice under instantiating state of component
. At the second cleanup_success, the retry_counter has reach retry_max, so component fails into INSTANTIATION_FAILED

As first thought, amfnd should not enter instantiating_fail when comp is crashed, since it has been already in handling of instantiating_fail.

4 Attachments

Related

Tickets: #1557
Wiki: ChangeLog-4.6.2

Discussion

  • Minh Hon Chau

    Minh Hon Chau - 2015-10-23
    • labels: --> INSTANTIATION_FAILED, component registration
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -33,4 +33,4 @@
     . Eventually, at the returns of two cleanup clc, amfnd will enters cleanup_success twice under instantiating state of component
     . At the second cleanup_success, the retry_counter has reach retry_max, so component fails into INSTANTIATION_FAILED
    
    -As first thought, amfnd should not enter instantiating_fail when comp is crashed, thus it has been already in handling of instantiating_fail.
    +As first thought, amfnd should not enter instantiating_fail when comp is crashed, since it has been already in handling of instantiating_fail.
    
    • status: unassigned --> accepted
    • assigned_to: Minh Hon Chau
     
  • Praveen

    Praveen - 2015-10-27

    We need to remember that there are few deviations from spec when instantiation of comp fails :

    -When instantiate command fails,amfnd must generate a error report on the component and clean it up. Also in this case component should be marked TERMINATING.
    -Reinstantiation must be attempted only if comp is restartable. (4.6 INSTANTIATE Command page 211).
    At present AMFND tres to reinstantiate a non-restartable component and also it is not marking the state TERMINATING.I have checked this is not documented in the PR doc.

     
  • Minh Hon Chau

    Minh Hon Chau - 2015-10-27

    Thanks for the information. I think root cause of this issue is that amfnd let 2 sequence of component life cycle happening parellel
    . The first error is component registration timeout
    Oct 23 12:47:26.169639 osafamfnd [421:clc.cc:0500] NO Reason: component registration timer expired
    Oct 23 12:47:26.169657 osafamfnd [421:clc.cc:1573] >> avnd_comp_clc_xxxing_instfail_hdler: 'safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon': Instantiate fail event in Instantiating/Restarting State
    Oct 23 12:47:26.169691 osafamfnd [421:clc.cc:2942] T1 CLC CLI command arguments[1] ='cleanup'
    . The second error is component crash detected by ava_down
    Oct 23 12:47:29.177223 osafamfnd [421:err.cc:0407] NO 'safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' faulted due to 'avaDown' : Recovery is 'componentRestart'
    Oct 23 12:47:29.177508 osafamfnd [421:clc.cc:1573] >> avnd_comp_clc_xxxing_instfail_hdler: 'safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon': Instantiate fail event in Instantiating/Restarting State
    Oct 23 12:47:29.177548 osafamfnd [421:clc.cc:2942] T1 CLC CLI command arguments[1] ='cleanup'

    The first cleanup succeeds, amfnd starts clc instantiate
    Oct 23 12:47:29.181102 osafamfnd [421:clc.cc:1702] >> avnd_comp_clc_xxxing_cleansucc_hdler: 'safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon': Cleanup success event in the instantiating/restarting state
    Oct 23 12:47:29.181127 osafamfnd [421:clc.cc:2942] T1 CLC CLI command arguments[1] ='instantiate'

    The problem comes up when amfnd is waiting the return of clc instantiate, the return of second clc cleanup comes. At the begining, the second clc cleanup is just redundant because the first clc cleanup has been in progress

    As of this though, I created the patch for this ticket as in attachment. The patch avoids the same clc commands running in a row if amfnd has not got clc_resp (which indicates timeout/error) from the others. Still testing it.

    @Praveen: the solution of this patch sounds ok to you?

     
  • Minh Hon Chau

    Minh Hon Chau - 2015-10-28
    • status: accepted --> review
     
  • Anders Widell

    Anders Widell - 2015-11-02
    • Milestone: 4.5.2 --> 4.6.2
     
  • Nagendra Kumar

    Nagendra Kumar - 2015-11-04

    changeset: 7089:b2b826373aca
    branch: opensaf-4.6.x
    parent: 7086:30592d182ee1
    user: Minh Hon Chauminh.chau@dektech.com.au
    date: Wed Nov 04 17:17:22 2015 +0530
    summary: amfnd: do not run same clc commands at once [#1557]

    changeset: 7090:9965b8545459
    branch: opensaf-4.7.x
    parent: 7087:c84ea183755f
    user: Minh Hon Chauminh.chau@dektech.com.au
    date: Wed Nov 04 17:17:52 2015 +0530
    summary: amfnd: do not run same clc commands at once [#1557]

    changeset: 7091:e1b3fc6dccd0
    tag: tip
    parent: 7088:c9fa60abcf98
    user: Minh Hon Chauminh.chau@dektech.com.au
    date: Wed Nov 04 17:18:02 2015 +0530
    summary: amfnd: do not run same clc commands at once [#1557]

     

    Related

    Tickets: #1557


    Last edit: Nagendra Kumar 2015-11-04
  • Nagendra Kumar

    Nagendra Kumar - 2015-11-04
    • status: review --> fixed
     
  • Nagendra Kumar

    Nagendra Kumar - 2015-11-04

    Reverted previous commit on my name and commited on Minh name.

     

Log in to post a comment.