#3223 amfnd: stucks at shutdown until timeout

5.20.11
fixed
None
enhancement
amf
nd
5
minor
False
2020-10-12
2020-10-01
No
2020-10-02 13:45:01.655 SC-2 opensafd: Stopping OpenSAF Services
2020-10-02 13:45:01.657 SC-2 osafamfnd[279]: NO Shutdown initiated
2020-10-02 13:45:01.657 SC-2 osafamfnd[279]: NO Waiting for 'safSi=A,safApp=testapp' (state 2)
2020-10-02 13:45:02.980 SC-2 amfclccli[579]: DB CLEANUP response 'kill(pid=522)'
2020-10-02 13:45:02.987 SC-2 amfclccli[579]: WA Failed to kill pid=522 with signal 9 - [Errno 3] No such process
2020-10-02 13:45:03.098 SC-2 osafamfnd[279]: NO 'safSu=2,safSg=1,safApp=testapp' Presence State TERMINATING => UNINSTANTIATED
2020-10-02 13:45:03.099 SC-2 osafamfnd[279]: NO Terminated all components in 'safSu=2,safSg=1,safApp=testapp'
2020-10-02 13:45:03.099 SC-2 osafamfnd[279]: NO Informing director of sufailover
2020-10-02 13:45:03.162 SC-2 systemd[1]: Started Session 699 of user root.
2020-10-02 13:46:01.778 SC-2 opensafd: amfnd has not yet exited, killing it forcibly.

Amfnd needs to stop this SU-SI assignment and move it to the unassigned state instead of waiting for csiSetCallbackTimeout.
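
The requested behaviour can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual amfnd code: the types `SuSi` and `NodeDirector` and the functions `on_su_terminated` and `can_exit` are hypothetical names invented for this example, and the real state handling in amfnd is more involved.

```cpp
#include <string>

// Hypothetical model for illustration; these are NOT amfnd's real types.
enum class AssignState { Assigning, Assigned, Unassigned };
enum class Action { InformDirector, DropAssignment };

// One SU-SI assignment tracked by the node director.
struct SuSi {
  std::string su;
  std::string si;
  AssignState state;
};

struct NodeDirector {
  bool shutdown_initiated = false;

  // Called once all components of a faulted SU have been terminated.
  Action on_su_terminated(SuSi& susi) const {
    if (shutdown_initiated) {
      // The node is going down anyway, so no SU failover is started.
      // The pending assignment is moved to unassigned so that shutdown
      // does not keep waiting for it.
      susi.state = AssignState::Unassigned;
      return Action::DropAssignment;
    }
    // Normal operation: report the SU failover to the director.
    return Action::InformDirector;
  }

  // Shutdown can complete only when the assignment is no longer in progress.
  bool can_exit(const SuSi& susi) const {
    return shutdown_initiated && susi.state != AssignState::Assigning;
  }
};
```

In this sketch, an assignment that is still `Assigning` when shutdown starts blocks `can_exit` until `on_su_terminated` moves it to `Unassigned`, which mirrors the "Waiting for 'safSi=A,safApp=testapp'" line in the log above.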

Related

Wiki: ChangeLog-5.20.11

Discussion

  • Thang Duc Nguyen

    • summary: amf: spend time to terminate component during assignment --> amfnd: stucks at shutdown until timeout
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,20 +1,14 @@
     ~~~
    -2020-10-01 19:42:09.422 SC-2 osafamfnd[278]: NO Assigning 'safSi=A,safApp=testapp' ACTIVE to 'safSu=2,safSg=1,safApp=testapp'
    -2020-10-01 19:42:09.422 SC-2 A[521]: NO csiSetCallback: 'ALL' haState: SA_AMF_HA_ACTIVE, invocation 4275044360
    -2020-10-01 19:42:09.422 SC-2 A[521]: NO csiSetCallback: '0'
    -2020-10-01 19:42:09.422 SC-2 osafntfd[245]: NO kill_imcnproc_in_timewait: SIGKILL sent to osafntfimcnd process pid = 449
    -2020-10-01 19:42:09.426 SC-2 A[521]: DB terminate: safComp=A,safSu=2,safSg=1,safApp=testapp - 0
    -2020-10-01 19:42:09.428 SC-2 A[521]: NO Exiting
    -...
    -2020-10-01 19:42:09.613 SC-2 osafamfnd[278]: NO SU failover probation timer started (timeout: 1200000000000 ns)
    -2020-10-01 19:42:09.614 SC-2 osafamfnd[278]: NO Performing failover of 'safSu=2,safSg=1,safApp=testapp' (SU failover count: 1)
    -2020-10-01 19:42:09.614 SC-2 osafamfnd[278]: NO 'safComp=A,safSu=2,safSg=1,safApp=testapp' recovery action escalated from 'componentRestart' to 'suFailover'
    -2020-10-01 19:42:09.614 SC-2 osafamfnd[278]: NO 'safComp=A,safSu=2,safSg=1,safApp=testapp' faulted due to 'avaDown' : Recovery is 'suFailover'
    -2020-10-01 19:42:09.615 SC-2 osafamfnd[278]: NO Terminating components of 'safSu=2,safSg=1,safApp=testapp'(abruptly & unordered)
    -2020-10-01 19:42:09.615 SC-2 osafamfnd[278]: NO 'safSu=2,safSg=1,safApp=testapp' Presence State INSTANTIATED => TERMINATING
    -2020-10-01 19:42:09.616 SC-2 osafamfnd[278]: NO 'safSu=2,safSg=1,safApp=testapp' Presence State TERMINATING => TERMINATING
    -...
    -2020-10-01 19:42:19.461 SC-2 osafamfnd[278]: NO 'safComp=A,safSu=2,safSg=1,safApp=testapp' faulted due to 'csiSetcallbackTimeout' : Recovery is 'cleanup'
    +2020-10-02 13:45:01.655 SC-2 opensafd: Stopping OpenSAF Services
    +2020-10-02 13:45:01.657 SC-2 osafamfnd[279]: NO Shutdown initiated
    +2020-10-02 13:45:01.657 SC-2 osafamfnd[279]: NO Waiting for 'safSi=A,safApp=testapp' (state 2)
    +2020-10-02 13:45:02.980 SC-2 amfclccli[579]: DB CLEANUP response 'kill(pid=522)'
    +2020-10-02 13:45:02.987 SC-2 amfclccli[579]: WA Failed to kill pid=522 with signal 9 - [Errno 3] No such process
    +2020-10-02 13:45:03.098 SC-2 osafamfnd[279]: NO 'safSu=2,safSg=1,safApp=testapp' Presence State TERMINATING => UNINSTANTIATED
    +2020-10-02 13:45:03.099 SC-2 osafamfnd[279]: NO Terminated all components in 'safSu=2,safSg=1,safApp=testapp'
    +2020-10-02 13:45:03.099 SC-2 osafamfnd[279]: NO Informing director of sufailover
    +2020-10-02 13:45:03.162 SC-2 systemd[1]: Started Session 699 of user root.
    +2020-10-02 13:46:01.778 SC-2 opensafd: amfnd has not yet exited, killing it forcibly.
     ~~~
    
    -AMF need removing assignment instead of waiting csiSetcallbackTimeout.
    +Amfnd needs to stop this SU-SI assigment and move it to unassigned state.
    
     
  • Thang Duc Nguyen

    • status: assigned --> review
     
  • Thang Duc Nguyen

    • status: review --> fixed
     
  • Thang Duc Nguyen

    commit fa78173f280133ceb47224bfbaf9e83b96873fc5 (HEAD -> develop, origin/develop)
    Author: thang.d.nguyen <thang.d.nguyen@dektech.com.au>
    Date: Sat Oct 3 09:22:27 2020 +0700

    amf: ignore sufailover when shutdown initiated [#3223]
    
    When active assignment is on going, node shutdown and
    sufailover happened. Amfnd tries to sufailover but
    not successful. Stop node stucks due to amfnd wait until
    csiSetCallbackTimeout.
    
    Amfnd needs to stop this SU-SI assigment and move it
    to unassigned state.
    
     
