
#2818 amfnd: AMFWD kills AMFND because of deadlock between threads during Passive monitoring

5.18.04
fixed
None
defect
amf
-
major
False
2018-04-09
2018-03-22
No

AMFND does not send health checks to AMFWD, so AMFWD kills AMFND, which generates an AMFND coredump.

From the backtrace of the coredump, we can see that a deadlock occurred between the AMFND threads used for Passive Monitoring.

Thread 6 (Thread 0x7ff56c5e0b00 (LWP 20005)):
#0  0x00007ff56b1cfa8c in lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ff56b1ca80b in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007ff56b9ab8cd in osaf_mutex_lock_ordie (io_mutex=0x55c39fc6c260 <_avnd_cb+32>) at ./src/base/osaf_utility.h:80
#3  ncs_os_lock (lock=0x55c39fc6c260 <_avnd_cb+32>, request=<optimized out>, type=<optimized out>) at src/base/os_defs.c:432
#4  0x000055c39fa3bf99 in avnd_mon_pids (cb=0x55c39fc6c240 <_avnd_cb>) at src/amf/amfnd/mon.cc:303

Thread 1 (Thread 0x7ff56c678740 (LWP 18877)):
#0  0x00007ff56b1c99ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007ff56b9ab304 in ncs_os_task (task=task@entry=0x7ffe87ee8c70, request=request@entry=NCS_OS_TASK_RELEASE) at src/base/os_defs.c:292
#2  0x00007ff56b9b58dc in ncs_task_release (task_handle=<optimized out>) at src/base/sysf_tsk.c:80
#3  0x000055c39fa3c1b1 in avnd_mon_req_del (cb=cb@entry=0x55c39fc6c240 <_avnd_cb>, pid=pid@entry=24503) at src/amf/amfnd/mon.cc:169
#9  0x000055c39fa395ef in avnd_evt_process (evt=0x7ff564005c40) at src/amf/amfnd/main.cc:668

Thread 1 holds the “lock” and is waiting for thread 6 to finish (pthread_cancel(), then pthread_join()).
But thread 6 is blocked waiting for the same “lock” and cannot be cancelled.
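
Why thread 6 cannot be cancelled can be sketched in a few lines of C. With POSIX deferred cancellation (the default), pthread_mutex_lock() is not a cancellation point, so a pending pthread_cancel() is acted on only after the thread has obtained the lock and reached a cancellation point. This is an illustration, not AMFND code: cb_lock, monitor_thread() and cancel_is_deferred_past_mutex() are hypothetical names, and the main thread here deliberately unlocks before joining so that the sketch terminates.

```c
#include <pthread.h>

/* Hypothetical stand-in for the lock inside the AMFND control block. */
static pthread_mutex_t cb_lock = PTHREAD_MUTEX_INITIALIZER;
static int monitor_got_lock = 0;

/* Plays the role of the monitoring thread (thread 6 in the backtrace). */
static void *monitor_thread(void *arg) {
  (void)arg;
  pthread_mutex_lock(&cb_lock);   /* not a cancellation point */
  monitor_got_lock = 1;           /* reached even though a cancel is pending */
  pthread_mutex_unlock(&cb_lock);
  pthread_testcancel();           /* the pending cancel is acted on here */
  return NULL;
}

/* Returns 1 if the cancel request took effect only after the monitor
 * thread had obtained the lock, -1 if the thread could not be created. */
int cancel_is_deferred_past_mutex(void) {
  pthread_t tid;
  void *result = NULL;

  monitor_got_lock = 0;
  pthread_mutex_lock(&cb_lock);   /* main thread holds the lock (thread 1) */
  if (pthread_create(&tid, NULL, monitor_thread, NULL) != 0) {
    pthread_mutex_unlock(&cb_lock);
    return -1;
  }
  pthread_cancel(tid);            /* request is queued, not acted on yet */

  /* Calling pthread_join() at this point, with cb_lock still held,
   * is the ordering from the backtrace and would never return. */
  pthread_mutex_unlock(&cb_lock);
  pthread_join(tid, &result);

  return monitor_got_lock == 1 && result == PTHREAD_CANCELED;
}
```

Called from main() in a program built with -pthread, cancel_is_deferred_past_mutex() is expected to return 1. Moving the pthread_join() call above pthread_mutex_unlock() gives the ordering seen in the backtrace, and the program hangs.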

Steps to reproduce:
In amfnd mon.cc, add a sleep in the avnd_mon_req_del() routine after the lock is taken.
The sleep ensures that the monitoring thread gets invoked before the task is released.

Start a test application and trigger passive monitoring of it.
Kill the test application. This results in the deadlock, where
the monitoring thread is waiting for the lock and
the main thread is trying to cancel the monitoring thread.

Related

Tickets: #2818
Wiki: ChangeLog-5.18.04

Discussion

  • Ravi Sekhar Reddy

    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -4,10 +4,8 @@
     From backtrace of coredump, we can see that a deadlock happened between threads of AMFND for Passive Monitor.
    
     Thread 6 (Thread 0x7ff56c5e0b00 (LWP 20005)):
    -**[#0  0x00007ff56b1cfa8c in lll_lock_wait () from /lib64/libpthread.so.0
    -~~~
    -#1  0x00007ff56b1ca80b in pthread_mutex_lock () from /lib64/libpthread.so.0
    -#2  0x00007ff56b9ab8cd in osaf_mutex_lock_ordie (io_mutex=0x55c39fc6c260 <_avnd_cb+32>) at ./src/base/osaf_utility.h:80](http://)
    +0  0x00007ff56b1cfa8c in lll_lock_wait () from /lib64/libpthread.so.0
    +1  0x00007ff56b1ca80b in pthread_mutex_lock () from /lib64/libpthread.so.0                                                     #2  0x00007ff56b9ab8cd in osaf_mutex_lock_ordie (io_mutex=0x55c39fc6c260 <_avnd_cb+32>) at ./src/base/osaf_utility.h:80](http://)
     #3  ncs_os_lock (lock=0x55c39fc6c260 <_avnd_cb+32>, request=<optimized out>, type=<optimized out>) at src/base/os_defs.c:432
     #4  0x000055c39fa3bf99 in avnd_mon_pids (cb=0x55c39fc6c240 <_avnd_cb>) at src/amf/amfnd/mon.cc:303
    
    • status: unassigned --> accepted
     
  • Ravi Sekhar Reddy

    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -19,3 +19,12 @@
    
     Thread 1 is keeping “lock” and waiting for thread 6 finish (pthread_cancel() then pthread_join()).
     But Thread 6 is waiting for “lock” and cannot be cancelled. 
    +
    +Steps to reproduce:
    + In amfnd mon.c,  add some sleep in the avnd_mon_req_del() routine after taking lock.
    + The sleep will ensure that monitirung thread gets invoked before releaseing the task 
    + 
    + Start a test application, trigger passive monitoring of the test application.
    + Kill the test application, this will result in the deadlock  where
    +  Monitoring thread is waiting for the lock and 
    +  Main thread is trying to cancel the thread
    
     
  • Ravi Sekhar Reddy

    • status: accepted --> review
     
  • Ravi Sekhar Reddy

    • assigned_to: Ravi Sekhar Reddy
     
  • Ravi Sekhar Reddy

    • status: review --> fixed
     
  • Ravi Sekhar Reddy

    Fixed with the following commit

    commit 8f1e636e55d714228eb0c61ec4b4b03e40888460
    Author: ravi-sekhar <ravisekhar.konda@oracle.com>
    Date: Mon Apr 9 11:27:59 2018 +0530

    amfnd: unlock before releasing the monitoring thread to avoid deadlock [#2818]

     

    Related

    Tickets: #2818
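
The commit subject ("unlock before releasing the monitoring thread") suggests the ordering sketched below. The commit's diff is not shown in this ticket, so this is an assumption about the shape of the fix rather than the actual AMFND change. demo_cb_t, monitor_loop(), stop_monitor_unlock_first() and demo_unlock_before_release() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for the AMFND control block; not the real structure. */
typedef struct {
  pthread_mutex_t lock;
  pthread_t mon_tid;
  unsigned long polls;
} demo_cb_t;

/* Monitoring thread: takes the lock briefly, then sleeps with it released. */
static void *monitor_loop(void *arg) {
  demo_cb_t *cb = arg;
  const struct timespec delay = {0, 1000000}; /* 1 ms */
  for (;;) {
    pthread_mutex_lock(&cb->lock);
    cb->polls++;                  /* no cancellation point while locked */
    pthread_mutex_unlock(&cb->lock);
    nanosleep(&delay, NULL);      /* cancellation point, lock not held */
  }
}

/* Stops the monitoring thread. The lock is dropped before cancel/join so
 * the monitor can leave pthread_mutex_lock() and reach a cancellation
 * point. Returns 1 if the thread ended by cancellation. */
int stop_monitor_unlock_first(demo_cb_t *cb) {
  void *result = NULL;

  pthread_mutex_lock(&cb->lock);
  /* ... bookkeeping that needs the lock ... */
  pthread_mutex_unlock(&cb->lock);      /* unlock before releasing the thread */

  pthread_cancel(cb->mon_tid);
  pthread_join(cb->mon_tid, &result);   /* cannot block on cb->lock */

  pthread_mutex_lock(&cb->lock);
  /* ... remaining cleanup under the lock ... */
  pthread_mutex_unlock(&cb->lock);

  return result == PTHREAD_CANCELED;
}

/* Small driver: starts the monitor and stops it again.
 * Returns 1 on success, -1 if the thread could not be created. */
int demo_unlock_before_release(void) {
  demo_cb_t cb;
  int ok;

  pthread_mutex_init(&cb.lock, NULL);
  cb.polls = 0;
  if (pthread_create(&cb.mon_tid, NULL, monitor_loop, &cb) != 0) {
    pthread_mutex_destroy(&cb.lock);
    return -1;
  }
  ok = stop_monitor_unlock_first(&cb);
  pthread_mutex_destroy(&cb.lock);
  return ok;
}
```

Because the monitoring thread never reaches a cancellation point while it holds the lock, cancelling it cannot leave the mutex locked. Code that drops and re-takes a lock this way has to tolerate state changing in between, which the sketch does not model.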

