AMFND does not send a health check to AMFWD, so AMFWD kills AMFND, which generates an AMFND coredump.
The backtrace from the coredump shows that a deadlock happened between AMFND threads during passive monitoring.
Thread 6 (Thread 0x7ff56c5e0b00 (LWP 20005)):
#0 0x00007ff56b1cfa8c in lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ff56b1ca80b in pthread_mutex_lock () from /lib64/libpthread.so.0
#2 0x00007ff56b9ab8cd in osaf_mutex_lock_ordie (io_mutex=0x55c39fc6c260 <_avnd_cb+32>) at ./src/base/osaf_utility.h:80
Thread 1 (Thread 0x7ff56c678740 (LWP 18877)):
…
Thread 1 holds the lock and is waiting for Thread 6 to finish (pthread_cancel() followed by pthread_join()).
But Thread 6 is blocked waiting for the same lock and cannot be cancelled, because pthread_mutex_lock() is not a cancellation point.
Steps to reproduce:
In amfnd mon.c, add a sleep in the avnd_mon_req_del() routine after it takes the lock.
The sleep ensures that the monitoring thread gets invoked before the task is released.
Start a test application and trigger passive monitoring of it.
Kill the test application. This results in the deadlock, where:
the monitoring thread is waiting for the lock, and
the main thread is trying to cancel the monitoring thread.
Diff:
Fixed with the following commit:
commit 8f1e636e55d714228eb0c61ec4b4b03e40888460
Author: ravi-sekhar <ravisekhar.konda@oracle.com>
Date: Mon Apr 9 11:27:59 2018 +0530
amfnd: unlock before releasing the monitoring thread to avoid deadlock [#2818]
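The lock ordering described by the commit subject can be sketched as follows. This is a minimal, self-contained illustration and not the code from the commit: cb_lock, monitor_thread() and stop_monitor_fixed() are hypothetical stand-ins for the lock in avnd_cb, the passive monitoring thread and the teardown path. It assumes the default deferred cancellation type, so the monitoring thread can only be cancelled inside nanosleep() and never while it holds the lock.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for the mutex embedded in avnd_cb. */
static pthread_mutex_t cb_lock = PTHREAD_MUTEX_INITIALIZER;

static void short_sleep(long nanoseconds)
{
    struct timespec ts = {0, nanoseconds};
    nanosleep(&ts, NULL); /* nanosleep() is a cancellation point */
}

/* Hypothetical monitoring thread: takes the lock on every pass. */
static void *monitor_thread(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&cb_lock);   /* NOT a cancellation point */
        /* ... check the monitored processes here ... */
        pthread_mutex_unlock(&cb_lock);
        short_sleep(1000000L);          /* cancellation is acted on here */
    }
    return NULL;
}

/*
 * Teardown with the fixed ordering: drop the lock before cancelling and
 * joining. Calling pthread_join() while still holding cb_lock would hang
 * whenever the monitoring thread is blocked in pthread_mutex_lock().
 * Returns 0 if the thread was cancelled and joined, -1 otherwise.
 */
int stop_monitor_fixed(void)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, monitor_thread, NULL) != 0)
        return -1;

    pthread_mutex_lock(&cb_lock);
    short_sleep(10000000L);             /* let the monitor block on cb_lock */
    /* ... remove the monitoring request while holding the lock ... */
    pthread_mutex_unlock(&cb_lock);     /* unlock BEFORE releasing the thread */

    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;

    pthread_mutex_lock(&cb_lock);       /* re-take the lock for remaining work */
    pthread_mutex_unlock(&cb_lock);

    return result == PTHREAD_CANCELED ? 0 : -1;
}
```

Calling stop_monitor_fixed() from a main() built with -pthread should return 0. Moving the pthread_mutex_unlock() call to after pthread_join() reproduces the hang seen in the backtrace.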
Related
Tickets:
#2818