From: Saurav S. <Sau...@ha...> - 2012-01-26 20:25:13
Hi,

I'm using dbus-c++ (the 6.15.2011 version) in a development project. This is a multi-threaded application: the dispatcher runs in one thread, and other threads make D-Bus client method calls. I didn't notice the pipe-based synchronization in the echo client example, so all of my worker threads currently call the dbus-c++ client methods directly.

I have seen a few different deadlock scenarios in my application. All of them boil down to basically the same thing:

* The dispatcher thread is holding _mutex_t and trying to handle an expired timeout.
* Either as part of handling the timeout, or from a different thread, libdbus tries to add or remove another timeout.
* This calls back up into dbus-c++, which tries to take the _mutex_t lock again.

This can happen entirely within the dispatcher thread, as the stack trace below shows. Here the dispatcher thread calls into libdbus, which calls back up into the dispatcher through the on_rem_timeout callback handler. This specific case might be handled by making _mutex_t recursive, but that won't solve the whole problem.
  #0  0x00007f5e176cd2f4 in __lll_lock_wait () from /lib64/libpthread.so.0
  #1  0x00007f5e176c8d51 in _L_lock_110 () from /lib64/libpthread.so.0
  #2  0x00007f5e176c8671 in __pthread_mutex_lock (mutex=0x591490) at pthread_mutex_lock.c:86
  #3  0x00007f5e17cd4a15 in DBus::DefaultMutex::lock () from /lib64/libdbus-c++-1.so.0
  #4  0x00007f5e17cd5f5c in DBus::DefaultTimeout::~DefaultTimeout () from /lib64/libdbus-c++-1.so.0
  #5  0x00007f5e17cd8d61 in DBus::BusTimeout::~BusTimeout () from /lib64/libdbus-c++-1.so.0
  #6  0x00007f5e17cd756b in DBus::BusDispatcher::rem_timeout () from /lib64/libdbus-c++-1.so.0
  #7  0x00007f5e17ccfcd1 in DBus::Dispatcher::Private::on_rem_timeout () from /lib64/libdbus-c++-1.so.0
  #8  0x00007f5e17e195de in _dbus_timeout_list_remove_timeout () from /lib64/libdbus-1.so.3
  #9  0x00007f5e17dfbb90 in protected_change_timeout () from /lib64/libdbus-1.so.3
  #10 0x00007f5e17dfbc2e in _dbus_connection_remove_timeout_unlocked () from /lib64/libdbus-1.so.3
  #11 0x00007f5e17dfe958 in reply_handler_timeout () from /lib64/libdbus-1.so.3
  #12 0x00007f5e17e196ea in dbus_timeout_handle () from /lib64/libdbus-1.so.3
  #13 0x00007f5e17ccfe76 in DBus::Timeout::handle () from /lib64/libdbus-c++-1.so.0
  #14 0x00007f5e17cd7752 in DBus::BusDispatcher::timeout_expired () from /lib64/libdbus-c++-1.so.0
  #15 0x00007f5e17cd8ce3 in DBus::Callback<DBus::BusDispatcher, void, DBus::DefaultTimeout&>::call () from /lib64/libdbus-c++-1.so.0
  #16 0x00007f5e17cd68c0 in DBus::Slot<void, DBus::DefaultTimeout&>::operator() () from /lib64/libdbus-c++-1.so.0
  #17 0x00007f5e17cd4dbe in DBus::DefaultMainLoop::dispatch () from /lib64/libdbus-c++-1.so.0
  #18 0x00007f5e17cd7866 in DBus::BusDispatcher::do_iteration () from /lib64/libdbus-c++-1.so.0
  #19 0x00007f5e17cd7ad0 in DBus::BusDispatcher::enter () from /lib64/libdbus-c++-1.so.0

It can also happen across multiple threads, as the following stack traces show.
In this case, the dispatcher holds _mutex_t inside dbus-c++ and is trying to get the connection lock inside libdbus. At the same time, a worker thread holds the connection lock inside libdbus and is trying to get _mutex_t inside dbus-c++.

Thread 78 (Thread 0x41a52950 (LWP 2451)):
  #0  0x00007f77f1e242f4 in __lll_lock_wait () from /lib64/libpthread.so.0
  #1  0x00007f77f1e1fdbd in _L_lock_991 () from /lib64/libpthread.so.0
  #2  0x00007f77f1e1fb6a in __pthread_mutex_lock (mutex=0x725350) at pthread_mutex_lock.c:69
  #3  0x00007f77f257fc2a in _dbus_pthread_mutex_lock () from /lib64/libdbus-1.so.3
  #4  0x00007f77f257087e in _dbus_mutex_lock () from /lib64/libdbus-1.so.3
  #5  0x00007f77f2552718 in _dbus_connection_lock () from /lib64/libdbus-1.so.3
  #6  0x00007f77f256ac48 in _dbus_pending_call_get_connection_and_lock () from /lib64/libdbus-1.so.3
  #7  0x00007f77f2555932 in reply_handler_timeout () from /lib64/libdbus-1.so.3
  #8  0x00007f77f25706ea in dbus_timeout_handle () from /lib64/libdbus-1.so.3
  #9  0x00007f77f2427292 in DBus::Timeout::handle () from /lib64/libdbus-c++-1.so.0
  #10 0x00007f77f242ecee in DBus::BusDispatcher::timeout_expired () from /lib64/libdbus-c++-1.so.0
  #11 0x00007f77f243016f in DBus::Callback<DBus::BusDispatcher, void, DBus::DefaultTimeout&>::call () from /lib64/libdbus-c++-1.so.0
  #12 0x00007f77f242de4a in DBus::Slot<void, DBus::DefaultTimeout&>::operator() () from /lib64/libdbus-c++-1.so.0
  #13 0x00007f77f242c841 in DBus::DefaultMainLoop::dispatch () from /lib64/libdbus-c++-1.so.0
  #14 0x00007f77f242ee62 in DBus::BusDispatcher::do_iteration () from /lib64/libdbus-c++-1.so.0
  #15 0x00007f77f242f0cc in DBus::BusDispatcher::enter () from /lib64/libdbus-c++-1.so.0

Thread 60 (Thread 0x40c7f950 (LWP 3098)):
  #0  0x00007f77f1e242f4 in __lll_lock_wait () from /lib64/libpthread.so.0
  #1  0x00007f77f1e1fd51 in _L_lock_110 () from /lib64/libpthread.so.0
  #2  0x00007f77f1e1f671 in __pthread_mutex_lock (mutex=0x590490) at pthread_mutex_lock.c:86
  #3  0x00007f77f242c0f9 in DBus::DefaultMutex::lock () from /lib64/libdbus-c++-1.so.0
  #4  0x00007f77f242d8fc in DBus::DefaultTimeout::DefaultTimeout () from /lib64/libdbus-c++-1.so.0
  #5  0x00007f77f242f93a in DBus::BusTimeout::BusTimeout () from /lib64/libdbus-c++-1.so.0
  #6  0x00007f77f242f9ee in DBus::BusDispatcher::add_timeout () from /lib64/libdbus-c++-1.so.0
  #7  0x00007f77f2426cc0 in DBus::Dispatcher::Private::on_add_timeout () from /lib64/libdbus-c++-1.so.0
  #8  0x00007f77f257056f in _dbus_timeout_list_add_timeout () from /lib64/libdbus-1.so.3
  #9  0x00007f77f2552b6b in protected_change_timeout () from /lib64/libdbus-1.so.3
  #10 0x00007f77f2552bf7 in _dbus_connection_add_timeout_unlocked () from /lib64/libdbus-1.so.3
  #11 0x00007f77f2552ca9 in _dbus_connection_attach_pending_call_unlocked () from /lib64/libdbus-1.so.3
  #12 0x00007f77f2555bc1 in dbus_connection_send_with_reply () from /lib64/libdbus-1.so.3
  #13 0x00007f77f2555e66 in dbus_connection_send_with_reply_and_block () from /lib64/libdbus-1.so.3
  #14 0x00007f77f2422a03 in DBus::Connection::send_blocking () from /lib64/libdbus-c++-1.so.0
  #15 0x00007f77f2418fed in DBus::ObjectProxy::_invoke_method () from /lib64/libdbus-c++-1.so.0
  #16 0x00007f77f24139d6 in DBus::InterfaceProxy::invoke_method () from /lib64/libdbus-c++-1.so.0

I've spent several days on this and haven't been able to figure out a solution. I can probably handle the case of adding a timeout by making the DefaultTimeout constructor put the new timeout in a temporary list that's protected by a different lock, and then having dispatch() copy those new timeouts into the main _timeouts list. Deleting a timeout is trickier, however, because libdbus frees the DBusTimeout structure after the callback returns. So if I defer the deletion in dbus-c++, or release the _mutex_t lock before processing the timeout, I risk accessing freed memory.

The only solution I've been able to come up with is that the dispatcher thread needs to be synchronized with my worker threads.
The echo-client example does this through pipes, but switching my application to pipes would require a major rewrite of a lot of code. The other solution I can think of is to add a new Dispatcher::enter method that takes a mutex, use that mutex to protect the whole dispatch loop, and have all my worker threads take the same mutex when calling a client method. That way, only one thread is ever inside libdbus at a time. Of course, this isn't a very elegant solution, and it also fully serializes the IPC, which isn't ideal.

I thought I'd reach out to the list and see if anyone has advice to share. Has anyone else encountered this before? Is there a patch for it in a branch somewhere that I haven't found? Even general suggestions on how to address this would be very helpful.

Thanks,
Saurav