[Dbus-cxx-devel] Some 'sent' messages don't get sent
Status: Beta
Brought to you by:
rvinyard
From: Patrick A. <pa...@ph...> - 2009-08-28 09:20:52
|
Hi all: I'm having a bit of a problem with messages occasionally getting 'stuck' and being unsent somehow. I've turned on libdbus debugging and dbus-cxx debugging, and I think I see why it's happening, but I can't see where the bug is. One bug I *did* find that I really, really thought would fix it, but didn't, was that Connection::on_remove_watch_callback was signalling "signal_add_watch" rather than "signal_remove_watch", and similarly Connection::on_watch_toggled_callback was signaling "signal_add_watch" instead of "signal_watch_toggled" (connection.cpp, lines 845 and 851) I see this problem when one process calls a method in another process, and the reply message gets "lost" - it never shows up in a D-Bus monitor, and I'm pretty sure it's never getting sent: A "correct" reply looks like this. I put a debug statement in Connection::send to see when 'send' is called, so that's what starts this debugging output. 3081477008: Connection::send 16976: LOCK: dbus_connection_send 16976: Message 0x88d7df8 (2 no path no interface no member 'i') for :1.36 added to outgoing queue 0x88d79d0, 1 pending to send 16976: Message 0x88d7df8 serial is 5 16976: _dbus_connection_do_iteration_unlocked start 16976: UNLOCK: _dbus_connection_acquire_io_path 16976: _dbus_connection_acquire_io_path locking io_path_mutex 16976: _dbus_connection_acquire_io_path start connection->io_path_acquired = 0 timeout = 0 16976: _dbus_connection_acquire_io_path end connection->io_path_acquired = 1 we_acquired = 1 16976: _dbus_connection_acquire_io_path unlocking io_path_mutex 16976: LOCK: _dbus_connection_acquire_io_path 16976: Transport iteration flags 0x1 timeout -1 connected = 1 16976: iteration flags = write timeout = -1 read_watch = 0x88d7680 write_watch = 0x88d7658 fd = 3 16976: do_writing(), have_messages = 1, fd = 3 16976: wrote 52 bytes of 52 16976: Message 0x88d7df8 (2 no path no interface no member 'i') removed from outgoing queue 0x88d79d0, 0 left to send 16976: check_write_watch(): needed = 0 on connection 0x88d79d0 watch 0x88d7658 fd = 3 outgoing messages exist 0 16976: UNLOCK: protected_change_watch 16976: LOCK: protected_change_watch 16976: ... leaving do_iteration() 16976: _dbus_transport_do_iteration end 16976: _dbus_connection_release_io_path locking io_path_mutex 16976: _dbus_connection_release_io_path start connection->io_path_acquired = 1 16976: _dbus_connection_release_io_path unlocking io_path_mutex 16976: _dbus_connection_do_iteration_unlocked end 16976: _dbus_connection_send_preallocated_and_unlock middle 16976: dispatch status = complete is_connected = 1 16976: UNLOCK: _dbus_connection_update_dispatch_status_and_unlock 3081477008: Object::handle_message: message was handled And here's one from later in that session, when the message was "lost" and the reply timed out: 3081477008: Connection::send 16976: LOCK: dbus_connection_send 16976: Message 0x88d9bf0 (2 no path no interface no member 's') for :1.37 added to outgoing queue 0x88d79d0, 1 pending to send 16976: Message 0x88d9bf0 serial is 8 16976: _dbus_connection_do_iteration_unlocked start 16976: UNLOCK: _dbus_connection_acquire_io_path 16976: _dbus_connection_acquire_io_path locking io_path_mutex 16976: _dbus_connection_acquire_io_path start connection->io_path_acquired = 1 timeout = 0 16976: _dbus_connection_acquire_io_path waiting 0 for IO path to be acquirable 16976: _dbus_connection_acquire_io_path end connection->io_path_acquired = 1 we_acquired = 0 16976: _dbus_connection_acquire_io_path unlocking io_path_mutex 16976: LOCK: _dbus_connection_acquire_io_path 16976: _dbus_connection_do_iteration_unlocked end 3081477008: Dispatcher::on_wakeup_main 16976: _dbus_connection_send_preallocated_and_unlock middle 16976: dispatch status = complete is_connected = 1 16976: UNLOCK: _dbus_connection_update_dispatch_status_and_unlock 3081477008: Object::handle_message: message was handled >From what I've been able to figure out, whenever dbus_connection_send can't write to the socket immediately, it's supposed to queue the message, and if the queue went from 0->1 it's supposed to toggle the write watch to tell the main thread to select()/poll() on the write fd. As far as I can tell, that's not happening for some reason. I never see on_watch_toggled_callback called, and the write watch is never enabled (it's added to the watch list unenabled). The next time that Connection::send is called, it had 2 pending to send, then 3 pending to send, etc., so clearly the write watch isn't being handled. At that point the programs which are connecting to it usually die due to a timeout expiring, so it's not a temporary thing. I worked around the problem by having Connection::send call Connection::flush, but that's not exactly the ideal solution. Works for now, though... I'm attaching the full debug log - it's pretty verbose, and there's a lot going on (there's output from the main program as well, and also a few additional debugging outputs I added). (Note that this is libdbus 1.2.16., and dbus-cxx 0.4.1) Patrick |