Thread: [Dbus-cxx-devel] Race hack fix
Status: Beta
Brought to you by:
rvinyard
From: Patrick A. <pa...@ph...> - 2010-03-03 22:49:56
|
Hi all: The watch/dispatch race fix in dbus-cxx 0.6 still needs a bit more tweaking, unfortunately. The current fix does: after a watch is handled, it sets need_initiate_processing_hack, and then if that's set, it wakes up the dispatch thread. That's not exactly the problem, though - the dispatch thread will just wake up, see that it has nothing to do (since the dispatch status is complete) and go back to sleep. In fact, to remind, the problem is: 1) watch thread wakes up, select() says there's data to read 2) calls handle_read() on that Watch 3) This queues a message, which changes the dispatch status, which wakes the dispatch thread. 4) The dispatch thread dispatches the message, which calls Connection::send for the reply. The watch thread still holds the IO path, so this message gets silently queued 5) The watch thread finishes handle_read(). In the current hack, all that happens next is that 6) the watch thread wakes up the dispatch thread again 7) the dispatch thread sees the dispatch status is DISPATCH_COMPLETE and wonders why it was woken up before going back to sleep. There are two ways to actually fix this (well, 3 really): 1) Set a mutex between the watch thread and the dispatch thread such that you never dispatch while handling watches, and you never handle watches while dispatching. For some reason I couldn't get this to work. It should be simple: just a mutex lock around each if (FD_ISSET()) {} block, and a mutex lock around the (*ci)->dispatch() calls. I probably just screwed it up simply somehow. 2a) At the end of the watch thread, loop over the connections, and if there are messages pending, call Connection::flush on them. This is sadly a bit brute force because there could be messages pending due to the socket being full, in which case the thread will block unnecessarily. It's also a pain because the watch thread doesn't loop over connections at all, so it's not that elegant. 2b) Change DBus::Connection::signal_add_watch to take a Connection object as well as the watch object, and add a DBus::Connection pointer inside the Watch. Then, after the watch thread completes, loop over all watches in m_write_watches, and then if watch->connection()->has_messages_to_send && !watch->is_enabled() , call watch->connection()->flush(). You could do this with no changes at all currently, if you matched the watch->unix_fd() to the connection->unix_fd() but that seems complex. Not sure it's as simple as this, though, because the question of ownership of the DBus::Connection object exists, so maybe matching via the unix_fd() etc. is needed (geh). Patrick |
From: Rick L. V. Jr. <rvi...@cs...> - 2010-03-03 23:14:30
|
Patrick Allison wrote: > Hi all: > > The watch/dispatch race fix in dbus-cxx 0.6 still needs a bit more > tweaking, unfortunately. The current fix does: after a watch is handled, > it sets need_initiate_processing_hack, and then if that's set, it wakes > up the dispatch thread. > > That's not exactly the problem, though - the dispatch thread will just > wake up, see that it has nothing to do (since the dispatch status is > complete) and go back to sleep. In fact, to remind, the problem is: > > 1) watch thread wakes up, select() says there's data to read > 2) calls handle_read() on that Watch > 3) This queues a message, which changes the dispatch status, which wakes > the dispatch thread. > 4) The dispatch thread dispatches the message, which calls > Connection::send for the reply. The watch thread still holds the IO > path, so this message gets silently queued > 5) The watch thread finishes handle_read(). > > In the current hack, all that happens next is that > 6) the watch thread wakes up the dispatch thread again > 7) the dispatch thread sees the dispatch status is DISPATCH_COMPLETE and > wonders why it was woken up before going back to sleep. > > There are two ways to actually fix this (well, 3 really): > 1) Set a mutex between the watch thread and the dispatch thread such > that you never dispatch while handling watches, and you never handle > watches while dispatching. For some reason I couldn't get this to work. > It should be simple: just a mutex lock around each if (FD_ISSET()) {} > block, and a mutex lock around the (*ci)->dispatch() calls. I probably > just screwed it up simply somehow. > I'd prefer the simpler approach, but it may be a week or two before I have time to look into it. Things are pretty hectic right now. -- Rick |
From: Patrick A. <pa...@ph...> - 2010-03-04 20:27:03
|
> > There are two ways to actually fix this (well, 3 really): > > 1) Set a mutex between the watch thread and the dispatch thread such > > that you never dispatch while handling watches, and you never handle > > watches while dispatching. For some reason I couldn't get this to work. > > It should be simple: just a mutex lock around each if (FD_ISSET()) {} > > block, and a mutex lock around the (*ci)->dispatch() calls. I probably > > just screwed it up simply somehow. > > > > I'd prefer the simpler approach, but it may be a week or two before I have > time to look into it. Things are pretty hectic right now. > Well, which one is the "simpler approach" is debatable. The downside to putting mutexes around the watch/dispatch is of course that if a dispatch takes a fair amount of time, you won't be able to handle reads or writes during that time. But it is probably best right now to implement it that way - I've been getting intermittent weirdness (once every few days) where the watch thread blows up in an infinite loop inside the sigc++ library when it tries to emit the "dispatch_status_changed" signal, and so I hacked the dispatch/watch threads to move the actual watch handling inside the dispatch thread (the watch thread just wakes the dispatch thread when a file descriptor is ready, and then waits for the dispatch thread to handle it). That fixed the problem, so I think there might be other problems inside libdbus with trying to dispatch at the same time as watch handling. But I don't recommend my approach because it's godawful, and a simple mutex should solve that. I'll try to actually write up an simple example piece of code which blows up this way in a few days, which should help. Sadly I can't just reuse the code in the application I'm using because it's waay too complex. Definitely understand the "pretty hectic" bit, though - that's why I hadn't had time to even check the current code for a long while! Personally I just wish we'd hear *anything* from the D-Bus developers regarding this issue, though, considering it's definitely a bug in libdbus and not a bug in dbus-cxx. It's not even really a bug involving watch handling/dispatching at the same time - it's just that there's only a very narrow window for the bug to happen. Patrick |
From: Rick L. V. Jr. <rvi...@cs...> - 2010-03-18 16:02:44
|
Patrick Allison wrote: >> > There are two ways to actually fix this (well, 3 really): >> > 1) Set a mutex between the watch thread and the dispatch thread such >> > that you never dispatch while handling watches, and you never handle >> > watches while dispatching. For some reason I couldn't get this to >> work. >> > It should be simple: just a mutex lock around each if (FD_ISSET()) {} >> > block, and a mutex lock around the (*ci)->dispatch() calls. I probably >> > just screwed it up simply somehow. >> > >> >> I'd prefer the simpler approach, but it may be a week or two before I >> have >> time to look into it. Things are pretty hectic right now. >> > > Well, which one is the "simpler approach" is debatable. The downside to > putting mutexes around the watch/dispatch is of course that if a > dispatch takes a fair amount of time, you won't be able to handle reads > or writes during that time. > > But it is probably best right now to implement it that way - I've been > getting intermittent weirdness (once every few days) where the watch > thread blows up in an infinite loop inside the sigc++ library when it > tries to emit the "dispatch_status_changed" signal, and so I hacked the > dispatch/watch threads to move the actual watch handling inside the > dispatch thread (the watch thread just wakes the dispatch thread when a > file descriptor is ready, and then waits for the dispatch thread to > handle it). > > That fixed the problem, so I think there might be other problems inside > libdbus with trying to dispatch at the same time as watch handling. But > I don't recommend my approach because it's godawful, and a simple mutex > should solve that. > > I'll try to actually write up an simple example piece of code which > blows up this way in a few days, which should help. Sadly I can't just > reuse the code in the application I'm using because it's waay too > complex. > > Definitely understand the "pretty hectic" bit, though - that's why I > hadn't had time to even check the current code for a long while! > > Personally I just wish we'd hear *anything* from the D-Bus developers > regarding this issue, though, considering it's definitely a bug in > libdbus and not a bug in dbus-cxx. It's not even really a bug involving > watch handling/dispatching at the same time - it's just that there's > only a very narrow window for the bug to happen. > I just pushed out a new release, but I still haven't addressed the race condition. I'd like to find something that would allow us to simultaneously handle watches and dispatch without having to modify too much. However, I did noticed that in the newest release of dbus (1.2.22) there is a fix for a race condition in the watch handlers. The old code was: _dbus_connection_acquire_io_path (connection, -1); And the new code checks the return value and unlocks the connection: if (!_dbus_connection_acquire_io_path (connection, 1)) { /* another thread is handling the message */ CONNECTION_UNLOCK (connection); return TRUE; } I haven't traced the code to see if it's going to fix the issue we currently have, or even developed a simple test case. I keep hoping to "acquire" a test case by stumbling across it, but I haven't been bitten by the bug yet. -- Rick |
From: Patrick A. <pa...@ph...> - 2010-03-18 18:36:01
|
On Thu, 2010-03-18 at 10:02 -0600, Rick L. Vinyard, Jr. wrote: > However, I did noticed that in the newest release of dbus (1.2.22) there > is a fix for a race condition in the watch handlers. Sadly, it's the wrong fix. There are two threads, of course, and so the bug happens in two ways: 1) watch thread calls watch_handle while another thread has I/O path 2) another thread calls dbus_connection_send while watch thread has I/O path That patch fixes case #1. Case #2 is still unhandled. > I keep hoping to "acquire" a test case by stumbling across it, but I > haven't been bitten by the bug yet. I can try to write one up soon. The best way to have it happen is to have D-Bus method A emit D-Bus signal B, and then call method A a bajillion times. Needs to be run on an SMP machine, obviously. Still takes about 5-10 minutes, though. Patrick |