Re: [Dbus-cxx-devel] Race hack fix
Status: Beta
Brought to you by:
rvinyard
From: Rick L. V. Jr. <rvi...@cs...> - 2010-03-18 16:02:44
|
Patrick Allison wrote: >> > There are two ways to actually fix this (well, 3 really): >> > 1) Set a mutex between the watch thread and the dispatch thread such >> > that you never dispatch while handling watches, and you never handle >> > watches while dispatching. For some reason I couldn't get this to >> work. >> > It should be simple: just a mutex lock around each if (FD_ISSET()) {} >> > block, and a mutex lock around the (*ci)->dispatch() calls. I probably >> > just screwed it up simply somehow. >> > >> >> I'd prefer the simpler approach, but it may be a week or two before I >> have >> time to look into it. Things are pretty hectic right now. >> > > Well, which one is the "simpler approach" is debatable. The downside to > putting mutexes around the watch/dispatch is of course that if a > dispatch takes a fair amount of time, you won't be able to handle reads > or writes during that time. > > But it is probably best right now to implement it that way - I've been > getting intermittent weirdness (once every few days) where the watch > thread blows up in an infinite loop inside the sigc++ library when it > tries to emit the "dispatch_status_changed" signal, and so I hacked the > dispatch/watch threads to move the actual watch handling inside the > dispatch thread (the watch thread just wakes the dispatch thread when a > file descriptor is ready, and then waits for the dispatch thread to > handle it). > > That fixed the problem, so I think there might be other problems inside > libdbus with trying to dispatch at the same time as watch handling. But > I don't recommend my approach because it's godawful, and a simple mutex > should solve that. > > I'll try to actually write up an simple example piece of code which > blows up this way in a few days, which should help. Sadly I can't just > reuse the code in the application I'm using because it's waay too > complex. > > Definitely understand the "pretty hectic" bit, though - that's why I > hadn't had time to even check the current code for a long while! > > Personally I just wish we'd hear *anything* from the D-Bus developers > regarding this issue, though, considering it's definitely a bug in > libdbus and not a bug in dbus-cxx. It's not even really a bug involving > watch handling/dispatching at the same time - it's just that there's > only a very narrow window for the bug to happen. > I just pushed out a new release, but I still haven't addressed the race condition. I'd like to find something that would allow us to simultaneously handle watches and dispatch without having to modify too much. However, I did noticed that in the newest release of dbus (1.2.22) there is a fix for a race condition in the watch handlers. The old code was: _dbus_connection_acquire_io_path (connection, -1); And the new code checks the return value and unlocks the connection: if (!_dbus_connection_acquire_io_path (connection, 1)) { /* another thread is handling the message */ CONNECTION_UNLOCK (connection); return TRUE; } I haven't traced the code to see if it's going to fix the issue we currently have, or even developed a simple test case. I keep hoping to "acquire" a test case by stumbling across it, but I haven't been bitten by the bug yet. -- Rick |