Re: [Dbus-cxx-devel] Race hack fix

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Patrick Allison wrote:
>> > There are two ways to actually fix this (well, 3 really):
>> > 1) Set a mutex between the watch thread and the dispatch thread such
>> > that you never dispatch while handling watches, and you never handle
>> > watches while dispatching. For some reason I couldn't get this to
>> work.
>> > It should be simple: just a mutex lock around each if (FD_ISSET()) {}
>> > block, and a mutex lock around the (*ci)->dispatch() calls. I probably
>> > just screwed it up simply somehow.
>> >
>>
>> I'd prefer the simpler approach, but it may be a week or two before I
>> have
>> time to look into it. Things are pretty hectic right now.
>>
>
> Well, which one is the "simpler approach" is debatable. The downside to
> putting mutexes around the watch/dispatch is of course that if a
> dispatch takes a fair amount of time, you won't be able to handle reads
> or writes during that time.
>
> But it is probably best right now to implement it that way - I've been
> getting intermittent weirdness (once every few days) where the watch
> thread blows up in an infinite loop inside the sigc++ library when it
> tries to emit the "dispatch_status_changed" signal, and so I hacked the
> dispatch/watch threads to move the actual watch handling inside the
> dispatch thread (the watch thread just wakes the dispatch thread when a
> file descriptor is ready, and then waits for the dispatch thread to
> handle it).
>
> That fixed the problem, so I think there might be other problems inside
> libdbus with trying to dispatch at the same time as watch handling. But
> I don't recommend my approach because it's godawful, and a simple mutex
> should solve that.
>
> I'll try to actually write up an simple example piece of code which
> blows up this way in a few days, which should help. Sadly I can't just
> reuse the code in the application I'm using because it's waay too
> complex.
>
> Definitely understand the "pretty hectic" bit, though - that's why I
> hadn't had time to even check the current code for a long while!
>
> Personally I just wish we'd hear *anything* from the D-Bus developers
> regarding this issue, though, considering it's definitely a bug in
> libdbus and not a bug in dbus-cxx. It's not even really a bug involving
> watch handling/dispatching at the same time - it's just that there's
> only a very narrow window for the bug to happen.
>

I just pushed out a new release, but I still haven't addressed the race
condition.

I'd like to find something that would allow us to simultaneously handle
watches and dispatch without having to modify too much.

However, I did noticed that in the newest release of dbus (1.2.22) there
is a fix for a race condition in the watch handlers.

The old code was:
_dbus_connection_acquire_io_path (connection, -1);

And the new code checks the return value and unlocks the connection:
  if (!_dbus_connection_acquire_io_path (connection, 1))
    {
      /* another thread is handling the message */
      CONNECTION_UNLOCK (connection);
      return TRUE;
    }

I haven't traced the code to see if it's going to fix the issue we
currently have, or even developed a simple test case.

I keep hoping to "acquire" a test case by stumbling across it, but I
haven't been bitten by the bug yet.

-- 

Rick