I think the algorithm outlined below will work. Computing the danger set at each context switch will take some CPU time, but has the big advantage that precise call stacks can be reported without having to store a call stack for each memory access. There is a difference however between reporting races at the end of a segment versus reporting them immediately: when reporting each race immediately, it becomes possible that the same data race is reported more than once ( e.g. when the same address is accessed repeatedly in a loop). Probably care will have to be taken to ensure that each race is only reported once.

On 9/4/06, Julian Seward <jseward@acm.org> wrote:

> data races. E.g. the ThreadChecker that is part of Intel VTune reports the
> complete call stack for both first and second accesses involved in a data
> race. The granularity of this information is one source line: an overview

Is the following a useful idea?  It would give a complete call stack for
the second access in each data race and requires acceptable memory overhead.
All assuming there are no flaws in the idea.

-------

First, forget about the suggestion I made previously in this thread.
The following builds on the vanilla drd implementation that already
exists.

We can take advantage of the fact that V only allows one thread to run
at any time.  When a thread is scheduled for execution, compute a
"danger set".  This is the set of memory locations accessed by the
union of segments which are unordered relative to the segment which is
about to be scheduled.  (ie, the usual deal).

As the segment runs, check each load and store against the danger set.
If the load/store addresses exist in the danger set, we know that the
standard drd algorithm will eventually flag them as race addresses.
So we might as well print a backtrace right now; then at least we will
know the backtrace of this second access.

In short the idea is to do the segment comparison incrementally, rather
than wait until segment completion, as in the current implementation.
This is possible because V's one-thread-at-a-time means only one
segment is ever changing at any one time.

The overheads are:

(1) for each load and store, not only do we need to mark the current
    segment's bitmaps; now we also need to check the danger set too.
    Not great, but tolerable.

(2) for each thread scheduling, need to construct the danger set.
    Could possibly be optimised; eg if a thread repeatedly stops for
    (eg) syscalls and then resumes, and no other thread runs in between,
    can reuse existing danger set.

(3) Extra space required to store the danger set.

So the question is, will it work, or did I miss something?

Consider two segments, A and B, both available to run, not ordered, and
containing a conflicting address.  Does the scheme guarantee still to
detect the conflict?  I believe so.

- Suppose the scheduling is such that A visits the conflicting address
  X for the first time before B does.  Then, for all subsequent
  scheduling(s) of B, X will be in the danger-set for B; hence if
  B accesses X then the conflict will be found.

- Same argument the other way round if B visits the conflicting address
  first.

Suppose A visits the conflicting address first, but there is a synchronising
event between A and B before B visits that address for the first time.
Then A and B acquire an ordering (A < B), so A's accesses are no longer
included in B's danger set, and so no conflict is reported (which is correct;
there is no race).  Of course this is just a restatement of the normal Diota
algorithm; it is not unique to the suggested modification.