From: Konstantin S. <kon...@gm...> - 2008-03-28 11:44:48
Hi,
I'd like to collect ideas regarding the subject raised today in a
separate thread: how to decipher Helgrind's reports about 'Possible
data race'.
So, this is the usual format of a Helgrind race report:
- ACCESS_TYPE (read or write)
- memory address ADDR
- thread segment SEG
- thread THR
- stack dump of access ACCESS_CONTEXT
- stack dump of the place where ADDR was allocated:
  ALLOC_CONTEXT (or the name of a global variable).
- stack dump of the place where the last consistently used lock was
  used: LOCK_CONTEXT
- Previous state OLD_STATE, which indicates:
  - whether there were writes to this memory before (or only reads
    happened): R or W
  - in which segments and threads these previous accesses happened
    (like this: S123/T1 S456/T3 S987/T7)
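To make the fields above concrete, here is a rough C model of such a report. This is only a sketch: the type names, the fixed array size and format_old_state() are made up for illustration and are not Helgrind's internal structures. The three stack dumps are left out.

```c
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the report fields; NOT Helgrind internals. */
typedef enum { ACCESS_READ, ACCESS_WRITE } AccessType;

typedef struct {
    unsigned segment;            /* 123 for "S123" */
    unsigned thread;             /* 1 for "T1" */
} SegThr;

typedef struct {
    int had_writes;              /* nonzero: W, zero: only reads (R) */
    size_t n_prev;               /* number of valid entries in prev[] */
    SegThr prev[8];              /* previous accesses, e.g. S123/T1 */
} OldState;

typedef struct {
    AccessType access;           /* ACCESS_TYPE */
    uintptr_t addr;              /* ADDR */
    SegThr where;                /* SEG and THR of the reported access */
    OldState old_state;          /* OLD_STATE */
    /* ACCESS_CONTEXT, ALLOC_CONTEXT, LOCK_CONTEXT omitted in this sketch */
} RaceReport;

/* Formats OLD_STATE like "W S123/T1 S456/T3".
   Returns the string length, or -1 if buf is too small. */
int format_old_state(const OldState *st, char *buf, size_t cap)
{
    int len = snprintf(buf, cap, "%c", st->had_writes ? 'W' : 'R');
    for (size_t i = 0; i < st->n_prev; i++) {
        if (len < 0 || (size_t)len >= cap)
            return -1;
        int n = snprintf(buf + len, cap - (size_t)len, " S%u/T%u",
                         st->prev[i].segment, st->prev[i].thread);
        if (n < 0)
            return -1;
        len += n;
    }
    return (len >= 0 && (size_t)len < cap) ? len : -1;
}
```

With had_writes set and the three segment/thread pairs from the example above, the formatted string is "W S123/T1 S456/T3 S987/T7".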
So, if the race happens on a global var, life is easy: we just check
all uses of this var manually.
If there are too many uses, we can run Helgrind a second time with
--trace-addr=ADDR --trace-level=2 and get all the accesses.
If the race happens on a memory location allocated from the heap,
e.g. a field of a structure inside our code, --trace-addr may
not work (in my experience it never works on big apps).
This is because addresses allocated in multi-threaded programs differ
from run to run (idea: hack the allocator to make it more
reproducible; not sure if possible).
In this case VG_USERREQ__HG_TRACE_MEM is useful: we annotate the racy
field with this client request and rerun Helgrind with --trace-level=2
getting all the accesses.
In my experience it helps in ~50% of cases.
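A sketch of where such an annotation could go. TRACE_MEM() here is a hypothetical wrapper, and Job/job_new() are made-up names; in a real build the macro would issue the VG_USERREQ__HG_TRACE_MEM client request, whose exact invocation depends on the Valgrind headers in use. The stub below only evaluates its argument so the code compiles anywhere.

```c
#include <stdlib.h>

/* Hypothetical wrapper (an assumption, not a Valgrind macro). A
   Helgrind-enabled build would issue VG_USERREQ__HG_TRACE_MEM for
   `addr` here; this stub just evaluates the argument. */
#ifndef TRACE_MEM
#define TRACE_MEM(addr) ((void)(addr))
#endif

typedef struct {
    int id;
    int refcount;    /* illustrative: the field reported as racy */
} Job;

Job *job_new(int id)
{
    Job *j = malloc(sizeof *j);
    if (j == NULL)
        return NULL;

    /* Annotate right after allocation and before the first access, so
       tracing follows the field wherever the allocator placed it in
       this particular run. */
    TRACE_MEM(&j->refcount);

    j->id = id;
    j->refcount = 1;
    return j;
}
```

The point is that the annotation names the field rather than a numeric address, so it does not matter that the address changes from run to run.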
I think that sometimes printing the traces annoys the scheduler and
the race gets hidden (idea: instead of printing traces store them
somewhere and print only when showing the race).
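One way the "store now, print later" idea could look is a small ring buffer of access records that is dumped only when a race is actually reported. This is a sketch under my own assumptions: TraceRec, TraceBuf, the capacity and all function names are invented for illustration, and it ignores locking.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_CAP 4              /* tiny on purpose; a real buffer would be larger */

typedef struct {
    uintptr_t addr;              /* traced address */
    unsigned thread;             /* accessing thread */
    char kind;                   /* 'R' or 'W' */
} TraceRec;

typedef struct {
    TraceRec rec[TRACE_CAP];
    size_t next;                 /* slot the next record goes into */
    size_t count;                /* number of valid records, <= TRACE_CAP */
} TraceBuf;

/* Record an access without printing anything; overwrites the oldest
   record once the buffer is full. */
void trace_record(TraceBuf *tb, uintptr_t addr, unsigned thread, char kind)
{
    assert(tb != NULL);
    tb->rec[tb->next] = (TraceRec){ addr, thread, kind };
    tb->next = (tb->next + 1) % TRACE_CAP;
    if (tb->count < TRACE_CAP)
        tb->count++;
}

/* Copy the stored records, oldest first, into out[]; returns how many
   were copied (at most cap). */
size_t trace_snapshot(const TraceBuf *tb, TraceRec *out, size_t cap)
{
    assert(tb != NULL && out != NULL);
    size_t start = (tb->next + TRACE_CAP - tb->count) % TRACE_CAP;
    size_t n = tb->count < cap ? tb->count : cap;
    for (size_t i = 0; i < n; i++)
        out[i] = tb->rec[(start + i) % TRACE_CAP];
    return n;
}

/* Called only at the moment a race is reported. */
void trace_dump(const TraceBuf *tb, FILE *f)
{
    TraceRec tmp[TRACE_CAP];
    size_t n = trace_snapshot(tb, tmp, TRACE_CAP);
    for (size_t i = 0; i < n; i++)
        fprintf(f, "%c of %#lx by T%u\n", tmp[i].kind,
                (unsigned long)tmp[i].addr, tmp[i].thread);
}
```

Recording is then a couple of stores per access instead of a formatted print, which should disturb the timing much less. The price is that only the last TRACE_CAP accesses survive.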
Ok, but what shall we do if the race is inside some library code (e.g.
STL)? We can't annotate it...
Here is what I do (not perfect, and it requires a lot of manual work):
- On each segment creation I record the current context (stack dump)
(added ExeContext* field to Segment)
- When printing a race report I also print the contexts of all segments in
the OLD_STATE.
It gives me information like this: access to ADDR in thread T1
happened after context C1, access in T2 happened after C2, ...
Usually, C1 and C2 are quite far from the actual access :(
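Roughly, the two steps above amount to the following. This is a simplified stand-alone sketch, not the actual patch: ExeContext is replaced here by a stand-in struct holding symbolic frame names, and the function names are invented.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_FRAMES 4

/* Stand-in for Valgrind's ExeContext (an assumption for this sketch):
   just a short list of symbolic frame names. */
typedef struct {
    const char *frames[MAX_FRAMES];
    size_t n_frames;
} ExeContext;

typedef struct {
    unsigned id;                  /* 123 for "S123" */
    unsigned thread;              /* 1 for "T1" */
    const ExeContext *created_at; /* the added field: context at creation */
} Segment;

/* Step 1: remember the current context whenever a segment is created. */
Segment segment_create(unsigned id, unsigned thread, const ExeContext *now)
{
    Segment s = { id, thread, now };
    return s;
}

/* Step 2: when reporting a race, print the creation context of every
   segment mentioned in OLD_STATE. Returns the number of frames printed. */
size_t print_segment_contexts(FILE *out, const Segment *segs, size_t n)
{
    size_t printed = 0;
    for (size_t i = 0; i < n; i++) {
        fprintf(out, "S%u/T%u created at:\n", segs[i].id, segs[i].thread);
        const ExeContext *ec = segs[i].created_at;
        if (ec == NULL) {
            fprintf(out, "   (no context recorded)\n");
            continue;
        }
        for (size_t f = 0; f < ec->n_frames; f++) {
            fprintf(out, "   at %s\n", ec->frames[f]);
            printed++;
        }
    }
    return printed;
}
```

Each printed context only bounds the racy access from above: the access happened somewhere after that point in the given thread.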
But now I can find the actual access by creating new segments in
random parts of code starting from C1 and C2. (the new segments can be
created by annotating the code with
_VG_USERREQ__HG_PTHREAD_COND_SIGNAL_PRE(0xDEADBEAF))
A long process I should say... Just yesterday I spent 1.5 hours trying
to understand a particularly nasty race reported inside vector<>.
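For illustration, sprinkling such segment markers through suspect code could look like this. NEW_SEGMENT_MARKER() is a hypothetical stand-in: in a Helgrind build it would expand to the _VG_USERREQ__HG_PTHREAD_COND_SIGNAL_PRE(0xDEADBEAF) annotation mentioned above, while the stub here only counts how many markers were passed. fill_and_sum() is a made-up function.

```c
#include <stddef.h>

/* Counts markers passed; exists only so the stub does something visible. */
static unsigned g_markers_hit;

/* Hypothetical stand-in for the segment-creating annotation. */
#define NEW_SEGMENT_MARKER() ((void)g_markers_hit++)

long fill_and_sum(int *v, size_t n)
{
    NEW_SEGMENT_MARKER();         /* marker A: before the writes */
    for (size_t i = 0; i < n; i++)
        v[i] = (int)i;

    NEW_SEGMENT_MARKER();         /* marker B: between writes and reads */
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];

    NEW_SEGMENT_MARKER();         /* marker C: after the reads */
    return sum;
}
```

If the next report shows a segment whose context is marker B, the racy access lies between B and C, and the markers can be moved closer together for the following run.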
Does anyone have a better idea?
--kcc
P.S. Julian, the mail to you still bounces:
----- Transcript of session follows -----
.. while talking to open-works.net
>>> DATA
<<< 554 5.7.1 Penalty Box error, please contact the server support to
ensure delivery