|
From: Konstantin S. <kon...@gm...> - 2008-01-17 14:45:23
|
Julian, all,
I'd like to return to the discussion of false positives and false negatives
in helgrind (once you have time).
Previously, I sent a few examples with wrong reports (or with missing
reports).
Now I have some more examples in one file (see racecheck_unittest.cc,
attached).
The most interesting, I think, are:
test10: (false negative)
// Writer:                   Reader:
// 1. write(GLOB)            a. sleep(long enough so that GLOB
//                              is most likely initialized by Writer)
//                           b. read(GLOB)
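For reference, the test10 pattern can be sketched directly with pthreads (a hypothetical reconstruction -- names like run_test10 are illustrative, not the actual racecheck_unittest.cc code): the reader relies on a sleep instead of real synchronization, so the write/read pair race even though the read "usually" sees the initialized value.

```cpp
#include <pthread.h>
#include <unistd.h>

static int GLOB = 0;

static void* Writer(void*) {
  GLOB = 1;                  // 1. write(GLOB) -- no lock, no HB edge to Reader
  return nullptr;
}

static void* Reader(void*) {
  usleep(100 * 1000);        // a. sleep "long enough" -- not a synchronization
  return (void*)(long)GLOB;  // b. read(GLOB) -- races with the write
}

// Runs the pattern once and returns what the reader observed.
long run_test10() {
  pthread_t w, r;
  pthread_create(&w, nullptr, Writer, nullptr);
  pthread_create(&r, nullptr, Reader, nullptr);
  void* seen = nullptr;
  pthread_join(r, &seen);
  pthread_join(w, nullptr);
  return (long)seen;
}
```

The sleep makes observing 1 overwhelmingly likely, but nothing guarantees it -- which is exactly why this is a real race and a false negative when it goes unreported.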
test11: (false positive)
// Parent: Worker1, Worker2:
// 1. Start(workers) a. read(GLOB)
// 2. MU.Lock() b. MU.Lock()
// 3. while(COND != 2) /-------- c. CV.Signal()
// CV.Wait(&MU) <-------/ d. MU.Unlock()
// 4. MU.Unlock()
// 5. write(GLOB)
test32: (false positive)
// Parent: Writer: Reader:
// 1. Start(Reader) -----------------------\ .
// \ .
// 2. Start(Writer) ---\ \ .
// \---> a. MU.Lock() \--> A. sleep(long enough)
// b. write(GLOB)
// /---- c. MU.Unlock()
// 3. Join(Writer) <---/
// B. MU.Lock()
// C. read(GLOB)
// /------------ D. MU.Unlock()
// 4. Join(Reader) <----------------/
// 5. write(GLOB)
To fix test10 (i.e. to report the actual race) I think we need to
1. distinguish between ExclusiveM and ExclusiveR states (to be able to
detect the race).
2. keep the lockset even for exclusive states (to avoid false positives).
This was missing in the patch I sent you previously.
3. add one transition in msm__handle_read: ExclusiveM->ShM (to actually
report the race if the lockset is empty).
To fix test32 (i.e. *not* to report the non-existing race) I think we also
need 1. and 2.
To simplify understanding, here is what happens in test32 (helgrind sees
events in this order):
1. TRACE: Parent :: GLOB: NoAccess --> New
2. CREATE Reader
3. CREATE Writer
4. TRACE: wr Writer :: GLOB: New --> ExclusiveW(Writer)
5. JOIN: Writer :: GLOB: does not change
6. TRACE: rd Reader :: GLOB: ExclusiveW(Writer) --> ShM(Reader,
Writer,#Lset=1)
7. JOIN: Reader:: GLOB: ShM(Reader, Writer,#Lset=1) --> ShM(Parent,
Writer,#Lset=1)
8. TRACE: wr Parent :: GLOB: ShM(Parent, Writer,#Lset=1) --> ShM(Parent,
Writer,#Lset=0) # a race is reported
So, to fix this (IMHO) we need to check happens_before during all state
transitions.
For this particular case, we need to check that
happens_before(last_segment_of_Writer_where_we_accessed_GLOB,
current_segment_of_Parent)
and make this transition:
8. TRACE: wr Parent :: GLOB: ShM(Parent, Writer,#Lset=1) --> ExclM(Parent,
#Lset=0)
So, this actually requires keeping thread segments instead of threads in
shared states.
All this will make things slower, but maybe not much. So far I haven't seen
happens_before in helgrind's profiles.
Alternatively, we may do something tricky at the time we join the writer,
something similar to your 'cv-hack' patch.
I am not sure which is better.
I did not analyze test11 much, but it looks similar.
Note that the tests become slightly more complicated as we add
readers/writers.
To wrap up my proposal:
- There are now only two states (other than New and NoAccess): ShM and ShR.
- Each SVal has Lset.
- Each SVal has thread-segment-sets instead of thread-sets.
- Memory accessed by just one thread is still ShM/ShR, but with one element
in thread-segment-set (might be optimized to avoid WS calls).
- We check happens_before in all states.
What do you think?
--kcc
P.S. test35 is a unit test for a performance issue in
shadow_mem_make_NoAccess.
test34 and test33 are tests that create millions of TSETs/LSETs to
check the capacity of SVal.
P.P.S. There is one more crazy test which suggests that even what I
propose is not enough; see test28.
// Putter1:                   Getter:                      Putter2:
// 1. MU.Lock()                                            A. MU.Lock()
// 2. write(GLOB)                                          B. write(GLOB)
// 3. MU.Unlock()                                          C. MU.Unlock()
// 4. Q.Put() ---------\               /------------------ D. Q.Put()
// 5. MU.Lock()         \-> a. Q.Get() |                   E. MU.Lock()
// 6. read(GLOB)            b. Q.Get() <-/                 F. read(GLOB)
// 7. MU.Unlock()           c. read(GLOB)                  G. MU.Unlock()
P.P.P.S. Sorry for the long message :)
|
|
From: Konstantin S. <kon...@gm...> - 2008-01-23 09:19:10
|
Hi,

I've described the proposed memory state machine at
http://code.google.com/p/data-race-test/wiki/MSMProp1 .
It has some false positives but IMHO is better than the current one.
I will try to implement it in helgrind and see if it really works.

I also attempted to document the current helgrind state machine
( http://code.google.com/p/data-race-test/wiki/MSMHelgrind ).
Comments are welcome :)

Regarding the size of SVal: I would prefer 64-bit over 48-bit. These extra 16
bits will give us some room for experiments with 'heavy' state machines.

--kcc
|
From: Julian S. <js...@ac...> - 2008-01-23 12:50:59
|
> I've described the proposed memory state machine at
> http://code.google.com/p/data-race-test/wiki/MSMProp1 .

That's good. It will be useful to have some results from alternative
state machines -- essentially, to see if we can find a better compromise
between scheduling-sensitivity and false error rates.

Some comments (without really understanding all the consequences of your
proposal):

* I did not understand what the red/green boxes ("R:", "W:") signify.
  Maybe it would be simpler to remove them? At least for MSMHelgrind
  the names "ShR" and "ShM", by themselves, carry enough meaning for me.

* In MSMProp1, isn't "Read" state similar to "ShR" and "Write" similar
  to "ShM" ?

* I like that MSMProp1 removes E9/10/11/12 from MSMHelgrind. Those
  scanned all of memory and were potentially very expensive.

* So it is kind-of like MSMHelgrind, except that
  - The Excl is merged back into ShR / ShM
  - Segment-sets are tracked, not thread-sets
  - transition E6 allows transfer of ownership at any read event which
    happens-after all previous accesses -- MSMHelgrind only allows such
    transfers of memory in Excl state

* A general comment when you are dealing with segment sets rather than
  thread sets. There is a possible redundancy which you may need to
  remove in the implementation:

      HB(seg,segSet) is equivalent to HB(seg,max(segSet))

  and max(segSet) is always the same size or smaller than segSet.
  Intuitively, max(segSet) == segSet except that it only has the most
  recent seg for each thread:

      max(segSet) = { seg1 | seg1 <- segSet,
                      not (exists seg2 <- segSet
                           such that seg1 < seg2) }

  if that makes any sense. (Standard lattice-theory stuff.)

> I will try to implement it in helgrind and see if it really works.

Good.

> Regarding size of SVal: I would prefer 64-bit over 48-bit. These extra 16
> bits will give us some room for experiments with 'heavy' state machines.

48 or 64 bits is more or less unavoidable. Not sure 48-bit is practical
-- it might give fewer cache misses than 64-bit, but is likely to involve
more instructions to pack/unpack the values where needed.

------------

I like the idea of describing these state machines in a relatively
uniform notation.

I think it is possible to describe a pure happens-before detector in
the same framework (almost). This would give a lower bound on the false
error rate and so would be a useful reference point.

J
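The max(segSet) observation above can be sketched as follows (a minimal sketch with an assumed data model: segments as plain ints and the happens-before order passed in as a predicate -- helgrind's real types differ). Since HB(seg, segSet) is equivalent to HB(seg, max(segSet)), only the maximal segments need to be kept:

```cpp
#include <set>
#include <functional>

using Seg = int;
using SegSet = std::set<Seg>;
// happensBefore(a, b) == true iff segment a happens-before segment b.
using HB = std::function<bool(Seg, Seg)>;

// Keep only the maximal segments of ss: those not dominated
// (happened-before) by any other segment in ss.
SegSet maxFrontier(const SegSet& ss, const HB& happensBefore) {
  SegSet out;
  for (Seg s1 : ss) {
    bool dominated = false;
    for (Seg s2 : ss)
      if (s1 != s2 && happensBefore(s1, s2)) { dominated = true; break; }
    if (!dominated) out.insert(s1);  // s1 is maximal in ss
  }
  return out;
}
```

For example, if segments 1 < 2 < 3 all belong to one thread (a total order), the frontier of {1,2,3} is just {3}; two mutually concurrent segments are both kept.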
|
From: Julian S. <js...@ac...> - 2008-01-23 13:32:31
|
> I think it is possible to describe a pure happens-before detector in
> the same framework (almost).
Like this:
Generate segments as currently. Also, generate segments for each
lock/unlock operation.
Shadow information for each address A is now a pair:
"w-owner" (thread segment)
"r-owners" (set of thread segments)
no locksets are tracked
w-owner(A) is the segment that most recently wrote A.
r-owners(A) are the segment(s) that most recently read A. As per
comment in previous message, r-owners is potentially redundant; we
only need to store the maximum-frontier subset of r-owners. (*)
For segments S1, S2,
define S1 >= S2 to mean "S1 == S2 or S1 happens-after S2"
Then the update rules are:
for a write in seg S:
valid if S >= w-owner and S >= all elements in r-owners
if (!valid) report race
w-owner := S
r-owners := {}
// intuition: a valid write must happen-after the previous
// write, or happen in the same segment. Also, a valid write
// must happen-after all reads, or be in the same segment
for a read in seg S:
valid if S >= w-owner
r-owners := union(r-owners, {S})
// intuition: a valid read must happen-after the previous write.
// but that's all; it can happen in parallel with other reads.
// we merely need to record that it happened, in order that we
// can check that a later write happens-after all reads
(*) follows trivially from definition of max(a segment set). The
only comparison against r-owners is
"S >= all elements in r-owners"
which is equivalent to
"S >= all elements in max(r-owners)"
J
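The update rules above translate almost line-for-line into code (a sketch under assumed types: segment IDs are ints, segment 0 stands for the initial segment, and the >= relation is passed in as a predicate -- this is the posted algorithm, not helgrind source):

```cpp
#include <set>
#include <functional>

using Seg = int;
// geq(a, b) means "a == b or a happens-after b".
using Geq = std::function<bool(Seg, Seg)>;

struct Shadow {
  Seg wOwner = 0;          // segment that most recently wrote A
  std::set<Seg> rOwners;   // segments that most recently read A
};

// Returns true if the access is valid; false means "report race".
bool handleWrite(Shadow& sh, Seg s, const Geq& geq) {
  bool valid = geq(s, sh.wOwner);            // must happen-after the write...
  for (Seg r : sh.rOwners)
    valid = valid && geq(s, r);              // ...and after all reads
  sh.wOwner = s;                             // w-owner := S
  sh.rOwners.clear();                        // r-owners := {}
  return valid;
}

bool handleRead(Shadow& sh, Seg s, const Geq& geq) {
  bool valid = geq(s, sh.wOwner);            // must happen-after the write
  sh.rOwners.insert(s);                      // r-owners := union(r-owners, {S})
  return valid;
}
```

With a totally ordered geq every access is valid; with two concurrent segments (neither >= the other) a write after a write in the other segment comes back invalid, i.e. a race.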
|
|
From: Konstantin S. <kon...@gm...> - 2008-01-23 13:19:33
|
>
>
> * I did not understand what the red/green boxes ("R:", "W:")
> signify. Maybe it would be simpler to remove them? At least
> for MSMHelgrind the names "ShR" and "ShM", by themselves,
> carry enough meaning for me.
That's just visual sugar. :)
I wanted to create a machine with states where we store separate segment sets
for R and W. Such states would have two boxes: one red and one green.
But I've abandoned it for now as it seems too complex (and useless :)).
Anyway, I agree they are redundant here.
>
>
> * In MSMProp1, isn't "Read" state similar to "ShR" and "Write"
> similar to "ShM" ?
Similar, but not the same. Hence the different names, to avoid phrases like
'ShR from MSMHelgrind is different from ShR from MSMProp1 ... '
Also, Sh in ShR/ShM means 'shared', while Read/Write are not necessarily
shared.
> * I like that MSMProp1 removes E9/10/11/12 from MSMHelgrind.
> Those scanned all of memory and were potentially very expensive.
We move this cost to handle_read/write, at least partially. But yes, I like
it too.
>
>
> * So it is kind-of like MSMHelgrind, except that
> - The Excl is merged back into ShR / ShM
> - Segment-sets are tracked, not thread-sets
> - transition E6 allows transfer of ownership at any read event
> which happens-after all previous accesses - MSMHelgrind only
> allows such transfers of memory in Excl state
Yes, except that E4/E5/E7 also transfer ownership.
These edges change segment sets using happens-before (see SS_update).
And if HB(oldSS, currS), these edges take the current locks instead of
intersecting with the old lock set (see LS_update).
For example, if we are looking at the first access after a barrier, we will
have HB(oldSS, currS) == True.
Not sure how to express it graphically in a clearer way...
>
>
> * A general comment when you are dealing with segment sets rather
> than thread sets. There is a possible redundancy which you may
> need to remove in the implementation:
>
> HB(seg,segSet) is equivalent to HB(seg,max(segSet))
>
> and max(segSet) is always same size or smaller than segSet.
> Intuitively, max(segSet) == segSet except that it only has the
> most recent seg for each thread:
>
> max(segSet) = { seg1 | seg1 <- segSet,
> not (exists seg2 <- segSet
> such that seg1 < seg2) }
>
> if that makes any sense. (Standard lattice-theory stuff).
>
Cool!
>
>
> > I will try to implement it in helgrind and see if it really works.
>
> Good.
>
> > Regarding size of SVal: I would prefer 64-bit over 48-bit. These extra 16
> > bits will give us some room for experiments with 'heavy' state machines.
>
> 48- or 64-bits is more or less unavoidable. Not sure 48-bit is practical
> - it might give fewer cache misses than 64-bit, but is likely to involve
> more instructions to pack/unpack the values where needed.
>
> ------------
>
> I like the idea of describing these state machines in a relatively
> uniform notation.
>
> I think it is possible to describe a pure happens-before detector in
> the same framework (almost). This would give a lower bound on the false
> error rate and so would be a useful reference point.
By 'pure happens-before detector' do you mean the one which creates a
happens-before relation after lock/unlock?
--kcc
|
|
From: Julian S. <js...@ac...> - 2008-01-23 14:18:45
|
> > * In MSMProp1, isn't "Read" state similar to "ShR" and "Write"
> >   similar to "ShM" ?
>
> Similar, but not the same. Hence the different name to avoid phrases like
> 'ShR from MSMHelgrind is different from ShR from MSMProp1 ... '
> Also Sh in ShR/ShM means 'shared' while Read/Write are not necessarily
> shared.

Fair enough. If I ignore the HB(SS,currS), then is it approximately
correct to understand that "Write" means "last access was a write" and
"Read" means "last access was a read" ?

> > * I like that MSMProp1 removes E9/10/11/12 from MSMHelgrind.
> >   Those scanned all of memory and were potentially very expensive.
>
> We move this cost to handle_read/write, at least partially. But yes, I like
> it too.

You mean into the SS_update and LS_update "procedures" ? At least it
is lazy, though -- presumably the cost only happens for locations which
are later accessed.

> Not sure how to express it graphically in a more clear way...

I would find the table a little clearer if it was like this:

  Edge OldState AccessType Condition NewState NewSegSet NewLockSet RaceIf

Then the inputs to the FSM are clearly separated into cols 2,3,4 and
the outputs into cols 5,6,7,8.

Also, I was unclear whether the transition rules as given cover the
possible input space exactly once, or whether I have to read from top to
bottom and use the first rule that matches. At the moment I get the
impression there is a hidden dependency, that E7 only applies if E6 does
not apply -- I cannot just look at E7 by itself, and the input data I
have, and see if it applies or not.

> By 'pure happens-before detector' you understand the one which creates
> happens-before relation after lock/unlock?

Yes. Details posted already.

J
|
From: Konstantin S. <kon...@gm...> - 2008-01-23 14:41:45
|
> Fair enough. If I ignore the HB(SS,currS), then is it approximately
> correct to understand that "Write" means "last access was a write" and
> "Read" means "last access was a read" ?

Emm, 'Read' means that if there were writes to this memory location, they
all 'happened before' one of the reads. 'Write' means there were writes
and this is not 'Read'.

> You mean into the SS_update and LS_update "procedures" ? At least it
> is lazy, though -- presumably the cost only happens for locations which
> are later accessed.

Yes.

> I would find the table a little clearer if it was like this:
>
>   Edge OldState AccessType Condition NewState NewSegSet NewLockSet RaceIf

Sounds nice.

> Also, I was unclear whether the transition rules as given cover the
> possible input space exactly once, or whether I have to read from top to
> bottom and use the first rule that matches.

Yes. Sorry for the ambiguity. I'll try to address it.
|
From: Bart V. A. <bar...@gm...> - 2008-01-23 18:02:54
|
Hello Konstantin,

I had a look at the file thread_wrappers_pthread.h. I was surprised to
find a class with the name Mutex that has both a mutex and a condition
variable as members. Are you aware of the monitor concept that was
defined by C.A.R. Hoare? See e.g.
http://en.wikipedia.org/wiki/Monitor_(synchronization)

Furthermore, using a condition variable by itself (class CondVar) is
*very* error prone and can easily introduce race conditions. A pthread
condition variable should always be used in combination with a pthread
mutex.

Bart.
|
From: Konstantin S. <kon...@gm...> - 2008-01-23 18:25:51
|
Hi Bart,

> I had a look at the file thread_wrappers_pthread.h. I was surprised to
> find a class with the name Mutex that has both a mutex and a condition
> variable as members.

This is made to implement methods like LockWhen that are equivalent to
the sequence:

  MU.Lock();
  while (cond == false)
    CV.Wait();
  ANNOTATE_COND_WAIT

Consider it syntax sugar. If you don't use methods like LockWhen,
Mutex/CondVar is *exactly* the same as pthread_mutex_t/pthread_cond_t.

> Are you aware of the monitor concept that was defined by C.A.R. Hoare?
> See e.g. http://en.wikipedia.org/wiki/Monitor_(synchronization)

Sort of :)

> Furthermore, using a condition variable by itself (class CondVar) is
> *very* error prone and can easily introduce race conditions. A pthread
> condition variable should always be used in combination with a
> pthread mutex.

Sure. CondVar::Signal()/Wait() should always be called under a Mutex.
If this is not the case, the test is incorrect.

--kcc
|
From: Konstantin S. <kon...@gm...> - 2008-01-24 16:07:22
|
On Jan 23, 2008 12:19 PM, Konstantin Serebryany <kon...@gm...> wrote:

> I've described the proposed memory state machine at
> http://code.google.com/p/data-race-test/wiki/MSMProp1 .
> It has some false positives but IMHO is better than the current one.
> I will try to implement it in helgrind and see if it really works.

I've made a first (quick-and-dirty) implementation. The machine works as
expected, at least on my set of tests. Details on the wiki.

To be continued...

--kcc
|
From: Julian S. <js...@ac...> - 2008-01-24 19:52:45
|
On Thursday 24 January 2008 17:07, Konstantin Serebryany wrote:
> I've made first (quick-and-dirty) implementation.
> The machine works as expected, at least on my set of tests.
Sounds good.
I see you changed the "Race if ..." condition in MSMProp1 from "LS={}"
to "LS={} and #SS > 1", yes?
I tried to summarise the MSMProp1 state machine, so as to get a
clearer idea of what behaviour it allows and does not allow. But
really I am guessing. Can you fix/refine this summary? You must
have some intuition of what the state machine allows/disallows.
J
* define a "synchronization event" as any of the following:
semaphore post-waits
condition variable signal-wait when the waiter blocks
thread creation
thread joinage
a barrier
* synchronisation events partition the execution of a threaded program
into a set of segments which have a partial ordering, called the
"happens-before" ordering.
* a location may be read by a segment S, without a protecting lock,
only if all writes to it happened-before S
* a location may be written by a segment S, without a protecting
lock, only if all reads and all writes to it happened-before S
* a location may be read by a segment S, using a protecting lock, only
if all writes to it which are concurrent (not-happens-before) use
the same protecting lock
* a location may be read by a segment S, using a protecting lock,
only if all reads and all writes to it which are concurrent
(not happens-before) use the same protecting lock
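The bullets above can be read as a single checker predicate over an access history (a hedged sketch with assumed types -- segment IDs as ints, a happens-before predicate, and locksets as int sets; note that, as pointed out later in the thread, the last bullet should say "written", so this sketch treats all conflicting concurrent pairs uniformly):

```cpp
#include <set>
#include <vector>
#include <functional>

using Seg = int;
using LockSet = std::set<int>;
using HB = std::function<bool(Seg, Seg)>;  // HB(a,b): a happens-before b

struct Access { Seg seg; bool isWrite; LockSet locks; };

static bool sharesLock(const LockSet& a, const LockSet& b) {
  for (int l : a) if (b.count(l)) return true;
  return false;
}

// May 'curr' legally follow the accesses in 'history'?
bool allowed(const std::vector<Access>& history, const Access& curr,
             const HB& hb) {
  for (const Access& prev : history) {
    if (!curr.isWrite && !prev.isWrite) continue;  // read-read never conflicts
    if (hb(prev.seg, curr.seg)) continue;          // ordered: fine
    // concurrent conflicting accesses must share a protecting lock
    if (!sharesLock(prev.locks, curr.locks)) return false;
  }
  return true;
}
```

So an unlocked read after a happened-before write is allowed, a concurrent write-write pair with disjoint locksets is not, and a shared lock makes it allowed again.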
|
|
From: Konstantin S. <kon...@gm...> - 2008-01-24 20:22:18
|
>
> I see you changed the "Race if ..." condition in MSMProp1 from "LS={}"
> to "LS={} and #SS > 1", yes?
>
Yep. That's sort of obvious. There is no race in exclusive state.
>
> I tried to summarise the MSMProp1 state machine, so as to get a
> clearer idea of what behaviour it allows and does not allow. But
> really I am guessing. Can you fix/refine this summary? You must
> have some intuition of what the state machine allows/disallows.
>
> J
>
>
> * define a "synchronization event" as any of the following:
> semaphore post-waits
> condition variable signal-wait when the waiter blocks
> thread creation
> thread joinage
> a barrier
>
There are several primary "synchronization events":
semaphore post-waits
condition variable signal-wait when the waiter blocks or the while-wait
loop is annotated
thread creation/joinage
and various possible secondary (defined via primary) events:
message queue (aka PCQ), defined through semaphore.
barrier, defined through condition variable
> * synchronisation events partition the execution of a threaded program
> into a set of segments which have a partial ordering, called the
> "happens-before" ordering.
>
> * a location may be read by a segment S, without a protecting lock,
> only if all writes to it happened-before S
>
> * a location may be written by a segment S, without a protecting
> lock, only if all reads and all writes to it happened-before S
>
> * a location may be read by a segment S, using a protecting lock, only
> if all writes to it which are concurrent (not-happens-before) use
> the same protecting lock
>
> * a location may be read by a segment S, using a protecting lock,
> only if all reads and all writes to it which are concurrent
> (not happens-before) use the same protecting lock
>
I could not express it better! I'll borrow this description for the wiki :)
It also helps to see what this machine cannot do.
Good examples are test38 and test40, which differ only by calls to sleep().
test38 is not handled by this machine, while test40 works fine.
--kcc
|
|
From: Konstantin S. <kon...@gm...> - 2008-01-24 20:49:02
|
On Jan 24, 2008 11:22 PM, Konstantin Serebryany <
kon...@gm...> wrote:
> I see you changed the "Race if ..." condition in MSMProp1 from "LS={}"
> > to "LS={} and #SS > 1", yes?
> >
>
> Yep. That's sort of obvious. There is no race in exclusive state.
>
>
> >
> > I tried to summarise the MSMProp1 state machine, so as to get a
> > clearer idea of what behaviour it allows and does not allow. But
> > really I am guessing. Can you fix/refine this summary? You must
> > have some intuition of what the state machine allows/disallows.
> >
> > J
> >
> >
> > * define a "synchronization event" as any of the following:
> > semaphore post-waits
> > condition variable signal-wait when the waiter blocks
> > thread creation
> > thread joinage
> > a barrier
> >
>
> There are several primary "synchronization events":
> semaphore post-waits
> condition variable signal-wait when the waiter blocks or the while-wait
> loop is annotated
> thread creation/joinage
>
> and various possible secondary (defined via primary) events:
> message queue (aka PCQ), defined through semaphore.
> barrier, defined through condition variable
>
>
>
> > > * synchronisation events partition the execution of a threaded program
> > > into a set of segments which have a partial ordering, called the
> > > "happens-before" ordering.
> >
> > * a location may be read by a segment S, without a protecting lock,
> > only if all writes to it happened-before S
> >
> > * a location may be written by a segment S, without a protecting
> > lock, only if all reads and all writes to it happened-before S
> >
> > * a location may be read by a segment S, using a protecting lock, only
> > if all writes to it which are concurrent (not-happens-before) use
> > the same protecting lock
> >
> > * a location may be read by a segment S, using a protecting lock,
> > only if all reads and all writes to it which are concurrent
> > (not happens-before) use the same protecting lock
> >
>
>
> I could not express it better! I'll borrow this description for the wiki
> :)
>
>
> It also helps to see what this machine can not do.
> Good examples are test38 and test40 which differ only by calls to sleep().
>
> test38 is not handled by this machine, while test40 works fine.
Ehmm.
The last bullet should probably be 'a location may be *written* by a segment
S'
And actually it is not correct. For test38 this bullet is true, but we still
have a false positive.
Same for reads and test28.
--kcc
|
|
From: Konstantin S. <kon...@gm...> - 2008-01-30 02:06:08
|
A few observations regarding MSMProp1:
When in exclusive state (i.e. state is Read or Write and #SS == 1) and we
access the memory from the same thread, HB(SS,currS) is always true.
This means that each access that leaves us in the same thread rewrites the
shadow value with SS={currS}; LS=currLS.
In other words, while in exclusive state we do not track the history, but
only store information about the last access.
This helps us to avoid false positives on tests like test{37,43,44,45},
while still reporting a (true) race on test10.
Both MSMProp1 and MSMHelgrind have false negative on test46:
//
// First: Second:
// 1. write
// 2. MU.Lock()
// 3. write
// 4. MU.Unlock() (sleep)
// a. MU.Lock()
// b. write
// c. MU.Unlock();
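For reference, test46 might look like this with raw pthreads (a hypothetical reconstruction, not the actual test source): the unlocked write at step 1 races with step b, but because step 3 then writes under MU, the location's lockset contains MU by the time Second accesses it, which is how both machines come to miss the race.

```cpp
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t MU = PTHREAD_MUTEX_INITIALIZER;
static int GLOB = 0;

static void* First(void*) {
  GLOB = 1;                    // 1. write -- unlocked: races with step b
  pthread_mutex_lock(&MU);     // 2.
  GLOB = 2;                    // 3. write under MU ("absorbs" step 1)
  pthread_mutex_unlock(&MU);   // 4.
  return nullptr;
}

static void* Second(void*) {
  usleep(100 * 1000);          // (sleep) -- not a happens-before edge
  pthread_mutex_lock(&MU);     // a.
  GLOB = 3;                    // b. write under MU -- races with step 1
  pthread_mutex_unlock(&MU);   // c.
  return nullptr;
}

// Runs the pattern and returns the final value of GLOB.
int run_test46() {
  pthread_t t1, t2;
  pthread_create(&t1, nullptr, First, nullptr);
  pthread_create(&t2, nullptr, Second, nullptr);
  pthread_join(t1, nullptr);
  pthread_join(t2, nullptr);
  return GLOB;
}
```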
With MSMProp1 we no longer need to scan all memory at thread_join.
We, however, still have this scan-all-memory in shadow_mem_make_NoAccess.
The comment says:
7. Modify all shadow words, by removing ToDelete from the lockset
of all ShM and ShR states. Note this involves a complete scan
over map_shmem, which is very expensive...
Do we really need this?
If we don't do that and if a new lock gets allocated at the same memory
location we may miss some very weird race.
Any other reason for doing this?
Thanks,
--kcc
|
|
From: Julian S. <js...@ac...> - 2008-01-30 12:14:34
|
On Wednesday 30 January 2008 03:05, Konstantin Serebryany wrote:
> Few observations regarding MSMProp1:
>
> When in exclusive state (i.e. state is Read or Write and #SS == 1) and we
> access the memory from the same thread, HB(SS,currS) is always true.
> This means that each access that leaves us in the same thread rewrites the
> shadow value with SS={currS}; LS=currLS.
> In other words, while in exclusive state we do not track the history, but
> only store information about the last access.
> This helps us to avoid false positives on tests like test{37,43,44,45},
> while still reporting a (true) race on test10.
My impression of MSMProp1 is that it is like a pure happens-before
detector, except that it deals with locks using locksets rather than
by generating happens-before edges in the graph.
Indeed, I wonder if it is possible to take the formal description of
MSMProp1 and from it derive the formal description of a pure
happens-before detector I posted earlier in this thread (Wed Jan 23
14:30:21 2008) by (1) removing all locksets from the MSMProp1
description, and (2) adding happens-before edges for lock/unlock
events. It would be interesting to try.
> Both MSMProp1 and MSMHelgrind have false negative on test46:
> //
> // First: Second:
> // 1. write
> // 2. MU.Lock()
> // 3. write
> // 4. MU.Unlock() (sleep)
> // a. MU.Lock()
> // b. write
> // c. MU.Unlock();
I don't think you can have zero false positives and zero false
negatives at the same time (holy grail ...), because all these state
machines are different approximations to something which is NP-complete
(and therefore which we cannot compute exactly).
> With MSMProp1 we no longer need to scan all memory at thread_join.
Good.
> We, however, still have this scan-all-memory in shadow_mem_make_NoAccess.
> The comment says:
> 7. Modify all shadow words, by removing ToDelete from the lockset
> of all ShM and ShR states. Note this involves a complete scan
> over map_shmem, which is very expensive...
>
> Do we really need this?
> If we don't do that and if a new lock gets allocated at the same memory
> location we may miss some very weird race.
> Any other reason for doing this?
I only did it to be clean / safe. Otherwise I think if you deallocate
a lock and allocate a new one at the same address, very strange things
will happen.
Is it really needed? I don't know. But I don't know how to prove/argue
that it is not needed. Alternative question is, is it possible to do
this cheaply/incrementally?
There is also a different problem for the scaling of Helgrind to
very large programs: the Lock structures are freed when locks are
destroyed, but Segment and Thread are never freed. So a program
which generates a very large number of Segments, or Threads, will
eventually cause Helgrind to run out of memory. I can see that
Helgrind's storage management strategy will need to be redesigned
at some future point.
Are you intending to make a revised MSMProp1 patch that can deal
with larger programs? I would like to compare it to MSMHelgrind
on OpenOffice and Firefox.
J
|
|
From: Konstantin S. <kon...@gm...> - 2008-01-30 18:20:30
|
>
> My impression of MSMProp1 is that it is like a pure happens-before
> detector, except that it deals with locks using locksets rather than
> by generating happens-before edges in the graph.

I have the same impression :)

> Indeed, I wonder if it is possible to take the formal description of
> MSMProp1 and from it derive the formal description of a pure happens-
> before detector I posted earlier in this thread (Wed Jan 23 14:30:21 2008)
> by (1) removing all locksets from the MSMProp1 description, and
> (2) adding happens-before edges for lock/unlock events. It would
> be interesting to try.

I shall try this... But after all we have exp-drd.
The problem with pure HB is (iiuc) that it has too many false negatives
(not to mention performance and scalability):

// First:              Second:
// 1. write
// 2. MU.Lock()
// 3. MU.Unlock()
//    (sleep)
//                     a. MU.Lock()
//                     b. MU.Unlock();
//                     c. write

Pure HB will not see a race between the writes because you have an HB edge
between Unlock() and Lock(). Both MSMHelgrind and MSMProp1 detect this race
(test47), while exp-drd does not.

> I don't think you can have zero false positives and zero false
> negatives at the same time (holy grail ...), because all these state
> machines are different approximations to something which is NP-complete
> (and therefore which we cannot compute exactly).

Oh, sure.
I think that it makes sense to have several different machines in helgrind
so that a user can choose the one that fits his application best.
For example, if an application never creates/joins threads (except at
startup/shutdown) and does not use semaphores, cond vars, message queues,
etc., Eraser will be enough.
If, on the contrary, the application never uses locks (but uses barriers,
semaphores, message queues, etc.), locksets are useless.
The applications I try to analyze use both locks and HB stuff, so I need
both.

> > We, however, still have this scan-all-memory in
> > shadow_mem_make_NoAccess.
> > The comment says:
> > 7. Modify all shadow words, by removing ToDelete from the lockset
> >    of all ShM and ShR states. Note this involves a complete scan
> >    over map_shmem, which is very expensive...
> >
> > Do we really need this?
> > If we don't do that and if a new lock gets allocated at the same memory
> > location we may miss some very weird race.
> > Any other reason for doing this?
>
> I only did it to be clean / safe. Otherwise I think if you deallocate
> a lock and allocate a new one at the same address, very strange things
> will happen.
>
> Is it really needed? I don't know. But I don't know how to prove/argue
> that it is not needed. An alternative question is: is it possible to do
> this cheaply/incrementally?

I thought about using a unique lock id in locksets instead of Lock*.
It leads to the problem of mapping from the id back to Lock*, though... But
we only need this while reporting a race.

> There is also a different problem for the scaling of Helgrind to
> very large programs: the Lock structures are freed when locks are
> destroyed, but Segment and Thread are never freed. So a program
> which generates a very large number of Segments, or Threads, will
> eventually cause Helgrind to run out of memory. I can see that
> Helgrind's storage management strategy will need to be redesigned
> at some future point.

Yea... Perhaps we could garbage-collect old segments and, if some SVal
still refers to a deleted one, behave conservatively.
Not sure about threads... If a program creates more than 1M threads, we are
in trouble performance-wise anyway.

> Are you intending to make a revised MSMProp1 patch that can deal
> with larger programs?

Certainly. Right now it is ~60% slower than MSMHelgrind, but there are
still things to improve there.

I've been trying both machines on some of my applications. MSMProp1
reports ~30% more races.
Those extra races look similar to test10. Most (but not all!) of them look
harmless, which makes me think about classifying data races somehow.
See e.g. http://www.eecs.umich.edu/~nsatish/DataRace-PLDI07.html.
Also, with MSMProp1 I see fewer false positives on my apps.

> I would like to compare it to MSMHelgrind on OpenOffice and Firefox.

Trying to run OpenOffice and Firefox on my machine leads to this:

--1052-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--1052-- si_code=1; Faulting address: 0x0; sp: 0x478CDF4

valgrind: the 'impossible' happened:
   Killed by fatal signal
==1052==    at 0x3801979C: vgPlain_strlen (m_libcbase.c:232)
==1052==    by 0x38010941: evh__pre_mem_read_asciiz (hg_main.c:5763)
==1052==    by 0x38037D47: vgPlain_client_syscall (syswrap-main.c:850)
==1052==    by 0x38035899: vgPlain_scheduler (scheduler.c:790)
==1052==    by 0x38048C0D: run_a_thread_NORETURN (syswrap-linux.c:89)

I have better luck with konqueror; I'll try to compare the two machines on
it.

Any set of *unit* tests won't be enough to evaluate new machines, but it
would be very useful for making the first steps. I beg everyone interested
to contribute to racecheck_unittest or similar.

--kcc
|
|
From: Julian S. <js...@ac...> - 2008-01-31 13:10:27
|
> > I don't think you can have zero false positives and zero false
> > negatives at the same time (holy grail ...), because all these state
> > machines are different approximations to something which is NP-complete
> > (and therefore which we cannot compute exactly).
>
> Oh, sure.
> I think that it makes sense to have several different machines in helgrind
> so that a user can choose the one that fits his application best.
> For example, if an application never creates/joins threads (except at
> startup/shutdown) and does not use semaphores, cond vars, message queues,
> etc., Eraser will be enough.

Using only locks, with no semaphores, cond vars, MQs, and no create/join,
sounds unlikely to me :-)

> If, on the contrary, the application never uses locks (but uses barriers,
> semaphores, message queues, etc.), locksets are useless.
> The applications I try to analyze use both locks and HB stuff, so I need
> both.

I suspect most real-world threaded apps use both locks and HB stuff.

> > Is it really needed? I don't know. But I don't know how to prove/argue
> > that it is not needed. An alternative question is: is it possible to do
> > this cheaply/incrementally?
>
> I thought about using a unique lock id in locksets instead of Lock*.
> It leads to the problem of mapping from the id back to Lock*, though...
> But we only need this while reporting a race.

Interesting possibility.

> Yea... Perhaps we could garbage-collect old segments and, if some SVal
> still refers to a deleted one, behave conservatively.

I think we can import some ideas from the world of garbage collection.
Basically, scan through shadow memory at a rate proportional to the rate of
(simulated) instruction execution (the rate may be selected adaptively):

* some set S of segments is considered for deletion
* once the scan starts, we guarantee not to create any more references to
  any segment in S
* during the scan, if any SVal is seen to refer to an element of S, then
  that element is removed from S

At the end of a scan, therefore, it is safe to delete the remaining
segments in S. The scan then starts again. This makes it possible to
recover memory from unreferenced segments at some constant cost per
simulated instruction. Or something like that.

> Certainly. Right now it is ~60% slower than MSMHelgrind, but there are
> still things to improve there.

I would say, first figure out whether MSMProp1 is a state machine that will
work well for you, in terms of false positive/negative behaviour. If yes,
then there are probably many tricks to improve performance.

> I've been trying both machines on some of my applications. MSMProp1
> reports ~30% more races.
> Those extra races look similar to test10. Most (but not all!) of them
> look harmless, which makes me think about classifying data races somehow.
> See e.g. http://www.eecs.umich.edu/~nsatish/DataRace-PLDI07.html.

Yes. Also the RaceTrack paper (Rodeheffer et al.) has some useful ideas
about classification.

> Also with MSMProp1 I see fewer false positives on my apps.

That sounds good, but I did not exactly understand: you said above that
MSMProp1 reports ~30% more races. So are you saying that the 30% more are
real races that MSMHelgrind missed? Can you clarify?

> Trying to run OpenOffice and Firefox on my machine leads to this:
>
> --1052-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11
> (SIGSEGV) - exiting
> --1052-- si_code=1; Faulting address: 0x0; sp: 0x478CDF4
>
> valgrind: the 'impossible' happened:
>    Killed by fatal signal
> ==1052==    at 0x3801979C: vgPlain_strlen (m_libcbase.c:232)
> ==1052==    by 0x38010941: evh__pre_mem_read_asciiz (hg_main.c:5763)
> ==1052==    by 0x38037D47: vgPlain_client_syscall (syswrap-main.c:850)
> ==1052==    by 0x38035899: vgPlain_scheduler (scheduler.c:790)
> ==1052==    by 0x38048C0D: run_a_thread_NORETURN (syswrap-linux.c:89)

Hmm. I can't imagine how that happened. If you send a patch against
the svn trunk, I can try to reproduce and chase the problem for you.

> I have better luck with konqueror, I'll try to compare the two machines
> on it.

konqueror is not very threaded; I think this will not tell you anything
useful.

J
|
|
From: Konstantin S. <kon...@gm...> - 2008-01-31 19:18:54
|
>
> I suspect most real-world threaded apps use both locks and HB stuff.

Agree. But still, the ratio between locks and HB could vary a lot.
Also, users may have different preferences regarding false positives vs
false negatives, and between speed and accuracy.

> > Certainly. Right now it is ~60% slower than MSMHelgrind, but there are
> > still things to improve there.
>
> I would say, first figure out whether MSMProp1 is a state machine that
> will work well for you, in terms of false positive/negative behaviour.
> If yes, then there are probably many tricks to improve performance.

It's hard. With a slower machine, more tests fail due to timeouts. So,
without making the machine fast, I can't really evaluate it. This is why I
am so much interested in unit tests.

> > Also with MSMProp1 I see fewer false positives on my apps.
>
> That sounds good, but I did not exactly understand: you said above
> that MSMProp1 reports ~30% more races. So are you saying that the 30%
> more are real races that MSMHelgrind missed? Can you clarify?

Sorry. The number of false positives *decreased slightly*, mostly due to
correct handling of cond vars and barriers. The number of true positives
*increased significantly*, mostly due to cases like test10. The overall
number of reports increased with MSMProp1.
This is more an observation than a precise statistic. I am trying to
collect real numbers and test cases.

> > Trying to run OpenOffice and Firefox on my machine leads to this:
> >
> > --1052-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11
> > (SIGSEGV) - exiting
> > --1052-- si_code=1; Faulting address: 0x0; sp: 0x478CDF4
> >
> > valgrind: the 'impossible' happened:
> >    Killed by fatal signal
> > ==1052==    at 0x3801979C: vgPlain_strlen (m_libcbase.c:232)
> > ==1052==    by 0x38010941: evh__pre_mem_read_asciiz (hg_main.c:5763)
> > ==1052==    by 0x38037D47: vgPlain_client_syscall (syswrap-main.c:850)
> > ==1052==    by 0x38035899: vgPlain_scheduler (scheduler.c:790)
> > ==1052==    by 0x38048C0D: run_a_thread_NORETURN (syswrap-linux.c:89)
>
> Hmm. I can't imagine how that happened. If you send a patch against
> the svn trunk, I can try to reproduce and chase the problem for you.

This is the unmodified version from svn trunk. It happens on both 32- and
64-bit. It looks like a common issue on my system; some other programs
(e.g. acroread) fail as well. Is there anything I can do to help reproduce
it?

> konqueror is not very threaded; I think this will not tell you anything
> useful.

Yep, I see this already. :(
|
|
From: Julian S. <js...@ac...> - 2008-01-31 20:27:33
|
> > > Trying to run OpenOffice and Firefox on my machine leads to this:
> > >
> > > --1052-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11
> > > (SIGSEGV) - exiting
> > > --1052-- si_code=1; Faulting address: 0x0; sp: 0x478CDF4
> > >
> > > valgrind: the 'impossible' happened:
> > >    Killed by fatal signal
> > > ==1052==    at 0x3801979C: vgPlain_strlen (m_libcbase.c:232)
> > > ==1052==    by 0x38010941: evh__pre_mem_read_asciiz (hg_main.c:5763)
> > > ==1052==    by 0x38037D47: vgPlain_client_syscall (syswrap-main.c:850)
> > > ==1052==    by 0x38035899: vgPlain_scheduler (scheduler.c:790)
> > > ==1052==    by 0x38048C0D: run_a_thread_NORETURN (syswrap-linux.c:89)
> >
> > Hmm. I can't imagine how that happened. If you send a patch against
> > the svn trunk, I can try to reproduce and chase the problem for you.
>
> This is the unmodified version from svn trunk. It happens on both 32- and
> 64-bit. It looks like a common issue on my system; some other programs
> (e.g. acroread) fail as well.

I can't reproduce it, using acroread on openSUSE 10.2 (64-bit) and
--tool=memcheck or --tool=helgrind. Can you run with --trace-syscalls=yes
and send the last few lines of the log?

J
|