From: Bart V. A. <bva...@ac...> - 2012-06-17 17:34:06
On 06/17/12 16:47, Philippe Waroquiers wrote:
> On Sun, 2012-06-17 at 08:14 -0700, John Reiser wrote:
>> On 06/17/2012, Philippe Waroquiers wrote:
>>
>>> I am however not at all convinced that only protecting the STORE8
>>> will avoid false positives (we could maybe tolerate a small rate
>>> of false negatives when running in parallel).
>>> Even if protecting the STORE8 is good enough, then any parallel
>>> algorithm which works a lot with single bytes might be slowed
>>> down by a factor 10 or similar.
>>
>> Use LoadLocked and StoreConditional instructions, as on MIPS.
>> These sense the state of the cache line, and implement a "greedy"
>> solution.  Your code provides the fixup when greedy fails
>> (usually: try again, after re-fetch and re-modify.)
>
> Are the LL/SC instructions not suffering from the same performance
> degradation as the x86/amd64 atomic instructions ?
> (cfr the unacceptable performance degradation in memcheck when
> adding one atomic instruction in the STORE helperc).
>
> I have very bad knowledge of all these atomic things and similar,
> but I am guessing that there is no order of magnitude difference
> of performance between these approaches.
> From what I understood, on x86/amd64, we would need such operations
> to be faster by two orders of magnitude to be usable for memcheck.

It seems like a good idea to me to first read more about existing
multithreaded dynamic analysis tools. An example:

Paul Sack et al., "Accurate and efficient filtering for the Intel
thread checker race detector", ASID '06: Proceedings of the 1st
workshop on Architectural and system support for improving software
dependability, pages 34-41, 2006.

Bart.
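
For reference, the retry pattern described in the quoted text (re-fetch,
re-modify, try again when the greedy attempt fails) can be sketched with
a C11 compare-and-swap loop. This is only an illustration and not
Valgrind or memcheck code: the helper name store_byte_in_word and the
32-bit word layout are assumptions made for the example. C11 has no
portable LL/SC primitive, so the sketch uses
atomic_compare_exchange_weak, which compilers typically lower to an
LL/SC pair on architectures that provide one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper (not from Valgrind): atomically replace one byte
   of a 32-bit word. 'index' counts bytes from the least significant
   byte (0..3), independent of the machine's endianness. */
static void store_byte_in_word(_Atomic uint32_t *word, unsigned index,
                               uint8_t value)
{
    assert(index < 4);

    unsigned shift = index * 8;
    uint32_t mask = (uint32_t)0xFF << shift;

    uint32_t old = atomic_load(word);   /* initial fetch */
    uint32_t desired;
    do {
        /* modify: splice the new byte into the last value we saw */
        desired = (old & ~mask) | ((uint32_t)value << shift);
        /* If the exchange fails, 'old' is overwritten with the current
           contents of *word (re-fetch) and the loop recomputes
           'desired' from it (re-modify). */
    } while (!atomic_compare_exchange_weak(word, &old, desired));
}
```

Calling store_byte_in_word(&w, 1, 0xAB) on a word holding 0x11223344
leaves 0x1122AB44 in it. Whether such a loop is cheap enough for a
memcheck store helper is exactly the open question in this thread; the
sketch says nothing about its cost.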