From: Patrick J. L. <lop...@gm...> - 2012-12-31 17:42:52
On Mon, Dec 31, 2012 at 4:48 AM, Philippe Waroquiers <phi...@sk...> wrote:
>
> It is even not ok to use an atomic instruction : first tests have
> shown that having one atomic instruction on this path makes a
> multi-threaded Valgrind slower than a serialised Valgrind.

You mean a multi-threaded Valgrind is slower even when running multiple threads? Wow.

In that case, there is only one way to handle this: Take advantage of the fact that the vast majority of memory accesses (i.e. on the stack) are per-thread. And others are "owned" at any point in time.

So I think you need to introduce some concept of "memory pool", "memory pool owner", and "transfer of ownership". So each thread would tend to own the pool corresponding to its own stack (most of the time). A thread locking a mutex then accessing a bunch of data will tend to transfer ownership of that data to that thread. And so on.

This will still require the use of atomic instructions at least, if not mutexes. (Mutexes have the advantage of implicitly handling the "ownership" concept...) But atomic instructions, and perhaps even mutexes, should be reasonably fast as long as they do not involve any contention between cores.

The trick here will be to parallelize access to the relevant data structures (i.e. the V bits for each pool).

Just my $0.02, which is about what it is worth. Good luck :-)

 - Pat
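To make the "memory pool" / "owner" / "transfer of ownership" idea a bit more concrete, here is a minimal sketch in C11. It is only an illustration of the concept, not Valgrind code: the `Pool` type, the `pool_*` functions, the `POOL_BYTES` and `NO_OWNER` constants, and the one-byte-per-byte V bit layout are all hypothetical names and simplifications made up for this example. Taking ownership costs one compare-and-swap, which should stay cheap as long as the same thread keeps re-acquiring the same pool and no other core is competing for it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_BYTES 64      /* hypothetical pool size, chosen for the example */
#define NO_OWNER   (-1)    /* hypothetical marker for "nobody owns this pool" */

typedef struct {
    atomic_int owner;             /* id of the owning thread, or NO_OWNER */
    int last_owner;               /* previous owner; only touched while owned */
    unsigned transfers;           /* times ownership moved to a different thread */
    uint8_t vbits[POOL_BYTES];    /* simplified: one validity byte per guest byte */
} Pool;

static void pool_init(Pool *p)
{
    atomic_init(&p->owner, NO_OWNER);
    p->last_owner = NO_OWNER;
    p->transfers = 0;
    memset(p->vbits, 0, sizeof p->vbits);
}

/* Try to take ownership with a single compare-and-swap.
   Returns false if another thread currently owns the pool. */
static bool pool_try_acquire(Pool *p, int tid)
{
    int expected = NO_OWNER;
    if (!atomic_compare_exchange_strong_explicit(&p->owner, &expected, tid,
                                                 memory_order_acquire,
                                                 memory_order_relaxed))
        return false;
    if (p->last_owner != tid) {
        /* Count only hand-overs between two distinct threads. */
        if (p->last_owner != NO_OWNER)
            p->transfers++;
        p->last_owner = tid;
    }
    return true;
}

/* Give the pool up so that another thread may take it. */
static void pool_release(Pool *p, int tid)
{
    assert(atomic_load_explicit(&p->owner, memory_order_relaxed) == tid);
    atomic_store_explicit(&p->owner, NO_OWNER, memory_order_release);
}

/* Set the V bits for [off, off + len) to v. Only the owner may do this;
   returns false for a non-owner or an out-of-range request. */
static bool pool_set_vbits(Pool *p, int tid, size_t off, size_t len, uint8_t v)
{
    if (atomic_load_explicit(&p->owner, memory_order_relaxed) != tid)
        return false;
    if (off > POOL_BYTES || len > POOL_BYTES - off)
        return false;
    memset(p->vbits + off, v, len);
    return true;
}
```

In this sketch a thread would acquire the pool covering its own stack once and keep it for a long run of accesses, so the V bit updates themselves need no atomic instruction. A real design would also need some way for a second thread to ask the current owner to hand a pool over, which is left out here.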