From: Adam G. <ar...@cy...> - 2003-05-27 09:56:10
At 13:57 26/05/2003 +0200, Josef Weidendorfer wrote:
>On Monday 26 May 2003 12:05, Nicholas Nethercote wrote:
>> [...]
>> Julian and I have discussed this, and AFAWCT the killer point is shadow
>> memory for Memcheck -- each time memory is written, shadow memory is
>> written a few instructions before. The danger is that if two threads
>> are racing on a memory word, you could get a thread switch in between
>> the shadow write and the real write, and then your shadow memory would
>> not match your real memory.
>
>Sidenote (perhaps there's a misunderstanding): I thought about 2 threads
>running on 2 processors in parallel. There does not need to be any thread
>switching for races to occur, as without locking, interleaved memory
>accesses from the two threads/processors can happen in any order.

You would need to guarantee that whatever instructions the program uses
as locking primitives are atomic... probably the only way to do that is
to insist they come from valgrind's pthread library etc. Anything else
where valgrind shows misbehaviour is probably a bug in the program (i.e.
not using locking...).

>As memory is a shared resource, we must use locking (atomic access to
>shadow and real memory). How fine-grained should the locking be? One
>lock for all memory accesses kills performance. Perhaps one lock for
>each allocated area, or for every 64 bytes of memory? This would need
>very fast user-mode mutexes (e.g. futexes from Linux 2.5).
>
>But all this is about problems in some specific skin. I currently wonder
>more about problems with the generic valgrind core (binary translation
>engine and runtime environment).
>
>> The problem would be avoided if we could guarantee that thread switches
>> only occur between basic blocks. Actually, now that I think about it,
>> that shouldn't be a problem with the current implementation, since it
>> does thread scheduling itself and never does thread switches in the
>> middle of a basic block. Hmm.
>> As for getting rid of Valgrind's threads implementation, the best idea
>> so far is to intercept the clone() syscall, and have Valgrind schedule
>> threads itself but not do all the pthreads ops itself. This sounds
>
>But with the scenario of kernel-level threads, Valgrind can't schedule
>them?!

You really should have a look at the stuff in my WINE support patches, or
the 1.9.6-wine tarball. I have modified valgrind to support clone(), but
only one (kernel-level) thread runs at once. What happens is that the
valgrind scheduler loop spins in each cloned thread, but only one is
active at a time (so there is no need for locking). Signals are used to
tell the cloned threads when to wake up. clone() (the library call) has
been re-implemented to use a variant of valgrind's pthread_create() which
guarantees a kernel thread is created.

I suspect that there will be so much locking overhead imposed by multiple
threads running simultaneously that it is quicker to just do a 'global'
lock. The real killer will be memory accesses, particularly the threads'
stacks. You could possibly do something with mprotect(), but that would
require the threads to have separate memory maps (and then valgrind fakes
up the 'shared' memory map they expect). Ugh. Multi-processor machines
use a LOT of hardware to cope with these issues (cache snooping etc.).

Seeya,
Adam
--
Real Programmers don't comment their code. If it was hard to write, it
should be hard to read, and even harder to modify.
These are all my own opinions.