Re: [Valgrind-developers] Re: Making Valgrind Core Framework multithread-safe?

On Monday 26 May 2003 12:05, Nicholas Nethercote wrote:
> On Mon, 26 May 2003, Josef Weidendorfer wrote:
> > currently I'm thinking a little bit of what would be needed to allow
> > applications run under Valgrind to use processors in parallel. The main
> > goal would be to speed up cache simulation for multithreaded
> > applications, more specifically first to let OpenMP apps (number crunching)
> > run simultaneously. I'm not at all convinced if there will be any
> > benefit/speedup at all on multiple processors because of a possible need
> > for additional fine-grained communication among the threads.
> > [...]

Nick, Jeremy, Adam,

thanks a lot for the responses.
As the main goal of my thinking was (for now) to speed up cache 
simulation, I dropped the idea of making Valgrind threads out of application 
threads, because of the synchronisation issues in the skins.

A better idea would be to separate event handling (e.g. memory accesses, 
trackable valgrind events, ...) into one or more other processes, forked off 
at valgrind startup.
By using a ring buffer shared with the event handling process, communication 
should be really fast, and the handler process can run in parallel on a 
2-processor machine (or on a P4 with hyperthreading; there, AFAIK, busy 
polling would be a no-no, but perhaps there are workarounds for this problem?).
I think this could be a second general "split" approach, almost orthogonal to 
core/skin splitting: instead of calling an event handler of the skin, the 
event could be put into the ring buffer.
We could even make the communication bidirectional by using a second ring 
buffer and, if an event handler has to run synchronously, block for an answer 
from the event handler process.
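
To make the ring buffer idea concrete, here is a rough sketch of a 
single-producer/single-consumer ring in memory shared with a forked handler 
process. Everything in it is an assumption for illustration, not existing 
Valgrind code: the Event layout, the names Ring, ring_try_put, ring_try_get 
and demo_forked_handler, the slot count, and the use of C11 atomics and 
mmap(). It polls with sched_yield(), which is exactly the busy-polling 
question raised above, so a real version would want something smarter.

```c
#define _DEFAULT_SOURCE              /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RING_SLOTS 1024u             /* must be a power of two */

typedef struct {                     /* hypothetical event record */
    uint32_t  kind;                  /* e.g. 0 = read, 1 = write */
    uint32_t  size;                  /* access size in bytes */
    uintptr_t addr;
} Event;

typedef struct {
    _Atomic uint32_t head;           /* next slot to fill; stored only by producer */
    _Atomic uint32_t tail;           /* next slot to drain; stored only by consumer */
    uint64_t total_bytes;            /* result slot, written by the handler at exit */
    Event slots[RING_SLOTS];
} Ring;

/* Map the ring into memory that stays shared across fork(). */
static Ring *ring_create(void)
{
    assert((RING_SLOTS & (RING_SLOTS - 1)) == 0);
    void *p = mmap(NULL, sizeof(Ring), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return NULL;
    Ring *r = p;
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
    r->total_bytes = 0;
    return r;
}

/* Producer side: returns 0 if the ring is full, 1 on success. */
static int ring_try_put(Ring *r, const Event *ev)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS) return 0;
    r->slots[head & (RING_SLOTS - 1)] = *ev;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer side: returns 0 if the ring is empty, 1 on success. */
static int ring_try_get(Ring *r, Event *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail) return 0;
    *out = r->slots[tail & (RING_SLOTS - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}

/* Fork a handler, feed it n_events events, return the byte total it computed. */
static uint64_t demo_forked_handler(uint32_t n_events)
{
    Ring *r = ring_create();
    if (!r) return UINT64_MAX;
    pid_t pid = fork();
    if (pid < 0) { munmap(r, sizeof *r); return UINT64_MAX; }
    if (pid == 0) {                  /* event handler process */
        Event ev;
        uint64_t total = 0;
        for (uint32_t seen = 0; seen < n_events; ) {
            if (ring_try_get(r, &ev)) { total += ev.size; seen++; }
            else sched_yield();      /* ring empty: let the producer run */
        }
        r->total_bytes = total;
        _exit(0);
    }
    for (uint32_t i = 0; i < n_events; i++) {   /* "valgrind" process */
        Event ev = { 0, 1 + i % 4, 0x1000 + 4 * (uintptr_t)i };
        while (!ring_try_put(r, &ev)) sched_yield();   /* ring full: wait */
    }
    int status = 0;
    waitpid(pid, &status, 0);
    uint64_t total = r->total_bytes;
    munmap(r, sizeof *r);
    return total;
}
```

Because only the producer stores head and only the consumer stores tail, no 
lock is needed in the one-directional case; a second ring of the same kind 
could carry answers back.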

Ideally, the ring buffer communication would be added to the event handler 
functions in a transparent way, much like RPC. That way, whether to use the 
event handler process or not could be a runtime switch.
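
One possible shape for such a runtime switch, as a sketch only: the core 
calls through a function pointer that points either at the handler itself or 
at a stub that just marshals the event. All names here (MemEvent, 
emit_mem_access, set_remote_handlers, ...) are made up for illustration, and 
a plain array stands in for the shared ring buffer so that the example stays 
self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {                     /* hypothetical marshalled event */
    uint32_t  is_write;
    uint32_t  size;
    uintptr_t addr;
} MemEvent;

/* The real handler, as a skin would implement it. */
static uint64_t bytes_seen;
static void handle_mem_access(const MemEvent *ev) { bytes_seen += ev->size; }

/* Stand-in for the shared ring buffer. */
#define QUEUE_CAP 64
static MemEvent queue[QUEUE_CAP];
static size_t   queue_len;

/* RPC-like stub: same signature as the handler, but only enqueues. */
static void stub_mem_access(const MemEvent *ev)
{
    assert(queue_len < QUEUE_CAP);   /* a real core would block here instead */
    queue[queue_len++] = *ev;
}

/* What the handler process would do: run the real handler on each event. */
static size_t drain_queue(void)
{
    size_t n = queue_len;
    for (size_t i = 0; i < n; i++) handle_mem_access(&queue[i]);
    queue_len = 0;
    return n;
}

/* The runtime switch: the core always calls through this pointer. */
static void (*on_mem_access)(const MemEvent *) = handle_mem_access;

static void set_remote_handlers(int enabled)
{
    on_mem_access = enabled ? stub_mem_access : handle_mem_access;
}

/* Call site in the core; it does not know which mode is active. */
static void emit_mem_access(int is_write, uintptr_t addr, uint32_t size)
{
    MemEvent ev = { (uint32_t)is_write, size, addr };
    on_mem_access(&ev);
}
```

With set_remote_handlers(0) an emit_mem_access() call runs the handler 
immediately; with set_remote_handlers(1) the same call only queues the event 
until the handler side drains it.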

Advantages:
* speedup on 2-processor machines / P4 with hyperthreading
* normal use of GDB for the event handler process, which eases development of 
handlers compared to skin development (while the handler is stopped in GDB, 
the valgrind process will simply block once the ring buffer is full)
* the event handler process can be valgrinded (!)
* if the communication to the handler process is unidirectional, the events 
can be dumped directly to a file, and the event handler can run afterwards, 
using the stored events. Of course, this often will not be practical because 
of the huge amount of event data. But perhaps some compression could be done 
here.
* the event handler can be a GUI application (allowing e.g. for a real trace 
visualisation, not only a profile).
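
For the unidirectional case in the list above, dumping events to a file and 
running the handler afterwards could look roughly like the following. This 
is only a sketch: the TraceRecord layout and the function names are invented 
for illustration, records are written raw (so the file is only readable on 
the same kind of machine), and no compression is attempted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {                     /* hypothetical on-disk event record */
    uint32_t is_write;
    uint32_t size;
    uint64_t addr;
} TraceRecord;

/* Recording side: append one event to the trace file. */
static int trace_append(FILE *f, const TraceRecord *r)
{
    return fwrite(r, sizeof *r, 1, f) == 1;
}

/* Offline side: feed every stored event to a handler, return the count. */
static uint64_t trace_replay(FILE *f,
                             void (*handler)(const TraceRecord *, void *),
                             void *ctx)
{
    TraceRecord r;
    uint64_t n = 0;
    while (fread(&r, sizeof r, 1, f) == 1) {
        handler(&r, ctx);
        n++;
    }
    return n;
}

/* Example handler: count the write accesses. */
static void count_writes(const TraceRecord *r, void *ctx)
{
    if (r->is_write) ++*(uint64_t *)ctx;
}

/* Record 100 alternating reads/writes, then replay them from the file. */
static uint64_t demo_record_then_replay(void)
{
    FILE *f = tmpfile();
    if (!f) return UINT64_MAX;
    for (uint32_t i = 0; i < 100; i++) {
        TraceRecord r = { i % 2, 4, 0x1000 + 4 * (uint64_t)i };
        if (!trace_append(f, &r)) { fclose(f); return UINT64_MAX; }
    }
    rewind(f);
    uint64_t writes = 0;
    uint64_t n = trace_replay(f, count_writes, &writes);
    fclose(f);
    return n == 100 ? writes : UINT64_MAX;
}
```

At 16 bytes per access in this layout the file grows very quickly, which is 
the practicality problem mentioned in that list item.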

Fortunately, for cache simulation, no results have to be fed back from the 
cache simulation to the valgrind runtime. For cachegrind, almost all skin 
actions (e.g. BBCC allocation, cache simulation, trace dumping) could be moved 
into the event handler process.
I could imagine that even memcheck could be split this way for most events, 
as error reporting can be done asynchronously.

What do you think about this idea?

Josef