|
From: Julian S. <js...@ac...> - 2012-10-08 09:19:38
|
All looks fine to me. This is an implementation-only change, yes? The insn-read and insn-miss numbers that the user sees will not change?

My only comment re the patches is that the names Ir, IrS, IrG might be a bit confusing, especially the first. Maybe IrS -> IrSeq, IrG -> IrGen, and Ir -> IrNoX (NoX == no crossing of cache lines)?

J

On Saturday, October 06, 2012, Josef Weidendorfer wrote:
> Hi,
>
> I just committed a patch to get rid of the huge macro in Cachegrind
> (no functional/performance changes). We already discussed that quite
> a while ago, and I thought it was the right time to revisit the topic.
> The second patch simplifies the simulator by using memory block
> numbers. It is more or less cosmetic, but it helps a bit.
> I will change Callgrind in the same way.
>
> But this mail is about two patches on top of those, speeding up
> Cachegrind by 30% on average, as attached.
> Both introduce special cases of the Ir event, and each improves
> Cachegrind by around 15%.
>
> * Most Ir's do not cross cache lines, so the work can be moved to
> instrumentation time. This changes the semantics of the Ir event to
> cover the special but common case, and adds "IrG" as a fallback - G
> for "general case" (IrG.patch).
>
> * Of the Ir's touching only one cache line, quite a few go to the
> same line as the previous fetch and do not change cache state. Thus,
> the simulation call can be avoided. I call that IrS, S for
> "sequential access". As it makes sense to combine both Ir and IrS
> with Dr/Dw etc., this adds quite a few handlers for combined events,
> but a few handlers could be removed (IrS.patch).
>
> Then I have a third patch reducing the number of parameters passed
> to the dirty helpers, which seems to be beneficial for x86 (32-bit).
> For x86_64 it does not matter, but it always reduces the generated
> code size (pars.patch).
>
> What do you think?
>
> Josef
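[Editor's note: the two Ir special cases discussed above can be illustrated with a small sketch. This is not the actual Cachegrind code from the patches; the line size, the `LINE_OF` macro, and the function names are hypothetical, chosen only to show the two checks: the instrumentation-time test for whether a fetch crosses a cache line (the IrNoX vs. IrGen split), and the run-time test for whether a fetch stays in the same line as the previous one (the IrSeq case, where the simulator call can be skipped).]

```c
#include <stdbool.h>

/* Hypothetical line size; Cachegrind reads the real one from the
   simulated cache configuration. */
#define LINE_SIZE 64UL
#define LINE_OF(a) ((a) & ~(LINE_SIZE - 1))

/* Instrumentation-time check: does an instruction fetch of `size`
   bytes at `addr` stay within one cache line?  If it does, the cheap
   single-line handler (IrNoX) suffices; otherwise the general-case
   handler (IrGen) must be used. */
static bool crosses_line(unsigned long addr, unsigned long size)
{
    return LINE_OF(addr) != LINE_OF(addr + size - 1);
}

/* Run-time check for the IrSeq case: a fetch that lands in the same
   line as the previous fetch cannot change cache state, so the
   simulator call can be skipped and only the event counted. */
static unsigned long last_line = (unsigned long)-1;

static bool needs_simulation(unsigned long addr)
{
    unsigned long line = LINE_OF(addr);
    if (line == last_line)
        return false;   /* IrSeq: same line, just bump the counter */
    last_line = line;
    return true;        /* new line: call into the cache simulator */
}
```

Under this sketch, a straight-line run of small instructions triggers `needs_simulation` only once per 64-byte line, which is consistent with the roughly 15% saving per patch reported above.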