|
From: Julian S. <js...@ac...> - 2012-10-08 09:19:38
|
All looks fine to me. This is an implementation-only change, yes? The insn-read and insn-miss numbers that the user sees will not change?

My only comment re the patches is that the names Ir, IrS, IrG might be a bit confusing, especially the first. Maybe IrS -> IrSeq, IrG -> IrGen, and Ir -> IrNoX (NoX == no crossing of cache lines)?

J

On Saturday, October 06, 2012, Josef Weidendorfer wrote:
> Hi,
>
> I just committed a patch to get rid of the huge macro in Cachegrind
> (no functional/performance changes). We already discussed that quite
> a while ago, and I thought it was the right time to revisit the topic.
> The second patch simplifies the simulator by using memory block
> numbers. It is more or less cosmetic, but it helps a bit.
> I will change Callgrind in the same way.
>
> But this mail is about two patches on top of those, speeding up
> Cachegrind by 30% on average, as attached.
> Both introduce special cases of the Ir event, and each improves
> Cachegrind by around 15%.
>
> * Most Ir's do not cross cache lines, so the work can be moved to
> instrumentation time. This changes the semantics of the Ir event to
> cover the special but common case, and adds "IrG" as a fallback - G
> for "general case" (IrG.patch).
>
> * Of the Ir's touching only one cache line, quite a few go to the
> same line as the previous fetch and do not change cache state. Thus,
> the simulation call can be avoided. I call that IrS, S for
> "sequential access". As it makes sense to combine both Ir and IrS
> with Dr/Dw etc., this adds quite a few handlers for combined events,
> but a few handlers could be removed (IrS.patch).
>
> Then I have a third patch reducing the number of parameters passed
> to the dirty helpers, which seems to be beneficial for x86 (32-bit).
> For x86_64 it does not matter, but it always reduces the generated
> code size (pars.patch).
>
> What do you think?
>
> Josef
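[Editor's note: the two Ir special cases discussed above can be illustrated with a small sketch. This is not the actual Cachegrind code from the patches; the line size, the `LINE_OF` macro, and the function names are hypothetical, chosen only to show the two checks: the instrumentation-time test for whether a fetch crosses a cache line (the IrNoX vs. IrGen split), and the run-time test for whether a fetch stays in the same line as the previous one (the IrSeq case, where the simulator call can be skipped).]

```c
#include <stdbool.h>

/* Hypothetical line size; Cachegrind reads the real one from the
   simulated cache configuration. */
#define LINE_SIZE 64UL
#define LINE_OF(a) ((a) & ~(LINE_SIZE - 1))

/* Instrumentation-time check: does an instruction fetch of `size`
   bytes at `addr` stay within one cache line?  If it does, the cheap
   single-line handler (IrNoX) suffices; otherwise the general-case
   handler (IrGen) must be used. */
static bool crosses_line(unsigned long addr, unsigned long size)
{
    return LINE_OF(addr) != LINE_OF(addr + size - 1);
}

/* Run-time check for the IrSeq case: a fetch that lands in the same
   line as the previous fetch cannot change cache state, so the
   simulator call can be skipped and only the event counted. */
static unsigned long last_line = (unsigned long)-1;

static bool needs_simulation(unsigned long addr)
{
    unsigned long line = LINE_OF(addr);
    if (line == last_line)
        return false;   /* IrSeq: same line, just bump the counter */
    last_line = line;
    return true;        /* new line: call into the cache simulator */
}
```

Under this sketch, a straight-line run of small instructions triggers `needs_simulation` only once per 64-byte line, which is consistent with the roughly 15% saving per patch reported above.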