|
From: Josef W. <Jos...@gm...> - 2012-10-10 18:57:46
|
Hi Julian, thanks for the comments! Am 08.10.2012 11:14, schrieb Julian Seward: > > All looks fine to me. This is an implementation-only change, yes? Yes. At least, it is expected to be. That's actually a tricky problem. When the cachegrind binary changes, also a lot of client addresses will change, and there can be new/other cache conflicts misses. Hmm.. there is currently a bug in the second patch. The common case I actually wanted to check for is "Is this Ir completely in a cache line accessed by the previous Ir?". The test "is it sequentially after the previous?" is not the same, as the new Ir could be aligned at a new cache line, which needs a call into the simulator... but this is easy to fix. Using a test case which only has a few hot regions did not show this issue. However with "tinycc" it became obvious. > The insn-read and insn-miss numbers that the user sees will not > change? > > My only comment re the patches is that the names Ir, IrS, IrG > might be a bit confusing, especially the first. Maybe IrS -> IrSeq, Using longer names is good. I need to change this to "IrSame" (same cache line, see above). > IrG -> IrGen, and Ir -> IrNoX (No X == no crossing of cache line) ? Sounds good. Actually, I did not do "Ir -> IrNoX" (or similar) to reduce patch size. BTW, I am not really comfortable with the IrS (also a fixed one) patch yet. An IrS does not need to call into the cache simulator, but there still is a dirty helper call. If I could get rid of it, it should be faster, and there would not be this explosion of combined events to handle. As far as I understand, for any Ir which is not the first in a SB or after a side exit, the number of executions must be the same as for the previous Ir. So a post-processing should be able to correct the counts. Josef > > J > > > On Saturday, October 06, 2012, Josef Weidendorfer wrote: >> Hi, >> >> I just committed a commit to get rid of the huge macro in cachegrind >> (no functional/performance changes). We already discussed that quite a >> while ago, and I thought it is the right time to visit the topic. >> The second patch simplifies the simulator by using memory block numbers. >> It's more or less cosmetic, but it helps a bit. >> I will change Callgrind in the same way. >> >> But this mail is meant for two patches on top, speeding up >> Cachegrind by 30% on average, as attached. >> Both introduce special cases of the Ir event, and each improves >> Cachegrind by around 15%. >> >> * Most Ir's do not cross cache lines, so one can move work to >> instrumentation time. It changes the semantic of Ir event to be the >> special, but common case, and adds "IrG" as fallback - G for "general >> case". (IrG.patch). >> >> * Of the Ir's touching only one cache lines, quite some go into >> the same line, and do not change cache state. Thus, the simulation >> call can be avoided. I call that IrS, S for "sequential access". >> As it makes sense to combine both Ir and IrS with Dr/Dw etc, this >> adds quite some handlers for combined events, but a few handlers >> could be removed (IrS.patch). >> >> Then I have a 3rd one reducing parameters for dirty helpers, which >> seems to be beneficial for x86 (32bit). For x86_64 it does not matter, >> but it always reduces the generated code size (pars.patch). >> >> >> What do you think? >> >> Josef > > |