From: Josef W. <Jos...@gm...> - 2004-06-02 13:35:43
[forgot the mailing list]

Hi Nick,

On Wednesday 02 June 2004 14:45, Nicholas Nethercote wrote:
> Hi all (and particularly Josef),
>
> I've been looking at Cachegrind, and realised that the JIFZ handling is
> broken.
>
> Ages ago (September 2000) I put in some special handling for JIFZ, which
> is used for REP-prefixed instructions. The idea was meant to be this:
> since the REP prefix allows one instruction to do many accesses, the best
> way to model it in the cache simulation is as if its execution causes 1
> I-cache access, but N D-cache accesses (a "1*I+N*D" model). Who knows if
> modern machines actually do this, but it seemed a reasonable idea.

I thought/think that this model is OK, and copied your implementation...

> However, I just realised the way I implemented it was wrong. Here's what
> the instrumentation currently added looks like:
>
>
>   0x810120D7: rep stosl
>
>   17: CALLM_So
>   18: MOVL $0x0, t12
>   19: PUSHL t12
>   20: CALLMo $0xC6 (-rD)
>   21: POPL t12
>   22: CALLM_Eo
>   23: SHLL $0x2, t12
>   24: GETL %ECX, t14
>   <insert I-cache access here>
>   25: JIFZL t14, $0x810120D9
>   26: DECL t14
>   27: PUTL t14, %ECX
>   28: GETL %EAX, t16
>   29: GETL %EDI, t18
>   30: STL t16, (t18)
>   31: ADDL t12, t18
>   32: PUTL t18, %EDI
>   <insert D-cache access here>
>   33: JMPo $0x810120D7
>
>
> I thought that putting the I-cache access before the JIFZ meant it would
> only be done once, whereas the D-cache access would be done N times. I
> now realise that is wrong; both will be done N times (an "N*I+N*D"
> model). I can't see how the 1*I+N*D model can be done without making big
> changes to the structure of basic blocks in the presence of REP prefixes.

Isn't it actually "(N+1)*I+N*D" currently, i.e. always one more instruction
fetch than data fetches? One way to correct this would be to have 2 basic
blocks for 1 instruction: one with the instruction fetch, and one in the
conditional loop with the data fetch. Am I correct here?

As any instruction with a REP prefix has a size >1 byte, could we
artificially introduce 2 basic blocks? In the example above this would be
one instrumented block for 0x810120d7 (with the call to the instruction
fetch), and one for 0x810120d8 (with the data fetch). The problem here is
of course that one cannot switch to the real processor at this point.

So another idea: a flag to store whether the instruction fetch was already
done. Also quite difficult and error-prone: when to reset the flag?

Another idea: correct the error afterwards, i.e. subtract the number of
data accesses from the number of instruction fetches...

> In which case, maybe the N*I/N*D model is ok. The easy solution is to
> accept this, get rid of the special case, and just do both parts at the
> end. This is really easy to do, just removes about 110 lines of
> code. (And the behaviour would be identical to what we currently have
> anyway).

Yes, perhaps that's the easiest: I don't think this JIFZ special case makes
any big difference in the result anyway. Have you done any experiments
regarding REP prefixes and the results from real hardware counters for
"instructions retired"?

> Then there's one extra complication -- because the JIFZ can exit the basic
> block, putting the instrumentation at the end means that the last
> execution may not be simulated (this is also the case with the current
> method). A more precise approach would be to put the instrumentation
> before the JIFZ, although this would take effort. (A similar thing is
> true for the jecxz instruction, which is translated using JIFZ.)

I don't see this problem. When CX==0, there is nothing to do (jumping out
of the basic block). Why are we losing the last execution here? Or does
this problem appear if we get rid of the special casing?

Josef

> Anyone have any comments about all this?
>
> N
> _______________________________________________
> Valgrind-developers mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-developers
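
The access counts discussed in this thread can be checked with a small model. What follows is a minimal sketch in Python, not Cachegrind code: `simulate_rep` is a hypothetical helper that walks the quoted UCode loop for `rep stosl`, assuming the I-cache hook sits just before the JIFZ and the D-cache hook just before the JMP, as in the listing above. Under those assumptions it reproduces the "(N+1)*I+N*D" count Josef suggests, and shows that subtracting data accesses from instruction fetches leaves exactly one fetch.

```python
def simulate_rep(n):
    """Count simulated cache accesses for one REP-prefixed instruction.

    Hypothetical model of the quoted UCode: n is the initial value of ECX.
    Returns (i_cache_accesses, d_cache_accesses).
    """
    ecx = n
    i_accesses = 0
    d_accesses = 0
    while True:
        # The block is re-entered on every pass, so the I-cache hook
        # placed before JIFZ runs each time, including the final pass.
        i_accesses += 1
        if ecx == 0:
            # JIFZL t14: ECX is zero, jump out of the basic block.
            break
        ecx -= 1          # DECL t14 / PUTL t14, %ECX
        d_accesses += 1   # D-cache hook after the store (STL)
        # JMPo back to the start of the same instruction.
    return i_accesses, d_accesses


def corrected_i_accesses(i_accesses, d_accesses):
    """Josef's 'correct afterwards' idea: subtract data accesses."""
    return i_accesses - d_accesses


if __name__ == "__main__":
    i, d = simulate_rep(3)
    print(i, d, corrected_i_accesses(i, d))
```

With ECX = 3 the model counts 4 I-cache accesses and 3 D-cache accesses, and the subtraction yields 1, which matches the intended "1*I+N*D" model. It says nothing about what real hardware does for REP prefixes.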