From: Nicholas N. <nj...@ca...> - 2004-06-02 12:45:49
Hi all (and particularly Josef),
I've been looking at Cachegrind, and realised that the JIFZ handling is
broken.
Ages ago (September 2000) I put in some special handling for JIFZ, which
is used for REP-prefixed instructions. The idea was meant to be this:
since the REP prefix allows one instruction to do many accesses, the best
way to model it in the cache simulation is as if its execution causes 1
I-cache access, but N D-cache accesses (a "1*I+N*D" model). Who knows if
modern machines actually do this, but it seemed a reasonable idea.
However, I just realised the way I implemented it was wrong. Here's what
the instrumentation currently added looks like:
  0x810120D7: rep stosl
  17: CALLM_So
  18: MOVL $0x0, t12
  19: PUSHL t12
  20: CALLMo $0xC6 (-rD)
  21: POPL t12
  22: CALLM_Eo
  23: SHLL $0x2, t12
  24: GETL %ECX, t14
  <insert I-cache access here>
  25: JIFZL t14, $0x810120D9
  26: DECL t14
  27: PUTL t14, %ECX
  28: GETL %EAX, t16
  29: GETL %EDI, t18
  30: STL t16, (t18)
  31: ADDL t12, t18
  32: PUTL t18, %EDI
  <insert D-cache access here>
  33: JMPo $0x810120D7
I thought that putting the I-cache access before the JIFZ meant it would
only be done once, whereas the D-cache access would be done N times. I
now realise that is wrong; both will be done N times (an "N*I+N*D"
model). I can't see how the 1*I+N*D model can be done without making big
changes to the structure of basic blocks in the presence of REP prefixes.
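To make the counting concrete, here is a rough Python sketch of the control flow in the listing above. It is only a toy model: the function name and counters are made up for illustration, and it takes the two marker positions at face value rather than modelling what Cachegrind really does. The point is that the JMP at the end re-enters the same block, so everything before the JIFZ runs again on each pass.

```python
def run_rep_block(ecx):
    """Toy model of the translated 'rep stosl' block (hypothetical helper).

    Returns (i_accesses, d_accesses) for a starting ECX value.
    """
    i_accesses = 0
    d_accesses = 0
    while True:
        i_accesses += 1   # marker before JIFZ: <insert I-cache access here>
        if ecx == 0:      # JIFZL t14: leave the block once ECX is zero
            break
        ecx -= 1          # DECL t14 / PUTL t14, %ECX
        d_accesses += 1   # marker after the store: <insert D-cache access here>
        # JMPo back to the start of the same block
    return i_accesses, d_accesses


if __name__ == "__main__":
    print(run_rep_block(4))
```

In this sketch the I-cache count grows with ECX instead of staying at 1 (it also picks up the final pass that exits at the JIFZ), which is why the placement alone cannot give a 1*I+N*D model.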
In which case, maybe the N*I+N*D model is ok. The easy solution is to
accept this, get rid of the special case, and just do both parts at the
end. This is really easy to do; it just removes about 110 lines of
code. (And the behaviour would be identical to what we currently have
anyway.)
Then there's one extra complication -- because the JIFZ can exit the basic
block, putting the instrumentation at the end means that the last
execution may not be simulated (this is also the case with the current
method). A more precise approach would be to put the instrumentation
before the JIFZ, although this would take effort. (A similar thing is
true for the jecxz instruction, which is translated using JIFZ.)
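The effect of the early exit can be sketched the same way. This is again a toy model with made-up names, not Cachegrind code: `placement` is a hypothetical label for where the combined instrumentation sits, and a "pass" is one trip through the translated block.

```python
def simulated_passes(ecx, placement):
    """Count passes through the block and how many reach the instrumentation.

    placement is a hypothetical label: "end" or "before_jifz".
    Returns (total_passes, simulated_passes).
    """
    if placement not in ("end", "before_jifz"):
        raise ValueError("unknown placement: %r" % (placement,))
    total = 0
    simulated = 0
    while True:
        total += 1
        if placement == "before_jifz":
            simulated += 1   # instrumentation runs before the exit test
        if ecx == 0:         # JIFZ exits the basic block here
            break
        ecx -= 1
        if placement == "end":
            simulated += 1   # only reached when JIFZ did not exit
    return total, simulated


if __name__ == "__main__":
    for placement in ("end", "before_jifz"):
        print(placement, simulated_passes(3, placement))
```

With the instrumentation at the end, the pass that leaves through the JIFZ is never counted. In particular a starting ECX of 0 (the taken-jump case, which is also how a jecxz-style translation behaves in this model) records nothing at all, whereas instrumenting before the JIFZ counts every pass.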
Anyone have any comments about all this?
N