From: David B. <dbr...@st...> - 2003-04-30 18:33:34
> > Lackey and Cachegrind (Valgrind skins, use --skin=lackey or
> --skin=cachegrind in any version later than 1.0.4) both measure x86
> instructions retired (I think? I guess so) but not the number issued.
>
> > If not valgrind, does anyone have any suggestions? I can't find any
> > good execution tracing tools for x86 and linux....
>
> You want Rabbit: www.scl.ameslab.gov/Projects/Rabbit/. I've used it lots,
> it's very good, gives you access to all your machine's performance
> counters. I have an Athlon, I know it gives you both instructions issued
> and retired, I think the Pentium counters probably give you those two.
>
> Or if you have a P4, Brink+Abyss:
> www.eg.bucknell.edu/~bsprunt/emon/brink_abyss/brink_abyss.shtm. I haven't
> tried it, but it looks very powerful.

Thanks, I was aware of those two, but the major drawback of both is the need to recompile the source. I've used VTune to get instructions retired, but retirement is sort of at the "opposite" end of the processor from the part I'm interested in.

To give you an idea of why I'm asking, here's a short explanation. I'm developing timing attacks against OpenSSL (http://crypto.stanford.edu/~dabo/abstracts/ssl-timing.html). The timing differences the attack relies on are about 1% of the total execution time (measured in cycles). I've noticed that a small change in the source, say one extra mov instruction, changes the function offsets (compare program A without the extra instruction to program B with it). This alignment difference can skew the execution profile by 1% or a little more, changing the timing attack characteristics (unfortunately for security people, the attack still works). Most notably, a case that algorithmically should produce, say, a negative timing difference when a key bit is 0 can instead produce a positive one, showing that the P4's internal optimizations, such as branch prediction, can influence the results.

For the paper, I'd like to compare instructions issued vs. instructions retired for the two programs. On the P4, those two counts may differ for a number of weird P4 reasons (as I understand from talking to P4 experts). (And yes, I've verified the change isn't due to a bug in the program by inspecting the assembly output and checking for memory problems. In the end, I'm hoping to show that simpler processors, like the Pentium 2, aren't affected by such small changes in the source.)

So I can't easily recompile my two test programs, since any change, such as adding libraries or external function calls, changes the attack characteristics themselves, as explained above. (I'm surprised this isn't pointed out on the websites for Rabbit and the like: inserting code to measure timing can change the timing.)

I've tried valgrind with --single-step=yes and see a couple of different things in the output that I think *may* be the instructions issued, like "translate: new", but I'm not sure. Do these correspond to instructions issued?

Thanks for everyone's help!
-david
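To make the 1% figure above concrete, here is a minimal sketch of the arithmetic only (not the attack code from the paper): the signed difference between the mean cycle counts of two groups of timings, expressed as a percentage of their average. The function name `relative_diff_percent` and all sample numbers are hypothetical and synthetic, not measurements. Modelling the layout effect as a uniform additive offset on one group is also an assumption made purely for illustration.

```python
from statistics import mean


def relative_diff_percent(group0, group1):
    """Signed difference of mean cycle counts, as a percent of their average.

    group0, group1: cycle counts for two hypothetical sample groups,
    e.g. timings collected when a guessed key bit is 0 vs. 1.
    """
    m0, m1 = mean(group0), mean(group1)
    return 100.0 * (m0 - m1) / ((m0 + m1) / 2)


# Synthetic numbers, not measurements: group 0 runs about 1% faster.
bit0 = [990_000, 990_000, 990_000]
bit1 = [1_000_000, 1_000_000, 1_000_000]
print(round(relative_diff_percent(bit0, bit1), 3))  # → -1.005

# Assumed layout-induced slowdown of ~1.5% (15,000 cycles) hitting only
# group 0: the sign of the difference flips from negative to positive.
skewed = [t + 15_000 for t in bit0]
print(round(relative_diff_percent(skewed, bit1), 3))  # → 0.499
```

The point of the sketch is that when the signal itself is about 1% of the total, a skew of "1% or a little more" is enough to reverse its sign.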