From: David B. <dbr...@st...> - 2003-04-30 00:59:56
|
Hi!

Is it possible to use valgrind to get a total number of x86 assembly instructions issued? Specifically, I'm trying to see how many assembly instructions are issued for a given input into a crypto library. I've looked at and used profiling tools such as vtune and got the total number of instructions retired, but that's not quite what I'm interested in. I want the total number of assembly instructions given to the processor...

If not valgrind, does anyone have any suggestions? I can't find any good execution tracing tools for x86 and Linux....

Thanks for any help!
david |
From: Nicholas N. <nj...@ca...> - 2003-04-30 07:29:16
|
On 29 Apr 2003, David Brumley wrote:

> Is it possible to use valgrind to get a total number of x86 assembly
> instructions issued? Specifically, I'm trying to see how many assembly
> instructions are issued for a given input into a crypto library. I've
> looked and used profiling tools such as vtune and got total number of
> instructions retired, but that's not quite what I'm interested in. I
> want the total number of assembly instructions given to the processor...

Lackey and Cachegrind (Valgrind skins, use --skin=lackey or --skin=cachegrind in any version later than 1.0.4) both count the x86 instructions the program actually executes, which corresponds to instructions retired (I think), but not the number issued.

> If not valgrind, does anyone have any suggestions? I can't find any
> good execution tracing tools for x86 and linux....

You want Rabbit: www.scl.ameslab.gov/Projects/Rabbit/. I've used it lots, it's very good, and it gives you access to all your machine's performance counters. I have an Athlon, and I know it gives you both instructions issued and retired there; I think the Pentium counters probably give you those two as well.

Or if you have a P4, Brink+Abyss: www.eg.bucknell.edu/~bsprunt/emon/brink_abyss/brink_abyss.shtm. I haven't tried it, but it looks very powerful.

N |
From: Vimal R. <vim...@ya...> - 2003-04-30 18:00:43
|
I used Sun's shade. It's pretty good and also has the option where you can choose to trace annulled (squashed) instructions. Which I guess would be the same as the number of instructions issued (?). The standard download also comes with a program (icount.c) that you can easily compile and run.

Hope that helps.

Thanks,
Vimal

--- Nicholas Nethercote <nj...@ca...> wrote:
> [snip]
> _______________________________________________
> Valgrind-users mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-users

=====
Vimal Reddy
Graduate Student, ECE
North Carolina State University
Raleigh, NC
Ph: (919) 836-8254 Web: http://www4.ncsu.edu/~vkreddy |
From: Vimal R. <vim...@ya...> - 2003-04-30 20:56:27
|
I'm sorry, I didn't pay attention to the x86 part of the message. Shade only works for Solaris. Sorry about that.

-vimal |
From: David B. <dbr...@st...> - 2003-04-30 18:33:34
|
>
> Lackey and Cachegrind (Valgrind skins, use --skin=lackey or
> --skin=cachegrind in any version later than 1.0.4) both measure x86
> instructions retired (I think? I guess so) but not the number issued.
>
> > If not valgrind, does anyone have any suggestions? I can't find any
> > good execution tracing tools for x86 and linux....
>
> You want Rabbit: www.scl.ameslab.gov/Projects/Rabbit/. I've used it lots,
> it's very good, gives you access to all your machine's performance
> counters. I have an Athlon, I know it gives you both instructions issued
> and retired, I think the Pentium counters probably give you those two.
>
> Or if you have a P4, Brink+Abyss:
> www.eg.bucknell.edu/~bsprunt/emon/brink_abyss/brink_abyss.shtm. I haven't
> tried it, but it looks very powerful.

Thanks, I was aware of those two, but the major drawback of either is the need to recompile the source. I've used vtune to get instructions retired, but that is sort of on the "opposite" end of the processor from the one I'm interested in.

To give you an idea why I'm asking, here's a short explanation. I'm developing timing attacks against OpenSSL (http://crypto.stanford.edu/~dabo/abstracts/ssl-timing.html). The timing characteristics we need are about 1% of the total execution time (measured in cycles). I've noticed that a small change in the source, say one extra mov instruction, will change the function offsets (say program A without the extra instruction vs. program B with the extra instruction). This alignment difference can skew the execution profile by 1% or a little more, changing the timing attack characteristics (unfortunately for security people, the attack still works). Most notably, what algorithmically should result in, say, a negative timing difference when a bit of the key=0 can become a positive timing difference, showing that the P4's internal optimizations, such as branch prediction, can influence the results.

For the paper, I'd like to compare instructions issued vs. instructions retired for the two programs. On the P4, those two may not be the same for a number of weird P4 reasons (as I understand from talking to P4 experts). (And yes, I've verified the change isn't due to a bug in the program by inspecting the assembly output and checking for memory problems. In the end, I'm hoping to show that simpler processors, like the Pentium 2, aren't affected by such small changes in the source.)

So, I can't recompile my two test programs easily, since any change made by adding libraries or external function calls changes the attack characteristic itself, as explained above. (I'm surprised that this isn't made obvious on the websites for Rabbit and the like... inserting the code to measure timing can change the timing.)

I've tried valgrind with --single-step=yes, and I see a couple of different things that I think *may* be the instructions issued, like "translate: new" in the output, but I'm not sure. Is this instructions issued?

Thanks for everyone's help!
-david |
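To make the sign flip described in this message concrete, here is a minimal Python sketch. The cycle counts are invented for illustration (they are not measurements from this thread), and `timing_signal` is a hypothetical helper name. It compares median cycle counts for the two key-bit cases in each build and shows how a shift on the order of 1% can turn a negative difference into a positive one.

```python
from statistics import median


def timing_signal(cycles_bit0, cycles_bit1):
    """Relative difference in median cycles: (bit0 - bit1) / bit1."""
    m0 = median(cycles_bit0)
    m1 = median(cycles_bit1)
    return (m0 - m1) / m1


# Invented cycle counts for two builds of the same code (illustration only).
# Build B stands for build A plus one extra instruction that shifts alignment.
build_a = {"bit0": [1_000_000, 1_000_400, 999_800],
           "bit1": [1_010_200, 1_009_900, 1_010_500]}
build_b = {"bit0": [1_015_000, 1_015_300, 1_014_900],
           "bit1": [1_008_000, 1_008_200, 1_007_900]}

signal_a = timing_signal(build_a["bit0"], build_a["bit1"])
signal_b = timing_signal(build_b["bit0"], build_b["bit1"])

# With these made-up numbers the signal is about -1% for A and +0.7% for B.
print(f"build A: {signal_a:+.2%}")
print(f"build B: {signal_b:+.2%}")
```

Medians are used here only because they are robust to the occasional outlier run; the thread does not say which statistic the real experiments use.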
From: Nicholas N. <nj...@ca...> - 2003-04-30 19:31:31
|
On 30 Apr 2003, David Brumley wrote:

> > You want Rabbit: www.scl.ameslab.gov/Projects/Rabbit/.
>
> Thanks, i was aware of those two, but the major drawback for either is
> the need to recompile the source.

You don't need to recompile the source with Rabbit. You can insert calls to Rabbit's library if you want to time parts of the program, but you can instead just run the program 'rabbit' (which comes from rabbit.c). By default it uses sampling to get estimates of all the events your processor can measure, but if you use the --events option it measures only a small number of events (on the Athlon it's 4), and measures them exactly.

N |
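The thread doesn't say how Rabbit turns samples into estimates. One common approach when there are more events than hardware counters is time multiplexing: each event is counted for only part of the run and the raw count is scaled up. The Python sketch below shows just that arithmetic. It is an assumption about the general technique, not Rabbit's actual code, and the function name and numbers are made up.

```python
def estimate_total_count(raw_count, seconds_counted, seconds_total):
    """Scale a count taken during part of a run up to the whole run."""
    if seconds_counted <= 0 or seconds_total < seconds_counted:
        raise ValueError("need 0 < seconds_counted <= seconds_total")
    return raw_count * seconds_total / seconds_counted


# Made-up example: 8 events share 4 counters, so each event is counted
# for half of a 2-second run.
estimate = estimate_total_count(raw_count=500_000,
                                seconds_counted=1.0,
                                seconds_total=2.0)
print(estimate)
```

When an event holds a counter for the whole run, the scale factor is 1 and the count is exact, which is the trade-off between the default mode and --events that the message describes.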
From: John R. <jr...@Bi...> - 2003-04-30 20:33:06
|
David Brumley wrote:

> [snip]
> I've used vtune to get instructions retired, but that is sort of on the
> "opposite" end of the processor that I'm interested in. [snip]
> For the paper, I'd like to compare instructions
> issued vs. instructions retired for the two programs. On the P4, those
> two may not be the same for a number of weird P4 reasons (as I
> understand talking to P4 experts). [snip]
> I've tried valgrind with --single-step=yes, and see a couple of
> different things that I think *may* be the instructions issued, like
> "translate: new" in the output, but I'm not sure. Is this instructions
> issued?

All x86 implementations beginning with the Intel PentiumPro [ca.1995] have speculative, out-of-order execution, including along false conditional control paths (and even false _un_conditional control paths). Speculation is influenced by code alignment, icache and dcache hits/misses, interrupts and task switches, CPU temperature, etc. "Instructions issued" is almost meaningless for the hardware; certainly no software could ever compute it.

--
John Reiser, jr...@Bi... |
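A toy model may help show why issued and retired counts diverge under speculation. The Python sketch below is a deliberate simplification and does not model any real CPU: it assumes a single loop branch, a 1-bit predictor, and a fixed number of wrong-path instructions per misprediction. All names and parameters are hypothetical.

```python
def count_issued_and_retired(branch_outcomes, body_len=4, wrong_path_len=3):
    """Toy model of one loop branch guarded by a 1-bit predictor.

    Every iteration retires `body_len` instructions. On a misprediction,
    `wrong_path_len` wrong-path instructions have already been issued and
    are squashed, so they count as issued but never retire.
    """
    predicted_taken = True  # arbitrary initial prediction
    issued = retired = 0
    for taken in branch_outcomes:
        issued += body_len
        retired += body_len
        if taken != predicted_taken:
            issued += wrong_path_len  # squashed speculative work
        predicted_taken = taken  # 1-bit predictor: remember the last outcome
    return issued, retired


issued, retired = count_issued_and_retired([True, True, False, True])
print(issued, retired)
```

The retired count depends only on the path the program actually takes, while the issued count also depends on predictor state. On real hardware that state is affected by the factors listed above, which is why software that only follows the program's actual path cannot reproduce it.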
From: David B. <dbr...@st...> - 2003-05-01 17:47:03
|
> > I've used vtune to get instructions retired, but that is sort of on the
> > "opposite" end of the processor that I'm interested in. [snip]
> > For the paper, I'd like to compare instructions
> > issued vs. instructions retired for the two programs. On the P4, those
> > two may not be the same for a number of weird P4 reasons (as I
> > understand talking to P4 experts). [snip]
> > I've tried valgrind with --single-step=yes, and see a couple of
> > different things that I think *may* be the instructions issued, like
> > "translate: new" in the output, but I'm not sure. Is this instructions
> > issued?
>
> All x86 implementations beginning with the Intel PentiumPro [ca.1995] have
> speculative, out-of-order execution, including along false conditional
> control paths (and even false _un_conditional control paths). Speculation
> is influenced by code alignment, icache and dcache hits/misses, interrupts
> and task switches, CPU temperature, etc. "Instructions issued" is almost
> meaningless for the hardware; certainly no software could ever compute it.

Thanks, that's good information, but I'm not sure it completely answers my question. What I want is the number of assembly instructions issued; I can then measure the number of instructions retired with vtune. It would be interesting in its own right just to compare the two (esp. since instructions retired is uops, not assembly instructions).

I was hoping valgrind would already increment an instructions-issued variable before simulating the execution... this is all I really need for right now. Is this in the output of valgrind? After I get that number, I'll compare it to vtune's instructions retired (running the program by itself w/o valgrind).

-david |
From: Nicholas N. <nj...@ca...> - 2003-05-02 10:34:22
|
On 1 May 2003, David Brumley wrote:

> > "Instructions issued" is almost meaningless for the hardware; certainly
> > no software could ever compute it.
>
> Thanks, that's good information but I'm not sure it completely answers
> my question. What I want is the number of assembly instructions issued,

Um... I think the hardware counters are the only way you're going to get instructions issued (at least, in the sense that I understand the term).

Again, I think Rabbit can do the job if you have a Pentium II or III or Athlon (not sure about P4s), and you don't need to recompile your program.

If you have a P4, I imagine Abyss/Brink could be used without recompiling your program by creating a simple wrapper program like Rabbit's rabbit.c, but I'm not certain.

N |
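The wrapper idea can be sketched in a few lines of Python: launch the unmodified program as a child process and take measurements around it from the outside. This is only an illustration of the structure. It reads wall-clock and CPU time through the standard library on Unix-like systems, not hardware performance counters, and it is not based on the source of rabbit.c or Abyss/Brink. `run_and_measure` is a hypothetical name.

```python
import resource
import subprocess
import sys
import time


def run_and_measure(argv):
    """Run an unmodified program as a child and measure it from outside."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(argv)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "exit_code": completed.returncode,
        "wall_seconds": wall,
        "user_seconds": after.ru_utime - before.ru_utime,
        "system_seconds": after.ru_stime - before.ru_stime,
    }


# Demo: measure a trivial child process (the Python interpreter itself).
print(run_and_measure([sys.executable, "-c", "pass"]))
```

Because the target binary is never rebuilt or relinked, its code layout stays the same, which is the property that matters for the alignment-sensitive measurements discussed earlier in the thread. A counter-based wrapper would presumably start and stop the counters where this sketch reads the clocks.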