|
From: Stefan O. <so...@ly...> - 2005-09-19 13:47:27
|
Hi,

I need to identify the top cache-missing loads in a program written in assembly language. To this end I compiled and installed the latest version of Cachegrind. However, to me the output looks unreasonable. Some loads appear to be skipped, even though other loads in the same basic block do get executed. Maybe I'm just misunderstanding the output?

 Dr D1mr D2mr

  .    .    .   bb0:
 10    9    9   movl 0(%ebp), %esi
  0    0    0   inc %esi
  0    0    0   and $0xFFFFFF, %esi
  0    0    0   mov %esi, 0(%ebp)
  0    0    0   pushl %eax
                ; is the load below skipped?
  0    0    0   movl 0x1E26A0(%ebp), %ecx
  0    0    0   addl $1, %ecx

How can Dr be zero for the "movl 0x1E26A0(%ebp), %ecx" instruction, if it did get executed, which it should have?

Also, I get an error message:

--18056-- warning: Pentium with 12 K micro-op instruction trace cache
--18056-- Simulating a 16 KB cache with 32 B lines

Why does this message appear, is it a problem, and if so what can I do about it?

Thanks for your help.
|
From: Stefan O. <so...@ly...> - 2005-09-23 08:52:13
|
Hi again list,

I did not notice all the replies to my question earlier. I wasn't subscribed to the list, so I only saw Nicholas' replies.

I have now installed valgrind-2.4.1, and its output seems more reasonable. All loads that are executed have numbers larger than 0 in the margin. It looks to me as if valgrind-3.0.1 has a bug that valgrind-2.4.1 doesn't have. I'll see if I can construct a short version of the code to demonstrate the problem.
|
From: Tom H. <to...@co...> - 2005-09-19 13:57:21
|
In message <Pin...@ko...>
Stefan Ottosson <so...@ly...> wrote:
> Also, I get an error message:
>
> --18056-- warning: Pentium with 12 K micro-op instruction trace cache
> --18056-- Simulating a 16 KB cache with 32 B lines
>
> Why does this message appear, is it a problem, and if so what can I do
> about it?
You can't do anything about it, other than not use a Pentium 4 to
run cachegrind. Basically, cachegrind is not capable of accurately
modelling the instruction cache on your machine, because the cache
doesn't cache raw instructions; it caches the internal micro-ops that
your CPU decodes instructions to.
So your CPU has a cache capable of holding 12000 micro-ops but cachegrind
is treating it as if it held 16000 bytes of variable length x86
instructions, which is not the same thing at all. So the instruction
cache hit/miss values will not be accurate.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
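[Editorial illustration, not part of the original thread.] The warning line "Simulating a 16 KB cache with 32 B lines" can be made concrete with a toy model. The sketch below assumes a direct-mapped cache indexed by byte address, which is a simplification chosen here for brevity; the thread does not state the associativity, and Cachegrind's actual simulator is not reproduced. The names `CACHE_BYTES`, `LINE_BYTES`, `NUM_LINES`, and `count_misses` are hypothetical. It only shows why a byte-addressed model of this kind cannot stand in for a cache that stores decoded micro-ops.

```python
# Hypothetical toy model, not Cachegrind's implementation.
# Assumption: direct-mapped cache; the thread does not give the associativity.
CACHE_BYTES = 16 * 1024                 # "16 KB cache" from the warning
LINE_BYTES = 32                         # "32 B lines" from the warning
NUM_LINES = CACHE_BYTES // LINE_BYTES   # number of cache lines in the model


def count_misses(addresses):
    """Count misses for a sequence of byte addresses in the toy cache."""
    resident = [None] * NUM_LINES       # block number currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // LINE_BYTES      # which 32-byte block the address falls in
        index = block % NUM_LINES       # which cache line that block maps to
        if resident[index] != block:    # not present: count a miss and fill the line
            resident[index] = block
            misses += 1
    return misses
```

In this model, fetching 64 consecutive bytes touches two lines and so misses twice, however many x86 instructions those bytes encode. A trace cache is organised around decoded micro-ops instead, so its hit and miss behaviour need not follow this byte-based picture.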
|
From: Nicholas N. <nj...@cs...> - 2005-09-19 14:01:43
|
On Mon, 19 Sep 2005, Stefan Ottosson wrote:

> However, to me the output looks unreasonable. Some loads appear to be
> skipped, even though other loads in the same basic block do get executed.
> Maybe I'm just misunderstanding the output?

Cachegrind annotation is only as good as the debug info present in the binary. Often it's not very exact, so that could be the problem.

> Dr D1mr D2mr
>
>  .    .    .   bb0:
> 10    9    9   movl 0(%ebp), %esi
>  0    0    0   inc %esi
>  0    0    0   and $0xFFFFFF, %esi
>  0    0    0   mov %esi, 0(%ebp)
>  0    0    0   pushl %eax
>                ; is the load below skipped?
>  0    0    0   movl 0x1E26A0(%ebp), %ecx
>  0    0    0   addl $1, %ecx

It's interesting that you're annotating the asm. What command line are you assembling with?

> Also, I get an error message:
>
> --18056-- warning: Pentium with 12 K micro-op instruction trace cache
> --18056-- Simulating a 16 KB cache with 32 B lines
>
> Why does this message appear, is it a problem, and if so what can I do
> about it?

Cachegrind doesn't simulate trace caches accurately. So it just simulates a normal I-cache instead. The overall results will still be indicative of your program's cache behaviour in general, but don't expect it to match the exact results you'd get on the real hardware.

Nick
|
From: Josef W. <Jos...@gm...> - 2005-09-19 14:36:06
|
On Monday 19 September 2005 15:47, Stefan Ottosson wrote:

> Dr D1mr D2mr
>
>  .    .    .   bb0:
> 10    9    9   movl 0(%ebp), %esi
>  0    0    0   inc %esi
>  0    0    0   and $0xFFFFFF, %esi
>  0    0    0   mov %esi, 0(%ebp)
>  0    0    0   pushl %eax
>                ; is the load below skipped?
>  0    0    0   movl 0x1E26A0(%ebp), %ecx
>  0    0    0   addl $1, %ecx

Is this assembler source, compiled with as -g?

> How can Dr be zero for the "movl 0x1E26A0(%ebp), %ecx" instruction, if it
> did get executed, which it should have?

This looks strange. What is the result if you run it with Cachegrind from VG 2.4.1?

> --18056-- warning: Pentium with 12 K micro-op instruction trace cache
> --18056-- Simulating a 16 KB cache with 32 B lines
>
> Why does this message appear, is it a problem, and if so what can I do
> about it?

Nothing. As Tom notes, I1mr will certainly differ from what a level 1 instruction cache miss counter would report when measured with real HW performance counters. I2mr would probably be closer to reality, as Cachegrind's L2 cache models the hardware of your processor better, but that cannot really be measured separately in hardware.

But I do not think that exact miss counts for instruction fetches are important: most of the time, instruction misses play no role at all in bad cache behaviour. If they are high, it means that the code size of your frequently executed code is very large. The best optimization I know of in this case is to reduce the amount of inlining done by the compiler. And to check any improvements, the relative figures are what matter, not the absolute ones.

Josef

> Thanks for your help.
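[Editorial illustration, not part of the original thread.] Josef's point about relative figures can be sketched as a small calculation. The counts Dr = 10 and D1mr = 9 come from the first line of the listing quoted above; the "after" value used in the usage note is invented purely for illustration. The helper names `miss_rate` and `relative_change` are hypothetical and do not belong to any Valgrind tool.

```python
def miss_rate(refs, misses):
    """Fraction of references that missed; 0.0 when there were no references."""
    if refs == 0:
        return 0.0
    return misses / refs


def relative_change(before, after):
    """Relative change from `before` to `after`; negative means fewer misses."""
    if before == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (after - before) / before
```

With the listing's numbers, `miss_rate(10, 9)` gives 0.9 for that load. If a change to the program brought a simulated miss count from 9 down to a made-up 3, `relative_change(9, 3)` would report a reduction of about two thirds, and that ratio is the figure to compare between runs, since the absolute counts come from a simulated cache.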
|
From: Stefan O. <so...@ly...> - 2005-09-19 18:01:40
|
The assembly files are compiled by "as" with --gstabs and linked with some C code (which was compiled with gcc -g):

gcc -Wall -g -c -o obj/driver.o ...
as --gstabs -o obj/test00.o obj/test00.s
ar -r obj/test_blocks.a obj/test00.o ...
gcc --static-libc -o driver -g -lrt -I. obj/driver.o obj/test_blocks.a ...

The program itself is quite exotic. The asm files are generated by another program, and don't compute anything useful. The results of the computations performed by the generated code are meaningless; what is interesting is the execution performance. I'm writing my final thesis on a data prefetching technique, which I intend to evaluate using the above program.

I'm mostly interested in the data cache, so I'll ignore the error message for now. I'll probably try oprofile later on, for "real" data.

On Mon, 19 Sep 2005, Nicholas Nethercote wrote:

> Cachegrind annotation is only as good as the debug info present in the
> binary. Often it's not very exact, so that could be the problem.

> It's interesting that you're annotating the asm. What command line are
> you assembling with?

> Cachegrind doesn't simulate trace caches accurately. So it just simulates
> a normal I-cache instead. The overall results will still be indicative of
> your program's cache behaviour in general, but don't expect it to match
> the exact results you'd get on the real hardware.
|
From: Nicholas N. <nj...@cs...> - 2005-09-19 18:40:02
|
On Mon, 19 Sep 2005, Stefan Ottosson wrote:

> The program itself is quite exotic. The asm files are generated by another
> program, and don't compute anything useful. The results of the
> computations performed by the generated code are meaningless,
> what is interesting is the execution performance.

It might be worth trying to work out whether the debug info is bad, e.g. by stepping through the code with GDB or in some other way.

Nick
|
From: Stefan O. <so...@ly...> - 2005-09-20 13:55:11
|
I stepped through it with ddd and everything seemed fine, so I guess the debug info is alright.

On Mon, 19 Sep 2005, Nicholas Nethercote wrote:

> On Mon, 19 Sep 2005, Stefan Ottosson wrote:
>
> > The program itself is quite exotic. The asm files are generated by another
> > program, and don't compute anything useful. The results of the
> > computations performed by the generated code are meaningless,
> > what is interesting is the execution performance.
>
> It might be worth trying to work out if the debug info looks like it is
> bad, eg. by stepping through it by GDB or some other way.
>
> Nick