I am trying to use OProfile to find cache misses in application code. I
am running it on a board with a "Cavium Octeon 58xx" CPU, and I am
facing an issue with the CIMISS and DMLDS event counters. I allocate one
buffer (the size of one cache line) and access it multiple times (see
the attached benchmark_code.c). I ran benchmark_code multiple times with
different argv values (1, 100, 1000, 10000, 40000, 60000). I expected
the data cache miss rate (as seen in DMLDS) to drop from 100% as the
same element is accessed more times. But when I run
this program, I don't see any reduction in the misses (PS: attached
Can someone please let me know:
- whether I am missing something, or whether my understanding is incorrect?
- what I can try differently to measure data cache misses in application
  code?
As per http://oprofile.sourceforge.net/doc/index.html, I have configured
the events as follows:
    opcontrol --event=CIMISS:1000 --event=DMLDS:1000
Any pointers or answers would be of great help.
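
For reference, here is a minimal sketch of the kind of benchmark described above. It is not the attached benchmark_code.c itself. It assumes a 128-byte data cache line and a POSIX system that provides posix_memalign; the names alloc_one_line, touch_buffer and expected_miss_ratio are hypothetical and used only for illustration. It also spells out the idealized expectation behind the question: one cold miss followed by hits only, so the miss ratio should be roughly 1/n for n accesses.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 128-byte data cache line; adjust for the actual CPU. */
#define CACHE_LINE_BYTES 128

/* Allocate one zeroed, cache-line-aligned buffer of exactly one line.
 * Returns NULL on failure; release with free(). */
unsigned char *alloc_one_line(void)
{
    void *p = NULL;
    if (posix_memalign(&p, CACHE_LINE_BYTES, CACHE_LINE_BYTES) != 0)
        return NULL;
    memset(p, 0, CACHE_LINE_BYTES);
    return p;
}

/* Read the same byte n times. The volatile qualifier keeps the compiler
 * from collapsing the loop into a single load. Returns the sum of reads. */
unsigned long touch_buffer(volatile unsigned char *buf, long n)
{
    unsigned long sum = 0;
    for (long i = 0; i < n; i++)
        sum += buf[0];
    return sum;
}

/* Idealized model only: one cold miss, then n - 1 hits. */
double expected_miss_ratio(long n)
{
    assert(n > 0);
    return 1.0 / (double)n;
}
```

A driver would call alloc_one_line() once and then touch_buffer(buf, atol(argv[1])) for each of the argv values listed above. Under this model, 1000 accesses should give a miss ratio of about 0.1%, which is the reduction I expected to see in DMLDS.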