From: Paul Y. <yin...@gm...> - 2009-08-14 09:31:23
Hi all,
I used Cachegrind to evaluate the cache behavior of a small test program,
but the Dw (data write) counts look very strange.
// test1.c
#include <stdio.h>
#include <stdlib.h>

#define N 20000

int *parray[N];

int main ()
{
    int i;

    for (i = 0; i < N; i++)
        parray[i] = (int *) malloc (10 * sizeof (int));
    for (i = 0; i < N; i++)
        parray[i] = (int *) (i);
    return 0;
}
valgrind --tool=cachegrind ./test1
1) On 64-bit Intel Xeon CPU 3050 2.13GHz dual core: cache line size
is 64 bytes.
gcc -O2 -g -o test1 test1.c
sizeof (int*) = 8.
    Ir  I1mr  I2mr  Dr  D1mr  D2mr      Dw   D1mw   D2mw
--------------------------------------------------------
39,998     0     0   0     0     0       0      0      0  for (i = 0; i < N; i++)
80,000     1     1   0     0     0  40,000  2,501  2,501    parray[i] = (int *) malloc (10 * sizeof (int));
39,998     0     0   0     0     0       0      0      0  for (i = 0; i < N; i++)
40,000     0     0   0     0     0  20,000  2,501      0    parray[i] = (int *) (i);
Question: Why is Dw 40,000 for the malloc() line? The corresponding
assembly is "movq %rax, parray(,%rbx,8)", a single store per iteration,
so I expected 20,000. The D1mw is reasonable: 20,000 * 8 / 64 = 2500.
2) On Dual Core AMD Opteron Processor 270: cache line size is 64 bytes.
gcc -m32 -O2 -g -o test1 test1.c
sizeof (int*) = 4.
    Ir  I1mr  I2mr  Dr  D1mr  D2mr      Dw   D1mw   D2mw
--------------------------------------------------------
80,002     0     0   0     0     0       0      0      0  for (i = 0; i < N; i++)
80,000     0     0   0     0     0  60,000  1,250  1,250    parray[i] = (int *) malloc (10 * sizeof (int));
60,002     0     0   0     0     0       0      0      0  for (i = 0; i < N; i++)
20,000     0     0   0     0     0  20,000  1,250      0    parray[i] = (int *) (i);
Here both the Ir and the Dw numbers look wrong to me. The D1mw is
reasonable: 20,000 * 4 / 64 = 1250.
Any suggestions are welcome.
--
Regards,
Paul Yuan (袁鹏)