|
From: Вадим В. <vad...@ma...> - 2011-04-20 11:52:36
|
Hello!
Can anyone tell me, is it possible to collect number of fetches and
cache misses for a particular array using Callgrind/Cachegrind?
For example, I have such fragment:
for (j = 0; j < length; j++) {
for (i = ip[j]; i <= ip[j+1]-1; i++) {
b[ia[i]] = b[ia[i]] + a[i]*x[j];
}
}
And I want to collect information only about array /b/. How can this be
done?
Thanks in advance!
--
With regards,
Vadim Voevodin.
|
|
From: Josef W. <Jos...@gm...> - 2011-04-20 21:18:27
|
On Wednesday 20 April 2011, Вадим Воеводин wrote:
> Can anyone tell me, is it possible to collect number of fetches and
> cache misses for a particular array using Callgrind/Cachegrind?
Unfortunately not possible at the moment.
The ideal thing would be to add a command line option for "please
separate counters for different variable accesses" ;-)
An easy way to implement it would be to add a client request to
restrict event counter increments to accesses falling into a given
memory range; then you would do the following before your code
fragment:
CALLGRIND_RESTRICT_COLLECTIONRANGE( &(b[0]), &(b[BSIZE]) );
Hmm. Would that be useful?
Josef
> For example, I have such fragment:
>
> for (j = 0; j < length; j++) {
> for (i = ip[j]; i <= ip[j+1]-1; i++) {
> b[ia[i]] = b[ia[i]] + a[i]*x[j];
> }
> }
>
> And I want to collect information only about array /b/. How can this be
> done?
>
> Thanks in advance!
>
>
|
|
From: Вадим В. <vad...@ma...> - 2011-04-21 04:43:08
|
21.04.2011 1:00, Josef Weidendorfer пишет:
> On Wednesday 20 April 2011, Вадим Воеводин wrote:
>> Can anyone tell me, is it possible to collect number of fetches and
>> cache misses for a particular array using Callgrind/Cachegrind?
> Unfortunately not possible at the moment.
> The ideal thing would be to add a command line option for "please
> separate counters for different variable accesses" ;-)
>
> An easy way to implement it would be to add a client request to
> restrict event counter increments to accesses falling into a given
> memory range; then you would do the following before your code
> fragment:
>
> CALLGRIND_RESTRICT_COLLECTIONRANGE(&(b[0]),&(b[BSIZE]) );
>
> Hmm. Would that be useful?
>
Yes, thanks, I think this should help!
> Josef
>
>> For example, I have such fragment:
>>
>> for (j = 0; j< length; j++) {
>> for (i = ip[j]; i<= ip[j+1]-1; i++) {
>> b[ia[i]] = b[ia[i]] + a[i]*x[j];
>> }
>> }
>>
>> And I want to collect information only about array /b/. How can this be
>> done?
>>
>> Thanks in advance!
>>
>>
>
|
|
From: Julian S. <js...@ac...> - 2011-04-21 00:17:43
|
> Can anyone tell me, is it possible to collect number of fetches and
> cache misses for a particular array using Callgrind/Cachegrind?
> For example, I have such fragment:
>
> for (j = 0; j < length; j++) {
> for (i = ip[j]; i <= ip[j+1]-1; i++) {
> b[ia[i]] = b[ia[i]] + a[i]*x[j];
> }
> }
>
> And I want to collect information only about array /b/. How can this be
> done?
Short answer is, like this
for (j = 0; j < length; j++) {
for (i = ip[j]; i <= ip[j+1]-1; i++) {
long ia_i = ia[i];
double tmp = a[i] * x[j]; // or whatever type it is
b[ia_i] += tmp;
}
}
Now the stats for the line b[ia_i] += tmp should show you info only
for the /b/ access.
Long answer is, the question is kind-of meaningless. Cache misses are
a function of the overall memory behaviour of your program. So the
misses on b[] also depend on how the program accesses a[], x[], ip[],
etc, and you can't really measure each in isolation.
J
|
|
From: Вадим В. <vad...@ma...> - 2011-04-21 05:01:07
|
21.04.2011 4:16, Julian Seward пишет:
>> Can anyone tell me, is it possible to collect number of fetches and
>> cache misses for a particular array using Callgrind/Cachegrind?
>> For example, I have such fragment:
>>
>> for (j = 0; j< length; j++) {
>> for (i = ip[j]; i<= ip[j+1]-1; i++) {
>> b[ia[i]] = b[ia[i]] + a[i]*x[j];
>> }
>> }
>>
>> And I want to collect information only about array /b/. How can this be
>> done?
> Short answer is, like this
>
> for (j = 0; j< length; j++) {
> for (i = ip[j]; i<= ip[j+1]-1; i++) {
> long ia_i = ia[i];
> double tmp = a[i] * x[j]; // or whatever type it is
> b[ia_i] += tmp;
> }
> }
>
> Now the stats for the line b[ia_i] += tmp should show you info only
> for the /b/ access.
Thanks, this could help, and I need to change my program only a little,
that's convenient :)
> Long answer is, the question is kind-of meaningless. Cache misses are
> a function of the overall memory behaviour of your program. So the
> misses on b[] also depend on how the program accesses a[], x[], ip[],
> etc, and you can't really measure each in isolation.
>
Of course I agree with you, it depends on other accesses, but it is
interesting for my research to get caches misses for particular arrays
as well as for the whole program.
And in simple programs like this influence of different arrays on each
other can be approximately estimated.
> J
>
>
--
С уважением,
Вадим Воеводин.
|
|
From: Josef W. <Jos...@gm...> - 2011-04-27 18:16:35
|
On Thursday 21 April 2011, Julian Seward wrote:
> > Can anyone tell me, is it possible to collect number of fetches and
> > cache misses for a particular array using Callgrind/Cachegrind?
> > For example, I have such fragment:
> >
> > for (j = 0; j < length; j++) {
> > for (i = ip[j]; i <= ip[j+1]-1; i++) {
> > b[ia[i]] = b[ia[i]] + a[i]*x[j];
> > }
> > }
> >
> > And I want to collect information only about array /b/. How can this be
> > done?
>
> Short answer is, like this
>
> for (j = 0; j < length; j++) {
> for (i = ip[j]; i <= ip[j+1]-1; i++) {
> long ia_i = ia[i];
> double tmp = a[i] * x[j]; // or whatever type it is
> b[ia_i] += tmp;
> }
> }
Good suggestion!
If you can decipher the machine code, that rewriting is not even needed.
Just run callgrind with "--dump-instr=yes" and identify the instructions
accessing array b (in the machine code annotation view of KCachegrind).
> Now the stats for the line b[ia_i] += tmp should show you info only
> for the /b/ access.
>
> Long answer is, the question is kind-of meaningless. Cache misses are
> a function of the overall memory behaviour of your program. So the
> misses on b[] also depend on how the program accesses a[], x[], ip[],
> etc, and you can't really measure each in isolation.
Just collecting pure counts in isolation is possible. It just does
not make much sense. Of course, the simulator always has to be fed by
all memory accesses. But you always want to see the relation of
separate counters to the total number.
If 90% of misses in the inner loop are because of accesses to array b,
it makes sense to do cache blocking on b.
Josef
>
> J
>
> ------------------------------------------------------------------------------
> Benefiting from Server Virtualization: Beyond Initial Workload
> Consolidation -- Increasing the use of server virtualization is a top
> priority.Virtualization can reduce costs, simplify management, and improve
> application availability and disaster protection. Learn more about boosting
> the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
> _______________________________________________
> Valgrind-users mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-users
>
|
|
From: Julian S. <js...@ac...> - 2011-04-28 08:40:54
|
> > > For example, I have such fragment:
> > > for (j = 0; j < length; j++) {
> > >
> > > for (i = ip[j]; i <= ip[j+1]-1; i++) {
> > >
> > > b[ia[i]] = b[ia[i]] + a[i]*x[j];
> > >
> > > }
> > >
> > > }
> > >
> > > And I want to collect information only about array /b/. How can this be
> > > done?
> >
> > Short answer is, like this
> >
> > for (j = 0; j < length; j++) {
> >
> > for (i = ip[j]; i <= ip[j+1]-1; i++) {
> >
> > long ia_i = ia[i];
> > double tmp = a[i] * x[j]; // or whatever type it
> > is b[ia_i] += tmp;
> >
> > }
> >
> > }
>
The second version might be faster too. For the compiler to transform
the first version into the second version will require a lot of praying
to the gods of Common Subexpression Elimination and Alias Analysis.
J
|
|
From: Josef W. <Jos...@gm...> - 2011-04-27 18:41:29
|
On Thursday 21 April 2011, Вадим Воеводин wrote: > > Long answer is, the question is kind-of meaningless. Cache misses are > > a function of the overall memory behaviour of your program. So the > > misses on b[] also depend on how the program accesses a[], x[], ip[], > > etc, and you can't really measure each in isolation. > > > Of course I agree with you, it depends on other accesses, but it is > interesting for my research to get caches misses for particular arrays > as well as for the whole program. > And in simple programs like this influence of different arrays on each > other can be approximately estimated. Not sure I understand this statement. I am curious. How do you suggest to estimate the influence of different arrays on each other? Even if you only have 2 arrays, there still could be accesses to the stack, and the code around the inner loop also matters. With 3 arrays as in this example, separate influences of one array on the other is very difficult to define in a meaningful way, and probably would be mostly useless to come up with code improvements. I suppose you also would need to take into account how array accesses are covered by the different cache sets. Josef > > J > > > > > > > |
|
From: Вадим В. <vad...@ma...> - 2011-04-28 05:38:52
|
27.04.2011 22:41, Josef Weidendorfer пишет: > On Thursday 21 April 2011, Вадим Воеводин wrote: >>> Long answer is, the question is kind-of meaningless. Cache misses are >>> a function of the overall memory behaviour of your program. So the >>> misses on b[] also depend on how the program accesses a[], x[], ip[], >>> etc, and you can't really measure each in isolation. >>> >> Of course I agree with you, it depends on other accesses, but it is >> interesting for my research to get caches misses for particular arrays >> as well as for the whole program. >> And in simple programs like this influence of different arrays on each >> other can be approximately estimated. > Not sure I understand this statement. I am curious. How do you suggest to > estimate the influence of different arrays on each other? Even if you > only have 2 arrays, there still could be accesses to the stack, and the > code around the inner loop also matters. With 3 arrays as in this example, > separate influences of one array on the other is very difficult to define > in a meaningful way, and probably would be mostly useless to come up with > code improvements. I suppose you also would need to take into account how > array accesses are covered by the different cache sets. > OK, I would say it like that: in my research I'm trying to understand, can I (and how precise) estimate data locality using several characteristics. By now I'm working with independent arrays, just because it's simlper. Of course in future array interference need to be considered. So I was interested in number of cache misses for an array in order to understand, whether these characteristics can give rather close estimation. > Josef > > >>> J >>> >>> >> >> > -- С уважением, Вадим Воеводин. |