|
From: Mandy M. <te...@ho...> - 2016-02-14 15:26:13
|
Hi,
The kernel has to be recompiled to patch in this module. How can I check this scheduler and step through it with valgrind before patching it into the kernel? What are the commands to check this scheduler code?
http://www.embedded.com/design/operating-systems/4204980/Real-Time-Linux-Scheduling-Part-3
Implementing a new real-time scheduling policy for Linux: Part 3. Paulo Baltarejo Sousa and Luis Lino Ferreira, Polytechnic Institute of Porto, July 28 ...
Regards,
Martin
|
|
From: Mandy M. <te...@ho...> - 2016-02-14 16:48:46
|
Hi,
How can I resolve this issue: "Invalid read of size 4" and "Process terminating with default action of signal 11 (SIGSEGV)"?
martin@ubuntu:~/Downloads/tasks$ valgrind --leak-check=yes casio_system
valgrind: casio_system: command not found
martin@ubuntu:~/Downloads/tasks$ valgrind --leak-check=yes ./casio_system
==15478== Memcheck, a memory error detector
==15478== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==15478== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==15478== Command: ./casio_system
==15478==
Usage: ./casio_system file_name (system configuration)
==15478==
==15478== HEAP SUMMARY:
==15478==     in use at exit: 0 bytes in 0 blocks
==15478==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==15478==
==15478== All heap blocks were freed -- no leaks are possible
==15478==
==15478== For counts of detected and suppressed errors, rerun with: -v
==15478== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
martin@ubuntu:~/Downloads/tasks$ valgrind --leak-check=yes ./casio_system a
==15479== Memcheck, a memory error detector
==15479== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==15479== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==15479== Command: ./casio_system a
==15479==
==15479== Invalid read of size 4
==15479==    at 0x40FFB4B: fgets (iofgets.c:50)
==15479==    by 0x8048D36: get_casio_tasks_config_info (in /home/martin/Downloads/tasks/casio_system)
==15479==    by 0x8048E95: main (in /home/martin/Downloads/tasks/casio_system)
==15479==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==15479==
==15479==
==15479== Process terminating with default action of signal 11 (SIGSEGV)
==15479==  Access not within mapped region at address 0x0
==15479==    at 0x40FFB4B: fgets (iofgets.c:50)
==15479==    by 0x8048D36: get_casio_tasks_config_info (in /home/martin/Downloads/tasks/casio_system)
==15479==    by 0x8048E95: main (in /home/martin/Downloads/tasks/casio_system)
==15479==  If you believe this happened as a result of a stack
==15479==  overflow in your program's main thread (unlikely but
==15479==  possible), you can try to increase the size of the
==15479==  main thread stack using the --main-stacksize= flag.
==15479==  The main thread stack size used in this run was 8388608.
==15479==
==15479== HEAP SUMMARY:
==15479==     in use at exit: 0 bytes in 0 blocks
==15479==   total heap usage: 1 allocs, 1 frees, 352 bytes allocated
==15479==
==15479== All heap blocks were freed -- no leaks are possible
==15479==
==15479== For counts of detected and suppressed errors, rerun with: -v
==15479== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
Regards,
Martin
________________________________
From: Mandy Martino <te...@ho...>
Sent: Sunday, February 14, 2016 23:25
To: val...@li...
Subject: [Valgrind-users] what are the commands to check this scheduler code?
|
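The Memcheck trace above shows fgets reading from address 0x0 inside get_casio_tasks_config_info. That pattern typically appears when the FILE pointer handed to fgets is NULL, for instance because fopen failed on a file name ("a") that does not exist. The casio_system source is not shown in this thread, so the following is only a minimal sketch under that assumption; the helper name count_config_lines is hypothetical and not taken from the original program.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from casio_system): reads a configuration
 * file with fgets and returns the number of chunks read, or -1 if the
 * file cannot be opened. */
static int count_config_lines(const char *path)
{
    char line[256];
    int lines = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL) {
        /* Without this check, fgets(line, sizeof line, NULL) is
         * undefined behaviour and usually ends in SIGSEGV. */
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
        return -1;
    }
    /* Each successful fgets call is counted once; a line longer than
     * 255 characters is therefore counted more than once. */
    while (fgets(line, sizeof line, fp) != NULL)
        ++lines;
    fclose(fp);
    return lines;
}
```

If the real function follows the same shape, checking the return value of fopen before the first fgets call should turn the crash into an ordinary error message.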
|
From: Mandy M. <te...@ho...> - 2016-02-15 10:11:30
|
Hi,
/*
gcc -o marti marti.c
valgrind --tool=cachegrind ./marti
valgrind --dsymutil=yes --tool=callgrind ./marti
*/
int main()
{
int x[5000][100];
int i = 0;
int j = 0;
for(i = 0; i <5000; ++i)
{
for (j = 0; j < 100; ++j)
{
x[i][j] = 2*x[i][j];
}
}
return 0;
}
/*
Environment: Ubuntu 12 in VMware Player 12
Variant 1: j in the outer loop, i in the inner loop (column order):
for (j = 0; j < 100; ++j)
{
for(i = 0; i <5000; ++i)
{
x[i][j] = 2*x[i][j];
}
}
==4526== Cachegrind, a cache and branch-prediction profiler
==4526== Copyright (C) 2002-2011, and GNU GPL'd, by Nicholas Nethercote et al.
==4526== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==4526== Command: ./marti
==4526==
--4526-- warning: L3 cache found, using its data for the LL simulation.
==4526==
==4526== I refs: 6,113,976
==4526== I1 misses: 687
==4526== LLi misses: 682
==4526== I1 miss rate: 0.01%
==4526== LLi miss rate: 0.01%
==4526==
==4526== D refs: 4,053,131 (3,537,745 rd + 515,386 wr)
==4526== D1 misses: 501,148 ( 500,988 rd + 160 wr)
==4526== LLd misses: 32,197 ( 32,062 rd + 135 wr)
==4526== D1 miss rate: 12.3% ( 14.1% + 0.0% )
==4526== LLd miss rate: 0.7% ( 0.9% + 0.0% )
==4526==
==4526== LL refs: 501,835 ( 501,675 rd + 160 wr)
==4526== LL misses: 32,879 ( 32,744 rd + 135 wr)
==4526== LL miss rate: 0.3% ( 0.3% + 0.0% )
Variant 2: i in the outer loop, j in the inner loop (row order, as in main above):
for(i = 0; i <5000; ++i)
{
for (j = 0; j < 100; ++j)
{
x[i][j] = 2*x[i][j];
}
}
==4539== Cachegrind, a cache and branch-prediction profiler
==4539== Copyright (C) 2002-2011, and GNU GPL'd, by Nicholas Nethercote et al.
==4539== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==4539== Command: ./marti
==4539==
--4539-- warning: L3 cache found, using its data for the LL simulation.
==4539==
==4539== I refs: 6,148,278
==4539== I1 misses: 687
==4539== LLi misses: 682
==4539== I1 miss rate: 0.01%
==4539== LLi miss rate: 0.01%
==4539==
==4539== D refs: 4,072,731 (3,552,445 rd + 520,286 wr)
==4539== D1 misses: 32,387 ( 32,238 rd + 149 wr)
==4539== LLd misses: 32,197 ( 32,062 rd + 135 wr)
==4539== D1 miss rate: 0.7% ( 0.9% + 0.0% )
==4539== LLd miss rate: 0.7% ( 0.9% + 0.0% )
==4539==
==4539== LL refs: 33,074 ( 32,925 rd + 149 wr)
==4539== LL misses: 32,879 ( 32,744 rd + 135 wr)
==4539== LL miss rate: 0.3% ( 0.3% + 0.0% )
*/
Regards,
Martin
|
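The two Cachegrind runs above differ mainly in D1 misses: 501,148 with j in the outer loop versus 32,387 with i in the outer loop. A rough back-of-the-envelope check reproduces both figures. The sketch below assumes 4-byte int elements and 64-byte cache lines (the line size is an assumption, it is not shown in the output), and both function names are hypothetical.

```c
#include <stddef.h>

/* Lower bound for a row-order sweep: each cache line of the array is
 * fetched once, so misses are roughly array bytes / line size. */
static size_t row_order_misses(size_t rows, size_t cols,
                               size_t elem_size, size_t line_size)
{
    size_t bytes = rows * cols * elem_size;
    return (bytes + line_size - 1) / line_size;   /* round up */
}

/* Rough upper bound for a column-order sweep: if the row stride is
 * larger than a cache line and a full column does not fit in D1,
 * every element access can miss. */
static size_t column_order_misses(size_t rows, size_t cols)
{
    return rows * cols;
}
```

For int x[5000][100] this gives 2,000,000 / 64 = 31,250 misses in row order and 500,000 in column order, close to the reported 32,387 and 501,148. The remainder is plausibly program startup, which both runs share.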
|
From: Mandy M. <te...@ho...> - 2016-02-15 10:25:25
|
Hi,
Why do the I1 misses, LLi misses, LL misses and D1 misses increase even though the miss rate decreases (see the row showing 0.1% + 0.0%)?
Which indicator gives the correct number to show the improvement after the optimization?
#define min(a,b) (((a)<(b))?(a):(b))
#define max(a,b) (((a)>(b))?(a):(b))
int main()
{
int x[100][100];
int y[100][100];
int z[100][100];
int i=0;
int j=0;
int k=0;
int N=100;
int r=0;
int jj=0;
int kk=0;
int B = 5;
/*
for(i=0;i<N;++i)
{
for(j=0;j<N;++j)
{
r=0;
for(k=0;k<N;++k)
{
r=r+y[i][k]*z[k][j];
}
x[i][j]=r;
}
}
*/
for(jj=0;jj<N;jj=jj+B)
for(kk=0;kk<N;kk=kk+B)
for(i=0;i<N;++i)
{
for(j=0;j<min(jj+B,N);++j)
{
r=0;
for(k=kk;k<min(kk+B,N);++k)
{
r=r+y[i][k]*z[k][j];
}
x[i][j]=x[i][j]+r;
}
}
return 0;
}
/*
for(i=0;i<N;++i)
{
for(j=0;j<N;++j)
{
r=0;
for(k=0;k<N;++k)
{
r=r+y[i][k]*z[k][j];
}
x[i][j]=r;
}
}
martin@ubuntu:~$ valgrind --tool=cachegrind ./mar5ti
==4602== Cachegrind, a cache and branch-prediction profiler
==4602== Copyright (C) 2002-2011, and GNU GPL'd, by Nicholas Nethercote et al.
==4602== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==4602== Command: ./mar5ti
==4602==
--4602-- warning: L3 cache found, using its data for the LL simulation.
==4602==
==4602== I refs: 14,264,184
==4602== I1 misses: 689
==4602== LLi misses: 684
==4602== I1 miss rate: 0.00%
==4602== LLi miss rate: 0.00%
==4602==
==4602== D refs: 10,163,336 (10,117,945 rd + 45,391 wr)
==4602== D1 misses: 64,978 ( 64,200 rd + 778 wr)
==4602== LLd misses: 2,823 ( 2,063 rd + 760 wr)
==4602== D1 miss rate: 0.6% ( 0.6% + 1.7% )
==4602== LLd miss rate: 0.0% ( 0.0% + 1.6% )
==4602==
==4602== LL refs: 65,667 ( 64,889 rd + 778 wr)
==4602== LL misses: 3,507 ( 2,747 rd + 760 wr)
==4602== LL miss rate: 0.0% ( 0.0% + 1.6% )
for(jj=0;jj<N;jj=jj+B)
for(kk=0;kk<N;kk=kk+B)
for(i=0;i<N;++i)
{
for(j=0;j<min(jj+B,N);++j)
{
r=0;
for(k=kk;k<min(kk+B,N);++k)
{
r=r+y[i][k]*z[k][j];
}
x[i][j]=x[i][j]+r;
}
}
martin@ubuntu:~$ valgrind --tool=cachegrind ./mar5ti
==4654== Cachegrind, a cache and branch-prediction profiler
==4654== Copyright (C) 2002-2011, and GNU GPL'd, by Nicholas Nethercote et al.
==4654== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==4654== Command: ./mar5ti
==4654==
--4654-- warning: L3 cache found, using its data for the LL simulation.
==4654==
==4654== I refs: 265,277,487
==4654== I1 misses: 690
==4654== LLi misses: 685
==4654== I1 miss rate: 0.00%
==4654== LLi miss rate: 0.00%
==4654==
==4654== D refs: 166,275,677 (159,919,965 rd + 6,355,712 wr)
==4654== D1 misses: 170,231 ( 170,082 rd + 149 wr)
==4654== LLd misses: 2,823 ( 2,688 rd + 135 wr)
==4654== D1 miss rate: 0.1% ( 0.1% + 0.0% )
==4654== LLd miss rate: 0.0% ( 0.0% + 0.0% )
==4654==
==4654== LL refs: 170,921 ( 170,772 rd + 149 wr)
==4654== LL misses: 3,508 ( 3,373 rd + 135 wr)
==4654== LL miss rate: 0.0% ( 0.0% + 0.0% )
*/
Regards,
Martin
|
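One detail in the blocked loop nest as posted may be worth checking: the j loop starts at 0 rather than at jj, so each jj block revisits all earlier columns. The sketch below only counts innermost-loop iterations for N = 100 and B = 5; it does not touch the arrays. The function names are hypothetical, and the variant starting at j = jj is an assumption about what the blocking was meant to do, not something stated in the post.

```c
static int imin(int a, int b) { return a < b ? a : b; }

/* Innermost iterations of the loop nest exactly as posted (j starts at 0). */
static long count_posted_blocked(int n, int b)
{
    long count = 0;
    int jj, kk, i, j, k;

    for (jj = 0; jj < n; jj += b)
        for (kk = 0; kk < n; kk += b)
            for (i = 0; i < n; ++i)
                for (j = 0; j < imin(jj + b, n); ++j)
                    for (k = kk; k < imin(kk + b, n); ++k)
                        ++count;
    return count;
}

/* Same nest with the j loop restricted to the current block (j = jj). */
static long count_tiled(int n, int b)
{
    long count = 0;
    int jj, kk, i, j, k;

    for (jj = 0; jj < n; jj += b)
        for (kk = 0; kk < n; kk += b)
            for (i = 0; i < n; ++i)
                for (j = jj; j < imin(jj + b, n); ++j)
                    for (k = kk; k < imin(kk + b, n); ++k)
                        ++count;
    return count;
}
```

With n = 100 and b = 5 the posted nest runs 10,500,000 innermost iterations, while the j = jj variant runs 1,000,000, the same as the unblocked triple loop. That factor of 10.5 may account for a good part of the jump in D refs between the two unoptimized runs. It also means the posted nest adds some partial products to x[i][j] more than once.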
|
From: Josef W. <Jos...@gm...> - 2016-02-15 22:37:22
|
On 15.02.2016 at 11:25, Mandy Martino wrote:
> why
>
> I1 misses increase, LLi misses increase, LL misses increase, D1 misses
> increase
> though miss rate decrease at this row 0.1% + 0.0% ?
>
> which indicator show the correct number that can show the improvement
> after optimization?
I see you do blocking in the 2nd version. However, the number of data
accesses is 166 million vs. 10 million in your 1st version. I assume
this is because you did not compile with -O2 or -O3 ?
Miss rate is a relative number, based on total number of accesses.
A comparison is meaningless if the number of accesses is so different.
Josef
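The point about miss rate being relative can be checked directly against the D1 figures from the two runs. The helper below is a small sketch (the function name is hypothetical) that recomputes the rate as misses divided by references.

```c
/* Miss rate in percent: misses relative to the total number of
 * references. Returns 0 when there are no references at all. */
static double miss_rate_percent(unsigned long misses, unsigned long refs)
{
    if (refs == 0)
        return 0.0;
    return 100.0 * (double)misses / (double)refs;
}
```

For the first run, 64,978 misses over 10,163,336 D refs is about 0.64%. For the second, 170,231 misses over 166,275,677 D refs is about 0.10%. The absolute number of D1 misses grew by a factor of roughly 2.6, but the references grew by a factor of roughly 16, so the rate fell. When the reference counts differ this much, the absolute miss counts are the more meaningful comparison.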
> _______________________________________________
> Valgrind-users mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-users
>
|
|
From: Mandy M. <te...@ho...> - 2016-02-16 08:37:00
|
Hi,
Without -O2 or -O3, the blocked version has 166 million data accesses, while the original version without blocking has 10 million.
After compiling with gcc -O3 -o mar5ti mar5ti.c:
1. There is almost no difference between the two results. Why is there no difference?
2. Does this mean that the compiler has already done the cache optimization? Is there no longer any need to consider cache optimization when programming in C nowadays?
#define min(a,b) (((a)<(b))?(a):(b))
#define max(a,b) (((a)>(b))?(a):(b))
int main()
{
int x[100][100];
int y[100][100];
int z[100][100];
int i=0;
int j=0;
int k=0;
int N=100;
int r=0;
int jj=0;
int kk=0;
int B = 5;
/*
for(i=0;i<N;++i)
{
for(j=0;j<N;++j)
{
r=0;
for(k=0;k<N;++k)
{
r=r+y[i][k]*z[k][j];
}
x[i][j]=r;
}
}
*/
for(jj=0;jj<N;jj=jj+B)
for(kk=0;kk<N;kk=kk+B)
for(i=0;i<N;++i)
{
for(j=0;j<min(jj+B,N);++j)
{
r=0;
for(k=kk;k<min(kk+B,N);++k)
{
r=r+y[i][k]*z[k][j];
}
x[i][j]=x[i][j]+r;
}
}
return 0;
}
/*
for(i=0;i<N;++i)
{
for(j=0;j<N;++j)
{
r=0;
for(k=0;k<N;++k)
{
r=r+y[i][k]*z[k][j];
}
x[i][j]=r;
}
}
martin@ubuntu:~$ valgrind --tool=cachegrind ./mar5ti
==2934== Cachegrind, a cache and branch-prediction profiler
==2934== Copyright (C) 2002-2011, and GNU GPL'd, by Nicholas Nethercote et al.
==2934== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==2934== Command: ./mar5ti
==2934==
--2934-- warning: L3 cache found, using its data for the LL simulation.
==2934==
==2934== I refs: 113,272
==2934== I1 misses: 686
==2934== LLi misses: 681
==2934== I1 miss rate: 0.60%
==2934== LLi miss rate: 0.60%
==2934==
==2934== D refs: 52,724 (37,442 rd + 15,282 wr)
==2934== D1 misses: 1,075 ( 930 rd + 145 wr)
==2934== LLd misses: 982 ( 847 rd + 135 wr)
==2934== D1 miss rate: 2.0% ( 2.4% + 0.9% )
==2934== LLd miss rate: 1.8% ( 2.2% + 0.8% )
==2934==
==2934== LL refs: 1,761 ( 1,616 rd + 145 wr)
==2934== LL misses: 1,663 ( 1,528 rd + 135 wr)
==2934== LL miss rate: 1.0% ( 1.0% + 0.8% )
for(jj=0;jj<N;jj=jj+B)
for(kk=0;kk<N;kk=kk+B)
for(i=0;i<N;++i)
{
for(j=0;j<min(jj+B,N);++j)
{
r=0;
for(k=kk;k<min(kk+B,N);++k)
{
r=r+y[i][k]*z[k][j];
}
x[i][j]=x[i][j]+r;
}
}
martin@ubuntu:~$ valgrind --tool=cachegrind ./mar5ti
==3047== Cachegrind, a cache and branch-prediction profiler
==3047== Copyright (C) 2002-2011, and GNU GPL'd, by Nicholas Nethercote et al.
==3047== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==3047== Command: ./mar5ti
==3047==
--3047-- warning: L3 cache found, using its data for the LL simulation.
==3047==
==3047== I refs: 113,268
==3047== I1 misses: 686
==3047== LLi misses: 681
==3047== I1 miss rate: 0.60%
==3047== LLi miss rate: 0.60%
==3047==
==3047== D refs: 52,724 (37,442 rd + 15,282 wr)
==3047== D1 misses: 1,075 ( 930 rd + 145 wr)
==3047== LLd misses: 982 ( 847 rd + 135 wr)
==3047== D1 miss rate: 2.0% ( 2.4% + 0.9% )
==3047== LLd miss rate: 1.8% ( 2.2% + 0.8% )
==3047==
==3047== LL refs: 1,761 ( 1,616 rd + 145 wr)
==3047== LL misses: 1,663 ( 1,528 rd + 135 wr)
==3047== LL miss rate: 1.0% ( 1.0% + 0.8% )
*/
Regards,
Martin
________________________________________
From: Josef Weidendorfer <Jos...@gm...>
Sent: Tuesday, February 16, 2016 6:37
To: val...@li...
Subject: Re: [Valgrind-users] why miss rate decrease but number of misses increase in ubuntu 12 in vmware player 12 ?
|
|
From: Josef W. <Jos...@gm...> - 2016-02-16 11:13:33
|
On 16.02.2016 at 09:36, Mandy Martino wrote:
> Hi,
>
> when no -O2 or -O3
> the blocking version has 166 million data access , original version without blocking it has 10 million data access.
>
> After run with gcc -O3 -o mar5ti mar5ti.c,
>
> 1.there are nearly no difference in the result. Why no difference?
No idea. You have to dig deeper.
You can see a split-up using cg_annotate or the q/kcachegrind GUI.
If you use callgrind, you can get annotation on the machine instruction
level (--dump-instr=yes).
But I would imagine that your workload/data is too small, and
perhaps the numbers are dominated by startup stuff which has
nothing to do with your code. You do not show initialization of
the used arrays. If you really do not initialize, your benchmark
is screwed.
> 2. does it mean that compiler has already done cache optimization? no need to consider cache optimization in c language programming any more nowadays?
Some compilers try to be smart, but usually they cannot do much
as the data sizes are not known at compile time. You could compare
the resulting machine code.
Cheers,
Josef
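The remark about uninitialized arrays suggests one plausible reading of the identical -O3 numbers: the posted program never initializes y and z and never uses x, so an optimizing compiler is free to drop the loops altogether, leaving only startup code to be measured. A benchmark that avoids this could look like the sketch below. It is an illustration under assumptions, not the original program: the arrays are static, the inputs are filled with arbitrary values, the tiled version starts its j loop at jj, and all function names are hypothetical.

```c
#include <string.h>

#define N 100
#define B 5

/* Static storage: zero-initialized and kept off the stack. */
static int x_naive[N][N], x_tiled[N][N], y[N][N], z[N][N];

static int imin(int a, int b) { return a < b ? a : b; }

/* Give the inputs defined values so the products are well defined. */
static void init_inputs(void)
{
    int i, j;
    for (i = 0; i < N; ++i)
        for (j = 0; j < N; ++j) {
            y[i][j] = i + j;
            z[i][j] = i - j;
        }
}

static void multiply_naive(void)
{
    int i, j, k;
    for (i = 0; i < N; ++i)
        for (j = 0; j < N; ++j) {
            int r = 0;
            for (k = 0; k < N; ++k)
                r += y[i][k] * z[k][j];
            x_naive[i][j] = r;
        }
}

static void multiply_tiled(void)
{
    int jj, kk, i, j, k;
    memset(x_tiled, 0, sizeof x_tiled);   /* results are accumulated */
    for (jj = 0; jj < N; jj += B)
        for (kk = 0; kk < N; kk += B)
            for (i = 0; i < N; ++i)
                for (j = jj; j < imin(jj + B, N); ++j) {
                    int r = 0;
                    for (k = kk; k < imin(kk + B, N); ++k)
                        r += y[i][k] * z[k][j];
                    x_tiled[i][j] += r;
                }
}

/* Summing the result and using the sum keeps the loops observable. */
static long long checksum(int m[N][N])
{
    long long sum = 0;
    int i, j;
    for (i = 0; i < N; ++i)
        for (j = 0; j < N; ++j)
            sum += m[i][j];
    return sum;
}
```

A main function would call init_inputs(), run one of the two multiply functions, and print the checksum so that the result is actually used. Comparing the two checksums is also a quick way to confirm that the tiled version computes the same product as the plain one before comparing their Cachegrind numbers.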
|