From: stavros k. <ska...@gm...> - 2014-05-26 18:51:59
Dear Valgrind Developers,

As Prof. Davenport already explained, for my BSc project I extended Cachegrind (version 3.8.1) to provide information for an L2 cache and to measure the TLB.

The L2 cache statistics are the same as those already reported for the L1 and LL caches, while the TLB measurements provide, per TLB: a) the number of hits and misses, b) the pages used and how many times each was used, and c) the parts of the code that caused many misses.

(The extensions only work on Intel x86.)

An analysis of each capability of the extension follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L2 cache inclusion and per-TLB hit and miss statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The basic execution of the extension when run on the Unix program ls is shown below (L2 cache information is shown within lines 11-33, while lines 34-84 contain the TLB measurements):

 1 $ valgrind --tool=cachegrind ls
 2 ==3148== Cachegrind, a TLB, cache and branch-prediction profiler
 3 ==3148== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote et al.
 4 ==3148== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
 5 ==3148== Command: ls
 6 ==3148==
 7 --3148-- warning: L4 cache ignored
 8 --3148-- warning: L3 cache found, using its data for the LL simulation.
 9 ...(ls command output cropped)...
10 ==3148==
11 ==3148== I   refs:      987,465
12 ==3148== I1  misses:      1,631
13 ==3148== L2i misses:      1,510
14 ==3148== LLi misses:      1,505
15 ==3148== I1  miss rate:    0.16%
16 ==3148== L2i miss rate:    0.15%
17 ==3148== LLi miss rate:    0.15%
18 ==3148==
19 ==3148== D   refs:      494,321  (353,809 rd + 140,512 wr)
20 ==3148== D1  misses:      4,185  (  3,433 rd +     752 wr)
21 ==3148== L2d misses:      2,880  (  2,201 rd +     679 wr)
22 ==3148== LLd misses:      2,826  (  2,163 rd +     663 wr)
23 ==3148== D1  miss rate:     0.8% (    0.9%   +     0.5%  )
24 ==3148== L2d miss rate:     0.5% (    0.6%   +     0.4%  )
25 ==3148== LLd miss rate:     0.5% (    0.6%   +     0.4%  )
26 ==3148==
27 ==3148== L2 refs:         5,816  (  5,064 rd +     752 wr)
28 ==3148== L2 misses:       4,390  (  3,711 rd +     679 wr)
29 ==3148== L2 miss rate:      0.2% (    0.2%   +     0.4%  )
30 ==3148==
31 ==3148== LL refs:         4,390  (  3,711 rd +     679 wr)
32 ==3148== LL misses:       4,331  (  3,668 rd +     663 wr)
33 ==3148== LL miss rate:      0.2% (    0.2%   +     0.4%  )
34 ==3148==
35 ==3148==
36 ==3148==
37 ==3148== --- TLB characteristics ---
38 ==3148== Virtual Address Size: 32 bits
39 ==3148== Replacement Policy: Least Recently Used
40 ==3148==
41 ==3148== TLB type: iTLB (L1 Instruction TLB)
42 ==3148== Associativity: 8-Way Associative
43 ==3148== Page Size: 4096 bytes
44 ==3148== Entries: 64
45 ==3148==
46 ==3148==
47 ==3148== TLB type: dTLB (L1 Data TLB)
48 ==3148== Associativity: 4-Way Associative
49 ==3148== Page Size: 4096 bytes
50 ==3148== Entries: 64
51 ==3148==
52 ==3148==
53 ==3148== TLB type: L2TLB (L2 Unified TLB)
54 ==3148== Associativity: 8-Way Associative
55 ==3148== Page Size: 4096 bytes
56 ==3148== Entries: 1024
57 ==3148==
58 ==3148==
59 ==3148==
60 ==3148== --- Results ---
61 ==3148==
62 ==3148== --- iTLB Stats ---
63 ==3148== Total Accesses: 987465
64 ==3148== Hits: 987315
65 ==3148== Misses: 150
66 ==3148== Hit ratio: 99.9%
67 ==3148== Miss ratio: 0.0%
68 ==3148==
69 ==3148==
70 ==3148== --- dTLB Stats ---
71 ==3148== Total Accesses: 494321
72 ==3148== Hits: 493934
73 ==3148== Misses: 387
74 ==3148== Hit ratio: 99.9%
75 ==3148== Miss ratio: 0.0%
76 ==3148==
77 ==3148==
78 ==3148== --- L2TLB Stats ---
79 ==3148== Total Accesses: 537
80 ==3148== Hits: 313
81 ==3148== Misses: 224
82 ==3148== Hit ratio: 58.2%
83 ==3148== Miss ratio: 41.7%
84 ==3148==

~~~~~~~~~~~~~~~~~
Per TLB page tracking
~~~~~~~~~~~~~~~~~~

Per-TLB page tracking (which pages were accessed and how many times) is available as an option; its results are printed after the TLB statistics. To demonstrate page tracking, ls is used again. Since the cache and TLB output is the same as before, it is not shown. Moreover, not all pages of each TLB are listed, since they would take up a lot of space; only a portion of the accessed pages is shown, to familiarise the reader with the program's output.

$ valgrind --tool=cachegrind --tlb-page-sim=yes ls
...(Cachegrind's output cropped)...
==14046== --- Pages Accessed ---
==14046==
==14046== iTLB Pages Accesed
==14046== Pages Accessed In total: 36
==14046== 1) Page 00000830, accessed 1118 times
==14046== 2) Page 00000822, accessed 146 times
.....(page tracking results not shown)....
==14046== 34) Page 00000801, accessed 237529 times
==14046== 35) Page 00000802, accessed 73013 times
==14046== 36) Page 00000800, accessed 14996 times
==14046==
==14046== dTLB Pages Accesed
==14046== Pages Accessed In total: 25
==14046== 1) Page 0000bec7, accessed 25284 times
==14046== 2) Page 00000435, accessed 1 times
.....(page tracking results not shown)....
==14046== 23) Page 00000400, accessed 1312 times
==14046== 24) Page 00000401, accessed 8445 times
==14046== 25) Page 0000bec8, accessed 200037 times
==14046==
==14046== L2TLB Pages Accesed
==14046== Pages Accessed In total: 10
==14046== 1) Page 000017d8, accessed 1 times
==14046== 2) Page 00000086, accessed 1 times
.....(page tracking results not shown)....
==14046== 8) Page 00000100, accessed 4619 times
==14046== 9) Page 000017d9, accessed 2 times
==14046== 10) Page 00000080, accessed 10974 times

~~~~~~~~~~~~~~~~~~~~~~
Per source code line statistics
~~~~~~~~~~~~~~~~~~~~~~

In addition, I extended Cachegrind to provide per-file, per-function and per-source-line counts of TLB accesses and misses for instruction reads (Ir), data reads (Dr) and data writes (Dw). This information is not shown in the basic output; it is only available through later analysis and annotation of Cachegrind's output files. The image at http://imgur.com/ncu1CwG shows the per-source-line statistics for the following program:

    #define SIZE (100)

    int main(void) {
        int i, array[SIZE];

        /* each iteration stores one element of array */
        for (i = 0; i < SIZE; i++) {
            array[i] = i + 10;
        }
        return 0;
    }

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Conclusion - Providing extension code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I would like to contribute my extensions to Valgrind. At the moment they are only available for version 3.8.1; however, I could try to port the L2 cache statistics and TLB measurement to the latest version (based on the git repo). Moreover, I can provide more details about how the extension works and about the tests I ran to check the validity of the results, or perhaps my dissertation (provided I get permission from my University), which presents a detailed analysis of all aspects. In this email I have tried to illustrate the extension's capabilities, so I have only presented its features and the corresponding output.
I would like to know whether you are interested in my providing the code of the extensions and, if so, where exactly I should submit it.

Thank you in advance,
Stavros Kaparelos
From: Josef W. <Jos...@gm...> - 2014-05-28 15:10:43
Hi Stavros,

this looks like a quite straightforward extension, which is nice.

Why does this only work for x86?

What is the envisioned use of the detailed page tracking stuff?

Can you give some numbers about the performance penalties of the simulator with the added features?

Josef

On 26.05.2014 20:51, stavros kaparelos wrote:
> [...]