From: Paul F. <pa...@so...> - 2023-04-22 07:39:01
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=3b93737876f443709ee0cec81627fe1186f44862

commit 3b93737876f443709ee0cec81627fe1186f44862
Author: Paul Floyd <pj...@wa...>
Date:   Sat Apr 22 09:37:25 2023 +0200

    regtest: add C++11 flag to build of user_histo1.cpp

    For old compilers that don't default to C++11 or later

Diff:
---
 dhat/tests/Makefile.am | 1 +
 1 file changed, 1 insertion(+)

diff --git a/dhat/tests/Makefile.am b/dhat/tests/Makefile.am
index 818cc10d08..aba7f06fdf 100644
--- a/dhat/tests/Makefile.am
+++ b/dhat/tests/Makefile.am
@@ -37,3 +37,4 @@ big_CFLAGS = $(AM_CFLAGS) -Wno-unused-result
 copy_CFLAGS = $(AM_CFLAGS) -fno-builtin
 user_histo1_SOURCES = user_histo1.cpp
+user_histo1_CXXFLAGS = $(AM_CXXFLAGS) -std=c++11
|
From: Tobias B. <tb...@we...> - 2023-04-22 06:29:10
|
Hi,

a big “THANK YOU!” to all Valgrind developers. Unfortunately, these days I have to deal primarily with nonsensical Java code instead of nonsensical C++ code, so I rarely get to use my favorite tool memcheck anymore. XD

From NEWS:
> […] It is for example possible to do:
> (gdb) memcheck who_point_at &some_struct sizeof(some_struct)
> instead of:
> (gdb) p &some_struct
> $2 = (some_struct_type *) 0x1130a0 <some_struct>
> (gdb) p sizeof(some_struct)
> $3 = 40
> (gdb) monitor who_point_at 0x1130a0 40

That sounds awesome, thanks!

Minor nitpicking: Building c8832cb2d on Ubuntu (MATE) 20.04.6 LTS with gcc 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1) produces these 3 warnings:

vgdb.c: In function ‘do_multi_mode’:
vgdb.c:1332:11: warning: ignoring return value of ‘asprintf’, declared with attribute warn_unused_result [-Wunused-result]
 1332 | asprintf (&reply,
      | ^~~~~~~~~~~~~~~~~
[…]
 1347 | "QSetWorkingDir+", (UInt)PBUFSIZ - 1);
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vgdb.c: In function ‘fork_and_exec_valgrind’:
vgdb.c:1228:13: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
 1228 | write (pipefd[1], &err, sizeof (int));
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vgdb.c:1283:7: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
 1283 | write (pipefd[1], &err, sizeof (int));
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

8 bits of contribution in NEWS:

- from different terminals. So for example to start you program
+ from different terminals. So for example to start your program

Tobias

On 22.04.23 03:47, Mark Wielaard wrote:
> An RC2 tarball for 3.21.0 is now available at
> https://sourceware.org/pub/valgrind/valgrind-3.21.0.RC2.tar.bz2
> (md5sum = f33407fdffbfa78f5014781cc92297cf)
> (sha1sum = c520ee0c28d9e20d28aa25d05ce2525c39a69135)
> https://sourceware.org/pub/valgrind/valgrind-3.21.0.RC2.tar.bz2.asc
>
> Please give it a try in configurations that are important for you and
> report any problems you have, either on this mailing list, or
> (preferably) via our bug tracker at
> https://bugs.kde.org/enter_bug.cgi?product=valgrind
>
> Please check the NEWS entry below for new features that could use some
> extra testing. Note that there has also been a dhat extension which
> hasn't yet been added to NEWS.
>
> There is now a client request for DHAT to mark memory to be
> histogrammed:
> https://bugs.kde.org/464103
> https://snapshots.sourceware.org/valgrind/trunk/latest/html/dh-manual.html#dh-access-counts
>
> If nothing critical emerges, a final release will happen on
> Friday 28 April.
>
> * ==================== CORE CHANGES ===================
>
> * When GDB is used to debug a program running under valgrind using
> the valgrind gdbserver, GDB will automatically load some
> python code provided in valgrind defining GDB front end commands
> corresponding to the valgrind monitor commands.
> These GDB front end commands accept the same format as
> the monitor commands directly sent to the Valgrind gdbserver.
> These GDB front end commands provide a better integration
> in the GDB command line interface, so as to use for example
> GDB auto-completion, command specific help, searching for
> a command or command help matching a regexp, ...
> For relevant monitor commands, GDB will evaluate arguments
> to make the use of monitor commands easier.
> For example, instead of having to print the address of a variable
> to pass it to a subsequent monitor command, the GDB front end
> command will evaluate the address argument. It is for example
> possible to do:
> (gdb) memcheck who_point_at &some_struct sizeof(some_struct)
> instead of:
> (gdb) p &some_struct
> $2 = (some_struct_type *) 0x1130a0 <some_struct>
> (gdb) p sizeof(some_struct)
> $3 = 40
> (gdb) monitor who_point_at 0x1130a0 40
>
> * The vgdb utility now supports extended-remote protocol when
> invoked with --multi. In this mode the GDB run command is
> supported. Which means you don't need to run gdb and valgrind
> from different terminals. So for example to start you program
> in gdb and run it under valgrind you can do:
> $ gdb prog
> (gdb) set remote exec-file prog
> (gdb) set sysroot /
> (gdb) target extended-remote | vgdb --multi
> (gdb) start
>
> * The behaviour of realloc with a size of zero can now
> be changed for tools that intercept malloc. Those
> tools are memcheck, helgrind, drd, massif and dhat.
> Realloc implementations generally do one of two things
> - free the memory like free() and return NULL
> (GNU libc and ptmalloc).
> - either free the memory and then allocate a
> minimum sized block or just return the
> original pointer. Return NULL if the
> allocation of the minimum sized block fails
> (jemalloc, musl, snmalloc, Solaris, macOS).
> When Valgrind is configured and built it will
> try to match the OS and libc behaviour. However
> if you are using a non-default library to replace
> malloc and family (e.g., musl on a glibc Linux or
> tcmalloc on FreeBSD) then you can use a command line
> option to change the behaviour of Valgrind:
> --realloc-zero-bytes-frees=yes|no [yes on Linux glibc, no otherwise]
>
> * ================== PLATFORM CHANGES =================
>
> * Make the address space limit on FreeBSD amd64 128Gbytes
> (the same as Linux and Solaris, it was 32Gbytes)
>
> * ==================== TOOL CHANGES ===================
>
> * Memcheck:
> - When doing a delta leak_search, it is now possible to only
> output the new loss records compared to the previous leak search.
> This is available in the memcheck monitor command 'leak_search'
> by specifying the "new" keyword or in your program by using
> the client request VALGRIND_DO_NEW_LEAK_CHECK.
> Whenever a "delta" leak search is done (i.e. when specifying
> "new" or "increased" or "changed" in the monitor command),
> the new loss records have a "new" marker.
> - Valgrind now contains python code that defines GDB memcheck
> front end monitor commands. See CORE CHANGES.
> - Performs checks for the use of realloc with a size of zero.
> This is non-portable and a source of errors. If memcheck
> detects such a usage it will generate an error
> realloc() with size 0
> followed by the usual callstacks.
> A switch has been added to allow this to be turned off:
> --show-realloc-size-zero=yes|no [yes]
>
> * Helgrind:
> - The option --history-backtrace-size=<number> allows to configure
> the number of entries to record in the stack traces of "old"
> accesses. Previously, this number was hardcoded to 8.
> - Valgrind now contains python code that defines GDB helgrind
> front end monitor commands. See CORE CHANGES.
>
> * Cachegrind:
> - `--cache-sim=no` is now the default. The cache simulation is old and
> unlikely to match any real modern machine. This means only the `Ir`
> event is gathered by default, but that is by far the most useful
> event.
> - `cg_annotate`, `cg_diff`, and `cg_merge` have been rewritten in
> Python. As a result, they all have more flexible command line
> argument handling, e.g. supporting `--show-percs` and
> `--no-show-percs` forms as well as the existing `--show-percs=yes`
> and `--show-percs=no`.
> - `cg_annotate` has some functional changes.
> - It's much faster, e.g. 3-4x on common cases.
> - It now supports diffing (with `--diff`, `--mod-filename`, and
> `--mod-funcname`) and merging (by passing multiple data files).
> - It now provides more information at the file and function level.
> There are now "File:function" and "Function:file" sections. These
> are very useful for programs that use inlining a lot.
> - Support for user-annotated files and the `-I`/`--include` option
> has been removed, because it was of little use and blocked other
> improvements.
> - The `--auto` option is renamed `--annotate`, though the old
> `--auto=yes`/`--auto=no` forms are still supported.
> - `cg_diff` and `cg_merge` are now deprecated, because `cg_annotate`
> now does a better job of diffing and merging.
> - The Cachegrind output file format has changed very slightly, but in
> ways nobody is likely to notice.
>
> * Callgrind:
> - Valgrind now contains python code that defines GDB callgrind
> front end monitor commands. See CORE CHANGES.
>
> * Massif:
> - Valgrind now contains python code that defines GDB massif
> front end monitor commands. See CORE CHANGES.
>
>
> _______________________________________________
> Valgrind-users mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-users
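
A note on the `-Wunused-result` warnings for `write` reported above: one common way to silence them properly is to check the return value and retry on short writes. The following is a minimal sketch under stated assumptions, not the change made in vgdb.c; `write_all` is a hypothetical helper name, and retrying on `EINTR` is an assumed policy.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from vgdb.c): write all len bytes to fd.
   Retries on EINTR and on short writes.
   Returns 0 on success, -1 on error (errno is left as set by write). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;          /* real error: report it to the caller */
        }
        p += n;                 /* advance past the bytes already written */
        len -= (size_t)n;
    }
    return 0;
}
```

A call such as `write (pipefd[1], &err, sizeof (int));` would then become `if (write_all(pipefd[1], &err, sizeof err) != 0) { /* report or abort */ }`, which uses the result and so no longer triggers the warning.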
|
From: Mark W. <ma...@kl...> - 2023-04-22 01:47:41
|
An RC2 tarball for 3.21.0 is now available at
https://sourceware.org/pub/valgrind/valgrind-3.21.0.RC2.tar.bz2
(md5sum = f33407fdffbfa78f5014781cc92297cf)
(sha1sum = c520ee0c28d9e20d28aa25d05ce2525c39a69135)
https://sourceware.org/pub/valgrind/valgrind-3.21.0.RC2.tar.bz2.asc

Please give it a try in configurations that are important for you and
report any problems you have, either on this mailing list, or
(preferably) via our bug tracker at
https://bugs.kde.org/enter_bug.cgi?product=valgrind

Please check the NEWS entry below for new features that could use some
extra testing. Note that there has also been a dhat extension which
hasn't yet been added to NEWS.

There is now a client request for DHAT to mark memory to be
histogrammed:
https://bugs.kde.org/464103
https://snapshots.sourceware.org/valgrind/trunk/latest/html/dh-manual.html#dh-access-counts

If nothing critical emerges, a final release will happen on
Friday 28 April.

* ==================== CORE CHANGES ===================

* When GDB is used to debug a program running under valgrind using
  the valgrind gdbserver, GDB will automatically load some
  python code provided in valgrind defining GDB front end commands
  corresponding to the valgrind monitor commands.
  These GDB front end commands accept the same format as
  the monitor commands directly sent to the Valgrind gdbserver.
  These GDB front end commands provide a better integration
  in the GDB command line interface, so as to use for example
  GDB auto-completion, command specific help, searching for
  a command or command help matching a regexp, ...
  For relevant monitor commands, GDB will evaluate arguments
  to make the use of monitor commands easier.
  For example, instead of having to print the address of a variable
  to pass it to a subsequent monitor command, the GDB front end
  command will evaluate the address argument. It is for example
  possible to do:
    (gdb) memcheck who_point_at &some_struct sizeof(some_struct)
  instead of:
    (gdb) p &some_struct
    $2 = (some_struct_type *) 0x1130a0 <some_struct>
    (gdb) p sizeof(some_struct)
    $3 = 40
    (gdb) monitor who_point_at 0x1130a0 40

* The vgdb utility now supports extended-remote protocol when
  invoked with --multi. In this mode the GDB run command is
  supported. Which means you don't need to run gdb and valgrind
  from different terminals. So for example to start you program
  in gdb and run it under valgrind you can do:
    $ gdb prog
    (gdb) set remote exec-file prog
    (gdb) set sysroot /
    (gdb) target extended-remote | vgdb --multi
    (gdb) start

* The behaviour of realloc with a size of zero can now
  be changed for tools that intercept malloc. Those
  tools are memcheck, helgrind, drd, massif and dhat.
  Realloc implementations generally do one of two things
   - free the memory like free() and return NULL
     (GNU libc and ptmalloc).
   - either free the memory and then allocate a
     minimum sized block or just return the
     original pointer. Return NULL if the
     allocation of the minimum sized block fails
     (jemalloc, musl, snmalloc, Solaris, macOS).
  When Valgrind is configured and built it will
  try to match the OS and libc behaviour. However
  if you are using a non-default library to replace
  malloc and family (e.g., musl on a glibc Linux or
  tcmalloc on FreeBSD) then you can use a command line
  option to change the behaviour of Valgrind:
    --realloc-zero-bytes-frees=yes|no [yes on Linux glibc, no otherwise]

* ================== PLATFORM CHANGES =================

* Make the address space limit on FreeBSD amd64 128Gbytes
  (the same as Linux and Solaris, it was 32Gbytes)

* ==================== TOOL CHANGES ===================

* Memcheck:
  - When doing a delta leak_search, it is now possible to only
    output the new loss records compared to the previous leak search.
    This is available in the memcheck monitor command 'leak_search'
    by specifying the "new" keyword or in your program by using
    the client request VALGRIND_DO_NEW_LEAK_CHECK.
    Whenever a "delta" leak search is done (i.e. when specifying
    "new" or "increased" or "changed" in the monitor command),
    the new loss records have a "new" marker.
  - Valgrind now contains python code that defines GDB memcheck
    front end monitor commands. See CORE CHANGES.
  - Performs checks for the use of realloc with a size of zero.
    This is non-portable and a source of errors. If memcheck
    detects such a usage it will generate an error
      realloc() with size 0
    followed by the usual callstacks.
    A switch has been added to allow this to be turned off:
      --show-realloc-size-zero=yes|no [yes]

* Helgrind:
  - The option --history-backtrace-size=<number> allows to configure
    the number of entries to record in the stack traces of "old"
    accesses. Previously, this number was hardcoded to 8.
  - Valgrind now contains python code that defines GDB helgrind
    front end monitor commands. See CORE CHANGES.

* Cachegrind:
  - `--cache-sim=no` is now the default. The cache simulation is old and
    unlikely to match any real modern machine. This means only the `Ir`
    event is gathered by default, but that is by far the most useful
    event.
  - `cg_annotate`, `cg_diff`, and `cg_merge` have been rewritten in
    Python. As a result, they all have more flexible command line
    argument handling, e.g. supporting `--show-percs` and
    `--no-show-percs` forms as well as the existing `--show-percs=yes`
    and `--show-percs=no`.
  - `cg_annotate` has some functional changes.
    - It's much faster, e.g. 3-4x on common cases.
    - It now supports diffing (with `--diff`, `--mod-filename`, and
      `--mod-funcname`) and merging (by passing multiple data files).
    - It now provides more information at the file and function level.
      There are now "File:function" and "Function:file" sections. These
      are very useful for programs that use inlining a lot.
    - Support for user-annotated files and the `-I`/`--include` option
      has been removed, because it was of little use and blocked other
      improvements.
    - The `--auto` option is renamed `--annotate`, though the old
      `--auto=yes`/`--auto=no` forms are still supported.
  - `cg_diff` and `cg_merge` are now deprecated, because `cg_annotate`
    now does a better job of diffing and merging.
  - The Cachegrind output file format has changed very slightly, but in
    ways nobody is likely to notice.

* Callgrind:
  - Valgrind now contains python code that defines GDB callgrind
    front end monitor commands. See CORE CHANGES.

* Massif:
  - Valgrind now contains python code that defines GDB massif
    front end monitor commands. See CORE CHANGES.
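
To illustrate the realloc-with-size-zero portability issue described in the NEWS entry above, here is a minimal sketch of a wrapper that never passes a size of zero to `realloc`, so its result does not depend on which libc behaviour is in effect. `resize_block` is a hypothetical name used only for illustration; it is not part of Valgrind or of any libc.

```c
#include <stdlib.h>

/* Hypothetical wrapper: resize a heap block without ever calling
   realloc() with a size of 0. A request for 0 bytes is treated as an
   explicit free and always yields NULL, on every platform. */
static void *resize_block(void *ptr, size_t new_size)
{
    if (new_size == 0) {
        free(ptr);              /* free(NULL) is a harmless no-op */
        return NULL;
    }
    /* For a non-zero size, a NULL result means the allocation failed
       and the original block at ptr is still valid. */
    return realloc(ptr, new_size);
}
```

With a wrapper of this kind, code that wants "shrink to nothing" semantics states them explicitly instead of relying on the allocator, which should also avoid the new memcheck "realloc() with size 0" error for those call sites.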
|
From: Mark W. <ma...@so...> - 2023-04-22 01:42:32
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=c8832cb2ddf08c6f8c28ff4fd511561e15ec120e

commit c8832cb2ddf08c6f8c28ff4fd511561e15ec120e
Author: Mark Wielaard <ma...@kl...>
Date:   Sat Apr 22 03:41:04 2023 +0200

    Add helgrind/tests/garbage.filtered.out to .gitignore

Diff:
---
 .gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.gitignore b/.gitignore
index 71f1828d66..27183b99b9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -661,6 +661,7 @@
 /helgrind/tests/cond_timedwait_invalid
 /helgrind/tests/cond_timedwait_test
 /helgrind/tests/free_is_write
+/helgrind/tests/garbage.filtered.out
 /helgrind/tests/hg01_all_ok
 /helgrind/tests/hg02_deadlock
 /helgrind/tests/hg03_inherit
|
From: Mark W. <ma...@so...> - 2023-04-22 01:21:27
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=181b3f24673145e6d1591d974d8f592d391bf731

commit 181b3f24673145e6d1591d974d8f592d391bf731
Author: Mark Wielaard <ma...@kl...>
Date:   Sat Apr 22 03:00:27 2023 +0200

    Set version to 3.21.0-RC2

Diff:
---
 NEWS         | 1 +
 configure.ac | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/NEWS b/NEWS
index 373de5ef1a..bff5b5d760 100644
--- a/NEWS
+++ b/NEWS
@@ -190,6 +190,7 @@ To see details of a given bug, visit
 where XXXXXX is the bug number as listed above.
 (3.21.0.RC1: 14 Apr 2023)
+(3.21.0.RC2: 21 Apr 2023)
 Release 3.20.0 (24 Oct 2022)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/configure.ac b/configure.ac
index 8217136d9e..8470daebd1 100755
--- a/configure.ac
+++ b/configure.ac
@@ -17,8 +17,8 @@
 m4_define([v_major_ver], [3])
 m4_define([v_minor_ver], [21])
 m4_define([v_micro_ver], [0])
-m4_define([v_suffix_ver], [RC1])
-m4_define([v_rel_date], ["14 Apr 2023"])
+m4_define([v_suffix_ver], [RC2])
+m4_define([v_rel_date], ["21 Apr 2023"])
 m4_define([v_version],
 m4_if(v_suffix_ver, [],
 [v_major_ver.v_minor_ver.v_micro_ver],
|
From: Nicholas N. <n.n...@gm...> - 2023-04-21 23:53:43
|
Failing to add test files to Makefile.am is a really easy mistake to make, and one I've done multiple times recently. Mark, you've often fixed these, how do you detect them? I did a try push beforehand and it didn't detect the missing file.

Nick

On Sat, 22 Apr 2023 at 00:18, Mark Wielaard <ma...@so...> wrote:
>
> https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=9d6d45cfdebc62099640d8ae566dc9c8577ba514
>
> commit 9d6d45cfdebc62099640d8ae566dc9c8577ba514
> Author: Mark Wielaard <ma...@kl...>
> Date: Fri Apr 21 16:15:15 2023 +0200
>
> Add cachegrind/tests/ann-diff4b-aux/w.rs
>
> Missing testfile from commit 1fdf0e728a047f0aab4de805576b6a3a84f37b79
> "Add diff and merge capability to `cg_annotate`."
>
> Diff:
> ---
> cachegrind/tests/ann-diff4b-aux/w.rs | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/cachegrind/tests/ann-diff4b-aux/w.rs
> b/cachegrind/tests/ann-diff4b-aux/w.rs
> new file mode 100644
> index 0000000000..4cb29ea38f
> --- /dev/null
> +++ b/cachegrind/tests/ann-diff4b-aux/w.rs
> @@ -0,0 +1,3 @@
> +one
> +two
> +three
>
>
> _______________________________________________
> Valgrind-developers mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-developers
>
|
From: Nicholas N. <n.n...@gm...> - 2023-04-21 23:52:11
|
An interesting failure mode here: I actually did a try push for the commits that introduced this XML error. After try-pushing, I waited, then got some "tests succeeded" emails, and so I figured I was good to go and did the master-push. And then shortly after that I got some "tests failed" emails for different platforms.

First, it's surprising that this XML problem was detected on some platforms but not others. Is the reason for this known? It seems sub-optimal.

Also, is it possible to make this mistake harder/impossible to make? E.g. a single email with all the results, rather than multiple emails that arrive at different times. Failing that, at least the README_DEVELOPERS file could be clearer, it currently says this:

> When all builders have build your patch the buildbot will sent you (or actually the patch author)
> an email telling you if any builds failed and references to all the logs.

Perhaps looking at https://builder.sourceware.org/buildbot/#/builders?tags=valgrind-try should be preferred to relying on the emails?

It's frustrating that I tried to do the right thing here and still got it wrong. Stronger protections would be helpful.

Nick

On Fri, 21 Apr 2023 at 23:02, Nicholas Nethercote <nj...@so...> wrote:
>
> https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=b12703598ab42d377c3ecedc71f2e1324d932cd2
>
> commit b12703598ab42d377c3ecedc71f2e1324d932cd2
> Author: Nicholas Nethercote <n.n...@gm...>
> Date: Fri Apr 21 23:00:39 2023 +1000
>
> Fix two xmllint errors.
>
> Diff:
> ---
> cachegrind/docs/cg-manual.xml | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/cachegrind/docs/cg-manual.xml b/cachegrind/docs/cg-manual.xml
> index 35d6a412e3..a14cd7c926 100644
> --- a/cachegrind/docs/cg-manual.xml
> +++ b/cachegrind/docs/cg-manual.xml
> @@ -1065,8 +1065,6 @@ Cachegrind-specific options are:
> <sect1 id="cg-manual.mergeopts" xreflabel="cg_merge Command-line Options">
> <title>cg_merge Command-line Options</title>
>
> -Although cg_merge is deprecated, its options are listed here for
> completeness.
> -
> <!-- start of xi:include in the manpage -->
> <variablelist id="cg_merge.opts.list">
>
> @@ -1091,8 +1089,6 @@ Although cg_merge is deprecated, its options are
> listed here for completeness.
> <sect1 id="cg-manual.diffopts" xreflabel="cg_diff Command-line Options">
> <title>cg_diff Command-line Options</title>
>
> -Although cg_diff is deprecated, its options are listed here for
> completeness.
> -
> <!-- start of xi:include in the manpage -->
> <variablelist id="cg_diff.opts.list">
>
>
> _______________________________________________
> Valgrind-developers mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-developers
>
|
From: Paul F. <pa...@so...> - 2023-04-21 21:36:19
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=f5416b5edd9afa96f0096c0b63842aaa58b080e3

commit f5416b5edd9afa96f0096c0b63842aaa58b080e3
Author: Paul Floyd <pj...@wa...>
Date:   Fri Apr 21 23:35:37 2023 +0200

    regtest: dhat/tests/user_histo1 filter out summary

    Too system dependent.

Diff:
---
 dhat/tests/Makefile.am                | 2 +-
 dhat/tests/filter_user_histo          | 3 ++-
 dhat/tests/user_histo1.stderr.exp     | 5 -----
 dhat/tests/user_histo1.stderr.exp-gcc | 8 --------
 4 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/dhat/tests/Makefile.am b/dhat/tests/Makefile.am
index bdc0f15688..818cc10d08 100644
--- a/dhat/tests/Makefile.am
+++ b/dhat/tests/Makefile.am
@@ -13,7 +13,7 @@ EXTRA_DIST = \
 sig.stderr.exp sig.vgtest \
 single.stderr.exp single.vgtest \
 user_histo1.stderr.exp user_histo1.vgtest \
- user_histo1.stdout.exp user_histo1.stderr.exp-gcc
+ user_histo1.stdout.exp
 check_PROGRAMS = \
 acc \
diff --git a/dhat/tests/filter_user_histo b/dhat/tests/filter_user_histo
index c872a9f8e7..176a93e7a5 100755
--- a/dhat/tests/filter_user_histo
+++ b/dhat/tests/filter_user_histo
@@ -5,5 +5,6 @@
 # So we allow 1,000,000..1,009,999 bytes and 1,000..1,099 blocks.
 ./filter_stderr "$@" |
-sed -e "s/address for user histogram request not found .*/address for user histogram request not found/"
+sed -e "s/address for user histogram request not found .*/address for user histogram request not found/" |
+grep Warning
diff --git a/dhat/tests/user_histo1.stderr.exp b/dhat/tests/user_histo1.stderr.exp
index 8a96f6c9d8..819765a19f 100644
--- a/dhat/tests/user_histo1.stderr.exp
+++ b/dhat/tests/user_histo1.stderr.exp
@@ -1,8 +1,3 @@
 Warning: request for user histogram of size 500 is smaller than the normal histogram limit, request ignored
 Warning: address for user histogram request not found
 Warning: request for user histogram of size 100000 is larger than the maximum user request limit, request ignored
-Total: 102,500 bytes in 3 blocks
-At t-gmax: 100,500 bytes in 2 blocks
-At t-end: 0 bytes in 0 blocks
-Reads: 1,000 bytes
-Writes: 102,520 bytes
diff --git a/dhat/tests/user_histo1.stderr.exp-gcc b/dhat/tests/user_histo1.stderr.exp-gcc
deleted file mode 100644
index b7fe054823..0000000000
--- a/dhat/tests/user_histo1.stderr.exp-gcc
+++ /dev/null
@@ -1,8 +0,0 @@
-Warning: request for user histogram of size 500 is smaller than the normal histogram limit, request ignored
-Warning: address for user histogram request not found
-Warning: request for user histogram of size 100000 is larger than the maximum user request limit, request ignored
-Total: 175,204 bytes in 4 blocks
-At t-gmax: 173,204 bytes in 3 blocks
-At t-end: 0 bytes in 0 blocks
-Reads: 1,001 bytes
-Writes: 102,536 bytes
|
From: Paul F. <pa...@so...> - 2023-04-21 21:18:18
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=577912e62c411c438a86ce99032cd5c31f12b68a

commit 577912e62c411c438a86ce99032cd5c31f12b68a
Author: Paul Floyd <pj...@wa...>
Date:   Fri Apr 21 23:17:32 2023 +0200

    regtest: add another expected for dhat/tests/user_histo1

Diff:
---
 dhat/tests/Makefile.am                | 2 +-
 dhat/tests/user_histo1.stderr.exp-gcc | 8 ++++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/dhat/tests/Makefile.am b/dhat/tests/Makefile.am
index 818cc10d08..bdc0f15688 100644
--- a/dhat/tests/Makefile.am
+++ b/dhat/tests/Makefile.am
@@ -13,7 +13,7 @@ EXTRA_DIST = \
 sig.stderr.exp sig.vgtest \
 single.stderr.exp single.vgtest \
 user_histo1.stderr.exp user_histo1.vgtest \
- user_histo1.stdout.exp
+ user_histo1.stdout.exp user_histo1.stderr.exp-gcc
 check_PROGRAMS = \
 acc \
diff --git a/dhat/tests/user_histo1.stderr.exp-gcc b/dhat/tests/user_histo1.stderr.exp-gcc
new file mode 100644
index 0000000000..b7fe054823
--- /dev/null
+++ b/dhat/tests/user_histo1.stderr.exp-gcc
@@ -0,0 +1,8 @@
+Warning: request for user histogram of size 500 is smaller than the normal histogram limit, request ignored
+Warning: address for user histogram request not found
+Warning: request for user histogram of size 100000 is larger than the maximum user request limit, request ignored
+Total: 175,204 bytes in 4 blocks
+At t-gmax: 173,204 bytes in 3 blocks
+At t-end: 0 bytes in 0 blocks
+Reads: 1,001 bytes
+Writes: 102,536 bytes
|
From: Paul F. <pa...@so...> - 2023-04-21 21:12:14
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=869e6e28b10bf01bf75feed4219699c9d896c3da

commit 869e6e28b10bf01bf75feed4219699c9d896c3da
Author: Paul Floyd <pj...@wa...>
Date:   Fri Apr 21 23:10:51 2023 +0200

    regtest: filter error address from dhat/tests/user_histo1

Diff:
---
 dhat/tests/Makefile.am            | 2 +-
 dhat/tests/filter_user_histo      | 9 +++++++++
 dhat/tests/user_histo1.stderr.exp | 2 +-
 dhat/tests/user_histo1.vgtest     | 1 +
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/dhat/tests/Makefile.am b/dhat/tests/Makefile.am
index df36ef2b4b..818cc10d08 100644
--- a/dhat/tests/Makefile.am
+++ b/dhat/tests/Makefile.am
@@ -1,7 +1,7 @@
 include $(top_srcdir)/Makefile.tool-tests.am
-dist_noinst_SCRIPTS = filter_stderr filter_copy
+dist_noinst_SCRIPTS = filter_stderr filter_copy filter_user_histo
 EXTRA_DIST = \
 acc.stderr.exp acc.vgtest \
diff --git a/dhat/tests/filter_user_histo b/dhat/tests/filter_user_histo
new file mode 100755
index 0000000000..c872a9f8e7
--- /dev/null
+++ b/dhat/tests/filter_user_histo
@@ -0,0 +1,9 @@
+#! /bin/sh
+
+# It's impossible to get exact matches for copy counts because even trivial C
+# programs do a few memcpy/strcpy calls. So we allow some fuzzy matching.
+# So we allow 1,000,000..1,009,999 bytes and 1,000..1,099 blocks.
+
+./filter_stderr "$@" |
+sed -e "s/address for user histogram request not found .*/address for user histogram request not found/"
+
diff --git a/dhat/tests/user_histo1.stderr.exp b/dhat/tests/user_histo1.stderr.exp
index 59206b17e3..8a96f6c9d8 100644
--- a/dhat/tests/user_histo1.stderr.exp
+++ b/dhat/tests/user_histo1.stderr.exp
@@ -1,5 +1,5 @@
 Warning: request for user histogram of size 500 is smaller than the normal histogram limit, request ignored
-Warning: address for user histogram request not found 55b2820
+Warning: address for user histogram request not found
 Warning: request for user histogram of size 100000 is larger than the maximum user request limit, request ignored
 Total: 102,500 bytes in 3 blocks
 At t-gmax: 100,500 bytes in 2 blocks
diff --git a/dhat/tests/user_histo1.vgtest b/dhat/tests/user_histo1.vgtest
index c44637a14a..7c5d52e77d 100644
--- a/dhat/tests/user_histo1.vgtest
+++ b/dhat/tests/user_histo1.vgtest
@@ -1,3 +1,4 @@
 prog: user_histo1
 vgopts: --dhat-out-file=dhat.out
+stderr_filter: filter_user_histo
 cleanup: rm dhat.out
|
From: Paul F. <pa...@so...> - 2023-04-21 20:58:06
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=dc5209d42b65f1c428108e92615ac4bae3681706

commit dc5209d42b65f1c428108e92615ac4bae3681706
Author: Paul Floyd <pj...@wa...>
Date:   Fri Apr 21 22:57:09 2023 +0200

    Add missing user_histo1.stdout.exp to EXTRA_DIST

Diff:
---
 dhat/tests/Makefile.am | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/dhat/tests/Makefile.am b/dhat/tests/Makefile.am
index e504b46551..df36ef2b4b 100644
--- a/dhat/tests/Makefile.am
+++ b/dhat/tests/Makefile.am
@@ -12,7 +12,8 @@ EXTRA_DIST = \
 	empty.stderr.exp empty.vgtest \
 	sig.stderr.exp sig.vgtest \
 	single.stderr.exp single.vgtest \
-	user_histo1.stderr.exp user_histo1.vgtest
+	user_histo1.stderr.exp user_histo1.vgtest \
+	user_histo1.stdout.exp
 
 check_PROGRAMS = \
 	acc \
|
|
From: Paul F. <pa...@so...> - 2023-04-21 19:22:33
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=424340403c5d9eab23e883f6d27f780243bf8435

commit 424340403c5d9eab23e883f6d27f780243bf8435
Author: Paul Floyd <pj...@wa...>
Date:   Fri Apr 21 21:21:23 2023 +0200

    Bug 464103 - Enhancement: add a client request to DHAT to mark memory to be histogrammed

Diff:
---
 .gitignore                        |  1 +
 NEWS                              |  1 +
 dhat/dh_main.c                    | 47 +++++++++++++++++++++++++++++++++++++++
 dhat/dhat.h                       | 23 +++++++++++++++++++
 dhat/docs/dh-manual.xml           | 24 ++++++++++++++++++++
 dhat/tests/Makefile.am            |  8 +++++--
 dhat/tests/user_histo1.cpp        | 34 ++++++++++++++++++++++++++++
 dhat/tests/user_histo1.stderr.exp |  8 +++++++
 dhat/tests/user_histo1.stdout.exp |  0
 dhat/tests/user_histo1.vgtest     |  3 +++
 10 files changed, 147 insertions(+), 2 deletions(-)

diff --git a/.gitignore b/.gitignore
index 9e26e2fbf5..71f1828d66 100644
--- a/.gitignore
+++ b/.gitignore
@@ -298,6 +298,7 @@
 /dhat/tests/empty
 /dhat/tests/sig
 /dhat/tests/single
+/dhat/tests/user_histo1
 
 # /docs/
 /docs/FAQ.txt
diff --git a/NEWS b/NEWS
index 50f171cd58..373de5ef1a 100644
--- a/NEWS
+++ b/NEWS
@@ -167,6 +167,7 @@ are not entered into bugzilla tend to get forgotten about or ignored.
 460356 s390: Sqrt32Fx4 -- cannot reduce tree
 462830 WARNING: unhandled amd64-freebsd syscall: 474
 463027 broken check for MPX instruction support in assembler
+464103 Enhancement: add a client request to DHAT to mark memory to be histogrammed
 464476 Firefox fails to start under Valgrind
 464609 Valgrind memcheck should support Linux pidfd_open
 464680 Show issues caused by memory policies like selinux deny_execmem
diff --git a/dhat/dh_main.c b/dhat/dh_main.c
index 57d94237c5..ac0a7478e9 100644
--- a/dhat/dh_main.c
+++ b/dhat/dh_main.c
@@ -45,6 +45,7 @@
 #include "dhat.h"
 
 #define HISTOGRAM_SIZE_LIMIT 1024
+#define USER_HISTOGRAM_SIZE_LIMIT 25*HISTOGRAM_SIZE_LIMIT
 
 //------------------------------------------------------------//
 //--- Globals ---//
@@ -1229,6 +1230,52 @@ static Bool dh_handle_client_request(ThreadId tid, UWord* arg, UWord* ret)
          return True;
       }
 
+      case VG_USERREQ__DHAT_HISTOGRAM_MEMORY: {
+         Addr address = (Addr)arg[1];
+         UWord initial_count = arg[2];
+
+         Block* bk = find_Block_containing( address );
+         // bogus address
+         if (!bk) {
+            VG_(message)(
+               Vg_UserMsg,
+               "Warning: address for user histogram request not found %llx\n", (ULong)address
+            );
+            return False;
+         }
+
+         // already histogrammed
+         if (bk->req_szB <= HISTOGRAM_SIZE_LIMIT) {
+            VG_(message)(
+               Vg_UserMsg,
+               "Warning: request for user histogram of size %lu is smaller than the normal histogram limit, request ignored\n",
+               bk->req_szB
+            );
+            return False;
+         }
+
+         // already histogrammed
+         if (bk->req_szB > USER_HISTOGRAM_SIZE_LIMIT) {
+            VG_(message)(
+               Vg_UserMsg,
+               "Warning: request for user histogram of size %lu is larger than the maximum user request limit, request ignored\n",
+               bk->req_szB
+            );
+            return False;
+         }
+
+
+         bk->histoW = VG_(malloc)("dh.new_block.3", bk->req_szB * sizeof(UShort));
+         if (initial_count == 0U) {
+            VG_(memset)(bk->histoW, 0, bk->req_szB * sizeof(UShort));
+         } else {
+            for (SizeT i = 0U; i < bk->req_szB; ++i) {
+               bk->histoW[i] = 1U;
+            }
+         }
+         return True;
+      }
+
       case _VG_USERREQ__DHAT_COPY: {
          SizeT len = (SizeT)arg[1];
 
diff --git a/dhat/dhat.h b/dhat/dhat.h
index 653ed417b2..cbedbe88b6 100644
--- a/dhat/dhat.h
+++ b/dhat/dhat.h
@@ -56,11 +56,16 @@
    ----------------------------------------------------------------
 */
 
+#if !defined(VALGRIND_DHAT_H)
+#define VALGRIND_DHAT_H
+
 #include "valgrind.h"
 
 typedef
    enum {
       VG_USERREQ__DHAT_AD_HOC_EVENT = VG_USERREQ_TOOL_BASE('D', 'H'),
+      VG_USERREQ__DHAT_HISTOGRAM_MEMORY,
+      VG_USERREQ__DHAT_HISTOGRAM_ARRAY,
 
       // This is just for DHAT's internal use. Don't use it.
       _VG_USERREQ__DHAT_COPY = VG_USERREQ_TOOL_BASE('D','H') + 256
@@ -73,3 +78,21 @@ typedef
    VALGRIND_DO_CLIENT_REQUEST_STMT(VG_USERREQ__DHAT_AD_HOC_EVENT, \
                                    (_qzz_weight), 0, 0, 0, 0)
 
+// for limited histograms of memory larger than 1k
+#define DHAT_HISTOGRAM_MEMORY(_qzz_address, _qzz_initial_count) \
+   VALGRIND_DO_CLIENT_REQUEST_STMT(VG_USERREQ__DHAT_HISTOGRAM_MEMORY, \
+                                   (_qzz_address), (_qzz_initial_count), 0, 0, 0)
+
+// convenience macro for DHAT_HISTOGRAM_MEMORY
+// for initialized memory (calloc, std::vector with initialization)
+#define DHAT_HISTOGRAM_MEMORY_INIT(_qzz_address) \
+   DHAT_HISTOGRAM_MEMORY(_qzz_address, 1U)
+
+// convenience macro for DHAT_HISTOGRAM_MEMORY
+// for uninitialized memory (malloc, std::vector without initialization)
+#define DHAT_HISTOGRAM_MEMORY_UNINIT(_qzz_address) \
+   DHAT_HISTOGRAM_MEMORY(_qzz_address, 0U)
+
+
+#endif
+
diff --git a/dhat/docs/dh-manual.xml b/dhat/docs/dh-manual.xml
index 17aa9ab61c..9d99b4f352 100644
--- a/dhat/docs/dh-manual.xml
+++ b/dhat/docs/dh-manual.xml
@@ -422,6 +422,30 @@ this example) means "exceeded the maximum tracked count".</para>
 probably have four separate byte-sized fields, followed by a four-byte field,
 and so on.</para>
 
+<para>The size of the blocks that measure and display access counts is limited
+to 1024 bytes. This is done to limit the performance overhead and also to keep
+the size of the generated output reasonable. However, it is possible to override
+this limit using client requests. The use-case for this is to first run DHAT
+normally, and then identify any large blocks that you would like to further
+investigate with access histograms. The client requests are declared in
+<filename>dhat/dhat.h</filename>. There are two versions, <computeroutput>DHAT_HISTOGRAM_MEMORY_INIT</computeroutput> and
+<computeroutput>DHAT_HISTOGRAM_MEMORY_UNINIT</computeroutput>. The
+UNINIT version should be used with allocators that return uninitialized
+memory (like <computeroutput>malloc</computeroutput> and <computeroutput>new</computeroutput>
+without a constructor). The INIT version should be used with allocators
+that initialize memory (like <computeroutput>calloc</computeroutput> and <computeroutput>new</computeroutput>
+with a constructor). The macros should be placed immediately after the
+call to the allocator, and use the pointer returned by the allocator.</para>
+
+<programlisting><![CDATA[
+// LargeStruct bigger than 1024 bytes
+struct LargeStruct* ls = malloc(sizeof(struct LargeStruct));
+DHAT_HISTOGRAM_MEMORY_INIT(ls);
+]]></programlisting>
+
+<para>The memory that can be profiled in this way with user requests
+has a further upper limit of 25kbytes.</para>
+
 <para>Access counts can be useful for identifying data alignment holes or other
 layout inefficiencies.</para>
 
diff --git a/dhat/tests/Makefile.am b/dhat/tests/Makefile.am
index b86fc416d4..e504b46551 100644
--- a/dhat/tests/Makefile.am
+++ b/dhat/tests/Makefile.am
@@ -11,7 +11,8 @@ EXTRA_DIST = \
 	copy.stderr.exp copy.vgtest \
 	empty.stderr.exp empty.vgtest \
 	sig.stderr.exp sig.vgtest \
-	single.stderr.exp single.vgtest
+	single.stderr.exp single.vgtest \
+	user_histo1.stderr.exp user_histo1.vgtest
 
 check_PROGRAMS = \
 	acc \
@@ -21,7 +22,8 @@ check_PROGRAMS = \
 	copy \
 	empty \
 	sig \
-	single
+	single \
+	user_histo1
 
 AM_CFLAGS += $(AM_FLAG_M3264_PRI)
 AM_CXXFLAGS += $(AM_FLAG_M3264_PRI)
@@ -32,3 +34,5 @@ big_CFLAGS = $(AM_CFLAGS) -Wno-unused-result
 
 # Prevent the copying functions from being inlined
 copy_CFLAGS = $(AM_CFLAGS) -fno-builtin
+
+user_histo1_SOURCES = user_histo1.cpp
diff --git a/dhat/tests/user_histo1.cpp b/dhat/tests/user_histo1.cpp
new file mode 100644
index 0000000000..f5c8e69bbc
--- /dev/null
+++ b/dhat/tests/user_histo1.cpp
@@ -0,0 +1,34 @@
+#include <vector>
+#include <cstdint>
+#include <iostream>
+#include <random>
+#include "dhat/dhat.h"
+
+int main()
+{
+    std::vector<uint8_t> vec(2000, 0);
+    DHAT_HISTOGRAM_MEMORY_INIT(vec.data());
+    std::mt19937 gen(42);;
+    std::uniform_int_distribution<> index_distrib(0, 1999);
+    std::uniform_int_distribution<> val_distrib(0, 255);
+
+    for (int i = 0; i < 20; ++i)
+    {
+        int index = index_distrib(gen);
+        int val = val_distrib(gen);
+        vec[index] = val;
+        //std::cout << "wrote " << val << " to index " << index << "\n";
+    }
+
+    // try to generate some warnings
+    vec.resize(500);
+    vec.shrink_to_fit();
+    DHAT_HISTOGRAM_MEMORY_UNINIT(vec.data());
+
+    auto old = vec.data();
+    vec.resize(100000);
+    // old should have been deleted
+    DHAT_HISTOGRAM_MEMORY_UNINIT(old);
+    // and this is too big
+    DHAT_HISTOGRAM_MEMORY_UNINIT(vec.data());
+}
diff --git a/dhat/tests/user_histo1.stderr.exp b/dhat/tests/user_histo1.stderr.exp
new file mode 100644
index 0000000000..59206b17e3
--- /dev/null
+++ b/dhat/tests/user_histo1.stderr.exp
@@ -0,0 +1,8 @@
+Warning: request for user histogram of size 500 is smaller than the normal histogram limit, request ignored
+Warning: address for user histogram request not found 55b2820
+Warning: request for user histogram of size 100000 is larger than the maximum user request limit, request ignored
+Total: 102,500 bytes in 3 blocks
+At t-gmax: 100,500 bytes in 2 blocks
+At t-end: 0 bytes in 0 blocks
+Reads: 1,000 bytes
+Writes: 102,520 bytes
diff --git a/dhat/tests/user_histo1.stdout.exp b/dhat/tests/user_histo1.stdout.exp
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/dhat/tests/user_histo1.vgtest b/dhat/tests/user_histo1.vgtest
new file mode 100644
index 0000000000..c44637a14a
--- /dev/null
+++ b/dhat/tests/user_histo1.vgtest
@@ -0,0 +1,3 @@
+prog: user_histo1
+vgopts: --dhat-out-file=dhat.out
+cleanup: rm dhat.out
|
|
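The size checks that the Bug 464103 commit adds to dh_handle_client_request can be restated as a small standalone C sketch. This is not DHAT code: `HistoVerdict`, `classify_histo_request` and `fill_histogram` are hypothetical names chosen for illustration, and the parentheses around the second limit are an addition of the sketch (the commit defines it as `25*HISTOGRAM_SIZE_LIMIT`). It only shows which requested block sizes end up with a user histogram and how the per-byte counters start out.

```c
#include <assert.h>
#include <stddef.h>

/* Limits as defined in the dh_main.c hunk. The parentheses are added in this
   sketch so the macro stays correct inside larger expressions. */
#define HISTOGRAM_SIZE_LIMIT 1024
#define USER_HISTOGRAM_SIZE_LIMIT (25 * HISTOGRAM_SIZE_LIMIT)

/* Hypothetical result type. DHAT itself prints a warning and returns False
   for the two rejected cases. */
typedef enum {
    HISTO_BELOW_NORMAL_LIMIT, /* <= 1024 bytes: request ignored */
    HISTO_ACCEPTED,           /* 1025..25600 bytes: histogram allocated */
    HISTO_ABOVE_USER_LIMIT    /* > 25600 bytes: request ignored */
} HistoVerdict;

/* Same order of checks as the handler, applied to the block's requested size. */
static HistoVerdict classify_histo_request(size_t req_szB)
{
    if (req_szB <= HISTOGRAM_SIZE_LIMIT)
        return HISTO_BELOW_NORMAL_LIMIT;
    if (req_szB > USER_HISTOGRAM_SIZE_LIMIT)
        return HISTO_ABOVE_USER_LIMIT;
    return HISTO_ACCEPTED;
}

/* Mirrors the initialisation step: every per-byte counter starts at 0 when
   initial_count is 0 (the UNINIT macro) and at 1 otherwise (the INIT macro). */
static void fill_histogram(unsigned short *histo, size_t req_szB,
                           unsigned initial_count)
{
    assert(histo != NULL);
    for (size_t i = 0; i < req_szB; ++i)
        histo[i] = (initial_count == 0U) ? 0U : 1U;
}
```

With the sizes used in user_histo1.cpp, 2000 bytes is accepted, while 500 and 100000 bytes fall into the two rejected cases, which matches the two size warnings in user_histo1.stderr.exp.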
From: Mark W. <ma...@so...> - 2023-04-21 16:16:10
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=6fc239ed47b44b82fb14849159f5b22132b6ca94

commit 6fc239ed47b44b82fb14849159f5b22132b6ca94
Author: Mark Wielaard <ma...@kl...>
Date:   Fri Apr 21 18:13:14 2023 +0200

    Add use strict and use warnings to perl callgrind scripts

    This way we can simply use #! /usr/bin/env perl and don't need
    env -S and perl -w flags which might confuse some packaging utilities.

Diff:
---
 callgrind/callgrind_annotate.in | 3 ++-
 callgrind/callgrind_control.in  | 6 +++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/callgrind/callgrind_annotate.in b/callgrind/callgrind_annotate.in
index c0715e0649..976fe9b5fb 100644
--- a/callgrind/callgrind_annotate.in
+++ b/callgrind/callgrind_annotate.in
@@ -1,4 +1,4 @@
-#! /usr/bin/env -S perl -w
+#! /usr/bin/env perl
 ##--------------------------------------------------------------------##
 ##--- The cache simulation framework: instrumentation, recording ---##
 ##--- and results printing. ---##
@@ -59,6 +59,7 @@
 # commifying (halves the number of commify calls) 1.68s --> 1.47s
 
 use strict;
+use warnings;
 
 #----------------------------------------------------------------------------
 # Overview: the running example in the comments is for:
diff --git a/callgrind/callgrind_control.in b/callgrind/callgrind_control.in
index b8969373e4..083ffa29fc 100644
--- a/callgrind/callgrind_control.in
+++ b/callgrind/callgrind_control.in
@@ -1,4 +1,4 @@
-#! /usr/bin/env -S perl -w
+#! /usr/bin/env perl
 ##--------------------------------------------------------------------##
 ##--- Control supervision of applications run with callgrind ---##
 ##--- callgrind_control ---##
@@ -21,6 +21,10 @@
 #
 # You should have received a copy of the GNU General Public License
 # along with this program; if not, see <http://www.gnu.org/licenses/>.
+
+use strict;
+use warnings;
+
 use File::Basename;
 
 # vgdb_exe will be set to a vgdb found 'near' the callgrind_control file
|
|
From: Mark W. <ma...@so...> - 2023-04-21 14:17:16
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=9d6d45cfdebc62099640d8ae566dc9c8577ba514

commit 9d6d45cfdebc62099640d8ae566dc9c8577ba514
Author: Mark Wielaard <ma...@kl...>
Date:   Fri Apr 21 16:15:15 2023 +0200

    Add cachegrind/tests/ann-diff4b-aux/w.rs

    Missing testfile from commit 1fdf0e728a047f0aab4de805576b6a3a84f37b79
    "Add diff and merge capability to `cg_annotate`."

Diff:
---
 cachegrind/tests/ann-diff4b-aux/w.rs | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/cachegrind/tests/ann-diff4b-aux/w.rs b/cachegrind/tests/ann-diff4b-aux/w.rs
new file mode 100644
index 0000000000..4cb29ea38f
--- /dev/null
+++ b/cachegrind/tests/ann-diff4b-aux/w.rs
@@ -0,0 +1,3 @@
+one
+two
+three
|
|
From: Nicholas N. <nj...@so...> - 2023-04-21 13:01:16
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=b12703598ab42d377c3ecedc71f2e1324d932cd2

commit b12703598ab42d377c3ecedc71f2e1324d932cd2
Author: Nicholas Nethercote <n.n...@gm...>
Date:   Fri Apr 21 23:00:39 2023 +1000

    Fix two xmllint errors.

Diff:
---
 cachegrind/docs/cg-manual.xml | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/cachegrind/docs/cg-manual.xml b/cachegrind/docs/cg-manual.xml
index 35d6a412e3..a14cd7c926 100644
--- a/cachegrind/docs/cg-manual.xml
+++ b/cachegrind/docs/cg-manual.xml
@@ -1065,8 +1065,6 @@ Cachegrind-specific options are:
 <sect1 id="cg-manual.mergeopts" xreflabel="cg_merge Command-line Options">
 <title>cg_merge Command-line Options</title>
 
-Although cg_merge is deprecated, its options are listed here for completeness.
-
 <!-- start of xi:include in the manpage -->
 <variablelist id="cg_merge.opts.list">
 
@@ -1091,8 +1089,6 @@ Although cg_merge is deprecated, its options are listed here for completeness.
 <sect1 id="cg-manual.diffopts" xreflabel="cg_diff Command-line Options">
 <title>cg_diff Command-line Options</title>
 
-Although cg_diff is deprecated, its options are listed here for completeness.
-
 <!-- start of xi:include in the manpage -->
 <variablelist id="cg_diff.opts.list">
 
|
|
From: Nicholas N. <nj...@so...> - 2023-04-21 12:44:47
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=c2e62127ad8a9b71c4abf4b166ad545988490c32 commit c2e62127ad8a9b71c4abf4b166ad545988490c32 Author: Nicholas Nethercote <n.n...@gm...> Date: Fri Apr 21 07:20:11 2023 +1000 Rewrite Cachegrind docs. For all the changes I've made recently. And also various other changes that occurred over the past 20 years that didn't previously make it into the docs. Also, this change de-emphasises the cache and branch simulation aspect, because they're no longer that useful. Instead it emphasises the precision and reproducibility of instruction count profiling. Diff: --- cachegrind/docs/cg-manual.xml | 1583 ++++----- cachegrind/docs/cg_annotate-manpage.xml | 5 +- cachegrind/docs/cg_diff-manpage.xml | 9 +- cachegrind/docs/cg_merge-manpage.xml | 8 +- cachegrind/docs/concord.c | 532 +++ cachegrind/docs/concord.cgann | 560 ++++ cachegrind/docs/concord.cgout | 5573 +++++++++++++++++++++++++++++++ 7 files changed, 7470 insertions(+), 800 deletions(-) diff --git a/cachegrind/docs/cg-manual.xml b/cachegrind/docs/cg-manual.xml index 92fe086824..35d6a412e3 100644 --- a/cachegrind/docs/cg-manual.xml +++ b/cachegrind/docs/cg-manual.xml @@ -5,167 +5,117 @@ <!-- Referenced from both the manual and manpage --> <chapter id="&vg-cg-manual-id;" xreflabel="&vg-cg-manual-label;"> -<title>Cachegrind: a cache and branch-prediction profiler</title> +<title>Cachegrind: a high-precision tracing profiler</title> -<para>To use this tool, you must specify -<option>--tool=cachegrind</option> on the -Valgrind command line.</para> +<para> +To use this tool, specify <option>--tool=cachegrind</option> on the Valgrind +command line. +</para> <sect1 id="cg-manual.overview" xreflabel="Overview"> <title>Overview</title> -<para>Cachegrind simulates how your program interacts with a machine's cache -hierarchy and (optionally) branch predictor. 
It simulates a machine with -independent first-level instruction and data caches (I1 and D1), backed by a -unified second-level cache (L2). This exactly matches the configuration of -many modern machines.</para> - -<para>However, some modern machines have three or four levels of cache. For these -machines (in the cases where Cachegrind can auto-detect the cache -configuration) Cachegrind simulates the first-level and last-level caches. -The reason for this choice is that the last-level cache has the most influence on -runtime, as it masks accesses to main memory. Furthermore, the L1 caches -often have low associativity, so simulating them can detect cases where the -code interacts badly with this cache (eg. traversing a matrix column-wise -with the row length being a power of 2).</para> - -<para>Therefore, Cachegrind always refers to the I1, D1 and LL (last-level) -caches.</para> - <para> -Cachegrind gathers the following statistics (abbreviations used for each statistic -is given in parentheses):</para> +Cachegrind is a high-precision tracing profiler. It runs slowly, but collects +precise and reproducible profiling data. It can merge and diff data from +different runs. To expand on these characteristics: +</para> + <itemizedlist> <listitem> - <para>I cache reads (<computeroutput>Ir</computeroutput>, - which equals the number of instructions executed), - I1 cache read misses (<computeroutput>I1mr</computeroutput>) and - LL cache instruction read misses (<computeroutput>ILmr</computeroutput>). - </para> - </listitem> - <listitem> - <para>D cache reads (<computeroutput>Dr</computeroutput>, which - equals the number of memory reads), - D1 cache read misses (<computeroutput>D1mr</computeroutput>), and - LL cache data read misses (<computeroutput>DLmr</computeroutput>). 
- </para> - </listitem> - <listitem> - <para>D cache writes (<computeroutput>Dw</computeroutput>, which equals - the number of memory writes), - D1 cache write misses (<computeroutput>D1mw</computeroutput>), and - LL cache data write misses (<computeroutput>DLmw</computeroutput>). - </para> - </listitem> - <listitem> - <para>Conditional branches executed (<computeroutput>Bc</computeroutput>) and - conditional branches mispredicted (<computeroutput>Bcm</computeroutput>). + <para> + <emphasis>Precise.</emphasis> Cachegrind measures the exact number of + instructions executed by your program, not an approximation. Furthermore, + it presents the gathered data at the file, function, and line level. This + is different to many other profilers that measure approximate execution + time, using sampling, and only at the function level. </para> </listitem> + <listitem> - <para>Indirect branches executed (<computeroutput>Bi</computeroutput>) and - indirect branches mispredicted (<computeroutput>Bim</computeroutput>). + <para> + <emphasis>Reproducible.</emphasis> In general, execution time is a better + metric than instruction counts because it's what users perceive. However, + execution time often has high variability. When running the exact same + program on the exact same input multiple times, execution time might vary + by several percent. Furthermore, small changes in a program can change its + memory layout and have even larger effects on runtime. In contrast, + instruction counts are highly reproducible; for some programs they are + perfectly reproducible. This means the effects of small changes in a + program can be measured with high precision. </para> </listitem> </itemizedlist> -<para>Note that D1 total accesses is given by -<computeroutput>D1mr</computeroutput> + -<computeroutput>D1mw</computeroutput>, and that LL total -accesses is given by <computeroutput>ILmr</computeroutput> + -<computeroutput>DLmr</computeroutput> + -<computeroutput>DLmw</computeroutput>. 
+<para> +For these reasons, Cachegrind is an excellent complement to time-based profilers. </para> -<para>These statistics are presented for the entire program and for each -function in the program. You can also annotate each line of source code in -the program with the counts that were caused directly by it.</para> - -<para>On a modern machine, an L1 miss will typically cost -around 10 cycles, an LL miss can cost as much as 200 -cycles, and a mispredicted branch costs in the region of 10 -to 30 cycles. Detailed cache and branch profiling can be very useful -for understanding how your program interacts with the machine and thus how -to make it faster.</para> +<para> +Cachegrind can annotate programs written in any language, so long as debug info +is present to map machine code back to the original source code. Cachegrind has +been used successfully on programs written in C, C++, Rust, and assembly. +</para> -<para>Also, since one instruction cache read is performed per -instruction executed, you can find out how many instructions are -executed per line, which can be useful for traditional profiling.</para> +<para> +Cachegrind can also simulate how your program interacts with a machine's cache +hierarchy and branch predictor. This simulation was the original motivation for +the tool, hence its name. However, the simulations are basic and unlikely to +reflect the behaviour of a modern machine. For this reason they are off by +default. If you really want cache and branch information, a profiler like +<computeroutput>perf</computeroutput> that accesses hardware counters is a +better choice. 
+</para> </sect1> - <sect1 id="cg-manual.profile" - xreflabel="Using Cachegrind, cg_annotate and cg_merge"> -<title>Using Cachegrind, cg_annotate and cg_merge</title> + xreflabel="Using Cachegrind and cg_annotate"> +<title>Using Cachegrind and cg_annotate</title> + +<para> +First, as for normal Valgrind use, you should compile with debugging info (the +<option>-g</option> option in most compilers). But by contrast with normal +Valgrind use, you probably do want to turn optimisation on, since you should +profile your program as it will be normally run. +</para> -<para>First off, as for normal Valgrind use, you probably want to -compile with debugging info (the -<option>-g</option> option). But by contrast with -normal Valgrind use, you probably do want to turn -optimisation on, since you should profile your program as it will -be normally run.</para> +<para> +Second, run Cachegrind itself to gather the profiling data. +</para> -<para>Then, you need to run Cachegrind itself to gather the profiling -information, and then run cg_annotate to get a detailed presentation of that -information. As an optional intermediate step, you can use cg_merge to sum -together the outputs of multiple Cachegrind runs into a single file which -you then use as the input for cg_annotate. Alternatively, you can use -cg_diff to difference the outputs of two Cachegrind runs into a single file -which you then use as the input for cg_annotate.</para> +<para> +Third, run cg_annotate to get a detailed presentation of that data. cg_annotate +can combine the results of multiple Cachegrind output files. It can also +perform a diff between two Cachegrind output files. 
+</para> <sect2 id="cg-manual.running-cachegrind" xreflabel="Running Cachegrind"> <title>Running Cachegrind</title> -<para>To run Cachegrind on a program <filename>prog</filename>, run:</para> +<para> +To run Cachegrind on a program <filename>prog</filename>, run: <screen><![CDATA[ valgrind --tool=cachegrind prog ]]></screen> +</para> -<para>The program will execute (slowly). Upon completion, -summary statistics that look like this will be printed:</para> +<para> +The program will execute (slowly). Upon completion, summary statistics that +look like this will be printed: +</para> <programlisting><![CDATA[ -==31751== I refs: 27,742,716 -==31751== I1 misses: 276 -==31751== LLi misses: 275 -==31751== I1 miss rate: 0.0% -==31751== LLi miss rate: 0.0% -==31751== -==31751== D refs: 15,430,290 (10,955,517 rd + 4,474,773 wr) -==31751== D1 misses: 41,185 ( 21,905 rd + 19,280 wr) -==31751== LLd misses: 23,085 ( 3,987 rd + 19,098 wr) -==31751== D1 miss rate: 0.2% ( 0.1% + 0.4%) -==31751== LLd miss rate: 0.1% ( 0.0% + 0.4%) -==31751== -==31751== LL misses: 23,360 ( 4,262 rd + 19,098 wr) -==31751== LL miss rate: 0.0% ( 0.0% + 0.4%)]]></programlisting> - -<para>Cache accesses for instruction fetches are summarised -first, giving the number of fetches made (this is the number of -instructions executed, which can be useful to know in its own -right), the number of I1 misses, and the number of LL instruction -(<computeroutput>LLi</computeroutput>) misses.</para> - -<para>Cache accesses for data follow. The information is similar -to that of the instruction fetches, except that the values are -also shown split between reads and writes (note each row's -<computeroutput>rd</computeroutput> and -<computeroutput>wr</computeroutput> values add up to the row's -total).</para> - -<para>Combined instruction and data figures for the LL cache -follow that. Note that the LL miss rate is computed relative to the total -number of memory accesses, not the number of L1 misses. I.e. 
it is -<computeroutput>(ILmr + DLmr + DLmw) / (Ir + Dr + Dw)</computeroutput> -not -<computeroutput>(ILmr + DLmr + DLmw) / (I1mr + D1mr + D1mw)</computeroutput> -</para> - -<para>Branch prediction statistics are not collected by default. -To do so, add the option <option>--branch-sim=yes</option>.</para> +==17942== I refs: 8,195,070 +]]></programlisting> + +<para> +The <computeroutput>I refs</computeroutput> number is short for "Instruction +cache references", which is equivalent to "instructions executed". If you +enable the cache and/or branch simulation, additional counts will be shown. +</para> </sect2> @@ -173,691 +123,791 @@ To do so, add the option <option>--branch-sim=yes</option>.</para> <sect2 id="cg-manual.outputfile" xreflabel="Output File"> <title>Output File</title> -<para>As well as printing summary information, Cachegrind also writes -more detailed profiling information to a file. By default this file is named -<filename>cachegrind.out.<pid></filename> (where -<filename><pid></filename> is the program's process ID), but its name -can be changed with the <option>--cachegrind-out-file</option> option. This -file is human-readable, but is intended to be interpreted by the -accompanying program cg_annotate, described in the next section.</para> - -<para>The default <computeroutput>.<pid></computeroutput> suffix -on the output file name serves two purposes. Firstly, it means you -don't have to rename old log files that you don't want to overwrite. -Secondly, and more importantly, it allows correct profiling with the -<option>--trace-children=yes</option> option of -programs that spawn child processes.</para> +<para> +Cachegrind also writes more detailed profiling data to a file. By default this +Cachegrind output file is named <filename>cachegrind.out.<pid></filename> +(where <filename><pid></filename> is the program's process ID), but its +name can be changed with the <option>--cachegrind-out-file</option> option. 
+This file is human-readable, but is intended to be interpreted by the +accompanying program cg_annotate, described in the next section. +</para> -<para>The output file can be big, many megabytes for large applications -built with full debugging information.</para> +<para> +The default <computeroutput>.<pid></computeroutput> suffix on the output +file name serves two purposes. First, it means existing Cachegrind output files +aren't immediately overwritten. Second, and more importantly, it allows correct +profiling with the <option>--trace-children=yes</option> option of programs +that spawn child processes. +</para> </sect2> - <sect2 id="cg-manual.running-cg_annotate" xreflabel="Running cg_annotate"> <title>Running cg_annotate</title> -<para>Before using cg_annotate, -it is worth widening your window to be at least 120-characters -wide if possible, as the output lines can be quite long.</para> - -<para>To get a function-by-function summary, run:</para> +<para> +Before using cg_annotate, it is worth widening your window to be at least 120 +characters wide if possible, because the output lines can be quite long. +</para> +<para> +Then run: <screen>cg_annotate <filename></screen> - -<para>on a Cachegrind output file.</para> +on a Cachegrind output file. +</para> </sect2> +<!-- +To produce the sample date, I did the following. Note that the single hypens in +the valgrind command should be double hyphens, but XML doesn't allow double +hyphens in comments. + + gcc -g -O concord.c -o concord + valgrind -tool=cachegrind -cachegrind-out-file=concord.cgout ./concord ../cg_main.c + (to exit, type `q` and hit enter) + python ../cg_annotate concord.cgout > concord.cgann + +concord.c is a small C program I wrote at university. It's a good size for an example. 
+--> -<sect2 id="cg-manual.the-output-preamble" xreflabel="The Output Preamble"> -<title>The Output Preamble</title> +<sect2 id="cg-manual.the-metadata" xreflabel="The Metadata Section"> +<title>The Metadata Section</title> -<para>The first part of the output looks like this:</para> +<para> +The first part of the output looks like this: +</para> <programlisting><![CDATA[ -------------------------------------------------------------------------------- -I1 cache: 65536 B, 64 B, 2-way associative -D1 cache: 65536 B, 64 B, 2-way associative -LL cache: 262144 B, 64 B, 8-way associative -Command: concord vg_to_ucode.c -Events recorded: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw -Events shown: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw -Event sort order: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw -Threshold: 99% -Chosen for annotation: -Auto-annotation: off +-- Metadata +-------------------------------------------------------------------------------- +Invocation: ../cg_annotate concord.cgout +Command: ./concord ../cg_main.c +Events recorded: Ir +Events shown: Ir +Event sort order: Ir +Threshold: 0.1% +Annotation: on ]]></programlisting> - -<para>This is a summary of the annotation options:</para> +<para> +It summarizes how Cachegrind and the profiled program were run. +</para> <itemizedlist> - <listitem> - <para>I1 cache, D1 cache, LL cache: cache configuration. So - you know the configuration with which these results were - obtained.</para> + <para> + Invocation: the command line used to produce this output. + </para> </listitem> <listitem> - <para>Command: the command line invocation of the program - under examination.</para> + <para> + Command: the command line used to run the profiled program. + </para> </listitem> <listitem> - <para>Events recorded: which events were recorded.</para> - - </listitem> - - <listitem> - <para>Events shown: the events shown, which is a subset of the events - gathered. 
This can be adjusted with the - <option>--show</option> option.</para> + <para> + Events recorded: which events were recorded. By default, this is + <computeroutput>Ir</computeroutput>. More events will be recorded if cache + and/or branch simulation is enabled. + </para> </listitem> <listitem> - <para>Event sort order: the sort order in which functions are - shown. For example, in this case the functions are sorted - from highest <computeroutput>Ir</computeroutput> counts to - lowest. If two functions have identical - <computeroutput>Ir</computeroutput> counts, they will then be - sorted by <computeroutput>I1mr</computeroutput> counts, and - so on. This order can be adjusted with the - <option>--sort</option> option.</para> - - <para>Note that this dictates the order the functions appear. - It is <emphasis>not</emphasis> the order in which the columns - appear; that is dictated by the "events shown" line (and can - be changed with the <option>--show</option> - option).</para> + <para> + Events shown: the events shown, which is a subset of the events gathered. + This can be adjusted with the <option>--show</option> option. + </para> </listitem> <listitem> - <para>Threshold: cg_annotate - by default omits functions that cause very low counts - to avoid drowning you in information. In this case, - cg_annotate shows summaries the functions that account for - 99% of the <computeroutput>Ir</computeroutput> counts; - <computeroutput>Ir</computeroutput> is chosen as the - threshold event since it is the primary sort event. The - threshold can be adjusted with the - <option>--threshold</option> - option.</para> + <para> + Event sort order: the sort order used for the subsequent sections. For + example, in this case those sections are sorted from highest + <computeroutput>Ir</computeroutput> counts to lowest. 
If there are multiple + events, one will be the primary sort event, and then there can be a + secondary sort event, tertiary sort event, etc., though more than one is + rarely needed. This order can be adjusted with the <option>--sort</option> + option. Note that this does <emphasis>not</emphasis> specify the order in + which the columns appear. That is specified by the "events shown" line (and + can be changed with the <option>--show</option> option). + </para> </listitem> <listitem> - <para>Chosen for annotation: names of files specified - manually for annotation; in this case none.</para> + <para> + Threshold: cg_annotate by default omits files and functions with very low + counts to keep the output size reasonable. By default cg_annotate only + shows files and functions that account for at least 0.1% of the primary + sort event. The threshold can be adjusted with the + <option>--threshold</option> option. + </para> </listitem> <listitem> - <para>Auto-annotation: whether auto-annotation was requested - via the <option>--auto=yes</option> - option. In this case no.</para> + <para> + Annotation: whether source file annotation is enabled. Controlled with the + <option>--annotate</option> option. + </para> </listitem> </itemizedlist> +<para> +If cache simulation is enabled, details of the cache parameters will be shown +above the "Invocation" line. 
+</para> + </sect2> <sect2 id="cg-manual.the-global" - xreflabel="The Global and Function-level Counts"> -<title>The Global and Function-level Counts</title> + xreflabel="Global, File, and Function-level Counts"> +<title>Global, File, and Function-level Counts</title> -<para>Then follows summary statistics for the whole -program:</para> +<para> +Next comes the summary for the whole program: +</para> <programlisting><![CDATA[ -------------------------------------------------------------------------------- -Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw +-- Summary +-------------------------------------------------------------------------------- +Ir________________ + +8,195,070 (100.0%) PROGRAM TOTALS +]]></programlisting> + +<para> +The <computeroutput>Ir</computeroutput> column label is suffixed with +underscores to show the bounds of the columns underneath. +</para> + +<para> +Then comes file:function counts. Here is the first part of that section: +</para> + +<programlisting><![CDATA[ +-------------------------------------------------------------------------------- +-- File:function summary -------------------------------------------------------------------------------- -27,742,716 276 275 10,955,517 21,905 3,987 4,474,773 19,280 19,098 PROGRAM TOTALS]]></programlisting> + Ir______________________ file:function + +< 3,078,746 (37.6%, 37.6%) /home/njn/grind/ws1/cachegrind/concord.c: + 1,630,232 (19.9%) get_word + 630,918 (7.7%) hash + 461,095 (5.6%) insert + 130,560 (1.6%) add_existing + 91,014 (1.1%) init_hash_table + 88,056 (1.1%) create + 46,676 (0.6%) new_word_node + +< 1,746,038 (21.3%, 58.9%) ./malloc/./malloc/malloc.c: + 1,285,938 (15.7%) _int_malloc + 458,225 (5.6%) malloc + +< 1,107,550 (13.5%, 72.4%) ./libio/./libio/getc.c:getc + +< 551,071 (6.7%, 79.1%) ./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S:__strcmp_avx2 + +< 521,228 (6.4%, 85.5%) ./ctype/../include/ctype.h: + 260,616 (3.2%) __ctype_tolower_loc + 260,612 (3.2%) __ctype_b_loc + +< 468,163 (5.7%, 
91.2%) ???: + 468,151 (5.7%) ??? + +< 456,071 (5.6%, 96.8%) /usr/include/ctype.h:get_word + +]]></programlisting> + +<para> +Each entry covers one file, and one or more functions within that file. If +there is only one significant function within a file, as in the third entry, +the file and function are shown on the same line separated by a colon. If there +are multiple significant functions within a file, as in the first entry, each +function gets its own line. +</para> + +<para> +This example involves a small C program, and shows a combination of code from +the program itself (including functions like <function>get_word</function> and +<function>hash</function> in the file <filename>concord.c</filename>) as well +as code from system libraries, such as functions like +<function>malloc</function> and <function>getc</function>. +</para> + +<para> +Each entry is preceded with a <computeroutput><</computeroutput>, which can +be useful when navigating through the output in an editor, or grepping through +results. +</para> <para> -These are similar to the summary provided when Cachegrind finishes running. +The first percentage in each column indicates the proportion of the total event +count that is covered by this line. The second percentage, which only shows on the +first line of each entry, shows the cumulative percentage of all the entries up +to and including this one. The entries shown here account for 96.8% of the +instructions executed by the program. </para> -<para>Then comes function-by-function statistics:</para> +<para> +The name <computeroutput>???</computeroutput> is used if the file name and/or +function name could not be determined from debugging information. If +<filename>???</filename> filenames dominate, the program probably wasn't +compiled with <option>-g</option>. If <function>???</function> function names +dominate, the program may have had symbols stripped. +</para> + +<para> +After that comes function:file counts. 
Here is the first part of that section: +</para> <programlisting><![CDATA[ -------------------------------------------------------------------------------- -Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw file:function +-- Function:file summary -------------------------------------------------------------------------------- -8,821,482 5 5 2,242,702 1,621 73 1,794,230 0 0 getc.c:_IO_getc -5,222,023 4 4 2,276,334 16 12 875,959 1 1 concord.c:get_word -2,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp -2,521,927 2 2 591,215 0 0 179,398 0 0 concord.c:hash -2,242,740 2 2 1,046,612 568 22 448,548 0 0 ctype.c:tolower -1,496,937 4 4 630,874 9,000 1,400 279,388 0 0 concord.c:insert - 897,991 51 51 897,831 95 30 62 1 1 ???:??? - 598,068 1 1 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__flockfile - 598,068 0 0 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__funlockfile - 598,024 4 4 213,580 35 16 149,506 0 0 vg_clientmalloc.c:malloc - 446,587 1 1 215,973 2,167 430 129,948 14,057 13,957 concord.c:add_existing - 341,760 2 2 128,160 0 0 128,160 0 0 vg_clientmalloc.c:vg_trap_here_WRAPPER - 320,782 4 4 150,711 276 0 56,027 53 53 concord.c:init_hash_table - 298,998 1 1 106,785 0 0 64,071 1 1 concord.c:create - 149,518 0 0 149,516 0 0 1 0 0 ???:tolower@@GLIBC_2.0 - 149,518 0 0 149,516 0 0 1 0 0 ???:fgetc@@GLIBC_2.0 - 95,983 4 4 38,031 0 0 34,409 3,152 3,150 concord.c:new_word_node - 85,440 0 0 42,720 0 0 21,360 0 0 vg_clientmalloc.c:vg_bogus_epilogue]]></programlisting> - -<para>Each function -is identified by a -<computeroutput>file_name:function_name</computeroutput> pair. If -a column contains only a dot it means the function never performs -that event (e.g. the third row shows that -<computeroutput>strcmp()</computeroutput> contains no -instructions that write to memory). The name -<computeroutput>???</computeroutput> is used if the file name -and/or function name could not be determined from debugging -information. 
If most of the entries have the form -<computeroutput>???:???</computeroutput> the program probably -wasn't compiled with <option>-g</option>.</para> - -<para>It is worth noting that functions will come both from -the profiled program (e.g. <filename>concord.c</filename>) -and from libraries (e.g. <filename>getc.c</filename>)</para> + Ir______________________ function:file + +> 2,086,303 (25.5%, 25.5%) get_word: + 1,630,232 (19.9%) /home/njn/grind/ws1/cachegrind/concord.c + 456,071 (5.6%) /usr/include/ctype.h + +> 1,285,938 (15.7%, 41.1%) _int_malloc:./malloc/./malloc/malloc.c + +> 1,107,550 (13.5%, 54.7%) getc:./libio/./libio/getc.c + +> 630,918 (7.7%, 62.4%) hash:/home/njn/grind/ws1/cachegrind/concord.c + +> 551,071 (6.7%, 69.1%) __strcmp_avx2:./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S + +> 480,248 (5.9%, 74.9%) malloc: + 458,225 (5.6%) ./malloc/./malloc/malloc.c + 22,023 (0.3%) ./malloc/./malloc/arena.c + +> 468,151 (5.7%, 80.7%) ???:??? + +> 461,095 (5.6%, 86.3%) insert:/home/njn/grind/ws1/cachegrind/concord.c +]]></programlisting> + +<para> +This is similar to the previous section, but is grouped by functions first and +files second. Also, the entry markers are <computeroutput>></computeroutput> +instead of <computeroutput><</computeroutput>. +</para> + +<para> +You might wonder why this section is needed, and how it differs from the +previous section. The answer is inlining. In this example there are two entries +demonstrating a function whose code is effectively spread across more than one +file: <function>get_word</function> and <function>malloc</function>. 
Here is an +example from profiling the Rust compiler, a much larger program that uses +inlining more: +</para> + +<programlisting><![CDATA[ +> 30,469,230 (1.3%, 11.1%) <rustc_middle::ty::context::CtxtInterners>::intern_ty: + 10,269,220 (0.5%) /home/njn/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.3/src/raw/mod.rs + 7,696,827 (0.3%) /home/njn/dev/rust0/compiler/rustc_middle/src/ty/context.rs + 3,858,099 (0.2%) /home/njn/dev/rust0/library/core/src/cell.rs +]]></programlisting> + +<para> +In this case the compiled function <function>intern_ty</function> includes code +from three different source files, due to inlining. These should be examined +together. Older versions of cg_annotate presented this entry as three separate +file:function entries, which would typically be intermixed with all the other +entries, making it hard to see that they are all really part of the same +function. +</para> </sect2> -<sect2 id="cg-manual.line-by-line" xreflabel="Line-by-line Counts"> -<title>Line-by-line Counts</title> +<sect2 id="cg-manual.line-by-line" xreflabel="Per-line Counts"> +<title>Per-line Counts</title> + +<para> +By default, a source file is annotated if it contains at least one function +that meets the significance threshold. This can be disabled with the +<option>--annotate</option> option. +</para> -<para>By default, all source code annotation is also shown. (Filenames to be -annotated can also by specified manually as arguments to cg_annotate, but this -is rarely needed.) 
For example, the output from running <filename>cg_annotate -<filename> </filename> for our example produces the same output as above -followed by an annotated version of <filename>concord.c</filename>, a section -of which looks like:</para> +<para> +To continue the previous example, here is part of the annotation of the file +<filename>concord.c</filename>: +</para> <programlisting><![CDATA[ -------------------------------------------------------------------------------- --- Auto-annotated source: concord.c +-- Annotated source file: /home/njn/grind/ws1/cachegrind/docs/concord.c -------------------------------------------------------------------------------- -Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw - - . . . . . . . . . void init_hash_table(char *file_name, Word_Node *table[]) - 3 1 1 . . . 1 0 0 { - . . . . . . . . . FILE *file_ptr; - . . . . . . . . . Word_Info *data; - 1 0 0 . . . 1 1 1 int line = 1, i; - . . . . . . . . . - 5 0 0 . . . 3 0 0 data = (Word_Info *) create(sizeof(Word_Info)); - . . . . . . . . . - 4,991 0 0 1,995 0 0 998 0 0 for (i = 0; i < TABLE_SIZE; i++) - 3,988 1 1 1,994 0 0 997 53 52 table[i] = NULL; - . . . . . . . . . - . . . . . . . . . /* Open file, check it. */ - 6 0 0 1 0 0 4 0 0 file_ptr = fopen(file_name, "r"); - 2 0 0 1 0 0 . . . if (!(file_ptr)) { - . . . . . . . . . fprintf(stderr, "Couldn't open '%s'.\n", file_name); - 1 1 1 . . . . . . exit(EXIT_FAILURE); - . . . . . . . . . } - . . . . . . . . . - 165,062 1 1 73,360 0 0 91,700 0 0 while ((line = get_word(data, line, file_ptr)) != EOF) - 146,712 0 0 73,356 0 0 73,356 0 0 insert(data->;word, data->line, table); - . . . . . . . . . - 4 0 0 1 0 0 2 0 0 free(data); - 4 0 0 1 0 0 2 0 0 fclose(file_ptr); - 3 0 0 2 0 0 . . . 
}]]></programlisting> - -<para>(Although column widths are automatically minimised, a wide -terminal is clearly useful.)</para> - -<para>Each source file is clearly marked -(<computeroutput>User-annotated source</computeroutput>) as -having been chosen manually for annotation. If the file was -found in one of the directories specified with the -<option>-I</option>/<option>--include</option> option, the directory -and file are both given.</para> - -<para>Each line is annotated with its event counts. Events not -applicable for a line are represented by a dot. This is useful -for distinguishing between an event which cannot happen, and one -which can but did not.</para> - -<para>Sometimes only a small section of a source file is -executed. To minimise uninteresting output, Cachegrind only shows -annotated lines and lines within a small distance of annotated -lines. Gaps are marked with the line numbers so you know which -part of a file the shown code comes from, eg:</para> +Ir____________ + + . /* Function builds the hash table from the given file. */ + . void init_hash_table(char *file_name, Word_Node *table[]) + 8 (0.0%) { + . FILE *file_ptr; + . Word_Info *data; + 2 (0.0%) int line = 1, i; + . + . /* Structure used when reading in words and line numbers. */ + 3 (0.0%) data = (Word_Info *) create(sizeof(Word_Info)); + . + . /* Initialise entire table to NULL. */ + 2,993 (0.0%) for (i = 0; i < TABLE_SIZE; i++) + 997 (0.0%) table[i] = NULL; + . + . /* Open file, check it. */ + 4 (0.0%) file_ptr = fopen(file_name, "r"); + 2 (0.0%) if (!(file_ptr)) { + . fprintf(stderr, "Couldn't open '%s'.\n", file_name); + . exit(EXIT_FAILURE); + . } + . + . /* 'Get' the words and lines one at a time from the file, and insert them + . ** into the table one at a time. */ + 55,363 (0.7%) while ((line = get_word(data, line, file_ptr)) != EOF) + 31,632 (0.4%) insert(data->word, data->line, table); + . 
+ 2 (0.0%) free(data); + 2 (0.0%) fclose(file_ptr); + 6 (0.0%) } +]]></programlisting> + +<para> +Each executed line is annotated with its event counts. Other lines are +annotated with a dot. This may be because they contain no executable code, or +they contain executable code but were never executed. +</para> + +<para> +You can easily tell if a function is inlined from this output. If it is not +inlined, it will have event counts on the lines containing the opening and +closing braces. If it is inlined, it will not have event counts on those lines. +In the example above, <function>init_hash_table</function> does have counts, +so you can tell it is not inlined. +</para> + +<para> +Note again that inlining can lead to surprising results. If a function +<function>f</function> is always inlined, in the file:function and +function:file sections counts will be attributed to the functions it is inlined +into, rather than itself. However, if you look at the line-by-line annotations +for <function>f</function> you'll see the counts that belong to +<function>f</function>. So it's worth looking for large counts/percentages in the +line-by-line annotations. +</para> + +<para> +Sometimes only a small section of a source file is executed. To minimise +uninteresting output, Cachegrind only shows annotated lines and lines within a +small distance of annotated lines. Gaps are marked with line numbers, for +example: +</para> <programlisting><![CDATA[ -(figures and code for line 704) --- line 704 ---------------------------------------- --- line 878 ---------------------------------------- -(figures and code for line 878)]]></programlisting> - -<para>The amount of context to show around annotated lines is -controlled by the <option>--context</option> -option.</para> - -<para>Automatic annotation is enabled by default. -cg_annotate will automatically annotate every source file it can -find that is mentioned in the function-by-function summary. 
-Therefore, the files chosen for auto-annotation are affected by -the <option>--sort</option> and -<option>--threshold</option> options. Each -source file is clearly marked (<computeroutput>Auto-annotated -source</computeroutput>) as being chosen automatically. Any -files that could not be found are mentioned at the end of the -output, eg:</para> +(counts and code for line 704) +-- line 375 ---------------------------------------- +-- line 514 ---------------------------------------- +(counts and code for line 878) +]]></programlisting> + +<para> +The number of lines of context shown around annotated lines is controlled by +the <option>--context</option> option. +</para> + +<para> +Any significant source files that could not be found are shown like this: +</para> <programlisting><![CDATA[ ------------------------------------------------------------------- -The following files chosen for auto-annotation could not be found: ------------------------------------------------------------------- - getc.c - ctype.c - ../sysdeps/generic/lockfile.c]]></programlisting> - -<para>This is quite common for library files, since libraries are -usually compiled with debugging information, but the source files -are often not present on a system. If a file is chosen for -annotation both manually and automatically, it -is marked as <computeroutput>User-annotated -source</computeroutput>. 
Use the -<option>-I</option>/<option>--include</option> option to tell Valgrind where -to look for source files if the filenames found from the debugging -information aren't specific enough.</para> - -<para> Beware that auto-annotation can produce a lot of output if your program -is large.</para> +-------------------------------------------------------------------------------- +-- Annotated source file: ./malloc/./malloc/malloc.c +-------------------------------------------------------------------------------- +Unannotated because one or more of these original files are unreadable: +- ./malloc/./malloc/malloc.c +]]></programlisting> -</sect2> +<para> +This is common for library files, because libraries are usually compiled with +debugging information but the source files are rarely present on a system. +</para> + +<para> +Cachegrind relies heavily on accurate debug info. Sometimes compilers +map a particular compiled instruction to line number 0, where the 0 represents +"unknown" or "none". This is annoying but does happen in practice. cg_annotate +prints these in the following way: +</para> +<programlisting><![CDATA[ +-------------------------------------------------------------------------------- +-- Annotated source file: /home/njn/dev/rust0/compiler/rustc_borrowck/src/lib.rs +-------------------------------------------------------------------------------- +Ir______________ -<sect2 id="cg-manual.assembler" xreflabel="Annotating Assembly Code Programs"> -<title>Annotating Assembly Code Programs</title> +1,046,746 (0.0%) <unknown (line 0)> +]]></programlisting> -<para>Valgrind can annotate assembly code programs too, or annotate -the assembly code generated for your C program. 
Sometimes this is -useful for understanding what is really happening when an -interesting line of C code is translated into multiple -instructions.</para> +<para> +Finally, when annotation is performed, the output ends with a summary of how +many counts were annotated and unannotated, and why. For example: +</para> -<para>To do this, you just need to assemble your -<computeroutput>.s</computeroutput> files with assembly-level debug -information. You can use compile with the <option>-S</option> to compile C/C++ -programs to assembly code, and then assemble the assembly code files with -<option>-g</option> to achieve this. You can then profile and annotate the -assembly code source files in the same way as C/C++ source files.</para> +<programlisting><![CDATA[ +-------------------------------------------------------------------------------- +-- Annotation summary +-------------------------------------------------------------------------------- +Ir_______________ + +3,534,817 (43.1%) annotated: files known & above threshold & readable, line numbers known + 0 annotated: files known & above threshold & readable, line numbers unknown + 0 unannotated: files known & above threshold & two or more non-identical +4,132,126 (50.4%) unannotated: files known & above threshold & unreadable + 59,950 (0.7%) unannotated: files known & below threshold + 468,163 (5.7%) unannotated: files unknown +]]></programlisting> </sect2> + <sect2 id="cg-manual.forkingprograms" xreflabel="Forking Programs"> <title>Forking Programs</title> -<para>If your program forks, the child will inherit all the profiling data that -has been gathered for the parent.</para> - -<para>If the output file format string (controlled by -<option>--cachegrind-out-file</option>) does not contain <option>%p</option>, -then the outputs from the parent and child will be intermingled in a single -output file, which will almost certainly make it unreadable by -cg_annotate.</para> + +<para> +If your program forks, the child 
will inherit all the profiling data that +has been gathered for the parent. +</para> + +<para> +If the output file name (controlled by <option>--cachegrind-out-file</option>) +does not contain <option>%p</option>, then the outputs from the parent and +child will be intermingled in a single output file, which will almost certainly +make it unreadable by cg_annotate. +</para> + </sect2> <sect2 id="cg-manual.annopts.warnings" xreflabel="cg_annotate Warnings"> <title>cg_annotate Warnings</title> -<para>There are a couple of situations in which -cg_annotate issues warnings.</para> +<para> +There are two situations in which cg_annotate prints warnings. +</para> <itemizedlist> <listitem> - <para>If a source file is more recent than the - <filename>cachegrind.out.<pid></filename> file. - This is because the information in - <filename>cachegrind.out.<pid></filename> is only - recorded with line numbers, so if the line numbers change at - all in the source (e.g. lines added, deleted, swapped), any - annotations will be incorrect.</para> + <para> + If a source file is more recent than the Cachegrind output file. This is + because the information in the Cachegrind output file is only recorded with + line numbers, so if the line numbers change at all in the source (e.g. + lines added, deleted, swapped), any annotations will be incorrect. + </para> </listitem> <listitem> - <para>If information is recorded about line numbers past the - end of a file. This can be caused by the above problem, - i.e. shortening the source file while using an old - <filename>cachegrind.out.<pid></filename> file. If - this happens, the figures for the bogus lines are printed - anyway (clearly marked as bogus) in case they are - important.</para> + <para> + If information is recorded about line numbers past the end of a file. This + can be caused by the above problem, e.g. shortening the source file while + using an old Cachegrind output file. 
If this happens, the figures for the + bogus lines are printed anyway (and clearly marked as bogus) in case they + are important. + </para> </listitem> </itemizedlist> </sect2> +<sect2 id="cg-manual.cg_merge" xreflabel="cg_merge"> +<title>Merging Cachegrind Output Files</title> -<sect2 id="cg-manual.annopts.things-to-watch-out-for" - xreflabel="Unusual Annotation Cases"> -<title>Unusual Annotation Cases</title> +<para> +cg_annotate can merge data from multiple Cachegrind output files in a single +run. (There is also a program called cg_merge that can merge multiple +Cachegrind output files into a single Cachegrind output file, but it is now +deprecated because cg_annotate's merging does a better job.) +</para> -<para>Some odd things that can occur during annotation:</para> +<para> +Use it as follows: +</para> -<itemizedlist> - <listitem> - <para>If annotating at the assembler level, you might see - something like this:</para> <programlisting><![CDATA[ - 1 0 0 . . . . . . leal -12(%ebp),%eax - 1 0 0 . . . 1 0 0 movl %eax,84(%ebx) - 2 0 0 0 0 0 1 0 0 movl $1,-20(%ebp) - . . . . . . . . . .align 4,0x90 - 1 0 0 . . . . . . movl $.LnrB,%eax - 1 0 0 . . . 1 0 0 movl %eax,-16(%ebp)]]></programlisting> - - <para>How can the third instruction be executed twice when - the others are executed only once? As it turns out, it - isn't. Here's a dump of the executable, using - <computeroutput>objdump -d</computeroutput>:</para> -<programlisting><![CDATA[ - 8048f25: 8d 45 f4 lea 0xfffffff4(%ebp),%eax - 8048f28: 89 43 54 mov %eax,0x54(%ebx) - 8048f2b: c7 45 ec 01 00 00 00 movl $0x1,0xffffffec(%ebp) - 8048f32: 89 f6 mov %esi,%esi - 8048f34: b8 08 8b 07 08 mov $0x8078b08,%eax - 8048f39: 89 45 f0 mov %eax,0xfffffff0(%ebp)]]></programlisting> - - <para>Notice the extra <computeroutput>mov - %esi,%esi</computeroutput> instruction. Where did this come - from? 
The GNU assembler inserted it to serve as the two - bytes of padding needed to align the <computeroutput>movl - $.LnrB,%eax</computeroutput> instruction on a four-byte - boundary, but pretended it didn't exist when adding debug - information. Thus when Valgrind reads the debug info it - thinks that the <computeroutput>movl - $0x1,0xffffffec(%ebp)</computeroutput> instruction covers the - address range 0x8048f2b--0x804833 by itself, and attributes - the counts for the <computeroutput>mov - %esi,%esi</computeroutput> to it.</para> - </listitem> - - <!-- - I think this isn't true any more, not since cost centres were moved from - being associated with instruction addresses to being associated with - source line numbers. - <listitem> - <para>Inlined functions can cause strange results in the - function-by-function summary. If a function - <computeroutput>inline_me()</computeroutput> is defined in - <filename>foo.h</filename> and inlined in the functions - <computeroutput>f1()</computeroutput>, - <computeroutput>f2()</computeroutput> and - <computeroutput>f3()</computeroutput> in - <filename>bar.c</filename>, there will not be a - <computeroutput>foo.h:inline_me()</computeroutput> function - entry. Instead, there will be separate function entries for - each inlining site, i.e. - <computeroutput>foo.h:f1()</computeroutput>, - <computeroutput>foo.h:f2()</computeroutput> and - <computeroutput>foo.h:f3()</computeroutput>. 
To find the - total counts for - <computeroutput>foo.h:inline_me()</computeroutput>, add up - the counts from each entry.</para> - - <para>The reason for this is that although the debug info - output by GCC indicates the switch from - <filename>bar.c</filename> to <filename>foo.h</filename>, it - doesn't indicate the name of the function in - <filename>foo.h</filename>, so Valgrind keeps using the old - one.</para> - </listitem> - --> - - <listitem> - <para>Sometimes, the same filename might be represented with - a relative name and with an absolute name in different parts - of the debug info, eg: - <filename>/home/user/proj/proj.h</filename> and - <filename>../proj.h</filename>. In this case, if you use - auto-annotation, the file will be annotated twice with the - counts split between the two.</para> - </listitem> - - <listitem> - <para>If you compile some files with - <option>-g</option> and some without, some - events that take place in a file without debug info could be - attributed to the last line of a file with debug info - (whichever one gets placed before the non-debug-info file in - the executable).</para> - </listitem> +cg_annotate file1 file2 file3 ... +]]></programlisting> -</itemizedlist> +<para> +cg_annotate computes the sum of these files (effectively +<filename>file1</filename> + <filename>file2</filename> + +<filename>file3</filename>), and then produces output as usual that shows the +summed counts. +</para> -<para>These cases should be rare.</para> +<para> +The most common merging scenario is if you want to aggregate costs over +multiple runs of the same program, possibly on different inputs. 
+</para> </sect2> -<sect2 id="cg-manual.cg_merge" xreflabel="cg_merge"> -<title>Merging Profiles with cg_merge</title> +<sect2 id="cg-manual.cg_diff" xreflabel="cg_diff"> +<title>Differencing Cachegrind output files</title> <para> -cg_merge is a simple program which -reads multiple profile files, as created by Cachegrind, merges them -together, and writes the results into another file in the same format. -You can then examine the merged results using -<computeroutput>cg_annotate <filename></computeroutput>, as -described above. The merging functionality might be useful if you -want to aggregate costs over multiple runs of the same program, or -from a single parallel run with multiple instances of the same -program.</para> +cg_annotate can diff data from two Cachegrind output files in a single run. +(There is also a program called cg_diff that can diff two Cachegrind output +files into a single Cachegrind output file, but it is now deprecated because +cg_annotate's differencing does a better job.) +</para> <para> -cg_merge is invoked as follows: +Use it as follows: </para> <programlisting><![CDATA[ -cg_merge -o outputfile file1 file2 file3 ...]]></programlisting> +cg_annotate --diff file1 file2 +]]></programlisting> <para> -It reads and checks <computeroutput>file1</computeroutput>, then read -and checks <computeroutput>file2</computeroutput> and merges it into -the running totals, then the same with -<computeroutput>file3</computeroutput>, etc. The final results are -written to <computeroutput>outputfile</computeroutput>, or to standard -out if no output file is specified.</para> +cg_annotate computes the difference between these two files (effectively +<filename>file2</filename> - <filename>file1</filename>), and then +produces output as usual that shows the count differences. Note that many of +the counts may be negative; this indicates that the counts for the relevant +file/function/line are smaller in the second version than those in the first +version. 
+</para> <para> -Costs are summed on a per-function, per-line and per-instruction -basis. Because of this, the order in which the input files does not -matter, although you should take care to only mention each file once, -since any file mentioned twice will be added in twice.</para> +The simplest common scenario is comparing two Cachegrind output files that came +from the same program, but on different inputs. cg_annotate will do a good job +on this without assistance. +</para> <para> -cg_merge does not attempt to check -that the input files come from runs of the same executable. It will -happily merge together profile files from completely unrelated -programs. It does however check that the -<computeroutput>Events:</computeroutput> lines of all the inputs are -identical, so as to ensure that the addition of costs makes sense. -For example, it would be nonsensical for it to add a number indicating -D1 read references to a number from a different file indicating LL -write misses.</para> +A more complex scenario is if you want to compare Cachegrind output files from +two slightly different versions of a program that you have sitting +side-by-side, running on the same input. For example, you might have +<filename>version1/prog.c</filename> and <filename>version2/prog.c</filename>. +A straight comparison of the two would not be useful. Because functions are +always paired with filenames, a function <function>f</function> would be listed +as <filename>version1/prog.c:f</filename> for the first version but +<filename>version2/prog.c:f</filename> for the second version. +</para> <para> -A number of other syntax and sanity checks are done whilst reading the -inputs. cg_merge will stop and -attempt to print a helpful error message if any of the input files -fail these checks.</para> - -</sect2> - - -<sect2 id="cg-manual.cg_diff" xreflabel="cg_diff"> -<title>Differencing Profiles with cg_diff</title> +In this case, use the <option>--mod-filename</option> option. 
Its argument is a +search-and-replace expression that will be applied to all the filenames in both +Cachegrind output files. It can be used to remove minor differences in +filenames. For example, the option +<option>--mod-filename='s/version[0-9]/versionN/'</option> will suffice for the +above example. +</para> <para> -cg_diff is a simple program which -reads two profile files, as created by Cachegrind, finds the difference -between them, and writes the results into another file in the same format. -You can then examine the merged results using -<computeroutput>cg_annotate <filename></computeroutput>, as -described above. This is very useful if you want to measure how a change to -a program affected its performance. +Similarly, sometimes compilers auto-generate certain functions and give them +randomized names like <function>T.1234</function> where the suffixes vary from +build to build. You can use the <option>--mod-funcname</option> option to +remove small differences like these; it works in the same way as +<option>--mod-filename</option>. </para> <para> -cg_diff is invoked as follows: +When <option>--mod-filename</option> is used to compare two different versions +of the same program, cg_annotate will not annotate any file that is different +between the two versions, because the per-line counts are not reliable in such +a case. For example, imagine if <filename>version2/prog.c</filename> is the +same as <filename>version1/prog.c</filename> except with an extra blank line at +the top of the file. Every single per-line count will have changed. In +comparison, the per-file and per-function counts have not changed, and are +still very useful for determining differences between programs. You might think +that this means every interesting file will be left unannotated, but again +inlining means that files that are identical in the two versions can have +different counts on many lines. 
</para> -<programlisting><![CDATA[ -cg_diff file1 file2]]></programlisting> -<para> -It reads and checks <computeroutput>file1</computeroutput>, then read -and checks <computeroutput>file2</computeroutput>, then computes the -difference (effectively <computeroutput>file1</computeroutput> - -<computeroutput>file2</computeroutput>). The final results are written to -standard output.</para> +</sect2> -<para> -Costs are summed on a per-function basis. Per-line costs are not summed, -because doing so is too difficult. For example, consider differencing two -profiles, one from a single-file program A, and one from the same program A -where a single blank line was inserted at the top of the file. Every single -per-line count has changed. In comparison, the per-function counts have not -changed. The per-function count differences are still very useful for -determining differences between programs. Note that because the result is -the difference of two profiles, many of the counts will be negative; this -indicates that the counts for the relevant function are fewer in the second -version than those in the first version.</para> +<sect2 id="cg-manual.cache-branch-sim" xreflabel="cache-branch-sim"> +<title>Cache and Branch Simulation</title> <para> -cg_diff does not attempt to check -that the input files come from runs of the same executable. It will -happily merge together profile files from completely unrelated -programs. It does however check that the -<computeroutput>Events:</computeroutput> lines of all the inputs are -identical, so as to ensure that the addition of costs makes sense. -For example, it would be nonsensical for it to add a number indicating -D1 read references to a number from a different file indicating LL -write misses.</para> +Cachegrind can simulate how your program interacts with a machine's cache +hierarchy and/or branch predictor. 
+ +The cache simulation models a machine with independent first-level instruction +and data caches (I1 and D1), backed by a unified second-level cache (L2). For +these machines (in the cases where Cachegrind can auto-detect the cache +configuration) Cachegrind simulates the first-level and last-level caches. +Therefore, Cachegrind always refers to the I1, D1 and LL (last-level) caches. +</para> <para> -A number of other syntax and sanity checks are done whilst reading the -inputs. cg_diff will stop and -attempt to print a helpful error message if any of the input files -fail these checks.</para> +When simulating the cache, with <option>--cache-sim=yes</option>, Cachegrind +gathers the following statistics: +</para> + +<itemizedlist> + <listitem> + <para> + I cache reads (<computeroutput>Ir</computeroutput>, which equals the number + of instructions executed), I1 cache read misses + (<computeroutput>I1mr</computeroutput>) and LL cache instruction read + misses (<computeroutput>ILmr</computeroutput>). + </para> + </listitem> + <listitem> + <para> + D cache reads (<computeroutput>Dr</computeroutput>, which equals the number + of memory reads), D1 cache read misses + (<computeroutput>D1mr</computeroutput>), and LL cache data read misses + (<computeroutput>DLmr</computeroutput>). + </para> + </listitem> + <listitem> + <... [truncated message content] |
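The `--mod-filename` behaviour described in the documentation patch above can be sketched in Python. This is a minimal illustration, not the actual `cg_annotate` code: it assumes Perl-style `s/old/new/flags` expressions where only the `g` (global) and `i` (ignore case) flags are accepted, and the names `search_and_replace` and `normalize` are chosen here for illustration.

```python
import re
from typing import Callable


def search_and_replace(expr: str) -> Callable[[str], str]:
    """Turn a Perl-style `s/old/new/flags` expression into a function."""
    # `(?<!\\)/` matches a forward slash that is not preceded by a backslash,
    # so escaped slashes (`\/`) may appear inside the `old` and `new` parts.
    m = re.match(r"s/(.*)(?<!\\)/(.*)(?<!\\)/(g|i|gi|ig|)$", expr)
    if m is None:
        raise ValueError(f"not an s/old/new/ expression: {expr!r}")

    # Unescape the slashes before handing the parts to the `re` module.
    pat = m.group(1).replace(r"\/", "/")
    repl = m.group(2).replace(r"\/", "/")
    flags_text = m.group(3)

    count = 0 if "g" in flags_text else 1  # 0 means "replace all matches"
    flags = re.IGNORECASE if "i" in flags_text else 0
    regex = re.compile(pat, flags)
    return lambda s: regex.sub(repl, s, count=count)


# Both versions of the file map to the same modified filename.
normalize = search_and_replace("s/version[0-9]/versionN/")
print(normalize("version1/prog.c"))
print(normalize("version2/prog.c"))
```

Once both filenames map to the same string, the counts for `f` in either version can be attributed to a single `versionN/prog.c:f` entry and compared directly.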
|
From: Nicholas N. <nj...@so...> - 2023-04-21 12:44:40
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=24932ed4491a3b74f9bfacbd843074d8892c693a

commit 24932ed4491a3b74f9bfacbd843074d8892c693a
Author: Nicholas Nethercote <n.n...@gm...>
Date:   Fri Apr 21 07:06:06 2023 +1000

    Update NEWS about recent Cachegrind changes.

Diff:
---
 NEWS | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index 57e39f42a3..50f171cd58 100644
--- a/NEWS
+++ b/NEWS
@@ -98,9 +98,31 @@ AMD64/macOS 10.13 and nanoMIPS/Linux.
   front end monitor commands. See CORE CHANGES.
 
 * Cachegrind:
-  - `cg_annotate` has been rewritten from Perl into Python. The new
-    version is twice as fast, has more flexible argument parsing, and
-    will make future improvements easier.
+  - `--cache-sim=no` is now the default. The cache simulation is old and
+    unlikely to match any real modern machine. This means only the `Ir`
+    event are gathered by default, but that is by far the most useful
+    event.
+  - `cg_annotate`, `cg_diff`, and `cg_merge` have been rewritten in
+    Python. As a result, they all have more flexible command line
+    argument handling, e.g. supporting `--show-percs` and
+    `--no-show-percs` forms as well as the existing `--show-percs=yes`
+    and `--show-percs=no`.
+  - `cg_annotate` has some functional changes.
+    - It's much faster, e.g. 3-4x on common cases.
+    - It now supports diffing (with `--diff`, `--mod-filename`, and
+      `--mod-funcname`) and merging (by passing multiple data files).
+    - It now provides more information at the file and function level.
+      There are now "File:function" and "Function:file" sections. These
+      are very useful for programs that use inlining a lot.
+    - Support for user-annotated files and the `-I`/`--include` option
+      has been removed, because it was of little use and blocked other
+      improvements.
+    - The `--auto` option is renamed `--annotate`, though the old
+      `--auto=yes`/`--auto=no` forms are still supported.
+  - `cg_diff` and `cg_merge` are now deprecated, because `cg_annotate`
+    now does a better job of diffing and merging.
+  - The Cachegrind output file format has changed very slightly, but in
+    ways nobody is likely to notice.
 
 * Callgrind:
   - Valgrind now contains python code that defines GDB callgrind |
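The diffing and merging mentioned in the NEWS entry can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the per-function `Ir` counts, the dictionary layout, and the `combine` helper are hypothetical, and the only detail taken from the `cg_annotate` patch in this archive is that a diff negates the first file's counts so that the result is effectively `cgout2 - cgout1`, while a merge simply adds everything.

```python
from collections import defaultdict

# Hypothetical per-function instruction counts (`Ir`) from two profiling runs.
old_counts = {"prog.c:f": 1000, "prog.c:g": 500}
new_counts = {"prog.c:f": 800, "prog.c:h": 300}


def combine(profiles: list[dict[str, int]], diff: bool = False) -> dict[str, int]:
    """Sum counts across profiles; when diffing, negate the first profile."""
    totals: defaultdict[str, int] = defaultdict(int)
    for i, profile in enumerate(profiles):
        # In diff mode the first file is subtracted and the second is added.
        sign = -1 if diff and i == 0 else 1
        for name, count in profile.items():
            totals[name] += sign * count
    return dict(totals)


merged = combine([old_counts, new_counts])
delta = combine([old_counts, new_counts], diff=True)
print(merged)
print(delta)
```

A negative entry in `delta` means the second run spent fewer instructions in that function than the first; a function that appears in only one run still shows up, with its full count as the difference.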
|
From: Nicholas N. <nj...@so...> - 2023-04-21 12:44:35
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=15a11f98f5aac8dc8724a5f1241eb97a5a477998 commit 15a11f98f5aac8dc8724a5f1241eb97a5a477998 Author: Nicholas Nethercote <n.n...@gm...> Date: Fri Apr 21 16:12:33 2023 +1000 Get rid of cache config warnings with `--cache-sim=no`. By not configuring the caches in that case. This requires moving a few assertions around, because they currently assume that the caches are configured. Diff: --- cachegrind/cg_main.c | 75 ++++++++++++++++++++++++++++------------------------ 1 file changed, 40 insertions(+), 35 deletions(-) diff --git a/cachegrind/cg_main.c b/cachegrind/cg_main.c index 1ef7ce4f93..ef3ea03ebc 100644 --- a/cachegrind/cg_main.c +++ b/cachegrind/cg_main.c @@ -894,15 +894,18 @@ static void addEvent_Ir ( CgState* cgs, InstrInfo* inode ) static void addEvent_Dr ( CgState* cgs, InstrInfo* inode, Int datasize, IRAtom* ea ) { - Event* evt; tl_assert(isIRAtom(ea)); - tl_assert(datasize >= 1 && datasize <= min_line_size); + if (!clo_cache_sim) return; - if (cgs->events_used == N_EVENTS) + + tl_assert(datasize >= 1 && datasize <= min_line_size); + + if (cgs->events_used == N_EVENTS) { flushEvents(cgs); + } tl_assert(cgs->events_used >= 0 && cgs->events_used < N_EVENTS); - evt = &cgs->events[cgs->events_used]; + Event* evt = &cgs->events[cgs->events_used]; init_Event(evt); evt->tag = Ev_Dr; evt->inode = inode; @@ -914,14 +917,13 @@ void addEvent_Dr ( CgState* cgs, InstrInfo* inode, Int datasize, IRAtom* ea ) static void addEvent_Dw ( CgState* cgs, InstrInfo* inode, Int datasize, IRAtom* ea ) { - Event* evt; - tl_assert(isIRAtom(ea)); - tl_assert(datasize >= 1 && datasize <= min_line_size); if (!clo_cache_sim) return; + tl_assert(datasize >= 1 && datasize <= min_line_size); + /* Is it possible to merge this write with the preceding read? 
*/ if (cgs->events_used > 0) { Event* lastEvt = &cgs->events[cgs->events_used-1]; @@ -939,7 +941,7 @@ void addEvent_Dw ( CgState* cgs, InstrInfo* inode, Int datasize, IRAtom* ea ) if (cgs->events_used == N_EVENTS) flushEvents(cgs); tl_assert(cgs->events_used >= 0 && cgs->events_used < N_EVENTS); - evt = &cgs->events[cgs->events_used]; + Event* evt = &cgs->events[cgs->events_used]; init_Event(evt); evt->tag = Ev_Dw; evt->inode = inode; @@ -956,11 +958,12 @@ void addEvent_D_guarded ( CgState* cgs, InstrInfo* inode, tl_assert(isIRAtom(ea)); tl_assert(guard); tl_assert(isIRAtom(guard)); - tl_assert(datasize >= 1 && datasize <= min_line_size); if (!clo_cache_sim) return; + tl_assert(datasize >= 1 && datasize <= min_line_size); + /* Adding guarded memory actions and merging them with the existing queue is too complex. Simply flush the queue and add this action immediately. Since guarded loads and stores are pretty @@ -1511,7 +1514,7 @@ static void fprint_CC_table_and_calc_totals(void) } // Summary stats must come after rest of table, since we calculate them - // during traversal. */ + // during traversal. if (clo_cache_sim && clo_branch_sim) { VG_(fprintf)(fp, "summary:" " %llu %llu %llu" @@ -1823,32 +1826,34 @@ static void cg_post_clo_init(void) VG_(malloc), "cg.main.cpci.3", VG_(free)); - VG_(post_clo_init_configure_caches)(&I1c, &D1c, &LLc, - &clo_I1_cache, - &clo_D1_cache, - &clo_LL_cache); - - // min_line_size is used to make sure that we never feed - // accesses to the simulator straddling more than two - // cache lines at any cache level - min_line_size = (I1c.line_size < D1c.line_size) ? I1c.line_size : D1c.line_size; - min_line_size = (LLc.line_size < min_line_size) ? LLc.line_size : min_line_size; - - Int largest_load_or_store_size - = VG_(machine_get_size_of_largest_guest_register)(); - if (min_line_size < largest_load_or_store_size) { - /* We can't continue, because the cache simulation might - straddle more than 2 lines, and it will assert. 
So let's - just stop before we start. */ - VG_(umsg)("Cachegrind: cannot continue: the minimum line size (%d)\n", - (Int)min_line_size); - VG_(umsg)(" must be equal to or larger than the maximum register size (%d)\n", - largest_load_or_store_size ); - VG_(umsg)(" but it is not. Exiting now.\n"); - VG_(exit)(1); - } + if (clo_cache_sim) { + VG_(post_clo_init_configure_caches)(&I1c, &D1c, &LLc, + &clo_I1_cache, + &clo_D1_cache, + &clo_LL_cache); + + // min_line_size is used to make sure that we never feed + // accesses to the simulator straddling more than two + // cache lines at any cache level + min_line_size = (I1c.line_size < D1c.line_size) ? I1c.line_size : D1c.line_size; + min_line_size = (LLc.line_size < min_line_size) ? LLc.line_size : min_line_size; + + Int largest_load_or_store_size + = VG_(machine_get_size_of_largest_guest_register)(); + if (min_line_size < largest_load_or_store_size) { + /* We can't continue, because the cache simulation might + straddle more than 2 lines, and it will assert. So let's + just stop before we start. */ + VG_(umsg)("Cachegrind: cannot continue: the minimum line size (%d)\n", + (Int)min_line_size); + VG_(umsg)(" must be equal to or larger than the maximum register size (%d)\n", + largest_load_or_store_size ); + VG_(umsg)(" but it is not. Exiting now.\n"); + VG_(exit)(1); + } - cachesim_initcaches(I1c, D1c, LLc); + cachesim_initcaches(I1c, D1c, LLc); + } } VG_DETERMINE_INTERFACE_VERSION(cg_pre_clo_init) |
|
From: Nicholas N. <nj...@so...> - 2023-04-21 12:44:34
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=2cccba7cae9f77aa6c2a498bc30a42bdb1ed8829 commit 2cccba7cae9f77aa6c2a498bc30a42bdb1ed8829 Author: Nicholas Nethercote <n.n...@gm...> Date: Fri Apr 21 09:33:04 2023 +1000 Tweak printing of `I refs` and `D refs` lines. Because `--cache-sim=no` is the default now, and `I refs:` looks weird by itself. Diff: --- cachegrind/cg_main.c | 4 ++-- cachegrind/tests/ann-diff1.stderr.exp | 2 +- cachegrind/tests/ann-diff2.stderr.exp | 2 +- cachegrind/tests/ann-diff3.stderr.exp | 2 +- cachegrind/tests/ann-diff4.stderr.exp | 2 +- cachegrind/tests/ann-merge1.stderr.exp | 2 +- cachegrind/tests/ann-merge2.stderr.exp | 2 +- cachegrind/tests/ann1a.stderr.exp | 2 +- cachegrind/tests/ann1b.stderr.exp | 2 +- cachegrind/tests/ann2.stderr.exp | 2 +- cachegrind/tests/chdir.stderr.exp | 2 +- cachegrind/tests/dlclose.stderr.exp | 2 +- cachegrind/tests/notpower2.stderr.exp | 4 ++-- cachegrind/tests/wrap5.stderr.exp | 2 +- cachegrind/tests/x86/fpu-28-108.stderr.exp | 2 +- 15 files changed, 17 insertions(+), 17 deletions(-) diff --git a/cachegrind/cg_main.c b/cachegrind/cg_main.c index c17ab975b0..1ef7ce4f93 100644 --- a/cachegrind/cg_main.c +++ b/cachegrind/cg_main.c @@ -1589,7 +1589,7 @@ static void cg_fini(Int exitcode) VG_(sprintf)(fmt, "%%s %%,%dllu\n", l1); /* Always print this */ - VG_(umsg)(fmt, "I refs: ", Ir_total.a); + VG_(umsg)(fmt, "I refs: ", Ir_total.a); /* If cache profiling is enabled, show D access numbers and all miss numbers */ @@ -1614,7 +1614,7 @@ static void cg_fini(Int exitcode) VG_(sprintf)(fmt, "%%s %%,%dllu (%%,%dllu rd + %%,%dllu wr)\n", l1, l2, l3); - VG_(umsg)(fmt, "D refs: ", + VG_(umsg)(fmt, "D refs: ", D_total.a, Dr_total.a, Dw_total.a); VG_(umsg)(fmt, "D1 misses: ", D_total.m1, Dr_total.m1, Dw_total.m1); diff --git a/cachegrind/tests/ann-diff1.stderr.exp b/cachegrind/tests/ann-diff1.stderr.exp index ec68407b27..28cb02fa84 100644 --- a/cachegrind/tests/ann-diff1.stderr.exp +++ 
b/cachegrind/tests/ann-diff1.stderr.exp @@ -1,3 +1,3 @@ -I refs: +I refs: diff --git a/cachegrind/tests/ann-diff2.stderr.exp b/cachegrind/tests/ann-diff2.stderr.exp index ec68407b27..28cb02fa84 100644 --- a/cachegrind/tests/ann-diff2.stderr.exp +++ b/cachegrind/tests/ann-diff2.stderr.exp @@ -1,3 +1,3 @@ -I refs: +I refs: diff --git a/cachegrind/tests/ann-diff3.stderr.exp b/cachegrind/tests/ann-diff3.stderr.exp index ec68407b27..28cb02fa84 100644 --- a/cachegrind/tests/ann-diff3.stderr.exp +++ b/cachegrind/tests/ann-diff3.stderr.exp @@ -1,3 +1,3 @@ -I refs: +I refs: diff --git a/cachegrind/tests/ann-diff4.stderr.exp b/cachegrind/tests/ann-diff4.stderr.exp index ec68407b27..28cb02fa84 100644 --- a/cachegrind/tests/ann-diff4.stderr.exp +++ b/cachegrind/tests/ann-diff4.stderr.exp @@ -1,3 +1,3 @@ -I refs: +I refs: diff --git a/cachegrind/tests/ann-merge1.stderr.exp b/cachegrind/tests/ann-merge1.stderr.exp index ec68407b27..28cb02fa84 100644 --- a/cachegrind/tests/ann-merge1.stderr.exp +++ b/cachegrind/tests/ann-merge1.stderr.exp @@ -1,3 +1,3 @@ -I refs: +I refs: diff --git a/cachegrind/tests/ann-merge2.stderr.exp b/cachegrind/tests/ann-merge2.stderr.exp index ec68407b27..28cb02fa84 100644 --- a/cachegrind/tests/ann-merge2.stderr.exp +++ b/cachegrind/tests/ann-merge2.stderr.exp @@ -1,3 +1,3 @@ -I refs: +I refs: diff --git a/cachegrind/tests/ann1a.stderr.exp b/cachegrind/tests/ann1a.stderr.exp index ec68407b27..28cb02fa84 100644 --- a/cachegrind/tests/ann1a.stderr.exp +++ b/cachegrind/tests/ann1a.stderr.exp @@ -1,3 +1,3 @@ -I refs: +I refs: diff --git a/cachegrind/tests/ann1b.stderr.exp b/cachegrind/tests/ann1b.stderr.exp index ec68407b27..28cb02fa84 100644 --- a/cachegrind/tests/ann1b.stderr.exp +++ b/cachegrind/tests/ann1b.stderr.exp @@ -1,3 +1,3 @@ -I refs: +I refs: diff --git a/cachegrind/tests/ann2.stderr.exp b/cachegrind/tests/ann2.stderr.exp index ec68407b27..28cb02fa84 100644 --- a/cachegrind/tests/ann2.stderr.exp +++ b/cachegrind/tests/ann2.stderr.exp @@ -1,3 
+1,3 @@ -I refs: +I refs: diff --git a/cachegrind/tests/chdir.stderr.exp b/cachegrind/tests/chdir.stderr.exp index ec68407b27..28cb02fa84 100644 --- a/cachegrind/tests/chdir.stderr.exp +++ b/cachegrind/tests/chdir.stderr.exp @@ -1,3 +1,3 @@ -I refs: +I refs: diff --git a/cachegrind/tests/dlclose.stderr.exp b/cachegrind/tests/dlclose.stderr.exp index ec68407b27..28cb02fa84 100644 --- a/cachegrind/tests/dlclose.stderr.exp +++ b/cachegrind/tests/dlclose.stderr.exp @@ -1,3 +1,3 @@ -I refs: +I refs: diff --git a/cachegrind/tests/notpower2.stderr.exp b/cachegrind/tests/notpower2.stderr.exp index e8084c12c3..6960e51afb 100644 --- a/cachegrind/tests/notpower2.stderr.exp +++ b/cachegrind/tests/notpower2.stderr.exp @@ -1,12 +1,12 @@ -I refs: +I refs: I1 misses: LLi misses: I1 miss rate: LLi miss rate: -D refs: +D refs: D1 misses: LLd misses: D1 miss rate: diff --git a/cachegrind/tests/wrap5.stderr.exp b/cachegrind/tests/wrap5.stderr.exp index ec68407b27..28cb02fa84 100644 --- a/cachegrind/tests/wrap5.stderr.exp +++ b/cachegrind/tests/wrap5.stderr.exp @@ -1,3 +1,3 @@ -I refs: +I refs: diff --git a/cachegrind/tests/x86/fpu-28-108.stderr.exp b/cachegrind/tests/x86/fpu-28-108.stderr.exp index ec68407b27..28cb02fa84 100644 --- a/cachegrind/tests/x86/fpu-28-108.stderr.exp +++ b/cachegrind/tests/x86/fpu-28-108.stderr.exp @@ -1,3 +1,3 @@ -I refs: +I refs: |
|
From: Nicholas N. <nj...@so...> - 2023-04-21 12:44:29
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=307f96a519d458818d32d8c63eb3628c25db97e4

commit 307f96a519d458818d32d8c63eb3628c25db97e4
Author: Nicholas Nethercote <n.n...@gm...>
Date:   Fri Apr 21 15:59:39 2023 +1000

    Reorder options in Cachegrind's `-h` output.

    Put the commonly used ones first.

Diff:
---
 cachegrind/cg_arch.c | 2 +-
 cachegrind/cg_main.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/cachegrind/cg_arch.c b/cachegrind/cg_arch.c
index 57570dd638..52e8982184 100644
--- a/cachegrind/cg_arch.c
+++ b/cachegrind/cg_arch.c
@@ -317,7 +317,7 @@ void VG_(print_cache_clo_opts)()
 " --I1=<size>,<assoc>,<line_size> set I1 cache manually\n"
 " --D1=<size>,<assoc>,<line_size> set D1 cache manually\n"
 " --LL=<size>,<assoc>,<line_size> set LL cache manually\n"
- );
+ );
 }
 
 
diff --git a/cachegrind/cg_main.c b/cachegrind/cg_main.c
index c4e111aa30..c17ab975b0 100644
--- a/cachegrind/cg_main.c
+++ b/cachegrind/cg_main.c
@@ -1758,12 +1758,12 @@ static Bool cg_process_cmd_line_option(const HChar* arg)
 
 static void cg_print_usage(void)
 {
-   VG_(print_cache_clo_opts)();
    VG_(printf)(
+" --cachegrind-out-file=<file> output file name [cachegrind.out.%%p]\n"
 " --cache-sim=yes|no collect cache stats? [yes]\n"
 " --branch-sim=yes|no collect branch prediction stats? [no]\n"
-" --cachegrind-out-file=<file> output file name [cachegrind.out.%%p]\n"
    );
+   VG_(print_cache_clo_opts)();
 }
 
 static void cg_print_debug_usage(void) |
|
From: Nicholas N. <nj...@so...> - 2023-04-21 12:44:28
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=1fdf0e728a047f0aab4de805576b6a3a84f37b79 commit 1fdf0e728a047f0aab4de805576b6a3a84f37b79 Author: Nicholas Nethercote <n.n...@gm...> Date: Wed Apr 12 10:02:13 2023 +1000 Add diff and merge capability to `cg_annotate`. And deprecate the use of `cg_diff` and `cg_merge`. Because `cg_annotate` can do a better job, even annotating source files when doing diffs in some cases. The user requests merging by passing multiple cgout files to `cg_annotate`, and diffing by passing two cgout files to `cg_annotate` along with `--diff`. Diff: --- cachegrind/cg_annotate.in | 603 +++++++++++++++++++++++---------- cachegrind/cg_diff.in | 14 +- cachegrind/cg_merge.in | 10 +- cachegrind/tests/Makefile.am | 8 + cachegrind/tests/ann-diff1.post.exp | 11 +- cachegrind/tests/ann-diff1.vgtest | 8 +- cachegrind/tests/ann-diff2.post.exp | 10 +- cachegrind/tests/ann-diff2.vgtest | 6 +- cachegrind/tests/ann-diff2b.cgout | 2 +- cachegrind/tests/ann-diff3.post.exp | 63 ++++ cachegrind/tests/ann-diff3.stderr.exp | 3 + cachegrind/tests/ann-diff3.vgtest | 8 + cachegrind/tests/ann-diff4.post.exp | 125 +++++++ cachegrind/tests/ann-diff4.stderr.exp | 3 + cachegrind/tests/ann-diff4.vgtest | 14 + cachegrind/tests/ann-diff4a-aux/w.rs | 3 + cachegrind/tests/ann-diff4a-aux/x.rs | 5 + cachegrind/tests/ann-diff4a-aux/y.rs | 5 + cachegrind/tests/ann-diff4a-aux/z.rs | 3 + cachegrind/tests/ann-diff4a.cgout | 30 ++ cachegrind/tests/ann-diff4b-aux/x.rs | 5 + cachegrind/tests/ann-diff4b-aux/y.rs | 6 + cachegrind/tests/ann-diff4b.cgout | 31 ++ cachegrind/tests/ann-merge1.post.exp | 9 +- cachegrind/tests/ann-merge1.vgtest | 5 +- cachegrind/tests/ann-merge2.post.exp | 85 +++++ cachegrind/tests/ann-merge2.stderr.exp | 3 + cachegrind/tests/ann-merge2.vgtest | 8 + cachegrind/tests/ann1a.post.exp | 42 ++- cachegrind/tests/ann1a.vgtest | 4 +- cachegrind/tests/ann1b.post.exp | 14 +- cachegrind/tests/ann1b.vgtest | 4 +- cachegrind/tests/ann2.post.exp | 23 +- 
cachegrind/tests/ann2.vgtest | 2 +- 34 files changed, 921 insertions(+), 254 deletions(-) diff --git a/cachegrind/cg_annotate.in b/cachegrind/cg_annotate.in index 5e64a94485..c76a760be0 100755 --- a/cachegrind/cg_annotate.in +++ b/cachegrind/cg_annotate.in @@ -34,17 +34,28 @@ from __future__ import annotations +import filecmp import os import re import sys from argparse import ArgumentParser, BooleanOptionalAction, Namespace from collections import defaultdict -from typing import DefaultDict, NoReturn, TextIO +from typing import Callable, DefaultDict, NoReturn, TextIO +def die(msg: str) -> NoReturn: + print("cg_annotate: error:", msg, file=sys.stderr) + sys.exit(1) + + +SearchAndReplace = Callable[[str], str] + # A typed wrapper for parsed args. class Args(Namespace): # None of these fields are modified after arg parsing finishes. + diff: bool + mod_filename: SearchAndReplace + mod_funcname: SearchAndReplace show: list[str] sort: list[str] threshold: float # a percentage @@ -55,6 +66,42 @@ class Args(Namespace): @staticmethod def parse() -> Args: + # We support Perl-style `s/old/new/flags` search-and-replace + # expressions, because that's how this option was implemented in the + # old Perl version of `cg_diff`. This requires conversion from + # `s/old/new/` style to `re.sub`. The conversion isn't a perfect + # emulation of Perl regexps (e.g. Python uses `\1` rather than `$1` for + # using captures in the `new` part), but it should be close enough. The + # only supported flags are `g` (global) and `i` (ignore case). + def search_and_replace(regex: str | None) -> SearchAndReplace: + if regex is None: + return lambda s: s + + # Extract the parts of an `s/old/new/tail` regex. `(?<!\\)/` is an + # example of negative lookbehind. It means "match a forward slash + # unless preceded by a backslash". 
+ m = re.match(r"s/(.*)(?<!\\)/(.*)(?<!\\)/(g|i|gi|ig|)$", regex) + if m is None: + raise ValueError + + # Forward slashes must be escaped in an `s/old/new/` expression, + # but we then must unescape them before using them with `re.sub`. + pat = m.group(1).replace(r"\/", r"/") + repl = m.group(2).replace(r"\/", r"/") + tail = m.group(3) + + if "g" in tail: + count = 0 # unlimited + else: + count = 1 + + if "i" in tail: + flags = re.IGNORECASE + else: + flags = re.RegexFlag(0) + + return lambda s: re.sub(re.compile(pat, flags=flags), repl, s, count=count) + def comma_separated_list(values: str) -> list[str]: return values.split(",") @@ -97,9 +144,30 @@ class Args(Namespace): help=f"(deprecated) same as --no-{new_name}", ) - p = ArgumentParser(description="Process a Cachegrind output file.") + p = ArgumentParser(description="Process one or more Cachegrind output files.") p.add_argument("--version", action="version", version="%(prog)s-@VERSION@") + p.add_argument( + "--diff", + default=False, + action="store_true", + help="perform a diff between two Cachegrind output files", + ) + p.add_argument( + "--mod-filename", + type=search_and_replace, + metavar="REGEX", + default=search_and_replace(None), + help="a search-and-replace regex applied to filenames, e.g. " + "`s/prog[0-9]/progN/`", + ) + p.add_argument( + "--mod-funcname", + type=search_and_replace, + metavar="REGEX", + default=search_and_replace(None), + help="like --mod-filename, but for function names", + ) p.add_argument( "--show", type=comma_separated_list, @@ -143,12 +211,19 @@ class Args(Namespace): ) p.add_argument( "cgout_filename", - nargs=1, + nargs="+", metavar="cachegrind-out-file", help="file produced by Cachegrind", ) - return p.parse_args(namespace=Args()) + # `args0` name used to avoid shadowing the global `args`, which pylint + # doesn't like. 
+ args0 = p.parse_args(namespace=Args()) + if args0.diff and len(args0.cgout_filename) != 2: + p.print_usage(file=sys.stderr) + die("argument --diff: requires exactly two Cachegrind output files") + + return args0 # Args are stored in a global for easy access. @@ -178,7 +253,11 @@ class Events: # Like `sort_events`, but indices into `events`, rather than names. sort_indices: list[int] - def __init__(self, text: str) -> None: + def __init__(self) -> None: + # All fields are left uninitialized here, and set instead in `init`. + pass + + def init(self, text: str) -> None: self.events = text.split() self.num_events = len(self.events) @@ -245,9 +324,15 @@ def add_cc_to_cc(a_cc: Cc, b_cc: Cc) -> None: b_cc[i] += a_count +# Subtract the counts in `a_cc` from `b_cc`. +def sub_cc_from_cc(a_cc: Cc, b_cc: Cc) -> None: + for i, a_count in enumerate(a_cc): + b_cc[i] -= a_count + + # Unrolled version of `add_cc_to_cc`, for speed. def add_cc_to_ccs( - a_cc: Cc, b_cc1: Cc, b_cc2: Cc, b_cc3: Cc, b_cc4: Cc, b_cc5: Cc + a_cc: Cc, b_cc1: Cc, b_cc2: Cc, b_cc3: Cc, b_cc4: Cc, b_cc5: Cc, total_cc: Cc ) -> None: for i, a_count in enumerate(a_cc): b_cc1[i] += a_count @@ -255,6 +340,21 @@ def add_cc_to_ccs( b_cc3[i] += a_count b_cc4[i] += a_count b_cc5[i] += a_count + total_cc[i] += a_count + + +# Unrolled version of `sub_cc_from_cc`, for speed. Note that the last one, +# `total_cc`, is added. +def sub_cc_from_ccs( + a_cc: Cc, b_cc1: Cc, b_cc2: Cc, b_cc3: Cc, b_cc4: Cc, b_cc5: Cc, total_cc: Cc +) -> None: + for i, a_count in enumerate(a_cc): + b_cc1[i] -= a_count + b_cc2[i] -= a_count + b_cc3[i] -= a_count + b_cc4[i] -= a_count + b_cc5[i] -= a_count + total_cc[i] += a_count # Update `min_cc` and `max_cc` with `self`. @@ -266,59 +366,70 @@ def update_cc_extremes(self: Cc, min_cc: Cc, max_cc: Cc) -> None: min_cc[i] = count -# A deep cost centre with a dict for the inner names and CCs. +# Note: some abbrevations used below: +# - Ofl/ofl: original filename, as mentioned in a cgout file. 
+# - Ofn/ofn: original function name, as mentioned in a cgout file. +# - Mfl/mfl: modified filename, the result of passing an Ofl through +# `--mod-filename`. +# - Mfn/mfn: modified function name, the result of passing an Ofn through +# `--mod-funcname`. +# - Mname/mname: modified name, used for what could be an Mfl or an Mfn. + +# A deep cost centre with a dict for the inner mnames and CCs. class Dcc: outer_cc: Cc - inner_dict_name_cc: DictNameCc + inner_dict_mname_cc: DictMnameCc - def __init__(self, outer_cc: Cc, inner_dict_name_cc: DictNameCc) -> None: + def __init__(self, outer_cc: Cc, inner_dict_mname_cc: DictMnameCc) -> None: self.outer_cc = outer_cc - self.inner_dict_name_cc = inner_dict_name_cc + self.inner_dict_mname_cc = inner_dict_mname_cc -# A deep cost centre with a list for the inner names and CCs. Used during +# A deep cost centre with a list for the inner mnames and CCs. Used during # filtering and sorting. class Lcc: outer_cc: Cc - inner_list_name_cc: ListNameCc + inner_list_mname_cc: ListMnameCc - def __init__(self, outer_cc: Cc, inner_list_name_cc: ListNameCc) -> None: + def __init__(self, outer_cc: Cc, inner_list_mname_cc: ListMnameCc) -> None: self.outer_cc = outer_cc - self.inner_list_name_cc = inner_list_name_cc + self.inner_list_mname_cc = inner_list_mname_cc -# Per-file/function CCs. The list version is used during filtering and sorting. -DictNameCc = DefaultDict[str, Cc] -ListNameCc = list[tuple[str, Cc]] +# Per-Mfl/Mfn CCs. The list version is used during filtering and sorting. +DictMnameCc = DefaultDict[str, Cc] +ListMnameCc = list[tuple[str, Cc]] -# Per-file/function DCCs. The outer names are filenames and the inner names are -# function names, or vice versa. The list version is used during filtering and -# sorting. -DictNameDcc = DefaultDict[str, Dcc] -ListNameLcc = list[tuple[str, Lcc]] +# Per-Mfl/Mfn DCCs. The outer Mnames are Mfls and the inner Mnames are Mfns, or +# vice versa. 
The list version is used during filtering and sorting. +DictMnameDcc = DefaultDict[str, Dcc] +ListMnameLcc = list[tuple[str, Lcc]] -# Per-line CCs, organised by filename and line number. +# Per-line CCs, organised by Mfl and line number. DictLineCc = DefaultDict[int, Cc] -DictFlDictLineCc = DefaultDict[str, DictLineCc] +DictMflDictLineCc = DefaultDict[str, DictLineCc] - -def die(msg: str) -> NoReturn: - print("cg_annotate: error:", msg, file=sys.stderr) - sys.exit(1) +# A dictionary tracking how Ofls get mapped to Mfls by `--mod-filename`. If +# `--mod-filename` isn't used, each entry will be the identity mapping: ("foo" +# -> set(["foo"])). +DictMflOfls = DefaultDict[str, set[str]] -def read_cgout_file() -> tuple[ - str, - str, - Events, - DictNameDcc, - DictNameDcc, - DictFlDictLineCc, - Cc, -]: +def read_cgout_file( + cgout_filename: str, + is_first_file: bool, + descs: list[str], + cmds: list[str], + events: Events, + dict_mfl_ofls: DictMflOfls, + dict_mfl_dcc: DictMnameDcc, + dict_mfn_dcc: DictMnameDcc, + dict_mfl_dict_line_cc: DictMflDictLineCc, + summary_cc: Cc, +) -> None: # The file format is described in Cachegrind's manual. try: - cgout_file = open(args.cgout_filename[0], "r", encoding="utf-8") + cgout_file = open(cgout_filename, "r", encoding="utf-8") except OSError as err: die(f"{err}") @@ -340,40 +451,64 @@ def read_cgout_file() -> tuple[ desc += m.group(1) + "\n" else: break + descs.append(desc) # Read "cmd:" line. (`line` is already set from the "desc:" loop.) if m := re.match(r"cmd:\s+(.*)", line): - cmd = m.group(1) + cmds.append(m.group(1)) else: parse_die("missing a `command:` line") # Read "events:" line. 
line = readline() if m := re.match(r"events:\s+(.*)", line): - events = Events(m.group(1)) + if is_first_file: + events.init(m.group(1)) + else: + events2 = Events() + events2.init(m.group(1)) + if events.events != events2.events: + die("events in data files don't match") else: parse_die("missing an `events:` line") def mk_empty_dict_line_cc() -> DictLineCc: return defaultdict(events.mk_empty_cc) - # The current filename and function name. - fl = "" - fn = "" - - # Different places where we accumulate CC data. - dict_fl_dcc: DictNameDcc = defaultdict(events.mk_empty_dcc) - dict_fn_dcc: DictNameDcc = defaultdict(events.mk_empty_dcc) - dict_fl_dict_line_cc: DictFlDictLineCc = defaultdict(mk_empty_dict_line_cc) - summary_cc = None + # The current Mfl and Mfn. + mfl = "" + mfn = "" + + # These values are passed in by reference and are modified by this + # function. But they can't be properly initialized until the `events:` + # line of the first file is read and the number of events is known. So + # we initialize them in an invalid state in `main`, and then + # reinitialize them properly here, before their first use. + if is_first_file: + dict_mfl_dcc.default_factory = events.mk_empty_dcc + dict_mfn_dcc.default_factory = events.mk_empty_dcc + dict_mfl_dict_line_cc.default_factory = mk_empty_dict_line_cc + summary_cc.extend(events.mk_empty_cc()) # These are refs into the dicts above, used to avoid repeated lookups. # They are all overwritten before first use. - fl_dcc = events.mk_empty_dcc() - fn_dcc = events.mk_empty_dcc() - fl_dcc_inner_fn_cc = events.mk_empty_cc() - fn_dcc_inner_fl_cc = events.mk_empty_cc() + mfl_dcc = events.mk_empty_dcc() + mfn_dcc = events.mk_empty_dcc() + mfl_dcc_inner_mfn_cc = events.mk_empty_cc() + mfn_dcc_inner_mfl_cc = events.mk_empty_cc() dict_line_cc = mk_empty_dict_line_cc() + total_cc = events.mk_empty_cc() + + # When diffing, we negate the first cgout file's counts to effectively + # achieve `cgout2 - cgout1`. 
+ if args.diff and is_first_file: + combine_cc_with_cc = sub_cc_from_cc + combine_cc_with_ccs = sub_cc_from_ccs + else: + combine_cc_with_cc = add_cc_to_cc + combine_cc_with_ccs = add_cc_to_ccs + + summary_cc_present = False # Line matching is done in order of pattern frequency, for speed. while line := readline(): @@ -385,37 +520,54 @@ def read_cgout_file() -> tuple[ except ValueError: parse_die("malformed or too many event counts") - # Record this CC at the file:function level, the function:file - # level, and the file/line level. - add_cc_to_ccs( + # Record this CC at various levels. + combine_cc_with_ccs( cc, - fl_dcc.outer_cc, - fn_dcc.outer_cc, - fl_dcc_inner_fn_cc, - fn_dcc_inner_fl_cc, + mfl_dcc.outer_cc, + mfn_dcc.outer_cc, + mfl_dcc_inner_mfn_cc, + mfn_dcc_inner_mfl_cc, dict_line_cc[line_num], + total_cc, ) elif line.startswith("fn="): - fn = line[3:-1] - # `fl_dcc` is unchanged. - fn_dcc = dict_fn_dcc[fn] - fl_dcc_inner_fn_cc = fl_dcc.inner_dict_name_cc[fn] - fn_dcc_inner_fl_cc = fn_dcc.inner_dict_name_cc[fl] + ofn = line[3:-1] + mfn = args.mod_funcname(ofn) + # `mfl_dcc` is unchanged. + mfn_dcc = dict_mfn_dcc[mfn] + mfl_dcc_inner_mfn_cc = mfl_dcc.inner_dict_mname_cc[mfn] + mfn_dcc_inner_mfl_cc = mfn_dcc.inner_dict_mname_cc[mfl] elif line.startswith("fl="): - fl = line[3:-1] + ofl = line[3:-1] + mfl = args.mod_filename(ofl) + dict_mfl_ofls[mfl].add(ofl) # A `fn=` line should follow, overwriting the function name. 
- fn = "<unspecified>" - fl_dcc = dict_fl_dcc[fl] - fn_dcc = dict_fn_dcc[fn] - fl_dcc_inner_fn_cc = fl_dcc.inner_dict_name_cc[fn] - fn_dcc_inner_fl_cc = fn_dcc.inner_dict_name_cc[fl] - dict_line_cc = dict_fl_dict_line_cc[fl] + mfn = "<unspecified>" + mfl_dcc = dict_mfl_dcc[mfl] + mfn_dcc = dict_mfn_dcc[mfn] + mfl_dcc_inner_mfn_cc = mfl_dcc.inner_dict_mname_cc[mfn] + mfn_dcc_inner_mfl_cc = mfn_dcc.inner_dict_mname_cc[mfl] + dict_line_cc = dict_mfl_dict_line_cc[mfl] elif m := re.match(r"summary:\s+(.*)", line): + summary_cc_present = True try: - summary_cc = events.mk_cc(m.group(1).split()) + this_summary_cc = events.mk_cc(m.group(1).split()) + combine_cc_with_cc(this_summary_cc, summary_cc) + + # Check summary is correct. Note that `total_cc` doesn't + # get negated for the first file in a diff, unlike the + # other CCs, because it's only used here as a sanity check. + if this_summary_cc != total_cc: + msg = ( + "`summary:` line doesn't match computed total\n" + f"- summary: {this_summary_cc}\n" + f"- computed: {total_cc}" + ) + parse_die(msg) + except ValueError: parse_die("malformed or too many event counts") @@ -427,31 +579,9 @@ def read_cgout_file() -> tuple[ parse_die(f"malformed line: {line[:-1]}") # Check if summary line was present. - if not summary_cc: + if not summary_cc_present: parse_die("missing `summary:` line, aborting") - # Check summary is correct. (Only using the outer CCs.) - total_cc = events.mk_empty_cc() - for dcc in dict_fl_dcc.values(): - add_cc_to_cc(dcc.outer_cc, total_cc) - if summary_cc != total_cc: - msg = ( - "`summary:` line doesn't match computed total\n" - f"- summary: {summary_cc}\n" - f"- total: {total_cc}" - ) - parse_die(msg) - - return ( - desc, - cmd, - events, - dict_fl_dcc, - dict_fn_dcc, - dict_fl_dict_line_cc, - summary_cc, - ) - # The width of a column, in three parts. class Width: @@ -487,7 +617,7 @@ class CcPrinter: # Text of a missing CC, which can be computed in advance. 
missing_cc_str: str - # Must call `init_ccs` or `init_list_name_lcc` after this. + # Must call `init_ccs` or `init_list_mname_lcc` after this. def __init__(self, events: Events, summary_cc: Cc) -> None: self.events = events self.summary_cc = summary_cc @@ -505,7 +635,7 @@ class CcPrinter: self.init_widths(min_cc, max_cc, None, None) - def init_list_name_lcc(self, list_name_lcc: ListNameLcc) -> None: + def init_list_mname_lcc(self, list_mname_lcc: ListMnameLcc) -> None: self.events_prefix = " " cumul_cc = self.events.mk_empty_cc() @@ -516,10 +646,10 @@ class CcPrinter: max_cc = self.events.mk_empty_cc() min_cumul_cc = self.events.mk_empty_cc() max_cumul_cc = self.events.mk_empty_cc() - for _, lcc in list_name_lcc: + for _, lcc in list_mname_lcc: # Consider both outer and inner CCs for `count` and `perc1`. update_cc_extremes(lcc.outer_cc, min_cc, max_cc) - for _, inner_cc in lcc.inner_list_name_cc: + for _, inner_cc in lcc.inner_list_mname_cc: update_cc_extremes(inner_cc, min_cc, max_cc) # Consider only outer CCs for `perc2`. @@ -604,24 +734,24 @@ class CcPrinter: print(suffix) - def print_lcc(self, lcc: Lcc, outer_name: str, cumul_cc: Cc) -> None: - print("> ", end="") + def print_lcc(self, indent: str, lcc: Lcc, outer_mname: str, cumul_cc: Cc) -> None: + print(indent, end="") if ( - len(lcc.inner_list_name_cc) == 1 - and lcc.outer_cc == lcc.inner_list_name_cc[0][1] + len(lcc.inner_list_mname_cc) == 1 + and lcc.outer_cc == lcc.inner_list_mname_cc[0][1] ): # There is only one inner CC, it met the threshold, and it is equal # to the outer CC. Print the inner CC and outer CC in a single # line, because they are the same. - inner_name = lcc.inner_list_name_cc[0][0] - self.print_cc(lcc.outer_cc, cumul_cc, f"{outer_name}:{inner_name}") + inner_mname = lcc.inner_list_mname_cc[0][0] + self.print_cc(lcc.outer_cc, cumul_cc, f"{outer_mname}:{inner_mname}") else: # There are multiple inner CCs, and at least one met the threshold. 
# Print the outer CC and then the inner CCs, indented. - self.print_cc(lcc.outer_cc, cumul_cc, f"{outer_name}:") - for inner_name, inner_cc in lcc.inner_list_name_cc: + self.print_cc(lcc.outer_cc, cumul_cc, f"{outer_mname}:") + for inner_mname, inner_cc in lcc.inner_list_mname_cc: print(" ", end="") - self.print_cc(inner_cc, None, f" {inner_name}") + self.print_cc(inner_cc, None, f" {inner_mname}") print() # If `cc2` is `None`, it's a vanilla CC or inner CC. Otherwise, it's an @@ -645,15 +775,42 @@ def print_fancy(text: str) -> None: print(fancy) -def print_metadata(desc: str, cmd: str, events: Events) -> None: +def print_metadata(descs: list[str], cmds: list[str], events: Events) -> None: print_fancy("Metadata") - print(desc, end="") - print("Command: ", cmd) - print("Data file: ", args.cgout_filename[0]) + + def all_the_same(strs: list[str]) -> bool: + for i in range(len(strs) - 1): + if strs[i] != strs[i + 1]: + return False + + return True + + print("Invocation: ", *sys.argv) + + # When there are multiple descriptions, they are usually all the same. Only + # print the description once in that case. + if all_the_same(descs): + print(descs[0], end="") + else: + for i, desc in enumerate(descs): + print(f"Description {i+1}:") + print(desc, end="") + + # Commands are sometimes the same, sometimes not. Always print them + # individually, but refer to the previous one when appropriate. 
+ if len(cmds) == 1: + print("Command: ", cmds[0]) + else: + for i, cmd in enumerate(cmds): + if i > 0 and cmds[i - 1] == cmd: + print(f"Command {i+1}: (same as Command {i})") + else: + print(f"Command {i+1}: ", cmd) + print("Events recorded: ", *events.events) print("Events shown: ", *events.show_events) print("Event sort order:", *events.sort_events) - print("Threshold: ", args.threshold) + print("Threshold: ", args.threshold, "%", sep="") print("Annotation: ", "on" if args.annotate else "off") print() @@ -668,8 +825,8 @@ def print_summary(events: Events, summary_cc: Cc) -> None: print() -def print_name_summary( - kind: str, events: Events, dict_name_dcc: DictNameDcc, summary_cc: Cc +def print_mname_summary( + kind: str, indent: str, events: Events, dict_mname_dcc: DictMnameDcc, summary_cc: Cc ) -> set[str]: # The primary sort event is used for the threshold. threshold_index = events.sort_indices[0] @@ -677,66 +834,67 @@ def print_name_summary( # Convert the threshold from a percentage to an event count. threshold = args.threshold * abs(summary_cc[threshold_index]) / 100 - def meets_threshold(name_and_cc: tuple[str, Cc]) -> bool: - cc = name_and_cc[1] + def meets_threshold(mname_and_cc: tuple[str, Cc]) -> bool: + cc = mname_and_cc[1] return abs(cc[threshold_index]) >= threshold # Create a list with the outer CC counts in sort order, so that # left-to-right list comparison does the right thing. Plus the outer name # at the end for deterministic output when all the event counts are # identical in two CCs. - def key_name_and_lcc(name_and_lcc: tuple[str, Lcc]) -> tuple[list[int], str]: - (outer_name, lcc) = name_and_lcc + def key_mname_and_lcc(mname_and_lcc: tuple[str, Lcc]) -> tuple[list[int], str]: + (outer_mname, lcc) = mname_and_lcc return ( [abs(lcc.outer_cc[i]) for i in events.sort_indices], - outer_name, + outer_mname, ) - # Similar to `key_name_and_lcc`. 
- def key_name_and_cc(name_and_cc: tuple[str, Cc]) -> tuple[list[int], str]: - (name, cc) = name_and_cc - return ([abs(cc[i]) for i in events.sort_indices], name) + # Similar to `key_mname_and_lcc`. + def key_mname_and_cc(mname_and_cc: tuple[str, Cc]) -> tuple[list[int], str]: + (mname, cc) = mname_and_cc + return ([abs(cc[i]) for i in events.sort_indices], mname) # This is a `filter_map` operation, which Python doesn't directly support. - list_name_lcc: ListNameLcc = [] - for outer_name, dcc in dict_name_dcc.items(): + list_mname_lcc: ListMnameLcc = [] + for outer_mname, dcc in dict_mname_dcc.items(): # Filter out inner CCs for which the primary sort event count is below the # threshold, and sort the remainder. - inner_list_name_cc = sorted( - filter(meets_threshold, dcc.inner_dict_name_cc.items()), - key=key_name_and_cc, + inner_list_mname_cc = sorted( + filter(meets_threshold, dcc.inner_dict_mname_cc.items()), + key=key_mname_and_cc, reverse=True, ) # If no inner CCs meet the threshold, ignore the entire DCC, even if # the outer CC meets the threshold. - if len(inner_list_name_cc) == 0: + if len(inner_list_mname_cc) == 0: continue - list_name_lcc.append((outer_name, Lcc(dcc.outer_cc, inner_list_name_cc))) + list_mname_lcc.append((outer_mname, Lcc(dcc.outer_cc, inner_list_mname_cc))) - list_name_lcc = sorted(list_name_lcc, key=key_name_and_lcc, reverse=True) + list_mname_lcc = sorted(list_mname_lcc, key=key_mname_and_lcc, reverse=True) printer = CcPrinter(events, summary_cc) - printer.init_list_name_lcc(list_name_lcc) + printer.init_list_mname_lcc(list_mname_lcc) print_fancy(kind + " summary") printer.print_events(" " + kind.lower()) print() # Print LCCs. 
- threshold_names = set([]) + threshold_mnames = set([]) cumul_cc = events.mk_empty_cc() - for name, lcc in list_name_lcc: + for mname, lcc in list_mname_lcc: add_cc_to_cc(lcc.outer_cc, cumul_cc) - printer.print_lcc(lcc, name, cumul_cc) - threshold_names.add(name) + printer.print_lcc(indent, lcc, mname, cumul_cc) + threshold_mnames.add(mname) - return threshold_names + return threshold_mnames class AnnotatedCcs: line_nums_known_cc: Cc line_nums_unknown_cc: Cc + non_identical_cc: Cc unreadable_cc: Cc below_threshold_cc: Cc files_unknown_cc: Cc @@ -744,6 +902,7 @@ class AnnotatedCcs: labels = [ " annotated: files known & above threshold & readable, line numbers known", " annotated: files known & above threshold & readable, line numbers unknown", + "unannotated: files known & above threshold & two or more non-identical", "unannotated: files known & above threshold & unreadable ", "unannotated: files known & below threshold", "unannotated: files unknown", @@ -752,6 +911,7 @@ class AnnotatedCcs: def __init__(self, events: Events) -> None: self.line_nums_known_cc = events.mk_empty_cc() self.line_nums_unknown_cc = events.mk_empty_cc() + self.non_identical_cc = events.mk_empty_cc() self.unreadable_cc = events.mk_empty_cc() self.below_threshold_cc = events.mk_empty_cc() self.files_unknown_cc = events.mk_empty_cc() @@ -760,6 +920,7 @@ class AnnotatedCcs: return [ self.line_nums_known_cc, self.line_nums_unknown_cc, + self.non_identical_cc, self.unreadable_cc, self.below_threshold_cc, self.files_unknown_cc, @@ -776,10 +937,11 @@ def mk_warning(msg: str) -> str: """ -def warn_src_file_is_newer(src_filename: str, cgout_filename: str) -> None: +def warn_ofls_are_all_newer(ofls: list[str], cgout_filename: str) -> None: + s = "".join([f"@ - {ofl}\n" for ofl in ofls]) msg = f"""\ -@ Source file '{src_filename}' is newer than data file '{cgout_filename}'. -@ Annotations may not be correct. 
+@ Original source files are all newer than data file '{cgout_filename}': +{s}@ Annotations may not be correct. """ print(mk_warning(msg)) @@ -798,10 +960,6 @@ def print_annotated_src_file( annotated_ccs: AnnotatedCcs, summary_cc: Cc, ) -> None: - # If the source file is more recent than the cgout file, issue warning. - if os.stat(src_file.name).st_mtime_ns > os.stat(args.cgout_filename[0]).st_mtime_ns: - warn_src_file_is_newer(src_file.name, args.cgout_filename[0]) - printer = CcPrinter(events, summary_cc) printer.init_ccs(list(dict_line_cc.values())) # The starting fancy has already been printed by the caller. @@ -884,52 +1042,101 @@ def print_annotated_src_file( print() -# This (partially) consumes `dict_fl_dict_line_cc`. +# This partially consumes `dict_mfl_dict_line_cc`, and fully consumes +# `dict_mfl_olfs`. def print_annotated_src_files( + ann_mfls: set[str], events: Events, - ann_src_filenames: set[str], - dict_fl_dict_line_cc: DictFlDictLineCc, + dict_mfl_ofls: DictMflOfls, + dict_mfl_dict_line_cc: DictMflDictLineCc, summary_cc: Cc, ) -> AnnotatedCcs: annotated_ccs = AnnotatedCcs(events) - def add_dict_line_cc_to_cc(dict_line_cc: DictLineCc | None, accum_cc: Cc) -> None: - if dict_line_cc: - for line_cc in dict_line_cc.values(): - add_cc_to_cc(line_cc, accum_cc) + def add_dict_line_cc_to_cc(dict_line_cc: DictLineCc, accum_cc: Cc) -> None: + for line_cc in dict_line_cc.values(): + add_cc_to_cc(line_cc, accum_cc) # Exclude the unknown ("???") file, which is unannotatable. - ann_src_filenames.discard("???") - dict_line_cc = dict_fl_dict_line_cc.pop("???", None) - add_dict_line_cc_to_cc(dict_line_cc, annotated_ccs.files_unknown_cc) + ann_mfls.discard("???") + if "???" in dict_mfl_dict_line_cc: + dict_line_cc = dict_mfl_dict_line_cc.pop("???") + add_dict_line_cc_to_cc(dict_line_cc, annotated_ccs.files_unknown_cc) + + def print_ann_fancy(mfl: str) -> None: + print_fancy(f"Annotated source file: {mfl}") + + # This can raise an `OSError`. 
+ def all_ofl_contents_identical(ofls: list[str]) -> bool: + for i in range(len(ofls) - 1): + if not filecmp.cmp(ofls[i], ofls[i + 1], shallow=False): + return False + + return True - def print_ann_fancy(src_filename: str) -> None: - print_fancy(f"Annotated source file: {src_filename}") + for mfl in sorted(ann_mfls): + ofls = sorted(dict_mfl_ofls.pop(mfl)) + first_ofl = ofls[0] - for src_filename in sorted(ann_src_filenames): try: - with open(src_filename, "r", encoding="utf-8") as src_file: - dict_line_cc = dict_fl_dict_line_cc.pop(src_filename, None) - assert dict_line_cc is not None - print_ann_fancy(src_filename) - print_annotated_src_file( - events, - dict_line_cc, - src_file, - annotated_ccs, - summary_cc, + if all_ofl_contents_identical(ofls): + # All the Ofls that map to this Mfl are identical, which means we + # can annotate, and it doesn't matter which Ofl we use. + with open(first_ofl, "r", encoding="utf-8") as src_file: + dict_line_cc = dict_mfl_dict_line_cc.pop(mfl) + print_ann_fancy(mfl) + + # Because all the Ofls are identical, we can treat their + # mtimes as if they are all as early as the earliest one. + # Therefore, we warn only if the earliest source file is + # more recent than the cgout file. + min_ofl_st_mtime_ns = min( + [os.stat(ofl).st_mtime_ns for ofl in ofls] + ) + + for cgout_filename in args.cgout_filename: + if min_ofl_st_mtime_ns > os.stat(cgout_filename).st_mtime_ns: + warn_ofls_are_all_newer(ofls, cgout_filename) + + print_annotated_src_file( + events, + dict_line_cc, + src_file, + annotated_ccs, + summary_cc, + ) + else: + dict_line_cc = dict_mfl_dict_line_cc.pop(mfl) + add_dict_line_cc_to_cc(dict_line_cc, annotated_ccs.non_identical_cc) + + # We could potentially do better here. + # - Annotate until the first line where the src files diverge. + # - Also, heuristic resyncing, e.g. by looking for matching + # lines (of sufficient complexity) after a divergence. 
+ print_ann_fancy(mfl) + print( + "Unannotated because two or more of these original files are not " + "identical:", + *ofls, + sep="\n- ", ) + print() except OSError: - dict_line_cc = dict_fl_dict_line_cc.pop(src_filename, None) + dict_line_cc = dict_mfl_dict_line_cc.pop(mfl) add_dict_line_cc_to_cc(dict_line_cc, annotated_ccs.unreadable_cc) - print_ann_fancy(src_filename) - print("This file was unreadable") + print_ann_fancy(mfl) + print( + "Unannotated because one or more of these original files are " + "unreadable:", + *ofls, + sep="\n- ", + ) print() - # Sum the CCs remaining in `dict_fl_dict_line_cc`, which are all in files + # Sum the CCs remaining in `dict_mfl_dict_line_cc`, which are all in files # below the threshold. - for dict_line_cc in dict_fl_dict_line_cc.values(): + for dict_line_cc in dict_mfl_dict_line_cc.values(): add_dict_line_cc_to_cc(dict_line_cc, annotated_ccs.below_threshold_cc) return annotated_ccs @@ -965,26 +1172,46 @@ def print_annotation_summary( def main() -> None: - ( - desc, - cmd, - events, - dict_fl_dcc, - dict_fn_dcc, - dict_fl_dict_line_cc, - summary_cc, - ) = read_cgout_file() + # Metadata, initialized to empty states. + descs: list[str] = [] + cmds: list[str] = [] + events = Events() + + # For tracking original filenames to modified filenames. + dict_mfl_ofls: DictMflOfls = defaultdict(set) + + # Different places where we accumulate CC data. Initialized to invalid + # states prior to the number of events being known. + dict_mfl_dcc: DictMnameDcc = defaultdict(None) + dict_mfn_dcc: DictMnameDcc = defaultdict(None) + dict_mfl_dict_line_cc: DictMflDictLineCc = defaultdict(None) + summary_cc: Cc = [] + + for n, filename in enumerate(args.cgout_filename): + is_first_file = n == 0 + read_cgout_file( + filename, + is_first_file, + descs, + cmds, + events, + dict_mfl_ofls, + dict_mfl_dcc, + dict_mfn_dcc, + dict_mfl_dict_line_cc, + summary_cc, + ) # Each of the following calls prints a section of the output. 
- print_metadata(desc, cmd, events) + print_metadata(descs, cmds, events) print_summary(events, summary_cc) - ann_src_filenames = print_name_summary( - "File:function", events, dict_fl_dcc, summary_cc + ann_mfls = print_mname_summary( + "File:function", "< ", events, dict_mfl_dcc, summary_cc ) - print_name_summary("Function:file", events, dict_fn_dcc, summary_cc) + print_mname_summary("Function:file", "> ", events, dict_mfn_dcc, summary_cc) if args.annotate: annotated_ccs = print_annotated_src_files( - events, ann_src_filenames, dict_fl_dict_line_cc, summary_cc + ann_mfls, events, dict_mfl_ofls, dict_mfl_dict_line_cc, summary_cc ) print_annotation_summary(events, annotated_ccs, summary_cc) diff --git a/cachegrind/cg_diff.in b/cachegrind/cg_diff.in index 38910f31b1..d3a63189ea 100755 --- a/cachegrind/cg_diff.in +++ b/cachegrind/cg_diff.in @@ -66,7 +66,7 @@ class Args(Namespace): if regex is None: return lambda s: s - # Extract the parts of a `s/old/new/tail` regex. `(?<!\\)/` is an + # Extract the parts of an `s/old/new/tail` regex. `(?<!\\)/` is an # example of negative lookbehind. It means "match a forward slash # unless preceded by a backslash". m = re.match(r"s/(.*)(?<!\\)/(.*)(?<!\\)/(g|i|gi|ig|)$", regex) @@ -74,7 +74,7 @@ class Args(Namespace): raise ValueError # Forward slashes must be escaped in an `s/old/new/` expression, - # but we then must unescape them before using them with `re.sub` + # but we then must unescape them before using them with `re.sub`. pat = m.group(1).replace(r"\/", r"/") repl = m.group(2).replace(r"\/", r"/") tail = m.group(3) @@ -91,7 +91,11 @@ class Args(Namespace): return lambda s: re.sub(re.compile(pat, flags=flags), repl, s, count=count) - p = ArgumentParser(description="Diff two Cachegrind output files.") + desc = ( + "Diff two Cachegrind output files. Deprecated; use " + "`cg_annotate --diff` instead." 
+ ) + p = ArgumentParser(description=desc) p.add_argument("--version", action="version", version="%(prog)s-@VERSION@") @@ -304,8 +308,8 @@ def main() -> None: (cmd1, events1, dict_flfn_cc1, summary_cc1) = read_cgout_file(filename1) (cmd2, events2, dict_flfn_cc2, summary_cc2) = read_cgout_file(filename2) - if events1.num_events != events2.num_events: - die("events don't match") + if events1.events != events2.events: + die("events in data files don't match") # Subtract file 1's CCs from file 2's CCs, at the Flfn level. for flfn, flfn_cc1 in dict_flfn_cc1.items(): diff --git a/cachegrind/cg_merge.in b/cachegrind/cg_merge.in index 8304e8b279..7c385b4c8e 100755 --- a/cachegrind/cg_merge.in +++ b/cachegrind/cg_merge.in @@ -51,7 +51,11 @@ class Args(Namespace): @staticmethod def parse() -> Args: - p = ArgumentParser(description="Merge multiple Cachegrind output files.") + desc = ( + "Merge multiple Cachegrind output files. Deprecated; use " + "`cg_annotate` with multiple Cachegrind output files instead." + ) + p = ArgumentParser(description=desc) p.add_argument("--version", action="version", version="%(prog)s-@VERSION@") @@ -272,8 +276,8 @@ def main() -> None: events1 = events_n else: assert events1 - if events1.num_events != events_n.num_events: - die("events don't match") + if events1.events != events_n.events: + die("events in data files don't match") def write_output(f: TextIO) -> None: # These assertions hold because the loop above executes at least twice. 
diff --git a/cachegrind/tests/Makefile.am b/cachegrind/tests/Makefile.am
index d38d300b90..9b977d5810 100644
--- a/cachegrind/tests/Makefile.am
+++ b/cachegrind/tests/Makefile.am
@@ -16,9 +16,17 @@ EXTRA_DIST = \
  ann-diff1.post.exp ann-diff1.stderr.exp ann-diff1.vgtest \
  ann-diff2.post.exp ann-diff2.stderr.exp ann-diff2.vgtest \
  ann-diff2a.cgout ann-diff2b.cgout \
+ ann-diff2-aux/ann-diff2-basic.rs \
+ ann-diff3.post.exp ann-diff3.stderr.exp ann-diff3.vgtest \
+ ann-diff4.post.exp ann-diff4.stderr.exp ann-diff4.vgtest \
+ ann-diff4a.cgout ann-diff4b.cgout \
+ ann-diff4a-aux/w.rs ann-diff4a-aux/x.rs ann-diff4a-aux/y.rs \
+ ann-diff4a-aux/z.rs \
+ ann-diff4b-aux/w.rs ann-diff4b-aux/x.rs ann-diff4b-aux/y.rs \
  ann-merge1.post.exp ann-merge1.stderr.exp ann-merge1.vgtest \
  ann-merge1a.cgout ann-merge1b.cgout \
  ann-merge-x.rs ann-merge-y.rs \
+ ann-merge2.post.exp ann-merge2.stderr.exp ann-merge2.vgtest \
  ann1a.post.exp ann1a.stderr.exp ann1a.vgtest ann1.cgout \
  ann1b.post.exp ann1b.stderr.exp ann1b.vgtest ann1b.cgout \
  ann2.post.exp ann2.stderr.exp ann2.vgtest ann2.cgout \
diff --git a/cachegrind/tests/ann-diff1.post.exp b/cachegrind/tests/ann-diff1.post.exp
index 54962b513d..d8ccea091e 100644
--- a/cachegrind/tests/ann-diff1.post.exp
+++ b/cachegrind/tests/ann-diff1.post.exp
@@ -1,13 +1,13 @@
 --------------------------------------------------------------------------------
 -- Metadata
 --------------------------------------------------------------------------------
+Invocation: ../cg_annotate --mod-filename=s/a.c/A.c/ --mod-funcname s/MAIN/Main/ ann-diff1.cgout
 Files compared: ann1.cgout; ann1b.cgout
 Command: ./a.out; ./a.out
-Data file: ann-diff1.cgout
 Events recorded: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
 Events shown: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
 Event sort order: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
-Threshold: 0.1
+Threshold: 0.1%
 Annotation: on
 
 --------------------------------------------------------------------------------
@@ -22,17 +22,17 @@ Ir________________ I1mr ILmr Dr_________________ D1mr DLmr Dw D1mw DLmw
 --------------------------------------------------------------------------------
  Ir________________________ I1mr________ ILmr________ Dr_________________________ D1mr________ DLmr________ Dw__________ D1mw________ DLmw________ file:function
 
-> 5,000,000 (100.0%, 100.0%) 0 (n/a, n/a) 0 (n/a, n/a) -2,000,000 (100.0%, 100.0%) 0 (n/a, n/a) 0 (n/a, n/a) 0 (n/a, n/a) 0 (n/a, n/a) 0 (n/a, n/a) a.c:MAIN
+< 5,000,000 (100.0%, 100.0%) 0 (n/a, n/a) 0 (n/a, n/a) -2,000,000 (100.0%, 100.0%) 0 (n/a, n/a) 0 (n/a, n/a) 0 (n/a, n/a) 0 (n/a, n/a) 0 (n/a, n/a) A.c:Main
 
 --------------------------------------------------------------------------------
 -- Function:file summary
 --------------------------------------------------------------------------------
  Ir________________________ I1mr________ ILmr________ Dr_________________________ D1mr________ DLmr________ Dw__________ D1mw________ DLmw________ function:file
 
-> 5,000,000 (100.0%, 100.0%) 0 (n/a, n/a) 0 (n/a, n/a) -2,000,000 (100.0%, 100.0%) 0 (n/a, n/a) 0 (n/a, n/a) 0 (n/a, n/a) 0 (n/a, n/a) 0 (n/a, n/a) MAIN:a.c
+> 5,000,000 (100.0%, 100.0%) 0 (n/a, n/a) 0 (n/a, n/a) -2,000,000 (100.0%, 100.0%) 0 (n/a, n/a) 0 (n/a, n/a) 0 (n/a, n/a) 0 (n/a, n/a) 0 (n/a, n/a) Main:A.c
 
 --------------------------------------------------------------------------------
--- Annotated source file: a.c
+-- Annotated source file: A.c
 --------------------------------------------------------------------------------
 Ir________________ I1mr ILmr Dr_________________ D1mr DLmr Dw D1mw DLmw
 
@@ -45,6 +45,7 @@ Ir________________ I1mr ILmr Dr_________________ D1mr DLmr Dw D1mw DLmw
 
  0 0 0 0 0 0 0 0 0 annotated: files known & above threshold & readable, line numbers known
  5,000,000 (100.0%) 0 0 -2,000,000 (100.0%) 0 0 0 0 0 annotated: files known & above threshold & readable, line numbers unknown
+ 0 0 0 0 0 0 0 0 0 unannotated: files known & above threshold & two or more non-identical
  0 0 0 0 0 0 0 0 0 unannotated: files known & above threshold & unreadable
  0 0 0 0 0 0 0 0 0 unannotated: files known & below threshold
  0 0 0 0 0 0 0 0 0 unannotated: files unknown
diff --git a/cachegrind/tests/ann-diff1.vgtest b/cachegrind/tests/ann-diff1.vgtest
index e379401876..ab119b3b36 100644
--- a/cachegrind/tests/ann-diff1.vgtest
+++ b/cachegrind/tests/ann-diff1.vgtest
@@ -1,6 +1,8 @@
 # The `prog` doesn't matter because we don't use its output. Instead we test
-# the post-processing of the `ann{1,1b}.cgout` test files.
+# the post-processing of the cgout files.
 prog: ../../tests/true
 vgopts: --cachegrind-out-file=cachegrind.out
-post: python3 ../cg_diff --mod-funcname="s/main/MAIN/" ann1.cgout ann1b.cgout > ann-diff1.cgout && python3 ../cg_annotate ann-diff1.cgout
-cleanup: rm ann-diff1.cgout
+
+post: python3 ../cg_diff --mod-funcname="s/main/MAIN/" ann1.cgout ann1b.cgout > ann-diff1.cgout && python3 ../cg_annotate --mod-filename="s/a.c/A.c/" --mod-funcname s/MAIN/Main/ ann-diff1.cgout
+
+cleanup: rm cachegrind.out ann-diff1.cgout
diff --git a/cachegrind/tests/ann-diff2.post.exp b/cachegrind/tests/ann-diff2.post.exp
index e1060dbd23..b6567418f7 100644
--- a/cachegrind/tests/ann-diff2.post.exp
+++ b/cachegrind/tests/ann-diff2.post.exp
@@ -1,13 +1,13 @@
 --------------------------------------------------------------------------------
 -- Metadata
 --------------------------------------------------------------------------------
+Invocation: ../cg_annotate ann-diff2c.cgout
 Files compared: ann-diff2a.cgout; ann-diff2b.cgout
 Command: cmd1; cmd2
-Data file: ann-diff2c.cgout
 Events recorded: One Two
 Events shown: One Two
 Event sort order: One Two
-Threshold: 0.1
+Threshold: 0.1%
 Annotation: on
 
 --------------------------------------------------------------------------------
@@ -22,7 +22,7 @@ One___________ Two___________
 --------------------------------------------------------------------------------
  One___________________ Two___________________ file:function
 
-> 2,100 (100.0%, 100.0%) 1,900 (100.0%, 100.0%) aux/ann-diff2-basic.rs:
+< 2,100 (100.0%, 100.0%) 1,900 (100.0%, 100.0%) aux/ann-diff2-basic.rs:
  1,000 (47.6%) 1,000 (52.6%) groffN
  1,000 (47.6%) 1,000 (52.6%) fN_ffN_fooN_F4_g5
  100 (4.8%) -100 (-5.3%) basic1
@@ -41,7 +41,8 @@ One___________ Two___________
 --------------------------------------------------------------------------------
 -- Annotated source file: aux/ann-diff2-basic.rs
 --------------------------------------------------------------------------------
-This file was unreadable
+Unannotated because one or more of these original files are unreadable:
+- aux/ann-diff2-basic.rs
 
 --------------------------------------------------------------------------------
 -- Annotation summary
@@ -50,6 +51,7 @@ One___________ Two___________
 
  0 0 annotated: files known & above threshold & readable, line numbers known
  0 0 annotated: files known & above threshold & readable, line numbers unknown
+ 0 0 unannotated: files known & above threshold & two or more non-identical
 2,100 (100.0%) 1,900 (100.0%) unannotated: files known & above threshold & unreadable
  0 0 unannotated: files known & below threshold
  0 0 unannotated: files unknown
diff --git a/cachegrind/tests/ann-diff2.vgtest b/cachegrind/tests/ann-diff2.vgtest
index 7b395e4e48..bae3ab9875 100644
--- a/cachegrind/tests/ann-diff2.vgtest
+++ b/cachegrind/tests/ann-diff2.vgtest
@@ -1,6 +1,8 @@
 # The `prog` doesn't matter because we don't use its output. Instead we test
-# the post-processing of the `ann-diff2{a,b}.cgout` test files.
+# the post-processing of the cgout files.
 prog: ../../tests/true
 vgopts: --cachegrind-out-file=cachegrind.out
+
 post: python3 ../cg_diff --mod-filename="s/.*aux\//aux\//i" --mod-funcname="s/(f[a-z]*)[0-9]/\1N/g" ann-diff2a.cgout ann-diff2b.cgout > ann-diff2c.cgout && python3 ../cg_annotate ann-diff2c.cgout
-cleanup: rm ann-diff2c.cgout
+
+cleanup: rm cachegrind.out ann-diff2c.cgout
diff --git a/cachegrind/tests/ann-diff2b.cgout b/cachegrind/tests/ann-diff2b.cgout
index 9fb733e708..e6864107bb 100644
--- a/cachegrind/tests/ann-diff2b.cgout
+++ b/cachegrind/tests/ann-diff2b.cgout
@@ -1,4 +1,4 @@
-desc: Description for ann-diff2a.cgout
+desc: Description for ann-diff2b.cgout
 cmd: cmd2
 events: One Two
 
diff --git a/cachegrind/tests/ann-diff3.post.exp b/cachegrind/tests/ann-diff3.post.exp
new file mode 100644
index 0000000000..fa7ea4ad7b
--- /dev/null
+++ b/cachegrind/tests/ann-diff3.post.exp
@@ -0,0 +1,63 @@
+--------------------------------------------------------------------------------
+-- Metadata
+--------------------------------------------------------------------------------
+Invocation: ../cg_annotate --diff --mod-filename=s/.*aux\//aux\//i --mod-funcname=s/(f[a-z]*)[0-9]/\1N/g ann-diff2a.cgout ann-diff2b.cgout
+Description 1:
+Description for ann-diff2a.cgout
+Description 2:
+Description for ann-diff2b.cgout
+Command 1: cmd1
+Command 2: cmd2
+Events recorded: One Two
+Events shown: One Two
+Event sort order: One Two
+Threshold: 0.1%
+Annotation: on
+
+--------------------------------------------------------------------------------
+-- Summary
+--------------------------------------------------------------------------------
+One___________ Two___________
+
+2,100 (100.0%) 1,900 (100.0%) PROGRAM TOTALS
+
+--------------------------------------------------------------------------------
+-- File:function summary
+--------------------------------------------------------------------------------
+ One___________________ Two___________________ file:function
+
+< 2,100 (100.0%, 100.0%) 1,900 (100.0%, 100.0%) aux/ann-diff2-basic.rs:
+ 1,000 (47.6%) 1,000 (52.6%) groffN
+ 1,000 (47.6%) 1,000 (52.6%) fN_ffN_fooN_F4_g5
+ 100 (4.8%) -100 (-5.3%) basic1
+
+--------------------------------------------------------------------------------
+-- Function:file summary
+--------------------------------------------------------------------------------
+ One__________________ Two__________________ function:file
+
+> 1,000 (47.6%, 47.6%) 1,000 (52.6%, 52.6%) groffN:aux/ann-diff2-basic.rs
+
+> 1,000 (47.6%, 95.2%) 1,000 (52.6%, 105.3%) fN_ffN_fooN_F4_g5:aux/ann-diff2-basic.rs
+
+> 100 (4.8%, 100.0%) -100 (-5.3%, 100.0%) basic1:aux/ann-diff2-basic.rs
+
+--------------------------------------------------------------------------------
+-- Annotated source file: aux/ann-diff2-basic.rs
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ann2-diff-AUX/ann-diff2-basic.rs
+- ann2-diff-Aux/ann-diff2-basic.rs
+
+--------------------------------------------------------------------------------
+-- Annotation summary
+--------------------------------------------------------------------------------
+One___________ Two___________
+
+ 0 0 annotated: files known & above threshold & readable, line numbers known
+ 0 0 annotated: files known & above threshold & readable, line numbers unknown
+ 0 0 unannotated: files known & above threshold & two or more non-identical
+2,100 (100.0%) 1,900 (100.0%) unannotated: files known & above threshold & unreadable
+ 0 0 unannotated: files known & below threshold
+ 0 0 unannotated: files unknown
+
diff --git a/cachegrind/tests/ann-diff3.stderr.exp b/cachegrind/tests/ann-diff3.stderr.exp
new file mode 100644
index 0000000000..ec68407b27
--- /dev/null
+++ b/cachegrind/tests/ann-diff3.stderr.exp
@@ -0,0 +1,3 @@
+
+
+I refs:
diff --git a/cachegrind/tests/ann-diff3.vgtest b/cachegrind/tests/ann-diff3.vgtest
new file mode 100644
index 0000000000..5831e3de61
--- /dev/null
+++ b/cachegrind/tests/ann-diff3.vgtest
@@ -0,0 +1,8 @@
+# The `prog` doesn't matter because we don't use its output. Instead we test
+# the post-processing of the cgout files.
+prog: ../../tests/true
+vgopts: --cachegrind-out-file=cachegrind.out
+
+post: python3 ../cg_annotate --diff --mod-filename="s/.*aux\//aux\//i" --mod-funcname="s/(f[a-z]*)[0-9]/\1N/g" ann-diff2a.cgout ann-diff2b.cgout
+
+cleanup: rm cachegrind.out
diff --git a/cachegrind/tests/ann-diff4.post.exp b/cachegrind/tests/ann-diff4.post.exp
new file mode 100644
index 0000000000..0196948a62
--- /dev/null
+++ b/cachegrind/tests/ann-diff4.post.exp
@@ -0,0 +1,125 @@
+--------------------------------------------------------------------------------
+-- Metadata
+--------------------------------------------------------------------------------
+Invocation: ../cg_annotate ann-diff4a.cgout ann-diff4b.cgout --mod-filename=s/ann-diff4[ab]/ann-diff4N/ --diff
+DescA
+DescB
+DescC
+Command 1: my-command
+Command 2: (same as Command 1)
+Events recorded: Ir
+Events shown: Ir
+Event sort order: Ir
+Threshold: 0.1%
+Annotation: on
+
+--------------------------------------------------------------------------------
+-- Summary
+--------------------------------------------------------------------------------
+Ir__________
+
+700 (100.0%) PROGRAM TOTALS
+
+--------------------------------------------------------------------------------
+-- File:function summary
+--------------------------------------------------------------------------------
+ Ir___________________ file:function
+
+< 600 (85.7%, 85.7%) ann-diff4N-aux/y.rs:b
+
+< 200 (28.6%, 114.3%) ann-diff4N-aux/x.rs:a
+
+< -200 (-28.6%, 85.7%) ann-diff4N-aux/w.rs:a
+
+< 200 (28.6%, 114.3%) ann-diff4N-aux/no-such-file.rs:f
+
+< -100 (-14.3%, 100.0%) ann-diff4N-aux/z.rs:c
+
+--------------------------------------------------------------------------------
+-- Function:file summary
+--------------------------------------------------------------------------------
+ Ir___________________ function:file
+
+> 600 (85.7%, 85.7%) b:ann-diff4N-aux/y.rs
+
+> 200 (28.6%, 114.3%) f:ann-diff4N-aux/no-such-file.rs
+
+> -100 (-14.3%, 100.0%) c:ann-diff4N-aux/z.rs
+
+> 0 (0.0%, 100.0%) a:
+ 200 (28.6%) ann-diff4N-aux/x.rs
+ -200 (-28.6%) ann-diff4N-aux/w.rs
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ann-diff4N-aux/no-such-file.rs
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ann-diff4a-aux/no-such-file.rs
+- ann-diff4b-aux/no-such-file.rs
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ann-diff4N-aux/w.rs
+--------------------------------------------------------------------------------
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@@ WARNING @@ WARNING @@ WARNING @@ WARNING @@ WARNING @@ WARNING @@ WARNING @@
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@ Original source files are all newer than data file 'ann-diff4a.cgout':
+@ - ann-diff4a-aux/w.rs
+@ - ann-diff4b-aux/w.rs
+@ Annotations may not be correct.
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@@ WARNING @@ WARNING @@ WARNING @@ WARNING @@ WARNING @@ WARNING @@ WARNING @@
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@ Original source files are all newer than data file 'ann-diff4b.cgout':
+@ - ann-diff4a-aux/w.rs
+@ - ann-diff4b-aux/w.rs
+@ Annotations may not be correct.
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+Ir___________
+
+-200 (-28.6%) one
+ . two
+ . three
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ann-diff4N-aux/x.rs
+--------------------------------------------------------------------------------
+Ir___________
+
+ 100 (14.3%) <unknown (line 0)>
+
+-200 (-28.6%) one
+ 300 (42.9%) two
+ . three
+ . four
+ . five
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ann-diff4N-aux/y.rs
+--------------------------------------------------------------------------------
+Unannotated because two or more of these original files are not identical:
+- ann-diff4a-aux/y.rs
+- ann-diff4b-aux/y.rs
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ann-diff4N-aux/z.rs
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ann-diff4a-aux/z.rs
+- ann-diff4b-aux/z.rs
+
+--------------------------------------------------------------------------------
+-- Annotation summary
+--------------------------------------------------------------------------------
+Ir___________
+
+-100 (-14.3%) annotated: files known & above threshold & readable, line numbers known
+ 100 (14.3%) annotated: files known & above threshold & readable, line numbers unknown
+ 600 (85.7%) unannotated: files known & above threshold & two or more non-identical
+ 100 (14.3%) unannotated: files known & above threshold & unreadable
+ 0 unannotated: files known & below threshold
+ 0 unannotated: files unknown
+
diff --git a/cachegrind/tests/ann-diff4.stderr.exp b/cachegrind/tests/ann-diff4.stderr.exp
new file mode 100644
index 0000000000..ec68407b27
--- /dev/null
+++ b/cachegrind/tests/ann-diff4.stderr.exp
@@ -0,0 +1,3 @@
+
+
+I refs:
diff --git a/cachegrind/tests/ann-diff4.vgtest b/cachegrind/tests/ann-diff4.vgtest
new file mode 100644
index 0000000000..da6e00a216
--- /dev/null
+++
b/cachegrind/tests/ann-diff4.vgtest @@ -0,0 +1,14 @@ +# The `prog` doesn't matter becaus... [truncated message content] |
|
From: Jojo R <rj...@li...> - 2023-04-21 10:06:23
|
Hi,
We are considering adding the RVV/Vector [1] feature to Valgrind, and
there are some challenges.
Like ARM's SVE [2], the RVV programming model is scalable/VLA, which
means that code is vector-length agnostic.
ARM's SVE is not supported in Valgrind :(
There are three major issues in implementing the RVV instruction set in
Valgrind:
1. Scalable vector register width VLENB
2. Runtime changing property of LMUL and SEW
3. Lack of proper VEX IR to represent all vector operations
We propose applicable methods to solve 1 and 2. As for 3, we explore
several possible but maybe imperfect approaches to handle different cases.
We start with 1. As each guest register must be described in the
VEXGuestState struct, the vector registers with scalable width VLENB
can be added to VEXGuestState as arrays sized to an allowable maximum
length such as 2048/4096.
The range that is actually accessible can be determined at Valgrind
startup time by querying the CPU for its vector capability or through
some suitable setup steps.
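To make the idea for issue 1 concrete, here is a minimal sketch in
Python (not VEX code). It assumes the 4096 maximum is counted in bits;
GuestVectorState, MAX_VLEN_BITS and the read/write methods are
hypothetical names used only for illustration.

```python
MAX_VLEN_BITS = 4096   # assumed allowable maximum register width, in bits
NUM_VREGS = 32         # RVV has 32 vector registers, v0..v31


class GuestVectorState:
    """Hypothetical model of the vector part of a guest state."""

    def __init__(self, vlen_bits: int) -> None:
        # vlen_bits stands for the width found at startup (e.g. by
        # querying the CPU); storage is still sized for the maximum.
        if vlen_bits % 8 != 0 or not 0 < vlen_bits <= MAX_VLEN_BITS:
            raise ValueError("unsupported vector register width")
        self.vlenb = vlen_bits // 8
        self.vregs = [bytearray(MAX_VLEN_BITS // 8) for _ in range(NUM_VREGS)]

    def write(self, reg: int, data: bytes) -> None:
        # Only the first vlenb bytes of each array are ever accessed.
        if len(data) != self.vlenb:
            raise ValueError("data must be exactly vlenb bytes")
        self.vregs[reg][: self.vlenb] = data

    def read(self, reg: int) -> bytes:
        return bytes(self.vregs[reg][: self.vlenb])
```

The layout stays fixed at compile time, while the accessible range per
register is a runtime value.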
To solve problem 2, we are inspired by already-proven techniques in
QEMU, where translation blocks are broken up when certain critical CSRs
are set. Because the guest code to IR translation relies on the precise
values of LMUL/SEW and they may change within a basic block, we can
break up the basic block each time we encounter a vsetvl{i} instruction
and return to the scheduler to execute the translated code and update
LMUL/SEW. Accordingly, translation cache management should be refactored
to detect changes of LMUL/SEW and invalidate outdated code cache
entries. Without loss of generality, LMUL/SEW should be encoded into a
ULong flag so that other architectures can use this flag to store their
own arch-dependent information. The TTentry struct should also take the
flag into account on both insertion and deletion. This way, the flag
carries the newest LMUL/SEW throughout the simulation and can be passed
to the disassembly functions using the VEXArchInfo struct, so that we
get the real, current values of LMUL and SEW to drive our translation.
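A rough sketch of the two pieces of issue 2, in Python rather than
Valgrind's C. The flag bit layout, the restriction to integer LMUL, and
the names encode_vtype_flag, split_blocks and TranslationCache are all
assumptions made for illustration; they are not existing Valgrind
interfaces.

```python
# Instructions that change vl/vtype; vsetivli is the immediate-AVL form.
VSET_INSNS = {"vsetvl", "vsetvli", "vsetivli"}


def encode_vtype_flag(lmul: int, sew: int) -> int:
    """Pack LMUL and SEW into one integer flag (layout is an assumption)."""
    if lmul not in (1, 2, 4, 8) or sew not in (8, 16, 32, 64):
        raise ValueError("unsupported LMUL/SEW in this sketch")
    lmul_code = lmul.bit_length() - 1   # 1,2,4,8  -> 0..3
    sew_code = sew.bit_length() - 4     # 8,...,64 -> 0..3
    return (lmul_code << 3) | sew_code


def split_blocks(mnemonics):
    """End the current block right after every vset* instruction."""
    blocks, current = [], []
    for m in mnemonics:
        current.append(m)
        if m in VSET_INSNS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


class TranslationCache:
    """Cache keyed by (guest address, flag) instead of address alone."""

    def __init__(self) -> None:
        self._entries = {}

    def insert(self, guest_addr: int, flag: int, code: str) -> None:
        self._entries[(guest_addr, flag)] = code

    def lookup(self, guest_addr: int, flag: int):
        return self._entries.get((guest_addr, flag))

    def delete(self, guest_addr: int, flag: int) -> None:
        self._entries.pop((guest_addr, flag), None)
```

With this keying, the same guest address translated under a different
LMUL/SEW misses in the cache and is retranslated instead of reusing
stale code.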
Also, some architecture-related code needs attention. For example, in
the m_dispatch part, the disp_cp_xindir function looks up the code cache
using hardcoded assembly that only compares the requested guest state IP
with the translation cache entry address, with no further constraints.
Many other modules should be checked to ensure that an update of
LMUL/SEW is immediately visible to the essential parts of Valgrind.
The last remaining big issue is 3, for which we introduce some ad-hoc
approaches. We group these approaches into three types:
1. Break down a vector instruction to scalar VEX IR ops.
2. Break down a vector instruction to fixed-length VEX IR ops.
3. Use dirty helpers to realize vector instructions.
The very first method is possible in theory but probably not practical,
as the number of IR ops explodes when a large VLENB is adopted. Imagine
a configuration of VLENB=512, SEW=8, LMUL=8: the VL is
512 * 8 / 8 = 512, meaning that a single vector instruction turns into
512 scalar operations, each of which would be expanded to multiple IRs.
To make things worse, the tool instrumentation will insert more IRs
between adjacent scalar IR ops. As a result, a real-world application
with lots of vector instructions is likely to run a thousand times
slower. Therefore, the other two methods are more promising and we
discuss them below.
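The arithmetic above can be checked with a few lines of Python. As in
the figures above, the register width is taken in bits; the
irs_per_element parameter is a made-up knob for illustration, not a
measured number.

```python
def vlmax(vlen_bits: int, sew: int, lmul: int) -> int:
    """Number of elements one instruction processes: VLEN * LMUL / SEW."""
    return vlen_bits * lmul // sew


def scalar_ir_estimate(vlen_bits: int, sew: int, lmul: int,
                       irs_per_element: int) -> int:
    """IR statements for one vector instruction under method 1,
    before any tool instrumentation is added."""
    return vlmax(vlen_bits, sew, lmul) * irs_per_element


elements = vlmax(512, 8, 8)                 # 512 * 8 / 8 = 512
estimate = scalar_ir_estimate(512, 8, 8, 4)  # 4 IRs per element is assumed
```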
Methods 2 and 3 are not mutually exclusive: for each vector instruction
we may pick whichever of them suits its concrete behavior. To explain
these methods in detail, we present some examples that illustrate their
pros and cons.
In terms of method 2, we have real values of VLENB/LMUL/SEW. The simple
case is VLENB <= 256 and LMUL=1, where many SIMD IR ops are available
and can be directly applied to represent vector operations. However,
even when VLENB is restricted to 128, it still exceeds the maximum SIMD
width of 256 supported by VEX IR if LMUL>2. Hence, here are two variants
of method 2 to deal with long vectors:
*2.1* Add more SIMD IR ops such as 1024/2048/4096, and translate vector
instructions in the granularity of VLENB. Accordingly, VLENB=4096 with
LMUL=2 is fulfilled by two 4096 SIMD VEX IR ops.
* *pros*: it encourages the VEX backend to generate more compact and
efficient SIMD code (maybe). In particular, it accommodates mask and
gather/scatter (indexed) instructions by carrying more information
in the IR itself.
* *cons*: too many new IR ops need to be introduced in VEX, as each op
of a different length must implement its add/sub/mul variants. New
data types to denote long vectors are necessary too, causing
difficulties in both VEX backend register allocation and tool
instrumentation.
*2.2* Break down long vectors into multiple repeated SIMD ops. For
instance, a vadd.vv vector instruction with VLENB=256/LMUL=2/SEW=8 is
composed of four operators of Iop_Add8x16 type.
* *pros:* less effort is required in register allocation and tool
instrumentation. The VEX frontend is able to tell the backend to
generate efficient vector instructions using existing Iops. It is a
better trade-off between the complexity of adding many long vector
IR ops and the benefit of generating high-efficiency host code.
* *cons:* it is hard to describe a mask operation given that the mask
is pretty flexible (the least significant bit of each segment of
v0). Additionally, gather/scatter instructions may have similar
problems in appropriately dividing index registers. There are
various corner cases left here such as widening arithmetic
operations (widening SIMD IR ops are currently not compatible) and
the vstart CSR. When using fixed-length IR ops to compose a vector
instruction, we inevitably have to tell each IR op the position,
encoded in vstart, from which it may start processing data. We can
treat vstart as a normal guest state virtual register and compute
each op's start position as a guard IRExpr, or obtain the value of
vstart the same way we do for LMUL/SEW. Nevertheless, it is
non-trivial to decompose a vector instruction concisely.
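As an illustration of 2.2, the sketch below (Python, not VEX IR) splits
a register group into 128-bit chunks, one per Iop_Add8x16-sized op, and
classifies each chunk by how vstart and vl cut through it. The function
name chunk_plan and the "skip"/"partial"/"full" labels are invented for
this example, and masking through v0 is deliberately left out.

```python
CHUNK_BITS = 128  # one Iop_Add8x16 covers 16 lanes of 8 bits


def chunk_plan(vlen_bits: int, lmul: int, sew: int, vstart: int, vl: int):
    """Return one (first_element, kind, active_lanes) tuple per chunk.

    Elements with index in [vstart, vl) are active. A "full" chunk can
    use a plain SIMD op, a "partial" chunk needs a guard, and a "skip"
    chunk needs no op at all.
    """
    lanes = CHUNK_BITS // sew
    n_chunks = vlen_bits * lmul // CHUNK_BITS
    plan = []
    for c in range(n_chunks):
        lo, hi = c * lanes, (c + 1) * lanes
        active = max(0, min(hi, vl) - max(lo, vstart))
        if active == 0:
            kind = "skip"
        elif active == lanes:
            kind = "full"
        else:
            kind = "partial"
        plan.append((lo, kind, active))
    return plan
```

For VLENB=256/LMUL=2/SEW=8 with vstart=0 and vl=64 this gives four
"full" chunks, matching the four Iop_Add8x16 ops mentioned above; a
non-zero vstart or a shorter vl turns the edge chunks into "partial"
ones, which is where the guard expressions come in.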
In short, both 2.1 and 2.2 face a tension between keeping the
engineering effort of refactoring Valgrind low and elegant, and
implementing the vector instruction set efficiently. The same obstacles
exist for ARM SVE, as its vector instructions are also scalable and
flexible in many ways.
The final solution is the dirty helper. It is undoubtedly practical and
possibly requires the least engineering effort in dealing with so many
details in Valgrind. In this design, each instruction is carried out by
inline assembly that runs the same instruction on the host. Moreover,
tool instrumentation already handles IRDirty, except that new fields
would have to be added to the _IRDirty struct to indicate
strided/indexed/masked memory accesses and arithmetic operations.
* *pros:* it supports all instructions without the need to build
complicated IR expressions and statements. It executes vector
instructions on the host CPU, which gives some acceleration.
Besides, we do not need to extend the VEX backend to translate new
IRs to vector instructions.
* *cons:* the dirty helper always keeps its operations in a black box,
so tools can never see what happens inside it. For memcheck, for
example, bit-level precision is lost once it meets a dirty helper,
as V-bit propagation then falls back to a pretty coarse strategy.
On the other hand, it is also not an elegant way to implement the
entire ISA extension in dirty helpers.
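The precision loss can be pictured with a toy model in Python. This is a
simplification and not Memcheck's actual code: V bits are modelled as
one byte per 8-bit lane, with 0x00 meaning fully defined and any set bit
meaning undefined. The two function names are made up for the example.

```python
def lanewise_vbits(va, vb):
    """SIMD-style propagation: a result lane is undefined only if the
    matching lane of either input has an undefined bit."""
    return [0xFF if (x | y) else 0x00 for x, y in zip(va, vb)]


def dirty_helper_vbits(va, vb):
    """Black-box propagation: if anything in the inputs is undefined,
    the whole output is treated as undefined."""
    any_undefined = any(va) or any(vb)
    return [0xFF if any_undefined else 0x00] * len(va)


# One undefined bit in lane 2 of the first operand.
va = [0x00, 0x00, 0x01, 0x00]
vb = [0x00, 0x00, 0x00, 0x00]
per_lane = lanewise_vbits(va, vb)       # only lane 2 is tainted
black_box = dirty_helper_vbits(va, vb)  # every lane is tainted
```

A single undefined input bit taints only its own lane in the first
scheme, but the whole result vector in the second, which can turn into
false positives further down the data flow.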
In summary, we are still far from a truly applicable solution for
adding vector extensions to Valgrind. We need to do detailed and
comprehensive evaluations of the different vector instruction
categories.
Any feedback is welcome, also on GitHub [3].
[1] https://github.com/riscv/riscv-v-spec
[2] https://community.arm.com/arm-research/b/articles/posts/the-arm-scalable-vector-extension-sve
[3] https://github.com/petrpavlu/valgrind-riscv64/issues/17
Thanks.
Jojo
|
|
From: Nicholas N. <n.n...@gm...> - 2023-04-20 23:06:24
|
On Fri, 21 Apr 2023 at 07:06, Bart Van Assche <bva...@ac...> wrote:
> No matter how
> much time is spent on tuning the .clang-format file, there will always
> be code for which the formatting is made worse by clang-format than the
> existing code.
>
It's true that there are rare cases where an auto-formatter does a bad job, such as code in tabular form. Fortunately there's an easy workaround for that too: you just put `// clang-format off` at the start of the block and `// clang-format on` at the end of the block.
Nick |
|
From: Bart V. A. <bva...@ac...> - 2023-04-20 21:06:51
|
On 4/20/23 13:51, Nicholas Nethercote wrote:
> On Fri, 21 Apr 2023 at 06:02, Mark Wielaard <ma...@kl...
> <mailto:ma...@kl...>> wrote:
> I am not a fan, but also not dead against.
>
> Have you ever worked on a project that uses auto-formatting? Skepticism
> followed by enthusiasm is common. Paul, Julian, and I all have used it
> on other codebases and are now advocates. I'm using it for Cachegrind's
> Python code right now. It makes life easier.
I have worked on large projects that use auto-formatting. Despite this I'm strongly opposed to reformatting existing code. No matter how much time is spent on tuning the .clang-format file, there will always be code for which the formatting is made worse by clang-format than the existing code.
Bart. |
|
From: Nicholas N. <n.n...@gm...> - 2023-04-20 20:52:09
|
On Fri, 21 Apr 2023 at 06:02, Mark Wielaard <ma...@kl...> wrote:
>
> Sure you can work around it, but I don't think that is a great
> solution. It requires everyone to make some local changes.
>
You can send up a .gitconfig for the project, so it'll work automatically for everyone.
> I am not a fan, but also not dead against.
>
Have you ever worked on a project that uses auto-formatting? Skepticism followed by enthusiasm is common. Paul, Julian, and I all have used it on other codebases and are now advocates. I'm using it for Cachegrind's Python code right now. It makes life easier.
> Personally I am happy with emacs M-x indent-region on the code I edit.
> There is a .dir-locals.el in git which catches some (but certainly not
> all) formatting things.
>
What about people who don't use emacs?
Nick |