From: Paul F. <pa...@so...> - 2023-08-20 13:26:45

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=0c37fa39c104f9c7c5f0b2297e3d52e8b8d58cac

commit 0c37fa39c104f9c7c5f0b2297e3d52e8b8d58cac
Author: Paul Floyd <pj...@wa...>
Date:   Sun Aug 20 15:24:25 2023 +0200

    musl: enable libstdc++ freeres

    Both libc and libstdc++ freeres were disabled for musl. That means
    that libstdc++ was showing still reachable memory on systems like
    Alpine.

Diff:
---
 coregrind/vg_preloaded.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/coregrind/vg_preloaded.c b/coregrind/vg_preloaded.c
index 86f6ac5a26..a792081b11 100644
--- a/coregrind/vg_preloaded.c
+++ b/coregrind/vg_preloaded.c
@@ -76,7 +76,7 @@ DEFINE_GDB_PY_SCRIPT(VG_GDBSCRIPTS_DIR "/valgrind-monitor.py")
 void VG_NOTIFY_ON_LOAD(freeres)(Vg_FreeresToRun to_run);
 void VG_NOTIFY_ON_LOAD(freeres)(Vg_FreeresToRun to_run)
 {
-#  if !defined(__UCLIBC__) && !defined(MUSL_LIBC) \
+#  if !defined(__UCLIBC__) \
      && !defined(VGPV_arm_linux_android) \
      && !defined(VGPV_x86_linux_android) \
      && !defined(VGPV_mips32_linux_android) \
@@ -89,6 +89,14 @@ void VG_NOTIFY_ON_LOAD(freeres)(Vg_FreeresToRun to_run)
       _ZN9__gnu_cxx9__freeresEv();
    }
+#  endif
+
+#  if !defined(__UCLIBC__) && !defined(MUSL_LIBC) \
+     && !defined(VGPV_arm_linux_android) \
+     && !defined(VGPV_x86_linux_android) \
+     && !defined(VGPV_mips32_linux_android) \
+     && !defined(VGPV_arm64_linux_android)
+
    extern void __libc_freeres(void) __attribute__((weak));
    if (((to_run & VG_RUN__LIBC_FREERES) != 0) &&
        (__libc_freeres != NULL)) {
From: Paul F. <pa...@so...> - 2023-08-20 07:05:51

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=3ee80eb4584424498147215e31603e2a2223d315

commit 3ee80eb4584424498147215e31603e2a2223d315
Author: Paul Floyd <pj...@wa...>
Date:   Sun Aug 20 09:04:17 2023 +0200

    FreeBSD: try to make eventfd2 testcase deterministic

    Add a sleep to give child 1 a chance to run. Flush stdout every time.

Diff:
---
 memcheck/tests/freebsd/eventfd2.c          |  8 ++++++++
 memcheck/tests/freebsd/eventfd2.stdout.exp | 20 ++++++++++----------
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/memcheck/tests/freebsd/eventfd2.c b/memcheck/tests/freebsd/eventfd2.c
index 8e2309f96d..d620907c9d 100644
--- a/memcheck/tests/freebsd/eventfd2.c
+++ b/memcheck/tests/freebsd/eventfd2.c
@@ -18,6 +18,7 @@ static void xsem_wait(int fd)
    fprintf(stdout, "wait completed on %d: count=%" PRIu64 "\n",
            fd, cntr);
+   fflush(stdout);
 }
 
 static void xsem_post(int fd, int count)
@@ -38,21 +39,27 @@ static void sem_player(int fd1, int fd2)
     * (also xsem_wait above) */
    fprintf(stdout, "posting 1 on %d\n", fd1);
+   fflush(stdout);
    xsem_post(fd1, 1);
 
    fprintf(stdout, "waiting on %d\n", fd2);
+   fflush(stdout);
    xsem_wait(fd2);
 
    fprintf(stdout, "posting 1 on %d\n", fd1);
+   fflush(stdout);
    xsem_post(fd1, 1);
 
    fprintf(stdout, "waiting on %d\n", fd2);
+   fflush(stdout);
    xsem_wait(fd2);
 
    fprintf(stdout, "posting 5 on %d\n", fd1);
+   fflush(stdout);
    xsem_post(fd1, 5);
 
    fprintf(stdout, "waiting 5 times on %d\n", fd2);
+   fflush(stdout);
    xsem_wait(fd2);
    xsem_wait(fd2);
    xsem_wait(fd2);
@@ -88,6 +95,7 @@ int main(int argc, char **argv)
       sem_player(fd1, fd2);
       exit(0);
    }
+   sleep(1);
    if ((cpid_waiter = fork()) == 0) {
       sem_player(fd2, fd1);
       exit(0);
diff --git a/memcheck/tests/freebsd/eventfd2.stdout.exp b/memcheck/tests/freebsd/eventfd2.stdout.exp
index 6a2cd1944e..00ecdcaca7 100644
--- a/memcheck/tests/freebsd/eventfd2.stdout.exp
+++ b/memcheck/tests/freebsd/eventfd2.stdout.exp
@@ -1,26 +1,26 @@
 posting 1 on 3
 waiting on 4
+posting 1 on 4
 wait completed on 4: count=1
+waiting on 3
 posting 1 on 3
 waiting on 4
-wait completed on 4: count=1
-posting 5 on 3
-waiting 5 times on 4
-wait completed on 4: count=1
-wait completed on 4: count=1
-wait completed on 4: count=1
-wait completed on 4: count=1
-wait completed on 4: count=1
-posting 1 on 4
-waiting on 3
 wait completed on 3: count=1
 posting 1 on 4
+wait completed on 4: count=1
 waiting on 3
 wait completed on 3: count=1
+posting 5 on 3
 posting 5 on 4
+waiting 5 times on 4
+wait completed on 4: count=1
 waiting 5 times on 3
+wait completed on 4: count=1
 wait completed on 3: count=1
+wait completed on 4: count=1
 wait completed on 3: count=1
+wait completed on 4: count=1
 wait completed on 3: count=1
+wait completed on 4: count=1
 wait completed on 3: count=1
 wait completed on 3: count=1
From: Paul F. <pa...@so...> - 2023-08-20 06:53:34

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=b9326bb1db5fff20332fd6bdbf37741784c215ea

commit b9326bb1db5fff20332fd6bdbf37741784c215ea
Author: Paul Floyd <pj...@wa...>
Date:   Sun Aug 20 08:52:36 2023 +0200

    FreeBSD: complete loading debuginfo if entering capability mode

Diff:
---
 coregrind/m_debuginfo/debuginfo.c     | 15 +++++++++++++++
 coregrind/m_syswrap/syswrap-freebsd.c |  2 ++
 coregrind/pub_core_debuginfo.h        |  7 +++++++
 3 files changed, 24 insertions(+)

diff --git a/coregrind/m_debuginfo/debuginfo.c b/coregrind/m_debuginfo/debuginfo.c
index 8d1fdc6960..c37e50b9d3 100644
--- a/coregrind/m_debuginfo/debuginfo.c
+++ b/coregrind/m_debuginfo/debuginfo.c
@@ -5102,6 +5102,21 @@ static void caches__invalidate ( void ) {
    debuginfo_generation++;
 }
 
+#if defined(VGO_freebsd)
+void VG_(load_all_debuginfo) (void)
+{
+   for (DebugInfo* di = debugInfo_list; di; di = di->next) {
+      if (di->deferred == True) {
+         di->deferred = False;
+         ML_(read_elf_debug)( di );
+         ML_(canonicaliseTables)( di );
+         check_CFSI_related_invariants(di);
+         ML_(finish_CFSI_arrays)(di);
+      }
+   }
+}
+#endif
+
 /*--------------------------------------------------------------------*/
 /*--- end                                                          ---*/
 /*--------------------------------------------------------------------*/
diff --git a/coregrind/m_syswrap/syswrap-freebsd.c b/coregrind/m_syswrap/syswrap-freebsd.c
index 9af37cfb83..a59872b3c9 100644
--- a/coregrind/m_syswrap/syswrap-freebsd.c
+++ b/coregrind/m_syswrap/syswrap-freebsd.c
@@ -5645,6 +5645,8 @@ PRE(sys_cap_enter)
         " Please consider disabling capability by using the RUNNING_ON_VALGRIND mechanism.\n"
         " See http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.clientreq\n");
    }
+   /* now complete loading debuginfo since it is not allowed after entering cap mode */
+   VG_(load_all_debuginfo)();
 }
 
 // SYS_cap_getmode 517
diff --git a/coregrind/pub_core_debuginfo.h b/coregrind/pub_core_debuginfo.h
index ce72462178..6e93bb93c5 100644
--- a/coregrind/pub_core_debuginfo.h
+++ b/coregrind/pub_core_debuginfo.h
@@ -150,6 +150,13 @@ extern Bool VG_(use_CF_info) ( /*MOD*/D3UnwindRegs* uregs,
    info (e.g. CFI info or FPO info or ...). */
 extern UInt VG_(debuginfo_generation) (void);
 
+#if defined(VGO_freebsd)
+/* Force completion of loading all debuginfo.
+   Needed on FreeBSD when entering capability mode since
+   we can't open executable files to get the debuginfo after
+   entering capability mode. */
+extern void VG_(load_all_debuginfo) (void);
+#endif
 
 /* True if some FPO information is loaded.
From: Paul F. <pj...@wa...> - 2023-08-19 20:26:30

On 19-08-23 21:49, Paul Floyd wrote:
> https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=a0d555a0dfe078ef04ea49d991a8090ab14bd4a5
>
> commit a0d555a0dfe078ef04ea49d991a8090ab14bd4a5
> Author: Paul Floyd <pj...@wa...>
> Date:   Sat Aug 19 21:37:33 2023 +0200
>
>     Always cleanup on exit from ML_(read_elf_object)

On second thoughts, I think that this was a real leak in the deferred
loading of debuginfo. I just happened to do a kernel upgrade at the same
time.

However I still have one testcase that is failing because of this change.
The TC uses capability mode (a bit like Linux seccomp). Previously the
VG_(open) for the eager read of debuginfo was allowed because it happened
before entering capability mode. Now the deferred load is being attempted
after entering capability mode.

68833 memcheck-amd64-free CALL openat(AT_FDCWD,0x1002884a30,0<O_RDONLY>)
68833 memcheck-amd64-free NAMI "/usr/home/paulf/scratch/valgrind/memcheck/vgpreload_memcheck-amd64-freebsd.so"
68833 memcheck-amd64-free CAP restricted VFS lookup
68833 memcheck-amd64-free RET openat -1 errno 94 Not permitted in capability mode

That fails so the TC has a few more warnings.

A+
Paul
From: Paul F. <pa...@so...> - 2023-08-19 19:49:52

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=a0d555a0dfe078ef04ea49d991a8090ab14bd4a5

commit a0d555a0dfe078ef04ea49d991a8090ab14bd4a5
Author: Paul Floyd <pj...@wa...>
Date:   Sat Aug 19 21:37:33 2023 +0200

    Always cleanup on exit from ML_(read_elf_object)

    I'm still a bit baffled as to why this wasn't seen earlier. A FreeBSD
    testcase started failing with kernel 13.2 patch 2, which is quite a
    minor change.

    The testcase gets an fd from pdfork and the parent does a printf with
    the fd then zaps the process with pdkill. Standalone the fd is 3, and
    that's what the expected contains. However, when it started failing I
    saw with lsof that fds 3 and 4 were associated with the guest exe and
    ld-elf.so.1.

Diff:
---
 coregrind/m_debuginfo/readelf.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/coregrind/m_debuginfo/readelf.c b/coregrind/m_debuginfo/readelf.c
index f99d3dfd2c..ac72f98fb5 100644
--- a/coregrind/m_debuginfo/readelf.c
+++ b/coregrind/m_debuginfo/readelf.c
@@ -1916,6 +1916,7 @@ Bool ML_(read_elf_object) ( struct _DebugInfo* di )
    Word i, j;
    Bool dynbss_present = False;
    Bool sdynbss_present = False;
+   Bool retval = False;
 
    /* Image for the main ELF file we're working with. */
    DiImage* mimg = NULL;
@@ -2944,19 +2945,16 @@ Bool ML_(read_elf_object) ( struct _DebugInfo* di )
       }
    }
 
-   return True;
+   retval = True;
 
-  out:
-  {
-   /* Last, but not least, detach from the image. */
-   if (mimg) ML_(img_done)(mimg);
+  out:
 
-   if (svma_ranges) VG_(deleteXA)(svma_ranges);
+   /* Last, but not least, detach from the image. */
+   if (mimg) ML_(img_done)(mimg);
 
-   return False;
-   } /* out: */
+   if (svma_ranges) VG_(deleteXA)(svma_ranges);
 
-   /* NOTREACHED */
+   return retval;
 }
 
 Bool ML_(read_elf_debug) ( struct _DebugInfo* di )
From: Paul F. <pj...@wa...> - 2023-08-18 06:15:38
On 18-08-23 00:21, John Reiser wrote:
>> -hex=$( $DIS_PATH -F scf_handle_bind $libscf | perl -pe '($_) =
>> /0x(?:4d01)?526570(\d{2}),/' )
>> +hex=$( $DIS_PATH -F scf_handle_bind $libscf | grep 526570 | sed
>> 's/.*526570//;s/,.*//' )
>
> Surely the string "526570" deserves a code comment about its origin,
> maintenance, etc.
Done, but it is just "the bytes before the protocol version".
A+
Paul
From: Paul F. <pa...@so...> - 2023-08-18 06:14:09

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=639fa4053d8ef97bd211dc541029622ce86dd6fc

commit 639fa4053d8ef97bd211dc541029622ce86dd6fc
Author: Paul Floyd <pj...@wa...>
Date:   Fri Aug 18 08:12:53 2023 +0200

    Solaris: explain configure detection of scf repository door version

Diff:
---
 configure.ac | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/configure.ac b/configure.ac
index 913e8e8c5d..e9592f47ec 100755
--- a/configure.ac
+++ b/configure.ac
@@ -4537,6 +4537,16 @@ AC_PATH_PROG(DIS_PATH, dis, false)
 if test "x$DIS_PATH" = "xfalse"; then
    AC_MSG_FAILURE([Object code disassembler (`dis') not found.])
 fi
+# The illumos source is (or was) here
+# https://github.com/illumos/illumos-gate/blob/master/usr/src/lib/libscf/common/lowlevel.c#L1148
+# specifically the line
+#
+# request.rdr_version = REPOSITORY_DOOR_VERSION;
+#
+# rdr_version is a 32bit unsigned int
+# The macro REPOSITORY_DOOR_VERSION contains the ascii letters "Rep" in the top 3
+# bytes and the door version in the lowest byte. Hence we look for Rep which is 526570
+# in hex and then extrace the following byte.
 AC_CHECK_LIB(scf, scf_handle_bind, [], [
    AC_MSG_WARN([Function `scf_handle_bind' was not found in `libscf'.])
    AC_MSG_ERROR([Cannot determine version of the repository cache protocol.])
From: John R. <jr...@bi...> - 2023-08-17 22:21:51
> -hex=$( $DIS_PATH -F scf_handle_bind $libscf | perl -pe '($_) = /0x(?:4d01)?526570(\d{2}),/' )
> +hex=$( $DIS_PATH -F scf_handle_bind $libscf | grep 526570 | sed 's/.*526570//;s/,.*//' )
Surely the string "526570" deserves a code comment about its origin,
maintenance, etc.
From: Paul F. <pa...@so...> - 2023-08-17 20:10:00

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=991bf3789a828e60c24776a1cd2f8271255596bc

commit 991bf3789a828e60c24776a1cd2f8271255596bc
Author: Paul Floyd <pj...@wa...>
Date:   Thu Aug 17 22:05:47 2023 +0200

    Bug 472963 - Broken regular expression in configure.ac

    Was extracting the last two decimal digits from a hex number. I
    switched to using grep and sed because the proposed solution didn't
    work on Solaris 11.3.

Diff:
---
 NEWS         | 1 +
 configure.ac | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 055dc41945..56f4701e00 100644
--- a/NEWS
+++ b/NEWS
@@ -47,6 +47,7 @@ are not entered into bugzilla tend to get forgotten about or ignored.
 470978  s390x: Valgrind cannot start qemu-kvm when "sysctl vm.allocate_pgste=0"
 471807  Add support for lazy reading and downloading of DWARF debuginfo
 472219  Syscall param ppoll(ufds.events) points to uninitialised byte(s)
+472963  Broken regular expression in configure.ac
 
 To see details of a given bug, visit
   https://bugs.kde.org/show_bug.cgi?id=XXXXXX
diff --git a/configure.ac b/configure.ac
index b4e9c11428..913e8e8c5d 100755
--- a/configure.ac
+++ b/configure.ac
@@ -4552,7 +4552,7 @@ if ! $DIS_PATH -F scf_handle_bind $libscf | grep -q -E '0x(4d01)?526570'; then
   AC_MSG_WARN([Function `scf_handle_bind' does not contain repository cache protocol version.])
   AC_MSG_ERROR([Cannot determine version of the repository cache protocol.])
 fi
-hex=$( $DIS_PATH -F scf_handle_bind $libscf | perl -pe '($_) = /0x(?:4d01)?526570(\d{2}),/' )
+hex=$( $DIS_PATH -F scf_handle_bind $libscf | grep 526570 | sed 's/.*526570//;s/,.*//' )
 if test -z "$hex"; then
   AC_MSG_WARN([Version of the repository cache protocol is empty?!])
   AC_MSG_ERROR([Cannot determine version of the repository cache protocol.])
From: Mark W. <ma...@so...> - 2023-08-16 12:17:13

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=60f7e89ba32b54d73b9e36d49e28d0f559ade0b9

commit 60f7e89ba32b54d73b9e36d49e28d0f559ade0b9
Author: Aaron Merey <am...@re...>
Date:   Fri Jun 30 18:31:42 2023 -0400

    Support lazy reading and downloading of DWARF debuginfo

    Currently valgrind attempts to read DWARF .debug_* sections as well as
    separate debuginfo files for ELF binaries as soon as a shared library
    is loaded. This might also result in the downloading of separate
    debuginfo files via debuginfod. This is inefficient when some of this
    debuginfo never ends up being used by valgrind while running the
    client process.

    This patch adds support for lazy reading and downloading of DWARF
    debuginfo. When an ELF shared library is loaded, the reading of
    .debug_* sections as well as separate or alternate debuginfo is
    deferred until valgrind handles an instruction pointer corresponding
    to a text segment of the shared library. At this point the deferred
    sections and separate debug files are loaded. This feature is only
    supported on ELF platforms.

    https://bugs.kde.org/show_bug.cgi?id=471807

    ChangeLog

    * debuginfo.c (di_notify_ACHIEVE_ACCEPT_STATE): Replace
      read_elf_debug_info with read_elf_object.
      (addr_load_di): New function. Attempts to load deferred debuginfo
      associated with a given address.
      (load_di): New function. Attempts to load a given deferred
      debuginfo associated with a given address.
      (describe_IP): Add calls to load_di and addr_load_di.
      (find_DiCfSI): Add call to load_di.
    * priv_readelf.h (read_elf_object): New declaration.
      (read_elf_debug): Ditto.
    * priv_storage.h (struct _DebugInfo): New field 'bool deferred'.
    * readelf.c (read_elf_debug_info): Split into read_elf_object and
      read_elf_debug.
      (read_elf_object): Read non .debug_* sections from an ELF binary.
      (read_elf_debug): Read .debug_* sections from an ELF binary as
      well as any separate/alternate debuginfo files.
* storage.c (canonicaliseSymtab): Remove assert in order to support canonicalization of deferred _DebugInfo. (finish_CFSI_arrays): Add early return if _DebugInfo is deferred in order to avoid freeing memory that will be needed when reading debuginfo at a later time. (canonicaliseTables): Ditto. * pub_core_debuginfo.h (addr_load_di): New declaration. (load_di): New declaration. Diff: --- NEWS | 1 + coregrind/m_debuginfo/debuginfo.c | 57 ++++- coregrind/m_debuginfo/priv_readelf.h | 24 +- coregrind/m_debuginfo/priv_storage.h | 7 + coregrind/m_debuginfo/readelf.c | 437 +++++++++++++++++++++++------------ coregrind/m_debuginfo/storage.c | 13 +- coregrind/pub_core_debuginfo.h | 4 + 7 files changed, 379 insertions(+), 164 deletions(-) diff --git a/NEWS b/NEWS index 867d2f0f43..055dc41945 100644 --- a/NEWS +++ b/NEWS @@ -45,6 +45,7 @@ are not entered into bugzilla tend to get forgotten about or ignored. Assertion 'resolved' failed 470830 Don't print actions vgdb me ... continue for vgdb --multi mode 470978 s390x: Valgrind cannot start qemu-kvm when "sysctl vm.allocate_pgste=0" +471807 Add support for lazy reading and downloading of DWARF debuginfo 472219 Syscall param ppoll(ufds.events) points to uninitialised byte(s) To see details of a given bug, visit diff --git a/coregrind/m_debuginfo/debuginfo.c b/coregrind/m_debuginfo/debuginfo.c index 22b41def21..8d1fdc6960 100644 --- a/coregrind/m_debuginfo/debuginfo.c +++ b/coregrind/m_debuginfo/debuginfo.c @@ -959,14 +959,16 @@ static ULong di_notify_ACHIEVE_ACCEPT_STATE ( struct _DebugInfo* di ) discard_DebugInfos_which_overlap_with( di ); /* The DebugInfoMappings that now exist in the FSM may involve - overlaps. This confuses ML_(read_elf_debug_info), and may cause + overlaps. This confuses ML_(read_elf_*), and may cause it to compute wrong biases. So de-overlap them now. See http://bugzilla.mozilla.org/show_bug.cgi?id=788974 */ truncate_DebugInfoMapping_overlaps( di, di->fsm.maps ); /* And acquire new info. 
*/ # if defined(VGO_linux) || defined(VGO_solaris) || defined(VGO_freebsd) - ok = ML_(read_elf_debug_info)( di ); + ok = ML_(read_elf_object)( di ); + if (ok) + di->deferred = True; # elif defined(VGO_darwin) ok = ML_(read_macho_debug_info)( di ); # else @@ -1443,6 +1445,50 @@ ULong VG_(di_notify_mmap)( Addr a, Bool allow_SkFileV, Int use_fd ) } } +/* Load DI if it has a text segment containing A and DI hasn't already + been loaded. */ + +void VG_(load_di)( DebugInfo *di, Addr a) +{ + if (!di->deferred + || !di->text_present + || di->text_size <= 0 + || di->text_avma > a + || a >= di->text_avma + di->text_size) + return; + + di->deferred = False; + ML_(read_elf_debug) (di); + ML_(canonicaliseTables)( di ); + + /* Check invariants listed in + Comment_on_IMPORTANT_REPRESENTATIONAL_INVARIANTS in + priv_storage.h. */ + check_CFSI_related_invariants(di); + ML_(finish_CFSI_arrays)(di); +} + +/* Attempt to load DebugInfo with a text segment containing A, + if such a debuginfo hasn't already been loaded. */ + +void VG_(addr_load_di)( Addr a ) +{ + DebugInfo *di; + + di = VG_(find_DebugInfo)(VG_(current_DiEpoch)(), a); + if (di != NULL) + if (di->deferred) { + di->deferred = False; + ML_(read_elf_debug) (di); + ML_(canonicaliseTables)( di ); + + /* Check invariants listed in + Comment_on_IMPORTANT_REPRESENTATIONAL_INVARIANTS in + priv_storage.h. */ + check_CFSI_related_invariants(di); + ML_(finish_CFSI_arrays)(di); + } +} /* Unmap is simpler - throw away any SegInfos intersecting [a, a+len). */ @@ -2746,6 +2792,11 @@ const HChar* VG_(describe_IP)(DiEpoch ep, Addr eip, const InlIPCursor *iipc) Bool know_objname; Bool know_srcloc; + if (iipc && iipc->di) + VG_(load_di) (iipc->di, eip); + else + VG_(addr_load_di) (eip); + if (is_bottom(iipc)) { // At the bottom (towards main), we describe the fn at eip. 
know_fnname = VG_(clo_sym_offsets) @@ -3090,6 +3141,8 @@ static void find_DiCfSI ( /*OUT*/DebugInfo** diP, if (!is_DI_valid_for_epoch(di, curr_epoch)) continue; + VG_(load_di)(di, ip); + /* Use the per-DebugInfo summary address ranges to skip inapplicable DebugInfos quickly. */ if (di->cfsi_used == 0) diff --git a/coregrind/m_debuginfo/priv_readelf.h b/coregrind/m_debuginfo/priv_readelf.h index 57aa0cc3f3..7e0fa17c9d 100644 --- a/coregrind/m_debuginfo/priv_readelf.h +++ b/coregrind/m_debuginfo/priv_readelf.h @@ -44,13 +44,23 @@ extern Bool ML_(is_elf_object_file)( const void* image, SizeT n_image, Bool rel_ok ); -/* The central function for reading ELF debug info. For the - object/exe specified by the SegInfo, find ELF sections, then read - the symbols, line number info, file name info, CFA (stack-unwind - info) and anything else we want, into the tables within the - supplied SegInfo. -*/ -extern Bool ML_(read_elf_debug_info) ( DebugInfo* di ); +/* Read the ELF binary specified by DI. For the object/exe specified + by the SegInfo, find ELF sections, then read the symbols, line number + info, file name info, CFA (stack-unwind info) and anything else we + want, into the tables within the supplied SegInfo. + + .debug_* sections as well as any separate debuginfo files are not + loaded by this function but instead by ML_(read_elf_debug). This + separation facilitates lazy loading of debuginfo. */ +extern Bool ML_(read_elf_object) ( DebugInfo* di ); + +/* Read .debug_* sections from the ELF binary specified by DI. Also + attempt to load any separate debuginfo files associated with the + object. + + ML_(read_elf_object) should be called on DI before calling this + function. 
*/ +extern Bool ML_(read_elf_debug) ( DebugInfo* di ); extern Bool ML_(check_elf_and_get_rw_loads) ( Int fd, const HChar* filename, Int * rw_load_count ); diff --git a/coregrind/m_debuginfo/priv_storage.h b/coregrind/m_debuginfo/priv_storage.h index a4b90d36b3..b959873ab8 100644 --- a/coregrind/m_debuginfo/priv_storage.h +++ b/coregrind/m_debuginfo/priv_storage.h @@ -678,6 +678,13 @@ struct _DebugInfo { invalid and should not be consulted. */ Bool have_dinfo; /* initially False */ + /* If true then the reading of .debug_* section has been deferred + until it this information is required (such as when printing + a stacktrace). Additionally, if true then the reading of any + separate debuginfo files associated with this object has also + been deferred. */ + Bool deferred; + /* All the rest of the fields in this structure are filled in once we have committed to reading the symbols and debug info (that is, at the point where .have_dinfo is set to True). */ diff --git a/coregrind/m_debuginfo/readelf.c b/coregrind/m_debuginfo/readelf.c index ce7b7998de..f99d3dfd2c 100644 --- a/coregrind/m_debuginfo/readelf.c +++ b/coregrind/m_debuginfo/readelf.c @@ -1836,6 +1836,44 @@ static HChar* readlink_path (const HChar *path) return buf; } +#define FINDX_MIMG(_sec_name, _sec_escn, _post_fx) \ + do { \ + ElfXX_Shdr a_shdr; \ + ML_(img_get)(&a_shdr, mimg, \ + INDEX_BIS(shdr_mioff, i, shdr_ment_szB), \ + sizeof(a_shdr)); \ + if (0 == ML_(img_strcmp_c)(mimg, shdr_strtab_mioff \ + + a_shdr.sh_name, _sec_name)) { \ + Bool nobits; \ + _sec_escn.img = mimg; \ + _sec_escn.ioff = (DiOffT)a_shdr.sh_offset; \ + _sec_escn.szB = a_shdr.sh_size; \ + if (!check_compression(&a_shdr, &_sec_escn)) { \ + ML_(symerr)(di, True, " Compression type is unsupported"); \ + goto out; \ + } \ + nobits = a_shdr.sh_type == SHT_NOBITS; \ + vg_assert(_sec_escn.img != NULL); \ + vg_assert(_sec_escn.ioff != DiOffT_INVALID); \ + TRACE_SYMTAB( "%-18s: ioff %llu .. 
%llu\n", \ + _sec_name, (ULong)a_shdr.sh_offset, \ + ((ULong)a_shdr.sh_offset) + a_shdr.sh_size - 1); \ + /* SHT_NOBITS sections have zero size in the file. */ \ + if (!nobits && \ + a_shdr.sh_offset + \ + a_shdr.sh_size > ML_(img_real_size)(mimg)) { \ + ML_(symerr)(di, True, \ + " section beyond image end?!"); \ + goto out; \ + } \ + _post_fx; \ + } \ + } while (0); + +/* Version with no post-effects */ +#define FIND_MIMG(_sec_name, _sec_escn) \ + FINDX_MIMG(_sec_name, _sec_escn, /**/) + /* The central function for reading ELF debug info. For the object/exe specified by the DebugInfo, find ELF sections, then read the symbols, line number info, file name info, CFA (stack-unwind @@ -1843,7 +1881,7 @@ static HChar* readlink_path (const HChar *path) supplied DebugInfo. */ -Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di ) +Bool ML_(read_elf_object) ( struct _DebugInfo* di ) { /* This function is long and complex. That, and the presence of nested scopes, means it's not always easy to see which parts are @@ -1874,7 +1912,7 @@ Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di ) /* TOPLEVEL */ - Bool res, ok; + Bool ok; Word i, j; Bool dynbss_present = False; Bool sdynbss_present = False; @@ -1882,12 +1920,6 @@ Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di ) /* Image for the main ELF file we're working with. */ DiImage* mimg = NULL; - /* Ditto for any ELF debuginfo file that we might happen to load. */ - DiImage* dimg = NULL; - - /* Ditto for alternate ELF debuginfo file that we might happen to load. */ - DiImage* aimg = NULL; - /* ELF header offset for the main file. Should be zero since the ELF header is at start of file. */ DiOffT ehdr_mioff = 0; @@ -1970,8 +2002,6 @@ Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di ) got, plt, and toc. 
---------------------------------------------------------- */ - res = False; - if (VG_(clo_verbosity) > 1 || VG_(clo_trace_redir)) VG_(message)(Vg_DebugMsg, "Reading syms from %s\n", di->fsm.filename ); @@ -2056,7 +2086,7 @@ Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di ) shdr_strtab_mioff = ehdr_mioff /* isn't this always zero? */ + a_shdr.sh_offset; - if (!ML_(img_valid)(mimg, shdr_strtab_mioff, + if (!ML_(img_valid)(mimg, shdr_strtab_mioff, 1/*bogus, but we don't know the real size*/ )) { ML_(symerr)(di, True, "Invalid ELF Section Header String Table"); goto out; @@ -2798,10 +2828,6 @@ Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di ) di->text_avma - di->text_bias, di->text_avma ); - TRACE_SYMTAB("\n"); - TRACE_SYMTAB("------ Finding image addresses " - "for debug-info sections ------\n"); - /* TOPLEVEL */ /* Find interesting sections, read the symbol table(s), read any debug information. Each section is located either in the main, @@ -2821,27 +2847,6 @@ Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di ) # if defined(VGO_solaris) DiSlice ldynsym_escn = DiSlice_INVALID; // .SUNW_ldynsym # endif - DiSlice debuglink_escn = DiSlice_INVALID; // .gnu_debuglink - DiSlice debugaltlink_escn = DiSlice_INVALID; // .gnu_debugaltlink - DiSlice debug_line_escn = DiSlice_INVALID; // .debug_line (dwarf2) - DiSlice debug_info_escn = DiSlice_INVALID; // .debug_info (dwarf2) - DiSlice debug_types_escn = DiSlice_INVALID; // .debug_types (dwarf4) - DiSlice debug_abbv_escn = DiSlice_INVALID; // .debug_abbrev (dwarf2) - DiSlice debug_str_escn = DiSlice_INVALID; // .debug_str (dwarf2) - DiSlice debug_line_str_escn = DiSlice_INVALID; // .debug_line_str(dwarf5) - DiSlice debug_ranges_escn = DiSlice_INVALID; // .debug_ranges (dwarf2) - DiSlice debug_rnglists_escn = DiSlice_INVALID; // .debug_rnglists(dwarf5) - DiSlice debug_loclists_escn = DiSlice_INVALID; // .debug_loclists(dwarf5) - DiSlice debug_addr_escn = DiSlice_INVALID; // .debug_addr (dwarf5) - DiSlice 
debug_str_offsets_escn = DiSlice_INVALID; // .debug_str_offsets (dwarf5) - DiSlice debug_loc_escn = DiSlice_INVALID; // .debug_loc (dwarf2) - DiSlice debug_frame_escn = DiSlice_INVALID; // .debug_frame (dwarf2) - DiSlice debug_line_alt_escn = DiSlice_INVALID; // .debug_line (alt) - DiSlice debug_info_alt_escn = DiSlice_INVALID; // .debug_info (alt) - DiSlice debug_abbv_alt_escn = DiSlice_INVALID; // .debug_abbrev (alt) - DiSlice debug_str_alt_escn = DiSlice_INVALID; // .debug_str (alt) - DiSlice dwarf1d_escn = DiSlice_INVALID; // .debug (dwarf1) - DiSlice dwarf1l_escn = DiSlice_INVALID; // .line (dwarf1) DiSlice opd_escn = DiSlice_INVALID; // .opd (dwarf2, // ppc64be-linux) DiSlice ehframe_escn[N_EHFRAME_SECTS]; // .eh_frame (dwarf2) @@ -2868,118 +2873,282 @@ Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di ) /* TOPLEVEL */ /* Iterate over section headers (again) */ for (i = 0; i < ehdr_m.e_shnum; i++) { + /* NAME ElfSec */ + FIND_MIMG( ".dynsym", dynsym_escn) + FIND_MIMG( ".dynstr", dynstr_escn) + FIND_MIMG( ".symtab", symtab_escn) + FIND_MIMG( ".strtab", strtab_escn) +# if defined(VGO_solaris) + FIND_MIMG( ".SUNW_ldynsym", ldynsym_escn) +# endif -# define FINDX(_sec_name, _sec_escn, _post_fx) \ - do { \ - ElfXX_Shdr a_shdr; \ - ML_(img_get)(&a_shdr, mimg, \ - INDEX_BIS(shdr_mioff, i, shdr_ment_szB), \ - sizeof(a_shdr)); \ - if (0 == ML_(img_strcmp_c)(mimg, shdr_strtab_mioff \ - + a_shdr.sh_name, _sec_name)) { \ - Bool nobits; \ - _sec_escn.img = mimg; \ - _sec_escn.ioff = (DiOffT)a_shdr.sh_offset; \ - _sec_escn.szB = a_shdr.sh_size; \ - if (!check_compression(&a_shdr, &_sec_escn)) { \ - ML_(symerr)(di, True, " Compression type is unsupported"); \ - goto out; \ - } \ - nobits = a_shdr.sh_type == SHT_NOBITS; \ - vg_assert(_sec_escn.img != NULL); \ - vg_assert(_sec_escn.ioff != DiOffT_INVALID); \ - TRACE_SYMTAB( "%-18s: ioff %llu .. 
%llu\n", \ - _sec_name, (ULong)a_shdr.sh_offset, \ - ((ULong)a_shdr.sh_offset) + a_shdr.sh_size - 1); \ - /* SHT_NOBITS sections have zero size in the file. */ \ - if (!nobits && \ - a_shdr.sh_offset + \ - a_shdr.sh_size > ML_(img_real_size)(mimg)) { \ - ML_(symerr)(di, True, \ - " section beyond image end?!"); \ - goto out; \ - } \ - _post_fx; \ - } \ - } while (0); + FINDX_MIMG( ".eh_frame", ehframe_escn[ehframe_mix], + do { ehframe_mix++; vg_assert(ehframe_mix <= N_EHFRAME_SECTS); + } while (0) + ) + /* Comment_on_EH_FRAME_MULTIPLE_INSTANCES: w.r.t. .eh_frame + multi-instance kludgery, how are we assured that the order + in which we fill in ehframe_escn[] is consistent with the + order in which we previously filled in di->ehframe_avma[] + and di->ehframe_size[] ? By the fact that in both cases, + these arrays were filled in by iterating over the section + headers top-to-bottom. So both loops (this one and the + previous one) encounter the .eh_frame entries in the same + order and so fill in these arrays in a consistent order. 
+ */ + } /* Iterate over section headers (again) */ - /* Version with no post-effects */ -# define FIND(_sec_name, _sec_escn) \ - FINDX(_sec_name, _sec_escn, /**/) + /* Check some sizes */ + vg_assert((dynsym_escn.szB % sizeof(ElfXX_Sym)) == 0); + vg_assert((symtab_escn.szB % sizeof(ElfXX_Sym)) == 0); +# if defined(VGO_solaris) + vg_assert((ldynsym_escn.szB % sizeof(ElfXX_Sym)) == 0); +# endif - /* NAME ElfSec */ - FIND( ".dynsym", dynsym_escn) - FIND( ".dynstr", dynstr_escn) - FIND( ".symtab", symtab_escn) - FIND( ".strtab", strtab_escn) + /* Read symbols */ + { + void (*read_elf_symtab)(struct _DebugInfo*, const HChar*, + DiSlice*, DiSlice*, DiSlice*, Bool); +# if defined(VGP_ppc64be_linux) + read_elf_symtab = read_elf_symtab__ppc64be_linux; +# else + read_elf_symtab = read_elf_symtab__normal; +# endif + if (symtab_escn.img != NULL) + read_elf_symtab(di, "symbol table", + &symtab_escn, &strtab_escn, &opd_escn, + False); + read_elf_symtab(di, "dynamic symbol table", + &dynsym_escn, &dynstr_escn, &opd_escn, + False); # if defined(VGO_solaris) - FIND( ".SUNW_ldynsym", ldynsym_escn) + read_elf_symtab(di, "local dynamic symbol table", + &ldynsym_escn, &dynstr_escn, &opd_escn, + False); # endif + } - FIND( ".gnu_debuglink", debuglink_escn) - FIND( ".gnu_debugaltlink", debugaltlink_escn) + /* TOPLEVEL */ + /* Read .eh_frame and .debug_frame (call-frame-info) if any. Do + the .eh_frame section(s) first. */ + vg_assert(di->n_ehframe >= 0 && di->n_ehframe <= N_EHFRAME_SECTS); + for (i = 0; i < di->n_ehframe; i++) { + /* see Comment_on_EH_FRAME_MULTIPLE_INSTANCES above for why + this next assertion should hold. */ + vg_assert(ML_(sli_is_valid)(ehframe_escn[i])); + vg_assert(ehframe_escn[i].szB == di->ehframe_size[i]); + ML_(read_callframe_info_dwarf3)( di, + ehframe_escn[i], + di->ehframe_avma[i], + True/*is_ehframe*/ ); + } + } + + return True; + + out: + { + /* Last, but not least, detach from the image. 
*/ + if (mimg) ML_(img_done)(mimg); + + if (svma_ranges) VG_(deleteXA)(svma_ranges); - FIND( ".debug_line", debug_line_escn) + return False; + } /* out: */ + + /* NOTREACHED */ +} + +Bool ML_(read_elf_debug) ( struct _DebugInfo* di ) +{ + Word i, j; + Bool res = True; + Bool ok; + + /* Image for the main ELF file we're working with. */ + DiImage* mimg = NULL; + + /* Ditto for any ELF debuginfo file that we might happen to load. */ + DiImage* dimg = NULL; + + /* Ditto for alternate ELF debuginfo file that we might happen to load. */ + DiImage* aimg = NULL; + + /* Section header image addr, # entries, entry size. Also the + associated string table. */ + DiOffT shdr_mioff = 0; + UWord shdr_mnent = 0; + UWord shdr_ment_szB = 0; + DiOffT shdr_strtab_mioff = 0; + + DiOffT ehdr_mioff = 0; + + /* Connect to the primary object image, so that we can read symbols + and line number info out of it. It will be disconnected + immediately thereafter; it is only connected transiently. */ + mimg = ML_(img_from_local_file)(di->fsm.filename); + if (mimg == NULL) { + VG_(message)(Vg_UserMsg, "warning: connection to image %s failed\n", + di->fsm.filename ); + VG_(message)(Vg_UserMsg, " no debug info loaded\n" ); + return False; + } + + /* Ok, the object image is available. Now verify that it is a + valid ELF .so or executable image. */ + ok = is_elf_object_file_by_DiImage(mimg, False); + if (!ok) { + ML_(symerr)(di, True, "Invalid ELF Header"); + goto out; + } + + /* Find where the program and section header tables are, and give + up if either is missing or outside the image (bogus). 
*/ + ElfXX_Ehdr ehdr_m; + vg_assert(ehdr_mioff == 0); // ensured by its initialisation + ok = ML_(img_valid)(mimg, ehdr_mioff, sizeof(ehdr_m)); + vg_assert(ok); // ML_(is_elf_object_file) should ensure this + ML_(img_get)(&ehdr_m, mimg, ehdr_mioff, sizeof(ehdr_m)); + + shdr_mioff = ehdr_mioff + ehdr_m.e_shoff; + shdr_mnent = ehdr_m.e_shnum; + shdr_ment_szB = ehdr_m.e_shentsize; + + if (shdr_mnent == 0 + || !ML_(img_valid)(mimg, shdr_mioff, shdr_mnent * shdr_ment_szB)) { + ML_(symerr)(di, True, "Missing or invalid ELF Section Header Table"); + goto out; + } + + /* Also find the section header's string table, and validate. */ + /* checked previously by is_elf_object_file: */ + vg_assert(ehdr_m.e_shstrndx != SHN_UNDEF); + + // shdr_mioff is the offset of the section header table + // and we need the ehdr_m.e_shstrndx'th entry + { ElfXX_Shdr a_shdr; + ML_(img_get)(&a_shdr, mimg, + INDEX_BIS(shdr_mioff, ehdr_m.e_shstrndx, shdr_ment_szB), + sizeof(a_shdr)); + shdr_strtab_mioff + = ehdr_mioff /* isn't this always zero? */ + a_shdr.sh_offset; + + if (!ML_(img_valid)(mimg, shdr_strtab_mioff, + 1/*bogus, but we don't know the real size*/ )) { + ML_(symerr)(di, True, "Invalid ELF Section Header String Table"); + goto out; + } + } + + TRACE_SYMTAB("\n"); + TRACE_SYMTAB("------ Finding image addresses " + "for debug-info sections ------\n"); + /* TOPLEVEL */ + /* Find interesting sections, read the symbol table(s), read any + debug information. Each section is located either in the main, + debug or alt-debug files, but only in one. For each section, + |section_escn| records which of |mimg|, |dimg| or |aimg| we + found it in, along with the section's image offset and its size. + The triples (section_img, section_ioff, section_szB) are + consistent, in that they are always either (NULL, + DiOffT_INVALID, 0), or refer to the same image, and are all + assigned together. 
*/ + + { + /* TOPLEVEL */ + DiSlice strtab_escn = DiSlice_INVALID; // .strtab + DiSlice symtab_escn = DiSlice_INVALID; // .symtab + DiSlice debuglink_escn = DiSlice_INVALID; // .gnu_debuglink + DiSlice debugaltlink_escn = DiSlice_INVALID; // .gnu_debugaltlink + DiSlice debug_line_escn = DiSlice_INVALID; // .debug_line (dwarf2) + DiSlice debug_info_escn = DiSlice_INVALID; // .debug_info (dwarf2) + DiSlice debug_types_escn = DiSlice_INVALID; // .debug_types (dwarf4) + DiSlice debug_abbv_escn = DiSlice_INVALID; // .debug_abbrev (dwarf2) + DiSlice debug_str_escn = DiSlice_INVALID; // .debug_str (dwarf2) + DiSlice debug_line_str_escn = DiSlice_INVALID; // .debug_line_str(dwarf5) + DiSlice debug_ranges_escn = DiSlice_INVALID; // .debug_ranges (dwarf2) + DiSlice debug_rnglists_escn = DiSlice_INVALID; // .debug_rnglists(dwarf5) + DiSlice debug_loclists_escn = DiSlice_INVALID; // .debug_loclists(dwarf5) + DiSlice debug_addr_escn = DiSlice_INVALID; // .debug_addr (dwarf5) + DiSlice debug_str_offsets_escn = DiSlice_INVALID; // .debug_str_offsets (dwarf5) + DiSlice debug_loc_escn = DiSlice_INVALID; // .debug_loc (dwarf2) + DiSlice debug_frame_escn = DiSlice_INVALID; // .debug_frame (dwarf2) + DiSlice debug_line_alt_escn = DiSlice_INVALID; // .debug_line (alt) + DiSlice debug_info_alt_escn = DiSlice_INVALID; // .debug_info (alt) + DiSlice debug_abbv_alt_escn = DiSlice_INVALID; // .debug_abbrev (alt) + DiSlice debug_str_alt_escn = DiSlice_INVALID; // .debug_str (alt) + DiSlice dwarf1d_escn = DiSlice_INVALID; // .debug (dwarf1) + DiSlice dwarf1l_escn = DiSlice_INVALID; // .line (dwarf1) + DiSlice opd_escn = DiSlice_INVALID; // .opd (dwarf2, + // ppc64be-linux) + + /* TOPLEVEL */ + /* Iterate over section headers (again) */ + for (i = 0; i < ehdr_m.e_shnum; i++) { + + /* NAME ElfSec */ + FIND_MIMG( ".symtab", symtab_escn) + FIND_MIMG( ".strtab", strtab_escn) + FIND_MIMG( ".gnu_debuglink", debuglink_escn) + FIND_MIMG( ".gnu_debugaltlink", debugaltlink_escn) + + FIND_MIMG( 
".debug_line", debug_line_escn) if (!ML_(sli_is_valid)(debug_line_escn)) - FIND(".zdebug_line", debug_line_escn) + FIND_MIMG(".zdebug_line", debug_line_escn) - FIND( ".debug_info", debug_info_escn) + FIND_MIMG( ".debug_info", debug_info_escn) if (!ML_(sli_is_valid)(debug_info_escn)) - FIND(".zdebug_info", debug_info_escn) + FIND_MIMG(".zdebug_info", debug_info_escn) - FIND( ".debug_types", debug_types_escn) + FIND_MIMG( ".debug_types", debug_types_escn) if (!ML_(sli_is_valid)(debug_types_escn)) - FIND(".zdebug_types", debug_types_escn) + FIND_MIMG(".zdebug_types", debug_types_escn) - FIND( ".debug_abbrev", debug_abbv_escn) + FIND_MIMG( ".debug_abbrev", debug_abbv_escn) if (!ML_(sli_is_valid)(debug_abbv_escn)) - FIND(".zdebug_abbrev", debug_abbv_escn) + FIND_MIMG(".zdebug_abbrev", debug_abbv_escn) - FIND( ".debug_str", debug_str_escn) + FIND_MIMG( ".debug_str", debug_str_escn) if (!ML_(sli_is_valid)(debug_str_escn)) - FIND(".zdebug_str", debug_str_escn) + FIND_MIMG(".zdebug_str", debug_str_escn) - FIND( ".debug_line_str", debug_line_str_escn) + FIND_MIMG( ".debug_line_str", debug_line_str_escn) if (!ML_(sli_is_valid)(debug_line_str_escn)) - FIND(".zdebug_str", debug_line_str_escn) + FIND_MIMG(".zdebug_str", debug_line_str_escn) - FIND( ".debug_ranges", debug_ranges_escn) + FIND_MIMG( ".debug_ranges", debug_ranges_escn) if (!ML_(sli_is_valid)(debug_ranges_escn)) - FIND(".zdebug_ranges", debug_ranges_escn) + FIND_MIMG(".zdebug_ranges", debug_ranges_escn) - FIND( ".debug_rnglists", debug_rnglists_escn) + FIND_MIMG( ".debug_rnglists", debug_rnglists_escn) if (!ML_(sli_is_valid)(debug_rnglists_escn)) - FIND(".zdebug_rnglists", debug_rnglists_escn) + FIND_MIMG(".zdebug_rnglists", debug_rnglists_escn) - FIND( ".debug_loclists", debug_loclists_escn) + FIND_MIMG( ".debug_loclists", debug_loclists_escn) if (!ML_(sli_is_valid)(debug_loclists_escn)) - FIND(".zdebug_loclists", debug_loclists_escn) + FIND_MIMG(".zdebug_loclists", debug_loclists_escn) - FIND( ".debug_loc", 
debug_loc_escn) + FIND_MIMG( ".debug_loc", debug_loc_escn) if (!ML_(sli_is_valid)(debug_loc_escn)) - FIND(".zdebug_loc", debug_loc_escn) + FIND_MIMG(".zdebug_loc", debug_loc_escn) - FIND( ".debug_frame", debug_frame_escn) + FIND_MIMG( ".debug_frame", debug_frame_escn) if (!ML_(sli_is_valid)(debug_frame_escn)) - FIND(".zdebug_frame", debug_frame_escn) + FIND_MIMG(".zdebug_frame", debug_frame_escn) - FIND( ".debug_addr", debug_addr_escn) + FIND_MIMG( ".debug_addr", debug_addr_escn) if (!ML_(sli_is_valid)(debug_addr_escn)) - FIND(".zdebug_addr", debug_addr_escn) + FIND_MIMG(".zdebug_addr", debug_addr_escn) - FIND( ".debug_str_offsets", debug_str_offsets_escn) + FIND_MIMG( ".debug_str_offsets", debug_str_offsets_escn) if (!ML_(sli_is_valid)(debug_str_offsets_escn)) - FIND(".zdebug_str_offsets", debug_str_offsets_escn) + FIND_MIMG(".zdebug_str_offsets", debug_str_offsets_escn) - FIND( ".debug", dwarf1d_escn) - FIND( ".line", dwarf1l_escn) + FIND_MIMG( ".debug", dwarf1d_escn) + FIND_MIMG( ".line", dwarf1l_escn) - FIND( ".opd", opd_escn) + FIND_MIMG( ".opd", opd_escn) - FINDX( ".eh_frame", ehframe_escn[ehframe_mix], - do { ehframe_mix++; vg_assert(ehframe_mix <= N_EHFRAME_SECTS); - } while (0) - ) /* Comment_on_EH_FRAME_MULTIPLE_INSTANCES: w.r.t. .eh_frame multi-instance kludgery, how are we assured that the order in which we fill in ehframe_escn[] is consistent with the @@ -2991,8 +3160,6 @@ Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di ) order and so fill in these arrays in a consistent order. */ -# undef FINDX -# undef FIND } /* Iterate over section headers (again) */ /* TOPLEVEL */ @@ -3465,53 +3632,23 @@ Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di ) } /* Find all interesting sections */ } /* do we have a debug image? 
*/ - /* TOPLEVEL */ - /* Check some sizes */ - vg_assert((dynsym_escn.szB % sizeof(ElfXX_Sym)) == 0); vg_assert((symtab_escn.szB % sizeof(ElfXX_Sym)) == 0); -# if defined(VGO_solaris) - vg_assert((ldynsym_escn.szB % sizeof(ElfXX_Sym)) == 0); -# endif /* TOPLEVEL */ /* Read symbols */ { void (*read_elf_symtab)(struct _DebugInfo*, const HChar*, DiSlice*, DiSlice*, DiSlice*, Bool); - Bool symtab_in_debug; # if defined(VGP_ppc64be_linux) read_elf_symtab = read_elf_symtab__ppc64be_linux; # else read_elf_symtab = read_elf_symtab__normal; # endif - symtab_in_debug = symtab_escn.img == dimg; - read_elf_symtab(di, "symbol table", - &symtab_escn, &strtab_escn, &opd_escn, - symtab_in_debug); - read_elf_symtab(di, "dynamic symbol table", - &dynsym_escn, &dynstr_escn, &opd_escn, - False); -# if defined(VGO_solaris) - read_elf_symtab(di, "local dynamic symbol table", - &ldynsym_escn, &dynstr_escn, &opd_escn, - False); -# endif - } - - /* TOPLEVEL */ - /* Read .eh_frame and .debug_frame (call-frame-info) if any. Do - the .eh_frame section(s) first. */ - vg_assert(di->n_ehframe >= 0 && di->n_ehframe <= N_EHFRAME_SECTS); - for (i = 0; i < di->n_ehframe; i++) { - /* see Comment_on_EH_FRAME_MULTIPLE_INSTANCES above for why - this next assertion should hold. 
*/ - vg_assert(ML_(sli_is_valid)(ehframe_escn[i])); - vg_assert(ehframe_escn[i].szB == di->ehframe_size[i]); - ML_(read_callframe_info_dwarf3)( di, - ehframe_escn[i], - di->ehframe_avma[i], - True/*is_ehframe*/ ); + if (symtab_escn.img != NULL) + read_elf_symtab(di, "symbol table", + &symtab_escn, &strtab_escn, &opd_escn, + True); } if (ML_(sli_is_valid)(debug_frame_escn)) { ML_(read_callframe_info_dwarf3)( di, @@ -3643,8 +3780,6 @@ Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di ) if (dimg) ML_(img_done)(dimg); if (aimg) ML_(img_done)(aimg); - if (svma_ranges) VG_(deleteXA)(svma_ranges); - return res; } /* out: */ diff --git a/coregrind/m_debuginfo/storage.c b/coregrind/m_debuginfo/storage.c index c3fa62e96c..3ad114607b 100644 --- a/coregrind/m_debuginfo/storage.c +++ b/coregrind/m_debuginfo/storage.c @@ -1297,8 +1297,7 @@ void ML_(addVar)( struct _DebugInfo* di, that those extra sections have the same bias as .text, but that seems a reasonable assumption to me. */ /* This is assured us by top level steering logic in debuginfo.c, - and it is re-checked at the start of - ML_(read_elf_debug_info). */ + and it is re-checked at the start of ML_(read_elf_object). */ vg_assert(di->fsm.have_rx_map && di->fsm.rw_map_count); if (level > 0 && ML_(find_rx_mapping)(di, aMin, aMax) == NULL) { if (VG_(clo_verbosity) > 1) { @@ -1725,7 +1724,6 @@ static void canonicaliseSymtab ( struct _DebugInfo* di ) for (i = 0; i < di->symtab_used; i++) { DiSym* sym = &di->symtab[i]; vg_assert(sym->pri_name); - vg_assert(!sym->sec_names); } /* Sort by address. 
*/ @@ -2383,6 +2381,9 @@ void ML_(finish_CFSI_arrays) ( struct _DebugInfo* di ) vg_assert (f_holes == n_holes); vg_assert (pos == new_used); + if (di->deferred) + return; + di->cfsi_used = new_used; di->cfsi_size = new_used; ML_(dinfo_free) (di->cfsi_rd); @@ -2398,9 +2399,13 @@ void ML_(canonicaliseTables) ( struct _DebugInfo* di ) canonicaliseLoctab ( di ); canonicaliseInltab ( di ); ML_(canonicaliseCFI) ( di ); + canonicaliseVarInfo ( di ); + + if (di->deferred) + return; + if (di->cfsi_m_pool) VG_(freezeDedupPA) (di->cfsi_m_pool, ML_(dinfo_shrink_block)); - canonicaliseVarInfo ( di ); if (di->strpool) VG_(freezeDedupPA) (di->strpool, ML_(dinfo_shrink_block)); if (di->fndnpool) diff --git a/coregrind/pub_core_debuginfo.h b/coregrind/pub_core_debuginfo.h index 938ed00cc1..ce72462178 100644 --- a/coregrind/pub_core_debuginfo.h +++ b/coregrind/pub_core_debuginfo.h @@ -76,6 +76,10 @@ extern void VG_(di_notify_pdb_debuginfo)( Int fd, Addr avma, extern void VG_(di_notify_vm_protect)( Addr a, SizeT len, UInt prot ); #endif +extern void VG_(addr_load_di)( Addr a ); + +extern void VG_(load_di)( DebugInfo *di, Addr a ); + extern void VG_(di_discard_ALL_debuginfo)( void ); /* Like VG_(get_fnname), but it does not do C++ demangling nor Z-demangling |
From: Jojo R <rj...@li...> - 2023-08-04 06:04:16
Hi,

We are glad to open source our RVV implementation here: https://github.com/rjiejie/valgrind-riscv64

Three kinds of extra ISAs were added in this repo:

RV64Zfh : Half-precision floating-point
RV64Xthead [1] : T-Head vendor extension for RV64G
RV64V0p7 [2] : Vector 0.7.1
RV64V : Vector 1.x, coming soon :)

[1] https://github.com/T-head-Semi/thead-extension-spec
[2] https://github.com/riscv/riscv-v-spec/releases/tag/0.7.1

Regards

--Jojo

On 2023/7/17 15:05, Jojo R wrote:
>
> Hi,
>
> Sorry for the late reply,
>
> I have been pushing the progress of the Valgrind RVV implementation 😄
> We finished the first version and tested it against the full RVV intrinsics spec.
>
> For real projects and developers, we implemented the first usable, fully functional RVV Valgrind with the dirty-call method, and we will experiment with and optimize the RVV implementation on an ideal RVV design.
>
> Back to the RVV RFC, we are happy to share our thinking on the design; see the attachment for more details :)
>
> Regards
>
> --Jojo
>
> On 2023/4/21 17:25, Jojo R wrote:
>>
>> Hi,
>>
>> We are considering adding the RVV/Vector [1] feature to Valgrind; there are some challenges.
>> RVV is like ARM's SVE [2] programming model: it is scalable/VLA, which means it is vector-length agnostic.
>> ARM's SVE is not supported in Valgrind :(
>>
>> There are three major issues in implementing the RVV instruction set in Valgrind, as follows:
>>
>> 1. Scalable vector register width VLENB
>> 2. Runtime changing property of LMUL and SEW
>> 3. Lack of proper VEX IR to represent all vector operations
>>
>> We propose applicable methods to solve 1 and 2. As for 3, we explore several possible but maybe imperfect approaches to handle different cases.
>>
>> We start with 1. As each guest register should be described in the VEXGuestState struct, the vector registers with scalable width VLENB can be added into VEXGuestState as arrays using an allowable maximum length like 2048/4096.
>>
>> The actual available access range can be determined at Valgrind startup time by querying the CPU for its vector capability or some suitable setup steps.
>>
>> To solve problem 2, we are inspired by already-proven techniques in QEMU, where translation blocks are broken up when certain critical CSRs are set. Because the guest-code-to-IR translation relies on the precise value of LMUL/SEW, and they may change within a basic block, we can break up the basic block each time we encounter a vsetvl{i} instruction and return to the scheduler to execute the translated code and update LMUL/SEW. Accordingly, translation cache management should be refactored to detect changes of LMUL/SEW and invalidate the outdated code cache. Without losing generality, the LMUL/SEW should be encoded into a ULong flag such that other architectures can leverage this flag to store their arch-dependent information. The TTentry struct should also take the flag into account for both insertion and deletion. By doing this, the flag carries the newest LMUL/SEW throughout the simulation and can be passed to the disassembly functions via the VEXArchInfo struct, so that we can get the real and newest values of LMUL and SEW to facilitate our translation.
>>
>> Also, some architecture-related code should be taken care of. For example, in the m_dispatch part, the disp_cp_xindir function looks up the code cache using hardcoded assembly by checking the requested guest state IP and the translation cache entry address with no further constraints. Many other modules should be checked to ensure the in-time update of LMUL/SEW is instantly visible to the essential parts of Valgrind.
>>
>> The last remaining big issue is 3, for which we introduce some ad-hoc approaches. We summarize these approaches into three types, as follows:
>>
>> 1. Break down a vector instruction to scalar VEX IR ops.
>> 2. Break down a vector instruction to fixed-length VEX IR ops.
>> 3. Use dirty helpers to realize vector instructions.
>>
>> The very first method theoretically exists but is probably not applicable, as the number of IR ops explodes when a large VLENB is adopted. Imagine a configuration of VLENB=512, SEW=8, LMUL=8: the VL is 512 * 8 / 8 = 512, meaning that a single vector instruction turns into 512 scalar instructions, and each scalar instruction would be expanded to multiple IRs. To make things worse, the tool instrumentation will insert more IRs between adjacent scalar IR ops. As a result, performance is likely to be slowed down a thousand times when running a real-world application with lots of vector instructions. Therefore, the other two methods are more promising, and we will discuss them below.
>>
>> 2 and 3 are not mutually exclusive, as we may choose a suitable method from them to implement a vector instruction depending on its concrete behavior. To explain these methods in detail, we present some instances to illustrate their pros and cons.
>>
>> In terms of method 2, we have real values of VLENB/LMUL/SEW. The simple case is VLENB <= 256 and LMUL=1, where many SIMD IR ops are available and can be directly applied to represent vector operations. However, even when VLENB is restricted to 128, it still exceeds the maximum SIMD width of 256 supported by VEX IR if LMUL>2. Hence, here are two variants of method 2 to deal with long vectors:
>>
>> *2.1* Add more SIMD IR ops such as 1024/2048/4096, and translate vector instructions at the granularity of VLENB. Accordingly, VLENB=4096 with LMUL=2 is fulfilled by two 4096 SIMD VEX IR ops.
>>
>> * *pros*: it encourages the VEX backend to generate more compact and efficient SIMD code (maybe). Particularly, it accommodates mask and gather/scatter (indexed) instructions by delivering more information in the IR itself.
>> * *cons*: too many new IR ops need to be introduced in VEX, as each op of a different length should implement its add/sub/mul variants. New data types to denote long vectors are necessary too, causing difficulties in both VEX backend register allocation and tool instrumentation.
>>
>> *2.2* Break down long vectors into multiple repeated SIMD ops. For instance, a vadd.vv vector instruction with VLENB=256/LMUL=2/SEW=8 is composed of four operators of Iop_Add8x16 type.
>>
>> * *pros:* less effort is required in register allocation and tool instrumentation. The VEX frontend is able to tell the backend to generate efficient vector instructions using existing Iops. It better trades off the complexity of adding many long-vector IR ops against the benefit of generating high-efficiency host code.
>> * *cons:* it is hard to describe a mask operation given that the mask is pretty flexible (the least significant bit of each segment of v0). Additionally, gather/scatter instructions may have similar problems in appropriately dividing index registers. There are various corner cases left here, such as widening arithmetic operations (widening SIMD IR ops are currently not compatible) and the vstart CSR register. When using fixed-length IR ops to compose a vector instruction, we will inevitably have to tell each IR op from which position, encoded in vstart, it can start to process the data. We can use vstart as a normal guest state virtual register to calculate each op's start position as a guard IRExpr, or obtain the value of vstart as we do for LMUL/SEW. Nevertheless, it is non-trivial to decompose a vector instruction concisely.
>>
>> In short, both 2.1 and 2.2 confront a dilemma in reducing the engineering effort of refactoring Valgrind elegantly while implementing the vector instruction set efficiently. The same obstacles exist for ARM SVE, as these are scalable vector instructions and flexible in many ways.
>>
>> The final solution is the dirty helper. It is undoubtedly practical and requires possibly the least engineering effort in dealing with so many details in Valgrind. In this design, each instruction is completed using an inline assembly running the same instruction on the host. Moreover, tool instrumentation already handles IRDirty, except that new fields should be added to the _IRDirty struct to indicate strided/indexed/masked memory accesses and arithmetic operations.
>>
>> * *pros:* it supports all instructions without bothering to build complicated IR expressions and statements. It executes vector instructions using the host CPU to get acceleration to some extent. Besides, we do not need to extend the VEX backend to translate new IRs to vector instructions.
>> * *cons:* the dirty helper always keeps its operations in a black box, such that tools can never see what happens inside a dirty helper. For Memcheck, the bit-precision merit is lost once it meets a dirty helper, as the V-bit propagation chain adopts a pretty coarse determination strategy. On the other hand, it is also not an elegant way to implement an entire ISA extension in dirty helpers.
>>
>> In summary, we are still far from a truly applicable solution for adding vector extensions to Valgrind. We need to do detailed and comprehensive estimations of different vector instruction categories.
>>
>> Any feedback is welcome on GitHub [3] too.
>>
>> [1] https://github.com/riscv/riscv-v-spec
>>
>> [2] https://community.arm.com/arm-research/b/articles/posts/the-arm-scalable-vector-extension-sve
>>
>> [3] https://github.com/petrpavlu/valgrind-riscv64/issues/17
>>
>> Thanks.
>>
>> Jojo
>>
>> _______________________________________________
>> Valgrind-developers mailing list
>> Val...@li...
>> https://lists.sourceforge.net/lists/listinfo/valgrind-developers
From: Jojo R <rj...@gm...> - 2023-08-04 05:45:19
On 2023/7/26 03:55, Petr Pavlu wrote:
> On 17. Jul 23 15:05, Jojo R wrote:
>> Hi,
>>
>> Sorry for the late reply,
>>
>> I have been pushing the progress of the Valgrind RVV implementation 😄
>> We finished the first version and tested it against the full RVV intrinsics spec.
>>
>> For real projects and developers, we implemented the first usable, fully functional RVV Valgrind with the dirty-call method, and we will experiment with and optimize the RVV implementation on an ideal RVV design.
>>
>> Back to the RVV RFC, we are happy to share our thinking on the design; see the attachment for more details :)
> This is a good summary.
>
> As mentioned in another part of the thread, I think that in the long run it will indeed be needed to implement the approach described as "RVV to variable-length IR". I hope to help with making sure it can work for Arm SVE too.
>
> I guess if initial experiments show that this option is hard and will take time to implement, then it could make sense in the short term for the RISC-V port to go with the "RVV to dirty helper" implementation.
>
> Thanks,
> Petr

Ok, experiments are helpful; also, we will open source our RVV implementation soon :)

Regards

-- Jojo
From: Feiyang C. <chr...@gm...> - 2023-08-04 03:23:19
Hi,

I want to inquire about the status of my patch. As of now, I haven't received any feedback or response regarding its review or potential inclusion in the upcoming release. I understand that everyone on the team is busy with their respective tasks, but I am eager to know if there are any plans to consider my contribution for integration.

Thanks,
Feiyang
From: Ashley, W. <wa...@am...> - 2023-08-01 18:50:43
Sorry, I missed one file.
diff --git a/coregrind/m_initimg/initimg-linux.c b/coregrind/m_initimg/initimg-linux.c
index 7a7d45335..7680baa8e 100644
--- a/coregrind/m_initimg/initimg-linux.c
+++ b/coregrind/m_initimg/initimg-linux.c
@@ -734,7 +734,8 @@ Addr setup_client_stack( void* init_sp,
| VKI_HWCAP_SHA2 \
| VKI_HWCAP_CRC32 \
| VKI_HWCAP_FP \
- | VKI_HWCAP_ASIMD)
+ | VKI_HWCAP_ASIMD \
+ | VKI_HWCAP_ASIMDDP)
auxv->u.a_val &= ARM64_SUPPORTED_HWCAP;
}
# endif
From: Ashley, W. <wa...@am...> - 2023-07-28 17:04:29
https://bugs.kde.org/show_bug.cgi?id=460616 This adds support for the FEAT_DotProd instructions that are optional in arm v8.2 and 8.3 -a profiles, then mandatory in v8.4+ (at least that's my understanding from ARM's docs). I hope this is the correct mechanism to submit a change. I don't have a sourceware account so I can't push it to a user branch there and reference that. commit fd75f20fa461b326c4f5734b8dd001ad0661e58f Author: William Ashley <wa...@am...> Date: Thu Jul 27 14:49:17 2023 +0000 Bug 460616 - Add support for aarch64 dotprod instructions This change adds support for the FEAT_DotProd instructions SDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>] SDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb> UDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>] UDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb> diff --git a/.gitignore b/.gitignore index 6538eb718..c39f03b9a 100644 --- a/.gitignore +++ b/.gitignore @@ -1710,6 +1710,7 @@ /none/tests/arm64/fp_and_simd_v82 /none/tests/arm64/integer /none/tests/arm64/memory +/none/tests/arm64/simd_dotprod /none/tests/arm64/simd_v81 # /none/tests/darwin/ diff --git a/VEX/priv/guest_arm64_toIR.c b/VEX/priv/guest_arm64_toIR.c index 16a7e075f..9fe164483 100644 --- a/VEX/priv/guest_arm64_toIR.c +++ b/VEX/priv/guest_arm64_toIR.c @@ -9113,6 +9113,21 @@ IRTemp math_RHADD ( UInt size, Bool isU, IRTemp aa, IRTemp bb ) } +/* Generate IR to do {U,S}ADDLP */ +static +IRTemp math_ADDLP ( UInt sizeNarrow, Bool isU, IRTemp src ) +{ + IRTemp res = newTempV128(); + assign(res, + binop(mkVecADD(sizeNarrow+1), + mkexpr(math_WIDEN_EVEN_OR_ODD_LANES( + isU, True/*fromOdd*/, sizeNarrow, mkexpr(src))), + mkexpr(math_WIDEN_EVEN_OR_ODD_LANES( + isU, False/*!fromOdd*/, sizeNarrow, mkexpr(src))))); + return res; +} + + /* QCFLAG tracks the SIMD sticky saturation status. Update the status thusly: if, after application of |opZHI| to both |qres| and |nres|, they have the same value, leave QCFLAG unchanged. 
Otherwise, set it @@ -13406,12 +13421,7 @@ Bool dis_AdvSIMD_two_reg_misc(/*MB_OUT*/DisResult* dres, UInt insn) IRTemp sum = newTempV128(); IRTemp res = newTempV128(); assign(src, getQReg128(nn)); - assign(sum, - binop(mkVecADD(size+1), - mkexpr(math_WIDEN_EVEN_OR_ODD_LANES( - isU, True/*fromOdd*/, size, mkexpr(src))), - mkexpr(math_WIDEN_EVEN_OR_ODD_LANES( - isU, False/*!fromOdd*/, size, mkexpr(src))))); + sum = math_ADDLP(size, isU, src); assign(res, isACC ? binop(mkVecADD(size+1), mkexpr(sum), getQReg128(dd)) : mkexpr(sum)); putQReg128(dd, math_MAYBE_ZERO_HI64(bitQ, res)); @@ -15692,6 +15702,91 @@ Bool dis_AdvSIMD_fp_to_from_int_conv(/*MB_OUT*/DisResult* dres, UInt insn) } +static +Bool dis_AdvSIMD_dot_product(/*MB_OUT*/DisResult* dres, UInt insn) +{ + /* by element + 31 30 29 28 23 21 20 15 11 10 9 4 + 0 Q U 01111 size L m 1110 H 0 n d + vector + 31 30 29 28 23 21 20 15 11 10 9 4 + 0 Q U 01110 size 0 m 1001 0 1 n d + */ +# define INSN(_bMax,_bMin) SLICE_UInt(insn, (_bMax), (_bMin)) + if (INSN(31,31) != 0) { + return False; + } + UInt bitQ = INSN(30,30); + UInt bitU = INSN(29,29); + UInt opcode1 = INSN(28,24); + UInt size = INSN(23,22); + UInt bitL = INSN(21,21); + UInt mm = INSN(20,16); + UInt opcode2 = INSN(15,12); + UInt bitH = INSN(11,11); + UInt opcode3 = INSN(10,10); + UInt nn = INSN(9,5); + UInt dd = INSN(4,0); + UInt index = (bitH << 1) + bitL; + vassert(index <= 3); + + Bool byElement; + if (opcode1 == BITS5(0,1,1,1,1) + && opcode2 == BITS4(1,1,1,0) + && opcode3 == 0) { + byElement = True; + } else if (opcode1 == BITS5(0,1,1,1,0) + && opcode2 == BITS4(1,0,0,1) + && opcode3 == 1 + && bitL == 0 && bitH == 0) { + byElement = False; + } else { + return False; + } + + // '10' is the only valid size + if (size != X10) return False; + + IRExpr* src1 = math_MAYBE_ZERO_HI64_fromE(bitQ, getQReg128(nn)); + IRExpr* src2 = getQReg128(mm); + if (byElement) { + src2 = mkexpr(math_DUP_VEC_ELEM(src2, X10, index)); + } + + IROp mulOp = bitU ? 
Iop_Mull8Ux8 : Iop_Mull8Sx8; + IRTemp loProductSums = math_ADDLP( + X01, bitU, math_BINARY_WIDENING_V128(False, mulOp, src1, src2)); + IRTemp hiProductSums = math_ADDLP( + X01, bitU, math_BINARY_WIDENING_V128(True, mulOp, src1, src2)); + + IRTemp res = newTempV128(); + assign(res, binop(Iop_Add32x4, + mk_CatEvenLanes32x4(hiProductSums, loProductSums), + mk_CatOddLanes32x4(hiProductSums, loProductSums))); + + // These instructions accumulate into the destination, but in non-q + // form the upper 64 bits get forced to 0 + IRExpr* accVal = math_MAYBE_ZERO_HI64_fromE(bitQ, getQReg128(dd)); + putQReg128(dd, binop(mkVecADD(size), mkexpr(res), accVal)); + + const HChar* nm = bitU ? "udot" : "sdot"; + const HChar* destWidth = nameArr_Q_SZ(bitQ, size); + const HChar* srcWidth = nameArr_Q_SZ(bitQ, X00); + if (byElement) { + DIP("%s v%u.%s, v%u.%s, v%u.4b[%u]\n", nm, + dd, destWidth, + nn, srcWidth, mm, index); + } else { + DIP("%s v%u.%s, v%u.%s, v%u.%s\n", nm, + dd, destWidth, + nn, srcWidth, mm, srcWidth); + } + + return True; +# undef INSN +} + + static Bool dis_ARM64_simd_and_fp(/*MB_OUT*/DisResult* dres, UInt insn, const VexArchInfo* archinfo, Bool sigill_diag) @@ -15767,6 +15862,8 @@ Bool dis_ARM64_simd_and_fp(/*MB_OUT*/DisResult* dres, UInt insn, if (UNLIKELY(ok)) return True; ok = dis_AdvSIMD_fp_to_from_int_conv(dres, insn); if (UNLIKELY(ok)) return True; + ok = dis_AdvSIMD_dot_product(dres, insn); + if (UNLIKELY(ok)) return True; return False; } diff --git a/configure.ac b/configure.ac index b4e9c1142..7a2da9d7c 100755 --- a/configure.ac +++ b/configure.ac @@ -3730,6 +3730,31 @@ CFLAGS="$save_CFLAGS" AM_CONDITIONAL(BUILD_ARMV82_TESTS, test x$ac_have_armv82_feature = xyes) +# Does the C compiler support the armv82-a+dotprod flag and assembler dotprod instructions +# Note, this doesn't generate a C-level symbol. 
It generates a +# automake-level symbol (BUILD_ARMV82_DOTPROD_TESTS), used in test Makefile.am's +AC_MSG_CHECKING([if gcc supports the armv82-a+dotprod feature flag and assembler supports dotprod instructions]) + +save_CFLAGS="$CFLAGS" +CFLAGS="$CFLAGS -march=armv8.2-a+dotprod -Werror" +AC_COMPILE_IFELSE([AC_LANG_SOURCE([[ +int main() +{ + __asm__ __volatile__("sdot v1.4s, v2.16b, v3.16b"); + return 0; +} +]])], [ +ac_have_armv82_dotprod_feature=yes +AC_MSG_RESULT([yes]) +], [ +ac_have_armv82_dotprod_feature=no +AC_MSG_RESULT([no]) +]) +CFLAGS="$save_CFLAGS" + +AM_CONDITIONAL(BUILD_ARMV82_DOTPROD_TESTS, test x$ac_have_armv82_dotprod_feature = xyes) + + # XXX JRS 2010 Oct 13: what is this for? For sure, we don't need this # when building the tool executables. I think we should get rid of it. # diff --git a/none/tests/arm64/Makefile.am b/none/tests/arm64/Makefile.am index 4a06f0996..cc0ed1481 100644 --- a/none/tests/arm64/Makefile.am +++ b/none/tests/arm64/Makefile.am @@ -11,6 +11,7 @@ EXTRA_DIST = \ memory.stdout.exp memory.stderr.exp memory.vgtest \ atomics_v81.stdout.exp atomics_v81.stderr.exp atomics_v81.vgtest \ simd_v81.stdout.exp simd_v81.stderr.exp simd_v81.vgtest \ + simd_dotprod.stdout.exp simd_dotprod.stderr.exp simd_dotprod.vgtest \ fmadd_sub.stdout.exp fmadd_sub.stderr.exp fmadd_sub.vgtest \ fp_and_simd_v82.stdout.exp fp_and_simd_v82.stderr.exp \ fp_and_simd_v82.vgtest \ @@ -40,6 +41,10 @@ if BUILD_ARMV82_TESTS check_PROGRAMS += fp_and_simd_v82 endif +if BUILD_ARMV82_DOTPROD_TESTS + check_PROGRAMS += simd_dotprod +endif + AM_CFLAGS += @FLAG_M64@ AM_CXXFLAGS += @FLAG_M64@ AM_CCASFLAGS += @FLAG_M64@ @@ -49,6 +54,7 @@ allexec_CFLAGS = $(AM_CFLAGS) @FLAG_W_NO_NONNULL@ crc32_CFLAGS = $(AM_CFLAGS) -march=armv8-a+crc atomics_v81_CFLAGS = $(AM_CFLAGS) -march=armv8.1-a simd_v81_CFLAGS = $(AM_CFLAGS) -march=armv8.1-a+crypto +simd_dotprod_CFLAGS = $(AM_CFLAGS) -march=armv8.2-a+dotprod fp_and_simd_CFLAGS = $(AM_CFLAGS) -march=armv8-a+crypto fp_and_simd_v82_CFLAGS = 
$(AM_CFLAGS) -march=armv8.2-a+fp16+crypto integer_CFLAGS = $(AM_CFLAGS) -g -O0 -DTEST_BFM=0 diff --git a/none/tests/arm64/simd_dotprod.c b/none/tests/arm64/simd_dotprod.c new file mode 100644 index 000000000..ca67da551 --- /dev/null +++ b/none/tests/arm64/simd_dotprod.c @@ -0,0 +1,110 @@ +#include <stdio.h> +#include <assert.h> + +typedef unsigned char UChar; +typedef unsigned int UInt; +typedef signed int Int; + +#define ITERS 1 + +union _V128 { + UChar u8[16]; +}; +typedef union _V128 V128; + +static inline UChar randUChar ( void ) +{ + static UInt seed = 80021; + seed = 1103515245 * seed + 12345; + return (seed >> 17) & 0xFF; +} + +/* Generates a random V128. */ +static void randV128 ( /*OUT*/V128* v) +{ + static UInt nCalls = 0; + Int i; + nCalls++; + for (i = 0; i < 16; i++) { + v->u8[i] = randUChar(); + } + if (0 == (nCalls & 0xFF)) + printf("randV128: %u calls\n", nCalls); +} + +static void showV128 ( V128* v ) +{ + Int i; + for (i = 15; i >= 0; i--) + printf("%02x", (Int)v->u8[i]); +} + +#define GEN_BINARY_TEST_BODY(INSN,SUFFIXD,SUFFIXN,SUFFIXM) \ + Int i; \ + for (i = 0; i < ITERS; i++) { \ + V128 block[3]; \ + randV128(&block[0]); \ + randV128(&block[1]); \ + randV128(&block[2]); \ + __asm__ __volatile__( \ + "ldr q7, [%0, #0];" \ + "ldr q8, [%0, #16];" \ + "ldr q9, [%0, #32];" \ + #INSN " v9." #SUFFIXD ", v7." #SUFFIXN ", v8." SUFFIXM " ; " \ + "str q9, [%0, #32];" \ + : : "r"(&block[0]) : "memory", "v7", "v8", "v9" \ + ); \ + printf(#INSN " v9." #SUFFIXD \ + ", v7." #SUFFIXN ", v8." 
SUFFIXM " "); \ + showV128(&block[0]); printf(" "); \ + showV128(&block[1]); printf(" "); \ + showV128(&block[2]); printf("\n"); \ + } \ + +#define GEN_BINARY_TEST_BY_ELEM(INSN,SUFFIXD,SUFFIXN,MELEM) \ + __attribute__((noinline)) \ + static void test_##INSN##_##SUFFIXD##_##SUFFIXN##_elem_##MELEM () { \ + GEN_BINARY_TEST_BODY(INSN,SUFFIXD,SUFFIXN,"4b[" #MELEM "]") \ + } + +#define GEN_BINARY_TEST(INSN,SUFFIXD,SUFFIXN,SUFFIXM) \ + __attribute__((noinline)) \ + static void test_##INSN##_##SUFFIXD##_##SUFFIXN##_##SUFFIXM () { \ + GEN_BINARY_TEST_BODY(INSN,SUFFIXD,SUFFIXN,#SUFFIXM) \ + } + +GEN_BINARY_TEST(sdot, 2s, 8b, 8b) +GEN_BINARY_TEST(udot, 2s, 8b, 8b) +GEN_BINARY_TEST(sdot, 4s, 16b, 16b) +GEN_BINARY_TEST(udot, 4s, 16b, 16b) +GEN_BINARY_TEST_BY_ELEM(sdot, 2s, 8b, 0) +GEN_BINARY_TEST_BY_ELEM(udot, 2s, 8b, 1) +GEN_BINARY_TEST_BY_ELEM(sdot, 4s, 16b, 2) +GEN_BINARY_TEST_BY_ELEM(udot, 4s, 16b, 3) + +int main ( void ) +{ + assert(sizeof(V128) == 16); + + // ======================== {S,U}DOT by element ==================== + // sdot 2s,8b,4b[0] + // udot 2s,8b,4b[1] + // sdot 4s,16b,4b[2] + // udot 4s,16b,4b[3] + test_sdot_2s_8b_elem_0(); + test_udot_2s_8b_elem_1(); + test_sdot_4s_16b_elem_2(); + test_udot_4s_16b_elem_3(); + + // ======================== {S,U}DOT vector ======================== + // sdot 2s,8b,8b + // udot 2s,8b,8b + // sdot 4s,16b,16b + // udot 4s,16b,16b + test_sdot_2s_8b_8b(); + test_udot_2s_8b_8b(); + test_sdot_4s_16b_16b(); + test_udot_4s_16b_16b(); + + return 0; +} diff --git a/none/tests/arm64/simd_dotprod.stderr.exp b/none/tests/arm64/simd_dotprod.stderr.exp new file mode 100644 index 000000000..e69de29bb diff --git a/none/tests/arm64/simd_dotprod.stdout.exp b/none/tests/arm64/simd_dotprod.stdout.exp new file mode 100644 index 000000000..88724550d --- /dev/null +++ b/none/tests/arm64/simd_dotprod.stdout.exp @@ -0,0 +1,8 @@ +sdot v9.2s, v7.8b, v8.4b[0] 5175e39d19c9ca1e98f24a4984175700 7d6528c5fa956a0d69c3e9a6af27d13b 
000000000000000047b8fac3eeef3914 +udot v9.2s, v7.8b, v8.4b[1] b6d2fb5aa7bc5127fe9915e556a044b2 19a348215c3a67fd399182c2dbcc2d38 0000000000000000842c23cf5066b549 +sdot v9.4s, v7.16b, v8.4b[2] d89998df5035ed364a4bc43968bc40e5 cb509970b8136c85d740b80eb7839b97 f9dd31bff8c05f5456afd620b0ca1b30 +udot v9.4s, v7.16b, v8.4b[3] 5ff85bc9535c191fd3a727d1a705f65d d8bc5c6dee699597398e0039cf03663d 20a33823cbca1faf542f38453df87d2b +sdot v9.2s, v7.8b, v8.8b d182c916cebc2e17cfaff39be272ef40 6897b536bbe4da8a369dab4f9465b86e 0000000000000000f4e068450523c8a1 +udot v9.2s, v7.8b, v8.8b 95264321bf3b68b255c2b9e2c95c9810 81f2a547be8d181184ededbc53239dcf 00000000000000008d6b78e8f7e97e90 +sdot v9.4s, v7.16b, v8.16b f0350ca70523e0e45ba1ec54e87d39b3 0a3e0f7c75cb0842b95ed64d3b13ff64 e98e9eeaa89323fc54cac842e13de403 +udot v9.4s, v7.16b, v8.16b 0a5f45c55f1c9202b76ddefcb0ebfe6e c84ab713406845904d325b2d5a70a792 5f49643cced88b926263a4c2727e0a11 diff --git a/none/tests/arm64/simd_dotprod.vgtest b/none/tests/arm64/simd_dotprod.vgtest new file mode 100644 index 000000000..1997e64fa --- /dev/null +++ b/none/tests/arm64/simd_dotprod.vgtest @@ -0,0 +1,3 @@ +prog: simd_dotprod +prereq: test -x simd_dotprod && ../../../tests/arm64_features asimddp +vgopts: -q |
From: Petr P. <pet...@da...> - 2023-07-25 19:55:29
On 17. Jul 23 15:05, Jojo R wrote: > Hi, > > Sorry for the late reply, > > i have been pushing the progress of valgrind RVV implementation 😄 > We finished the first version and tested with full RVV intrinsics spec. > > For real project and developers, we implement the first useable/ full > functionality's RVV valgrind with dirtycall method, > and we will make experiment or optimize RVV implementation on ideal RVV > design. > > Back to the RVV RFC, we are happy to share our thinking of design, see > attachment for more details :) This is a good summary. As mentioned in another part of the thread, I think that in long run it will be indeed needed to implement the approach described as "RVV to variable-length IR". I hope to help with making sure it can work for Arm SVE too. I guess if initial experiments show that this option is hard and will take time to implement then it could make sense in short term for the RISC-V port to go with the "RVV to dirty helper" implementation. Thanks, Petr |
From: Paul F. <pa...@so...> - 2023-07-24 20:10:41
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=6ce0979884a8f246c80a098333ceef1a7b7f694d commit 6ce0979884a8f246c80a098333ceef1a7b7f694d Author: Paul Floyd <pj...@wa...> Date: Mon Jul 24 22:06:00 2023 +0200 Bug 472219 - Syscall param ppoll(ufds.events) points to uninitialised byte(s) Add checks that (p)poll fd is not negative. If it is negative, don't check the events field. Diff: --- .gitignore | 1 + NEWS | 1 + coregrind/m_syswrap/syswrap-freebsd.c | 12 ++++++------ coregrind/m_syswrap/syswrap-generic.c | 6 ++++-- coregrind/m_syswrap/syswrap-linux.c | 6 ++++-- coregrind/m_syswrap/syswrap-solaris.c | 5 +++-- memcheck/tests/Makefile.am | 4 ++++ memcheck/tests/bug472219.c | 16 ++++++++++++++++ memcheck/tests/bug472219.stderr.exp | 0 memcheck/tests/bug472219.vgtest | 2 ++ memcheck/tests/freebsd/scalar.c | 12 +++++++++--- memcheck/tests/freebsd/scalar.stderr.exp | 23 +++++++++++++++++++---- memcheck/tests/freebsd/scalar.stderr.exp-x86 | 23 +++++++++++++++++++---- memcheck/tests/solaris/scalar.stderr.exp | 4 ---- memcheck/tests/x86-linux/scalar.stderr.exp | 5 ----- 15 files changed, 88 insertions(+), 32 deletions(-) diff --git a/.gitignore b/.gitignore index 9e16ac126d..6538eb718b 100644 --- a/.gitignore +++ b/.gitignore @@ -845,6 +845,7 @@ /memcheck/tests/bug287260 /memcheck/tests/bug340392 /memcheck/tests/bug464969_d_demangle +/memcheck/tests/bug472219 /memcheck/tests/calloc-overflow /memcheck/tests/cdebug_zlib /memcheck/tests/cdebug_zlib_gnu diff --git a/NEWS b/NEWS index 783612fbb9..867d2f0f43 100644 --- a/NEWS +++ b/NEWS @@ -45,6 +45,7 @@ are not entered into bugzilla tend to get forgotten about or ignored. Assertion 'resolved' failed 470830 Don't print actions vgdb me ... 
continue for vgdb --multi mode 470978 s390x: Valgrind cannot start qemu-kvm when "sysctl vm.allocate_pgste=0" +472219 Syscall param ppoll(ufds.events) points to uninitialised byte(s) To see details of a given bug, visit https://bugs.kde.org/show_bug.cgi?id=XXXXXX diff --git a/coregrind/m_syswrap/syswrap-freebsd.c b/coregrind/m_syswrap/syswrap-freebsd.c index 6b9f3d2109..9af37cfb83 100644 --- a/coregrind/m_syswrap/syswrap-freebsd.c +++ b/coregrind/m_syswrap/syswrap-freebsd.c @@ -6124,15 +6124,15 @@ PRE(sys_ppoll) struct vki_pollfd *, fds, unsigned int, nfds, struct vki_timespec *, timeout, vki_sigset_t *, newsigmask); - if (ML_(safe_to_deref)(fds, ARG2*sizeof(struct vki_pollfd))) { - for (i = 0; i < ARG2; i++) { - PRE_MEM_READ( "ppoll(fds.fd)", - (Addr)(&fds[i].fd), sizeof(fds[i].fd) ); + for (i = 0; i < ARG2; i++) { + PRE_MEM_READ( "ppoll(fds.fd)", + (Addr)(&fds[i].fd), sizeof(fds[i].fd) ); + if (ML_(safe_to_deref)(&fds[i].fd, sizeof(fds[i].fd)) && fds[i].fd >= 0) { PRE_MEM_READ( "ppoll(fds.events)", (Addr)(&fds[i].events), sizeof(fds[i].events) ); - PRE_MEM_WRITE( "ppoll(fds.revents)", - (Addr)(&fds[i].revents), sizeof(fds[i].revents) ); } + PRE_MEM_WRITE( "ppoll(fds.revents)", + (Addr)(&fds[i].revents), sizeof(fds[i].revents) ); } if (ARG3) { diff --git a/coregrind/m_syswrap/syswrap-generic.c b/coregrind/m_syswrap/syswrap-generic.c index efdae60e10..ed9d14685f 100644 --- a/coregrind/m_syswrap/syswrap-generic.c +++ b/coregrind/m_syswrap/syswrap-generic.c @@ -4339,8 +4339,10 @@ PRE(sys_poll) for (i = 0; i < ARG2; i++) { PRE_MEM_READ( "poll(ufds.fd)", (Addr)(&ufds[i].fd), sizeof(ufds[i].fd) ); - PRE_MEM_READ( "poll(ufds.events)", - (Addr)(&ufds[i].events), sizeof(ufds[i].events) ); + if (ML_(safe_to_deref)(&ufds[i].fd, sizeof(ufds[i].fd)) && ufds[i].fd >= 0) { + PRE_MEM_READ( "poll(ufds.events)", + (Addr)(&ufds[i].events), sizeof(ufds[i].events) ); + } PRE_MEM_WRITE( "poll(ufds.revents)", (Addr)(&ufds[i].revents), sizeof(ufds[i].revents) ); } diff --git 
a/coregrind/m_syswrap/syswrap-linux.c b/coregrind/m_syswrap/syswrap-linux.c index f8621f8f0d..20c68c877c 100644 --- a/coregrind/m_syswrap/syswrap-linux.c +++ b/coregrind/m_syswrap/syswrap-linux.c @@ -1984,8 +1984,10 @@ static void ppoll_pre_helper ( ThreadId tid, SyscallArgLayout* layout, for (i = 0; i < ARG2; i++) { PRE_MEM_READ( "ppoll(ufds.fd)", (Addr)(&ufds[i].fd), sizeof(ufds[i].fd) ); - PRE_MEM_READ( "ppoll(ufds.events)", - (Addr)(&ufds[i].events), sizeof(ufds[i].events) ); + if (ufds[i].fd >= 0) { + PRE_MEM_READ( "ppoll(ufds.events)", + (Addr)(&ufds[i].events), sizeof(ufds[i].events) ); + } PRE_MEM_WRITE( "ppoll(ufds.revents)", (Addr)(&ufds[i].revents), sizeof(ufds[i].revents) ); } diff --git a/coregrind/m_syswrap/syswrap-solaris.c b/coregrind/m_syswrap/syswrap-solaris.c index 8a2a140f95..ed3cb4a551 100644 --- a/coregrind/m_syswrap/syswrap-solaris.c +++ b/coregrind/m_syswrap/syswrap-solaris.c @@ -7831,8 +7831,9 @@ PRE(sys_pollsys) for (i = 0; i < ARG2; i++) { vki_pollfd_t *u = &ufds[i]; PRE_FIELD_READ("poll(ufds.fd)", u->fd); - /* XXX Check if it's valid? 
*/ - PRE_FIELD_READ("poll(ufds.events)", u->events); + if (ML_(safe_to_deref)(&ufds[i].fd, sizeof(ufds[i].fd)) && ufds[i].fd >= 0) { + PRE_FIELD_READ("poll(ufds.events)", u->events); + } PRE_FIELD_WRITE("poll(ufds.revents)", u->revents); } diff --git a/memcheck/tests/Makefile.am b/memcheck/tests/Makefile.am index 5a17fd35d4..307f47bd8e 100644 --- a/memcheck/tests/Makefile.am +++ b/memcheck/tests/Makefile.am @@ -118,6 +118,7 @@ EXTRA_DIST = \ bug340392.stderr.exp bug340392.vgtest \ bug464969_d_demangle.stderr.exp bug464969_d_demangle.vgtest \ bug464969_d_demangle.stdout.exp \ + bug472219.stderr.exp bug472219.vgtest \ calloc-overflow.stderr.exp calloc-overflow.vgtest\ cdebug_zlib.stderr.exp cdebug_zlib.vgtest \ cdebug_zlib_gnu.stderr.exp cdebug_zlib_gnu.vgtest \ @@ -415,6 +416,7 @@ check_PROGRAMS = \ bug287260 \ bug340392 \ bug464969_d_demangle \ + bug472219 \ calloc-overflow \ client-msg \ clientperm \ @@ -566,6 +568,7 @@ leak_cpp_interior_SOURCES = leak_cpp_interior.cpp accounting_CFLAGS = $(AM_CFLAGS) @FLAG_W_NO_ALLOC_SIZE_LARGER_THAN@ badfree_CFLAGS = $(AM_CFLAGS) @FLAG_W_NO_FREE_NONHEAP_OBJECT@ bug155125_CFLAGS = $(AM_CFLAGS) -Wno-unused-result @FLAG_W_NO_ALLOC_SIZE_LARGER_THAN@ +bug472219_CFLAGS = $(AM_CFLAGS) @FLAG_W_NO_UNINITIALIZED@ mallinfo_CFLAGS = $(AM_CFLAGS) -Wno-deprecated-declarations malloc3_CFLAGS = $(AM_CFLAGS) @FLAG_W_NO_ALLOC_SIZE_LARGER_THAN@ sbfragment_CFLAGS = $(AM_CFLAGS) -Wno-deprecated-declarations @@ -663,6 +666,7 @@ reach_thread_register_LDADD = -lpthread realloc_size_zero_CFLAGS = $(AM_CFLAGS) @FLAG_W_NO_INCOMPATIBLE_POINTER_TYPES_DISCARDS_QUALIFIERS@ realloc_size_zero_mismatch_SOURCES = realloc_size_zero_mismatch.cpp +realloc_size_zero_mismatch_CXXFLAGS = $(AM_CXXFLAGS) @FLAG_W_NO_MISMATCHED_NEW_DELETE@ resvn_stack_CFLAGS = $(AM_CFLAGS) @FLAG_W_NO_UNINITIALIZED@ diff --git a/memcheck/tests/bug472219.c b/memcheck/tests/bug472219.c new file mode 100644 index 0000000000..88567caa2c --- /dev/null +++ b/memcheck/tests/bug472219.c @@ -0,0 
+1,16 @@ +#include <poll.h> +#include <stdlib.h> +#include "../../config.h" + +int main() +{ + int uninit; + struct pollfd fds[] = {{-1, uninit, 0}, {2, POLLIN, 0}}; + + poll(fds, 2, 100); + +#if defined(HAVE_PPOLL) + struct timespec timeout = {0, 1e8}; + ppoll(fds, 2, &timeout, NULL); +#endif +} diff --git a/memcheck/tests/bug472219.stderr.exp b/memcheck/tests/bug472219.stderr.exp new file mode 100644 index 0000000000..e69de29bb2 diff --git a/memcheck/tests/bug472219.vgtest b/memcheck/tests/bug472219.vgtest new file mode 100644 index 0000000000..8cd48c785d --- /dev/null +++ b/memcheck/tests/bug472219.vgtest @@ -0,0 +1,2 @@ +prog: bug472219 +vgopts: -q diff --git a/memcheck/tests/freebsd/scalar.c b/memcheck/tests/freebsd/scalar.c index c6a7ff2d5c..6c8d81aa6e 100644 --- a/memcheck/tests/freebsd/scalar.c +++ b/memcheck/tests/freebsd/scalar.c @@ -781,9 +781,15 @@ int main(void) /* netbsd newreboot 208 */ /* SYS_poll 209 */ - GO(SYS_poll, "3s 3m"); + GO(SYS_poll, "2s 2m"); SY(SYS_poll, x0, x0+1, x0); FAIL; + { + struct pollfd fds = { x0, x0, x0 }; + GO(SYS_poll, "0s 2m"); + SY(SYS_poll, &fds, 1, 1); SUCC; + } + /* SYS_freebsd7___semctl 220 */ GO(SYS_freebsd7___semctl, "(IPC_INFO) 4s 1m"); SY(SYS_freebsd7___semctl, x0, x0, x0+IPC_INFO, x0+1); FAIL; @@ -1948,8 +1954,8 @@ int main(void) { struct pollfd arg1; arg1.fd = arg1.events = arg1.revents = x0; - GO(SYS_ppoll, "2s 2+2m"); - SY(SYS_ppoll, &arg1, 1, x0+1, x0+1); FAIL; + GO(SYS_ppoll, "2s 2+2m"); + SY(SYS_ppoll, &arg1, 1, x0+1, x0+1); FAIL; } /* SYS_futimens 546 */ diff --git a/memcheck/tests/freebsd/scalar.stderr.exp b/memcheck/tests/freebsd/scalar.stderr.exp index 2595bd38c5..5a4f3230f1 100644 --- a/memcheck/tests/freebsd/scalar.stderr.exp +++ b/memcheck/tests/freebsd/scalar.stderr.exp @@ -1529,7 +1529,7 @@ Syscall param getpgid(pid) contains uninitialised byte(s) ... 
--------------------------------------------------------- -209: SYS_poll 3s 3m +209: SYS_poll 2s 2m --------------------------------------------------------- Syscall param poll(ufds) contains uninitialised byte(s) ... @@ -1544,13 +1544,20 @@ Syscall param poll(ufds.fd) points to unaddressable byte(s) ... Address 0x........ is not stack'd, malloc'd or (recently) free'd -Syscall param poll(ufds.events) points to unaddressable byte(s) +Syscall param poll(ufds.revents) points to unaddressable byte(s) ... Address 0x........ is not stack'd, malloc'd or (recently) free'd -Syscall param poll(ufds.revents) points to unaddressable byte(s) +--------------------------------------------------------- +209: SYS_poll 0s 2m +--------------------------------------------------------- +Syscall param poll(ufds.fd) points to uninitialised byte(s) ... - Address 0x........ is not stack'd, malloc'd or (recently) free'd + Address 0x........ is on thread 1's stack + +Syscall param poll(ufds.events) points to uninitialised byte(s) + ... + Address 0x........ is on thread 1's stack --------------------------------------------------------- 220: SYS_freebsd7___semctl (IPC_INFO) 4s 1m @@ -4968,6 +4975,14 @@ Syscall param ppoll(timeout) contains uninitialised byte(s) Syscall param ppoll(newsigmask) contains uninitialised byte(s) ... +Syscall param ppoll(fds.fd) points to unaddressable byte(s) + ... + Address 0x........ is not stack'd, malloc'd or (recently) free'd + +Syscall param ppoll(fds.revents) points to unaddressable byte(s) + ... + Address 0x........ is not stack'd, malloc'd or (recently) free'd + Syscall param ppoll(timeout) points to unaddressable byte(s) ... Address 0x........ 
is not stack'd, malloc'd or (recently) free'd diff --git a/memcheck/tests/freebsd/scalar.stderr.exp-x86 b/memcheck/tests/freebsd/scalar.stderr.exp-x86 index e995fc28d6..a45d0601c3 100644 --- a/memcheck/tests/freebsd/scalar.stderr.exp-x86 +++ b/memcheck/tests/freebsd/scalar.stderr.exp-x86 @@ -1529,7 +1529,7 @@ Syscall param getpgid(pid) contains uninitialised byte(s) ... --------------------------------------------------------- -209: SYS_poll 3s 3m +209: SYS_poll 2s 2m --------------------------------------------------------- Syscall param poll(ufds) contains uninitialised byte(s) ... @@ -1544,13 +1544,20 @@ Syscall param poll(ufds.fd) points to unaddressable byte(s) ... Address 0x........ is not stack'd, malloc'd or (recently) free'd -Syscall param poll(ufds.events) points to unaddressable byte(s) +Syscall param poll(ufds.revents) points to unaddressable byte(s) ... Address 0x........ is not stack'd, malloc'd or (recently) free'd -Syscall param poll(ufds.revents) points to unaddressable byte(s) +--------------------------------------------------------- +209: SYS_poll 0s 2m +--------------------------------------------------------- +Syscall param poll(ufds.fd) points to uninitialised byte(s) ... - Address 0x........ is not stack'd, malloc'd or (recently) free'd + Address 0x........ is on thread 1's stack + +Syscall param poll(ufds.events) points to uninitialised byte(s) + ... + Address 0x........ is on thread 1's stack --------------------------------------------------------- 220: SYS_freebsd7___semctl (IPC_INFO) 4s 1m @@ -5023,6 +5030,14 @@ Syscall param ppoll(timeout) contains uninitialised byte(s) Syscall param ppoll(newsigmask) contains uninitialised byte(s) ... +Syscall param ppoll(fds.fd) points to unaddressable byte(s) + ... + Address 0x........ is not stack'd, malloc'd or (recently) free'd + +Syscall param ppoll(fds.revents) points to unaddressable byte(s) + ... + Address 0x........ 
is not stack'd, malloc'd or (recently) free'd + Syscall param ppoll(timeout) points to unaddressable byte(s) ... Address 0x........ is not stack'd, malloc'd or (recently) free'd diff --git a/memcheck/tests/solaris/scalar.stderr.exp b/memcheck/tests/solaris/scalar.stderr.exp index 1a04979d19..a1b5d97d7a 100644 --- a/memcheck/tests/solaris/scalar.stderr.exp +++ b/memcheck/tests/solaris/scalar.stderr.exp @@ -3244,10 +3244,6 @@ Syscall param poll(ufds.fd) points to unaddressable byte(s) ... Address 0x........ is not stack'd, malloc'd or (recently) free'd -Syscall param poll(ufds.events) points to unaddressable byte(s) - ... - Address 0x........ is not stack'd, malloc'd or (recently) free'd - Syscall param poll(ufds.revents) points to unaddressable byte(s) ... Address 0x........ is not stack'd, malloc'd or (recently) free'd diff --git a/memcheck/tests/x86-linux/scalar.stderr.exp b/memcheck/tests/x86-linux/scalar.stderr.exp index b9202a8c2f..6b8c7677f5 100644 --- a/memcheck/tests/x86-linux/scalar.stderr.exp +++ b/memcheck/tests/x86-linux/scalar.stderr.exp @@ -2122,11 +2122,6 @@ Syscall param poll(ufds.fd) points to unaddressable byte(s) by 0x........: main (scalar.c:761) Address 0x........ is not stack'd, malloc'd or (recently) free'd -Syscall param poll(ufds.events) points to unaddressable byte(s) - ... - by 0x........: main (scalar.c:761) - Address 0x........ is not stack'd, malloc'd or (recently) free'd - Syscall param poll(ufds.revents) points to unaddressable byte(s) ... by 0x........: main (scalar.c:761) |
From: Paul F. <pa...@so...> - 2023-07-24 19:33:46
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=b368b44c552d0deb4d0ee77968cb0e8e02a07812 commit b368b44c552d0deb4d0ee77968cb0e8e02a07812 Author: Paul Floyd <pj...@wa...> Date: Mon Jul 24 21:32:45 2023 +0200 Solaris: add a configure test for getaddrinfo Not available on Solaris 11.3 Diff: --- configure.ac | 3 +++ drd/tests/getaddrinfo.vgtest | 2 +- helgrind/tests/Makefile.am | 5 ++++- helgrind/tests/getaddrinfo.vgtest | 1 + 4 files changed, 9 insertions(+), 2 deletions(-) diff --git a/configure.ac b/configure.ac index 4dbb1753c7..b4e9c11428 100755 --- a/configure.ac +++ b/configure.ac @@ -4849,6 +4849,7 @@ AC_CHECK_FUNCS([ \ copy_file_range \ epoll_create \ epoll_pwait \ + getaddrinfo \ klogctl \ mallinfo \ memchr \ @@ -4916,6 +4917,8 @@ AM_CONDITIONAL([HAVE_SETCONTEXT], [test x$ac_cv_func_setcontext = xyes]) AM_CONDITIONAL([HAVE_SWAPCONTEXT], [test x$ac_cv_func_swapcontext = xyes]) AM_CONDITIONAL([HAVE_MEMFD_CREATE], [test x$ac_cv_func_memfd_create = xyes]) +AM_CONDITIONAL([HAVE_GETADDRINFO], + [test x$ac_cv_func_getaddrinfo = xyes]) if test x$VGCONF_PLATFORM_PRI_CAPS = xMIPS32_LINUX \ -o x$VGCONF_PLATFORM_PRI_CAPS = xMIPS64_LINUX \ diff --git a/drd/tests/getaddrinfo.vgtest b/drd/tests/getaddrinfo.vgtest index 6faa2b6bde..a62baadb92 100644 --- a/drd/tests/getaddrinfo.vgtest +++ b/drd/tests/getaddrinfo.vgtest @@ -1,3 +1,3 @@ -prereq: ./supported_libpthread +prereq: ./supported_libpthread && test -e ../../helgrind/tests/getaddrinfo vgopts: -q prog: ../../helgrind/tests/getaddrinfo diff --git a/helgrind/tests/Makefile.am b/helgrind/tests/Makefile.am index 13e2d4db66..3e2efad0be 100755 --- a/helgrind/tests/Makefile.am +++ b/helgrind/tests/Makefile.am @@ -154,7 +154,6 @@ check_PROGRAMS = \ cond_timedwait_invalid \ cond_timedwait_test \ free_is_write \ - getaddrinfo \ hg01_all_ok \ hg02_deadlock \ hg03_inherit \ @@ -239,6 +238,10 @@ check_PROGRAMS += annotate_rwlock annotate_rwlock_CFLAGS = $(AM_CFLAGS) @FLAG_W_NO_UNUSED_BUT_SET_VARIABLE@ endif +if 
HAVE_GETADDRINFO +check_PROGRAMS += getaddrinfo +endif + AM_CFLAGS += $(AM_FLAG_M3264_PRI) AM_CXXFLAGS += $(AM_FLAG_M3264_PRI) diff --git a/helgrind/tests/getaddrinfo.vgtest b/helgrind/tests/getaddrinfo.vgtest index b58c618887..9543cbd046 100644 --- a/helgrind/tests/getaddrinfo.vgtest +++ b/helgrind/tests/getaddrinfo.vgtest @@ -1,2 +1,3 @@ +prereq: test -e getaddrinfo prog: getaddrinfo vgopts: -q |
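The commit above only builds the helgrind/drd tests when `AC_CHECK_FUNCS` finds `getaddrinfo`. A minimal standalone probe of that interface, independent of the autoconf machinery (using `AI_NUMERICHOST` so no resolver is needed), might look like:

```c
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_NUMERICHOST;  /* purely numeric, no DNS lookup */

    int rc = getaddrinfo("127.0.0.1", "80", &hints, &res);
    assert(rc == 0 && res != NULL);
    assert(res->ai_family == AF_INET);
    freeaddrinfo(res);
    printf("getaddrinfo available\n");
    return 0;
}
```

If this fails to link (as on Solaris 11.3 per the commit message), the `HAVE_GETADDRINFO` conditional stays false and the test programs are simply skipped.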
From: Paul F. <pa...@so...> - 2023-07-23 17:24:01
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=37dd8263942708d20af9c04b9ac6f601cf3aa020 commit 37dd8263942708d20af9c04b9ac6f601cf3aa020 Author: Paul Floyd <pj...@wa...> Date: Sun Jul 23 19:22:51 2023 +0200 FreeBSD: Add a DRD supppression for getaddrinfo On FreeBSD 13.2 x86 Diff: --- freebsd-drd.supp | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/freebsd-drd.supp b/freebsd-drd.supp index 93ad79f4bd..33f7e8ede2 100644 --- a/freebsd-drd.supp +++ b/freebsd-drd.supp @@ -240,3 +240,12 @@ obj:*/lib*/libc.so.7 fun:vsprintf } +{ + DRD-FREEBSD132-GETADDRINFO + drd:ConflictingAccess + ... + obj:*/lib*/libc.so.7 + fun:nsdispatch + obj:*/lib*/libc.so.7 + fun:getaddrinfo +} |
From: Wu, F. <fe...@in...> - 2023-07-19 01:25:15
On 7/19/2023 3:08 AM, Petr Pavlu wrote:
> On 11. Jul 23 19:28, Wu, Fei wrote:
>> On 7/11/2023 4:50 AM, Petr Pavlu wrote:
>>> On 6. Jul 23 20:39, Wu, Fei wrote:
>>>> [...]
>>>>
>>>> This approach will introduce a bunch of new vlen Vector IRs, especially
>>>> the arithmetic IRs such as vadd, my goal is for a good solution which
>>>> takes reasonable time to reach usable status, yet still be able to
>>>> evolve and generic enough for other vector ISA. Any comments?
>
> This personally looks to me as a right direction. Supporting scalable
> vector extensions in Valgrind as a first-class citizen would be my
> preferred choice. I think it is something that will be needed to handle
> Arm SVE and RISC-V RVV well. On the other hand, it is likely the most
> complex approach and could take time to iron out.
>
>>> Could you please share a repository with your changes or send them to me
>>> as patches? I have a few questions but I think it might be easier for me
>>> first to see the actual code.
>>>
>> Please see attachment. It's a very raw version to just verify the idea,
>> mask is not added but expected to be done as mentioned above, it's based
>> on commit 71272b2529 on your branch, patch 0013 is the key.
>
> Thanks for sharing this code. The previous discussions and this series
> introduces a new concept of translating client code per some CPU state.
> That is something I spent most time thinking about.
>
> I can see it is indeed necessary for RVV. In particular, this
> "versioning" of translations allows that Valgrind IR can statically
> express an element type of each vector operation, i.e. that it is an
> operation on I32, F64, ... An alternative would be to try to express the
> type dynamically in IR. That should be still somewhat manageable in the
> toIR frontend but I have a hard time seeing how it would work for the
> instrumentation and codegen.
>
> The versioning should work well for RVV translations because my
> expectation is that most RVV loops will consist of a call to vsetvli
> (with a static vtype), followed by some actual vector operations. Such
> a block then requires only one translation.
>
> This is however true only if translations are versioned just per vtype,
> without vl. If I understood correctly, the patches version them per vl
> too but it isn't clear to me conceptually if this is really necessary.
>
Yes, this series does version vl. It helps in situations such as the one in
the last patch: a large vl can be broken into multiple small-vl operations,
in case the backend doesn't have a register-allocation algorithm for LMUL>1.
> For instance, I think VAdd8 could look as follows:
> VAdd8(<len>, <in1>, <in2>, <flags?>) where <len> is something as
> IRExpr_Get(OFFB_VL, Ity_I64).
>
> Another problem which I noticed is that blocks containing no RVV
> instructions are also versioned. Consider the following:
> while (true) {
> // (1) some RVV code which can set vtype to different values
> // (2) a large chunk of non-RVV code
> }
>
> The code in (2) will currently have multiple same translations for each
> residue left in vtype by (1).
>
Yes, indeed. This is one place to optimize.
> In general, I think the concept of allowing translations per some CPU
> state could be useful in other cases and for other architectures too.
> For RISC-V, it could be beneficial for floating-point operations. My
> expectation is that regular RISC-V FP code will have instructions with
> encoded rm=DYN and always executed with frm=RNE. The current approach is
> that the toIR frontend generates an IR which reads the rounding mode
> from frm and remaps it to the Valgrind's representation. The codegen
> then does the opposite. The idea here is that the frontend would know
> the actual rounding mode and could create IR which has directly this
> mode, for instance, AddF64(Irrm_NEAREST, <in1>, <in2>). The codegen then
> doesn't need to know how to handle any dynamic rounding modes as they
> become static.
>
> I plan to look further into this series. Specifically, I'd like to have
> a stab at adding some basic support for Arm SVE to get a better
> understanding if this is generic enough.
>
Great, I will add more RVV support if this proves to be the right
direction. Thank you for the review.
Thanks,
Fei.
> Thanks,
> Petr
From: Petr P. <pet...@da...> - 2023-07-18 19:26:03
On 11. Jul 23 19:28, Wu, Fei wrote:
> On 7/11/2023 4:50 AM, Petr Pavlu wrote:
> > On 6. Jul 23 20:39, Wu, Fei wrote:
> >> [...]
> >>
> >> This approach will introduce a bunch of new vlen Vector IRs, especially
> >> the arithmetic IRs such as vadd, my goal is for a good solution which
> >> takes reasonable time to reach usable status, yet still be able to
> >> evolve and generic enough for other vector ISA. Any comments?
This personally looks to me as a right direction. Supporting scalable
vector extensions in Valgrind as a first-class citizen would be my
preferred choice. I think it is something that will be needed to handle
Arm SVE and RISC-V RVV well. On the other hand, it is likely the most
complex approach and could take time to iron out.
> > Could you please share a repository with your changes or send them to me
> > as patches? I have a few questions but I think it might be easier for me
> > first to see the actual code.
> >
> Please see attachment. It's a very raw version to just verify the idea,
> mask is not added but expected to be done as mentioned above, it's based
> on commit 71272b2529 on your branch, patch 0013 is the key.
Thanks for sharing this code. The previous discussions and this series
introduce a new concept of translating client code per some CPU state.
That is something I spent the most time thinking about.
I can see it is indeed necessary for RVV. In particular, this
"versioning" of translations allows the Valgrind IR to statically
express the element type of each vector operation, i.e. that it is an
operation on I32, F64, ... An alternative would be to try to express the
type dynamically in IR. That should still be somewhat manageable in the
toIR frontend, but I have a hard time seeing how it would work for the
instrumentation and codegen.
The versioning should work well for RVV translations because my
expectation is that most RVV loops will consist of a call to vsetvli
(with a static vtype), followed by some actual vector operations. Such
a block then requires only one translation.
This is however true only if translations are versioned just per vtype,
without vl. If I understood correctly, the patches version them per vl
too, but it isn't clear to me conceptually whether this is really necessary.
For instance, I think VAdd8 could look as follows:
VAdd8(<len>, <in1>, <in2>, <flags?>), where <len> is something like
IRExpr_Get(OFFB_VL, Ity_I64).
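To make the intended semantics of such a dynamic-length op concrete, here is a small C reference model of VAdd8 (the function name and the tail-undisturbed policy are my assumptions for illustration, not something the patches define):

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical reference semantics for VAdd8(<len>, <in1>, <in2>):
   element-wise 8-bit add over the first `vl` elements, where `vl` is
   read from the guest state at runtime; elements past `vl` keep the
   old destination value (tail-undisturbed, assumed here). */
static void vadd8_ref(uint8_t *dst, const uint8_t *in1, const uint8_t *in2,
                      uint64_t vl, size_t vlenb)
{
    for (size_t i = 0; i < vlenb; i++)
        if (i < vl)
            dst[i] = (uint8_t)(in1[i] + in2[i]);
    /* i >= vl: dst[i] is left unchanged */
}
```

The point is that only `vl` is dynamic; the element type (8-bit) stays static in the op, so instrumentation and codegen can still reason about it per element.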
Another problem which I noticed is that blocks containing no RVV
instructions are also versioned. Consider the following:
while (true) {
// (1) some RVV code which can set vtype to different values
// (2) a large chunk of non-RVV code
}
The code in (2) will currently get multiple identical translations, one
for each residue value left in vtype by (1).
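The effect of the keying choice can be modeled in a few lines of C (the key struct and counting function are hypothetical; the real translation cache is more involved). For a strip-mined loop, keying on (pc, vtype) needs one translation, while also keying on vl adds another one for the tail iteration:

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical execution event: a block at `pc` runs under some vtype
   and vl. */
typedef struct { uint64_t pc, vtype, vl; } Exec;

/* Count distinct translations needed for a trace, keying either on
   (pc, vtype) or on (pc, vtype, vl). A linear scan is fine for a
   sketch. */
static int count_translations(const Exec *trace, int n, int key_on_vl)
{
    Exec seen[64];
    int nseen = 0;
    for (int i = 0; i < n; i++) {
        int found = 0;
        for (int j = 0; j < nseen; j++)
            if (seen[j].pc == trace[i].pc &&
                seen[j].vtype == trace[i].vtype &&
                (!key_on_vl || seen[j].vl == trace[i].vl))
                found = 1;
        if (!found)
            seen[nseen++] = trace[i];
    }
    return nseen;
}
```

For a loop whose full iterations run with vl=8 and whose tail runs with vl=3, the (pc, vtype) policy reuses one translation, while the (pc, vtype, vl) policy generates two.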
In general, I think the concept of allowing translations per some CPU
state could be useful in other cases and for other architectures too.
For RISC-V, it could be beneficial for floating-point operations. My
expectation is that regular RISC-V FP code will have instructions with
encoded rm=DYN and always executed with frm=RNE. The current approach is
that the toIR frontend generates an IR which reads the rounding mode
from frm and remaps it to Valgrind's representation. The codegen
then does the opposite. The idea here is that the frontend would know
the actual rounding mode and could create IR which has directly this
mode, for instance, AddF64(Irrm_NEAREST, <in1>, <in2>). The codegen then
doesn't need to know how to handle any dynamic rounding modes as they
become static.
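If translations were versioned per frm, the frontend could perform this remapping statically at translation time. A sketch of the mapping (the enum here is illustrative and mirrors the idea of VEX's IR rounding modes, not their actual values; frm encodings are from the RISC-V F extension):

```c
#include <assert.h>

/* Illustrative IR-level rounding modes; not VEX's actual encoding. */
typedef enum {
    Irrm_NEAREST,            /* frm = 0b000 (RNE) */
    Irrm_ZERO,               /* frm = 0b001 (RTZ) */
    Irrm_NegINF,             /* frm = 0b010 (RDN) */
    Irrm_PosINF,             /* frm = 0b011 (RUP) */
    Irrm_NEAREST_TIE_AWAY_0, /* frm = 0b100 (RMM) */
    Irrm_INVALID
} IRRoundingModeSketch;

/* Statically remap a known frm value to an IR rounding mode, so the
   frontend can emit e.g. AddF64(Irrm_NEAREST, <in1>, <in2>) directly. */
static IRRoundingModeSketch remap_frm(unsigned frm)
{
    switch (frm) {
    case 0: return Irrm_NEAREST;
    case 1: return Irrm_ZERO;
    case 2: return Irrm_NegINF;
    case 3: return Irrm_PosINF;
    case 4: return Irrm_NEAREST_TIE_AWAY_0;
    default: return Irrm_INVALID; /* 5-6 reserved; 7 is DYN itself */
    }
}
```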
I plan to look further into this series. Specifically, I'd like to have
a stab at adding some basic support for Arm SVE to get a better
understanding if this is generic enough.
Thanks,
Petr
|
|
From: Wu, F. <fe...@in...> - 2023-07-18 01:44:56
|
On 7/11/2023 7:28 PM, Wu, Fei wrote:
> On 7/11/2023 4:50 AM, Petr Pavlu wrote:
>> On 6. Jul 23 20:39, Wu, Fei wrote:
>>> [...]
>>> This approach will introduce a bunch of new vlen Vector IRs, especially
>>> the arithmetic IRs such as vadd, my goal is for a good solution which
>>> takes reasonable time to reach usable status, yet still be able to
>>> evolve and generic enough for other vector ISA. Any comments?
>>
>> Could you please share a repository with your changes or send them to me
>> as patches? I have a few questions but I think it might be easier for me
>> first to see the actual code.
>>
> Please see attachment. It's a very raw version to just verify the idea,
> mask is not added but expected to be done as mentioned above, it's based
> on commit 71272b2529 on your branch, patch 0013 is the key.

Hi Petr,

Have you taken a look? Any comments?

Thanks,
Fei.

> btw, I will setup a repository but it takes a few days to pass the
> internal process.
>
> Thanks,
> Fei.
>
>> Thanks,
>> Petr
|
|
From: Jojo R <rj...@li...> - 2023-07-17 07:06:22
|
Hi,
Sorry for the late reply, I have been pushing the progress of the
Valgrind RVV implementation 😄
We finished the first version and tested it with the full RVV
intrinsics spec. For real projects and developers, we implemented the
first usable, fully functional RVV Valgrind with the dirty-call method,
and we will experiment with and optimize the RVV implementation on an
ideal RVV design.
Back to the RVV RFC, we are happy to share our design thinking, see the
attachment for more details :)
Regards
--Jojo
On 2023/4/21 17:25, Jojo R wrote:
>
> Hi,
>
> We are considering adding the RVV/Vector [1] feature to Valgrind; there
> are some challenges.
> RVV, like Arm's SVE [2] programming model, is scalable/VLA, which means
> the vector length is agnostic.
> Arm's SVE is not supported in Valgrind :(
>
> There are three major issues in implementing RVV instruction set in
> Valgrind as following:
>
> 1. Scalable vector register width VLENB
> 2. Runtime changing property of LMUL and SEW
> 3. Lack of proper VEX IR to represent all vector operations
>
> We propose applicable methods to solve 1 and 2. As for 3, we explore
> several possible but maybe imperfect approaches to handle different cases.
>
> We start with 1. As each guest register must be described in the
> VEXGuestState struct, the vector registers with a scalable width of
> VLENB can be added to VEXGuestState as arrays, using an allowable
> maximum length like 2048/4096.
>
> The actual available access range can be determined at Valgrind
> startup time by querying the CPU for its vector capability or some
> suitable setup steps.
>
>
> To solve problem 2, we are inspired by already-proven techniques in
> QEMU, where translation blocks are broken up when certain critical
> CSRs are set. Because the guest code to IR translation relies on the
> precise value of LMUL/SEW and they may change within a basic block, we
> can break up the basic block each time encountering a vsetvl{i}
> instruction and return to the scheduler to execute the translated code
> and update LMUL/SEW. Accordingly, translation cache management should
> be refactored to detect the changing of LMUL/SEW to invalidate
> outdated code cache. Without loss of generality, the LMUL/SEW
> should be encoded into a ULong flag such that other architectures can
> leverage this flag to store their arch-dependent information. The
> TTEntry struct should also take the flag into account on both
> insertion and deletion. By doing this, the flag carries the newest
> LMUL/SEW throughout the simulation and can be passed to disassemble
> functions using the VEXArchInfo struct such that we can get the real
> and newest value of LMUL and SEW to facilitate our translation.
>
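The flag packing described above can be sketched in a few lines of C. The field layout below is illustrative, not the actual vtype CSR encoding or Valgrind's TTEntry layout:

```c
#include <stdint.h>
#include <assert.h>

/* Sketch: pack the dynamic SEW/LMUL state into a 64-bit translation
   flag so the translation cache can key on it. Layout is assumed:
   SEW (in bits) in bits [23:8], LMUL in bits [7:0]. */
static uint64_t make_ttflag(unsigned sew_bits, unsigned lmul)
{
    return ((uint64_t)sew_bits << 8) | lmul;
}

static unsigned flag_sew(uint64_t f)  { return (unsigned)(f >> 8) & 0xffff; }
static unsigned flag_lmul(uint64_t f) { return (unsigned)(f & 0xff); }
```

Two translations of the same guest address with different SEW/LMUL then get different cache keys and never alias.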
> Also, some architecture-related code should be taken care of. Like
> m_dispatch part, disp_cp_xindir function looks up code cache using
> hardcoded assembly by checking the requested guest state IP and
> translation cache entry address with no more constraints. Many other
> modules should be checked to ensure the in-time update of LMUL/SEW is
> instantly visible to essential parts in Valgrind.
>
>
> The last remaining big issue is 3, which we introduce some ad-hoc
> approaches to deal with. We summarize these approaches into three
> types as following:
>
> 1. Break down a vector instruction to scalar VEX IR ops.
> 2. Break down a vector instruction to fixed-length VEX IR ops.
> 3. Use dirty helpers to realize vector instructions.
>
> The very first method theoretically exists but is probably not
> applicable, as the number of IR ops explodes when a large VLENB is
> adopted. Imagine a configuration of VLENB=512, SEW=8, LMUL=8: the VL
> is 512 * 8 / 8 = 512, meaning that a single vector instruction turns
> into 512 scalar instructions, and each scalar instruction would be
> expanded to multiple IRs. To make things worse, the tool
> instrumentation will insert more IRs between adjacent scalar IR ops.
> As a result, performance is likely to be slowed down a thousand
> times when running a real-world application with lots of vector
> instructions. Therefore, the other two methods are more promising and
> we will discuss them below.
>
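The blow-up arithmetic above can be written as a one-line helper (this reads the quoted "VLENB=512" as a 512-bit vector register width, which is what makes the 512 * 8 / 8 = 512 figure come out):

```c
#include <assert.h>

/* Maximum number of elements a single vector instruction processes:
   VLEN (bits) * LMUL / SEW (bits). */
static unsigned max_vl(unsigned vlen_bits, unsigned lmul, unsigned sew_bits)
{
    return vlen_bits * lmul / sew_bits;
}
```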
> 2 and 3 are not mutually exclusive as we may choose a suitable method
> from them to implement a vector instruction regarding its concrete
> behavior. To explain these methods in detail, we present some
> instances to illustrate their pros and cons.
>
> In terms of method 2, we have real values of VLENB/LMUL/SEW. The
> simple case is VLENB <= 256 and LMUL=1, where many SIMD IR ops are
> available and can be directly applied to represent vector operations.
> However, even when VLENB is restricted to 128, it still exceeds the
> maximum SIMD width of 256 supported by VEX IR if LMUL>2. Hence, here
> are two variants of method 2 to deal with long vectors:
>
>
> *2.1* Add more SIMD IR ops such as 1024/2048/4096, and translate vector
> instructions in the granularity of VLENB. Accordingly, VLENB=4096 with
> LMUL=2 is fulfilled by two 4096 SIMD VEX IR ops.
>
> * *pros*: it encourages the VEX backend to generate more compact and
> efficient SIMD code (maybe). Particularly, it accommodates mask and
> gather/scatter (indexed) instructions by delivering more
> information in the IR itself.
> * *cons*: too many new IR ops need to be introduced in VEX as each
> op of different length should implement its add/sub/mul variants.
> New data types to denote long vectors are necessary too, causing
> difficulties in both VEX backend register allocation and tool
> instrumentation.
>
> *2.2* Break down long vectors to multiple repeated SIMD ops. For
> instance, a vadd.vv vector instruction with VLENB=256/LMUL=2/SEW=8 is
> composed of four operators of Iop_Add8x16 type.
>
> * *pros:* less effort is required in register allocation and tool
> instrumentation. The VEX frontend is able to notify the backend to
> generate efficient vector instructions via existing Iops. It better
> trades off the complexity of adding many long vector IR ops against
> the benefit of generating high-efficiency host code.
> * *cons:* it is hard to describe a mask operation given that the mask
> is pretty flexible (the least significant bit of each segment of
> v0). Additionally, gather/scatter instructions may have similar
> problems in appropriately dividing index registers. There are
> various corner cases left here, such as widening arithmetic
> operations (widening SIMD IR ops are currently not compatible) and
> the vstart CSR register. When using fixed-length IR ops to compose a
> vector instruction, we inevitably have to tell each IR op at which
> position, encoded in vstart, it may start to process the data. We
> can use vstart as a normal guest-state virtual register to
> calculate each op's start position as a guard IRExpr, or obtain the
> value of vstart as we do for LMUL/SEW. Nevertheless, it is
> non-trivial to decompose a vector instruction concisely.
>
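The decomposition count for method 2.2 is simple to state as a helper (illustrative only; it ignores the mask, vstart, and widening corner cases the cons list raises):

```c
#include <assert.h>

/* Number of fixed-width SIMD ops (simd_bits wide each) needed to cover
   one vector instruction of total width VLEN (bits) * LMUL. */
static unsigned num_simd_ops(unsigned vlen_bits, unsigned lmul,
                             unsigned simd_bits)
{
    return vlen_bits * lmul / simd_bits;
}
```

For the example above, a vadd.vv with a 256-bit VLEN, LMUL=2, SEW=8 needs 256 * 2 / 128 = 4 ops of the 128-bit Iop_Add8x16 type.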
> In short, both 2.1 and 2.2 confront a dilemma between refactoring
> Valgrind elegantly with reasonable engineering effort and implementing
> the vector instruction set efficiently. The same obstacles exist for
> Arm SVE, as its instructions are also scalable vectors and flexible in
> many ways.
>
> The final solution is the dirty helper. It is undoubtedly practical
> and requires possibly the least engineering effort in dealing with the
> many details in Valgrind. In this design, each instruction is
> implemented using inline assembly that runs the same instruction on the
> host. Moreover, tool instrumentation already handles IRDirty, except
> that new fields should be added to the _IRDirty struct to indicate
> strided/indexed/masked memory accesses and arithmetic operations.
>
> * *pros:* it supports all instructions without bothering to build
> complicated IR expressions and statements. It executes vector
> instructions on the host CPU to get some acceleration. Besides, we
> do not need to extend the VEX backend to translate new IRs into
> vector instructions.
> * *cons:* the dirty helper always keeps its operations in a black box,
> so tools can never see what happens inside it. In memcheck, for
> example, the bit-precision merit is lost once execution meets a
> dirty helper, as the V-bit propagation chain adopts a pretty coarse
> determination strategy. On the other hand, it is also not an
> elegant way to implement an entire ISA extension in dirty helpers.
>
> In summary, we are still far from a truly applicable solution for
> adding vector extensions to Valgrind. We need to do detailed and
> comprehensive evaluations of the different vector instruction
> categories.
>
> Any feedback is welcome in github [3] also.
>
>
> [1] https://github.com/riscv/riscv-v-spec
>
> [2]
> https://community.arm.com/arm-research/b/articles/posts/the-arm-scalable-vector-extension-sve
>
> [3] https://github.com/petrpavlu/valgrind-riscv64/issues/17
>
>
> Thanks.
>
> Jojo
>
>
>
> _______________________________________________
> Valgrind-developers mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-developers |
|
From: Wu, F. <fe...@in...> - 2023-07-11 11:29:25
|
On 7/11/2023 4:50 AM, Petr Pavlu wrote:
> On 6. Jul 23 20:39, Wu, Fei wrote:
>> [...]
>> This approach will introduce a bunch of new vlen Vector IRs, especially
>> the arithmetic IRs such as vadd, my goal is for a good solution which
>> takes reasonable time to reach usable status, yet still be able to
>> evolve and generic enough for other vector ISA. Any comments?
>
> Could you please share a repository with your changes or send them to me
> as patches? I have a few questions but I think it might be easier for me
> first to see the actual code.

Please see attachment. It's a very raw version to just verify the idea,
mask is not added but expected to be done as mentioned above, it's based
on commit 71272b2529 on your branch, patch 0013 is the key.

btw, I will setup a repository but it takes a few days to pass the
internal process.

Thanks,
Fei.

> Thanks,
> Petr
|
|
From: Petr P. <pet...@da...> - 2023-07-10 21:06:01
|
On 6. Jul 23 20:39, Wu, Fei wrote:
> On 5/29/2023 11:29 AM, Wu, Fei wrote:
> > On 5/28/2023 1:06 AM, Petr Pavlu wrote:
> >> On 21. Apr 23 17:25, Jojo R wrote:
> >>> [...]
>
> I did a very basic prototype for vlen Vector-IR, particularly on RISC-V
> Vector (RVV):
>
> * Define new iops such as Iop_VAdd8/16/32/64, the difference from
> existing SIMD version is that no element number is specified like
> Iop_Add8x32
>
> * Define new IR type Ity_VLen alongside existing types such as Ity_I64,
> Ity_V256
>
> * Define new class HRcVecVLen in HRegClass for vlen vector registers
>
> The real length is embedded in both IROp and IRType for vlen ops/types,
> it's runtime-decided and already known when handling insn such as vadd,
> this leads to more flexibility, e.g. backend can issue extra vsetvl if
> necessary.
>
> With the above, RVV instruction in the guest can be passed from
> frontend, to memcheck, to the backend, and generate the final RVV insn
> during host isel, a very basic testcase has been tested.
>
> Now here comes to the complexities:
>
> 1. RVV has the concept of LMUL, which groups multiple (or partial)
> vector registers, e.g. when LMUL==2, v2 means the real v2+v3. This
> complicates the register allocation.
>
> 2. RVV uses the "implicit" v0 for mask, its content must be loaded to
> the exact "v0" register instead of any other ones if host isel wants to
> leverage RVV insn, this implicitness in ISA requires more explicitness
> in Valgrind implementation.
>
> For #1 LMUL, a new register allocation algorithm for it can be added,
> and it will be great if someone is willing to try it, I'm not sure how
> much effort it will take. The other way is splitting it into multiple
> ops which only takes one vector register, taking vadd for example, 2
> vadd will run with LMUL=1 for one vadd with LMUL=2, this is still okay
> for the widening insn, most of the arithmetic insns can be covered in
> this way. The exception could be register gather insn vrgather, which we
> can consult other ways for it, e.g. scalar or helper.
>
> For #2 v0 mask, one way is to handle the mask in the very beginning at
> guest_riscv64_toIR.c, similar to what the AVX port does:
>
> a) Read the whole dest register without mask
> b) Generate unmasked result by running op without mask
> c) Apply mask to a, b and generate the final dest
>
> by doing this, insn with mask is converted to non-mask ones, although
> more insns are generated but the performance should be acceptable. There
> are still exceptions, e.g. vadc (Add-with-Carry), v0 is not used as mask
> but as carry, but just as mentioned above, it's okay to use other ways
> for a few insns. Eventually, we can pass v0 mask down to the backend if
> it's proved a better solution.
>
> This approach will introduce a bunch of new vlen Vector IRs, especially
> the arithmetic IRs such as vadd, my goal is for a good solution which
> takes reasonable time to reach usable status, yet still be able to
> evolve and generic enough for other vector ISA. Any comments?

Could you please share a repository with your changes or send them to me
as patches? I have a few questions but I think it might be easier for me
first to see the actual code.

Thanks,
Petr
|
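The a/b/c mask handling quoted above (read old dest, compute the unmasked result, merge under v0) can be sketched for 8-bit elements as follows (function name and the bit-per-element mask layout are my assumptions for illustration):

```c
#include <stdint.h>
#include <assert.h>

/* Sketch of the a/b/c steps: for each active element (mask bit set in
   v0, one bit per element), take the unmasked result; otherwise keep
   the old destination value. */
static void mask_merge8(uint8_t *dst, const uint8_t *old_dst,
                        const uint8_t *unmasked, const uint8_t *v0mask,
                        unsigned vl)
{
    for (unsigned i = 0; i < vl; i++) {
        unsigned bit = (v0mask[i / 8] >> (i % 8)) & 1;
        dst[i] = bit ? unmasked[i] : old_dst[i];
    }
}
```

Expressed this way, a masked instruction reduces to an unmasked op plus a select, which is exactly what lets the frontend lower masking without backend support.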