From: Adhemerval Z. N. <adh...@li...> - 2025-05-13 08:29:50
On 12/05/25 12:12, Mark Wielaard wrote:
> Hi Adhemerval,
>
> On Fri, May 09, 2025 at 11:08:42AM -0300, Adhemerval Zanella Netto wrote:
>> On 25/04/25 19:06, Mark Wielaard wrote:
>>> valgrind 3.25.0 have been released and is now in Fedora rawhide and
>>> Fedora 42 with the new glibc syscall_cancel frames. The tail calls on
>>> aarch64 still seem to be a problem for observability of the syscall
>>> call stack.
>>
>> I am trying to check if patch to inline the cancellation wrappers [1]
>> would help it, but I am not sure how exactly would handle stacktraces
>> that should be artificial and only represented for debug information.
>
> I got that patch [1] working so I could test it myself (only on x86_64
> for now). How do you properly test a glibc installed in a non-default
> location though? Currently I am doing something like:
>
> export LD_LIBRARY_PATH=/usr/local/glibc/lib
> /usr/local/glibc/lib/ld-linux-x86-64.so.2 memcheck/tests/sendmsg
>
> This can also work with valgrind in between because it doesn't use
> glibc dynamically itself. But it is not the easiest way to test
> things.

I used -Wl,--dynamic-linker and -Wl,--rpath at configure time instead:

  --enable-only64bit \
  LDFLAGS="-Wl,--dynamic-linker,/path/to/glibc-build/testroot.pristine/lib/ld-linux-aarch64.so.1 \
  -Wl,--rpath,/path/to/glibc-build/testroot.pristine/lib64"

This won't use the glibc startup objects (crt*.o) or the headers, but on a
recent system that should not be a problem (the difference should be
minimal).

>
>> With the patch applied, both x86_64 and aarch64 should inline the
>> syscall_cancel and internal_syscall_cancel call, only required an
>> extra __syscall_cancel_arch call for the case when the process it
>> multithread.
>>
>> On x86_64 it does shows as expected:
>>
>> valgrind-git (x86_64)$ ./coregrind/valgrind memcheck/tests/sendmsg
>> [...]
>> ==2131145== Syscall param sendmsg(msg) points to uninitialised byte(s)
>> ==2131145==    at 0x4972EE0: syscall_cancel (sysdep-cancel.h:83)
>> ==2131145==    by 0x4972EE0: sendmsg (sendmsg.c:28)
>> ==2131145==    by 0x4001332: main (sendmsg.c:46)
>> ==2131145==  Address 0x1ffefff850 is on thread 1's stack
>> ==2131145==  in frame #1, created by main (sendmsg.c:13)
>> [...]
>>
>> But on aarch64 it shows internal_syscall_cancel, which is indeed
>> inlined:
>>
>> valgrind-git (aarch64)$ ./coregrind/valgrind memcheck/tests/sendmsg
>> [...]
>> ==483437== Syscall param sendmsg(msg) points to uninitialised byte(s)
>> ==483437==    at 0x49D250C: internal_syscall_cancel (sysdep-cancel.h:44)
>> ==483437==    by 0x49D250C: syscall_cancel (sysdep-cancel.h:79)
>> ==483437==    by 0x49D250C: sendmsg (sendmsg.c:28)
>> ==483437==    by 0x4000B4B: main (sendmsg.c:46)
>> ==483437==  Address 0x1ffefffaf8 is on thread 1's stack
>> ==483437==  in frame #1, created by main (sendmsg.c:13)
>>
>> I am not sure if valgrind consider this an error, nor if it should be
>> valgrind or compiler to handle this correctly. I am not aware of any
>> attribute if can properly used to 'hide' internal_syscall_cancel in
>> this case, or even if it makes sense.
>
> It isn't an "error" but any extra frames (either "real" or inlined) on
> top of the function that does the actual system call causes existing
> suppressions and test wrappers to no longer work. So valgrind is
> responsible for filtering them out. For actual extra frames we now do
> (since 3.25.0). If the extra syscall frame wrappers are inlined they
> don't show up if no debuginfo/DWARF is available for glibc. But if it
> is available we'll have to filter them out and/or don't look them up
> for the top-level syscalls.
>
> Selectively removing these inlined calls is a little fragile at the
> moment in valgrind, since it is done at a later time than capturing
> the backtrace addresses. So either we have to pass through that we are
> handling a syscall at the moment and so don't want to "expand" the
> top-level frame with any inlines. And/Or we match on the "magic"
> inlined syscall cancel frames (everywhere).

If I understood correctly, this seems to be an issue only for the valgrind
regression tests, right? If so, could it be handled solely in test
validation? Or does valgrind aim to hide such inlined calls in stack trace
reports as well?

>
> You could use __attribute__(__artificial__)) which should mark the
> function with DW_AT_artificial. Valgrind doesn't know about this
> attribute yet, but we could probably with some extra work.

Yeah, this is the first thing I tried, to see if valgrind could handle it.
Doing this in glibc should be easy enough if you think it would be a way to
handle it in valgrind.

>
> Thanks,
>
> Mark
>
>> [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/cancel-wrappers-inline
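
For reference, a minimal sketch of what marking the inlined wrappers as artificial could look like. It is not the code from the branch in [1]: every demo_* name and DEMO_NR are hypothetical stand-ins, and the body of demo_syscall_arch is placeholder arithmetic, not a real system call. The attribute is spelled with double parentheses, __attribute__((__artificial__)), and GCC documents it for inline functions; whether the resulting debug info carries DW_AT_artificial, and what a consumer does with it, depends on the compiler and on the tool.

```c
#include <assert.h>

/* Hypothetical syscall number used only by this sketch.  */
#define DEMO_NR 1L

/* Hypothetical stand-in for the single out-of-line call that remains
   after inlining.  This is NOT glibc's __syscall_cancel_arch, and it
   does placeholder arithmetic instead of a real system call.  */
__attribute__((__noinline__))
long demo_syscall_arch(long nr, long arg)
{
    assert(nr >= 0);    /* the sketch only uses non-negative numbers */
    return nr + arg;
}

/* Inner wrapper: always inlined and marked artificial, so the compiler
   may describe it as artificial in the debug information or attribute
   its instructions to the caller's location.  */
static inline __attribute__((__always_inline__, __artificial__))
long demo_internal_syscall_cancel(long nr, long arg)
{
    return demo_syscall_arch(nr, arg);
}

/* Outer wrapper, marked the same way.  */
static inline __attribute__((__always_inline__, __artificial__))
long demo_syscall_cancel(long nr, long arg)
{
    return demo_internal_syscall_cancel(nr, arg);
}

/* Public entry point: the frame a report would ideally show on top.  */
long demo_sendmsg(long arg)
{
    return demo_syscall_cancel(DEMO_NR, arg);
}
```

Building this with -g and inspecting the output of readelf --debug-dump=info shows how a given compiler describes the two wrappers, which is one way to check what valgrind would have to match on.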