From: Adhemerval Z. N. <adh...@li...> - 2025-05-09 14:29:31
On 25/04/25 19:06, Mark Wielaard wrote:
> Hi all,
>
> On Fri, Apr 18, 2025 at 05:46:54PM +0200, Mark Wielaard wrote:
>> On Mon, 2025-04-14 at 14:06 +0200, Mark Wielaard wrote:
>>> On Sun, Apr 06, 2025 at 03:23:54PM +0200, Mark Wielaard wrote:
>>>> On Mon, Mar 31, 2025 at 11:29:41AM +0200, Mark Wielaard wrote:
>>>>> On Fri, Mar 28, 2025 at 07:02:28PM +0100, Mark Wielaard wrote:
>>>>>> On Fri, 2025-03-21 at 14:01 +0100, Florian Weimer wrote:
>>>>>>> Without this change, the system call wrapper function is not visible
>>>>>>> on the stack at the time of the system call, which causes problems
>>>>>>> for interception tools such as valgrind.
>>>>>>>
>>>>>>> Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix
>>>>>>> Race conditions in pthread cancellation [BZ#12683]").
>>>>>>>
>>>>>>> Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu.
>>>>>>> (We're still discussing if valgrind needs this, but if it does, here's a
>>>>>>> patch.)
>>>>>>
>>>>>> I implemented the valgrind part of skipping the syscall_cancel frames
>>>>>> here: https://bugs.kde.org/show_bug.cgi?id=502126#c2
>>>>>> And there is a valgrind package build for fedora rawhide:
>>>>>> https://koji.fedoraproject.org/koji/buildinfo?buildID=2687393
>>>>>>
>>>>>> For ppc64le, s390x and x86_64 that patch seems enough.
>>>>>>
>>>>>> For i686 and aarch64 there does seem to be an issue with missing the
>>>>>> glibc calling function because of a tail call.
>>>>>>
>>>>>> Also on i686 there is another extra frame on top __libc_do_syscall.
>>>>>
>>>>> I extended the patch to cover some extra sycall wrapper function
>>>>> symbols on i386 and armhf and pushed it to valgrind trunk and
>>>>> VALGRIND_3_24_BRANCH. There are builds for fedora rawhide and
>>>>> f42. This does seem to show that only on arm64 the tail calls
>>>>> obscure observing the full call stack.
>>>>
>>>> This has now landed in fedora rawhide and f42.
>>>> Test results look good,
>>>> except for some if the arm64 tests where the tail calls obscure
>>>> observing the full call stack. Please let me know if you need any more
>>>> input from us to get this fix in glibc.
>>>
>>> Please let me know. Valgrind test results for syscall backtraces on
>>> anything except arm64 look good. We are working on valgrind 3.25.0
>>> now, to be released around April 24.
>>
>> valgrind 3.25.0-RC1 has been released and test results look good on
>> most arches. arm64 does show the issue described above where the tail
>> calls obscure observing the full call stack when doing system calls.
>
> valgrind 3.25.0 have been released and is now in Fedora rawhide and
> Fedora 42 with the new glibc syscall_cancel frames. The tail calls on
> aarch64 still seem to be a problem for observability of the syscall
> call stack.
>

I am trying to check whether the patch to inline the cancellation wrappers [1]
would help, but I am not sure how exactly valgrind would handle stack frames
that are artificial and exist only in the debug information.

With the patch applied, both x86_64 and aarch64 should inline the
syscall_cancel and internal_syscall_cancel calls, requiring an extra
__syscall_cancel_arch call only when the process is multithreaded.

On x86_64 it shows as expected:

valgrind-git (x86_64)$ ./coregrind/valgrind memcheck/tests/sendmsg
[...]
==2131145== Syscall param sendmsg(msg) points to uninitialised byte(s)
==2131145==    at 0x4972EE0: syscall_cancel (sysdep-cancel.h:83)
==2131145==    by 0x4972EE0: sendmsg (sendmsg.c:28)
==2131145==    by 0x4001332: main (sendmsg.c:46)
==2131145==  Address 0x1ffefff850 is on thread 1's stack
==2131145==  in frame #1, created by main (sendmsg.c:13)
[...]

But on aarch64 it shows internal_syscall_cancel, which is indeed inlined:

valgrind-git (aarch64)$ ./coregrind/valgrind memcheck/tests/sendmsg
[...]
==483437== Syscall param sendmsg(msg) points to uninitialised byte(s)
==483437==    at 0x49D250C: internal_syscall_cancel (sysdep-cancel.h:44)
==483437==    by 0x49D250C: syscall_cancel (sysdep-cancel.h:79)
==483437==    by 0x49D250C: sendmsg (sendmsg.c:28)
==483437==    by 0x4000B4B: main (sendmsg.c:46)
==483437==  Address 0x1ffefffaf8 is on thread 1's stack
==483437==  in frame #1, created by main (sendmsg.c:13)

I am not sure whether valgrind considers this an error, or whether it should be
valgrind or the compiler that handles this correctly. I am not aware of any
attribute that can be used to 'hide' internal_syscall_cancel in this case, or
even whether that makes sense.

[1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/cancel-wrappers-inline
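
A side note on reading the aarch64 report: the three innermost entries all carry the same address (0x49D250C), which is consistent with them being inline frames reconstructed from debug information rather than separate call frames. The following is only a rough Python sketch for grouping the printed frames by address. The helper names parse_frames and group_by_pc are hypothetical and not part of valgrind or glibc, and the sketch assumes the "at/by ADDRESS: function (file:line)" layout shown in the report.

```python
import re
from itertools import groupby

# Matches valgrind frame lines such as:
#   ==483437==    at 0x49D250C: internal_syscall_cancel (sysdep-cancel.h:44)
FRAME_RE = re.compile(r"==\d+==\s+(?:at|by) (0x[0-9A-Fa-f]+): (\S+) \(([^)]*)\)")


def parse_frames(report):
    """Return (address, function, location) tuples, innermost frame first."""
    matches = (FRAME_RE.search(line) for line in report.splitlines())
    return [m.groups() for m in matches if m]


def group_by_pc(frames):
    """Group consecutive frames that were reported at the same address.

    Assumption: frames sharing an address are inline frames, and the last
    function in each group is the one that physically contains the code.
    """
    return [(addr, [func for _, func, _ in group])
            for addr, group in groupby(frames, key=lambda frame: frame[0])]


# Frame lines copied from the aarch64 run quoted in this message.
AARCH64_REPORT = """\
==483437== Syscall param sendmsg(msg) points to uninitialised byte(s)
==483437==    at 0x49D250C: internal_syscall_cancel (sysdep-cancel.h:44)
==483437==    by 0x49D250C: syscall_cancel (sysdep-cancel.h:79)
==483437==    by 0x49D250C: sendmsg (sendmsg.c:28)
==483437==    by 0x4000B4B: main (sendmsg.c:46)
"""

groups = group_by_pc(parse_frames(AARCH64_REPORT))
for addr, funcs in groups:
    print(addr, funcs)
```

Under that assumption, the grouping puts internal_syscall_cancel, syscall_cancel and sendmsg in one group at the same address, with main as the only real caller frame. This suggests the extra entry is a debug-information matter and not an additional frame on the stack.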