|
From: Aaron M. <am...@re...> - 2023-08-30 19:10:08
|
Hi Carl, Sorry for the delay. I'm currently away for the next couple weeks, however I was able to take a look at these regressions. It looks like debuginfo is not always lazily loaded on ppc64le since it's possible for neither describe_IP or find_DiCfSI to be called before symtab lookups during stacktrace. describe_IP and find_DiCfSI contain calls to lazily load debuginfo, so if they are not called before stacktrace printing it results in missing debuginfo and lower quality stacktraces. I've attached a patch that fixed the regressions for me when I tested this on a ppc64le machine. It adds lazy debuginfo loading during ppc get_StackTrace_wrk. Aaron On Tue, Aug 29, 2023 at 3:15 PM Carl Love <ce...@us...> wrote: > > Mark, Paul, Aaron: > > On Sun, 2023-08-27 at 17:36 +0200, Mark Wielaard wrote: > > Hi Paul, > > > > On Sat, Aug 26, 2023 at 06:26:29AM +0200, Paul Floyd wrote: > > > I was just looking at valgrind-testresults and there was a jump in > > > the number of failures on ppc64le on Aug 17th, just after the > > > deferred debuginfo reading change. > > > > > > One random example > > > > > > ================================================= > > > ./valgrind-old/drd/tests/tc16_byterace.stderr.diff > > > ================================================= > > > --- tc16_byterace.stderr.exp 2023-08-17 03:01:09.168107928 +0000 > > > +++ tc16_byterace.stderr.out 2023-08-17 03:28:20.030515805 +0000 > > > @@ -1,8 +1,7 @@ > > > > > > Conflicting load by thread 1 at 0x........ size 1 > > > at 0x........: main (tc16_byterace.c:34) > > > -Location 0x........ is 0 bytes inside bytes[4], > > > -a global variable declared at tc16_byterace.c:7 > > > +Allocation context: BSS section of tc16_byterace > > > > > > It does look to me like this is a debuginfo issue. > > > > You are correct, this was caused by: > > > > commit 60f7e89ba32b54d73b9e36d49e28d0f559ade0b9 > > Author: Aaron Merey <am...@re...> > > Date: Fri Jun 30 18:31:42 2023 -0400 > > > > Support lazy reading and downloading of DWARF debuginfo > > > > That commit shouldn't have been architecture specific, but it > > apparently was. I put some early analysis into the bug > > > > https://bugs.kde.org/show_bug.cgi?id=471807#c16 > > > > The patch depends on a call to find_DiCfSI triggering a full > > debuginfo load. > > find_DiCfSI is (indirectly called) when ML_(get_CFA) is called. > > It looks like ppc64le doesn't call ML_(get_CFA) because we have the > > following in > > coregrind/m_debuginfo/d3basics.c > > > > #if defined(VGP_ppc32_linux) || defined(VGP_ppc64be_linux) \ > > || defined(VGP_ppc64le_linux) > > /* Valgrind on ppc32/ppc64 currently doesn't use unwind > > info. */ > > uw1 = ML_(read_Addr)((UChar*)regs->sp); > > #else > > uw1 = ML_(get_CFA)(regs->ip, regs->sp, regs->fp, 0, > > ~(UWord) 0); > > #endif > > I verified that the patch from Aaron causes regression failures on > Power 9 and Power 10. Per the comment above, not sure why PowerPC does > not support the get_CFA call? Unfortunately, I don't know much about > callgrind or the debuginfo stuff. Not obvious to me at first glance > how to fix the issue. > > I would be happy to help test a patch or work on a patch if someone has > specific suggestions on how to fix the issue on PowerPC. > > Carl > |