Re: [Valgrind-developers] ppc64le results

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi Carl,

Sorry for the delay.  I'm currently away for the next couple weeks, however
I was able to take a look at these regressions.

It looks like debuginfo is not always lazily loaded on ppc64le since it's
possible for neither describe_IP or find_DiCfSI to be called before symtab
lookups during stacktrace.  describe_IP and find_DiCfSI contain calls
to lazily load debuginfo, so if they are not called before stacktrace printing
it results in missing debuginfo and lower quality stacktraces.

I've attached a patch that fixed the regressions for me when I tested this on
a ppc64le machine.  It adds lazy debuginfo loading during ppc
get_StackTrace_wrk.

Aaron

On Tue, Aug 29, 2023 at 3:15 PM Carl Love <ce...@us...> wrote:
>
> Mark, Paul, Aaron:
>
> On Sun, 2023-08-27 at 17:36 +0200, Mark Wielaard wrote:
> > Hi Paul,
> >
> > On Sat, Aug 26, 2023 at 06:26:29AM +0200, Paul Floyd wrote:
> > > I was just looking at valgrind-testresults and there was a jump in
> > > the number of failures on ppc64le on Aug 17th, just after the
> > > deferred debuginfo reading change.
> > >
> > > One random example
> > >
> > > =================================================
> > > ./valgrind-old/drd/tests/tc16_byterace.stderr.diff
> > > =================================================
> > > --- tc16_byterace.stderr.exp    2023-08-17 03:01:09.168107928 +0000
> > > +++ tc16_byterace.stderr.out    2023-08-17 03:28:20.030515805 +0000
> > > @@ -1,8 +1,7 @@
> > >
> > >  Conflicting load by thread 1 at 0x........ size 1
> > >     at 0x........: main (tc16_byterace.c:34)
> > > -Location 0x........ is 0 bytes inside bytes[4],
> > > -a global variable declared at tc16_byterace.c:7
> > > +Allocation context: BSS section of tc16_byterace
> > >
> > > It does look to me like this is a debuginfo issue.
> >
> > You are correct, this was caused by:
> >
> > commit 60f7e89ba32b54d73b9e36d49e28d0f559ade0b9
> > Author: Aaron Merey <am...@re...>
> > Date:   Fri Jun 30 18:31:42 2023 -0400
> >
> >     Support lazy reading and downloading of DWARF debuginfo
> >
> > That commit shouldn't have been architecture specific, but it
> > apparently was. I put some early analysis into the bug
> >
> > https://bugs.kde.org/show_bug.cgi?id=471807#c16
> >
> > The patch depends on a call to find_DiCfSI triggering a full
> > debuginfo load.
> > find_DiCfSI is (indirectly called) when ML_(get_CFA) is called.
> > It looks like ppc64le doesn't call ML_(get_CFA) because we have the
> > following in
> > coregrind/m_debuginfo/d3basics.c
> >
> > #if defined(VGP_ppc32_linux) || defined(VGP_ppc64be_linux) \
> >     || defined(VGP_ppc64le_linux)
> >             /* Valgrind on ppc32/ppc64 currently doesn't use unwind
> > info. */
> >             uw1 = ML_(read_Addr)((UChar*)regs->sp);
> > #else
> >             uw1 = ML_(get_CFA)(regs->ip, regs->sp, regs->fp, 0,
> > ~(UWord) 0);
> > #endif
>
> I verified that the patch from Aaron causes regression failures on
> Power 9 and Power 10.  Per the comment above, not sure why PowerPC does
> not support the get_CFA call?  Unfortunately, I don't know much about
> callgrind or the debuginfo stuff.  Not obvious to me at first glance
> how to fix the issue.
>
> I would be happy to help test a patch or work on a patch if someone has
> specific suggestions on how to fix the issue on PowerPC.
>
>                                Carl
>