From: Philippe W. <phi...@sk...> - 2012-02-22 21:25:13
On Tue, 2012-02-21 at 00:19 +0100, Julian Seward wrote:
> > Any idea if the patch below is the way to go ?
> > Or if there is something which I have not understood/did wrong ?
>
> Your analysis + patch sound plausible, is the best I can say.
> You are in unexplored territory -- I don't think anybody has been
> down this road before.
Continuing from this patch: the whole setup was crashing with
SIGSEGV during stack trace production.
After investigation, here is what I believe is happening:
The inner Valgrind has two stacks for each thread it runs:
  * the stack used by Valgrind itself;
  * the stack used by the synthetic cpu (i.e. by the guest code).
When the guest process makes a system call, the inner Valgrind
switches to the synthetic stack (so that the system call is
executed in the desired guest context).
The outer Valgrind detects this stack switch
(and reports "client switching stack?" warning).
This is all ok.
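To make the switch concrete, here is a minimal sketch of the idea; the struct and function names are illustrative, not Valgrind's real data structures, and the address ranges are made up:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: each thread run by the inner Valgrind owns two
 * stacks, and the stack pointer jumps between them around a syscall. */
typedef struct {
   uintptr_t tool_stack_lo,  tool_stack_hi;   /* stack used by Valgrind itself */
   uintptr_t guest_stack_lo, guest_stack_hi;  /* stack of the synthetic cpu    */
   uintptr_t sp;                              /* current stack pointer         */
} InnerThread;

/* On syscall entry the inner Valgrind moves SP onto the synthetic stack;
 * the outer Valgrind sees this jump and warns "client switching stack?". */
static void enter_syscall(InnerThread* t, uintptr_t guest_sp)
{
   assert(guest_sp >= t->guest_stack_lo && guest_sp <= t->guest_stack_hi);
   t->sp = guest_sp;
}

/* On syscall exit SP moves back onto the tool's own stack. */
static void leave_syscall(InnerThread* t, uintptr_t tool_sp)
{
   assert(tool_sp >= t->tool_stack_lo && tool_sp <= t->tool_stack_hi);
   t->sp = tool_sp;
}
```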
Except that when the outer Valgrind has to produce a stack trace, it
does not have correct values for the stack limits: the 'fp_min' value
is the current value of the stack pointer (this is correct), but the
'fp_max' value comes from the stack of the synthetic cpu, whereas
fp_max should differ depending on which of the two stacks the inner
Valgrind is currently running on.
The stack trace unwinding code therefore believes it can keep
unwinding past the end of the stack's segment.
This then causes the SIGSEGV.
I have solved this problem with the following patch (currently
only for the amd64 stack unwinding, but I think it should be
generalised).
The patch below is based on the assumption that, whichever stack is
in use, the fp_max value must lie in the segment that contains the
current stack pointer.
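The clamping logic can be sketched in isolation like this; `Segment` and `find_segment` are stand-ins for Valgrind's NSegment and VG_(am_find_nsegment), with made-up address ranges:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment table entry, loosely mirroring NSegment. */
typedef struct { uintptr_t start, end; } Segment;

/* Stand-in for VG_(am_find_nsegment): segment containing addr, or NULL. */
static const Segment* find_segment(const Segment* segs, size_t n,
                                   uintptr_t addr)
{
   for (size_t i = 0; i < n; i++)
      if (addr >= segs[i].start && addr <= segs[i].end)
         return &segs[i];
   return NULL;
}

/* Clamp fp_max so the unwinder never walks past the segment holding sp. */
static uintptr_t clamp_fp_max(const Segment* segs, size_t n,
                              uintptr_t sp, uintptr_t fp_max)
{
   const Segment* stack = find_segment(segs, n, sp);
   if (stack != NULL && fp_max > stack->end)
      fp_max = stack->end;
   return fp_max;
}
```

Whichever of the two stacks sp currently points into, fp_max ends up bounded by that stack's own segment, so the unwind loop stops at the segment end instead of running off it.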
With this patch, I was able to run (on amd64) helgrind over none,
helgrind over memcheck, memcheck over memcheck, ...
I have annotated the pipe lock and the futex lock with RWLOCK
annotations. With these, helgrind detects some (but not many) possible
data races (on the current trunk, i.e. the "single threaded" version).
(Without annotations, helgrind detected thousands of problems, which
is not surprising.)
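For reference, this is the shape of such an annotation; the ANNOTATE_RWLOCK_* macro names come from valgrind/helgrind.h, but the lock here is an ordinary pthread mutex standing in for the inner Valgrind's pipe/futex locks, and the no-op fallbacks keep the sketch self-contained when the header is unavailable:

```c
#include <pthread.h>

#if defined(USE_HELGRIND_H)
#  include <valgrind/helgrind.h>
#else   /* compile to no-ops outside a helgrind build */
#  define ANNOTATE_RWLOCK_CREATE(l)       ((void)0)
#  define ANNOTATE_RWLOCK_ACQUIRED(l, w)  ((void)0)
#  define ANNOTATE_RWLOCK_RELEASED(l, w)  ((void)0)
#endif

/* Illustrative stand-in for one of the inner Valgrind's locks. */
typedef struct { pthread_mutex_t mu; } InnerLock;

static void inner_lock_init(InnerLock* l)
{
   pthread_mutex_init(&l->mu, NULL);
   ANNOTATE_RWLOCK_CREATE(l);               /* tell helgrind a lock lives here */
}

static void inner_lock_acquire(InnerLock* l)
{
   pthread_mutex_lock(&l->mu);
   ANNOTATE_RWLOCK_ACQUIRED(l, /*is_w*/1);  /* taken in write mode */
}

static void inner_lock_release(InnerLock* l)
{
   ANNOTATE_RWLOCK_RELEASED(l, /*is_w*/1);
   pthread_mutex_unlock(&l->mu);
}
```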
The patch below must be applied to the "outer" Valgrind, so it cannot
be made conditional on ENABLE_INNER.
Feedback about this analysis and the patch below would be appreciated.
Note that the code below should probably be put in a place common to
all architectures (e.g. close to the VG_(stack_limits) call, or maybe
even inside VG_(stack_limits)).
Effectively, whatever the architecture, the highest word of the stack
must lie in the segment that currently contains the stack pointer.
If the below is ok, I will prepare 3 (cleaned up) patches:
1. a patch with the change below
2. a patch to have the inner ignore the vgpreload
   of the outer (conditional on ENABLE_INNER)
3. a helgrind annotation patch (conditional on ENABLE_INNER)
Thanks
Philippe
Index: coregrind/m_stacktrace.c
===================================================================
--- coregrind/m_stacktrace.c (revision 12398)
+++ coregrind/m_stacktrace.c (working copy)
@@ -253,6 +253,13 @@
    if (fp_max >= sizeof(Addr))
       fp_max -= sizeof(Addr);
+   {
+      /* clamp fp_max to the segment holding the current SP */
+      const NSegment* stack = VG_(am_find_nsegment)( uregs.xsp );
+      if (stack != NULL && fp_max > stack->end)
+         fp_max = stack->end;
+   }
+
if (debug)
VG_(printf)("max_n_ips=%d fp_min=0x%lx fp_max_orig=0x%lx, "
"fp_max=0x%lx ip=0x%lx fp=0x%lx\n",