Re: [Valgrind-developers] helgrind

On Mon, 2002-10-07 at 11:00, Julian Seward wrote:

    On Monday 07 October 2002 1:14 am, Jeremy Fitzhardinge wrote:
    > Can you go into a few more details about what's wrong with helgrind?  Does
    > it get the Eraser algorithm wrong, or does it just need some more
    > refinement?
    >
    > I read the Eraser paper and the extension suggested in "Visual Threads",
    > and it all looks pretty straightforward.  Once I get some profiling
    > machinery going, I'm definitely interested in working on helgrind.

    It generates bazillions of errors.  We don't know why.

    The only _known_ cause for this is that glibc has a counter for
    the number of relocations done by the dynamic linker, and this is
    incremented without locking.  That's allegedly harmless because
    it's only for stats purposes, but it's still something that needs
    to be suppressed or worked around.  Problem with suppression is
    that there is no fixed call stack at the error point; it happens
    at the first use of every dynamic symbol (or something) so the
    normal stack-based-identification of the suppression mechanism
    won't work.  We haven't thought of a way round this yet.

I had a play with this last night, and noticed the dynamic linker count
thing.  Several things occurred to me:

        * Obviously, helgrind needs to have suppression implemented.  I
          suspect it needs dynamic suppression (hints inserted in the
          code) as well as static suppression rules.
        * I noticed that the symtab stuff only collects function
          symbols, but we're really going to want static and global data
          symbols as well, so they can be used for both reports and
          suppressions.
        * Also, a fair amount of the memcheck error report stuff needs
          to be reused as well, so dynamic memory can be identified in
          terms of its allocation site.
        * I think the algorithm needs a new state, which means "can be
          touched by any thread for any reason".  This would mainly be
          for error suppression, so that you can annotate it that way,
          and so that once you've reported that location you can silence
          further errors.  Perhaps you could use "exclusive" state with
          a magic thread ID meaning "everyone".
        * The Eraser paper didn't cover the case where a mutex identity
          can be reused (say the mutex is part of an object in dynamic
          memory which is reallocated).  I'm guessing that on freeing
          some memory, you have to go through all the mutex sets looking
          for a mutex in the freed memory range, and poison that set
          somehow.
        * The Eraser paper also used a resolution of 4 bytes because
          that's the smallest the Alpha guarantees for atomic
          operations.  The x86 goes down to 1 byte; I wonder if we need
          more resolution (i.e., are there neighbouring memory locations
          being updated by different threads, which are causing spurious
          errors?).
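
To make two of the points above concrete (the extra "anyone" state and
the shadow granularity), here is a rough Python sketch of an
Eraser-style lockset checker.  It is only an illustration, not
helgrind's code: the names Detector, Shadow, mark_anyone and run_demo
are made up, the handling of the first access is simplified, and
moving a location to ANYONE after its first report is just one way the
"silence further errors" idea could work.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class State(Enum):
    VIRGIN = auto()           # never accessed
    EXCLUSIVE = auto()        # touched by a single thread so far
    SHARED = auto()           # read by more than one thread
    SHARED_MODIFIED = auto()  # written while shared: lockset is enforced
    ANYONE = auto()           # hypothetical state: any access is fine


@dataclass
class Shadow:
    state: State = State.VIRGIN
    owner: Optional[int] = None
    lockset: Optional[frozenset] = None  # candidate locks, set once shared


class Detector:
    def __init__(self, granularity=4):
        self.granularity = granularity  # bytes covered by one shadow slot
        self.shadow = {}                # slot index -> Shadow
        self.races = []                 # (address, thread id) per report

    def _slot(self, addr):
        return addr // self.granularity

    def mark_anyone(self, addr):
        # Dynamic suppression hint: stop checking this location.
        self.shadow.setdefault(self._slot(addr), Shadow()).state = State.ANYONE

    def access(self, tid, addr, held, is_write):
        sh = self.shadow.setdefault(self._slot(addr), Shadow())
        held = frozenset(held)
        if sh.state is State.ANYONE:
            return
        if sh.state is State.VIRGIN:
            # Simplification: any first access makes the toucher the owner.
            sh.state, sh.owner = State.EXCLUSIVE, tid
            return
        if sh.state is State.EXCLUSIVE:
            if tid == sh.owner:
                return
            # A second thread appears: start the candidate lockset.
            sh.lockset = held
            sh.state = State.SHARED_MODIFIED if is_write else State.SHARED
        else:
            # Refine the candidate set with the locks held right now.
            sh.lockset &= held
            if is_write:
                sh.state = State.SHARED_MODIFIED
        if sh.state is State.SHARED_MODIFIED and not sh.lockset:
            self.races.append((addr, tid))
            sh.state = State.ANYONE  # report once, then stay quiet


def run_demo(granularity):
    # Two threads update neighbouring bytes, each under its own lock.
    d = Detector(granularity)
    for _ in range(2):
        d.access(1, 0x100, {"A"}, True)
        d.access(2, 0x101, {"B"}, True)
    return d.races
```

With 4-byte slots both bytes share one shadow entry, so run_demo(4)
reports a single (spurious) error at 0x100 and then goes quiet; with
run_demo(1) each byte gets its own entry and nothing is reported.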

    J