|
From: Jeremy F. <je...@go...> - 2002-10-08 01:09:56
|
On Mon, 2002-10-07 at 11:00, Julian Seward wrote:
On Monday 07 October 2002 1:14 am, Jeremy Fitzhardinge wrote:
> Can go into a few more details about what's wrong with helgrind? Does
> it get the Eraser algorithm wrong, or does it just need some more
> refinement?
>
> I read the Eraser paper and the extension suggested in "Visual Threads",
> and it all looks pretty straightforward. Once I get some profiling
> machinery going, I'm definitely interested in working on helgrind.
It generates bazillions of errors. We don't know why.
The only _known_ cause for tyhis is that glibc has a counter for
the number of relocations done by the dynamic linker, and this is
incremented without locking. That's allegedly harmless because
it's only for stats purposes, but it's still something that needs
to be suippressed or worked around. Problem with suppression is
that there is no fixed call stack at the error point; it happens
at the first use of every dynamic symbol (or something) so the
normal stack-based-identification of the suppression mechanism
won't work. We haven't thought of a way round this yet.
I had a play with this last night, and noticed the dynamic linker count
thing. Several things occurred to me:
* Obviously, helgrind needs to have suppression implemented. I
suspect it needs dynamic suppression (hints inserted in the
code) as well as static suppression rules.
* I noticed that the symtab stuff only collects function
symbols, but we're really going to want static and global data
symbols as well, so they can be used for both reports and
suppressions.
* Also, a fair amount of the memcheck error report stuff needs
to be reused as well, so dynamic memory can be identified in
terms of its allocation site.
* I think the algorithm needs a new state, which means "can be
touched by any thread for any reason". This would mainly be
for error suppression, so that you can annotate it that way,
and so that once you've reported that location you can silence
further errors. Perhaps you could use "exclusive" state with
a magic thread ID meaning "everyone".
* The Eraser paper didn't cover the case where a mutex identity
can be reused (say the mutex is part of an object in dynamic
memory which is reallocated). I'm guessing that on freeing
some memory, you have to go through all the mutex sets looking
for a mutex in the freed memory range, and poison that set
somehow.
* The Eraser paper also used a resolution of 4 bytes because
that's the smallest the alpha guarantees for atomic
operations. The x86 goes down to 1 byte; I wonder if we need
more resolution (ie, are there neighbouring memory locations
being updated by different threads, which are causing spurious
errors?).
J
|