From: Josef W. <Jos...@gm...> - 2002-10-04 18:41:16
Forgot to send to the list...
On Friday 04 October 2002 12:48, Nicholas Nethercote wrote:
> On Fri, 4 Oct 2002, Josef Weidendorfer wrote:
> > The only problem I saw was that I need a valgrind version of the LIBC
> > "unlink", which I already mailed to Nick...
>
> I just added VG_(unlink) to head; it's untested, hopefully I got it
> right.
Thanks!
> > Regarding the valgrind skin architecture: Shouldn't it be possible to
> > "stack" skins? At the moment, for my skin I have to include all the
> > cachegrind code again. And if the cachegrind skin decides to simulate a
> > 3rd level cache, I have to copy it.
>
> Hmm, the LD_PRELOADing of two shared objects (skin.so + core.so) is
> already a bit fragile, having multiple .so's feels like a bad idea to
> me... Anyway, aren't Cachegrind and your patched version dissimilar
> enough that it wouldn't be easy to "stack" them in a sensible way? A
> better way might be to factor out the common code which gets included in
> both skin .so's, if you see what I mean. This should be done with
> addrcheck and memcheck at some stage because they share a lot of identical
> code.
Yes. It seems to be the simplest way.
Two skins stacked on each other seems strange: the 1st does instrumentation,
and the resulting UCode is instrumented by the 2nd. And so on...
Regarding the LD_PRELOADing: can't this be made explicit by runtime loading
of the skins from valgrind.so (i.e. a plugin architecture)?
But a general "cost center API" would be nice to have for skins counting
events, linked as a library to any such skin.
We need cost types (can be an array of subcost types), and cost center target
types (e.g. instruction, basic block, call, function, ELF object, memory
access etc.) using these cost types. For this, we need register_XXX
functions, supplying callbacks for zeroing/adding/writing ASCII version/...
For each cost center a skin creates, it registers it and links it either to
some other cost center target ("parent") or to an existing object of the
valgrind core (e.g. basic block, memory area, ...).
Dumping out the profile could be fully done in a generic way: We need a
"cost center position" structure and go through all available positions of
registered cost centers (using e.g. the "parent" relation).
I'm sure the file format of dumps would change to be a lot more generic.
Support for per-thread cost centers could be generic, too.
I still need some time to think about it.
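
To make the idea more concrete, here is a minimal sketch of what such a cost
center library could look like. Every name in it (Cost, CostType, CostCenter,
cc_register, cc_dump, the counter_* callbacks) is an assumption made up for
illustration; none of it is existing valgrind code, and the fixed-size
registry is only there to keep the sketch short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* All names below are hypothetical; none of this is existing Valgrind API. */
#define CC_MAX_SUB     4    /* max subcosts per cost type */
#define CC_MAX_CENTERS 64   /* fixed-size registry, for brevity only */

typedef struct { unsigned long long sub[CC_MAX_SUB]; } Cost;

/* A cost type: an array of subcosts plus the callbacks a skin supplies. */
typedef struct {
    const char *name;
    int n_sub;
    void (*zero)(Cost *c, int n_sub);
    void (*add)(Cost *dst, const Cost *src, int n_sub);
    int  (*write_ascii)(const Cost *c, int n_sub, char *buf, size_t len);
} CostType;

typedef enum { CC_INSTR, CC_BASIC_BLOCK, CC_CALL, CC_FUNCTION, CC_OBJECT } CCTarget;

typedef struct CostCenter {
    const CostType *type;
    CCTarget target;
    const char *label;
    struct CostCenter *parent;   /* NULL for a top-level center */
    Cost cost;
} CostCenter;

static CostCenter cc_registry[CC_MAX_CENTERS];
static int cc_count = 0;

/* Plain event counters: one possible set of callbacks. */
void counter_zero(Cost *c, int n) {
    for (int i = 0; i < n; i++) c->sub[i] = 0;
}
void counter_add(Cost *dst, const Cost *src, int n) {
    for (int i = 0; i < n; i++) dst->sub[i] += src->sub[i];
}
int counter_write(const Cost *c, int n, char *buf, size_t len) {
    int used = 0;
    for (int i = 0; i < n && (size_t)used < len; i++)
        used += snprintf(buf + used, len - used, "%s%llu", i ? " " : "", c->sub[i]);
    return used;
}

/* Register a cost center, zero its cost and link it to its parent. */
CostCenter *cc_register(const CostType *type, CCTarget target,
                        const char *label, CostCenter *parent) {
    assert(cc_count < CC_MAX_CENTERS && type->n_sub <= CC_MAX_SUB);
    CostCenter *cc = &cc_registry[cc_count++];
    cc->type = type;
    cc->target = target;
    cc->label = label;
    cc->parent = parent;
    type->zero(&cc->cost, type->n_sub);
    return cc;
}

/* Generic dump: one line per registered center; its position is the
 * chain of labels obtained by following the parent links, leaf first. */
int cc_dump(char *out, size_t len) {
    int used = 0;
    for (int i = 0; i < cc_count && (size_t)used < len; i++) {
        const CostCenter *cc = &cc_registry[i];
        char path[128] = "", costs[64];
        for (const CostCenter *p = cc; p; p = p->parent) {
            if (p != cc) strncat(path, " < ", sizeof path - strlen(path) - 1);
            strncat(path, p->label, sizeof path - strlen(path) - 1);
        }
        cc->type->write_ascii(&cc->cost, cc->type->n_sub, costs, sizeof costs);
        used += snprintf(out + used, len - used, "%s: %s\n", path, costs);
    }
    return used;
}
```

A skin would fill in one CostType with its callbacks, call cc_register() for
each function, basic block, etc. it wants to account for, and add to the cost
through the type's add callback; cc_dump() then needs no knowledge of what
the subcosts mean. Per-thread cost centers could probably be handled by
keeping one such registry per thread.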
> Thinking longer term, your version of Cachegrind could entirely replace
> the original Cachegrind one day, since AFAICT your Cachegrind's
> functionality is a strict superset of my Cachegrind's.
Yes :-)
But it's still your baby, I only extended it. I can make small patches for
independent features separately (e.g. jump cost centers, recursion detection,
shared lib support [alias hash/_dl_resolve_runtime], "compressed" profile
format, threading support), and you decide if some modification is needed or
we can put it in as it is...
> > Perhaps you have some suggestions for my problem with recursive calls:
> >
> > Suppose a call chain starting from A: A calls B and C; C calls A again.
> > [...]
> > Suggestions?
>
> My brain is melting. Do you know how gprof handles it?
If I understand correctly:
gprof makes an additional virtual function, calling it "cycle", out of the
"A=>C=>A" chain, with subcycle objects A and C.
I would draw this cycle as an area, splitting it up totally among A and C as
subareas. Unfortunately this doesn't show the function where the cycle was
entered from the outside. So I can rename the cycle back to the function
where the cycle was entered, and I'm back to my original proposal :-)
Still the question is how gprof calculates its results.
(see reply to the mail of Jeremy, too)
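
For the "which functions belong to one cycle" part, a sketch: as far as I
understand, gprof treats each strongly connected component of the call graph
as one cycle. The code below finds these components with Tarjan's algorithm
on a small adjacency matrix. CallGraph, find_cycles and cycle_size are
hypothetical names for illustration, and this says nothing about how gprof
distributes the cost inside a cycle.

```c
#include <assert.h>

#define MAX_FN 16   /* hypothetical limit, keeps the sketch simple */

typedef struct {
    int n;                       /* number of functions */
    int calls[MAX_FN][MAX_FN];   /* calls[u][v] != 0: u calls v */
} CallGraph;

typedef struct {
    const CallGraph *g;
    int index[MAX_FN], low[MAX_FN], on_stack[MAX_FN];
    int stack[MAX_FN], sp, next_index, n_comp;
    int *comp;                   /* comp[v] = component id of function v */
} Tarjan;

/* Depth-first visit of Tarjan's strongly-connected-components algorithm. */
static void visit(Tarjan *t, int u) {
    t->index[u] = t->low[u] = t->next_index++;
    t->stack[t->sp++] = u;
    t->on_stack[u] = 1;
    for (int v = 0; v < t->g->n; v++) {
        if (!t->g->calls[u][v]) continue;
        if (t->index[v] < 0) {                 /* callee not visited yet */
            visit(t, v);
            if (t->low[v] < t->low[u]) t->low[u] = t->low[v];
        } else if (t->on_stack[v] && t->index[v] < t->low[u]) {
            t->low[u] = t->index[v];           /* call back into the stack */
        }
    }
    if (t->low[u] == t->index[u]) {            /* u is the root of a component */
        int v;
        do {
            v = t->stack[--t->sp];
            t->on_stack[v] = 0;
            t->comp[v] = t->n_comp;
        } while (v != u);
        t->n_comp++;
    }
}

/* Fills comp[] with a component id per function; returns the number of
 * components. Functions sharing an id call each other recursively. */
int find_cycles(const CallGraph *g, int comp[]) {
    assert(g->n <= MAX_FN);
    Tarjan t = { .g = g, .comp = comp };
    for (int i = 0; i < g->n; i++) t.index[i] = -1;
    for (int i = 0; i < g->n; i++)
        if (t.index[i] < 0) visit(&t, i);
    return t.n_comp;
}

/* Number of functions in component id; more than 1 means a real cycle. */
int cycle_size(const CallGraph *g, const int comp[], int id) {
    int size = 0;
    for (int i = 0; i < g->n; i++)
        if (comp[i] == id) size++;
    return size;
}
```

With A calling B and C, and C calling A again, A and C end up in the same
component while B gets one of its own. A function that only calls itself
forms a component of size 1, so direct recursion would need a separate check.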