From: Jeremy F. <je...@go...> - 2002-10-14 16:17:05
On Mon, 2002-10-14 at 04:15, Josef Weidendorfer wrote:
> Hi Jeremy,
> I just looked over your vgprof skin: Seems quite cool :-)
> The client-side requests (e.g. VALGRIND_DUMP_PROFILE) are quite
> useful, but they require changing the source and recompiling. Have you
> already thought about alternative solutions?
> I have the "--dumpat=" command line option (which should be renamed to
> "--dump-at-entering=", with a "--dump-at-leaving=" added).
The main reason I added them was to profile an interactive
application. I actually bound them to keystrokes so that I can do things
like:
1. zero stats
2. move around the UI
3. grab profile
Static profile snapshots at particular function entries/exits wouldn't
have been that useful.
> Additionally, I allow interactive control of a cachegrind run by creating
> "cachegrind.cmd" files. At the moment, a dump is simply made when
> this file is detected. But I want cachegrind to read the commands in it and
> execute them, e.g. "DUMP NOW", "DUMP AT ENTERING xxx" or
> "DELETE DUMP AT ENTERING xxx".
> It would be cool if we could come up with some kind of "standard" for this,
> especially as I would like to add a (v)gprof import filter to KCachegrind,
> and you create trace parts, too: KCachegrind has a "force dump" toolbar
> button, which creates a "cachegrind.cmd" file (this should be renamed to
> "valgrind.cmd").
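A minimal sketch of how a skin might parse such a command file, assuming one command per line and only the three command words quoted above. The enum, struct and function names are hypothetical, and detecting (and then deleting) the file is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical command kinds for a "valgrind.cmd" control file; the command
 * words follow the examples in the mail, everything else is assumed. */
typedef enum {
    CMD_INVALID,
    CMD_DUMP_NOW,
    CMD_DUMP_AT_ENTERING,
    CMD_DELETE_DUMP_AT_ENTERING
} cmd_kind;

typedef struct {
    cmd_kind kind;
    char fn[256];   /* function name for the *_AT_ENTERING commands */
} command;

/* Returns 1 if line starts with prefix and stores the remainder in *rest. */
static int starts_with(const char *line, const char *prefix, const char **rest)
{
    size_t n = strlen(prefix);
    if (strncmp(line, prefix, n) != 0)
        return 0;
    *rest = line + n;
    return 1;
}

/* Parse one line of the command file (line ending already stripped). */
command parse_command(const char *line)
{
    command c = { CMD_INVALID, "" };
    const char *rest;

    if (strcmp(line, "DUMP NOW") == 0) {
        c.kind = CMD_DUMP_NOW;
    } else if (starts_with(line, "DUMP AT ENTERING ", &rest)) {
        c.kind = CMD_DUMP_AT_ENTERING;
        snprintf(c.fn, sizeof c.fn, "%s", rest);
    } else if (starts_with(line, "DELETE DUMP AT ENTERING ", &rest)) {
        c.kind = CMD_DELETE_DUMP_AT_ENTERING;
        snprintf(c.fn, sizeof c.fn, "%s", rest);
    }
    /* A function-based command without a function name is malformed. */
    if (c.kind != CMD_INVALID && c.kind != CMD_DUMP_NOW && c.fn[0] == '\0')
        c.kind = CMD_INVALID;
    return c;
}

/* Read every command from an already-open command file. Returns the number
 * of commands stored in out (at most max); malformed lines are skipped. */
int read_commands(FILE *f, command *out, int max)
{
    char buf[512];
    int n = 0;
    assert(f != NULL && out != NULL);
    while (n < max && fgets(buf, sizeof buf, f) != NULL) {
        buf[strcspn(buf, "\r\n")] = '\0';   /* strip the line ending */
        command c = parse_command(buf);
        if (c.kind != CMD_INVALID)
            out[n++] = c;
    }
    return n;
}
```

Keeping the parser separate from the file handling would let the same command syntax be reused if the commands later arrive over a socket instead of a polled file.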
Well, it seems that there's some plan to add a mechanism so that
valgrind can report its results via a socket rather than simply writing
out to stderr. Such a socket seems like a better way of communicating
with a skin than polling on a file.
For actual configuration-type information, I was thinking of
implementing some suppression file keywords, so that I can include or
exclude particular parts of the program. It would make sense to add a
suppression keyword to trigger a profile dump as well.
> The idea of different "weights" for different instructions seems
> quite useful to add to cachegrind (as a new event type).
> How did you get your weights?
> They can be quite different for different processors (AMD/Intel).
> And the best thing would be to get the values by measuring them online,
> with an additional tool to put the measured weights into a config file (like
> calibrator for the cache latencies).
That's all a complete hack at the moment. It doesn't even measure the
right thing; it adds weights based on the UInstrs, but doesn't take into
account the original x86 instruction's performance characteristics. The
alternative, which is to simply count x86 instructions, gives somewhat
misleading results because it assumes that all instructions take the
same amount of time to run.
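To illustrate why a plain instruction count can mislead, here is a sketch that charges each executed instruction a per-class weight. The instruction classes and weight values are invented for the example (real weights would have to be measured per processor), and nothing here reflects how vgprof actually assigns its weights.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction classes; a real skin would classify decoded
 * instructions far more finely. */
typedef enum { CLS_ALU, CLS_LOAD, CLS_STORE, CLS_MUL, CLS_DIV, CLS_COUNT } insn_class;

/* Made-up weights in arbitrary units. Real values differ between
 * processors and would ideally be measured and read from a config file. */
static const unsigned long weights[CLS_COUNT] = {
    [CLS_ALU] = 1, [CLS_LOAD] = 3, [CLS_STORE] = 2, [CLS_MUL] = 4, [CLS_DIV] = 20
};

/* Sum the weights of an executed instruction sequence. A plain
 * instruction count would simply be n. */
unsigned long weighted_cost(const insn_class *trace, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(trace[i] < CLS_COUNT);   /* reject out-of-range classes */
        total += weights[trace[i]];
    }
    return total;
}
```

For example, four ALU instructions and an ALU/load/multiply/divide sequence both count as 4 instructions, but under these made-up weights their costs are 4 and 28.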
> As I understand it, you have a new gmon format version. I would suggest
> the following for the new format:
> For each header, add a section length field to allow skipping the
> section if the reader doesn't know the section type (you once said yourself
> that this is a shortcoming of the current format).
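A sketch of what a length-prefixed section makes possible for a reader, assuming a hypothetical layout of a 1-byte tag followed by a 4-byte little-endian payload length. This is not the existing gmon.out layout, and the tag values are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tagged-section layout (NOT the real gmon.out format, whose
 * records carry no length field):
 *   1 byte   tag
 *   4 bytes  payload length, little-endian
 *   n bytes  payload
 */
enum { TAG_HIST = 0, TAG_CG_ARC = 1, TAG_BB_COUNT = 2 };

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk all sections in buf. Known tags are counted in known[tag];
 * unknown tags are skipped using their length field. Returns the number
 * of skipped sections, or -1 if the buffer is truncated. */
int walk_sections(const unsigned char *buf, size_t len, unsigned known[3])
{
    size_t pos = 0;
    int skipped = 0;
    assert(buf != NULL || len == 0);
    while (pos < len) {
        if (len - pos < 5)
            return -1;                  /* section header truncated */
        unsigned char tag = buf[pos];
        uint32_t n = read_le32(buf + pos + 1);
        pos += 5;
        if (n > len - pos)
            return -1;                  /* payload truncated */
        if (tag <= TAG_BB_COUNT)
            known[tag]++;               /* a real reader would parse the payload here */
        else
            skipped++;                  /* unknown section: jump over it */
        pos += n;
    }
    return skipped;
}
```

The point is that an old reader can still process a file containing newer section types: it only needs the length to step over them.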
Yes, that's a problem with using gmon.out, but the main reason for using
it was to reuse the gprof code base, and with luck be able to get the
changes merged into the binutils mainline. It isn't a very nice format
on the whole, so it certainly isn't hard to come up with something
improved, but I don't think it's worth the effort to completely change
the gmon.out format.
> Multiple history sections seem to be supported in the format. Can (v)gprof
> handle these, e.g. by selecting one via an option? Or does the gprof output
> have different columns for each event type?
The gprof program can only really deal with one unit at a time. And I
haven't really looked into recording other units yet, though elapsed
time and CPU-time seem like the most useful.
> I would like to add a gprof import filter to KCachegrind: supporting an
> extensible format would be good for this. I suppose I will have to copy
> all the symbol-reading code from binutils. The nice thing about the cachegrind
> format is that you already have symbol names...
Yes, or the stuff from valgrind itself. As far as I can tell, libbfd is
pretty inefficient at doing the things that gprof needs done (in
particular, mapping code addresses to file:lines). The advantage is
that it supports lots of different file formats.
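One common way to make address-to-file:line mapping cheap is to build a sorted line table once and answer each query with a binary search. The sketch below assumes such a table has already been extracted from the debug info; the struct and function names are hypothetical, and this is not how libbfd or gprof implement the lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a hypothetical pre-built line table: code from `addr` up to
 * the next entry's addr belongs to file:line. */
typedef struct {
    uintptr_t   addr;
    const char *file;
    unsigned    line;
} line_entry;

/* Find the entry covering addr in a table sorted by ascending addr.
 * Returns NULL if addr lies before the first entry. Addresses past the
 * last entry map to the last entry (no upper bound is stored). O(log n). */
const line_entry *lookup_line(const line_entry *tab, size_t n, uintptr_t addr)
{
    size_t lo = 0, hi = n;   /* find the first entry with entry.addr > addr */
    assert(tab != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : &tab[lo - 1];
}
```

Sorting once up front keeps each per-sample lookup cheap, which matters when many addresses have to be mapped to source lines.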
> If I understand correctly, you associate the weights of a BB with the start
> address of that BB. Is this granularity sufficient for the annotated source
> output of your vgprof?
I haven't used annotated source, because gprof is too inefficient at
reading lots of line-level symbol tables. But yes, I think accumulating
the BB's instructions to the first address doesn't upset things too much
(at the source level, you may see something later in a BB supposedly
taking no time, but it wouldn't be that hard to work out what's going
on).
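A sketch of the accumulation described above, assuming each executed basic block is known by its start address and per-instruction costs. The types, the fixed-size table and the linear search are purely illustrative (a real skin would more likely use a hash table). It also shows the effect mentioned: an address later in a block reports no cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A hypothetical executed basic block: start address plus the costs of the
 * instructions it contains (e.g. weights or cycle estimates). */
typedef struct {
    uintptr_t       start;
    size_t          n_insns;
    const unsigned *insn_cost;
} basic_block;

#define MAX_SITES 64

/* Per-address profile: cost is only ever recorded at BB start addresses. */
typedef struct {
    uintptr_t     addr[MAX_SITES];
    unsigned long cost[MAX_SITES];
    size_t        n;
} profile;

/* Charge the whole block's cost to its first address. Returns 0 on
 * success, -1 if the (fixed-size, illustrative) table is full. */
int charge_bb(profile *p, const basic_block *bb)
{
    unsigned long total = 0;
    assert(p != NULL && bb != NULL);
    for (size_t i = 0; i < bb->n_insns; i++)
        total += bb->insn_cost[i];

    for (size_t i = 0; i < p->n; i++) {
        if (p->addr[i] == bb->start) {   /* block seen before: accumulate */
            p->cost[i] += total;
            return 0;
        }
    }
    if (p->n == MAX_SITES)
        return -1;
    p->addr[p->n] = bb->start;           /* first execution: new entry */
    p->cost[p->n] = total;
    p->n++;
    return 0;
}

/* Cost recorded at an exact address; addresses inside a block report 0. */
unsigned long cost_at(const profile *p, uintptr_t addr)
{
    for (size_t i = 0; i < p->n; i++)
        if (p->addr[i] == addr)
            return p->cost[i];
    return 0;
}
```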
J