|
From: ilya s. <ily...@gm...> - 2011-09-19 17:30:33
|
dear valgrind developers,
how hard would it be to implement the following extension to
cachegrind/callgrind:
right now, when reporting the cost of a function, the cost of all
invocations is aggregated together.
(callgrind separates the invocations by caller, but that's as
fine-grained as it goes).
the proposed extension would do the following:
- for each function (or for requested functions), for a chosen
statistic (Ir, DLmr, etc.), produce a _histogram_ of invocation costs.
in each bin, keep the # of invocations and their total cost.
this will show e.g. how much you'd save by optimizing the cheap common
cases vs. the costly rare cases.
an additional extension would let the user print their own debug
information for representative invocations falling into a given bin.
this would work only for completely deterministic user programs.
- there would be a valgrind client request, VALGRIND_SHOULD_PRINT,
which the user would add to their function to tell them whether to
print debug information for this invocation:
void myFunc( ComplexStructure *arg ) {
    if( VALGRIND_SHOULD_PRINT )
        arg->print();
    ...
}
- there would be an option to cachegrind/callgrind to record and save
in an output file, for a given function and a given range of costs,
the "invocation ids" of some small number of invocations of that
function in that cost range. the "invocation id" of a function is
simply the ordinal number of its invocation (e.g. the 134701st
invocation of this function).
- there would be another option to cachegrind/callgrind to read the
file recorded above and to have VALGRIND_SHOULD_PRINT return 1
for the recorded invocations, and 0 for all others.
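For concreteness, the per-bin bookkeeping such a histogram would need could look like the following sketch. The struct, its names, and the bin edges are purely illustrative; nothing like this exists in cachegrind today:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One histogram per (function, statistic). Fixed bin edges partition
// per-invocation cost; each bin keeps the invocation count and the
// summed cost, so "cheap common case" vs "costly rare case" is visible.
struct CostHistogram {
    std::vector<std::uint64_t> edges;  // upper bounds; last bin is open-ended
    std::vector<std::uint64_t> count;  // # invocations per bin
    std::vector<std::uint64_t> total;  // summed cost per bin

    explicit CostHistogram(std::vector<std::uint64_t> e)
        : edges(e), count(e.size() + 1, 0), total(e.size() + 1, 0) {}

    // Index of the bin a given invocation cost falls into.
    std::size_t bin(std::uint64_t cost) const {
        std::size_t b = 0;
        while (b < edges.size() && cost >= edges[b]) ++b;
        return b;
    }

    // Called once per function invocation, after its cost is known.
    void record(std::uint64_t cost) {
        std::size_t b = bin(cost);
        count[b] += 1;
        total[b] += cost;
    }
};
```

Each bin then directly answers "how many invocations, and how much did they cost in total" for that cost range.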
this would allow finer-grained profiling than is currently possible.
how hard would this be to add?
thanks,
ilya
|
|
From: Josef W. <Jos...@gm...> - 2011-09-19 20:24:33
|
Hi Ilya,
On Monday 19 September 2011, ilya shlyakhter wrote:
> how hard would it be to implement the following extension to
> cachegrind/callgrind:
> ...
this list (which I do not understand completely) suggests that
you have some concrete use case in mind. What do you want to
accomplish in the end?
> right now, when reporting the cost of a function, the cost of all
> invocations is aggregated together.
> (callgrind separates the invocations by caller, but that's as
> fine-grained as it goes).
Hmm.. you can also dump counters to a file anytime you want,
e.g. with "--dump-after=<func>". You then can post-process
the files, and do your own statistics on it.
For how to parse the files in Perl, see "callgrind_annotate".
If writing millions of files is too expensive, it probably is
easier to think about a way to pass measurements to a script
in a more light-weight way (e.g. via pipes). You should
be able to implement your suggestions in your own script then.
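As a sketch of the receiving end of such a pipe: if each invocation were streamed as one "<function> <cost>" line (a purely hypothetical format, not anything callgrind emits), the aggregating side reduces to:

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Aggregate a hypothetical per-invocation measurement stream of the
// form "<function> <cost>\n" into per-function cost totals. A real
// script could just as easily build histograms here instead.
std::map<std::string, long long> aggregate(std::istream& in) {
    std::map<std::string, long long> totals;
    std::string fn;
    long long cost;
    while (in >> fn >> cost)
        totals[fn] += cost;
    return totals;
}
```

The point is that once measurements leave the tool per-invocation, any statistic (histograms included) becomes a post-processing problem.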
> the proposed extension would do the following:
> - for each function (or for requested functions), for a chosen
> statistic (Ir, DLmr, etc), produce a _histogram_ of invocation costs.
> in each bin, keep the # of invocations and their total cost.
> an additional extension would let the user print their own debug
> information for representative
> invocations falling into a given bin. this would work only for
> completely deterministic user programs.
> - there would be a valgrind client request, VALGRIND_SHOULD_PRINT,
> which the user would
> add to their function to tell them whether to print debug
> information for this invocation:
>
> void myFunc( ComplexStructure *arg ) {
> if( VALGRIND_SHOULD_PRINT )
> arg->print();
> ...
> }
> - there would be an option to cachegrind/callgrind to record and
> save in an output file, for a given function and
> a given range of costs, the "invocation ids" of some small number
> of invocations of that function in that cost range.
> "invocation id" of a function is simply the order number of its
> invocation (e.g. the 134701st invocation of this function).
>
> - there would be another option to cachegrind/callgrind to read the
> file recorded above and to have VALGRIND_SHOULD_PRINT return 1
> for the recorded invocations, and 0 for all others.
I don't understand the benefit. How would you use that feature?
> this would allow finer-grained profiling than is currently possible.
> how hard would this be to add?
If you come up with a patch, you still need to prove that it is worth
merging and maintaining. It seems easier to me to try to come up
with a general solution which allows users to implement their own
post-processing/statistics, as mentioned above.
E.g. for adding histograms, you not only need to change cachegrind/callgrind,
but also extend the format and parsers, such as {cg,callgrind}_annotate,
and the KCachegrind GUI.
That said, it would be cool to have histograms.
Josef
|
|
From: ilya s. <ily...@gm...> - 2011-09-21 14:39:21
|
On Mon, Sep 19, 2011 at 4:24 PM, Josef Weidendorfer
<Jos...@gm...> wrote:
> E.g. for adding histograms, you not only need to change cachegrind/callgrind,
> but also extend the format and parsers, such as {cg,callgrind}_annotate,
> and the KCachegrind GUI.
Not necessarily: you could do what you did for different function
arguments -- for each function for which you want histograms, create a
separate function name for each bin. And the current "callee map" view
of KCachegrind could effectively display the histogram.
Say you have a function myFunc() that's called 1,000,000 times. You
create functions myFunc_1, myFunc_2, myFunc_3 to record counts from
invocations that took (say) <100,000 Ir's, 100,000-1,000,000 Ir's and
>1,000,000 Ir's respectively. The callee map view would then show
which group of calls takes the most resources.
So, after each myFunc() invocation, you would have to check that
invocation's Ir count and move all the counts recorded under this
invocation to one of the myFunc_? records. You could also perhaps
record each call to myFunc() as a call to myFunc() that then delegates
to one of the myFunc_? routines; then KCachegrind's callee map view
would be the actual histogram for myFunc().
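The cost-to-name mapping this describes is tiny; a sketch, using the thresholds from the example above (which are just example numbers, not anything callgrind defines):

```cpp
#include <cassert>
#include <string>

// Map an invocation's measured Ir count to a bin-suffixed pseudo
// function name, so existing tools (callee map, annotate) display
// the histogram without any format change. Thresholds are the
// illustrative <100,000 / 100,000-1,000,000 / >1,000,000 Ir bins.
std::string binnedName(const std::string& fn, unsigned long long ir) {
    if (ir < 100000)  return fn + "_1";
    if (ir < 1000000) return fn + "_2";
    return fn + "_3";
}
```

As the follow-up discussion notes, the hard part is not this mapping but re-attributing already-recorded self costs after the fact.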
ilya
|
|
From: ilya s. <ily...@gm...> - 2011-09-19 21:40:58
|
>you have some concrete use case in mind. What do you want to
>accomplish in the end?
Say valgrind shows that I'm spending 60% of my time in function
myFunc(), and now I want to optimize myFunc(). Say myFunc is called
1,000,000 times, but 60% of its total runtime is spent in 5% of these
calls. I want to (1) know that this is the case (hence the
histograms), and (2) get a look at these 5% of calls -- what is it
about the arguments to these calls that makes myFunc() take
disproportionate time on them? (hence the VALGRIND_SHOULD_PRINT client
request).
When implementing a function or choosing a data structure there is
often a tradeoff: you can speed it up on some inputs while slowing it
down on others. E.g. if your function processes lists, you might add
special-case checks and code for short lists, which will speed things
up for short lists but slow things down for longer lists that don't
fall under the special cases. Having histograms would tell you what
fraction of time gets spent on longer vs. shorter lists.
If a histogram shows that the bulk of time is spent on a small
fraction of calls that each take >100,000 instructions, I want to
examine some of these calls -- specifically, the inputs to them. The
VALGRIND_SHOULD_PRINT client request would let me do that.
I've been doing this manually: timing myFunc() by hand, building a
histogram of its runtimes, and finding which bins dominate the
histogram; then looking at the inputs of some representative
invocations that fall in these runtime bins, by putting a static
counter inside the function (so you know when you're on its N'th
invocation), printing the counter value for the first few invocations
whose runtime falls into the bins of interest, and then adding code at
the start of the function: "if counter == one of the printed counter
values, print the function arguments".
Building this into the profiler would let me (1) avoid doing the above
for each function I want to optimize, and (2) do this not just for
runtime but for all other statistics gathered by cachegrind/callgrind.
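The manual two-stage procedure can be sketched as below. The function names and the 100,000 threshold are illustrative, and in reality the cost is measured around the call in the first run rather than passed in:

```cpp
#include <cassert>
#include <set>

std::set<long> slowIds;    // stage 1 output: ids of costly invocations
std::set<long> replayIds;  // loaded before stage 2 (deterministic re-run)

// Stage 1: a static counter identifies each invocation; when the
// measured cost lands in a bin of interest, remember the invocation id.
long recordInvocation(long long cost) {
    static long counter = 0;
    ++counter;
    if (cost > 100000)     // example "bin of interest" threshold
        slowIds.insert(counter);
    return counter;
}

// Stage 2: on the deterministic re-run, print arguments exactly for
// the invocation ids recorded in stage 1 (the role the proposed
// VALGRIND_SHOULD_PRINT request would play).
bool shouldPrint() {
    static long counter = 0;
    ++counter;
    return replayIds.count(counter) > 0;
}
```

This only works because invocation order is reproducible in a fully deterministic program, which is why the proposal is restricted to that case.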
Does this explanation help, or should I write some more detailed examples?
>If writing millions of files is too expensive,
it would be too expensive in my case.
>easier to think about a way to pass measurements to a script
>in a more light-weight way (e.g. via pipes). You should
>be able to implement your suggestions in your own script then.
thanks, that might work for making histograms (esp. if I limit
data-gathering to that one function).
but it wouldn't work for printing the arguments of some representative
function invocations falling into a given range of runtimes.
ilya
|
|
From: Josef W. <Jos...@gm...> - 2011-09-20 12:54:49
Attachments:
seppar.patch
|
On Monday 19 September 2011, ilya shlyakhter wrote:
> I've been doing this manually: timing myFunc() by hand, building a
> histogram of its runtimes, and finding which bins dominate the
> histogram; then looking at the inputs of some representative
> invocations that fall in these runtime bins [...]
>
> Does this explanation help, or should I write some more detailed examples?

I think I got the idea of this 2-stage procedure.

What about the following approach: you expect that some parameter settings
of myfunc() influence its runtime. Callgrind would allow you to embed
information for bins of this parameter setting into the function name.
E.g. for myfunc(int a), the profile results would show you two functions:

"myfunc:a>=5" : 90 calls, 1 billion instructions executed
"myfunc:a<5"  : 5 calls, 2 million instructions executed

I think in the end you should be able to reach the same goal you get with
your 2-stage approach.

We did such an extension in the past, see
http://www.lrr.in.tum.de/~kuestner/proper09.pdf

Attached is a patch for that (not sure it applies against current SVN or
3.6.1), with the following short doc:

Format is
--separate-par=<fn_pattern>':'<intpar_num>[':'<intval>(','<intval>)*]
If no value is given, separate by every different value.
Otherwise, this allows for multiple buckets:
x < intval1, intval1 < x <= intval2, ..., intvalX <= x
with x being the <intpar_num>th int parameter of functions
matching fn_pattern.
Multiple seppar requests for the same function are allowed.

It never made it into a release, as the usage was too low-level and the
feature is platform-dependent (it only works with 32-bit x86 for now):
you have to know the stack layout. But it proved to be quite useful.

> > If writing millions of files is too expensive,
> it would be too expensive in my case.
>
> > easier to think about a way to pass measurements to a script
> > in a more light-weight way (e.g. via pipes). You should
> > be able to implement your suggestions in your own script then.
>
> thanks, that might work for making histograms (esp. if I limit
> data-gathering to that one function).
> but it wouldn't work for printing the arguments of some representative
> function invocations falling into a given range of runtimes.

If one sends the measurement data for every function invocation via pipe
to an external process, one can also send a string identifying the
parameters with every invocation.

Josef
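The bucket selection the --separate-par doc describes could be sketched as below. Note the doc's boundary convention is slightly ambiguous (it mixes < and <=); this sketch assumes each threshold starts a new bucket:

```cpp
#include <cassert>
#include <vector>

// Given ascending thresholds intval1 < intval2 < ..., return the
// bucket index for a parameter value x: bucket 0 for x < intval1,
// bucket i once x has passed i thresholds, last bucket for
// intvalX <= x. (Assumed convention; the original doc's interval
// endpoints are not fully consistent.)
int bucketFor(int x, const std::vector<int>& intvals) {
    int b = 0;
    for (int v : intvals) {
        if (x >= v) ++b;
        else break;
    }
    return b;
}
```

In the patch, this index would be baked into the displayed function name, e.g. "myfunc:a>=5".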
|
From: ilya s. <ily...@gm...> - 2011-09-20 20:26:49
|
On Tue, Sep 20, 2011 at 8:54 AM, Josef Weidendorfer
<Jos...@gm...> wrote:
> of myfunc() influence its runtime. Callgrind would allow you to embed
> information for bins of this parameter setting into the function name.
> E.g. for myfunc(int a), the profile results would show you two functions:
>
> "myfunc:a>=5" : 90 calls, 1 billion instructions executed
> "myfunc:a<5"  : 5 calls, 2 million instructions executed

Thanks, that's very helpful, and is actually another extension I wanted
to suggest :)

But I want to bin functions by their actual execution cost (whether in
Ir, Dr, etc.), which is only known once the function has run; and then
print the arguments of a few representative function invocations
falling into a given _cost_ bin.

Here's a simpler request that would let me manually simulate what I
want to do: is it possible to create a client request that returns the
current value of a particular statistic (Ir, Dr, ...) gathered by
cachegrind/callgrind -- just as I can get the "processor time spent so
far" using standard function calls? (I know I can dump this to a file,
but that's too slow. Can I get it with a simple client request/API
function call?)

I could then do with that statistic what I now do with runtime (which
isn't too reliable) -- both make histograms and print the args for
representative invocations falling into a given bin.

Thanks a lot,
ilya
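A sketch of the clock()-based stand-in being described here; with a hypothetical CALLGRIND_GET_COST-style client request, the same wrapper would read the exact Ir counter instead of (noisy) clock ticks:

```cpp
#include <cassert>
#include <ctime>

// Measure the cost of one invocation of f by sampling a counter
// before and after the call. Here the counter is std::clock(); the
// requested client request would substitute a simulated event count
// (Ir, Dr, ...) for the same before/after pattern.
template <typename F>
long long timedCall(F f) {
    std::clock_t before = std::clock();
    f();
    std::clock_t after = std::clock();
    return static_cast<long long>(after - before);
}
```

The per-invocation result can then feed a histogram, or trigger printing the arguments when it lands in a bin of interest.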
|
From: Josef W. <Jos...@gm...> - 2011-09-21 15:23:09
|
On Tuesday 20 September 2011, ilya shlyakhter wrote:
> > of myfunc() influence its runtime. Callgrind would allow you to embed
> > information for bins of this parameter setting into the function name.
> > E.g. for myfunc(int a), the profile results would show you two functions:
> >
> > "myfunc:a>=5" : 90 calls, 1 billion instructions executed
> > "myfunc:a<5" : 5 calls, 2 million instructions executed
>
> Thanks, that's very helpful, and is actually another extension I
> wanted to suggest :)
Ha!
> But, I want to bin functions by their actual execution cost (whether
> in Ir, Dr etc),
> which is only known when the function has run; and then, print the arguments
> of a few representative function invocations falling into a given _cost_ bin.
I am still wondering if you would not get the same information in the end
with the above feature. For each bin for a given parameter set, you at
least get the sum of costs for all calls falling into the bin, as well
as the average per call. And you know the parameter settings, as the
bins are chosen according to parameters.
> Here's a simpler request that would let me manually simulate what I want to do:
>
> Is it possible to create a client request that returns the current
> value of a particular statistic (Ir, Dr, ...) gathered by
> cachegrind/callgrind -- just as I
> can get the "processor time spent so far" using standard function calls?
> (I know I can dump this to a file, but that's too slow. Can I get
> this with a simple client request/API function call?)
That should be easy, yes.
> I could then do with that statistic what I now do with runtime (which
> isn't too reliable) -- both make histograms and print the args for
> representative invocations
> falling into a given bin.
Printing a string including your parameters as part of the client request
also should be easy. Perhaps something like (if a is a parameter of your
function):
CALLGRIND_PRINTONRETURN_1("Call number %C to %F(a = %d): Ir = %I[Ir]/%E[Ir]\n", a);
With the string printed when the current function is left, and some
placeholders implicitly defined, e.g.:
%C: how often the current function was called
%F: function name of current function
%I[<event>]: inclusive cost spent in this function call for event type <event>
Then you can also define your own format for your postprocessing of choice.
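The placeholder expansion sketched above could work along these lines; only %C and %F are shown, and both the macro and the placeholder set are proposals here, not an existing callgrind API:

```cpp
#include <cassert>
#include <string>

// Expand the proposed %C (invocation count) and %F (function name)
// placeholders in a format string. A real implementation would also
// handle %I[<event>] etc. using the tool's event counters.
std::string expand(std::string fmt, long calls, const std::string& fn) {
    auto sub = [&fmt](const std::string& key, const std::string& val) {
        std::string::size_type p;
        while ((p = fmt.find(key)) != std::string::npos)
            fmt.replace(p, key.size(), val);
    };
    sub("%C", std::to_string(calls));
    sub("%F", fn);
    return fmt;
}
```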
On Wednesday 21 September 2011, ilya shlyakhter wrote:
> On Mon, Sep 19, 2011 at 4:24 PM, Josef Weidendorfer
> <Jos...@gm...> wrote:
> > E.g. for adding histograms, you not only need to change cachegrind/callgrind,
> > but also extend the format and parsers, such as {cg,callgrind}_annotate,
> > and the KCachegrind GUI.
>
> Not necessarily: you could do what you did for different function
> arguments -- for each function
> for which you want histograms, create a separate function name for
> each bin. And the current
> "callee map" function of KCachegrind could effectively display the histogram.
Hmm... sounds a little bit like misuse. But the bigger problem is ...
> Say you have a function myFunc() that's called 1,000,000 times.
> You create functions myFunc_1, myFunc_2, myFunc_3 to record counts
> from invocations
> that took (say) <100,000 Ir's, 100,000-1,000,000 Ir's and >1,000,000
> Ir's respectively.
> The callee map view would then show which group of calls takes the
> most resources.
>
> So, after each myFunc() invocation, you would have to check that
> invocation's Ir count,
> and move all the counts recorded under this invocation to one of the
> myFunc_? records.
... this would only be possible for the inclusive cost of the function call
when returning, but it is not possible after the fact for all self cost spent
inside the function, as I just add the self cost to a counter running since
program start. And you would also need to change it somehow for callees of the function.
That is exactly the reason why it works with the parameter bins: the parameter
is known when the function is entered, and at that point in time I can still
change the function name used.
You see, changing functions after the fact is quite tricky, and is different from
collecting histogram data.
Josef
|
|
From: ilya s. <ily...@gm...> - 2011-09-21 21:15:54
|
> I am still wondering if you do not get the same information in the end with
> above feature. For each bin for a given parameter set, you at least get the
> sum of costs for all calls falling into the bin, as well as the average per
> call. And you know the parameter settings, as the bins are chosen according
> to parameters.
I think binning by parameters solves a different problem than binning by cost.
There isn't always a simple parameter by which to bin. Say my function works
on trees, and takes longer on less-balanced trees. There isn't a simple
"balancedness" parameter passed to the function; and trying to compute it
on-the-fly would be expensive and could change the profile. Or say that some
subset of invocations has bad cache performance; this might not be related so
straightforwardly to a parameter value.
If I knew what aspects of the input affect my runtime, I'd bin by those aspects;
but that's what I'm trying to find out :) I want to know: what's common to those
invocations taking a disproportionate amount of the function's total cost?
>> Is it possible to create a client request that returns the current
>> value of a particular statistic (Ir, Dr, ...)
> That should be easy, yes.
That would be a huge help.
> Printing a string including your parameters as part of the client request
> also should be easy.
It'd be much too expensive to print every invocation (there are millions of
invocations; I do profiling on large inputs to see the order-of-growth
behavior). I'd like to print a small representative sample of invocations
falling into a given cost bin.
Also, my function consumes its inputs to produce the result (e.g.
destructively merging two lists), so if I get the cost at the end, it's too
late to print the inputs -- hence the two-pass approach.
> ... this would only be possible for the inclusive cost of the function call
> when returning
That would be fine. I mostly profile using inclusive costs anyway.
> You see, changing functions after the fact is quite tricky, and is different from
> collecting histogram data.
But you can predefine a separate function for each bin the user specifies,
and move the inclusive counts to that function after each invocation of the
original function, right?
So, callgrind would have a command-line option "split function F into K bins
(range MIN-MAX) by cost measure X" (where X is Ir, Dr etc).
But even just collecting histogram data would be great. Maybe it could be
saved to an additional file, to keep the existing trace format for the main file?
|