From: Josef W. <Jos...@gm...> - 2004-06-02 14:41:24
|
Hi,

I finally found time to check out the call-graph feature of OProfile, and I have some questions regarding this: as I understand it, OProfile in call-graph mode still does statistical sampling, but records the context in more detail, allowing it to distinguish self costs depending on the caller chains of functions. But opstack also provides some child costs. How are these calculated? Without doing instrumentation (like gprof), IMHO the only way to get sensible data for child counts is to always trace back up to the top of the stack (i.e. main). Otherwise, it is easily possible that calls which happen are never detected, and thus you only have partial results. Am I right here, and wouldn't it be more correct for a post-processing tool to simply give out self costs for call chains?

Related is a question about recursive cycles. I get the following output when profiling the rendering of a webpage in konqueror, for RenderObject::findNextLayer (this function calls itself quite often):

self   %        child  %        app name           symbol name
16     0.9981   124    1.6309   libkhtml.so.4.2.0  khtml::RenderContainer::appendChildNode(khtml::RenderObject*)
6      0.3743   157    2.0650   libkhtml.so.4.2.0  khtml::RenderContainer::insertChildNode(khtml::RenderObject*, khtml::RenderObject*)
1581   98.6276  7322   96.3041  libkhtml.so.4.2.0  khtml::RenderObject::findNextLayer(khtml::RenderLayer*, khtml::RenderObject*, bool)
1581   4.4856   7322   20.7740  libkhtml.so.4.2.0  khtml::RenderObject::findNextLayer(khtml::RenderLayer*, khtml::RenderObject*, bool)
1581   59.1249  7322   100.000  libkhtml.so.4.2.0  khtml::RenderObject::findNextLayer(khtml::RenderLayer*, khtml::RenderObject*, bool)
150    5.6096   0      0        libkhtml.so.4.2.0  __i686.get_pc_thunk.bx
282    10.5460  0      0        libkhtml.so.4.2.0  khtml::RenderContainer::firstChild() const
323    12.0793  0      0        libkhtml.so.4.2.0  khtml::RenderBox::layer() const
190    7.1055   0      0        libkhtml.so.4.2.0  khtml::RenderObject::firstChild() const
148    5.5348   0      0        libkhtml.so.4.2.0  khtml::RenderObject::layer() const

The function is listed both as caller and as callee of itself, but with different "child %". Shouldn't these be the same? To be honest, I cannot make any sense of child costs for recursive calls.

I looked for a way to integrate the call-graph feature of OProfile for visualization with KCachegrind. First, I simply want to show self costs for call chains. But I don't see a way to extract the sampled call chains from the output of any command-line tool, or even from the data in /var/lib/oprofile/samples. It looks like oprofiled does some postprocessing here and throws away the call chains?

Thanks for the powerful tool,
Josef |
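[Editor's note: a small sketch of the concern raised above, in hypothetical Python rather than anything from OProfile. If each sample unwinds only a fixed number of stack frames, any caller arcs below that cut-off are simply never observed.]

```python
# Hypothetical sketch: arcs recoverable from one sample whose true
# stack is main>A>B>C>D>E (listed caller-first).

def arcs_from_sample(stack, depth):
    """Collapse the innermost `depth` frames into (caller, callee)
    arcs, the way a depth-limited unwinder would record them."""
    frames = stack[-depth:]
    return {(frames[i], frames[i + 1]) for i in range(len(frames) - 1)}

stack = ["main", "A", "B", "C", "D", "E"]

full = arcs_from_sample(stack, depth=len(stack))  # unwinds all the way to main
short = arcs_from_sample(stack, depth=3)          # e.g. a depth limit of 3

print(sorted(full - short))
# [('A', 'B'), ('B', 'C'), ('main', 'A')] -- arcs lost to the depth limit
```

With the depth limit, only C>D and D>E survive; the arcs connecting the hot leaf back to main are gone, which is exactly why the child costs become untrustworthy.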
From: John L. <le...@mo...> - 2004-06-02 15:00:51
|
On Wed, Jun 02, 2004 at 04:28:28PM +0200, Josef Weidendorfer wrote:

> Related is a question about recursive cycles. I get the following output
> when profiling the rendering of a webpage in konqueror for

Phil, can you look at this?

> I looked for a way to integrate the call-graph feature of OProfile for
> visualization with KCachegrind. First, I simply want to show self costs for
> call chains. But I don't see a way to extract the sampled call chains from
> the output of any command line tool or even from the data in
> /var/lib/oprofile/samples.

The libpp directory contains the client interface that opstack uses. It should be pretty easy to see how to use it by looking at opstack.cpp.

regards
john |
From: Josef W. <Jos...@gm...> - 2004-06-02 15:48:51
|
On Wednesday 02 June 2004 17:00, John Levon wrote:
> On Wed, Jun 02, 2004 at 04:28:28PM +0200, Josef Weidendorfer wrote:
> > I looked for a way to integrate the call-graph feature of OProfile for
> > visualization with KCachegrind. First, I simply want to show self costs
> > for call chains. But I don't see a way to extract the sampled call chains
> > from the output of any command line tool or even from the data in
> > /var/lib/oprofile/samples.
>
> The libpp directory contains the client interface that opstack uses. It
> should be pretty easy to see how to use it by looking at opstack.cpp

Hi John,

thanks for the fast reply!

In libpp/callgraph_container.h there are classes storing samples for call arcs involving 2 functions (arc_recorder/callgraph_container). I thought somewhere there had to be a function/class where I can get samples for call chains with e.g. 5 functions (A>B>C>D>E) when I say "opcontrol -c=5". opstack gives out 2-function call relationships only, and not e.g. "there were 10 samples in function E with backtrace A>B>C>D>E". But your kernel module seems to measure this...?

Cheers,
Josef

PS: Of course, I can write a Perl script to convert the data from opstack to my format. But as I said in the last mail, I'm not yet convinced about the usefulness of the child costs. |
From: John L. <le...@mo...> - 2004-06-02 16:37:28
|
On Wed, Jun 02, 2004 at 05:48:43PM +0200, Josef Weidendorfer wrote:
> In libpp/callgraph_container.h there are classes storing samples for call
> arcs involving 2 functions (arc_recorder/callgraph_container). I thought
> somewhere there had to be a function/class where I can get samples for call
> chains with e.g. 5 functions (A>B>C>D>E) when I say "opcontrol -c=5".
> opstack gives out 2-function call relationships only, and not e.g. "there
> were 10 samples in function E with backtrace A>B>C>D>E". But your kernel
> module seems to measure this...?

It's difficult to store the full backtrace information efficiently, and of dubious utility. So we do what gprof does and store only A>B, B>C, C>D, D>E.

regards
john |
From: Josef W. <Jos...@gm...> - 2004-06-02 18:15:13
|
Hi,

On Wednesday 02 June 2004 18:36, John Levon wrote:
> On Wed, Jun 02, 2004 at 05:48:43PM +0200, Josef Weidendorfer wrote:
> > In libpp/callgraph_container.h there are classes storing samples for call
> > arcs involving 2 functions (arc_recorder/callgraph_container). I thought
> > somewhere there had to be a function/class where I can get samples for
> > call chains with e.g. 5 functions (A>B>C>D>E) when I say "opcontrol
> > -c=5". opstack gives out 2-function call relationships only, and not e.g.
> > "there were 10 samples in function E with backtrace A>B>C>D>E". But your
> > kernel module seems to measure this...?
>
> It's difficult to store the full backtrace information efficiently, and

I know that this is quite difficult in the scope of realtime measurement.

> of dubious utility. So we do what gprof does and store only A>B, B>C,
> C>D, D>E

GProf has full information on call relationships because it does instrumentation, but OProfile does sampling only, and AFAICS it can't make sure that no call relationship is lost (unless you always do the backtrace up to main). The algorithm GProf uses to propagate child costs up the call chain can't work here.

Cheers,
Josef |
From: John L. <le...@mo...> - 2004-06-02 22:10:26
|
On Wed, Jun 02, 2004 at 08:15:05PM +0200, Josef Weidendorfer wrote:
> GProf has full information on call relationships because it does
> instrumentation, but OProfile does sampling only, and AFAICS it can't make
> sure that no call relationship is lost (unless you always do the backtrace
> up to main). The algorithm GProf uses to propagate child costs up the call
> chain can't work here.

Where in the gprof file format is this kept?

john |
From: Philippe E. <ph...@wa...> - 2004-06-02 16:20:21
|
On Wed, 02 Jun 2004 at 16:28 +0000, Josef Weidendorfer wrote:
> Hi,
>
> I finally found time to check out the call-graph feature of OProfile,
> and I have some questions regarding this: as I understand it, OProfile
> in call-graph mode still does statistical sampling, but records the
> context in more detail, allowing it to distinguish self costs depending
> on the caller chains of functions.
> But opstack also provides some child costs. How are these calculated?
> Without doing instrumentation (like gprof), IMHO the only way to get
> sensible data for child counts is to always trace back up to the top of
> the stack (i.e.

For each incoming sample we trace back to the top of the stack, but we don't store the call chain for each sample, which would be too costly; instead we record only (from_eip, to_eip). So for a sample we don't know the whole chain, only the parent; the parent knows its parent, etc. This means that after recording, the chains are mixed together in the sample files.

> main). Otherwise, it is easily possible that calls which happen are never
> detected, and thus you only have partial results. Am I right here, and
> wouldn't it be more correct for a post-processing tool to simply give out
> self costs for call chains?

Two limitations, one obvious: we trace only when receiving samples. The second is stated above: we don't record the whole chain.

> Related is a question about recursive cycles. I get the following
> output when profiling the rendering of a webpage in konqueror, for
> RenderObject::findNextLayer (this function calls itself quite often):

I shrank the output a bit:

self   %        child  %        symbol name
16     0.9981   124    1.6309   RenderContainer::appendChildNode
6      0.3743   157    2.0650   RenderContainer::insertChildNode
1581   98.6276  7322   96.3041  RenderObject::findNextLayer
1581   4.4856   7322   20.7740  RenderObject::findNextLayer
1581   59.1249  7322   100.000  RenderObject::findNextLayer
150    5.6096   0      0        __i686.get_pc_thunk.bx
282    10.5460  0      0        RenderContainer::firstChild() const
323    12.0793  0      0        RenderBox::layer() const
190    7.1055   0      0        RenderObject::firstChild() const
148    5.5348   0      0        RenderObject::layer() const

> The function is listed both as caller and as callee of itself, but with
> different "child %". Shouldn't these be the same? To be honest, I cannot
> make any sense of child costs for recursive calls.

findNextLayer received 1581 samples (4.48% of all samples). The 7322 count must be interpreted as: we received 1581 samples, and cumulating the counts along the call stack gives 7322 samples. Since we don't know the stack depth (remember, we only have one record (from_eip, to_eip) for the call findNextLayer -> findNextLayer), this is only meaningful as an estimate of the average recursive call depth.

Second, the child counts: the sum of the child counts must give 100%. Here it looks like findNextLayer is costly but the other children are not.

Last word: this is experimental stuff. I wrote the output code, but with recursive calls I have real problems interpreting it...

> I looked for a way to integrate the call-graph feature of OProfile for
> visualization with KCachegrind. First, I simply want to show self costs
> for call chains. But I don't see a way to extract the sampled call chains
> from the output of any command line tool or even from the data in
> /var/lib/oprofile/samples. It looks like oprofiled does some
> postprocessing here, and throws away the call chains?

I think the whole problem comes from the way we record the call stack: as (from_eip, to_eip) pairs, not as a whole caller chain.

regards,
Phil |
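[Editor's note: a tiny worked version of Phil's reading of the numbers above. The formula is the editor's interpretation, not OProfile code: with only the recursive self-arc record available, dividing the cumulated count by the self count gives at best an average recursion depth.]

```python
# Numbers from the opstack output above for RenderObject::findNextLayer.
self_samples = 1581   # samples that landed in the function itself
cumulated = 7322      # counts cumulated along the (unknown-depth) stack

# Each recursive frame on the stack contributes the sample once more,
# so the ratio estimates the average number of findNextLayer frames
# per sample.
avg_depth = cumulated / self_samples
print(f"{avg_depth:.1f}")  # -> 4.6
```

As Josef notes later in the thread, this estimate is also bounded by the configured backtrace depth, so it is a lower bound rather than a true average.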
From: Josef W. <Jos...@gm...> - 2004-06-02 18:02:46
|
Hi Phil,

thanks for the explanation. I know this is experimental, but it looks like a valuable addition to OProfile. I myself already had quite a hard time with recursive functions in KCachegrind, and I came to the conclusion that there is no way around cycle detection and grouping functions which call themselves recursively into artificial functions (the same is done in GProf), unless you have full call chains from the top.

On Wednesday 02 June 2004 20:22, Philippe Elie wrote:
> For each incoming sample we trace back to the top of the stack, but we
> don't store the call chain for each sample, which would be too costly;
> instead we record only (from_eip, to_eip). So for a sample we don't know
> the whole chain, only the parent; the parent knows its parent, etc. This
> means that after recording, the chains are mixed together in the sample
> files.

But this way you throw away information:
* If there are chains A>B>C and D>B>E, you lose the information that there never was e.g. A>B>E.
* Instead of the 2 chains A>B>C and A>C>B, you see a nonexistent cycle B<>C.

And these cases are not rare, at least with C++ code, and even more so with Qt's signals/slots (a callback mechanism).

AFAIK, this is additionally limited by the --callgraph depth parameter. Thus, the user can control what "costly" means. Is it completely out of scope to provide call chains to post-processing tools sometime in the future?

BTW, Calltree (my extension of cachegrind/valgrind to track call graphs) is able to dump call chains of a maximum specified length. For 20, I get half a million different call chains for a konqueror startup ;-)

I think that the output of opstack is currently only trustworthy if you run e.g. with "opcontrol -c=100", to be sure not to lose any call relationships for the samples measured. Example: a big program with main() calling 5 functions in reality. With a depth of, say, 5 you can't make sure that all 5 functions appear as children of main(). Especially, it's possible that the call path where most of the samples happen doesn't even appear as a child of main(), but perhaps isolated, or - more misleading - as a child of some function because 1 sample enclosed such an arc. Please prove me wrong here ;-)

> I shrank the output a bit:
>
> self   %        child  %        symbol name
> 16     0.9981   124    1.6309   RenderContainer::appendChildNode
> 6      0.3743   157    2.0650   RenderContainer::insertChildNode
> 1581   98.6276  7322   96.3041  RenderObject::findNextLayer
> 1581   4.4856   7322   20.7740  RenderObject::findNextLayer
> 1581   59.1249  7322   100.000  RenderObject::findNextLayer
> 150    5.6096   0      0        __i686.get_pc_thunk.bx
> 282    10.5460  0      0        RenderContainer::firstChild() const
> 323    12.0793  0      0        RenderBox::layer() const
> 190    7.1055   0      0        RenderObject::firstChild() const
> 148    5.5348   0      0        RenderObject::layer() const
>
> > The function is listed both as caller and as callee of itself, but with
> > different "child %". Shouldn't these be the same? To be honest, I cannot
> > make any sense of child costs for recursive calls.
>
> findNextLayer received 1581 samples (4.48% of all samples). The 7322 count
> must be interpreted as: we received 1581 samples, and cumulating the
> counts along the call stack gives 7322 samples. Since we don't know the
> stack depth (remember, we only have one record (from_eip, to_eip) for the
> call findNextLayer -> findNextLayer), this is only meaningful as an
> estimate of the average recursive call depth.

Hmm... and what is the average depth in this case? It depends on where the function calls itself: at the beginning, at the end... I see that 1581 samples were measured while in findNextLayer. But there are no call arcs to findNextLayer which include these 1581. That's because I used a maximal depth of 3 (see the above problem with missed arcs). Moreover, I am sure that the 7322 would increase if I used a bigger depth parameter.

> Second, the child counts: the sum of the child counts must give 100%.
> Here it looks like findNextLayer is costly but the other children are not.

If you show recursive arc costs, this can be more than 100%: suppose a simple program with the only call path A>B>B>B>C, with 90% of the samples happening in C. Then the recursive cost B>B can be as high as 180%, as the 90% go 2 times over this arc. With this example, it should be obvious that it is better not to show recursive arc costs, as this confuses the user.

> Last word: this is experimental stuff. I wrote the output code, but with
> recursive calls I have real problems interpreting it...

I only want to make you aware of some problems in the current implementation, and I would be happy if my comments lead to improvements in OProfile. My experience with tools (debuggers, profilers, etc.) is this: if people can't make sense of the data they get from a tool, they assume it to be buggy, and usually will never use it again, especially if it's open-source.

Cheers,
Josef |
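[Editor's note: Josef's A>B>B>B>C example can be checked mechanically. This is a sketch, not opstack output: cumulating arc costs along the stack charges the recursive B>B arc once per recursion level, so the arc cost can exceed 100% of the total samples.]

```python
# The only call path is A>B>B>B>C (caller-first); 90 of 100 total
# samples land in C, so each of those samples sits below the B>B arc
# twice -- once for each recursion level it crosses.
stack = ["A", "B", "B", "B", "C"]
samples_in_c = 90

arc_cost = {}
for caller, callee in zip(stack, stack[1:]):
    arc_cost[(caller, callee)] = arc_cost.get((caller, callee), 0) + samples_in_c

print(arc_cost[("B", "B")])  # -> 180, i.e. 180% of the 100 total samples
```

Each non-recursive arc (A>B, B>C) is charged only once per sample, staying at 90; only the self-arc doubles up.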
From: John L. <le...@mo...> - 2004-06-02 22:09:30
|
On Wed, Jun 02, 2004 at 08:02:34PM +0200, Josef Weidendorfer wrote:
> I think that the output of opstack is currently only trustworthy if you
> run e.g. with "opcontrol -c=100", to be sure not to lose any call
> relationships for

I do not agree. Certainly there are issues in complicated pieces of code, especially with recursion, but for the main part, the profile results tend to be pretty realistic and understandable.

> the samples measured. Example: a big program with main() calling 5
> functions in reality. With a depth of, say, 5 you can't make sure that
> all 5 functions appear as children of main().

I would not recommend such a low depth. However, there are circumstances where this is indeed enough. We should probably document a reasonable backtrace depth.

regards
john |
From: Josef W. <Jos...@gm...> - 2004-06-03 09:22:06
|
On Thursday 03 June 2004 00:10, John Levon wrote:
> On Wed, Jun 02, 2004 at 08:15:05PM +0200, Josef Weidendorfer wrote:
> > GProf has full information on call relationships because it does
> > instrumentation, but OProfile does sampling only, and AFAICS it can't
> > make sure that no call relationship is lost (unless you always do the
> > backtrace up to main). The algorithm GProf uses to propagate child
> > costs up the call chain can't work here.
>
> Where in the gprof file format is this kept?

About call arcs? The format has different sections, and one of them holds the call counts for the call arcs appearing in a program. For the gprof algorithm, look at the original gprof paper, e.g. at http://docs.freebsd.org/44doc/psd/18.gprof/paper.pdf

On Thursday 03 June 2004 00:09, John Levon wrote:
> On Wed, Jun 02, 2004 at 08:02:34PM +0200, Josef Weidendorfer wrote:
> > I think that the output of opstack is currently only trustworthy if you
> > run e.g. with "opcontrol -c=100", to be sure not to lose any call
> > relationships for
>
> I do not agree. Certainly there are issues in complicated pieces of
> code, especially with recursion, but for the main part, the profile
> results tend to be pretty realistic and understandable.

OK. As this is a pragmatic compromise between the quality of the measurement strategy (low overhead) and the quality of the measurement results, it depends on the program being profiled and the experience of the user.

> We should probably document a reasonable backtrace depth.

Perhaps additionally some statistics output in the log file, describing for what portion of the samples the backtraces reached the top within the current depth limit?

Yesterday I did a konqueror profile with -c=100, and it actually looks quite good, as main() appears at the top.

One issue about the output (current CVS version 0.8.1cvs):

-------------------------------------------------------------------------------
0     0        8323  100.000  libc.so.6        __libc_start_main
0     0        8283  86.8785  konqueror        main
0     0        8281  95.3593  lib_konqueror.so kdemain
3     100.000  403   4.6407   ld-2.3.3.so      _dl_runtime_resolve
-------------------------------------------------------------------------------
...
1     0.6944   6125  9.8330   libqt-mt.so      QObject::activate_signal
0     0        8281  13.2943  lib_konqueror.so kdemain
0     0        8283  13.2975  konqueror        main
3     0.0315   403   4.2270   ld-2.3.3.so      _dl_runtime_resolve
19    50.0000  2038  84.1801  ld-2.3.3.so      _dl_lookup_symbol_x
19    50.0000  383   15.8199  ld-2.3.3.so      fixup
-------------------------------------------------------------------------------

_dl_runtime_resolve is the function doing the lazy linking of symbols among shared libraries, and it's called in a huge number of places. The line explaining the call main -> _dl_runtime_resolve gets an absolute child cost of 403, which is wrong IMHO, as 403 is the total cost of _dl_runtime_resolve. Shouldn't the call arc number represent only the cost going over this arc? I estimate a cost of at most 2 for main -> _dl_runtime_resolve here.

Josef |
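[Editor's note: a sketch to make the complaint concrete. The per-arc numbers below are hypothetical; only the 403 total and the "at most 2" estimate come from the mail. Reporting the callee's total cost on every caller line inflates arcs that were barely taken.]

```python
# Hypothetical per-arc sample counts into _dl_runtime_resolve.
arc_samples = {
    ("main", "_dl_runtime_resolve"): 2,                        # Josef's estimate
    ("QObject::activate_signal", "_dl_runtime_resolve"): 150,  # invented
    ("all_other_callers", "_dl_runtime_resolve"): 251,         # invented
}
callee_total = sum(arc_samples.values())  # 403, as in the opstack output

# What the output currently shows on the main line vs. what Josef expects:
print(callee_total)                                  # 403 (reported today)
print(arc_samples[("main", "_dl_runtime_resolve")])  # 2 (arc-specific cost)
```

With per-arc costs, a caller that triggers the resolver twice would no longer appear responsible for all 403 samples.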
From: John L. <le...@mo...> - 2004-06-03 12:10:18
|
On Thu, Jun 03, 2004 at 11:21:49AM +0200, Josef Weidendorfer wrote:
> > Where in the gprof file format is this kept?
>
> About call arcs? The format has different sections, and one of them holds
> the call counts for the call arcs appearing in a program. For the gprof
> algorithm, look at the original gprof paper, e.g. at
> http://docs.freebsd.org/44doc/psd/18.gprof/paper.pdf

Indeed. Now look at the file format. It does NOT store full backtraces, but only A>B, B>C, C>D. As the paper explains, the static call graph is used, but this is NOT present in the sample file format.

In case you missed my point here: there's nothing preventing oprofile from using the same analysis, since we have

a) the dynamic two-valued call graph data
b) the static call graph (from the binary)

In particular, we can and do output in gprof format. There is no requirement for storing full backtraces (at least, not if you want to bring up gprof, since that doesn't store them either).

> -------------------------------------------------------------------------------
> 0     0        8323  100.000  libc.so.6        __libc_start_main
> 0     0        8283  86.8785  konqueror        main
> 0     0        8281  95.3593  lib_konqueror.so kdemain
> 3     100.000  403   4.6407   ld-2.3.3.so      _dl_runtime_resolve
> -------------------------------------------------------------------------------
> ...
> 1     0.6944   6125  9.8330   libqt-mt.so      QObject::activate_signal
> 0     0        8281  13.2943  lib_konqueror.so kdemain
> 0     0        8283  13.2975  konqueror        main
> 3     0.0315   403   4.2270   ld-2.3.3.so      _dl_runtime_resolve
> 19    50.0000  2038  84.1801  ld-2.3.3.so      _dl_lookup_symbol_x
> 19    50.0000  383   15.8199  ld-2.3.3.so      fixup
> -------------------------------------------------------------------------------
>
> _dl_runtime_resolve is the function doing the lazy linking of symbols
> among shared libraries, and it's called in a huge number of places. The
> line explaining the call main -> _dl_runtime_resolve gets an absolute
> child cost of 403, which is wrong IMHO, as 403 is the total cost of
> _dl_runtime_resolve. Shouldn't the call arc number represent only the
> cost going over this arc? I estimate a cost of at most 2 here.

I don't know. Phil, what do you think?

regards
john |
From: Josef W. <Jos...@gm...> - 2004-06-04 10:00:33
|
[missed the list]

Hi,

On Thursday 03 June 2004 14:10, John Levon wrote:
> On Thu, Jun 03, 2004 at 11:21:49AM +0200, Josef Weidendorfer wrote:
> > About call arcs? The format has different sections, and one of them
> > holds the call counts for the call arcs appearing in a program. For the
> > gprof algorithm, look at the original gprof paper, e.g. at
> > http://docs.freebsd.org/44doc/psd/18.gprof/paper.pdf
>
> Indeed. Now look at the file format. It does NOT store full backtraces,
> but only A>B, B>C, C>D. As the paper explains, the static call graph is
> used, but this is NOT present in the sample file format.

Sorry about any misunderstanding here. I never claimed GProf raw data includes full backtraces. This is in fact not needed for the gprof analysis.

> In case you missed my point here: there's nothing preventing oprofile
> from using the same analysis, since we have
>
> a) the dynamic two-valued call graph data
> b) the static call graph (from the binary)

The GProf algorithm uses (see section 3 of the paper):
- arcs appearing in the static and dynamic call graph,
- call counts of arcs (C^r_e),
- (self) costs of routines.

In OProfile, when you do not traverse full backtraces, you only get part of the arcs in the program (your point a). Even if these arcs are available in the static call graph, the algorithm needs the traversal counts. The prerequisites for gprof are not met. So I fail to see your point.

> In particular, we can and do output in gprof format.

Ah, sorry, I missed this. BTW, one problem of gprof is that the format currently can only handle one binary image, making the analysis of code using shared libraries impossible.

> There is no requirement for storing full backtraces (at least, not if
> you want to bring up gprof, since that doesn't store them either).

For gprof, full backtraces are indeed not needed. But you need the arcs of the full backtraces for gprof to give useful results.

But gprof aside, my original point was that I think backtraces are useful information for postprocessing tools. I see that their storage is costly (OTOH, you already store the backtraces in the kernel buffer).

Cheers,
Josef |
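[Editor's note: for reference, a sketch of the propagation step Josef means, after section 3 of the gprof paper, with made-up numbers. A callee's time is charged to each caller in proportion to the per-arc call counts (C^r_e), which is exactly what sampled two-function arcs without traversal counts cannot supply.]

```python
# Self costs of routines and per-arc call counts (hypothetical values).
self_time = {"A": 2.0, "B": 3.0, "C": 10.0}
calls = {("A", "C"): 3, ("B", "C"): 1}  # C^r_e in the paper's notation

# Total calls into C, and the share of C's time propagated to caller A.
calls_into_c = sum(n for (_, callee), n in calls.items() if callee == "C")
share_to_a = self_time["C"] * calls[("A", "C")] / calls_into_c

print(share_to_a)  # -> 7.5 of C's 10.0 time units are charged to A
```

If an arc such as A>C is missing from the sampled data, its call count is effectively zero and all of C's time would wrongly flow to the remaining callers, which is Josef's objection.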
From: John L. <le...@mo...> - 2004-06-04 14:53:14
|
On Fri, Jun 04, 2004 at 11:53:24AM +0200, Josef Weidendorfer wrote:
> The GProf algorithm uses (see section 3 of the paper):
> - arcs appearing in the static and dynamic call graph,
> - call counts of arcs (C^r_e),
> - (self) costs of routines.

Essentially, we have all of these (in a statistical manner). We record the dynamic arcs, and our "self cost" is approximated via the normal histogram sampling.

> In OProfile, when you do not traverse full backtraces, you only get part
> of the arcs in the program (your point a). Even if these arcs are
> available in the static call graph, the algorithm needs the traversal
> counts. The prerequisites for gprof are not met. So I fail to see your
> point.

Again, I never recommended a backtrace depth of 3 or whatever low value you set it to.

> Ah, sorry, I missed this. BTW, one problem of gprof is that the format
> currently can only handle one binary image, making the analysis of code
> using shared libraries impossible.

You can, however, use opgprof to output data for a shared library for one binary, and then use gprof on that. I'd be interested to know if the opgprof/gprof output along with a sensible backtrace depth is "good enough" for your needs.

regards,
john |
From: Josef W. <Jos...@gm...> - 2004-06-04 15:47:00
|
Hi John,

On Friday 04 June 2004 16:52, John Levon wrote:
> Essentially, we have all of these (in a statistical manner). We record
> the dynamic arcs, and our "self cost" is approximated via the normal
> histogram sampling.

As I said, I believe full backtraces are needed, but as with recursion, the stack can get arbitrarily deep, and the overhead would be quite high.

I thought about possible solutions to this problem. If you had the possibility to see the range of stack touched between two samples in one process, it would be possible to do only a partial backtrace. At least for processes, one could modify return addresses while doing a backtrace, and inject code into the address space which updates the highest stack address touched. But perhaps this idea is crazy ;-)

> You can, however, use opgprof to output data for a shared library for one
> binary, and then use gprof on that.

Some time ago, Jeremy Fitzhardinge made a profiler using Valgrind (vgprof) which outputs slightly modified gprof data: this data could have multiple histogram sections for all the shared libraries of a program. He also provided a patch to gprof (sent to the binutils project) to be able to read this data. I think that officially enhancing the gprof format is the way to go here.

> I'd be interested to know if the opgprof/gprof output along with a
> sensible backtrace depth is "good enough" for your needs.

But this approach can't cope with arcs crossing the border between two shared libraries. Even if I wanted to profile the khtml component of konqueror alone, it depends heavily on Qt :-(

I will come back with results if I find some time to do experiments; perhaps using libpp directly, as you suggested.

Cheers,
Josef |
From: John L. <le...@mo...> - 2004-06-04 15:58:42
|
On Fri, Jun 04, 2004 at 05:45:49PM +0200, Josef Weidendorfer wrote:
> As I said, I believe full backtraces are needed, but as with recursion,
> the stack can get arbitrarily deep, and the overhead would be quite high.

Yeah.

> I thought about possible solutions to this problem. If you had the
> possibility to see the range of stack touched between two samples in one
> process, it would be possible to do only a partial backtrace. At least
> for processes, one could modify return addresses while doing a backtrace,
> and inject code into the address space which updates the highest stack
> address touched. But perhaps this idea is crazy ;-)

I don't know what you mean here.

> multiple histogram sections for all the shared libraries of a program. He
> also provided a patch to gprof (sent to the binutils project) to be able
> to read this data. I think that officially enhancing the gprof format is
> the way to go here.

I think people are refusing to change gprof in this way.

> > I'd be interested to know if the opgprof/gprof output along with a
> > sensible backtrace depth is "good enough" for your needs.
>
> But this approach can't cope with arcs crossing the border between two
> shared libraries.

That's correct.

regards
john |