From: Philippe E. <ph...@wa...> - 2003-08-16 14:52:35
|
actually the output is:

$ opreport -l
vma       samples  %        samples  %        app name   symbol name
0804a5e0  103306   16.8419  529      14.6133  oprofiled  opd_put_sample
0804cb00  42776     6.9737  345       9.5304  oprofiled  odb_insert

we need to optionally add a field to provide a metrics comparison, like:

vma       samples  %        samples  %        %1 - %0    app name   symbol name
0804a5e0  103306   16.8419  529      14.6133  (+2.2286)  oprofiled  opd_put_sample
0804cb00  42776     6.9737  345       9.5304  (-2.5567)  oprofiled  odb_insert

here the metric is %1 - %0, where %x is the counter number (numbered
from left to right). The above example is for two counters, but it
should work for any number of counters. This is for opreport, but a
similar interface would be used for opdiff.

So I think we need a --compare flag to add this field, a --metrics= to
allow specifying the metric (with a default metric we must choose), and
a --sort to change the sort order.

--compare           takes no parameter; when specified, adds a field for
                    every counter except the leftmost one in the output
--metrics="%N / %0"

I've already got code to implement that, but only a vague idea of
whether it's useful and how we should specify the user interface.

Graydon, you worked on something similar through scripts; any comments?
Especially, what do we need so that you can avoid updating your python
bindings each time an internal structure of oprofile changes?

regards,
Phil
|
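The proposed --metrics option can be sketched roughly as follows. This is a hypothetical illustration, not oprofile code: it substitutes each %N token with the percentage of counter N for a symbol row and evaluates the resulting arithmetic expression. The function name and expression syntax are assumptions.

```python
import re

def eval_metric(expr, percents):
    """Evaluate a metric expression like '%1 - %0' against a list of
    per-counter percentages, e.g. percents=[16.8419, 14.6133]."""
    # replace each %N token with the corresponding percentage literal
    substituted = re.sub(r'%(\d+)',
                         lambda m: repr(percents[int(m.group(1))]),
                         expr)
    # after substitution the expression is pure arithmetic, so a
    # restricted eval is enough for this sketch
    return eval(substituted, {"__builtins__": {}}, {})

# the opd_put_sample row from the example output above
print("%+.4f" % eval_metric("%1 - %0", [16.8419, 14.6133]))  # -2.2286
```

A real implementation would parse the expression properly rather than use eval, but the substitution idea is the same.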
From: graydon h. <gr...@re...> - 2003-08-18 15:19:20
|
Philippe Elie <ph...@wa...> writes:

> Graydon, you worked on something similar through script,
> any comments ?

oh, neat! yes, a couple:

- you'll want a way to sort & filter by the metric, not the raw scores.

- you may wish to define a file format of default metrics paired with
  events, similar to the event definition file you already have, so
  that people can say --analysis=L2-hit-miss-ratio

- for these sorts of calculations you may need to apply some
  statistical methods. oprofile data can be rather noisy. when you are
  sorting and filtering raw counts this is not a major problem, because
  raw counts are big relative to the noise. when you calculate ratios,
  differences, and other such metrics, the noise is greatly magnified
  and the quality of the results is much lower. at bare minimum, I
  would suggest printing the (scaled) variance of the metric over the
  sample set side by side with the metric value, or printing the metric
  value as a quartile or something. simply printing percent differences
  will produce many misleading results.

> Especially what do we need to avoid you update your python bindings
> each time an internal structure of oprofile change ?

oh, hmm. I am increasingly seeing opreport absorb most of the ideas I
had in post-processing scripts, and I'd just as soon see that trend
continue. I wouldn't bother worrying about compatibility with the
python wrapper. it's small, and if anyone *really* wants it I can
easily revive/revise it.

-graydon
|
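Graydon's "print the scaled variance beside the metric" suggestion could look something like the sketch below, assuming (hypothetically) that several profiling runs per symbol are available. The function name, the two-counter ratio metric, and the choice of scaling by the squared mean are all my own illustrative assumptions.

```python
from statistics import mean, pvariance

def metric_with_spread(runs):
    """runs: list of (count_a, count_b) pairs from repeated profiling
    runs of the same workload; the metric here is the ratio a / b."""
    ratios = [a / b for a, b in runs]
    m = mean(ratios)
    # scale the variance by the squared mean so the spread is
    # comparable across symbols with very different metric magnitudes
    scaled_var = pvariance(ratios) / (m * m) if m else float('inf')
    return m, scaled_var

# invented counts for three runs of one symbol
runs = [(103306, 529), (101850, 560), (104100, 512)]
m, v = metric_with_spread(runs)
print(f"ratio = {m:.1f}, scaled variance = {v:.4f}")
```

A symbol whose scaled variance is large relative to others is exactly the "misleading percent difference" case graydon warns about.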
From: John L. <le...@mo...> - 2003-08-18 15:43:25
|
On Mon, Aug 18, 2003 at 11:04:33AM -0400, graydon hoare wrote:

> oh, neat! yes, a couple:

It's going to take some effort to make an interface like this
"naturally" usable, I think.

> events, similar to the event definition file you already have, so
> that people can say --analysis=L2-hit-miss-ratio

A nice idea, and some way towards our implicit "provide information,
not just data" goal.

One low-tech approach would be a cookbook section in the manual
discussing common cases like "How do I profile cache misses on the
Pentium IV?"

> continue.. I wouldn't bother worrying about compatibility with the
> python wrapper. it's small and if anyone *really* wants it I can
> easily revive/revise it.

A powerful set of bindings would be nice at some point, but
compatibility is of absolutely zero concern to me until our 1.0
release.

The 1.0 branch will maintain API compatibility (though probably not ABI
compatibility) unless that requirement conflicts with fixing a
significant bug.

regards
john

--
Khendon's Law: If the same point is made twice by the same person, the
thread is over.
|
From: William C. <wc...@nc...> - 2003-08-18 16:09:12
|
John Levon wrote:
> On Mon, Aug 18, 2003 at 11:04:33AM -0400, graydon hoare wrote:
>
>> oh, neat! yes, a couple:
>
> It's going to take some effort to make an interface like this
> "naturally" usable I think.
>
>> events, similar to the event definition file you already have, so
>> that people can say --analysis=L2-hit-miss-ratio

One needs to be careful with ratios; they can be misleading. I did some
experiments with gcc, and I found that the better quality code had a
worse cache miss ratio, because the redundant memory references cached
really well. The overall number of memory references decreased in the
optimized code, but the cacheable ones were removed more effectively by
the optimizer, causing the ratio to worsen. In terms of absolute
performance, however, it was improved.

> A nice idea and some way towards our implicit "provide information not
> just data" goal.
>
> One low-tech approach would be a cookbook section in the manual
> discussing common cases like "How do I profile cache misses on the
> Pentium IV" ?

Yes, having a cookbook section of the manual describing how to look for
common problems, e.g. excessive memory references and cache misses,
would be good. Once we have some reasonable combinations we can
automate it and have bindings.

One difficulty with the oprofile data is that while it can provide some
very nice machine code metrics, for some of them it might be difficult
for the developer to do much at the source level, e.g. about the
compiler's instruction selection or instruction scheduling.

>> continue.. I wouldn't bother worrying about compatibility with the
>> python wrapper. it's small and if anyone *really* wants it I can
>> easily revive/revise it.
>
> A powerful set of bindings would be nice at some point, but
> compatibility is of absolutely zero concern to me until our 1.0
> release.
>
> The 1.0 branch will maintain API compatibility (though probably not
> ABI compatibility) unless that requirement conflicts with fixing a
> significant bug.
>
> regards
> john
|
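William's gcc observation works out arithmetically like this. The numbers below are invented for illustration, not from his experiment: the optimizer removes mostly the cheap, cacheable references, so the miss ratio doubles even though both total references and total misses drop.

```python
# made-up counts for the same workload before and after optimization
before = {"refs": 1_000_000, "misses": 20_000}   # 2.0% miss ratio
after  = {"refs":   400_000, "misses": 16_000}   # 4.0% miss ratio

for name, c in (("before", before), ("after", after)):
    print(f"{name}: {c['misses'] / c['refs']:.1%} miss ratio, "
          f"{c['misses']} absolute misses, {c['refs']} references")
# the ratio worsens from 2.0% to 4.0%, yet absolute misses and
# references both fall -- the optimized code is faster in practice
```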
From: Georg H. <Geo...@rr...> - 2003-08-19 09:24:24
|
On Mon, Aug 18, 2003 at 12:04:19PM -0400, William Cohen wrote:

> One needs to be careful with ratios. The ratios can be misleading.

Yes, that's true, but hardware counter profiling is not a topic for
absolute beginners. So this is actually a documentation issue ;-)

> Yes, having a cookbook section of the manual describing how to look
> for common problems, e.g. excessive memory references and cache
> misses, would be good. Once we have some reasonable combinations we
> can automate it and have bindings.

For quick first-level analysis there is a tool on SGI MIPS systems
called 'perfex' which can (on request) use all available performance
counters for MIPS (30 in all) in a multiplexed manner (because only 2
can be used concurrently). The tool then gives statistical output about
performance counter differences before and after a program run. The
best part, however, is the derived metrics section in the output, where
useful things like "FP loads & stores per FP instr", "L2 cache hit
rate" or "L2--L1 bandwidth used" are available. Of course, the user
must be aware that such data should always be taken with a grain of
salt, and that in most cases detailed source-level profiling is
required.

My point is that it would be quite handy to have a tool like perfex,
built on oprofile and an abstraction layer that handles the different
counter names and meanings across platforms. We are trying to develop
one here, but the counter multiplexing is quite difficult to do at the
tool level - it should be built into oprofile itself. Is anybody
working on such a scheme yet?

Regards,
Georg.
|
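The multiplexing scheme Georg describes (rotate through event groups because only a few counters fit at once, then extrapolate each count to the full run) can be sketched as below. This is a toy model with invented event names and counts, not perfex or oprofile code, and it assumes equal-length time slices.

```python
from collections import defaultdict

def multiplex_estimate(schedule):
    """schedule: list of (group, {event: raw_count}) time slices of
    equal duration; each group is one set of concurrently countable
    events. Returns counts extrapolated to the full run."""
    totals = defaultdict(int)
    group_slices = defaultdict(int)
    event_group = {}
    for group, counts in schedule:
        group_slices[group] += 1
        for event, n in counts.items():
            totals[event] += n
            event_group[event] = group
    n_slices = len(schedule)
    # scale each event by the inverse of the fraction of time its
    # group was actually scheduled on the hardware counters
    return {ev: tot * n_slices / group_slices[event_group[ev]]
            for ev, tot in totals.items()}

slices = [("A", {"CYCLES": 1000}), ("B", {"L2_MISS": 40}),
          ("A", {"CYCLES": 1100}), ("B", {"L2_MISS": 44})]
print(multiplex_estimate(slices))
# each group ran half the time, so each event is scaled up by 2:
# CYCLES -> 4200.0, L2_MISS -> 168.0
```

The extrapolation is a statistical estimate, which is exactly why such numbers deserve the "grain of salt" caveat above.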
From: Philippe E. <ph...@wa...> - 2003-08-19 11:55:19
|
Georg Hager wrote:
> On Mon, Aug 18, 2003 at 12:04:19PM -0400, William Cohen wrote:
> built on oprofile and an abstraction layer that handles the different
> counter names and meanings across platforms. We are trying to develop

If handling different event names across architectures is done in
oprofile, can you post the patch? One way to do it is through event
alias names:

$ cat events/i386.piii.alias
CYCLE_COUNT   CPU_CLK_UNHALTED
L1_CACHE_MISS DCU_LINES_IN

$ cat events/i386.p4.alias
CYCLE_COUNT   GLOBAL_POWER_EVENTS
L1_CACHE_MISS BSQ_CACHE_REFERENCE:0x7

The second number is the unit mask. When using such an event name,
users can't specify a unit mask themselves. With this approach we also
can't use an alias name already used by an architecture...

> here, but the counter multiplexing is quite difficult to do on the
> tool level - it should be built into oprofile itself. Is anybody
> working on such a scheme yet?

no, but it's in our TODO file; a first step is already done in 0.6 by
removing all user-visible counter numbers. Virtualising counters
requires work for each architecture supported, for both the 2.4 and 2.6
drivers.

regards,
Philippe Elie
|
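Reading the alias files Philippe sketches could look like this. The file format (alias, then target event, with an optional unit mask after ':') is my interpretation of his example; the parsing code is illustrative, not existing oprofile code.

```python
def parse_alias(text):
    """Parse an alias file: each non-empty, non-comment line maps a
    portable alias to (arch_event_name, unit_mask_or_None)."""
    aliases = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        alias, target = line.split(None, 1)
        # an optional unit mask follows the event name after ':'
        event, _, um = target.partition(':')
        aliases[alias] = (event, int(um, 16) if um else None)
    return aliases

p4 = parse_alias("""\
CYCLE_COUNT   GLOBAL_POWER_EVENTS
L1_CACHE_MISS BSQ_CACHE_REFERENCE:0x7
""")
print(p4["L1_CACHE_MISS"])  # ('BSQ_CACHE_REFERENCE', 7)
```

A tool could then resolve CYCLE_COUNT to the right native event per architecture, which is the abstraction layer Georg asks for.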