From: John L. <le...@mo...> - 2002-01-01 22:39:11
On Tue, Jan 01, 2002 at 10:30:06PM +0100, Philippe Elie wrote:

> I agree on the need to better document oprofile, but in your examples
> I don't think that seeing a ratio different from the expected ratio is
> due to irq latency. The latency is the same for samples which belong to
> fun1 or fun2. I mean that the probability of samples from fun1 being
> credited to fun2, and of the reverse case, are identical.

Jeff actually saw this problem. He was using two similar counters on a
Duron, and they contradicted each other. Adding other code in between
fixed it. This was with current CVS.

If a lot of the counter overflows happen to land at the end of one
function but not the other, then we can easily see samples for the one
ending up in the other, and vice versa. Any particular sample can easily
end up in the "next" function as we count it. The effect is ameliorated
for larger functions...

> If you see a different ratio than expected it can come from:
> - prologue/epilogue have the same cost for the two functions and
>   represent a greater relative overhead for the short function
> - one of the functions has better alignment (for the function itself
>   or for branches inside the function)
> - many other penalties can occur in real code, such as a memory partial
>   stall due to code in the fun2 epilogue where the penalty itself
>   occurs in fun1, etc.

sure, but none of these really explain the massive differences Jeff saw
when running with two counters and RETIRED_INSNS vs. RETIRED_OPS. The
functions are pretty similar ...

>
> const int base_loop = 100;
>
> void fun1()
> {
>     for (int i = 0 ; i < base_loop ; ++i) ;
> }
>
> void fun2()
> {
>     for (int i = 0 ; i < base_loop * 1000 ; ++i) ;
> }
>
> int main()
> {
>     while (1) { fun1(); fun2(); }
>
>     return 0;
> }
>
> $oprofpp -l ...
>
> main[0x080483a0]: 0.0045% (218 samples)
> fun1__Fv[0x08048380]: 0.1037% (5050 samples)
> fun2__Fv[0x08048390]: 99.8918% (4863213 samples)
>
> Other tests with decreased base_loop values show the ratio increasing
> in favor of fun1 because the call/ret prologue/epilogue remains constant.

sure, I get similar things. In fact, with his code on Intel, I get the
two counters mostly agreeing.

> The second example can probably go in the docs under "irq latency problem"

yes, definitely.

> The doc must also speak about inaccuracy coming from the debug info
> itself, code motion, scheduled insns, etc.

yep.

> Also it would be great to have a better overview of all utilities, with
> more real examples of use. I can write it but this stuff would require a
> rewording by a native speaker...

I can certainly do this if you make a start.

> P.S.: John, your email address was broken; you have perhaps missed some mail

yes, I dropped about a day or two's email. Did I miss anything
oprofile-related?

regards
john

-- 
"We're standing there pounding a dead parrot on the counter, and the
management response is to frantically swap in new counters to see if
that fixes the problem."
	- Peter Gutmann
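
Both positions in this thread (skid moves samples symmetrically between fun1 and fun2, versus overflows clustering at the end of one function) can be illustrated with a toy model. The sketch below is not oprofile code and does not model any real CPU. The names `Counts` and `attribute`, the abstract "slot" unit, and all the numbers are assumptions made up for illustration. It treats one pass of the `while` loop as a cycle of `fun1_len + fun2_len` slots, takes one counter overflow every `period` slots, and credits each sample `skid` slots after the overflow.

```cpp
#include <cassert>

// Toy model, NOT oprofile code: one pass of main's loop is a "cycle" of
// fun1_len + fun2_len abstract instruction slots, with fun1 running first.
struct Counts {
    long fun1;  // samples credited to fun1
    long fun2;  // samples credited to fun2
};

// Credits `samples` counter overflows, one every `period` slots, starting
// at slot `phase`. Each sample is recorded `skid` slots after the overflow,
// which stands in for interrupt delivery latency (a hypothetical fixed value).
Counts attribute(long fun1_len, long fun2_len, long period, long phase,
                 long skid, long samples)
{
    assert(fun1_len > 0 && fun2_len > 0 && period > 0);
    assert(phase >= 0 && skid >= 0 && samples >= 0);

    const long cycle = fun1_len + fun2_len;
    Counts c = {0, 0};
    for (long k = 0; k < samples; ++k) {
        // Position inside the cycle where the sample is actually recorded.
        const long recorded = (phase + k * period + skid) % cycle;
        if (recorded < fun1_len)
            ++c.fun1;
        else
            ++c.fun2;
    }
    return c;
}
```

With `fun1_len = 100` and `fun2_len = 900`, a period of 1007 walks through every slot of the cycle, so `attribute(100, 900, 1007, 0, 10, 1000)` still credits 100 samples to fun1 and 900 to fun2: the samples lost at the end of fun1 are made up by those gained from the end of fun2, which is Philippe's argument. With a period of 1000 and a phase of 95, every overflow lands near the end of fun1, and a skid of 10 moves all 1000 samples into fun2. That is the clustered case described in the reply, taken to an extreme.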