From: Hynek S. <hs+...@ox...> - 2006-08-13 21:38:57
Hello Josef,

Josef Weidendorfer <Jos...@gm...> writes:

First of all, many thanks for your detailed answer!

>> I'm profiling software using Callgrind, and as I have some rather
>> strange results, I'd like to ask some questions to make sure... I
>> understand that Callgrind counts "instructions". Does that mean that
>> it simply adds up assembler instructions and uses them as the cost?

> Yes. "Instructions fetched" is one cost type provided by
> Cachegrind/Callgrind. You should not confuse this with any time cost.

I don't. :)

>> I ask because it seems like a strange metric to me, since some
>> instructions take longer than others.

> Probably. However, a time estimation using instruction latencies as
> factors is not better either. Relevant is the throughput of a given
> instruction stream, and not single latencies. For estimation of that,
> you need to simulate a CPU pipeline; and to match some reality, you
> need to know the branch prediction algorithm, and the superscalar
> configuration of your processor and so on (which is not really
> documented BTW). Even with these parameters, a simulator probably
> would be way too slow to be practical.

I don't doubt that; sorry if it sounded otherwise. In fact, that was the
thought behind my question: that it's impossible to compute time from
instructions (on today's CPUs).

> Cachegrind/Callgrind is good to see whether your code has cache problems
> and potential for cache optimizations. Together with average L1/L2 cache
> latencies, you can come up with a rough time estimation which is often
> quite good (there is a derived cost type "Cycle Estimation" provided
> with KCachegrind which defaults to 10/100 cycles latency for L1/L2).
> You should adjust the formula for your machine.

To be honest, I have trouble judging that because I have nothing to
compare against. When do I have cache problems? All my functions have
cache miss sums for both L1 and L2 < 0.5...
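To make the "Cycle Estimation" idea concrete, here is a minimal sketch that assumes the formula has the simple shape Ir + 10 * (L1 misses) + 100 * (L2 misses), using the default latencies mentioned above. KCachegrind's real formula may contain further terms; the function name `cycle_estimate`, the function labels and all the counts below are hypothetical, not taken from an actual Callgrind run. The point it shows: a function with far fewer fetched instructions can still dominate the estimate once its misses are weighted.

```python
def cycle_estimate(ir, l1_misses, l2_misses, l1_penalty=10, l2_penalty=100):
    """Rough cycle estimate (assumed formula): each fetched instruction
    counts as one cycle, plus a fixed penalty per L1 miss and per L2 miss.
    The 10/100 defaults mirror the latencies quoted above; adjust them
    for the machine being profiled."""
    return ir + l1_penalty * l1_misses + l2_penalty * l2_misses


# Hypothetical per-function counts, e.g. as they might be read off a
# Callgrind run made with the cache simulator switched on.
funcs = {
    "lock_heavy": {"ir": 1_000_000, "l1": 1_000, "l2": 100},
    "copy_heavy": {"ir": 200_000, "l1": 20_000, "l2": 15_000},
}

for name, c in funcs.items():
    est = cycle_estimate(c["ir"], c["l1"], c["l2"])
    # Fraction of the estimate that comes from cache-miss penalties.
    share = 100.0 * (est - c["ir"]) / est
    print(f"{name}: Ir={c['ir']}, estimate={est}, miss share={share:.1f}%")
```

With these made-up numbers, "copy_heavy" fetches five times fewer instructions than "lock_heavy" but ends up with the larger estimate (1,900,000 vs. 1,020,000), because most of its cost comes from L2 miss penalties. Without the cache simulator the miss counts are zero and the estimate collapses to plain Ir.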

> However, if you do not have cache problems, there probably is no good
> way to estimate the time by using the "instruction fetch" cost given by
> Cachegrind/Callgrind.

Ok. The Valgrind docs sounded pretty general-purpose (`Callgrind: a
heavyweight profiler'), so I've been trying to use it as a general
profiler. :) The call-graph features are excellent, though.

>> This would explain why pthread_mutex_lock()
>> seems to hog most of the cost, while memcpy() (which gprof identified
>> as the major hog) is negligible.

> This sounds like you only look at the instruction fetches even though you
> have a lot of L2 misses. Can it be that you run with the cache simulator
> switched off (which is the default with Callgrind)?
> Use "--simulate-cache=yes" and look at the cycle estimation cost (in
> KCachegrind).

I tried both but, to my shame, I overlooked the "Cycle Estimation".

> BTW, gprof is doing source instrumentation, and depending on the
> application, overhead can be near 100%. This also disturbs the
> measurement itself.

I know. I found that even minimal instrumentation (i.e. rdtsc) can have
a huge impact on the results.

> Why is OProfile's granularity too low for you? In contrast to GProf,
> you even can adjust the sample interval there to tune the
> overhead.

I'm profiling _thin_ network layers over gigabit Ethernet whose latencies
are measured in < 100 µs. OProfile's lowest granularity is 3,000 cycles;
if I'm not mistaken, I'd need 2,200 on my 2.2 GHz CPU to get 1,000,000
samples/second. So, any hints for thorough measuring? Callgrind was kind
of my last hope...

> GProf is doing sample too, but only with timers, and with the handler
> in user land. OProfile really should be more exact, as it does sample
> handling in kernel space with lower latency.

Ok, this is a shock for me now... I always thought that gprof doesn't
sample. :( Why does it do instrumentation then? Just for the call graph?
Got to look at its internals, I guess.
Or just throw it in the trashcan.

>> I couldn't find anything about this either in the Valgrind manual
>> or on the KCachegrind pages, so I hope that someone here can help
>> me...

> This is becoming a FAQ. I will try to come up with something for the
> Callgrind manual.

I'm sorry. :( I guess I used the wrong search terms on Gmane.

-hs
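The sampling arithmetic from the OProfile paragraph can be checked with a short sketch. It takes the 2.2 GHz clock and the 3,000-cycle minimum interval as stated in the message (the minimum is the poster's figure, not verified here); the helper names `samples_per_second` and `cycles_for_rate` are hypothetical.

```python
def samples_per_second(cpu_hz, cycles_per_sample):
    """Samples taken per second when one sample fires every
    cycles_per_sample CPU cycles."""
    return cpu_hz / cycles_per_sample


def cycles_for_rate(cpu_hz, samples_per_sec):
    """Sampling interval (in cycles) needed to reach a target sample rate."""
    return cpu_hz / samples_per_sec


CPU_HZ = 2.2e9  # the 2.2 GHz CPU from the message

# Interval needed for 1,000,000 samples per second.
print(cycles_for_rate(CPU_HZ, 1_000_000))

# Rate actually reachable at the stated 3,000-cycle minimum interval.
print(samples_per_second(CPU_HZ, 3_000))

# Approximate number of samples that fall inside one 100 µs operation.
window_s = 100e-6
print(samples_per_second(CPU_HZ, 3_000) * window_s)
```

This confirms the 2,200-cycle figure: 2.2e9 / 1e6 = 2,200. At a 3,000-cycle interval the rate drops to roughly 733,000 samples/s, i.e. only about 73 samples per 100 µs window, which is why sampling struggles to resolve such short operations.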