From: John R. <jr...@bi...> - 2023-01-29 17:25:43
On 2023-01-29, Paul Floyd wrote:
> My recommendations for this are:
>
> 1/ PMU/PMC (performance monitoring unit/counter) event counting tools (perf record on Linux, pmcstat on FreeBSD, Oracle Studio collect on Solaris, don't know for macOS). These can record events such as cache misses with the associated callstacks. You can then use tools such as HotSpot and perfgrind/kcachegrind (I have used HotSpot but not perfgrind).
>
> The big advantage of this is that the PMCs are part of the hardware and the overhead of doing this is minor. The only slight limitation is that the number of counters is limited.

Another disadvantage: the hardware does not know which accesses belong to the target code and which belong to the code of valgrind itself.

Even if the hardware could separate accesses on that basis, it does not know about stack frames. Allocating a stack frame shortly after CALL, and discarding it shortly before RETURN, can be significant reasons for cache misses, either immediately or in the near future.

Then there are system calls, which might significantly alter cache contents. Sometimes the resulting cache misses should be included (they most certainly do affect wall clock time), but in other cases you may wish that the operating system were ignored.

If the target program uses threads, then using memory for inter-thread communication (semaphore, mutex, pipeline, etc.) becomes another factor.
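
To make the stack-frame point concrete, here is a toy sketch rather than a model of any real CPU. It assumes a tiny direct-mapped cache (8 sets of 64-byte lines) and two made-up addresses, HEAP and STACK, chosen so that they map to the same set. The function count_misses and all constants are hypothetical names used only for illustration.

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_SETS = 8     # number of sets in the toy direct-mapped cache (assumed)


def count_misses(addresses):
    """Count misses for a sequence of byte addresses in a direct-mapped cache."""
    tags = [None] * NUM_SETS          # one resident line per set
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE      # cache line holding this byte
        index = line % NUM_SETS       # set that the line maps to
        if tags[index] != line:       # a different line (or nothing) is resident
            misses += 1
            tags[index] = line        # evict and refill
    return misses


# Hypothetical addresses; both fall into set 0 of the toy cache.
HEAP = 0x1000
STACK = 0x7000

# The same heap line read four times, with no calls in between.
data_only = [HEAP] * 4

# The same four reads, each preceded by a write into a freshly
# allocated stack frame (as would happen shortly after a CALL).
with_frames = [addr for _ in range(4) for addr in (STACK, HEAP)]

print(count_misses(data_only))    # 1
print(count_misses(with_frames))  # 8
```

In this contrived trace the frame accesses evict the heap line every time, so the heap reads alone go from one miss to four, and the frame accesses add four more. Real caches are set-associative and real frames rarely collide this neatly, so the sketch only shows the mechanism, not its size.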