I am using an older version of OProfile (~0.8.1) with the kernel module backported to a 2.4 kernel on ARM XScale.  The output for a single application is not very useful without callgraph support, since a large portion of the samples land in shared libraries.

As a workaround, on every PMU interrupt I would like to add one CPU buffer sample per frame in the backtrace.  I believe the current 2.6 code does this, with a little more sophistication.  I can use the logic from 2.6's arch/arm/oprofile/backtrace.c to determine whether there is a valid frame pointer and walk back through each frame in the call stack.  sync_buffer will then look up the appropriate dcookie based on the offset and add the appropriate event entries.
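
To make the idea concrete, here is a rough user-space sketch of the frame walk I have in mind. It is not the kernel code. It assumes an APCS-style frame where the saved fp, sp and lr sit just below the address the frame pointer refers to, which is how I understand the 2.6 backtrace code treats it. The names walk_frames, demo_walk and MAX_DEPTH are made up for illustration, and the out[] array stands in for adding one CPU buffer sample per frame. In the kernel, user stack memory would have to be read with a fault-safe copy such as copy_from_user rather than dereferenced directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 16  /* hypothetical cap on frames recorded per interrupt */

/* Assumed layout of the words saved just below the frame pointer:
 * the caller's fp, the caller's sp, and the return address (lr). */
struct frame_tail {
    uintptr_t fp;
    uintptr_t sp;
    uintptr_t lr;
};

/* Walk the frame chain inside [stack_lo, stack_hi] and store one return
 * address per frame in out[]. Returns the number of entries written. */
size_t walk_frames(uintptr_t fp, uintptr_t stack_lo, uintptr_t stack_hi,
                   uintptr_t *out, size_t max_out)
{
    const struct frame_tail *tail;
    size_t n = 0;

    assert(out != NULL);

    while (n < max_out) {
        /* Reject frame pointers whose tail would fall outside the stack. */
        if (fp < stack_lo + sizeof(struct frame_tail) || fp > stack_hi)
            break;
        /* Reject misaligned frame pointers. */
        if (fp % sizeof(uintptr_t) != 0)
            break;

        tail = (const struct frame_tail *)fp - 1;
        out[n++] = tail->lr;  /* stand-in for one sample per frame */

        /* The stack grows down, so a caller's frame must live at a
         * strictly higher address. This also stops on loops and on a
         * zero fp at the end of the chain. */
        if (tail->fp <= fp)
            break;
        fp = tail->fp;
    }
    return n;
}

/* Build a fake three-frame stack and walk it from the innermost frame. */
size_t demo_walk(uintptr_t *out, size_t max_out)
{
    static struct frame_tail stack[3];

    stack[0].fp = (uintptr_t)&stack[2]; stack[0].sp = 0; stack[0].lr = 0x1000;
    stack[1].fp = (uintptr_t)&stack[3]; stack[1].sp = 0; stack[1].lr = 0x2000;
    stack[2].fp = 0;                    stack[2].sp = 0; stack[2].lr = 0x3000;

    return walk_frames((uintptr_t)&stack[1],
                       (uintptr_t)&stack[0], (uintptr_t)&stack[3],
                       out, max_out);
}
```

Calling demo_walk with a buffer of MAX_DEPTH entries records the three fake return addresses, innermost first. The depth cap and the "caller must be higher" check are there so a corrupt or frame-pointer-less stack cannot keep the interrupt handler looping.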

I don't need the end result to be pretty, and I don't need full callgraph support.  I just want to get a feel for how much time is spent in each function plus its children.

Any suggestions and/or concerns?