After spending today with the code, I found that it actually wouldn't be that much work to backport the callgraph support to 2.4 ARM. I have done so and it seems to be working wonderfully. I'll do some more testing and post patches if anyone is interested.
I am using an older version of oprofile (~0.8.1) with the kernel module backported to 2.4 on ARM XScale. I find that the output for a single application is not very useful without callgraph support, because a large portion of the samples land in shared libraries.
As a workaround, for every PMU interrupt, I would like to add one CPU buffer sample per frame in the backtrace. I believe this is what the current 2.6 kernel does, only with a little more sophistication. I can use the logic from 2.6's arch/arm/oprofile/backtrace.c to determine whether there is a valid frame pointer and walk back through each frame in the call stack. sync_buffer will then look up the appropriate dcookie based on the offset and add the appropriate event entries.
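To make the idea concrete, here is a rough user-space sketch of the per-frame walk. It is not the kernel code: struct frame_tail, struct sample_buf, add_sample(), record_backtrace() and MAX_DEPTH are all hypothetical names made up for illustration, and the frame record layout (saved caller fp plus return address) is an assumption. The real handler would also have to copy user stack memory safely rather than dereference it directly.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 16   /* hypothetical cap on frames walked per interrupt */

/* Hypothetical frame record: what the frame pointer is assumed to lead to. */
struct frame_tail {
    const struct frame_tail *fp;  /* caller's frame record, NULL at the top */
    uintptr_t lr;                 /* return address into the caller */
};

/* Hypothetical stand-in for the per-CPU sample buffer. */
struct sample_buf {
    uintptr_t pc[MAX_DEPTH + 1];
    size_t count;
};

static void add_sample(struct sample_buf *buf, uintptr_t pc)
{
    if (buf->count < MAX_DEPTH + 1)
        buf->pc[buf->count++] = pc;
}

/* Record the interrupted pc, then one sample per caller frame.
 * [lo, hi) bounds the stack memory that is safe to read. */
static void record_backtrace(struct sample_buf *buf, uintptr_t pc,
                             const struct frame_tail *fp,
                             uintptr_t lo, uintptr_t hi)
{
    int depth;

    add_sample(buf, pc);
    for (depth = 0; depth < MAX_DEPTH && fp != NULL; depth++) {
        uintptr_t addr = (uintptr_t)fp;
        const struct frame_tail *next;

        /* Invalid frame pointer: stop rather than read stray memory. */
        if (addr < lo || addr + sizeof(*fp) > hi)
            break;
        add_sample(buf, fp->lr);
        next = fp->fp;
        /* The stack grows down, so each caller frame must sit at a
         * higher address; anything else is a corrupt or looping chain. */
        if (next != NULL && (uintptr_t)next <= addr)
            break;
        fp = next;
    }
}
```

The depth cap and the "frame pointers must increase" check are there so that a binary built without frame pointers, or a corrupted stack, costs at most a few bogus samples instead of an endless walk inside the interrupt handler.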
I don't need the end result to be pretty, and I don't need full callgraph support. I just want to get a feel for how much time is spent in each function plus its children.
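For illustration, this is roughly the "function + children" number that one sample per frame would give, sketched as a small tally. Everything here is hypothetical (struct tally, tally_backtrace(), NFUNCS, and the idea of identifying functions by small integer ids); it only shows the counting. One assumption to flag: the sketch counts a function at most once per backtrace, whereas plain per-frame samples would count a recursive function once for every frame it occupies.

```c
#include <stddef.h>

#define NFUNCS 4   /* hypothetical number of distinct functions */

struct tally {
    unsigned self[NFUNCS];       /* samples where the function was running */
    unsigned inclusive[NFUNCS];  /* samples where it was anywhere on the stack */
};

/* funcs[0] is the interrupted function; funcs[1..n-1] are its callers. */
static void tally_backtrace(struct tally *t, const int *funcs, size_t n)
{
    int seen[NFUNCS] = {0};
    size_t i;

    for (i = 0; i < n; i++) {
        int f = funcs[i];

        if (f < 0 || f >= NFUNCS)
            continue;            /* ignore ids outside the table */
        if (i == 0)
            t->self[f]++;
        if (!seen[f]) {          /* count recursion once per backtrace */
            seen[f] = 1;
            t->inclusive[f]++;
        }
    }
}
```

The gap between inclusive and self for a function is then the time attributed to whatever it called, which is the rough picture wanted here.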
Any suggestions and/or concerns?