Shiny is a lightning-fast, fully documented C/C++/Lua profiler that aims to be by far the easiest to use, requiring no extensive surgery on your code. Results are smoothed and shown at run time, either as a call tree or as a list sorted by time. Output can also be rendered as graphs in Ogre3D or in your own engine.
KCachegrind visualizes traces generated by profiling, with views that include a tree map and a call graph of the recorded calls. It is designed to stay fast for very large programs such as KDE applications.