From: Philippe E. <ph...@wa...> - 2004-03-02 20:34:07
On Tue, 02 Mar 2004 at 10:33 +0000, Will Cohen wrote:

> I have been looking over the various backends of GCC of a current cvs
> checkout of GCC to find out how portable the current implementation of
> OProfile call-graph support is going to be. GCC has the define
> CAN_DEBUG_WITHOUT_FP defined in most of the compiler backends. This
> define will cause the compiler to omit the frame pointer when there is
> any optimization (-O1 or greater).

I suggest that RH people take some measurements of code size and speed;
it's not at all obvious that -fomit-frame-pointer will really improve
performance, and it depends on the sub-arch. On a P4 I get the following
surprising numbers (gcc 3.3.3):

-O2

with frame pointer
real 0m0.905s
   text    data     bss     dec     hex filename
 587859    1464    6668  595991   91817 pp/opreport

w/o frame pointer
real 0m0.626s
   text    data     bss     dec     hex filename
 532798    1268    6988  541054   8417e pp/opreport

Three povray runs on tomb.pov, w/o display, at 640*480:

-O2 -mcpu=pentium4 -ffast-math -ansi -falign-functions=4
-falign-jumps=4 -falign-loops=4 -mpreferred-stack-boundary=3

with frame pointer
user 0m7.136s
user 0m7.170s
user 0m7.122s

fp
   text    data     bss     dec     hex filename
 498934   20520   12976  532430   81fce x-povray

w/o frame pointer
user 0m6.612s
user 0m6.649s
user 0m6.594s

no fp
   text    data     bss     dec     hex filename
 511878   20520   12976  545374   8525e x-povray

I don't understand the numbers for opreport: omitting the frame pointer
gives a much faster and smaller executable, whilst for povray it gives a
noticeable improvement in speed but an increase in size. Perhaps
opreport is showing excessive trace cache misses...

A few years ago I was not convinced by -fomit-frame-pointer; it's a bit
more convincing nowadays. You (meaning the RH guy deciding whether
-fomit-frame-pointer will be added) just need to /prove/ it's a good
choice :) Hype about one more free register on an arch with a small
number of registers is neither a rationale nor an argument about
benchmark improvement.

> The only architecture that OProfile supports that has frame pointers
> when the optimization is turned on is the i386. The x86_64 omits it. The
> other OProfile platforms: s390 s390x ia64, ppc64, alpha, hppa, sparc,
> and arm all define CAN_DEBUG_WITHOUT_FP. At Red Hat there has been
> discussion about setting the compiler options to omit frame pointers for
> i386 to improve performance.

You must take into account that, once the frame pointer is omitted, no
bt is available w/o debug info and gdb. Most users are not developers:
it's possible to tell them to run gdb on the core file and then bt; it's
more annoying to tell them to load the CD with debug info, etc. (Is this
CD provided for all RH versions?) KDE can bt w/o debuginfo iirc, and I
have a small 6-line function which can bt in an application w/o debug
info. From my point of view, I'll be convinced by -fomit-frame-pointer
by default iff it saves at least 2 or 3% in speed w/o penalty in
executable size.

> The use of the stack unwind information available for gdb has been
> mentioned. However, that wouldn't be available to the nmi handler
> routine and would be too expensive to use. It is also unlikely that this
> information would be available/processable for samples in the kernel.

Yes, unfortunately that's not possible.

regards,
Phil
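
To put the timings and `size` figures quoted in this message in relative terms, here is a minimal sketch that computes the percentage change in run time and in `text` size when the frame pointer is omitted. The figures are copied from the message. The names `RUNS`, `percent_change`, `summarize` and `meets_threshold` are made up for illustration, and the 2% cut-off is only one possible reading of "at least 2/3% in speed w/o penalty in executable size". Note that the opreport figure is a single `real` time while the povray figures are three `user` times, so the two rows are not strictly comparable.

```python
from statistics import mean

# Figures copied from the message: times in seconds, sizes are the
# `text` column reported by size(1). The dictionary layout is illustrative.
RUNS = {
    "opreport": {
        "fp_times": [0.905],                 # single `real` time
        "nofp_times": [0.626],
        "fp_text": 587859,
        "nofp_text": 532798,
    },
    "povray": {
        "fp_times": [7.136, 7.170, 7.122],   # three `user` times
        "nofp_times": [6.612, 6.649, 6.594],
        "fp_text": 498934,
        "nofp_text": 511878,
    },
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent (negative = decrease)."""
    return (after - before) / before * 100.0


def summarize(run):
    """Return (time change %, text size change %) when omitting the frame pointer."""
    time_change = percent_change(mean(run["fp_times"]), mean(run["nofp_times"]))
    size_change = percent_change(run["fp_text"], run["nofp_text"])
    return time_change, size_change


def meets_threshold(run, min_speedup=2.0):
    """Assumed acceptance rule: at least `min_speedup` percent faster, no size growth."""
    time_change, size_change = summarize(run)
    return -time_change >= min_speedup and size_change <= 0.0


if __name__ == "__main__":
    for name, run in RUNS.items():
        time_change, size_change = summarize(run)
        print(f"{name}: time {time_change:+.1f}%, text {size_change:+.1f}%, "
              f"meets threshold: {meets_threshold(run)}")
```

Under this reading, opreport passes on both counts, while povray gains speed but fails the size condition because its `text` segment grows.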