|
From: Nicholas N. <nj...@ca...> - 2004-08-24 11:12:46

Hi,

Is there any particular reason why Memcheck is built at -O2, but all
the other tools are built at -O?

I noticed because I'm experimenting with Makefile.am include
statements. It seems really good, I can pull out all the common muck
between the different Makefile.am files into a single one, which
removes a whole lot of duplication. It will make things easier for
the multi-arch restructuring, too.

Does anyone know if there are any possible downsides to this? It seems
like a very good feature to use, although I'd be happy to wait until
after 2.2.0 before committing it, just to be safe.

N
|
From: Tom H. <th...@cy...> - 2004-08-24 11:25:00

In message <Pin...@he...>
Nicholas Nethercote <nj...@ca...> wrote:
> Is there any particular reason why Memcheck is built at -O2, but all
> the other tools are built at -O?
Not that I know of. It's actually a real pain how hard it is to
change compiler flags globally and how CFLAGS is ignored. I have a
patch I apply when building RPMs of valgrind to allow rpm to set the
optimisation flags, and it has to patch every Makefile...
> I noticed because I'm experimenting with Makefile.am include
> statements. It seems really good, I can pull out all the common muck
> between the different Makefile.am files into a single one, which
> removes a whole lot of duplication. It will make things easier for
> the multi-arch restructuring, too.
Sounds like a good idea to me - single source is nearly always best.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
From: Julian S. <js...@ac...> - 2004-08-24 12:31:37

> Is there any particular reason why Memcheck is built at -O2, but all the
> other tools are built at -O?

Is -O / -O2 the only distinction, or is it -O vs -O2 -fomit-frame-pointer?

I seem to remember trying to max out performance for the helper
routines for memcheck (obviously) and hence have
-O2 -fomit-frame-pointer -malign-something-or-other=4 iirc.

The last flag says that a 4-byte aligned stack is OK and stopped some
clueless versions of gcc from adding/subtracting constants from the
stack pointer to get it aligned, which turned out to be a significant
time waster for the memcheck memory access helpers. Such alignment is
particularly useless given that in those helpers gcc never generates
any %esp-relative accesses anyway. Duh.

I don't have a problem with -O2 per se, but my general feel over N
years of hacking is that it doesn't give much performance improvement
over -O, exposes one to more gcc bugs, and makes compilation take
longer, hence it wasn't worth it. That's just religion, though, so
I'd say -O2-ify it if you like.

-fomit-frame-pointer generally makes debugging difficult, so that
is probably not a good thing to plumb everywhere. In fact there are
some places (where we want to get stack snapshots of, eg
vg_libpthread.c) which positively have to say -fno-omit-frame-pointer,
I believe.

J
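Why stack snapshots need frame pointers: when frame pointers are kept, each i386 frame starts with the caller's saved %ebp, immediately followed by the return address, so the frames form a linked list that a snapshot routine can simply follow. The sketch below walks such a chain. It is an illustration only, not Valgrind's code; the struct and function names are hypothetical, and to stay safe to run it walks a chain built in ordinary memory rather than the live stack. Code built with -fomit-frame-pointer breaks this kind of walk, because %ebp then holds an ordinary value instead of a link.

```c
#include <stddef.h>

/* Hypothetical view of the bottom of an i386 stack frame when frame
   pointers are kept: %ebp points at the caller's saved %ebp, and the
   return address sits in the next word up. */
struct fp_frame {
    struct fp_frame *saved_fp;  /* caller's %ebp: link to the older frame */
    void            *ret_addr;  /* return address into the caller */
};

/* Follow the saved-%ebp chain starting at fp, storing up to max return
   addresses in out[]. Returns how many were stored. The walk stops at a
   NULL link, or when the next link does not point to a higher address
   (the stack grows down, so older frames live higher up); that check
   keeps a corrupt chain from looping forever. */
static size_t fp_backtrace(const struct fp_frame *fp, void **out, size_t max)
{
    size_t n = 0;
    while (fp != NULL && n < max) {
        const struct fp_frame *next = fp->saved_fp;
        out[n++] = fp->ret_addr;
        if (next == NULL || next <= fp)
            break;
        fp = next;
    }
    return n;
}
```

On a real stack, code compiled with -fno-omit-frame-pointer would seed the walk with GCC's __builtin_frame_address(0) instead of a hand-built array.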
|
From: Nicholas N. <nj...@ca...> - 2004-08-24 14:33:07

On Tue, 24 Aug 2004, Julian Seward wrote:

>> Is there any particular reason why Memcheck is built at -O2, but all the
>> other tools are built at -O?
>
> Is -O / -O2 the only distinction, or is it -O vs -O2 -fomit-frame-pointer?

All tools use -fomit-frame-pointer, only -O/-O2 is different.

> I seem to remember trying to max out performance for the helper
> routines for memcheck (obviously) and hence have
> -O2 -fomit-frame-pointer -malign-something-or-other=4 iirc.
>
> The last flag says that a 4-byte aligned stack is OK and stopped some
> clueless versions of gcc from adding/subtracting constants from the
> stack pointer to get it aligned, which turned out to be a significant
> time waster for the memcheck memory access helpers. Such alignment is
> particularly useless given that in those helpers gcc never generates
> any %esp-relative accesses anyway. Duh.

All tools use @PREFERRED_STACK_BOUNDARY@ too, as does the core.

> I don't have a problem with -O2 per se, but my general feel over N
> years of hacking is that it doesn't give much performance improvement
> over -O, exposes one to more gcc bugs, and makes compilation take
> longer, hence it wasn't worth it. That's just religion, though, so
> I'd say -O2-ify it if you like.
>
> -fomit-frame-pointer generally makes debugging difficult, so that
> is probably not a good thing to plumb everywhere. In fact there are
> some places (where we want to get stack snapshots of, eg
> vg_libpthread.c) which positively have to say -fno-omit-frame-pointer,
> I believe.

Yup. I plan to just preserve the current flag usage, which is pretty
easy -- eg. you can have -O as the normal, and then say
AM_CFLAGS += -O2 within memcheck/Makefile.am.

N
|
From: Jeremy F. <je...@go...> - 2004-08-24 23:07:37

On Tue, 2004-08-24 at 13:31 +0100, Julian Seward wrote:
> -fomit-frame-pointer generally makes debugging difficult, so that
> is probably not a good thing to plumb everywhere. In fact there are
> some places (where we want to get stack snapshots of, eg
> vg_libpthread.c) which positively have to say -fno-omit-frame-pointer,
> I believe.

-fomit-frame-pointer is pretty dubious as an optimisation these days.
It seems to generally increase code size, which eliminates any other
improvement it might have (at least with Linux; not sure about V).

I tend to just remove it from my builds, since it gets in the way more
than it seems to help.

J
|
From: Bryan O'S. <bo...@se...> - 2004-08-25 05:11:20

On Tue, 2004-08-24 at 16:05 -0700, Jeremy Fitzhardinge wrote:
> -fomit-frame-pointer is pretty dubious as an optimisation these days.
> It seems to generally increase code size, which eliminates any other
> improvement it might have (at least with Linux; not sure about V).

Any time I've benchmarked its effects recently, they have been in the
zero-to-negative range, rather than positive. And yes, it always seems
to result in increased code size.

<b
|
From: Julian S. <js...@ac...> - 2004-08-25 08:20:33

On Wednesday 25 August 2004 06:11, Bryan O'Sullivan wrote:
> On Tue, 2004-08-24 at 16:05 -0700, Jeremy Fitzhardinge wrote:
> > -fomit-frame-pointer is pretty dubious as an optimisation these days.
> > It seems to generally increase code size, which eliminates any other
> > improvement it might have (at least with Linux; not sure about V).
>
> Any time I've benchmarked its effects recently, they have been in the
> zero-to-negative range, rather than positive. And yes, it always seems
> to result in increased code size.

That's pretty strange, considering the purpose is purportedly to
make %ebp available for use, thereby reducing spill traffic etc.
Usually, giving reg-alloc more regs to play with is helpful.

Any idea why it doesn't help?

J
|
From: Nicholas N. <nj...@ca...> - 2004-08-25 09:46:24

[CC'ing to gcc list; GCC readers, we're puzzled that -fomit-frame-pointer
seems to increase file sizes on x86... can anyone help?]
On Wed, 25 Aug 2004, Julian Seward wrote:
>>> -fomit-frame-pointer is pretty dubious as an optimisation these days.
>>> It seems to generally increase code size, which eliminates any other
>>> improvement it might have (at least with Linux; not sure about V).
>>
>> Any time I've benchmarked its effects recently, they have been in the
>> zero-to-negative range, rather than positive. And yes, it always seems
>> to result in increased code size.
>
> That's pretty strange, considering the purpose is purportedly to
> make %ebp available for use, thereby reducing spill traffic etc.
> Usually, giving reg-alloc more regs to play with is helpful.
Yes... the gcc man page says about -fomit-frame-pointer:
This avoids the instructions to save, set up and restore
frame pointers; it also makes an extra register available in many
functions.
> Any idea why it doesn't help?
It does seem very odd, but I've confirmed that -fomit-frame-pointer
increases code sizes; here are the numbers (file sizes in bytes):
file                              normal   -fomit-fp   %diff
addrcheck/vgskin_addrcheck.so     141359      149215   +5.6%
cachegrind/vgskin_cachegrind.so    66020       71684   +8.6%
corecheck/vgskin_corecheck.so      21359       21359      0%
helgrind/vgskin_helgrind.so        86305       92089   +6.7%
lackey/vgskin_lackey.so            26577       27229   +2.4%
massif/vgskin_massif.so            63368       67824   +7.0%
memcheck/vgskin_memcheck.so       242314      261238   +7.8%
none/vgskin_none.so                21249       21249      0%
That's with GCC 3.2.2, on x86. Compile flags are:
$(WERROR) -Winline -Wall -Wshadow -O -fomit-frame-pointer \
-mpreferred-stack-boundary=2 -g
except that vgskin_memcheck.so uses -O2. The shared objects are just
linked with -shared in the standard way; and the .o files going into the
.so files have similar code size increases with -fomit-fp.
(I'm not sure where $(WERROR) comes from, is that a Make built-in?)
The weird thing is, when I look at the object code everything appears as
you'd expect, ie. the -fomit-fp code looks smaller. Eg. the sizes of
"objdump -d lackey/vgskin_lackey.so" are:
normal: 34169
-fomit-fp: 32872
ie. -fomit-fp is smaller.
And here's some representative output of "objdump -d
lackey/vgskin_lackey.so"
normal:
00000e27 <add_one_BB>:
e27: 55 push %ebp
e28: 89 e5 mov %esp,%ebp
e2a: 83 05 98 27 00 00 01 addl $0x1,0x2798
e31: 83 15 9c 27 00 00 00 adcl $0x0,0x279c
e38: 83 05 a8 27 00 00 01 addl $0x1,0x27a8
e3f: 83 15 ac 27 00 00 00 adcl $0x0,0x27ac
e46: c9 leave
e47: c3 ret
-fomit-fp:
00000e23 <add_one_BB>:
e23: 83 05 78 27 00 00 01 addl $0x1,0x2778
e2a: 83 15 7c 27 00 00 00 adcl $0x0,0x277c
e31: 83 05 88 27 00 00 01 addl $0x1,0x2788
e38: 83 15 8c 27 00 00 00 adcl $0x0,0x278c
e3f: c3 ret
As you'd expect, the -fomit-fp code is shorter.
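For reference, a C function of roughly the following shape yields listings like the two above: two 64-bit counters, each bumped with an addl/adcl pair on 32-bit x86, and nothing else. The counter names are hypothetical; the disassembly does not show Lackey's actual source. Because it is a leaf function that uses no stack slots, -fomit-frame-pointer can drop the push %ebp / mov %esp,%ebp / leave sequence entirely, which is exactly the 4 bytes (33 vs 29) separating the two listings.

```c
#include <stdint.h>

/* Hypothetical names: the listing only shows two 64-bit globals, each
   incremented with an addl/adcl pair on 32-bit x86. */
static uint64_t n_bbs_entered;
static uint64_t n_bbs_total;

/* Leaf function: it makes no calls and keeps nothing on the stack, so
   with -fomit-frame-pointer there is no prologue or epilogue left. */
void add_one_BB(void)
{
    n_bbs_entered++;
    n_bbs_total++;
}
```

To repeat the comparison one could build this both ways (e.g. gcc -m32 -O -g -c, with and without -fomit-frame-pointer) and compare objdump -d output; size -A lists every section, so the .text size can be checked separately from the .debug_* sections that -g adds.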
So why are the total file sizes larger with -fomit-frame-pointer? Do any
GCC people know?
Thanks.
N
|
From: Andrew H. <ap...@re...> - 2004-08-25 09:55:28

Nicholas Nethercote writes:
>
> So why are the total file sizes larger with -fomit-frame-pointer? Do any
> GCC people know?

Unwind info?

Put both versions of one of the files (memcheck/vgskin_memcheck.so ?)
on the web and I'll have a look.

Andrew.
|
From: Florian W. <fw...@de...> - 2004-08-25 09:55:57

* Nicholas Nethercote:

> So why are the total file sizes larger with -fomit-frame-pointer?

Have you already ruled out debugging information? Since you compile
with -g, this seems the most likely culprit. (If there is no frame
pointer, more debugging information is needed to access local
variables, so this growth isn't really avoidable.)
|
From: Nicholas N. <nj...@ca...> - 2004-08-25 10:11:05

On Wed, 25 Aug 2004, Florian Weimer wrote:

>> So why are the total file sizes larger with -fomit-frame-pointer?
>
> Have you already ruled out debugging information? Since you compile
> with -g, this seems the most likely culprit. (If there is no frame
> pointer, more debugging information is needed to access local
> variables, so this growth isn't really avoidable.)

Ah, I didn't think of that, I think it answers the question... for
vgskin_lackey.so, I get the following sizes:

normal: 26577
-fomit-fp: 27299

normal stripped: 7780
-fomit-fp stripped: 7748

The stripped versions look as expected, ie. with -fomit-fp it is smaller.
It's interesting to see this effect, and good to know why it occurs.

On Wed, 25 Aug 2004, Andrew Haley wrote:

> Unwind info?

I guess not, since it seems to be the debug info. (The code is all C, I
should have mentioned that... is unwind info for C++ only? Either way,
seems like it doesn't matter.)

Thanks very much, everyone!

N
|
From: Andrew H. <ap...@re...> - 2004-08-25 10:37:25

Nicholas Nethercote writes:
> On Wed, 25 Aug 2004, Florian Weimer wrote:
>
> >> So why are the total file sizes larger with -fomit-frame-pointer?
> >
> > Have you already ruled out debugging information? Since you compile
> > with -g, this seems the most likely culprit. (If there is not frame
> > pointer, more debugging information is needed to access local
> > variables, so this growth isn't really avoidable.)
>
> Ah, I didn't think of that, I think it answers the question... for
> vgskin_lackey.so, I get the following sizes:
>
> normal: 26577
> -fomit-fp: 27299
>
> normal stripped: 7780
> -fomit-fp stripped: 7748
>
> The stripped versions look as expected, ie. with -fomit-fp it is smaller.
> It's interesting to see this effect, and good to know why it occurs.
Here you are:
/tmp/vgskin_lackey.so-normal
27 .debug_frame 00000144 00000000 00000000 0000418c 2**2
CONTENTS, READONLY, DEBUGGING
/tmp/vgskin_lackey.so-fomitfp
27 .debug_frame 000003f8 00000000 00000000 00004164 2**2
CONTENTS, READONLY, DEBUGGING
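Setting these section sizes next to the file sizes Nick reported for vgskin_lackey.so (26577 vs 27299 bytes unstripped, 7780 vs 7748 stripped) shows how much of the growth .debug_frame accounts for. The third field of each objdump -h line above is the section size in hex: 0x144 is 324 bytes and 0x3f8 is 1016 bytes. The small program below only does that arithmetic; the struct and function names are made up for the illustration, and since the earlier table gives 27229 rather than 27299 for the -fomit-fp lackey build, the result should be read as approximate.

```c
/* File and section sizes (bytes) for one build of vgskin_lackey.so,
   as quoted in this thread. */
struct lackey_build {
    long file_size;         /* unstripped file size */
    long stripped_size;     /* file size after strip */
    long debug_frame_size;  /* .debug_frame size from objdump -h */
};

/* Differences between two builds (b minus a). */
struct growth {
    long file;         /* change in total file size */
    long kept;         /* change in what survives strip (code, data, ...) */
    long strippable;   /* change in what strip removes (debug info, symbols) */
    long debug_frame;  /* change in .debug_frame alone */
};

static struct growth compare(struct lackey_build a, struct lackey_build b)
{
    struct growth g;
    g.file = b.file_size - a.file_size;
    g.kept = b.stripped_size - a.stripped_size;
    g.strippable = (b.file_size - b.stripped_size)
                 - (a.file_size - a.stripped_size);
    g.debug_frame = b.debug_frame_size - a.debug_frame_size;
    return g;
}

static const struct lackey_build normal_build  = { 26577, 7780, 0x144 };
static const struct lackey_build fomitfp_build = { 27299, 7748, 0x3f8 };
```

With these figures the file grows by 722 bytes while the stripped part shrinks by 32, so strip removes 754 more bytes in the -fomit-fp build, and 692 of them (roughly 92%) are .debug_frame. That fits the usual explanation: with a frame pointer the call-frame information can describe the frame address as %ebp+8 for the whole function body, whereas without one it is %esp-relative and needs a new entry after every push, pop or stack adjustment.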
|
From: Nicholas N. <nj...@ca...> - 2004-08-25 10:42:47

On Wed, 25 Aug 2004, Andrew Haley wrote:

> > Ah, I didn't think of that, I think it answers the question... for
> > vgskin_lackey.so, I get the following sizes:
> >
> > normal: 26577
> > -fomit-fp: 27299
> >
> > normal stripped: 7780
> > -fomit-fp stripped: 7748
> >
> > The stripped versions look as expected, ie. with -fomit-fp it is smaller.
>
> Only by 0.4%, which is nearly at the level of noise. Is this true of
> the other executables as well?

For vgskin_memcheck.so, the biggest of the relevant files:

normal: 242344
-fomit-fp: 261268

normal stripped: 90588
-fomit-fp stripped: 90044

The difference is 0.6%.

I guess the conclusion is that -fomit-fp does make things smaller when no
debug info is present, but not by much. And that this is because although
fewer instructions are generated, those generated are on average longer...

N
|
From: Andrew H. <ap...@re...> - 2004-08-25 12:23:56

Nicholas Nethercote writes:
>
> I guess the conclusion is that -fomit-fp does make things smaller when no
> debug info is present, but not by much. And that this is because although
> fewer instructions are generated, those generated are on average longer...

You might like to try -momit-leaf-frame-pointer.

Andrew.
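Andrew's suggestion is an x86-specific GCC option: -momit-leaf-frame-pointer drops the frame pointer only in leaf functions (functions that call nothing), so non-leaf frames keep the %ebp chain that stack snapshots rely on while small leaf helpers still get the extra register. A minimal test file, with hypothetical names, to see the effect: compile it with something like gcc -m32 -O -fno-omit-frame-pointer -momit-leaf-frame-pointer -S and compare the two functions. One would expect bump() to lose its push %ebp / mov %esp,%ebp prologue while bump_twice() keeps it; noinline stops GCC from folding the leaf into its caller and hiding the difference.

```c
#include <stdint.h>

static uint64_t counter;

/* Leaf: makes no calls, so under -momit-leaf-frame-pointer GCC may skip
   the %ebp prologue/epilogue here. */
__attribute__((noinline)) void bump(uint64_t by)
{
    counter += by;
}

/* Non-leaf: it calls bump(), so with frame pointers enabled it keeps its
   own prologue and stays part of the %ebp chain. */
__attribute__((noinline)) void bump_twice(uint64_t by)
{
    bump(by);
    bump(by);
}
```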
|
From: Falk H. <hue...@in...> - 2004-08-25 10:02:03

Nicholas Nethercote <nj...@ca...> writes:
> So why are the total file sizes larger with -fomit-frame-pointer? Do
> any GCC people know?
One reason might be that accessing relative to stack pointer takes one
byte more than relative to frame pointer:
int f(int n) { volatile int a[4]; return a[1] + a[2]; }
% gcc -c -O3 test.c && objdump -dr test.o
00000000 <f>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 18 sub $0x18,%esp
6: 8b 45 ec mov 0xffffffec(%ebp),%eax
9: 8b 4d f0 mov 0xfffffff0(%ebp),%ecx
c: 01 c8 add %ecx,%eax
e: c9 leave
f: c3 ret
% gcc -fomit-frame-pointer -c -O3 test.c && objdump -dr test.o
00000000 <f>:
0: 83 ec 1c sub $0x1c,%esp
3: 8b 44 24 04 mov 0x4(%esp,1),%eax
7: 8b 4c 24 08 mov 0x8(%esp,1),%ecx
b: 01 c8 add %ecx,%eax
d: 83 c4 1c add $0x1c,%esp
10: c3 ret
--
Falk
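The extra byte comes from the i386 instruction encoding. In the ModR/M byte, the r/m value 100 (the slot that would name %esp) is an escape meaning "a SIB byte follows", so every %esp-based memory operand needs one more byte (0x24 in the listing), while %ebp plus an 8-bit displacement fits in ModR/M alone. The sketch below encodes only the 32-bit MOV load with an 8-bit displacement (opcode 0x8B, mod=01) and reproduces the four loads in Falk's listings; the function and enum names are illustrative, and every other addressing form is deliberately ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* i386 register numbers as they appear in the ModR/M reg and r/m fields. */
enum { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESP = 4, EBP = 5, ESI = 6, EDI = 7 };

/* Encode "movl disp8(%base), %dst" (opcode 0x8B, ModR/M mod=01) into out[].
   Returns the instruction length: 3 bytes normally, 4 when base is %esp. */
static size_t encode_load_disp8(uint8_t out[4], int dst, int base, int8_t disp)
{
    size_t n = 0;
    out[n++] = 0x8B;                                   /* MOV r32, r/m32 */
    out[n++] = (uint8_t)((1u << 6) | ((unsigned)dst << 3) | (unsigned)base);
    if (base == ESP)
        out[n++] = 0x24;  /* r/m=100 means "SIB follows"; 0x24 = no index, base %esp */
    out[n++] = (uint8_t)disp;                          /* 8-bit displacement */
    return n;
}
```

In Falk's example the per-access byte wins: the -fomit-frame-pointer version is 17 bytes against 16, because it also trades the 1-byte leave for a 3-byte add $0x1c,%esp.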
|
From: Andrew H. <ap...@re...> - 2004-08-25 10:08:25

Falk Hueffner writes:
> Nicholas Nethercote <nj...@ca...> writes:
>
> > So why are the total file sizes larger with -fomit-frame-pointer? Do
> > any GCC people know?
>
> One reason might be that accessing relative to stack pointer takes one
> byte more than relative to frame pointer:

Could be, but 7%? I really want to see those files.

Andrew.
|
From: Nicholas N. <nj...@ca...> - 2004-08-25 10:13:13

On Wed, 25 Aug 2004, Andrew Haley wrote:

> > > So why are the total file sizes larger with -fomit-frame-pointer? Do
> > > any GCC people know?
> >
> > One reason might be that accessing relative to stack pointer takes one
> > byte more than relative to frame pointer:
>
> Could be, but 7%? I really want to see those files.

I've put them (unstripped) at:

www.cl.cam.ac.uk/~njn25/vgskin_lackey.so-normal
www.cl.cam.ac.uk/~njn25/vgskin_lackey.so-fomitfp

N
|
From: Andrew H. <ap...@re...> - 2004-08-25 10:17:36

Nicholas Nethercote writes:
> On Wed, 25 Aug 2004, Florian Weimer wrote:
>
> > >> So why are the total file sizes larger with -fomit-frame-pointer?
> > >
> > > Have you already ruled out debugging information? Since you compile
> > > with -g, this seems the most likely culprit. (If there is no frame
> > > pointer, more debugging information is needed to access local
> > > variables, so this growth isn't really avoidable.)
>
> Ah, I didn't think of that, I think it answers the question... for
> vgskin_lackey.so, I get the following sizes:
>
> normal: 26577
> -fomit-fp: 27299
>
> normal stripped: 7780
> -fomit-fp stripped: 7748
>
> The stripped versions look as expected, ie. with -fomit-fp it is smaller.

Only by 0.4%, which is nearly at the level of noise. Is this true of
the other executables as well?

Andrew.
|
From: Julian S. <js...@ac...> - 2004-08-25 11:23:58

> For vgskin_memcheck.so, the biggest of the relevant files:
>
> normal: 242344
> -fomit-fp: 261268
>
> normal stripped: 90588
> -fomit-fp stripped: 90044
>
> The difference is 0.6%.
>
> I guess the conclusion is that -fomit-fp does make things smaller when no
> debug info is present, but not by much. And that this is because although
> fewer instructions are generated, those generated are on average longer...

Well, after dodging all those red herrings, what I thought the original
proposition was is that code generated with -fomit-fp runs more slowly
than without. Which sounds a bit unlikely. So does anyone have any
cycle counts with/without -fomit-fp?

From my days of bzip2 hacking, I seem to remember -fomit-fp made things
run 5% ish faster, which is kinda what you'd expect when increasing the
available integer regs from 6 to 7. Dually, turning on -fpic, which
puts %ebx out of action, aiui, gave a 7% ish speed loss. Perhaps on
modern machines with reg-renaming and a relatively weaker memory
hierarchy, the differences are smaller?

J
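One rough way to get the numbers Julian asks for is to time a small, hot helper built both ways, e.g. gcc -O2 -fno-omit-frame-pointer versus gcc -O2 -fomit-frame-pointer, and compare. The sketch below is only a harness: helper() is a hypothetical stand-in, not Memcheck's MC_(helperc_LOADV4), and it measures processor time with the standard C clock(), which is coarse and noisy, so several runs per build would be needed before reading anything into a difference of a few percent.

```c
#include <stdint.h>
#include <time.h>

static uint32_t table[256];

/* Hypothetical hot helper: a table lookup plus a little arithmetic.
   noinline keeps it a real call, much as tool helpers are called out
   of generated code. */
__attribute__((noinline)) static uint32_t helper(uint32_t addr)
{
    uint32_t v = table[addr & 0xffu];
    return (v ^ (addr >> 8)) + 1u;
}

static void init_table(void)
{
    for (uint32_t i = 0; i < 256u; i++)
        table[i] = i * 16777619u;
}

/* Call helper() n times and fold the results into a checksum so the
   compiler cannot discard the calls. */
static uint32_t run(uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; i++)
        acc += helper(i * 2654435761u);
    return acc;
}

/* Processor time in seconds taken by run(n); the checksum is returned
   through *checksum so both builds can be checked for identical work. */
static double time_run(uint32_t n, uint32_t *checksum)
{
    clock_t start = clock();
    *checksum = run(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A main() that calls init_table() and then prints time_run(100000000u, &sum) together with sum is enough for a first comparison; an identical checksum from both builds confirms they did the same work.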
|
From: Nicholas N. <nj...@ca...> - 2004-08-25 11:27:10

On Wed, 25 Aug 2004, Julian Seward wrote:

> Well, after dodging all those red herrings, what I thought the original
> proposition was is that code generated with -fomit-fp runs more slowly
> than without. Which sounds a bit unlikely. So does anyone have any
> cycle counts with/without -fomit-fp?
>
> From my days of bzip2 hacking, I seem to remember -fomit-fp made things
> run 5% ish faster, which is kinda what you'd expect when increasing the
> available integer regs from 6 to 7. Dually, turning on -fpic, which
> puts %ebx out of action, aiui, gave a 7% ish speed loss. Perhaps on
> modern machines with reg-renaming and a relatively weaker memory
> hierarchy, the differences are smaller?

And since Valgrind spends the majority of its time in code it generates
itself, any effect will be much smaller than for more normal programs...
therefore, I'm inclined to stop using -fomit-fp.

N
|
From: Julian S. <js...@ac...> - 2004-08-25 23:33:03

> And since Valgrind spends the majority of its time in code it generates
> itself, any effect will be much smaller than for more normal programs...
> therefore, I'm inclined to stop using -fomit-fp.

I agree in general. However, V does spend (I assume) quite a lot of time
in small tool-supplied helper functions such as MC_(helperc_LOADV4).
And it would be a shame to make those run more slowly.

That said, perhaps the performance effects are in the noise, and doing
-fomit-fp in some places and not others just wasn't worth the hassle.
That would be nice if it was true; but I don't have numbers.

J