|
From: Christian L. <chr...@le...> - 2003-03-17 12:12:25
Attachments:
valgrind_label.diff
|
Hello,
first let me say:
valgrind is absolutely great, thank you for it!
It has already saved me often.
I was not able to add MMX support (it does not seem to be exactly
easy).
But I saw this really big switch(opt){case 0x00....} statement and thought
"this must be slow".
So I tried the "Labels as Values" feature of gcc.
(http://www.dis.com/gnu/gcc/gcc_79.html#SEC79)
Yes, it is a gcc feature, but because valgrind is specifically for
x86/Linux I can't see a problem.
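For readers unfamiliar with the extension, here is a minimal sketch of the two dispatch styles being compared. It is not the attached patch: the three-opcode machine and the names run_switch, run_labels and OP_* are made up for illustration, and the second function only builds with gcc or a compiler that supports its "Labels as Values" extension.

```c
/* Hypothetical three-opcode machine, for illustration only (not Valgrind code). */
enum { OP_INC = 0, OP_DEC = 1, OP_HALT = 2 };

/* Portable dispatch: one switch per decoded opcode. */
static int run_switch(const unsigned char *code)
{
    int acc = 0;
    for (;;) {
        switch (*code++) {
        case OP_INC:  acc++; break;
        case OP_DEC:  acc--; break;
        case OP_HALT: return acc;
        default:      return -1;   /* unknown opcode */
        }
    }
}

/* gcc "Labels as Values": index a table of label addresses and jump
 * straight to the handler. There is no bounds check here, so every
 * opcode must be in the range 0..2. */
static int run_labels(const unsigned char *code)
{
    static void *const dispatch[] = { &&do_inc, &&do_dec, &&do_halt };
    int acc = 0;

#define NEXT() goto *dispatch[*code++]
    NEXT();
do_inc:  acc++; NEXT();
do_dec:  acc--; NEXT();
do_halt: return acc;
#undef NEXT
}
```

Both functions return the same accumulator for the same opcode sequence; only the way the next handler is reached differs.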
"benchmark":
core:/home/ijuz/Mail/la# ls -l
total 4508
-rw------- 1 ijuz ijuz 4602189 Mar 17 11:56 politech_at_politechbot_com
core:/home/ijuz/Mail/la# nice -n -10 time /work/dev/val/bin/valgrind gzip -9 politech_at_politechbot_com
normal cvs:
13.57user 0.09system 0:14.07elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
13.51user 0.08system 0:14.01elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
patched cvs:
12.54user 0.06system 0:13.07elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
12.56user 0.06system 0:12.97elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
this means (average):
13.54 -> 12.55
that is a speedup of about 7% in this case
If you (Julian or Nick) are interested in it, I would clean it up
further and remove this one switch() { } statement.
I have attached the patch; I hope nobody objects to these few kB.
(It is against current CVS.)
Regards,
Christian Leber
--
"Omnis enim res, quae dando non deficit, dum habetur et non datur,
nondum habetur, quomodo habenda est." (Aurelius Augustinus)
Translation: <http://gnuhh.org/work/fsf-europe/augustinus.html>
|
|
From: Nicholas N. <nj...@ca...> - 2003-03-17 12:37:56
|
On Mon, 17 Mar 2003, Christian Leber wrote:
> I saw this realy big switch(opt){case 0x00....} statement and thought
> "this must be slow".
>
> So I tried the "Labels as Values" feature of the gcc.
> (http://www.dis.com/gnu/gcc/gcc_79.html#SEC79)
> Yes it is a gcc feature, but because is specially for x86/Linux I can't
> see a problem.
[snip]
> 13.54 -> 12.55
> that is a speedup of about 7% in this case
>
> If you (Julian or Nick) are perhaps interested in it, I would clean it up
> further and remove this one switch() { } statement.
Interesting. Two comments:
1. Can you try it with a few other programs? It would be nice to see if
the 7% speedup is consistent across programs. Some programs I've used
for performance timing in the past:
gzip
bzip2
latex
konqueror (startup then quit immediately)
Batch programs are obviously more suitable for this.
2. The bottom of the webpage you mention says:
"An alternate way to write the above example is
static const int array[] = { &&foo - &&foo, &&bar - &&foo,
&&hack - &&foo };
goto *(&&foo + array[i]);
This is more friendly to code living in shared libraries, as it reduces
the number of dynamic relocations that are needed, and by consequence,
allows the data to be read-only."
Valgrind is packaged as a shared library. Perhaps this might improve
things further?
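As a rough sketch of what the relative form quoted above looks like in a dispatch loop (again a made-up three-opcode machine with hypothetical names, not Valgrind code, and gcc-specific):

```c
/* Hypothetical three-opcode machine, for illustration only (not Valgrind code). */
enum { OP_INC = 0, OP_DEC = 1, OP_HALT = 2 };

/* The table holds offsets from a base label rather than absolute
 * addresses, so it needs no dynamic relocations and can stay read-only.
 * No bounds check: every opcode must be in the range 0..2. */
static int run_offsets(const unsigned char *code)
{
    static const int offset[] = {
        &&do_inc  - &&do_inc,
        &&do_dec  - &&do_inc,
        &&do_halt - &&do_inc
    };
    int acc = 0;

    /* Arithmetic on label addresses (void *) is itself a gcc extension. */
#define NEXT() goto *(&&do_inc + offset[*code++])
    NEXT();
do_inc:  acc++; NEXT();
do_dec:  acc--; NEXT();
do_halt: return acc;
#undef NEXT
}
```

Note that each dispatch here performs an addition that a table of absolute label addresses does not need.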
If you could try these two experiments, and tell us the results, that
would be very helpful.
Thanks.
N
|
|
From: Christian L. <chr...@le...> - 2003-03-17 14:06:53
|
On Mon, Mar 17, 2003 at 12:37:52PM +0000, Nicholas Nethercote wrote:
> 1. Can you try it with a few other programs? It would be nice to see if
> the 7% speedup is consistent across programs. Some programs I've used
> for performance timing in the past:
>
> gzip
> bzip2
48.36user 0.05system 0:49.79elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
48.47user 0.07system 0:51.22elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
patched:
47.93user 0.09system 0:49.17elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
47.91user 0.09system 0:49.22elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
> latex
real 0m5.088s
user 0m4.790s
sys 0m0.130s
patched:
real 0m4.741s
user 0m4.490s
sys 0m0.140s
> konqueror (startup then quit immediately)
When I quit it, it eats up all the CPU until I kill it with -9.
(I still have KDE 2.2)
> 2. The bottom of the webpage you mention says:
>
> "An alternate way to write the above example is
>
> static const int array[] = { &&foo - &&foo, &&bar - &&foo,
> &&hack - &&foo };
> goto *(&&foo + array[i]);
>
> This is more friendly to code living in shared libraries, as it reduces
> the number of dynamic relocations that are needed, and by consequence,
> allows the data to be read-only."
>
> Valgrind is packaged as a shared library. Perhaps this might improve
> things further?
Yes, but initially I did not get it working; it was a stupid little error.
I'll try it, but this takes some time.
cachegrind shows me, with a little test program, that the latter
variant is slower. But that program is not a shared library, and I don't
know how often it has to be done.
Regards,
Christian Leber
--
"Omnis enim res, quae dando non deficit, dum habetur et non datur,
nondum habetur, quomodo habenda est." (Aurelius Augustinus)
Translation: <http://gnuhh.org/work/fsf-europe/augustinus.html>
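To compare the three benchmarks on the same footing as the "about 7%" figure for gzip, the user times posted so far can be run through the same calculation. This is only a small sketch; the helper names mean2 and speedup_percent are made up here, and the inputs are the user times from the messages above (two runs each for gzip and bzip2, one run each for latex).

```c
/* Average of two timing runs, in seconds. */
static double mean2(double a, double b)
{
    return (a + b) / 2.0;
}

/* Relative reduction in run time, as a percentage of the unpatched time. */
static double speedup_percent(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* User times in seconds, copied from the messages above. */
static double gzip_speedup(void)
{
    return speedup_percent(mean2(13.57, 13.51), mean2(12.54, 12.56));
}

static double bzip2_speedup(void)
{
    return speedup_percent(mean2(48.36, 48.47), mean2(47.93, 47.91));
}

static double latex_speedup(void)
{
    return speedup_percent(4.790, 4.490);
}
```

With these inputs the result is roughly 7.3% for gzip, 1.0% for bzip2 and 6.3% for latex, so the gain from the first patch is not uniform across programs.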
|
|
From: Christian L. <chr...@le...> - 2003-03-17 14:33:19
Attachments:
valgrind_label_relativ.diff
|
On Mon, Mar 17, 2003 at 12:37:52PM +0000, Nicholas Nethercote wrote:
> Valgrind is packaged as a shared library. Perhaps this might improve
> things further?
Ok, I changed it (patch included).
But the numbers are not good, actually a performance decrease against
switch().
gzip:
13.78user 0.03system 0:14.21elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
13.71user 0.06system 0:14.16elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
bzip2:
48.49user 0.10system 0:49.79elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
48.66user 0.04system 0:50.06elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
latex:
real 0m5.056s
user 0m4.880s
sys 0m0.090s
Regards,
Christian Leber
--
"Omnis enim res, quae dando non deficit, dum habetur et non datur,
nondum habetur, quomodo habenda est." (Aurelius Augustinus)
Translation: <http://gnuhh.org/work/fsf-europe/augustinus.html>
|
|
From: Jason E. <ja...@ca...> - 2003-03-17 19:20:00
|
On Mon, Mar 17, 2003 at 03:33:13PM +0100, Christian Leber wrote:
> On Mon, Mar 17, 2003 at 12:37:52PM +0000, Nicholas Nethercote wrote:
>
> > Valgrind is packaged as a shared library. Perhaps this might improve
> > things further?
>
> Ok, I changed it (patch included).
>
> But the numbers are not good, actually a performance decrease against
> switch().
I've recently been doing some experimentation with computed gotos in an
unrelated program, and I've also observed a slowdown in most cases. This
indicates to me that gcc typically does a fine job of optimizing switch
statements, and there isn't a whole lot to be gained by second guessing it
in such cases.
Jason
|
|
From: Nicholas N. <nj...@ca...> - 2003-03-17 19:37:29
|
On Mon, 17 Mar 2003, Jason Evans wrote:
> > But the numbers are not good, actually a performance decrease against
> > switch().
>
> I've recently been doing some experimentation with computed gotos in an
> unrelated program, and I've also observed a slowdown in most cases. This
> indicates to me that gcc typically does a fine job of optimizing switch
> statements, and there isn't a whole lot to be gained by second guessing it
> in such cases.
I think Christian got good results with the first version, but not with
the second version that computes each label address as a difference from a
base label address. So it looks like it (the first version) is worthwhile.
I've had mixed successes with computed gotos in the past as well... branch
prediction is very important; I once tried the computed goto trick which
saved me quite a few instructions, but made the branches more or less
unpredictable, and the end result was little difference.
N
|