From: Christian L. <chr...@le...> - 2003-03-17 12:12:25
Attachments:
valgrind_label.diff
Hello,

first let me say: Valgrind is absolutely great, thank you for it! It has already saved me many times.

I was not able to add MMX support (it does not seem to be easy), but I saw this really big switch(opt){case 0x00....} statement and thought "this must be slow". So I tried the "Labels as Values" feature of gcc (http://www.dis.com/gnu/gcc/gcc_79.html#SEC79). Yes, it is a gcc extension, but since Valgrind is specific to x86/Linux anyway, I don't see a problem.

"Benchmark":

core:/home/ijuz/Mail/la# ls -l
total 4508
-rw------- 1 ijuz ijuz 4602189 Mar 17 11:56 politech_at_politechbot_com
core:/home/ijuz/Mail/la# nice -n -10 time /work/dev/val/bin/valgrind gzip -9 politech_at_politechbot_com

normal CVS:
13.57user 0.09system 0:14.07elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
13.51user 0.08system 0:14.01elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k

patched CVS:
12.54user 0.06system 0:13.07elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
12.56user 0.06system 0:12.97elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k

This means (average user time): 13.54 s -> 12.55 s, a speedup of about 7% in this case.

If you (Julian or Nick) are interested in it, I would clean it up further and remove this one switch() { } statement.

I have attached the patch; I hope nobody objects to these few kB. (It is against current CVS.)

Regards,
Christian Leber

-- 
"Omnis enim res, quae dando non deficit, dum habetur et non datur, nondum habetur, quomodo habenda est." (Aurelius Augustinus)
("For anything that is not diminished by being given away is not yet possessed as it ought to be, so long as it is held and not given.")
Translation: <http://gnuhh.org/work/fsf-europe/augustinus.html>
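A minimal sketch of the switch()-versus-"Labels as Values" idea, assuming a toy accumulator machine. The opcodes (OP_INC, OP_DEC, OP_DOUBLE, OP_HALT) and the functions run_switch() and run_goto() are hypothetical and are not taken from Valgrind or from the attached patch. Both functions interpret the same byte program; the second replaces the switch() with a table of label addresses (&&label) and an indirect goto at the end of every handler, so it needs gcc or a compiler that implements the same extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy accumulator machine (not Valgrind's). */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT, OP_COUNT };

/* Baseline: one switch() inside a loop. gcc typically compiles a dense
   switch like this into a bounds check plus a jump table. */
static int run_switch(const unsigned char *code)
{
    int acc = 0;
    for (size_t pc = 0;; pc++) {
        assert(code[pc] < OP_COUNT);
        switch (code[pc]) {
        case OP_INC:    acc++;    break;
        case OP_DEC:    acc--;    break;
        case OP_DOUBLE: acc *= 2; break;
        case OP_HALT:   return acc;
        }
    }
}

/* gcc "Labels as Values": a table of handler addresses indexed by opcode. */
static int run_goto(const unsigned char *code)
{
    static void *const dispatch[OP_COUNT] = {
        [OP_INC]    = &&do_inc,
        [OP_DEC]    = &&do_dec,
        [OP_DOUBLE] = &&do_double,
        [OP_HALT]   = &&do_halt,
    };
    size_t pc = 0;
    int acc = 0;

    /* Fetch the next opcode and jump straight to its handler. */
#define NEXT() do { assert(code[pc] < OP_COUNT); goto *dispatch[code[pc++]]; } while (0)

    NEXT();
do_inc:    acc++;    NEXT();
do_dec:    acc--;    NEXT();
do_double: acc *= 2; NEXT();
do_halt:   return acc;
#undef NEXT
}
```

For the program { OP_INC, OP_INC, OP_DOUBLE, OP_DEC, OP_HALT } both functions return 3. The usual argument for this layout is that every handler ends in its own indirect jump, giving the branch predictor one jump site per opcode instead of the single shared one in the switch() loop; whether that pays off depends on the workload and on how gcc compiles the switch().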
From: Nicholas N. <nj...@ca...> - 2003-03-17 12:37:56
On Mon, 17 Mar 2003, Christian Leber wrote:

> I saw this realy big switch(opt){case 0x00....} statement and thought
> "this must be slow".
>
> So I tried the "Labels as Values" feature of the gcc.
> (http://www.dis.com/gnu/gcc/gcc_79.html#SEC79)
> Yes it is a gcc feature, but because is specially for x86/Linux I can't
> see a problem.

[snip]

> 13.54 -> 12.55
> that is a speedup of about 7% in this case
>
> If you (Julian or Nick) are perhaps interested in it, I would clean it up
> further and remove this one switch() { } statement.

Interesting. Two comments:

1. Can you try it with a few other programs? It would be nice to see if the 7% speedup is consistent across programs. Some programs I've used for performance timing in the past:

   gzip
   bzip2
   latex
   konqueror (startup then quit immediately)

   Batch programs are obviously more suitable for this.

2. The bottom of the webpage you mention says:

   "An alternate way to write the above example is

      static const int array[] = { &&foo - &&foo, &&bar - &&foo,
                                   &&hack - &&foo };
      goto *(&&foo + array[i]);

   This is more friendly to code living in shared libraries, as it reduces the number of dynamic relocations that are needed, and by consequence, allows the data to be read-only."

   Valgrind is packaged as a shared library. Perhaps this might improve things further?

If you could try these two experiments and tell us the results, that would be very helpful.

Thanks.

N
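A sketch of the offset-table form quoted from the gcc manual, applied to a hypothetical toy interpreter (the opcodes and run_goto_offsets() are illustrative names, not code from Valgrind or either patch). The table stores each handler's distance from a base label, and dispatch adds that offset back to &&do_inc before jumping. Arithmetic on label addresses (void *) is itself a GNU extension, so this assumes gcc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy accumulator machine (not Valgrind's). */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT, OP_COUNT };

static int run_goto_offsets(const unsigned char *code)
{
    /* Each entry is a handler's distance from the base label do_inc rather
       than an absolute address, so the table needs no dynamic relocations
       and can stay read-only in position-independent code. */
    static const int offsets[OP_COUNT] = {
        [OP_INC]    = &&do_inc    - &&do_inc,
        [OP_DEC]    = &&do_dec    - &&do_inc,
        [OP_DOUBLE] = &&do_double - &&do_inc,
        [OP_HALT]   = &&do_halt   - &&do_inc,
    };
    size_t pc = 0;
    int acc = 0;

    /* Dispatch costs one extra addition: base label + stored offset. */
#define NEXT() do { assert(code[pc] < OP_COUNT); \
                    goto *(&&do_inc + offsets[code[pc++]]); } while (0)

    NEXT();
do_inc:    acc++;    NEXT();
do_dec:    acc--;    NEXT();
do_double: acc *= 2; NEXT();
do_halt:   return acc;
#undef NEXT
}
```

The design trade-off is a relocation-free, read-only table in exchange for an extra addition on every dispatch; in Christian's later measurements in this thread, this variant came out slower than the plain switch().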
From: Christian L. <chr...@le...> - 2003-03-17 14:06:53
On Mon, Mar 17, 2003 at 12:37:52PM +0000, Nicholas Nethercote wrote:
> 1. Can you try it with a few other programs? It would be nice to see if
> the 7% speedup is consistent across programs. Some programs I've used
> for performance timing in the past:
>
> gzip
> bzip2

unpatched:
48.36user 0.05system 0:49.79elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
48.47user 0.07system 0:51.22elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
patched:
47.93user 0.09system 0:49.17elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
47.91user 0.09system 0:49.22elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k

> latex

unpatched:
real 0m5.088s
user 0m4.790s
sys 0m0.130s
patched:
real 0m4.741s
user 0m4.490s
sys 0m0.140s

> konqueror (startup then quit immediately)

It keeps eating all the CPU after I quit it, until I kill -9 it. (I still have KDE 2.2.)

> 2. The bottom of the webpage you mention says:
>
> "An alternate way to write the above example is
>
> static const int array[] = { &&foo - &&foo, &&bar - &&foo,
> &&hack - &&foo };
> goto *(&&foo + array[i]);
>
> This is more friendly to code living in shared libraries, as it reduces
> the number of dynamic relocations that are needed, and by consequence,
> allows the data to be read-only."
>
> Valgrind is packaged as a shared library. Perhaps this might improve
> things further?

Yes, but initially I did not get it working; it was a silly little error. I'll try it, but it will take some time. With a small test program, Cachegrind shows me that the latter variant is slower. But that program is not a shared library, and I don't know how often the relocation has to be done.

Regards,
Christian Leber
From: Christian L. <chr...@le...> - 2003-03-17 14:33:19
Attachments:
valgrind_label_relativ.diff
On Mon, Mar 17, 2003 at 12:37:52PM +0000, Nicholas Nethercote wrote:
> Valgrind is packaged as a shared library. Perhaps this might improve
> things further?

OK, I changed it (patch attached). But the numbers are not good; in fact, performance is worse than with the switch().

gzip:
13.78user 0.03system 0:14.21elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
13.71user 0.06system 0:14.16elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k

bzip2:
48.49user 0.10system 0:49.79elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
48.66user 0.04system 0:50.06elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k

latex:
real 0m5.056s
user 0m4.880s
sys 0m0.090s

Regards,
Christian Leber
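To put the three builds side by side, a small sketch that recomputes the user-time changes from the runs posted in this thread (gzip and bzip2 averaged over their two runs, latex run once per build). The struct and function names are illustrative, and it assumes the earlier unpatched runs are a fair baseline for the later offset-table runs, which were taken in a separate session.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct result {
    const char *program;
    double plain_switch;  /* unpatched CVS, switch() dispatch */
    double label_table;   /* first patch: table of &&label addresses */
    double offset_table;  /* second patch: table of &&label - &&base offsets */
};

/* Mean user time in seconds, copied from the posted runs. */
static const struct result results[] = {
    { "gzip",  (13.57 + 13.51) / 2, (12.54 + 12.56) / 2, (13.78 + 13.71) / 2 },
    { "bzip2", (48.36 + 48.47) / 2, (47.93 + 47.91) / 2, (48.49 + 48.66) / 2 },
    { "latex", 4.790,               4.490,               4.880               },
};

/* Percentage change relative to the switch() baseline; negative = faster. */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

void print_results(void)
{
    for (size_t i = 0; i < sizeof results / sizeof results[0]; i++)
        printf("%-6s label table %+5.1f%%   offset table %+5.1f%%\n",
               results[i].program,
               pct_change(results[i].plain_switch, results[i].label_table),
               pct_change(results[i].plain_switch, results[i].offset_table));
}
```

By this arithmetic the first patch cuts user time by roughly 7.3% (gzip), 1.0% (bzip2) and 6.3% (latex), while the offset-table patch adds roughly 1.5%, 0.3% and 1.9%, which matches the "performance decrease against switch()" reported above.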
From: Jason E. <ja...@ca...> - 2003-03-17 19:20:00
On Mon, Mar 17, 2003 at 03:33:13PM +0100, Christian Leber wrote:
> On Mon, Mar 17, 2003 at 12:37:52PM +0000, Nicholas Nethercote wrote:
>
> > Valgrind is packaged as a shared library. Perhaps this might improve
> > things further?
>
> Ok, I changed it (patch included).
>
> But the numbers are not good, actually a performance decrease against
> switch().

I've recently been doing some experimentation with computed gotos in an unrelated program, and I've also observed a slowdown in most cases. This indicates to me that gcc typically does a fine job of optimizing switch statements, and there isn't a whole lot to be gained by second-guessing it in such cases.

Jason
From: Nicholas N. <nj...@ca...> - 2003-03-17 19:37:29
On Mon, 17 Mar 2003, Jason Evans wrote:

> > But the numbers are not good, actually a performance decrease against
> > switch().
>
> I've recently been doing some experimentation with computed gotos in an
> unrelated program, and I've also observed a slowdown in most cases. This
> indicates to me that gcc typically does a fine job of optimizing switch
> statements, and there isn't a whole lot to be gained by second guessing it
> in such cases.

I think Christian got good results with the first version, but not with the second version, which computes each label address as a difference from a base label address. So it looks like the first version is worthwhile.

I've had mixed success with computed gotos in the past as well... branch prediction is very important. I once tried the computed goto trick, which saved me quite a few instructions but made the branches more or less unpredictable, and the end result was little difference.

N