From: Thomas R. <tr...@st...> - 2011-11-14 21:07:40
Julian Seward wrote:
>
> > No takers? :-(
>
> The PowerPC back end (host_ppc_isel.c) does support exact rounding
> on all FP instructions.  Have a look at how it avoids setting the
> rounding mode for every FP instruction.

Ok, I see.  I took that approach and made more patches.  I filed it
all in

  https://bugs.kde.org/show_bug.cgi?id=136779

to avoid more big attachment spam.

One thing I don't get however: why does the PPC backend not put any
rounding on instructions like Add32Fx4?  Is that an oversight, or
does the processor actually not support rounding modes there?

> So, an important question which I didn't see addressed by your
> previous message is, what is the effect on performance?

Comparing the following versions:

* vg-nord: valgrind r12232, VEX r2225 (the base version I built on)
* vg-slowrd: as per the patches posted, with a few fixes
* valgrind: vg-slowrd plus the PPC backend's strategy

I'm getting the following out of the perf/ tests:

-- bigcode1 --
bigcode1 vg-nord   :0.11s  no: 2.1s (19.2x, -----)  me: 3.9s (35.2x, -----)
bigcode1 vg-slowrd :0.11s  no: 2.1s (19.0x,  0.9%)  me: 3.9s (35.2x,  0.0%)
bigcode1 valgrind  :0.11s  no: 2.1s (18.9x,  1.4%)  me: 3.9s (35.0x,  0.5%)
-- bigcode2 --
bigcode2 vg-nord   :0.12s  no: 4.5s (37.8x, -----)  me: 9.3s (77.2x, -----)
bigcode2 vg-slowrd :0.12s  no: 4.5s (37.3x,  1.3%)  me: 9.3s (77.7x, -0.6%)
bigcode2 valgrind  :0.12s  no: 4.5s (37.5x,  0.9%)  me: 9.5s (79.4x, -2.9%)
-- bz2 --
bz2      vg-nord   :0.55s  no: 2.9s ( 5.2x, -----)  me: 8.0s (14.5x, -----)
bz2      vg-slowrd :0.55s  no: 3.0s ( 5.4x, -4.5%)  me: 8.3s (15.1x, -4.4%)
bz2      valgrind  :0.55s  no: 2.9s ( 5.2x, -0.7%)  me: 8.5s (15.4x, -6.4%)
-- fbench --
fbench   vg-nord   :0.25s  no: 1.3s ( 5.1x, -----)  me: 4.3s (17.3x, -----)
fbench   vg-slowrd :0.25s  no: 1.7s ( 6.7x,-32.3%)  me: 4.8s (19.1x,-10.6%)
fbench   valgrind  :0.25s  no: 1.2s ( 5.0x,  2.4%)  me: 4.4s (17.7x, -2.5%)
-- ffbench --
ffbench  vg-nord   :0.21s  no: 0.9s ( 4.1x, -----)  me: 2.9s (14.0x, -----)
ffbench  vg-slowrd :0.21s  no: 1.2s ( 5.9x,-42.5%)  me: 3.2s (15.2x, -8.8%)
ffbench  valgrind  :0.21s  no: 0.9s ( 4.4x, -5.7%)  me: 3.0s (14.1x, -1.0%)
-- heap --
heap     vg-nord   :0.10s  no: 1.0s ( 9.8x, -----)  me: 6.6s (66.3x, -----)
heap     vg-slowrd :0.10s  no: 1.0s ( 9.8x,  0.0%)  me: 6.6s (66.1x,  0.3%)
heap     valgrind  :0.10s  no: 0.8s ( 8.5x, 13.3%)  me: 6.6s (65.6x,  1.1%)
-- sarp --
sarp     vg-nord   :0.02s  no: 0.2s (10.5x, -----)  me: 2.3s (116.5x, -----)
sarp     vg-slowrd :0.02s  no: 0.2s (10.5x,  0.0%)  me: 2.7s (136.0x,-16.7%)
sarp     valgrind  :0.02s  no: 0.2s (10.5x,  0.0%)  me: 2.4s (122.0x, -4.7%)
-- tinycc --
tinycc   vg-nord   :0.18s  no: 2.3s (12.7x, -----)  me:11.0s (60.9x, -----)
tinycc   vg-slowrd :0.18s  no: 2.3s (12.9x, -1.7%)  me:10.9s (60.8x,  0.1%)
tinycc   valgrind  :0.18s  no: 2.3s (13.0x, -2.2%)  me:10.9s (60.6x,  0.5%)

So apparently it's essentially the same, except that slowrd is really
slow in floating point stuff and the PPC strategy again gets it into
the same ballpark as before.  I cannot explain the huge speedup in
'heap', but it's really consistent at that.

Then I'm also seeing some test failures that all seem to be related
to line numbers, such as in memcheck/tests/addressable

--- addressable.stderr.exp 2010-11-24 16:47:41.933247339 +0100
+++ addressable.stderr.out 2011-11-14 22:02:40.798433827 +0100
@@ -61,7 +61,7 @@
 For counts of detected and suppressed errors, rerun with: -v
 ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
 Uninitialised byte(s) found during client check request
-   at 0x........: test5 (addressable.c:85)
+   at 0x........: test5 (addressable.c:87)
    by 0x........: main (addressable.c:125)
  Address 0x........ is not stack'd, malloc'd or (recently) free'd

Do you have any pointers as to what could be going wrong there?  I'm
somewhat at a loss, since that test doesn't seem to have anything to
do with floats.

Other test failures concern the definedness of FP results, I'll have
to look into that.  Apparently my strategy of ignoring the rounding
mode argument within memcheck was too simple...

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
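
For readers who have not looked at host_ppc_isel.c, the "avoid setting
the rounding mode for every FP instruction" strategy mentioned above can
be modelled roughly as below.  This is a standalone sketch and not VEX
code: SelEnv, RmTemp, set_rounding_mode, select_fp_op and
invalidate_rounding_mode are hypothetical names made up for the
illustration.  The assumption is that the instruction selector remembers
which IR temporary it last loaded into the FP control register and skips
the reload when the next FP operation names the same temporary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an IR temporary that holds a rounding mode. */
typedef int RmTemp;
#define RM_NONE (-1)

/* Hypothetical, much reduced instruction-selection environment. */
typedef struct {
    RmTemp previous_rm; /* temp last written to the FP control register */
    int    n_set_rm;    /* "set rounding mode" instructions emitted so far */
    int    n_fp_ops;    /* FP operations emitted so far */
} SelEnv;

static void env_init(SelEnv *env)
{
    assert(env != NULL);
    env->previous_rm = RM_NONE;
    env->n_set_rm = 0;
    env->n_fp_ops = 0;
}

/* Emit a set-rounding-mode instruction only if the requested temp differs
   from the one already in effect. */
static void set_rounding_mode(SelEnv *env, RmTemp rm)
{
    assert(env != NULL);
    if (env->previous_rm == rm)
        return;              /* same temp as before: nothing to emit */
    env->previous_rm = rm;
    env->n_set_rm++;         /* models emitting the actual instruction */
}

/* Select one FP operation that takes a rounding-mode argument. */
static void select_fp_op(SelEnv *env, RmTemp rm)
{
    set_rounding_mode(env, rm);
    env->n_fp_ops++;         /* models emitting the arithmetic instruction */
}

/* Forget the cached mode, e.g. after something that may clobber the FP
   control state, so that the next FP operation sets it again. */
static void invalidate_rounding_mode(SelEnv *env)
{
    assert(env != NULL);
    env->previous_rm = RM_NONE;
}
```

In this model, three consecutive calls to select_fp_op with the same
temp followed by one call with a different temp emit four FP operations
but only two set-rounding-mode instructions, which is the kind of saving
that would account for the fbench/ffbench numbers returning to the
vg-nord ballpark.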