|
From: Nicholas N. <nj...@cs...> - 2006-03-27 11:46:28
|
Hi,

I just merged in the COMPVBITS branch. Hopefully things will keep working, but let me know if you have any problems with Memcheck as a result.

The following figures show the performance improvement for 20 of the 26 SPEC2000 benchmarks, running the "test" inputs on a 3GHz P4. I compare Valgrind 3.1.X vs. the pre-COMPVBITS-merge-trunk vs. the post-COMPVBITS-merge-trunk. You can see we'd already got good improvements in the trunk since 3.1.X, and this commit improves things even more.

In summary, the pre-COMPVBITS-merge trunk has a geometric mean time reduction of 11.5%, which means the programs ran on average 1.13x faster than 3.1.1. The post-COMPVBITS-merge trunk has a geometric mean time reduction of 22.9%, which means the programs run on average 1.30x faster than 3.1.1.

Nick

-- ammp --
ammp     vg-3.1.X :  9.8s  nl: 48.1s ( 4.9x, -----)  mc:403.6s (41.2x, -----)
ammp     trunk1   :  9.8s  nl: 37.8s ( 3.9x, 21.4%)  mc:356.7s (36.4x, 11.6%)
ammp     trunk2   :  9.8s  nl: 38.0s ( 3.9x, 21.0%)  mc:313.4s (32.0x, 22.4%)
-- applu --
applu    vg-3.1.X :  0.5s  nl:  5.5s (10.3x, -----)  mc: 16.8s (31.7x, -----)
applu    trunk1   :  0.5s  nl:  4.3s ( 8.0x, 22.0%)  mc: 15.6s (29.4x,  7.4%)
applu    trunk2   :  0.5s  nl:  4.3s ( 8.1x, 21.4%)  mc: 14.5s (27.3x, 14.0%)
-- apsi --
apsi     vg-3.1.X :  8.5s  nl: 67.5s ( 8.0x, -----)  mc:232.3s (27.5x, -----)
apsi     trunk1   :  8.5s  nl: 54.2s ( 6.4x, 19.7%)  mc:205.7s (24.3x, 11.4%)
apsi     trunk2   :  8.5s  nl: 53.4s ( 6.3x, 20.9%)  mc:184.4s (21.8x, 20.6%)
-- art --
art      vg-3.1.X :  3.2s  nl: 30.3s ( 9.3x, -----)  mc:125.2s (38.5x, -----)
art      trunk1   :  3.2s  nl: 21.2s ( 6.5x, 30.2%)  mc:105.6s (32.5x, 15.7%)
art      trunk2   :  3.2s  nl: 21.2s ( 6.5x, 30.1%)  mc: 93.2s (28.7x, 25.6%)
-- bzip2 --
bzip2    vg-3.1.X :  7.3s  nl: 36.7s ( 5.0x, -----)  mc:187.9s (25.6x, -----)
bzip2    trunk1   :  7.3s  nl: 28.4s ( 3.9x, 22.5%)  mc:167.1s (22.8x, 11.1%)
bzip2    trunk2   :  7.3s  nl: 28.6s ( 3.9x, 22.1%)  mc:148.9s (20.3x, 20.8%)
-- crafty --
crafty   vg-3.1.X :  3.9s  nl: 36.6s ( 9.4x, -----)  mc:186.4s (47.8x, -----)
crafty   trunk1   :  3.9s  nl: 28.2s ( 7.2x, 23.0%)  mc:176.8s (45.3x,  5.2%)
crafty   trunk2   :  3.9s  nl: 28.1s ( 7.2x, 23.4%)  mc:157.0s (40.3x, 15.8%)
-- equake --
equake   vg-3.1.X :  1.0s  nl: 10.2s (10.3x, -----)  mc: 32.7s (33.0x, -----)
equake   trunk1   :  1.0s  nl:  7.9s ( 7.9x, 23.1%)  mc: 27.5s (27.8x, 15.7%)
equake   trunk2   :  1.0s  nl:  7.8s ( 7.9x, 23.7%)  mc: 23.5s (23.7x, 28.0%)
-- gap --
gap      vg-3.1.X :  0.7s  nl:  9.3s (14.0x, -----)  mc: 40.1s (59.8x, -----)
gap      trunk1   :  0.7s  nl:  6.7s (10.0x, 28.7%)  mc: 36.6s (54.6x,  8.8%)
gap      trunk2   :  0.7s  nl:  6.6s ( 9.9x, 29.2%)  mc: 31.1s (46.5x, 22.3%)
-- gcc --
gcc      vg-3.1.X :  1.3s  nl: 19.0s (14.6x, -----)  mc: 73.0s (56.2x, -----)
gcc      trunk1   :  1.3s  nl: 14.6s (11.2x, 23.2%)  mc: 64.2s (49.4x, 12.1%)
gcc      trunk2   :  1.3s  nl: 14.7s (11.3x, 22.6%)  mc: 58.0s (44.6x, 20.6%)
-- gzip --
gzip     vg-3.1.X :  1.6s  nl: 10.3s ( 6.5x, -----)  mc: 48.2s (30.3x, -----)
gzip     trunk1   :  1.6s  nl:  7.6s ( 4.8x, 26.6%)  mc: 39.9s (25.1x, 17.3%)
gzip     trunk2   :  1.6s  nl:  7.5s ( 4.7x, 26.9%)  mc: 29.0s (18.2x, 39.8%)
-- mcf --
mcf      vg-3.1.X :  0.2s  nl:  1.2s ( 5.9x, -----)  mc:  5.0s (24.0x, -----)
mcf      trunk1   :  0.2s  nl:  0.9s ( 4.5x, 23.6%)  mc:  3.6s (17.3x, 27.8%)
mcf      trunk2   :  0.2s  nl:  1.0s ( 4.6x, 22.0%)  mc:  3.0s (14.3x, 40.3%)
-- mesa --
mesa     vg-3.1.X :  2.1s  nl: 22.4s (10.8x, -----)  mc: 91.3s (44.1x, -----)
mesa     trunk1   :  2.1s  nl: 16.1s ( 7.8x, 28.2%)  mc: 73.2s (35.4x, 19.9%)
mesa     trunk2   :  2.1s  nl: 16.0s ( 7.7x, 28.5%)  mc: 60.4s (29.2x, 33.8%)
-- mgrid --
mgrid    vg-3.1.X : 36.8s  nl:294.2s ( 8.0x, -----)  mc:964.4s (26.2x, -----)
mgrid    trunk1   : 36.8s  nl:211.5s ( 5.8x, 28.1%)  mc:893.4s (24.3x,  7.4%)
mgrid    trunk2   : 36.8s  nl:216.5s ( 5.9x, 26.4%)  mc:782.4s (21.3x, 18.9%)
-- parser --
parser   vg-3.1.X :  2.7s  nl: 18.6s ( 6.9x, -----)  mc:106.5s (39.4x, -----)
parser   trunk1   :  2.7s  nl: 14.1s ( 5.2x, 24.5%)  mc: 85.3s (31.6x, 19.9%)
parser   trunk2   :  2.7s  nl: 14.1s ( 5.2x, 24.5%)  mc: 61.1s (22.6x, 42.6%)
-- sixtrack --
sixtrack vg-3.1.X :  9.9s  nl: 85.5s ( 8.6x, -----)  mc:262.2s (26.4x, -----)
sixtrack trunk1   :  9.9s  nl: 65.9s ( 6.6x, 23.0%)  mc:238.2s (24.0x,  9.2%)
sixtrack trunk2   :  9.9s  nl: 67.5s ( 6.8x, 21.1%)  mc:213.8s (21.5x, 18.4%)
-- swim --
swim     vg-3.1.X :  0.5s  nl:  4.1s ( 7.9x, -----)  mc: 11.3s (21.7x, -----)
swim     trunk1   :  0.5s  nl:  3.1s ( 6.0x, 23.6%)  mc:  9.9s (19.0x, 12.7%)
swim     trunk2   :  0.5s  nl:  3.1s ( 6.0x, 24.3%)  mc:  8.9s (17.1x, 21.4%)
-- twolf --
twolf    vg-3.1.X :  0.3s  nl:  2.5s ( 9.1x, -----)  mc:  9.7s (34.5x, -----)
twolf    trunk1   :  0.3s  nl:  2.1s ( 7.6x, 16.9%)  mc:  9.3s (33.1x,  3.9%)
twolf    trunk2   :  0.3s  nl:  2.1s ( 7.6x, 16.9%)  mc:  8.3s (29.8x, 13.7%)
-- vortex --
vortex   vg-3.1.X :  4.1s  nl: 71.6s (17.3x, -----)  mc:386.6s (93.6x, -----)
vortex   trunk1   :  4.1s  nl: 47.0s (11.4x, 34.4%)  mc:338.8s (82.0x, 12.4%)
vortex   trunk2   :  4.1s  nl: 47.0s (11.4x, 34.4%)  mc:277.3s (67.1x, 28.3%)
-- vpr --
vpr      vg-3.1.X :  1.5s  nl: 14.2s ( 9.5x, -----)  mc: 62.9s (41.9x, -----)
vpr      trunk1   :  1.5s  nl: 10.7s ( 7.1x, 24.4%)  mc: 56.0s (37.4x, 10.9%)
vpr      trunk2   :  1.5s  nl: 10.7s ( 7.2x, 24.3%)  mc: 53.8s (35.9x, 14.4%)
-- wupwise --
wupwise  vg-3.1.X :  7.8s  nl:113.2s (14.5x, -----)  mc:349.9s (44.7x, -----)
wupwise  trunk1   :  7.8s  nl: 86.9s (11.1x, 23.2%)  mc:303.6s (38.8x, 13.2%)
wupwise  trunk2   :  7.8s  nl: 85.3s (10.9x, 24.7%)  mc:268.8s (34.4x, 23.2%)
== 20 programs, 120 timings ================= |
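For readers who want to check the summary figures, here is a minimal Python sketch of the arithmetic. It assumes the "time reduction" is 1 minus the ratio of new to old run time, and that the geometric mean is taken over those per-benchmark ratios; the message does not say exactly how the mean was computed, so treat that as an assumption. The function names are made up for illustration, and only three mc rows (presumably the Memcheck timings) are used to keep it short.

```python
import math


def speedup_from_reduction(reduction):
    """Convert a fractional time reduction (0.229 for 22.9%) into a speedup factor."""
    return 1.0 / (1.0 - reduction)


def geometric_mean(values):
    """Geometric mean via logarithms, which avoids overflow on long lists."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# mc times in seconds, copied from the figures in the message:
# benchmark -> (vg-3.1.X, trunk2). Only a three-benchmark subset is used.
mc_times = {
    "ammp": (403.6, 313.4),
    "applu": (16.8, 14.5),
    "apsi": (232.3, 184.4),
}

# Assumption: average the new/old time ratios, then convert to a reduction.
ratios = [new / old for old, new in mc_times.values()]
subset_reduction = 1.0 - geometric_mean(ratios)

print(f"{speedup_from_reduction(0.115):.2f}x")  # 1.13x
print(f"{speedup_from_reduction(0.229):.2f}x")  # 1.30x
print(f"subset time reduction: {subset_reduction:.1%}")
```

The two quoted speedups follow directly from the quoted reductions; the subset figure will differ from 22.9% because it covers three benchmarks rather than all twenty.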
|
From: Ashley P. <as...@qu...> - 2006-03-27 13:29:05
|
On Mon, 2006-03-27 at 22:46 +1100, Nicholas Nethercote wrote:
> Hi,
>
> I just merged in the COMPVBITS branch. Hopefully things will keep working,
> but let me know if you have any problems with Memcheck as a result.

Thank you. This makes a big difference to the performance I'm seeing, between this and r5774 last week I'm now seeing a factor of 10 speedup in many cases.

Ashley, |
|
From: Nicholas N. <nj...@cs...> - 2006-03-27 23:58:11
|
On Mon, 27 Mar 2006, Ashley Pittman wrote:

>> I just merged in the COMPVBITS branch. Hopefully things will keep working,
>> but let me know if you have any problems with Memcheck as a result.
>
> Thank you. This makes a big difference to the performance I'm seeing,
> between this and r5774 last week I'm now seeing a factor of 10 speedup
> in many cases.

Whoa! That's great :) Do you know how much of the improvement is from r5774? Because I wouldn't expect the COMPVBITS changes to improve performance by more than a factor of 1.5 or so. I imagine the spinning change would have much more of an impact for your applications.

Nick |
|
From: Ashley P. <as...@qu...> - 2006-03-28 14:54:51
|
On Tue, 2006-03-28 at 10:52 +1100, Nicholas Nethercote wrote:
> On Mon, 27 Mar 2006, Ashley Pittman wrote:
>
> >> I just merged in the COMPVBITS branch. Hopefully things will keep working,
> >> but let me know if you have any problems with Memcheck as a result.
> >
> > Thank you. This makes a big difference to the performance I'm seeing,
> > between this and r5774 last week I'm now seeing a factor of 10 speedup
> > in many cases.
>
> Whoa! That's great :) Do you know how much of the improvement is from
> r5774? Because I wouldn't expect the COMPVBITS changes to improve
> performance by more than a factor of 1.5 or so. I imagine the spinning
> change would have much more of an impact for your applications.

I'm not sure where it all came from. I already had r5774 in my tree before it was committed, which probably made most of the difference; when I updated yesterday I got just over a month's worth of updates less r5774. memcheck is now in the same ballpark as --tool=none which it never has been before.

Ashley, |
|
From: Julian S. <js...@ac...> - 2006-03-28 16:46:10
|
> r5774. memcheck is now in the same ballpark as --tool=none which it
> never has been before.

Ashley - you mean when you do MPI bandwidth tests? 'cos although memcheck is much improved as a result of this, it's still nowhere near as fast as 'none' for computation. For a MPI test though, in which mostly you're waiting for the nic, all that V is really doing is painting memory as it goes in/out, as per your ioctls wrappers, and for this particular case - changing memory permissions - Nick's work does indeed give a big speedup.

J |
|
From: Ashley P. <as...@qu...> - 2006-03-28 17:01:38
|
On Tue, 2006-03-28 at 17:45 +0100, Julian Seward wrote:
> > r5774. memcheck is now in the same ballpark as --tool=none which it
> > never has been before.
>
> Ashley - you mean when you do MPI bandwidth tests?

It's MPI latency I've been looking at: natively it's ~2 uSec, under valgrind with --tool=none it's about 35 uSec, and until recently it's been ~800 uSec under memcheck. memcheck is now almost the same as none.

> 'cos although
> memcheck is much improved as a result of this, it's still nowhere
> near as fast as 'none' for computation. For a MPI test though,
> in which mostly you're waiting for the nic, all that V is really
> doing is painting memory as it goes in/out, as per your ioctls
> wrappers, and for this particular case - changing memory permissions -
> Nick's work does indeed give a big speedup.

The performance of the client checks will have had an effect here, although as you say the code doesn't do much computation.

Ashley, |
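To put the latency figures in this message side by side, the following Python sketch turns them into slowdown factors. The three numbers are the approximate values quoted above (the "~" and "about" are the poster's), and the variable and function names are made up for illustration.

```python
# Approximate MPI latencies in microseconds, as quoted in the message.
native_us = 2.0             # running natively
none_us = 35.0              # under valgrind --tool=none
memcheck_before_us = 800.0  # under memcheck, before the recent changes


def slowdown(measured_us, baseline_us):
    """How many times slower a measurement is than its baseline."""
    return measured_us / baseline_us


none_vs_native = slowdown(none_us, native_us)
old_memcheck_vs_native = slowdown(memcheck_before_us, native_us)
old_memcheck_vs_none = slowdown(memcheck_before_us, none_us)

print(f"none vs native:         {none_vs_native:.1f}x")          # 17.5x
print(f"old memcheck vs native: {old_memcheck_vs_native:.1f}x")  # 400.0x
print(f"old memcheck vs none:   {old_memcheck_vs_none:.1f}x")    # 22.9x
```

On these rough numbers, memcheck dropping to about the same latency as --tool=none corresponds to a speedup on the order of 20x for this latency test.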
|
From: Nicholas N. <nj...@cs...> - 2006-03-28 22:23:58
|
On Tue, 28 Mar 2006, Julian Seward wrote:

>> r5774. memcheck is now in the same ballpark as --tool=none which it
>> never has been before.
>
> Ashley - you mean when you do MPI bandwidth tests? 'cos although
> memcheck is much improved as a result of this, it's still nowhere
> near as fast as 'none' for computation. For a MPI test though,
> in which mostly you're waiting for the nic, all that V is really
> doing is painting memory as it goes in/out, as per your ioctls
> wrappers, and for this particular case - changing memory permissions -
> Nick's work does indeed give a big speedup.

Oh yeah, the new version is much faster at changing memory permissions over large areas. perf/sarp.c is a synthetic benchmark that tests this; it now runs more than twice as fast as it did under 3.1.1. The real program it was based on runs about 1.6 times faster with the new version.

Nick |
|
From: Julian S. <js...@ac...> - 2006-03-28 16:47:30
|
> The post-COMPVBITS-merge trunk has a geometric mean time reduction of
> 22.9%, which means the programs run on average 1.30x faster than 3.1.1.

That's very excellent. Cool. An excellent outcome all round.

J |