From: Nicholas N. <nj...@cs...> - 2006-03-27 11:46:28
Hi,

I just merged in the COMPVBITS branch. Hopefully things will keep working, but let me know if you have any problems with Memcheck as a result.

The following figures show the performance improvement for 20 of the 26 SPEC2000 benchmarks, running the "test" inputs on a 3GHz P4. I compare Valgrind 3.1.X vs. the pre-COMPVBITS-merge-trunk vs. the post-COMPVBITS-merge-trunk. You can see we'd already got good improvements in the trunk since 3.1.X, and this commit improves things even more.

In summary, the pre-COMPVBITS-merge trunk has a geometric mean time reduction of 11.5%, which means the programs ran on average 1.13x faster than 3.1.1. The post-COMPVBITS-merge trunk has a geometric mean time reduction of 22.9%, which means the programs run on average 1.30x faster than 3.1.1.

Nick

-- ammp --
ammp     vg-3.1.X : 9.8s  nl: 48.1s ( 4.9x, -----)  mc:403.6s (41.2x, -----)
ammp     trunk1   : 9.8s  nl: 37.8s ( 3.9x, 21.4%)  mc:356.7s (36.4x, 11.6%)
ammp     trunk2   : 9.8s  nl: 38.0s ( 3.9x, 21.0%)  mc:313.4s (32.0x, 22.4%)
-- applu --
applu    vg-3.1.X : 0.5s  nl:  5.5s (10.3x, -----)  mc: 16.8s (31.7x, -----)
applu    trunk1   : 0.5s  nl:  4.3s ( 8.0x, 22.0%)  mc: 15.6s (29.4x,  7.4%)
applu    trunk2   : 0.5s  nl:  4.3s ( 8.1x, 21.4%)  mc: 14.5s (27.3x, 14.0%)
-- apsi --
apsi     vg-3.1.X : 8.5s  nl: 67.5s ( 8.0x, -----)  mc:232.3s (27.5x, -----)
apsi     trunk1   : 8.5s  nl: 54.2s ( 6.4x, 19.7%)  mc:205.7s (24.3x, 11.4%)
apsi     trunk2   : 8.5s  nl: 53.4s ( 6.3x, 20.9%)  mc:184.4s (21.8x, 20.6%)
-- art --
art      vg-3.1.X : 3.2s  nl: 30.3s ( 9.3x, -----)  mc:125.2s (38.5x, -----)
art      trunk1   : 3.2s  nl: 21.2s ( 6.5x, 30.2%)  mc:105.6s (32.5x, 15.7%)
art      trunk2   : 3.2s  nl: 21.2s ( 6.5x, 30.1%)  mc: 93.2s (28.7x, 25.6%)
-- bzip2 --
bzip2    vg-3.1.X : 7.3s  nl: 36.7s ( 5.0x, -----)  mc:187.9s (25.6x, -----)
bzip2    trunk1   : 7.3s  nl: 28.4s ( 3.9x, 22.5%)  mc:167.1s (22.8x, 11.1%)
bzip2    trunk2   : 7.3s  nl: 28.6s ( 3.9x, 22.1%)  mc:148.9s (20.3x, 20.8%)
-- crafty --
crafty   vg-3.1.X : 3.9s  nl: 36.6s ( 9.4x, -----)  mc:186.4s (47.8x, -----)
crafty   trunk1   : 3.9s  nl: 28.2s ( 7.2x, 23.0%)  mc:176.8s (45.3x,  5.2%)
crafty   trunk2   : 3.9s  nl: 28.1s ( 7.2x, 23.4%)  mc:157.0s (40.3x, 15.8%)
-- equake --
equake   vg-3.1.X : 1.0s  nl: 10.2s (10.3x, -----)  mc: 32.7s (33.0x, -----)
equake   trunk1   : 1.0s  nl:  7.9s ( 7.9x, 23.1%)  mc: 27.5s (27.8x, 15.7%)
equake   trunk2   : 1.0s  nl:  7.8s ( 7.9x, 23.7%)  mc: 23.5s (23.7x, 28.0%)
-- gap --
gap      vg-3.1.X : 0.7s  nl:  9.3s (14.0x, -----)  mc: 40.1s (59.8x, -----)
gap      trunk1   : 0.7s  nl:  6.7s (10.0x, 28.7%)  mc: 36.6s (54.6x,  8.8%)
gap      trunk2   : 0.7s  nl:  6.6s ( 9.9x, 29.2%)  mc: 31.1s (46.5x, 22.3%)
-- gcc --
gcc      vg-3.1.X : 1.3s  nl: 19.0s (14.6x, -----)  mc: 73.0s (56.2x, -----)
gcc      trunk1   : 1.3s  nl: 14.6s (11.2x, 23.2%)  mc: 64.2s (49.4x, 12.1%)
gcc      trunk2   : 1.3s  nl: 14.7s (11.3x, 22.6%)  mc: 58.0s (44.6x, 20.6%)
-- gzip --
gzip     vg-3.1.X : 1.6s  nl: 10.3s ( 6.5x, -----)  mc: 48.2s (30.3x, -----)
gzip     trunk1   : 1.6s  nl:  7.6s ( 4.8x, 26.6%)  mc: 39.9s (25.1x, 17.3%)
gzip     trunk2   : 1.6s  nl:  7.5s ( 4.7x, 26.9%)  mc: 29.0s (18.2x, 39.8%)
-- mcf --
mcf      vg-3.1.X : 0.2s  nl:  1.2s ( 5.9x, -----)  mc:  5.0s (24.0x, -----)
mcf      trunk1   : 0.2s  nl:  0.9s ( 4.5x, 23.6%)  mc:  3.6s (17.3x, 27.8%)
mcf      trunk2   : 0.2s  nl:  1.0s ( 4.6x, 22.0%)  mc:  3.0s (14.3x, 40.3%)
-- mesa --
mesa     vg-3.1.X : 2.1s  nl: 22.4s (10.8x, -----)  mc: 91.3s (44.1x, -----)
mesa     trunk1   : 2.1s  nl: 16.1s ( 7.8x, 28.2%)  mc: 73.2s (35.4x, 19.9%)
mesa     trunk2   : 2.1s  nl: 16.0s ( 7.7x, 28.5%)  mc: 60.4s (29.2x, 33.8%)
-- mgrid --
mgrid    vg-3.1.X :36.8s  nl:294.2s ( 8.0x, -----)  mc:964.4s (26.2x, -----)
mgrid    trunk1   :36.8s  nl:211.5s ( 5.8x, 28.1%)  mc:893.4s (24.3x,  7.4%)
mgrid    trunk2   :36.8s  nl:216.5s ( 5.9x, 26.4%)  mc:782.4s (21.3x, 18.9%)
-- parser --
parser   vg-3.1.X : 2.7s  nl: 18.6s ( 6.9x, -----)  mc:106.5s (39.4x, -----)
parser   trunk1   : 2.7s  nl: 14.1s ( 5.2x, 24.5%)  mc: 85.3s (31.6x, 19.9%)
parser   trunk2   : 2.7s  nl: 14.1s ( 5.2x, 24.5%)  mc: 61.1s (22.6x, 42.6%)
-- sixtrack --
sixtrack vg-3.1.X : 9.9s  nl: 85.5s ( 8.6x, -----)  mc:262.2s (26.4x, -----)
sixtrack trunk1   : 9.9s  nl: 65.9s ( 6.6x, 23.0%)  mc:238.2s (24.0x,  9.2%)
sixtrack trunk2   : 9.9s  nl: 67.5s ( 6.8x, 21.1%)  mc:213.8s (21.5x, 18.4%)
-- swim --
swim     vg-3.1.X : 0.5s  nl:  4.1s ( 7.9x, -----)  mc: 11.3s (21.7x, -----)
swim     trunk1   : 0.5s  nl:  3.1s ( 6.0x, 23.6%)  mc:  9.9s (19.0x, 12.7%)
swim     trunk2   : 0.5s  nl:  3.1s ( 6.0x, 24.3%)  mc:  8.9s (17.1x, 21.4%)
-- twolf --
twolf    vg-3.1.X : 0.3s  nl:  2.5s ( 9.1x, -----)  mc:  9.7s (34.5x, -----)
twolf    trunk1   : 0.3s  nl:  2.1s ( 7.6x, 16.9%)  mc:  9.3s (33.1x,  3.9%)
twolf    trunk2   : 0.3s  nl:  2.1s ( 7.6x, 16.9%)  mc:  8.3s (29.8x, 13.7%)
-- vortex --
vortex   vg-3.1.X : 4.1s  nl: 71.6s (17.3x, -----)  mc:386.6s (93.6x, -----)
vortex   trunk1   : 4.1s  nl: 47.0s (11.4x, 34.4%)  mc:338.8s (82.0x, 12.4%)
vortex   trunk2   : 4.1s  nl: 47.0s (11.4x, 34.4%)  mc:277.3s (67.1x, 28.3%)
-- vpr --
vpr      vg-3.1.X : 1.5s  nl: 14.2s ( 9.5x, -----)  mc: 62.9s (41.9x, -----)
vpr      trunk1   : 1.5s  nl: 10.7s ( 7.1x, 24.4%)  mc: 56.0s (37.4x, 10.9%)
vpr      trunk2   : 1.5s  nl: 10.7s ( 7.2x, 24.3%)  mc: 53.8s (35.9x, 14.4%)
-- wupwise --
wupwise  vg-3.1.X : 7.8s  nl:113.2s (14.5x, -----)  mc:349.9s (44.7x, -----)
wupwise  trunk1   : 7.8s  nl: 86.9s (11.1x, 23.2%)  mc:303.6s (38.8x, 13.2%)
wupwise  trunk2   : 7.8s  nl: 85.3s (10.9x, 24.7%)  mc:268.8s (34.4x, 23.2%)
== 20 programs, 120 timings =================
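
For readers checking the figures, here is a minimal Python sketch of the arithmetic behind the rows and the summary. It is not the script that produced the numbers; the function names are made up for illustration. It assumes each row lists the native time, then the time under "nl" and "mc" (read here as the Nulgrind and Memcheck tools), each followed by the slowdown relative to native and the time reduction relative to the vg-3.1.X row. The message does not say how the geometric mean was aggregated, so the sketch only converts a stated reduction into a speedup factor.

```python
def slowdown(tool_secs, native_secs):
    """Slowdown factor of a run under a tool relative to the native run."""
    return tool_secs / native_secs


def pct_reduction(baseline_secs, new_secs):
    """Percentage by which new_secs is shorter than baseline_secs."""
    return 100.0 * (baseline_secs - new_secs) / baseline_secs


def speedup_from_reduction(pct):
    """Convert a time reduction in percent into a 'times faster' factor.

    A reduction of p% leaves (1 - p/100) of the original time, so the
    speedup is the reciprocal of that fraction.
    """
    return 1.0 / (1.0 - pct / 100.0)


if __name__ == "__main__":
    # ammp row from the message: native 9.8s, mc 403.6s (3.1.X), 356.7s (trunk1)
    print(f"slowdown:  {slowdown(403.6, 9.8):.1f}x")
    print(f"reduction: {pct_reduction(403.6, 356.7):.1f}%")
    # Summary figures from the message: 11.5% and 22.9% time reductions
    print(f"speedup:   {speedup_from_reduction(11.5):.2f}x")
    print(f"speedup:   {speedup_from_reduction(22.9):.2f}x")
```

The times shown in the table are rounded to 0.1s, so recomputing a column from the displayed values can differ from the printed figure in the last digit, most visibly for benchmarks with very short native times.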