From: Mark P. <pel...@au...> - 2003-08-31 03:58:55
Andrew Morton wrote:

>Mark Peloquin <pel...@au...> wrote:
>
>> Here is a link to the latest history graphs.
>>
>> http://ltcperf.ncsa.uiuc.edu/data/history-graphs/
>>
>> Nightly Regression Summary for 2.6.0-test4-mm2 vs 2.6.0-test4-mm3
>
>Thanks.
>
>It would be nice if you could add a few words of interpretation to your
>email announcements. You are more skilled in understanding these tests and
>have the benefit of having been running them for some time. In other
>words: make it easy for us ;)
>

I apologize for my haste. Steve Pratt usually posts comments on the
results; however, he was not available today, and I hoped the new
thumbnails would make it easier for others to find problems and initiate
discussion about the results. I will make sure the necessary commentary
is included in future postings.

>The results which we're most interested in at this time are specjbb and
>volanomark.
>
>But
>
> http://ltcperf.ncsa.uiuc.edu/data/2.6.0-test4-mm3/2.6.0-test4-vs-2.6.0-test4-mm3/specjbb.html
>
>has errors where the numbers should be and
>
> http://ltcperf.ncsa.uiuc.edu/data/2.6.0-test4-mm3/2.6.0-test4-vs-2.6.0-test4-mm3/volanomark.html
>
>has only a single result, which suspiciously claims that -mm3 is 330%
>faster.
>

The problem lies in the 2.6.0-test4 folder, which contained unwanted data
from a stray benchmark run. Our scripting, not expecting stray data, used
it as the 2.6.0-test4 results, corrupting the
2.6.0-test4-vs-2.6.0-test4-mm3 comparisons as well as the history graphs.
For the time being, disregard the 2.6.0-test4 results in
2.6.0-test4-vs-2.6.0-test4-mm3; the 2.6.0-test4 comparisons against any
other kernel (mm2, for example) contain the correct 2.6.0-test4 results.

I've fixed the problem and am regenerating the corrected comparisons and
history graphs. Regenerating the history graphs takes a few hours; I'll
post the corrected data asap.
>The comparative graphs are really nice (bit too much rawiobench stuff though).
>
>I think
>
> http://ltcperf.ncsa.uiuc.edu/data/history-graphs/specjbb.results.avg.plot.16.png
>
>is telling me that -mm2 got a lot faster, and -mm3 faster still.
>

mm1 results were worse for both the 16 and 19 warehouse data points. mm2
recovered that loss and is slightly faster than 2.6.0-test4; mm3 made
further improvements.

>But that doesn't gel with the tables of numbers which we saw with
>mm2.
>
>And
>
> http://ltcperf.ncsa.uiuc.edu/data/history-graphs/specjbb.utilization.idle.avg.plot.16.png
> http://ltcperf.ncsa.uiuc.edu/data/history-graphs/specjbb.utilization.idle.avg.plot.19.png
>
>are showing good reductions in idle time.
>

mm2/mm3 have notably improved the user and idle time figures, with an
increase in system time (not sure if this is a good thing or not) and a
decrease in the number of context switches (a good thing).

specjbb comparison of 2.6.0-test4 vs 2.6.0-test4-mm3

Results: Throughput (Graph)
tolerance = 0.00 + 3.00% of 2.6.0-test4

            2.6.0-test4  2.6.0-test4-mm3
  # of WHs      OPs/sec      OPs/sec    %diff         diff    tolerance
---------- ------------ ------------ -------- ------------ ------------
         1      9783.46     10063.75     2.86       280.29       293.50
         4     33783.93     35417.80     4.84      1633.87      1013.52  *
         7     54401.52     53841.78    -1.03      -559.74      1632.05
        10     56861.59     57359.70     0.88       498.11      1705.85
        13     56024.86     55679.72    -0.62      -345.14      1680.75
        16     43874.77     51468.65    17.31      7593.88      1316.24  *
        19     32658.83     35740.45     9.44      3081.62       979.76  *

Results: User CPU Utilization (Graph)
tolerance = 0.00 + 3.00% of 2.6.0-test4

            2.6.0-test4  2.6.0-test4-mm3
  # of WHs         %CPU         %CPU    %diff         diff    tolerance
---------- ------------ ------------ -------- ------------ ------------
         1        11.90        11.81    -0.76        -0.09         0.36
         4        49.40        49.47     0.14         0.07         1.48
         7        86.51        86.26    -0.29        -0.25         2.60
        10        97.91        97.83    -0.08        -0.08         2.94
        13        97.55        97.19    -0.37        -0.36         2.93
        16        82.99        95.09    14.58        12.10         2.49  *
        19        67.40        91.93    36.39        24.53         2.02  *

Results: Idle CPU Utilization (Graph)
tolerance = 1.00 + 3.00% of 2.6.0-test4

            2.6.0-test4  2.6.0-test4-mm3
  # of WHs         %CPU         %CPU    %diff         diff    tolerance
---------- ------------ ------------ -------- ------------ ------------
         1        87.30        87.30     0.00         0.00         3.62
         4        49.53        49.51    -0.04        -0.02         2.49
         7        12.40        12.34    -0.48        -0.06         1.37
        10         0.36         0.35    -2.78        -0.01         1.01
        13         1.20         0.77   -35.83        -0.43         1.04
        16        15.17         2.28   -84.97       -12.89         1.46  *
        19        30.66         4.28   -86.04       -26.38         1.92  *

Results: System CPU Utilization (Graph)
tolerance = 1.00 + 3.00% of 2.6.0-test4

            2.6.0-test4  2.6.0-test4-mm3
  # of WHs         %CPU         %CPU    %diff         diff    tolerance
---------- ------------ ------------ -------- ------------ ------------
         1         0.72         0.81    12.50         0.09         1.02
         4         0.99         0.94    -5.05        -0.05         1.03
         7         1.07         1.35    26.17         0.28         1.03
        10         1.74         1.82     4.60         0.08         1.05
        13         1.25         2.04    63.20         0.79         1.04
        16         1.84         2.62    42.39         0.78         1.06
        19         1.91         3.79    98.43         1.88         1.06  *

Results: Context Switches (Graph)
tolerance = 0.00 + 10.00% of 2.6.0-test4

            2.6.0-test4  2.6.0-test4-mm3
  # of WHs    cswch/sec    cswch/sec    %diff         diff    tolerance
---------- ------------ ------------ -------- ------------ ------------
         1       179.49       179.42    -0.04        -0.07        17.95
         4       183.36       183.52     0.09         0.16        18.34
         7       221.62       217.25    -1.97        -4.37        22.16
        10       573.51       554.60    -3.30       -18.91        57.35
        13      2586.15      1765.67   -31.73      -820.48       258.61  *
        16     18150.40      5299.69   -70.80    -12850.71      1815.04  *
        19     25743.95     10634.13   -58.69    -15109.82      2574.40  *

>I think
>
> http://ltcperf.ncsa.uiuc.edu/data/history-graphs/volanomark.throughput.plot.1.png
>
>is telling me that -mm still hasn't fixed the volanomark problems, but given
>the problems with the tabulated results I'm not very confident in that.
>

I've looked at the corrected comparisons, and volanomark results are still
down by about 11%.

volanomark comparison of 2.6.0-test4 vs 2.6.0-test4-mm3

Results: Throughput (Graph)
tolerance = 0.00 + 3.00% of 2.6.0-test4

            2.6.0-test4  2.6.0-test4-mm3
                Msgs/sec     Msgs/sec    %diff         diff    tolerance
---------- ------------ ------------ -------- ------------ ------------
         1        40757        36197   -11.19     -4560.00      1222.71  *
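For anyone reading the tables, the "*" flag is mechanical: each table
header gives a tolerance of the form "base + pct% of 2.6.0-test4", and a
row is starred when the absolute diff exceeds that tolerance. A minimal
sketch of that rule (my reading of the published numbers, not the actual
regression scripts):

```python
# Hypothetical reconstruction of the "*" flagging rule used in the tables:
# tolerance = base + pct% of the baseline (2.6.0-test4) value, and a row is
# flagged when |candidate - baseline| exceeds that tolerance.

def tolerance(baseline: float, base: float, pct: float) -> float:
    """Tolerance for one row, e.g. base=0.00, pct=3.00 -> 0.00 + 3% of baseline."""
    return base + baseline * pct / 100.0

def flag(baseline: float, candidate: float, base: float, pct: float) -> str:
    """Return '*' when the change is outside tolerance, else ''."""
    diff = candidate - baseline
    return "*" if abs(diff) > tolerance(baseline, base, pct) else ""

# 16-warehouse throughput row: diff 7593.88 vs tolerance 1316.24 -> starred.
print(flag(43874.77, 51468.65, base=0.00, pct=3.00))  # prints "*"
# 1-warehouse throughput row: diff 280.29 vs tolerance 293.50 -> not starred.
print(repr(flag(9783.46, 10063.75, base=0.00, pct=3.00)))  # prints "''"
```

This reproduces the starred rows above, e.g. 0.00 + 3% of 9783.46 gives
the 293.50 tolerance shown for the 1-warehouse throughput row.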