From: Christophe R. <cs...@ca...> - 2004-03-02 16:06:06
Andreas Fuchs <as...@bo...> writes:

> Hi all,
>
> this is a test of the emergency autobuilder & benchmark runner. In the
> next few days, you can expect to see more of these in this place. If
> all is well, there will be a mail sent to the list once per day (if
> there were commits), starting this Thursday.

Excellent!

> I believe it should be possible and interesting to make a build farm
> with this set of scripts (especially to see performance
> improvements/degradation in non-x86 backends as well). All you would
> need is rsync, tla, ploticus (perhaps, I'm thinking about a version
> without ploticus) and a reasonably fast machine. If you're interested
> in running this benchmark, just ask (-:

I think in particular a fast PowerPC benchmarker (again with historical
data inasmuch as this is possible) would be very useful. My suspicion
is that there are things that are sufficiently dissimilar between the
PowerPC and x86 platforms that will make this interesting.

> The numbers below were generated by running the emarsden benchmarks
> three times. The format for the reference column is as follows:
> [mean(samples_ref) | standard error(samples_ref)]
> The others are relative to the reference:
> (mean(samples_version) / mean(samples_ref)) | standard error (samples_version)

Just for the record, what I would like to see quoted for the relative
times are

  mean(samples_version) / mean(samples_ref)

and

  standard error(samples_version) / mean(samples_ref)

Since I'm aware that not everyone has done as much statistical theory
as they should^W^WI have (and even if they have, they may have a
slightly different convention for presenting results), let me go into
this a little more. People not interested in the mathematical detail
can safely elide some of this.

Imagine taking k samples from a distribution X, with a view to
measuring the mean, \mu, of X. By taking the mean of the sample, we get
an estimate for the population mean. Label the samples x_1 ... x_k, and
compute a statistic

  \bar{x} = \frac{1}{k} \sum_i x_i.

This statistic is an unbiased estimator for the mean of the population:

  E_X(\bar{x}) = E(\frac{1}{k} \sum_i x_i)
               = \frac{1}{k} \sum_i E(x_i)
               = \frac{1}{k} \sum_i \mu
               = \mu.

However, not only do we want an estimate for the population mean (in
this specific case, "how much time does it take to run this
benchmark?"), but we also want to know how wide of the mark our
estimate could be. For this, we want to compute the variance of our
statistic (using the independence of the samples in the fourth step):

  Var_X(\bar{x}) = Var(\frac{1}{k} \sum_i x_i)
                 = E([\frac{1}{k} \sum_i x_i]^2) - E(\frac{1}{k} \sum_i x_i)^2
                 = \frac{1}{k^2} E([\sum_i x_i]^2) - \mu^2
                 = \frac{1}{k^2} (k E(x^2) + k(k-1)\mu^2) - \mu^2
                 = \frac{1}{k} (E(x^2) - \mu^2)
                 = \frac{1}{k} \sigma^2

where \sigma^2 is the variance of the population X. So, to get an
estimate of the error on a given estimate of the mean, we estimate the
standard deviation of the population and divide by the square root of
the number of samples.[*] This is what I mean, at least, when I talk
about the "standard error" or "standard error on the mean", and it's
what physicists quote when they say "foo was measured to be 20 +/- 3".
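To make this concrete, here's a quick sketch in Lisp of the quantities
I'd like to see quoted (the function names are mine, invented for
illustration; this is nothing to do with Andreas' actual scripts, and
it assumes at least two samples per benchmark):

  (defun mean (samples)
    (/ (reduce #'+ samples) (length samples)))

  (defun standard-error (samples)
    ;; Estimate the population standard deviation, dividing the sum of
    ;; squared deviations by (k-1) for an unbiased estimate of
    ;; sigma^2, then divide by sqrt(k) as derived above.
    (let* ((k (length samples))
           (m (mean samples)))
      (sqrt (/ (reduce #'+ samples :key (lambda (x) (expt (- x m) 2)))
               (1- k) k))))

  (defun relative-result (samples-version samples-ref)
    ;; Scale both the mean and its error by the reference mean, so
    ;; that the two numbers in a quoted "1.05|0.01" pair are on the
    ;; same scale.
    (let ((ref-mean (mean samples-ref)))
      (values (/ (mean samples-version) ref-mean)
              (/ (standard-error samples-version) ref-mean))))

With only three runs per benchmark the error bars are necessarily
crude, but they're enough to tell a real slowdown from noise.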
So, with all that said (and even taking account of the fact that the
calculation of the errors presented in Andreas' e-mail is bogus for
mean times not equalling 1), there are some oddities in the benchmark
results. I've snipped to leave in just the odd ones, but I'll reserve
further comment until more data arrive.

Cheers,

Christophe

[*] In case you're wondering, no, this bit isn't rigorous. It's good
enough for Physics, though :-)

> Benchmark                  Reference     0.8.8.3    0.8.8.1
> ------------------------------------------------------------
> LOAD-FASL                  [ 1.60|0.00]  0.95|0.00  1.01|0.00
> SUM-PERMUTATIONS           [ 8.02|0.01]  1.05|0.01  0.99|0.00
> FPRINT/UGLY                [ 3.27|0.00]  1.10|0.00  0.99|0.00
> CLOS/simple-instantiate    [ 0.76|0.00]  1.04|0.00  1.04|0.00
> CLOS/complex-methods       [ 2.21|0.01]  1.06|0.01  1.00|0.01

Incidentally, this presentation isn't terribly easy to read... the
graphs are a lot better, fortunately. For this, I think it would help
if (a) the Reference were something altogether different (say, CMUCL,
or a long-distant version of sbcl), and (b) the remaining columns were
arranged chronologically rightwards rather than leftwards (or, even
worse, in order of performing the benchmarks... :-)

Other than that, it's looking good. Thanks very much for your work on
this!

Cheers,

Christophe

-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/  +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")
(pprint #36rJesusCollegeCambridge)