From: Andreas Fuchs <asf@bo...> - 2004-03-02 15:05:44

Hi all,

this is a test of the emergency autobuilder & benchmark runner. In the
next few days, you can expect to see more of these in this place. If
all is well, there will be a mail sent to the list once per day (if
there were commits), starting this Thursday.

As a nice-looking supplement to this mail, you can also visit

  http://sbcl.boinkor.net/benchmark/

to see a ploticus-ed version of the numbers (with history!) below.

I believe it should be possible and interesting to make a build farm
with this set of scripts (especially to see performance
improvements/degradation in non-x86 backends as well). All you would
need is rsync, tla, ploticus (perhaps, I'm thinking about a version
without ploticus) and a reasonably fast machine. If you're interested
in running this benchmark, just ask (:

The numbers below were generated by running the emarsden benchmarks
three times. The format for the reference column is as follows:

  [mean(samples_ref) ± standard error(samples_ref)]

The others are relative to the reference:

  (mean(samples_version) / mean(samples_ref)) ± standard error(samples_version)

Ok.
So here are the numbers:

Benchmark                 Reference      0.8.8.3   0.8.8.1
------------------------------------------------------------
COMPILER                [ 11.06±0.01] 1.00±0.01 0.99±0.02
LOADFASL                [  1.60±0.00] 0.95±0.00 1.01±0.00
SUMPERMUTATIONS         [  8.02±0.01] 1.05±0.01 0.99±0.00
WALKLIST/SEQ            [  0.26±0.00] 1.00±0.00 1.00±0.00
WALKLIST/MESS           [  0.42±0.01] 0.02±0.00 0.50±0.00
BOYER                   [ 11.46±0.06] 1.00±0.01 1.00±0.01
BROWSE                  [  1.43±0.00] 1.01±0.00 1.03±0.00
DDERIV                  [  1.25±0.00] 1.00±0.00 1.00±0.00
DERIV                   [  1.50±0.00] 1.00±0.00 1.00±0.00
DESTRUCTIVE             [  1.16±0.00] 1.00±0.01 0.99±0.00
DIV2TEST1               [  2.27±0.00] 1.00±0.00 1.00±0.00
DIV2TEST2               [  2.65±0.01] 1.00±0.00 1.01±0.00
FFT                     [  0.14±0.00] 1.00±0.00 1.01±0.00
FRPOLY/FIXNUM           [  1.11±0.00] 1.00±0.00 0.98±0.00
FRPOLY/BIGNUM           [  1.46±0.00] 0.96±0.00 0.96±0.00
FRPOLY/FLOAT            [  1.82±0.00] 0.99±0.00 0.99±0.00
PUZZLE                  [  0.61±0.00] 0.98±0.00 0.99±0.00
TAK                     [  0.80±0.00] 0.96±0.00 0.96±0.00
CTAK                    [  0.74±0.00] 1.03±0.00 1.03±0.00
TRTAK                   [  0.80±0.00] 0.96±0.00 1.00±0.00
TAKL                    [  0.93±0.00] 1.07±0.00 1.00±0.00
STAK                    [  0.99±0.00] 1.02±0.00 1.01±0.00
FPRINT/UGLY             [  3.27±0.00] 1.10±0.00 0.99±0.00
FPRINT/PRETTY           [ 18.33±0.02] 1.03±0.00 0.99±0.01
TRAVERSE                [  6.74±0.00] 1.00±0.05 1.00±0.00
TRIANGLE                [  1.47±0.00] 1.00±0.00 1.00±0.00
RICHARDS                [  1.14±0.00] 1.03±0.02 1.01±0.00
FACTORIAL               [  1.12±0.00] 1.00±0.00 1.01±0.00
FIB                     [  0.95±0.00] 1.00±0.00 1.00±0.00
FIBRATIO                [  0.81±0.00] 1.00±0.00 0.98±0.00
ACKERMANN               [ 13.63±0.08] 1.02±0.09 1.00±0.09
MANDELBROT/COMPLEX      [ 25.51±0.00] 0.98±0.07 0.98±0.00
MANDELBROT/DFLOAT       [ 14.22±0.01] 1.00±0.03 1.00±0.01
MRG32K3A                [  1.47±0.00] 1.00±0.00 0.99±0.00
CRC40                   [ 54.55±0.02] 1.01±0.07 1.00±0.11
BIGNUM/ELEM1001000      [  1.13±0.00] 1.00±0.00 1.00±0.00
BIGNUM/ELEM1000100      [  5.56±0.00] 1.00±0.00 1.00±0.00
BIGNUM/ELEM100001       [  5.76±0.00] 1.00±0.00 1.00±0.00
BIGNUM/PARI10010        [  1.32±0.00] 1.00±0.00 1.00±0.00
BIGNUM/PARI2005         [ 15.30±0.00] 1.00±0.00 1.00±0.00
PIDECIMAL/SMALL         [ 52.05±0.00] 1.00±0.00 1.00±0.00
PIDECIMAL/BIG           [101.50±0.00] 1.00±0.00 1.00±0.00
PIATAN                  [  3.31±0.00] 1.00±0.00 1.00±0.00
PIRATIOS                [ 10.44±0.00] 0.99±0.00 1.00±0.00
SLURPLINES              [ 17.10±0.00] 1.00±0.02 0.97±0.02
HASHSTRINGS             [  1.28±0.00] 0.99±0.00 1.00±0.00
HASHINTEGERS            [  3.24±0.00] 1.02±0.00 1.04±0.01
BOEHMGC                 [  5.25±0.00] 1.00±0.02 1.00±0.01
DEFLATEFILE             [  1.75±0.00] 1.00±0.00 1.00±0.00
1DARRAYS                [  0.32±0.00] 1.00±0.00 1.00±0.00
2DARRAYS                [  1.38±0.01] 0.98±0.01 0.99±0.02
3DARRAYS                [  4.07±0.02] 1.00±0.02 0.99±0.01
BITVECTORS              [  7.64±0.01] 1.00±0.06 1.03±0.03
BENCHSTRINGS            [  1.47±0.00] 1.00±0.00 1.00±0.00
fillstrings/adjustable  [ 84.32±0.01] 1.00±0.02 1.00±0.00
STRINGCONCAT            [ 86.86±0.02] 1.02±0.15 0.99±0.01
SEARCHSEQUENCE          [  0.65±0.00] 1.01±0.01 1.00±0.00
CLOS/defclass           [  7.20±0.00] 1.00±0.00 1.00±0.00
CLOS/defmethod          [ 14.22±0.01] 1.01±0.01 0.99±0.01
CLOS/instantiate        [ 20.75±0.02] 1.01±0.06 0.98±0.06
CLOS/simpleinstantiate  [  0.76±0.00] 1.04±0.00 1.04±0.00
CLOS/methodcalls        [  5.52±0.00] 1.02±0.01 1.00±0.02
CLOS/method+after       [ 11.46±0.01] 1.02±0.02 1.00±0.01
CLOS/complexmethods     [  2.21±0.01] 1.06±0.01 1.00±0.01
EQLSPECIALIZEDFIB       [  0.81±0.00] 1.00±0.00 1.00±0.00

Reference time in first column is in seconds; other columns are relative
Reference implementation: SBCL 0.8.8
Impl 0.8.8.3: SBCL 0.8.8.3
Impl 0.8.8.1: SBCL 0.8.8.1

=== Test machine ===
   Machineinstance: walrus.boinkor.net
   Machinetype: X86
   Machineversion: NIL
FreeBSD walrus.boinkor.net 4.9STABLE FreeBSD 4.9STABLE #4: Fri Dec 5 00:59:29 CET 2003 root@...:/usr/obj/usr/src/sys/WALRUS i386

Have fun,
Andreas Fuchs, <asf@...>, asf@..., antifuchs
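
As a rough illustration of the table format described in the mail above, here is a minimal Python sketch. It assumes that "standard error" means the sample standard deviation divided by the square root of the number of runs, which the mail does not spell out. The function names and the timings are made up for the example and are not taken from the autobuilder scripts.

```python
import math
import statistics

def standard_error(samples):
    # Assumed definition: sample standard deviation / sqrt(number of runs).
    return statistics.stdev(samples) / math.sqrt(len(samples))

def reference_cell(ref):
    # "[mean(samples_ref) ± standard error(samples_ref)]", in seconds.
    return f"[{statistics.mean(ref):.2f}±{standard_error(ref):.2f}]"

def relative_cell(version, ref):
    # Ratio of means, followed by the (unscaled) standard error of the
    # version's own samples, as described in the mail.
    ratio = statistics.mean(version) / statistics.mean(ref)
    return f"{ratio:.2f}±{standard_error(version):.2f}"

# Hypothetical timings (seconds) for three runs of one benchmark.
ref_runs = [11.05, 11.06, 11.07]
new_runs = [11.00, 11.10, 11.05]
print(reference_cell(ref_runs), relative_cell(new_runs, ref_runs))
```

Note that the error in the relative cell is still in seconds while the ratio is dimensionless, so the two parts of that cell are not on the same scale.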
From: Eric Marsden <emarsden@la...> - 2004-03-02 15:20:49

>>>>> "af" == Andreas Fuchs <asf@...> writes:

  af> As a nice-looking supplement to this mail, you can also visit
  af>
  af> http://sbcl.boinkor.net/benchmark/
  af>
  af> to see a ploticus-ed version of the numbers (with history!) below.

looks good! I think it would be nice to implement the test grouping
feature in cl-bench (:group keyword in tests.lisp), which would allow
you to separate the tests into a number of pages

   arrays
   gabriel
   CLOS
   bignum
   misc
   etc

[BTW the bogus results for WALKLIST/MESS are due to a bug in the
released version of cl-bench (non-deterministic test); I've sent
Andreas an updated version]

I think I'll make a common-lisp.net project for cl-bench, so that
people can track revisions more easily.

Eric Marsden <URL:http://www.laas.fr/~emarsden/>
From: Christophe Rhodes <csr21@ca...> - 2004-03-02 16:06:06

Andreas Fuchs <asf@...> writes:

> Hi all,
>
> this is a test of the emergency autobuilder & benchmark runner. In the
> next few days, you can expect to see more of these in this place. If
> all is well, there will be a mail sent to the list once per day (if
> there were commits), starting this Thursday.

Excellent!

> I believe it should be possible and interesting to make a build farm
> with this set of scripts (especially to see performance
> improvements/degradation in non-x86 backends as well). All you would
> need is rsync, tla, ploticus (perhaps, I'm thinking about a version
> without ploticus) and a reasonably fast machine. If you're interested
> in running this benchmark, just ask (:

I think in particular a fast PowerPC benchmarker (again with
historical data inasmuch as this is possible) would be very useful.
My suspicion is that there are things that are sufficiently dissimilar
between the PowerPC and x86 platforms that will make this interesting.

> The numbers below were generated by running the emarsden benchmarks
> three times. The format for the reference column is as follows:
> [mean(samples_ref) ± standard error(samples_ref)]
> The others are relative to the reference:
> (mean(samples_version) / mean(samples_ref)) ± standard error (samples_version)

Just for the record, what I would like to see quoted for the relative
times are
  mean(samples_version) / mean(samples_ref)
and
  standard error(samples_version) / mean(samples_ref)

Since I'm aware that not everyone has done as much statistical theory
as they should^W^WI have (and even if they have they may have a
slightly different convention for presenting results), let me go into
this a little more. People not interested in the mathematical detail
can safely elide some of this.

Imagine taking k samples from a distribution X, with a view to
measuring the mean, \mu, of X. By taking the mean of the sample, we
get an estimate for the population mean. Label the samples x_1
... x_k, and compute a statistic
  \bar{x} = \frac{1}{k} \sum_i x_i.
This statistic is an unbiased estimator for the mean of the
population:
  E_X(\bar{x}) = E(\frac{1}{k} \sum_i x_i)
               = \frac{1}{k} \sum_i E(x_i)
               = \frac{1}{k} \sum_i \mu
               = \mu.

However, not only do we want to know an estimate for the population
mean (in this specific case, "how much time does it take to run this
benchmark?"), but we also want to know how wide of the mark our
estimate could be. For this, we want to compute the variance of our
statistic:
  Var_X(\bar{x}) = Var(\frac{1}{k} \sum_i x_i)
                 = E([\frac{1}{k} \sum_i x_i]^2) - E(\frac{1}{k} \sum_i x_i)^2
                 = \frac{1}{k^2}E([\sum_i x_i]^2) - \mu^2
                 = \frac{1}{k^2}(kE(x^2) + k(k-1)\mu^2) - \mu^2
                 = \frac{1}{k}(E(x^2) - \mu^2)
                 = \frac{1}{k}\sigma^2
where \sigma^2 is the variance of the population X.

So to get an estimate of the error on a given estimate of the mean, we
estimate the standard deviation of the population, and divide by the
square root of the number of samples.[*] This is what I mean, at
least, when I talk about the "standard error" or "standard error on
the mean", and it's what physicists quote when they say "foo was
measured to be 20 +/- 3".

So, with all that said (and even taking account of the fact that the
calculation of the errors presented in Andreas' email is bogus for
mean times not equalling 1) there are some oddities in the benchmark
results. I've snipped to leave in just the odd ones, but I'll reserve
further comment until more data arrive.

Cheers,

Christophe

[*] In case you're wondering, no, this bit isn't rigorous. It's good
enough for Physics, though :)

> Benchmark                 Reference      0.8.8.3   0.8.8.1
> ------------------------------------------------------------
> LOADFASL                [  1.60±0.00] 0.95±0.00 1.01±0.00
> SUMPERMUTATIONS         [  8.02±0.01] 1.05±0.01 0.99±0.00
> FPRINT/UGLY             [  3.27±0.00] 1.10±0.00 0.99±0.00
> CLOS/simpleinstantiate  [  0.76±0.00] 1.04±0.00 1.04±0.00
> CLOS/complexmethods     [  2.21±0.01] 1.06±0.01 1.00±0.01

Incidentally, this presentation isn't terribly easy to read... the
graphs are a lot better, fortunately. For this, I think it would help
if (a) the Reference were something altogether different (say, CMUCL,
or a long-distant version of sbcl), and (b) if the remaining columns
were arranged chronologically rightwards rather than leftwards (or
than in order of performing the benchmarks, even worse... :)

Other than that, it's looking good. Thanks very much for your work on
this!

Cheers,

Christophe
http://wwwjcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)
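
For readers who want to check the result Var(\bar{x}) = \sigma^2 / k from the message above without redoing the algebra, here is a small Python sketch. It enumerates every equally likely sample of size k, drawn with replacement from a toy population, and compares the exact variance of the sample mean with \sigma^2 / k. The population values and all function names are invented for the illustration. The last function shows the scaling suggested for the relative columns, where both the mean and the standard error are divided by the reference mean.

```python
from fractions import Fraction
from itertools import product

def mean_and_variance(values):
    # Exact mean and variance of a uniform distribution over `values`.
    n = len(values)
    mu = Fraction(sum(values), n)
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, var

def variance_of_sample_mean(values, k):
    # Exact Var(x-bar): enumerate all ordered samples of size k,
    # drawn independently (with replacement) from `values`.
    sample_means = [Fraction(sum(s), k) for s in product(values, repeat=k)]
    return mean_and_variance(sample_means)[1]

def scaled_relative(mean_version, se_version, mean_ref):
    # Both numbers divided by the reference mean, so the ratio and its
    # error are on the same (dimensionless) scale.
    return mean_version / mean_ref, se_version / mean_ref

population = [1, 2, 3, 6]           # hypothetical toy "timings"
mu, sigma2 = mean_and_variance(population)
for k in (1, 2, 3):
    # The variance of the sample mean shrinks as sigma^2 / k.
    assert variance_of_sample_mean(population, k) == sigma2 / k
    print(k, variance_of_sample_mean(population, k))
```

Exact rational arithmetic is used here so the comparison holds with equality instead of up to rounding error. With real benchmark runs, \sigma has to be estimated from the samples, which is where the non-rigorous step in the footnote comes in.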
From: Andreas Fuchs <asf@bo...> - 2004-03-02 22:49:42

Today, Christophe Rhodes <csr21@...> wrote:
> Andreas Fuchs <asf@...> writes:
> Just for the record, what I would like to see quoted for the
> relative times are mean(samples_version) / mean(samples_ref) and
> standard error(samples_version) / mean(samples_ref)

[snip long and helpful explanation]

>
> So, with all that said (and even taking account of the fact that the
> calculation of the errors presented in Andreas' email is bogus for
> mean times not equalling 1)

(fixed in the latest version (:)

> there are some oddities in the benchmark
> results. I've snipped to leave in just the odd ones, but I'll
> reserve further comment until more data arrive.

(reordered slightly:)

>> Benchmark                 Reference      0.8.8.3   0.8.8.1
>> ------------------------------------------------------------
>> LOADFASL                [  1.60±0.00] 0.95±0.00 1.01±0.00
>> FPRINT/UGLY             [  3.27±0.00] 1.10±0.00 0.99±0.00

I suspect these two vary so much between runs because there is file
I/O happening, and these benchmarks are both rather file I/O
intensive.

>> SUMPERMUTATIONS         [  8.02±0.01] 1.05±0.01 0.99±0.00
>> CLOS/simpleinstantiate  [  0.76±0.00] 1.04±0.00 1.04±0.00
>> CLOS/complexmethods     [  2.21±0.01] 1.06±0.01 1.00±0.01

No idea about these two.

> Incidentally, this presentation isn't terribly easy to read... the
> graphs are a lot better, fortunately. For this, I think it would
> help if (a) the Reference were something altogether different (say,
> CMUCL, or a long-distant version of sbcl), and (b) if the remaining
> columns were arranged chronologically rightwards rather than
> leftwards (or than in order of performing the benchmarks, even
> worse... :)

You're right, the presentation isn't too useful in that form. I
reordered the implementations now, and gave 2 more decimals to the
non-reference implementations' standard error fields. A benchmarked
cmucl should also appear any day now (:

In other news, I plan to put thumbnails of the plots (with the curve
still intact) on the main page and link the bigger pictures from
there. That should take away a bit of the tedious scrolling.

and btw: 0.8.8.8 is being benchmarked right now. Expect the anxiously
awaited 0.8.8.10 results to be completed at ~1:00 tomorrow (CET). (:

Good night,
Andreas Fuchs, <asf@...>, asf@..., antifuchs
From: Nikodemus Siivola <tsiivola@cc...> - 2004-03-02 23:38:45

On Tue, 2 Mar 2004, Andreas Fuchs wrote:

> >> LOADFASL                [  1.60±0.00] 0.95±0.00 1.01±0.00
> >> FPRINT/UGLY             [  3.27±0.00] 1.10±0.00 0.99±0.00
>
> I suspect these two vary so much between runs because there is file
> I/O happening, and these benchmarks are both rather file I/O
> intensive.

Recalling the latest round of fasl profiling, I'd say that SBCL could
use another fasl benchmark then. ;)

Fasl loading for large real systems (CLX, McCLIM) is typically not IO
bound. Or if it is, I'm happy to be proven wrong since I still have
that fasl mmapping patch around somewhere...

OTOH, it may be that on a monster box like yours that is no longer
true - I know my HD is much more up to date than my processor.

Cheers,

Nikodemus