From: Robin G. <rob...@an...> - 2006-12-17 10:59:35
---------------------------- Original Message ----------------------------
Subject: Re: [dacapobench-researchers] Question about dacapo options
From: "Robin Garner" <rob...@an...>
Date: Thu, December 14, 2006 5:56 pm
To: "Eric Bodden" <eri...@ma...>
---------------------------------------------------------------------------

Eric Bodden wrote:
> Hi, all.
>
> I am now finally getting into the phase where we are trying to get some
> final runtime numbers using DaCapo, and at the moment I am struggling a
> bit with the problem that the variance between multiple runs with the
> very same configuration sometimes seems to be relatively high (between
> 2% and 5%).
>
> I saw that you have this "-n" option to perform multiple runs, and I
> also saw that you support convergence checks somehow. Is this feature
> properly documented anywhere? In particular, is there a way to do
> something like the following: "run multiple times, accumulating an
> average runtime and a confidence value, until this value is below a
> given confidence interval"? By taking a running average, you basically
> force the values to converge as you gather more and more data. This is
> what we used to do during my time at the IBM performance labs, and it
> proved very useful.

That's exactly what the -converge option does. It's spelt out in the
paper, but unfortunately very briefly. As the built-in help says:

  Measurement methodology options
    -converge            Allow benchmark times to converge before timing
    -max_iterations <n>  Run a max of n iterations (default 20)
    -variance <pct>      Target coefficient of variation (default 3.0)
    -window <n>          Measure variance over n runs (default 3)

So when '-converge' is present, the '-window' most recent iteration times
are kept. At the end of each iteration, the mean $\mu$ and the percent
coefficient of variation, $\sigma / \mu \times 100$, are calculated. If
this is below the value of -variance, then the next iteration of the
benchmark is timed. If -max_iterations is exceeded, the benchmark harness
decides that the run will never converge sufficiently and declares the
run failed.

I think this is what you describe, just in slightly different
terminology. The scheme we use was arrived at in consultation with Perry
Cheng of IBM Watson.
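To make the scheme concrete, here is a rough sketch of that check in
Java. This is illustrative only, not the actual harness source: the names
ConvergenceCheck, record, converged, runOneIteration and reportTime are
all invented for the example.

import java.util.LinkedList;

/** Sketch of the windowed convergence check described above
    (invented names; not the actual DaCapo harness code). */
class ConvergenceCheck {
  private final int window;        // -window: measure variance over this many runs
  private final double targetCov;  // -variance: target coefficient of variation (%)
  private final LinkedList<Long> times = new LinkedList<Long>();

  ConvergenceCheck(int window, double targetCov) {
    this.window = window;
    this.targetCov = targetCov;
  }

  /** Record the latest iteration time, keeping only the '-window' most recent. */
  void record(long elapsedMillis) {
    times.addLast(elapsedMillis);
    if (times.size() > window)
      times.removeFirst();
  }

  /** True once the window is full and sigma / mu * 100 is below the target. */
  boolean converged() {
    if (times.size() < window)
      return false;
    double mu = 0.0;
    for (long t : times)
      mu += t;
    mu /= window;
    double sumSq = 0.0;
    for (long t : times)
      sumSq += (t - mu) * (t - mu);
    double sigma = Math.sqrt(sumSq / window);
    return (sigma / mu) * 100.0 < targetCov;
  }
}

The surrounding driver loop would then warm up until convergence or until
-max_iterations is exhausted, and only time the iteration after that:

  ConvergenceCheck check = new ConvergenceCheck(3, 3.0); // -window 3, -variance 3.0
  int iteration = 0;
  while (!check.converged() && iteration < 20) {         // -max_iterations 20
    check.record(runOneIteration());                     // untimed warm-up run
    iteration++;
  }
  if (check.converged())
    reportTime(runOneIteration());                       // the timed iteration
  else
    System.err.println("Run failed to converge.");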
I have never had the time to investigate in detail the convergence
characteristics of the benchmarks, but it is definitely clear that some
are more deterministic than others. Many benchmarks exhibit the kind of
'exponential decay' towards a stable iteration period that you would
naively expect. The more multi-threaded of the benchmarks (hsqldb,
xalan, lusearch) could be expected to have a higher variance. Others,
like jython, seemed to be 'bi-stable', eventually alternating between
two separate but highly predictable times.

The default value of 3.0 for -variance was chosen because the majority
of benchmarks converged to within that in under 20 runs.

Perhaps we need a thread scheduler that will record and replay a given
pattern of thread switches, giving us the kind of reproducible results
that 'replay compilation' does for JikesRVM's compiler ... I'm not sure.

Hope this was the answer you wanted,
cheers
Robin

> Cheers,
> Eric
>
> --
> Eric Bodden
> Sable Research Group
> McGill University, Montréal, Canada

--
Robin Garner
Dept. of Computer Science
Australian National University
http://cs.anu.edu.au/people/Robin.Garner/