From: Eliot M. <mo...@cs...> - 2012-02-05 23:01:40
On 2/5/2012 6:30 AM, Andreas Sewe wrote:
> Even if the benchmark in question does not have a hard-coded timeout
> hidden somewhere (like trade* and unfortunately also our own actors
> benchmark), the overhead (time and/or space) caused by trace capture
> can sometimes be so massive as to make it infeasible for some of the
> benchmarks.
>
> This is a particular problem with some of the Scala benchmarks, which
> exhibit extremely frequent method calls or allocations, which, if
> traced, lead to trace files in the terabyte range (uncompressed).

Right now I have some traces that, compressed, are in the 150-200 GB range. The compression factor that gzip gets on memory-access traces from valgrind's lackey tool is impressive, too: the typical record is 14 bytes long and gzip compresses it to around 6 bits, so these are traces with over a trillion references. So I certainly know what you mean. We're placing an order for another 6 to 10 TB of storage :-) ...

> BTW, these problems are often not readily apparent from the
> uninstrumented execution time; said benchmarks complete, in
> wall-clock terms, just as fast as the others.

Sure; I think these address traces are most likely reflective of wall-clock time, modulo their cache locality, but Merlin-like traces from Matthew Hertz's Elephant Tracks tool are also sizeable while including only call/return, allocation/death, and heap-update records.

> Currently, you have almost no choice but to use the "default" input
> sizes, as "small" for many benchmarks doesn't do much real work, so
> any results you report based on a trace of a "small" input (because
> using "default" proved infeasible) look a priori suspicious.

Yes, I prefer "small" mostly for testing that things work, but ...

> However, whether the suspicion is warranted depends very much on the
> benchmark. For "fop", e.g., the "small" input is not only meaningful,
> but also exercises quite different functionality. For other
> benchmarks, "small" is just a scaled-down version of "default", and
> for others it does little beyond benchmark setup.
>
> I thus think we need a more principled way of naming our input sizes;
> in particular, it should be clear whether one input is just a scaled
> version of another (a different number of essentially the same
> iterations/transactions).

I agree.

> Any suggestions?
>
> (And no, the Scala benchmarks don't use such a naming scheme either;
> they just use more ad-hoc names like "tiny" and "gargantuan". ;-)

Well, this is off the top of my head, but we could use size terms for "normally behaving" inputs, and "test" or something like that for minimal inputs intended mostly to check whether a benchmark starts up at all, or tends to fail. We could also have numbers or names within a size group, such as small/1, small/2, etc., or small/xyz.

Here is another thought on standardizing sizes. At present they are relative, and only within the inputs provided for a given benchmark. For some purposes I would find it more helpful if they related to absolute running time. Of course this varies with platform in peculiar ways, and I don't have a great answer for that, except either to pick one JVM's running time to use for this size "binning", or perhaps some average of the times achieved by a set of k leading JVMs, or of the top k times achieved over some set. Naturally this would have to be on some "standard" machine, etc. Tricky -- but then it doesn't have to be precise: it's only there to give a sense of what you're getting into if you're tracing or the like. For those purposes we might want to think in terms of long/short time words rather than "size" words. Just a thought. Anything like this quickly gets tangled when dealing with benchmarks and measurement, eh?

Regards, and happy terabytes to you -- Eliot