From: Sebastian K. <Seb...@so...> - 2003-08-14 10:49:59
> I don't plan to invest too much money,
> and I see the choice between Athlon who may have better cache, branch
> prediction and memory architecture, versus higher speed clock but relatively
> dumb Celeron.

Well. The general rule of thumb is that an Athlon below model 2500+ compares more or less to a P4 clocked as high as the Athlon's model number, and Athlons 2800+ and above are comparable to a P4 with MHz == Athlon's model number - 200. And as a Celeron typically performs like a P4 clocked 200-600 MHz slower (depending on the task run), the Athlon should be faster for all but a few workloads. Those few are inner loops of raytracing or media encoding/decoding, which are hand-optimised with vector ISA extensions (SSE & SSE2).

> My guts feeling is that for valgrind execution speed is mostly linear
> with clock speed, though the branch prediction and cache may have a significant
> effect too.

But don't forget that on average the Athlon has significantly greater instruction throughput. Also, Valgrind severely increases cache pressure, on both the instruction and the data cache. And instrumentation code is always integer code, so it diminishes the advantage the Celeron (& P4) has in its higher-clocked vector unit.

> So basically is there something else I forgot, and is there
> existing Valgrind execution benchmarks for a given "test program" ?
> I'm mostly interested in performances of the simple run of valgrind,
> cachegrind kind of use is less common.

Hmm. The best way is to benchmark it, but my suspicion is that the Celeron's smaller caches, together with its sensitivity to self-modifying code, would make it the slower option.

rgds
Sebastian
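
P.S. For anyone who wants to plug numbers in, here is the rule of thumb above written out as a small Python sketch. The function names are made up for illustration, and the figures are only the rough estimates from this mail, not measurements. The rule says nothing about models 2500+ through 2700+, so the sketch returns None there instead of guessing.

```python
def athlon_p4_equivalent_mhz(model_number):
    """Rough P4 clock (MHz) that an Athlon with this model number compares to.

    Hypothetical helper: encodes only the rule of thumb from this mail.
    """
    if model_number < 2500:
        # Below 2500+: roughly a P4 clocked at the model number.
        return model_number
    if model_number >= 2800:
        # 2800+ and above: roughly a P4 clocked 200 MHz below the model number.
        return model_number - 200
    # 2500+ to 2700+: not covered by the rule of thumb, so no estimate.
    return None


def celeron_p4_equivalent_mhz(clock_mhz):
    """Rough (low, high) range of P4 clocks a Celeron at clock_mhz performs like.

    Hypothetical helper: a Celeron is taken to behave like a P4 clocked
    200-600 MHz slower, depending on the task.
    """
    return (clock_mhz - 600, clock_mhz - 200)


if __name__ == "__main__":
    print(athlon_p4_equivalent_mhz(2400))
    print(celeron_p4_equivalent_mhz(2400))
```

So an Athlon 2400+ would be set against a 2400 MHz P4, while a 2400 MHz Celeron would land somewhere between an 1800 and a 2200 MHz P4. None of this replaces timing your own program under valgrind on both machines.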