|
From: Daniel V. <vei...@re...> - 2003-08-14 01:12:20
|
Hi,

It doesn't seem to be an FAQ, but I would think it's a common problem. What are the "best" processors to run Valgrind on? By "best" I mean: how does "usual" Valgrind speed compare across CPU architectures and clock speeds?

I'm usually fine with a relatively slow processor for development, but Valgrind is changing this. I don't plan to invest too much money, and I see the choice as being between an Athlon, which may have better cache, branch prediction and memory architecture, and a higher-clocked but relatively dumb Celeron.

My gut feeling is that Valgrind execution speed is mostly linear with clock speed, though branch prediction and cache may have a significant effect too. So basically, is there something else I forgot, and are there existing Valgrind execution benchmarks for a given "test program"? I'm mostly interested in the performance of a simple run of Valgrind; cachegrind-style use is less common.

Just wondering, I'm probably not the first one who had that question,

Daniel

--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
vei...@re... | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
|
From: Nicholas N. <nj...@ca...> - 2003-08-14 07:59:14
|
On Wed, 13 Aug 2003, Daniel Veillard wrote:

> It doesn't seem to be an FAQ, but I would think it's a common problem.
> What are the "best" processors to run Valgrind on? By "best" I mean: how does
> "usual" Valgrind speed compare across CPU architectures and clock speeds?
> I'm usually fine with a relatively slow processor for development, but
> Valgrind is changing this. I don't plan to invest too much money,
> and I see the choice as being between an Athlon, which may have better cache,
> branch prediction and memory architecture, and a higher-clocked but relatively
> dumb Celeron.
> My gut feeling is that Valgrind execution speed is mostly linear
> with clock speed, though branch prediction and cache may have a significant
> effect too. So basically, is there something else I forgot, and are there
> existing Valgrind execution benchmarks for a given "test program"?
> I'm mostly interested in the performance of a simple run of Valgrind;
> cachegrind-style use is less common.

See www.cl.cam.ac.uk/~njn25/pubs/valgrind2003.ps.gz, p16 for results on the SPEC benchmarks on an Athlon. Those are the only "official" measurements we have, AFAIK.

In general, who can say? Processors are so complex these days that I wouldn't have a clue how Valgrind may differ between e.g. P3 vs P4. But you may be right; I could believe that Valgrind doesn't interact well with cache and branch prediction. The only way to get a definitive answer is to take actual measurements.

I have a script that can be used for benchmarking the SPEC2000 suite, for those of you who have access to it. If anyone is interested, it's at www.cl.cam.ac.uk/~njn25/valgrind/myrun. Instructions for using it are at the top.

> Just wondering, I'm probably not the first one who had that question,

AFAIK, no-one else has asked this before.

N
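For readers without access to SPEC2000, the "take actual measurements" advice can be approximated with a small timing harness. The following is only a sketch, not the `myrun` script mentioned above: the helper names `time_command` and `slowdown` are made up for illustration, and it assumes a `valgrind` binary is on the PATH and that the program under test exits with status 0.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` runs of argv."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the program's output so terminal I/O does not skew timing.
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def slowdown(native_seconds, valgrind_seconds):
    """Slowdown factor of the instrumented run relative to the native run."""
    return valgrind_seconds / native_seconds


if __name__ == "__main__":
    program = sys.argv[1:]
    if not program:
        sys.exit("usage: bench.py PROGRAM [ARGS...]")
    native = time_command(program)
    # Assumption: `valgrind` is installed; -q suppresses its banner output.
    instrumented = time_command(["valgrind", "-q"] + program)
    print(f"native:   {native:.3f} s")
    print(f"valgrind: {instrumented:.3f} s")
    print(f"slowdown: {slowdown(native, instrumented):.1f}x")
```

Running the same script with the same test program on two machines gives one slowdown factor per machine, which is the comparison the original question asks for. Taking the best of several runs reduces noise from other processes, but it is still a rough measurement.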
|
From: Sebastian K. <Seb...@so...> - 2003-08-14 10:49:59
|
> I don't plan to invest too much money,
> and I see the choice as being between an Athlon, which may have better cache,
> branch prediction and memory architecture, and a higher-clocked but relatively
> dumb Celeron.

Well, the general rule of thumb is that an Athlon below model 2500+ compares more or less to a P4 clocked as high as the Athlon's model number, and Athlons 2800+ and above are comparable to a P4 with MHz == Athlon's model number - 200. And as a Celeron typically has the performance of a P4 that is 200-600 MHz slower (depending on the task run), the Athlon should be faster for all but a few workloads. Those few are some inner loops of raytracing or media encoding/decoding, which are hand-optimised with vector ISA extensions (SSE & SSE2).

> My gut feeling is that Valgrind execution speed is mostly linear
> with clock speed, though branch prediction and cache may have a significant
> effect too.

But don't forget that on average the Athlon has significantly greater instruction throughput. Also, Valgrind severely increases cache pressure, on both the instruction and data caches. And instrumentation code is always integer code, so it diminishes the advantage the Celeron (& P4) has in its higher-clocked vector unit.

> So basically, is there something else I forgot, and are there
> existing Valgrind execution benchmarks for a given "test program"?
> I'm mostly interested in the performance of a simple run of Valgrind;
> cachegrind-style use is less common.

Hmm, the best way is to benchmark it, but my suspicion is that the Celeron's smaller caches, together with its sensitivity to self-modifying code, would make it the slower option.

rgds
Sebastian
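The rule of thumb above is simple arithmetic, so it can be written down as a couple of helper functions. This is just a sketch of the figures quoted in the message: the function names are invented for illustration, the numbers describe native performance rather than Valgrind runs, and model numbers from 2500+ up to (but not including) 2800+ are left undefined because the message does not cover them.

```python
def p4_equivalent_mhz(athlon_model):
    """P4 clock (MHz) roughly comparable to an Athlon with this model number.

    Returns None for models the rule of thumb does not cover
    (2500+ up to, but not including, 2800+).
    """
    if athlon_model < 2500:
        return athlon_model          # model number ~= P4 MHz
    if athlon_model >= 2800:
        return athlon_model - 200    # model number - 200 ~= P4 MHz
    return None


def celeron_p4_equivalent_range(celeron_mhz):
    """(low, high) P4 clock in MHz that a Celeron at this clock performs like.

    A Celeron is said to perform like a P4 that is 200-600 MHz slower,
    depending on the task.
    """
    return (celeron_mhz - 600, celeron_mhz - 200)


if __name__ == "__main__":
    print(p4_equivalent_mhz(2000))            # 2000
    print(p4_equivalent_mhz(3000))            # 2800
    print(celeron_p4_equivalent_range(2400))  # (1800, 2200)
```

By this estimate an Athlon 2000+ would sit at or above the top of the range for a 2.4 GHz Celeron, before any Valgrind-specific effects such as cache pressure are taken into account.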