From: Josef W. <Jos...@gm...> - 2011-11-23 16:18:45
On 23.11.2011 08:39, Julian Seward wrote:
> On Tuesday, November 22, 2011, Philippe Waroquiers wrote:
>>> Together (with the macro removal patch), this gives me
>> I obtain the below improvements on ppc64.
[...]
> Looks good to me.

My goal here actually was to make the common case for instruction
fetches (hit the MRU tag in I1) as fast as possible. One remaining
obstacle is incrementing the access counter. If we can avoid that,
we could directly instrument the MRU hit check for Ir.

Is there a possibility to pass more than 3 parameters to a C call?
Perhaps via shadow registers?

Background: I really would like to be able to pass the memory block
number for an Ir access that does not cross a cache line boundary
directly as a parameter. We can calculate that at instrumentation time,
so there is no need to do it in the simulator again and again from the
cache parameters, which actually are constant.

Hmm... Valgrind has this nice code generator, but we "only" use it for
instrumentation. It would be really cool to use VEX to generate the
innermost cache simulation routine for given cache parameters
(esp. unroll that loop for the fixed associativity), and call that
from the C callback. Do you see a way to accomplish that?

> FWIW, yes, I have also had much fun and games :-)
> trying to get repeatable performance numbers on top end CPUs. I found
> two things; firstly that even small numbers of other tasks running
> (daemons, etc) generate a surprisingly large amount of measurement
> noise, so a dedicated test machine is really worth having [and, also,
> measurement in a VM is hopeless]. Secondly I found that the most
> consistent numbers come from the least microarchitecturally complex
> CPUs. So .. I get the most reliable numbers from my ARM Cortex-A8
> beagleboard.

Interesting!

Josef
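
For illustration, a minimal sketch of the idea in the message above: the
memory block number is computed once, at instrumentation time, and the
run-time helper only checks the MRU tag before falling back to a slow path.
This is not Cachegrind's actual code. The names ICache, icache_init,
block_of and icache_fetch, and the cache parameters, are hypothetical and
chosen only for this example. The access counter is still incremented
here, which is exactly the remaining obstacle mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, compile-time constant cache parameters. */
enum { LINE_BITS = 6, SETS_BITS = 8, ASSOC = 2 };
#define NUM_SETS (1u << SETS_BITS)

typedef struct {
    uint64_t tags[NUM_SETS][ASSOC]; /* tags[set][0] is the MRU way */
    uint64_t accesses;
    uint64_t misses;
} ICache;

/* Mark every way as empty and reset the counters. */
static void icache_init(ICache *c)
{
    assert(c != NULL);
    for (unsigned s = 0; s < NUM_SETS; s++)
        for (int w = 0; w < ASSOC; w++)
            c->tags[s][w] = UINT64_MAX;
    c->accesses = 0;
    c->misses = 0;
}

/* Would run once per instruction at instrumentation time. Only valid
   when the fetch does not cross a cache line boundary. */
static uint64_t block_of(uint64_t addr)
{
    return addr >> LINE_BITS;
}

/* Run-time helper: receives the precomputed block number. */
static void icache_fetch(ICache *c, uint64_t block)
{
    assert(c != NULL);
    uint64_t *set = c->tags[block & (NUM_SETS - 1)];

    c->accesses++;
    if (set[0] == block)
        return;                 /* fast path: MRU hit */

    /* Slow path: look in the remaining ways. */
    int i;
    for (i = 1; i < ASSOC; i++)
        if (set[i] == block)
            break;
    if (i == ASSOC) {           /* miss: the LRU way gets evicted */
        c->misses++;
        i = ASSOC - 1;
    }
    /* Shift the more recently used ways down, put block at the front. */
    for (; i > 0; i--)
        set[i] = set[i - 1];
    set[0] = block;
}
```

Because ASSOC is a constant, a compiler can unroll the slow-path loops, which is roughly what a VEX-generated simulation routine for fixed cache parameters would achieve.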