From: Josef W. <Jos...@gm...> - 2011-11-24 11:29:07
On 24.11.2011 10:12, Julian Seward wrote:
> On Wednesday, November 23, 2011, Josef Weidendorfer wrote:
>
>> My goal here actually was to make the common case for instruction
>> fetches (hit the MRU tag in I1) as fast as possible. One remaining obstacle
>> is incrementing the access counter. If we can avoid that, we directly
>> could instrument the MRU hit check for Ir.
> Sounds good. /me is not claiming to understand all the details. One thing;
> you know you can do conditional dirty helper calls, yes?

Yes. I am just not sure yet how to mix that with event merging.
It could be that instrumenting the MRU hit check is not worth it.

>
>> Is there a possibility to pass more than 3 parameters to a C call?
> Mmh, yes. Why do you think it is limited to 3 params?

Ah, good. I probably had this impression because I never saw a dirty
helper call with more parameters.

> FWIW I think
> all the backends can handle at least 4 word-sized parameters; maybe
> more in some cases (of course, that does not help you since you're
> limited here to what the least capable backend can do.)
>
>> Hmm... Valgrind has this nice code generator, but we "only" use it for
>> instrumentation. It would be really cool to use VEX to generate the inner
>> most cache simulation routine for given cache parameters (esp. unroll
>> that loop for the fixed associativity), and call that from the C callback.
>> Do you see a way to accomplish that?
> I'm sure it's doable, but it's not a half-a-day kind of hack. It
> would require some messing with infrastructure. I'd need to think
> about it.
>
> Can you get anywhere by using the C preprocessor to generate multiple
> partially specialised copies of the cache simulation and adding calls
> just to the relevant versions (specialised by associativity, whatever,
> etc?)

Could work. Hmm... I don't want a switch statement for these special
cases in every simulator call; I want the choice made once, at
instrumentation time. That results in a lot of different dirty helpers.
I need to play with that.

The benefit of generated code is that I can always call the one generated
partial simulation, since only one cache parameter set is needed at a
time, without duplicating the helper.

Anyway, it should be easy to just make a special case for my Core i5
laptop, and see if there is any benefit at all.

Thanks!
Josef
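For illustration, here is a minimal sketch of the preprocessor approach discussed above. It is not code from Cachegrind or Callgrind. All names (`Cache`, `DEFINE_SIM`, `sim_assocN`, `select_sim`, `cache_init`) are hypothetical, and the sizes (64-byte lines, at most 64 sets and 8 ways) are arbitrary assumptions. The macro stamps out one LRU lookup routine per associativity, each with the MRU-hit check as its first test, and the variant is picked once at setup rather than by a switch on every access.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed limits for this sketch only. */
#define MAX_SETS  64
#define MAX_ASSOC 8
#define LINE_BITS 6              /* 64-byte cache lines */

typedef struct {
    unsigned      nsets;         /* number of sets, must be a power of two */
    uint64_t      tags[MAX_SETS * MAX_ASSOC]; /* per set: way 0 is MRU; 0 = empty */
    unsigned long accesses, misses;
} Cache;

/* A simulation routine returns 1 on a miss and 0 on a hit. */
typedef int (*SimFn)(Cache *c, uint64_t addr);

/* Generate one copy of the lookup per associativity. ASSOC is a
 * compile-time constant in each copy, so the compiler is free to
 * unroll both loops. */
#define DEFINE_SIM(ASSOC)                                          \
static int sim_assoc##ASSOC(Cache *c, uint64_t addr)               \
{                                                                  \
    uint64_t  block = addr >> LINE_BITS;                           \
    uint64_t *set = &c->tags[(block & (c->nsets - 1)) * (ASSOC)];  \
    uint64_t  tag = block + 1;                                     \
    int i, j;                                                      \
    c->accesses++;                                                 \
    if (set[0] == tag)                                             \
        return 0;                                                  \
    for (i = 1; i < (ASSOC); i++) {                                \
        if (set[i] == tag) {                                       \
            for (j = i; j > 0; j--)                                \
                set[j] = set[j - 1];                               \
            set[0] = tag;                                          \
            return 0;                                              \
        }                                                          \
    }                                                              \
    for (j = (ASSOC) - 1; j > 0; j--)                              \
        set[j] = set[j - 1];                                       \
    set[0] = tag;                                                  \
    c->misses++;                                                   \
    return 1;                                                      \
}

DEFINE_SIM(1)
DEFINE_SIM(2)
DEFINE_SIM(4)
DEFINE_SIM(8)

/* Pick the specialised routine once (think: at instrumentation time).
 * Returns NULL for an associativity that has no generated copy. */
static SimFn select_sim(unsigned assoc)
{
    switch (assoc) {
    case 1:  return sim_assoc1;
    case 2:  return sim_assoc2;
    case 4:  return sim_assoc4;
    case 8:  return sim_assoc8;
    default: return NULL;
    }
}

/* Reset the cache to empty with the given number of sets. */
static void cache_init(Cache *c, unsigned nsets)
{
    assert(nsets > 0 && nsets <= MAX_SETS && (nsets & (nsets - 1)) == 0);
    memset(c, 0, sizeof *c);
    c->nsets = nsets;
}
```

A caller would run `cache_init(&c, nsets)` and `SimFn sim = select_sim(assoc)` once, then call `sim(&c, addr)` per access. The cost is the one noted above: each associativity needs its own generated function, so specialising on further parameters multiplies the number of helpers.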