From: Josef W. <Jos...@gm...> - 2012-11-07 14:15:47
On 24.09.2012 21:11, Carl E. Love wrote:
> Good to know that the power of 2 issue is not just a POWER problem.

Not sure if you are working on that: before you come up with changing
associativity for specific cases, I think it makes sense to relax the
power-of-2 restriction just for the LL simulation.

In an old experiment, I replaced the bit masking with a modulo operation
to find the cache set that needs to be checked for a hit. If you do that
for all simulated levels (L1I, L1D, LL), cachegrind can easily slow down
by a factor of 2.

But I recently checked the behavior when the modulo is used only for LL,
and I could not see much slowdown. The modulo operation is needed only
when an access misses L1, and an L1 miss already involves enough work
that the extra modulo for the LL simulation is no longer really relevant.

I'll try to come up with a patch and measurements.

Josef
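
For illustration, the two ways of selecting a cache set mentioned above can be sketched roughly as follows. This is only a minimal sketch and not cachegrind's actual code: the type `cache_geom` and the functions `set_index_mask` and `set_index_mod` are hypothetical names made up for this example.

```c
#include <stdint.h>

/* Hypothetical cache geometry, for illustration only
 * (not a cachegrind data structure). */
typedef struct {
    uint64_t sets;       /* number of sets; must be > 0 */
    unsigned line_bits;  /* log2 of the cache line size in bytes */
} cache_geom;

/* Bit masking: cheap, but only correct if c->sets is a power of 2. */
static uint64_t set_index_mask(const cache_geom *c, uint64_t addr)
{
    return (addr >> c->line_bits) & (c->sets - 1);
}

/* Modulo: works for any positive number of sets,
 * at the cost of an integer division per lookup. */
static uint64_t set_index_mod(const cache_geom *c, uint64_t addr)
{
    return (addr >> c->line_bits) % c->sets;
}
```

For a power-of-2 set count both functions return the same index. Only the modulo variant stays correct for something like an LL cache with 12288 sets, and if it is called only on an L1 miss, the division is paid only on that slower path.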