From: Nicholas N. <nj...@cs...> - 2008-06-06 00:32:15
On Thu, 5 Jun 2008, Josef Weidendorfer wrote:
> Recently I noted that for the cache simulation of Cachegrind (and thus, also Callgrind)
> there are situations where a line found in L1 will be evicted from L2 only, thus violating
> the inclusion property. If you can support this (see example below), we should either change
> the implementation (probably involving a slowdown), or fix the manual, which states that this
> is an inclusive cache simulation.
>
> For simplicity, suppose we have 2-way associativity in both L1 and L2, and we have 3
> memory blocks of cache line size with addresses a1, a2, a3. Suppose that these addresses
> map into the same set in L1 and L2.
>
> (1) Access to a1, then to a2.
>     Afterwards, the LRU access history list of the sets is (a2, a1).
> (2) Access to a1.
>     This time, we get an L1 hit, and return directly from simulation.
>     The LRU list of L1 now is (a1, a2), and the one of L2 is still (a2, a1).
> (3) Access to a3.
>     This is an L1 miss, evicting the cache line for a2 from L1, and also an L2 miss,
>     evicting the cache line for a1 in L2.
>
> Now, the cache line for a1 is evicted from L2, but still in L1.
>
> For correct simulation, we should forward the L1 hit in (2) also to L2, such that
> the LRU list of L2 can be updated and matches the one of L1 afterwards. Thus,
> the LRU list after (3) would have been (a3, a1) for both L1 and L2. I.e. a2 would have
> been evicted from both L1 and L2.
>
> This also changes the event numbers in contrast to an inclusive cache: a further
> access to a2 will give an L2 hit in the current simulation, but would have been
> an L2 miss in the correct inclusive simulation.
>
> Probably the above situation is very rare, but I am sure that one could construct a
> similar example even with higher associativity in L2 than L1, leading to different
> event numbers between Cachegrind's simulation and a real inclusive cache.
>
> Am I missing something here?

In short: when an L1 hit occurs, the L1 MRU list is updated, but the L2 MRU list is not, right?

Hmm, I think your analysis is correct. I hope anyone who is using Cachegrind for serious analysis is plugging their own cache simulator into it.

Making the simulation properly inclusive seems like a good thing to do. Would you be able to work up a patch that does this, and see what the performance effect is for Cachegrind?

Nick
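
For anyone who wants to replay Josef's three steps, here is a minimal sketch in Python. It is not Cachegrind's code: the names `LRUSet`, `simulate` and `forward_l1_hits` are made up for illustration, and it models only a single 2-way set in each level, assuming (as in the example) that a1, a2 and a3 all map to that set.

```python
class LRUSet:
    """One cache set with LRU replacement; index 0 is the most recently used line."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = []

    def access(self, tag):
        """Touch `tag`, evicting the LRU line if the set is full. Returns True on a hit."""
        hit = tag in self.lines
        if hit:
            self.lines.remove(tag)
        elif len(self.lines) == self.ways:
            self.lines.pop()  # drop the least recently used line
        self.lines.insert(0, tag)
        return hit


def simulate(trace, forward_l1_hits):
    """Run `trace` through a 2-way L1 set and a 2-way L2 set.

    forward_l1_hits=False: L2 is only consulted on an L1 miss (the behaviour
    described in the quoted mail). forward_l1_hits=True: L1 hits also touch
    L2, so its LRU order follows L1 (the proposed fix).
    """
    l1, l2 = LRUSet(2), LRUSet(2)
    for tag in trace:
        l1_hit = l1.access(tag)
        if not l1_hit or forward_l1_hits:
            l2.access(tag)
    return l1.lines, l2.lines


trace = ["a1", "a2", "a1", "a3"]
print(simulate(trace, forward_l1_hits=False))  # (['a3', 'a1'], ['a3', 'a2'])
print(simulate(trace, forward_l1_hits=True))   # (['a3', 'a1'], ['a3', 'a1'])
```

Without forwarding, a1 ends up in L1 but not in L2, which is the inclusion violation; with forwarding, both levels hold (a3, a1). A real patch would presumably update the L2 LRU state on an L1 hit without counting it as an L2 access event, which this sketch does not model.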