From: Gaster, B. <ben...@su...> - 2002-07-19 15:01:54
Hello!

This week I started looking into the implementation of caching on SHmedia Linux, and in particular at the problem of enabling the operand cache in write back mode. Since then I have started a complete rewrite of the caching implementation, firstly to enable the operand cache in write back mode, and then to optimise range and page flushing/purging for both the I-cache and D-cache.

To try to avoid problems with stability, the implementation is planned in two stages. The first, now complete, is to enable the D-cache in write back mode and to correctly implement flushing/purging of the entire cache. When a range is flushed/purged, the operation is currently done on the whole cache, which is semantically correct but does not give the best performance. Stage two is to optimise each of the flush/purge functions for both the I-cache and D-cache on the SH-5 platform. I have pushed the changes for stage 1 back into Bitkeeper and plan to start work on optimising cache flushing/purging either this weekend or early next week.

There are two ways to flush the operand cache on SH-5:

1. OCBP. Find any cache set/way that matches, construct a virtual address in the line, and issue an OCBP instruction for that address. The main problem with this approach is that the address might not be in the TLB, thus causing a page miss.

2. ALLOCO. Find any cache set where at least one way matches the flush range, and issue 4 alloco instructions on different addresses that hit that set. The main disadvantage of this approach is the eviction of blocks outside the flush range that happen to be resident in the same cache set, i.e., the cost of pointless writebacks and later refills.
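To make the index arithmetic behind approach 2 concrete, here is a minimal sketch in C of how four addresses that all hit one cache set could be computed. It assumes a 32 KB, 4-way operand cache with 32-byte lines (hence 256 sets) and a reserved 32 KB region starting at a 32-byte-aligned address. The names set_index, set_addresses and dummy_base are hypothetical and are not taken from the kernel source; the alloco instruction itself is not issued here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache geometry: 32 KB, 4 ways, 32-byte lines => 256 sets. */
enum {
    LINE_BYTES = 32,
    NUM_WAYS   = 4,
    NUM_SETS   = 256,
    WAY_BYTES  = LINE_BYTES * NUM_SETS   /* 8 KB covered by each way */
};

/* Set selected by an address: its line number modulo 256. */
static unsigned set_index(uintptr_t addr)
{
    return (unsigned)((addr / LINE_BYTES) % NUM_SETS);
}

/*
 * Fill out[] with NUM_WAYS addresses inside the reserved region
 * [dummy_base, dummy_base + 32 KB) that all map to cache set `set`.
 * dummy_base (hypothetical name) only needs to be 32-byte aligned:
 * the offset to the wanted set is found modulo 256.
 */
static void set_addresses(uintptr_t dummy_base, unsigned set,
                          uintptr_t out[NUM_WAYS])
{
    assert(set < NUM_SETS);
    assert(dummy_base % LINE_BYTES == 0);

    unsigned base_set = set_index(dummy_base);
    /* Number of lines past dummy_base at which `set` first occurs. */
    unsigned line = (set + NUM_SETS - base_set) % NUM_SETS;

    for (int way = 0; way < NUM_WAYS; way++) {
        /* Stepping by WAY_BYTES keeps the same set index. */
        out[way] = dummy_base
                 + (uintptr_t)line * LINE_BYTES
                 + (uintptr_t)way * WAY_BYTES;
    }
}
```

A flush routine along these lines would call set_addresses() for each set that has a way inside the flush range, and then issue one alloco per returned address. The largest offset produced is 255 * 32 + 3 * 8192 = 32736, so every line stays inside the 32 KB region.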
A further disadvantage of approach 2 is that it is not possible to optimise for the case where a cache line is dirty, and so requires write back, but should be retained in the cache without a refill from memory. This is because alloco writes zeros to the particular way.

The current implementation uses approach 2 because we want to avoid raising a page miss during a flush, which sidesteps the issue of making sure the cache is coherent. This requires that a 32k region of memory, defined below, be allocated in non-paged kernel space and not used for anything else. This region must be at least 32-byte aligned to allow index calculations to be performed using modulo 256 integer arithmetic.

The configuration options for the SHmedia kernel still allow the I-cache and D-cache to be disabled, and this work does not break that option.

Please try out the kernel with both the I-cache and D-cache enabled and let me know how things go. Do not expect massive speed-ups yet: as described above, there are still many optimisations that must be implemented before the full benefits of the SH-5 caches can be realised!

Ben