[linuxsh-shmedia-dev] Operand caches


Hello!

This week I decided to start looking into the implementation of caching
on SHmedia Linux and in particular look at the problem of enabling the
operand cache in write back mode. Since then I have started a complete
rewrite of the caching implementation to firstly enable the operand
cache, in write back mode, and then to optimise range and page
flush/purging for both the I-cache and D-cache.

To avoid stability problems, the implementation is planned in
two stages. The first, now complete, is to enable the D-cache in write
back mode, and to correctly implement flushing/purging of the entire
cache. When a range is flushed or purged, the operation is currently
applied to the whole cache, which is semantically correct but does not
provide the best performance. Stage two of the work is to optimise each of
the flush/purge functions for both the I-cache and D-cache for the SH-5
platform. I have pushed the changes for stage 1 back into BitKeeper and
plan to start work on optimising cache flushing/purging either this
weekend or early next week.

There are two ways to flush the operand cache on SH-5:

1.	OCBP. Find any cache set/way that matches the flush range, construct
a virtual address within that line, and issue an OCBP instruction for
that address. The main problem with this approach is that the address
might not be in the TLB, thus causing a page miss.
2.	ALLOCO. Find any cache set where at least one way matches the
flush range, and issue four ALLOCO instructions on different addresses
that hit that set. The main disadvantage of this approach is the
eviction of blocks outside the flush range that happen to be resident in
the same cache set, i.e., the cost of pointless writebacks and later
refills. A further disadvantage is that it is not possible to optimise
for the case where a cache line is dirty, and so requires write back,
but should be retained in the cache without requiring a refill from
memory. This is because ALLOCO writes zeros to the particular way.

The current implementation uses approach 2 because we want to avoid
raising a page miss while flushing, which bypasses the issue of making
sure the cache is coherent in that case. This requires a 32k region of
memory, defined below, allocated in non-paged kernel space and not used
for anything else. This region must be at least 32-byte aligned to allow
index calculations to be performed using modulo-256 integer arithmetic.
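As a rough illustration of the index arithmetic only (this is not the
code pushed to BitKeeper), the sketch below assumes the geometry implied
by the figures above: a 32k cache with 4 ways and 32-byte lines, giving
256 sets. The names set_index() and set_flush_addresses() are
hypothetical, as is the idea of laying the region out as four
consecutive 8k ways. On real hardware each returned address would then
be passed to ALLOCO; that step is left out here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry, inferred from the figures in the text:
 * 32 KB cache, 4 ways, 32-byte lines => 256 sets. */
#define LINE_SIZE   32u
#define NUM_SETS    256u
#define NUM_WAYS    4u
#define WAY_SIZE    (LINE_SIZE * NUM_SETS)   /* 8 KB covered by one way */
#define REGION_SIZE (WAY_SIZE * NUM_WAYS)    /* 32 KB reserved region   */

/* Set selected by an address under the assumed geometry:
 * the line number taken modulo 256. */
static unsigned set_index(uintptr_t addr)
{
    return (unsigned)((addr / LINE_SIZE) % NUM_SETS);
}

/* Hypothetical helper: fill out[] with one address per way, all inside
 * the reserved region [base, base + REGION_SIZE) and all mapping to
 * cache set `set`. */
static void set_flush_addresses(uintptr_t base, unsigned set,
                                uintptr_t out[NUM_WAYS])
{
    /* Only 32-byte alignment is needed: the base may start in any set. */
    assert(base % LINE_SIZE == 0);
    assert(set < NUM_SETS);

    /* Number of lines to step forward from base to reach `set`.
     * Unsigned wrap-around followed by % 256 gives the modulo-256
     * difference even when set < set_index(base). */
    unsigned delta = (set - set_index(base)) % NUM_SETS;

    for (unsigned way = 0; way < NUM_WAYS; way++)
        out[way] = base + way * WAY_SIZE + delta * LINE_SIZE;
}
```

For example, with a made-up base of 0x10020 (which falls in set 1),
asking for set 0 gives a step of 255 lines, so the first address is
0x12000 and the other three follow at 8k intervals, the last line ending
just inside the 32k region.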
 
The configuration options for the SHmedia kernel still allow the
I-cache and D-cache to be disabled, and this work does not break that
option.

Please try out the kernel with both the I-cache and D-cache enabled and
let me know how things go. Do not expect massive speed-ups yet: as
described above, there are still many optimisations that must be
implemented before the full benefits of the SH-5 caches can be utilized!

Ben