From: Josef W. <Jos...@gm...> - 2013-01-09 18:05:31
On 09.01.2013 00:05, John Reiser wrote:
>> With <block> being the address shifted right by log2(line_size),
>> instead of
>>
>> set = block % number_of_sets
>>
>> the table lookup looks like
>>
>> set = (block & lower_X_bits) | (table[(block >> X) & lower_K_bits]<<X)
>
> Whatever the mapping is: compute it (or an approximation) once during
> initialization, store the result into a linear table of length 2**K,
> and thereafter perform table lookup using the bottom K bits of 'set'.
Do you mean 'block' here, with the definition from above ('set' being the
result)? Can you be more precise about the expression you would use?
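
To make the two mappings quoted above concrete, here is a minimal sketch. All
values are assumptions chosen for illustration (line size, X, K, the table
contents, and a total of 3 * 2**X sets); nothing here is taken from
Cachegrind/Callgrind or from real hardware.

```python
# Illustrative sketch only: every constant below is an assumption.
LINE_SIZE = 64                           # assumed line size in bytes
LINE_BITS = LINE_SIZE.bit_length() - 1   # log2(line_size)

X = 10                                   # low X bits of 'block' pass through
K = 4                                    # next K bits index the table
UPPER_GROUPS = 3                         # assumed: 3 * 2**X sets in total
NUMBER_OF_SETS = UPPER_GROUPS << X       # 3072, not a power of two

# Hypothetical table of length 2**K, computed once at initialization.
TABLE = [i % UPPER_GROUPS for i in range(1 << K)]


def block_of(addr):
    """Address shifted right by log2(line_size)."""
    return addr >> LINE_BITS


def set_modulo(block):
    """set = block % number_of_sets"""
    return block % NUMBER_OF_SETS


def set_table(block):
    """set = (block & lower_X_bits) | (table[(block >> X) & lower_K_bits] << X)"""
    low = block & ((1 << X) - 1)
    idx = (block >> X) & ((1 << K) - 1)
    return low | (TABLE[idx] << X)
```

With these assumed values, both functions agree for every block below
2**(X+K) and can differ above that, because the table only sees K bits of
the block number. That difference is one form of the approximation error
mentioned in the quoted text.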
> In general there will be an approximation error. Choose how much
> error against the access cost (including caching) to the table.
>
> The cache itself does not perform modulo. Instead the cache adjusts
> the associativity.
Usually, yes. With the result that the number of sets is a power of 2.
Cachegrind/Callgrind can handle this quite fine, e.g. Intel Atom
has 24kB L1D, with associativity 6. No problem.
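
As a quick check of the Atom numbers, the following sketch computes the
number of sets from size, line size and associativity. The 64-byte line size
is my assumption here, and the helper names are made up for this example.

```python
def number_of_sets(size_bytes, line_size, associativity):
    """Sets = (cache size / line size) / associativity."""
    lines = size_bytes // line_size
    return lines // associativity


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


# 24 kB L1D with associativity 6; 64-byte lines are an assumption.
atom_sets = number_of_sets(24 * 1024, 64, 6)
```

Under that assumption the odd size is absorbed by the associativity, and the
number of sets stays a power of two.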
But what if the processor tells me that L3 has an associativity which makes
the number of sets not a power of two?
Would you say that if CPUID tells me an associativity of 32, but
the CPU at hand has e.g. only 3 cores, and thus the L3 is distributed
across the 3 tiles of the chip, the associativity of 32 must be a lie?
> If the cache has 96K lines, then (except for fully-
> associative) that is almost certainly 3x 32K lines (or 6x 16K lines,
> or 64K+32K lines, etc.),
> and the associativity is something like 3 times the associativity
> of each 32K-line piece. "Something like" because the hierarchy
> might not be fully distributive;
This would mean that different sets can have a different number of ways
each, wouldn't it?
As we do not know the hardware in this case anyway, I think 'modulo' is
the simplest thing to do, to approximate any possible real hardware.
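
A small sketch of what 'modulo' gives in such a case. Combining the 96K
lines from the quoted example with a reported associativity of 32 is an
assumption made only for this illustration, not a description of any
particular CPU.

```python
from collections import Counter

LINES = 96 * 1024          # 96K lines, as in the quoted example
ASSOC = 32                 # assumed: associativity as reported
SETS = LINES // ASSOC      # 3072 = 3 * 1024, not a power of two


def set_index(block):
    """Plain modulo mapping; works for any number of sets."""
    return block % SETS


# Consecutive blocks spread evenly over all sets.
counts = Counter(set_index(b) for b in range(4 * SETS))
```

Every set is used equally often for consecutive blocks, and each set has the
same number of ways, which is the simplification argued for above.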
Josef
> and if so then only an
> exact simulation of the hardware will be error-free,
> and even the hardware designer may have a hard time telling you
> exactly what the cache does: only the VHDL/Verilog/etc. knows for sure.