From: Nicholas N. <nj...@cs...> - 2006-11-21 11:28:20
On Tue, 21 Nov 2006, Ulrich Drepper wrote:

> Nicholas Nethercote wrote:
>> I think a clearer thing to do in the two-set case is to compute two tag
>> values, one for 'a' and one for 'a+size-1', just as is done for the sets.
>> The first tag would be used when checking the first set, the second tag for
>> the second set. I think the end effect is the same.
>
> Yes, but it's slower. And this in a hot path. My patch is as fast as you
> can get it.

This is not such a hot path; the set1 != set2 case is much less common than
the set1 == set2 case. With my suggested approach you can compute the second
tag only when needed, and there's no extra test.

>>> Which brings on the next step: now the cache_t2 structure consists of 8
>>> words and the char array. If you rearrange the struct to move the tags
>>> pointer before the desc_line element all commonly used elements are in the
>>> first 32 or 64 bytes (for 32 or 64 byte platforms respectively). If now
>>> cache_t2 is aligned for this value there is only one cache line needed for
>>> L2, I1, D1.
>>
>> Have you measured the effect of these changes?
>
> I don't have any numbers anymore. But it was a bit faster. It should be
> obvious that this is the case. If you only need one cache line instead of
> two especially L1d is used much more efficient.

Cache optimisations are sufficiently subtle that I would consider very few of
them obvious.

Nick
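
To make the two-tag suggestion above concrete, here is a rough sketch in C. It is not the actual Cachegrind code: sketch_cache, sketch_probe, sketch_ref and the cache geometry (64-byte lines, 16 sets, 2 ways) are all made up for illustration, and it assumes an access is never larger than one line. The point is only that the second tag is computed inside the set1 != set2 branch, which already exists, so the common path gains no extra test.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical geometry, chosen only for this sketch. */
enum { LINE_BITS = 6, SET_BITS = 4, NSETS = 1 << SET_BITS, ASSOC = 2 };

typedef struct {
    /* Tags are stored as tag+1 so that 0 can mean "empty way". */
    uintptr_t tags[NSETS][ASSOC];
} sketch_cache;

static void sketch_init(sketch_cache *c) { memset(c, 0, sizeof *c); }

/* Look up one tag in one set, LRU order (most recent first).
   Returns 1 on a miss, 0 on a hit. */
static int sketch_probe(sketch_cache *c, unsigned set, uintptr_t tag)
{
    uintptr_t *way = c->tags[set];
    uintptr_t stored = tag + 1;
    int i, j;

    for (i = 0; i < ASSOC; i++) {
        if (way[i] == stored) {
            /* Hit: move the entry to the front. */
            for (j = i; j > 0; j--) way[j] = way[j - 1];
            way[0] = stored;
            return 0;
        }
    }
    /* Miss: evict the last way and insert at the front. */
    for (j = ASSOC - 1; j > 0; j--) way[j] = way[j - 1];
    way[0] = stored;
    return 1;
}

/* Simulate one access of `size` bytes at address `a`.
   Returns the number of lines that missed (0, 1 or 2). */
static int sketch_ref(sketch_cache *c, uintptr_t a, unsigned size)
{
    uintptr_t last, tag1, tag2;
    unsigned set1, set2;

    assert(size >= 1 && size <= (1u << LINE_BITS));

    last = a + size - 1;
    set1 = (unsigned)((a    >> LINE_BITS) & (NSETS - 1));
    set2 = (unsigned)((last >> LINE_BITS) & (NSETS - 1));
    tag1 = a >> (LINE_BITS + SET_BITS);

    if (set1 == set2)               /* common case: a single line */
        return sketch_probe(c, set1, tag1);

    /* Rare case: the access straddles two lines.  The second tag is
       computed only here, from the last byte of the access. */
    tag2 = last >> (LINE_BITS + SET_BITS);
    return sketch_probe(c, set1, tag1) + sketch_probe(c, set2, tag2);
}
```

The case where the two tags differ is an access that wraps from the last set to set 0. With the geometry above, a 4-byte access at address 1022 touches set 15 with tag 0 and set 0 with tag 1; using the first tag for both sets would record the wrong line in set 0.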