From: Ulrich D. <dr...@re...> - 2006-11-21 08:48:17
Nicholas Nethercote wrote:
> I think a clearer thing to do in the two-set case is to compute two tag
> values, one for 'a' and one for 'a+size-1', just as is done for the
> sets. The first tag would be used when checking the first set, the
> second tag for the second set. I think the end effect is the same.

Yes, but it's slower. And this in a hot path. My patch is as fast as
you can get it.

>> Which brings on the next step: now the cache_t2 structure consists of
>> 8 words and the char array. If you rearrange the struct to move the
>> tags pointer before the desc_line element all commonly used elements
>> are in the first 32 or 64 bytes (for 32 or 64 byte platforms
>> respectively). If now cache_t2 is aligned for this value there is
>> only one cache line needed for L2, I1, D1.
>
> Have you measured the effect of these changes?

I don't have any numbers anymore. But it was a bit faster. It should
be obvious that this is the case. If you only need one cache line
instead of two especially L1d is used much more efficiently.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
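
The struct rearrangement discussed above can be illustrated with a small C sketch. It is not the actual Valgrind source: only the names tags and desc_line come from the message, while the other field names, the 128-byte buffer size, the 64-byte HOST_LINE value, the two struct names and the in_first_line helper are assumptions made for illustration. The idea is that with the tags pointer moved ahead of the description buffer and the struct aligned to the host cache line, every frequently used field sits in the first line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HOST_LINE 64  /* assumed host cache line size in bytes */

/* Layout with the description buffer in the middle: the tags pointer
   ends up well past the first host cache line. Field names other than
   tags and desc_line are hypothetical. */
typedef struct {
    int        size, assoc, line_size, sets;
    int        sets_min_1, line_size_bits, tag_shift;
    char       desc_line[128];
    uintptr_t *tags;
} cache_cold_middle;

/* Rearranged layout: the seven integers and the tags pointer (eight
   words in total) come first, the rarely used buffer last. _Alignas on
   the first member raises the alignment of the whole struct, so the
   hot fields cannot straddle two host cache lines. */
typedef struct {
    _Alignas(HOST_LINE) int size;
    int        assoc, line_size, sets;
    int        sets_min_1, line_size_bits, tag_shift;
    uintptr_t *tags;
    char       desc_line[128];
} cache_hot_first;

/* Returns nonzero when the bytes [off, off + len) lie entirely within
   the first host cache line of a HOST_LINE-aligned object. */
static int in_first_line(size_t off, size_t len)
{
    return off + len <= HOST_LINE;
}

/* Compile-time check that the hot fields of the rearranged struct end
   inside the first host cache line. */
static_assert(offsetof(cache_hot_first, tags) + sizeof(uintptr_t) <= HOST_LINE,
              "hot fields must fit in one host cache line");
```

Note that the alignment is only guaranteed for static and automatic objects; a heap-allocated instance would need an aligned allocator such as C11 aligned_alloc. Whether the change is measurably faster is not something this sketch can show.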