|
From: Lok K. Y. <lo...@sy...> - 2012-10-20 00:35:36
|
> LY> Can you please help me understand the problem a bit more by
> LY> answering the following questions?
>
> LY> 1. Where is addition being used in CTZ - and especially
> LY> expensiveCTZ. From the code, I only see subtraction being used.
>
I> think we may be looking at different code. The addition I was
> referring to is in the commented-out verbose_Clz32 at the bottom of
> guest_ppc_toIR.c. It uses addition to implement an efficient parallel
> add for counting the number of 1 bits in a word, as in the following C
> code in the comment:
Thank you for the clarification. It makes more sense now.
I don't think the additions should affect the precision of the resulting VBits. The VBits calculation do not have to be calculated exactly as bitcount is calculated. In the case of expensiveCTZ addition is no longer used. Or in the case of the BSF(va) <= BSF(a), only the output of bitcount is used and not the output VBits of the operations used to emulate bitcount.
>I believe you're right that ANDing the PCast with 31 would be sound
>and improve precision; my guess would be that the current
>implementation of a plain PCast was chosen simply for ease of
>implementation under the assumption that Ctz would not be commonly
>used on partially-defined values.
>I think an approach like that could work pretty well too. It captures
>the same defined-result common case that I agree is important, and it
>uses fewer additional operations (though still two Ctzs) compared to
>the sort of thing I was thinking of.
Thanks for the clarification again. I guess I forgot about the need for good performance.
Enjoy,
Lok
________________________________________
From: Stephen McCamant [sm...@cs...] on behalf of Stephen McCamant [sm...@al...]
Sent: Friday, October 19, 2012 4:32 PM
To: Lok Kwong Yan
Cc: val...@li...
Subject: Re: [Valgrind-developers] bsfl and pmovmskb validity bit propagation
>>>>> "LY" == Lok Kwong Yan <lo...@sy...> writes:
LY> Can you please help me understand the problem a bit more by
LY> answering the following questions?
LY> 1. Where is addition being used in CTZ - and especially
LY> expensiveCTZ. From the code, I only see subtraction being used.
I think we may be looking at different code. The addition I was
referring to is in the commented-out verbose_Clz32 at the bottom of
guest_ppc_toIR.c. It uses addition to implement an efficient parallel
add for counting the number of 1 bits in a word, as in the following C
code in the comment:
// /* unsigned int bitcount ( unsigned int n )
// {
// n = n - ((n >> 1) & 0x55555555);
// n = (n & 0x33333333) + ((n >> 2) & 0x33333333);
// n = (n + (n >> 4)) & 0x0F0F0F0F;
// n = n + (n >> 8);
// n = (n + (n >> 16)) & 0x3F;
// return n;
// }
// */
For instance there's a "binop(Iop_Add32" on line 17863 of the current
SVN revision r2537.
LY> 2. Why is PCast used for CTZ (and subsequently for BSF) if all of
LY> the upper bits are 0 as long as the operand is non-zero? That
LY> is, why not and the PCast with 0x1E since 31 is the maximum
LY> output value.
I believe you're right that ANDing the PCast with 31 would be sound
and improve precision; my guess would be that the current
implementation of a plain PCast was chosen simply for ease of
implementation under the assumption that Ctz would not be commonly
used on partially-defined values.
LY> 3. In the case of BSF, what about PCast( BSF(va) <= BSF(a) )?
I think an approach like that could work pretty well too. It captures
the same defined-result common case that I agree is important, and it
uses fewer additional operations (though still two Ctzs) compared to
the sort of thing I was thinking of.
-- Stephen
|