From: John R. <jr...@bi...> - 2012-08-13 15:19:06
On 08/13/2012 01:07 AM, Julian Seward wrote:
>
>> Of course, it is free software, so I guess I should fix it myself. But
>> that is not a trivial undertaking and I really do not have the time. So as
>> this problem hits people over and over and over, I encourage those
>> encountering it to add their voices to mine to see it fixed once,
>> permanently.
>
> I lose track of what is actually required. Is it to implement, for
> vector loads, the same thing that you did for scalar loads? That is,
> don't complain about naturally aligned word size loads that partially
> overlap the end of a block, and instead simply mark the part of the
> register corresponding to the area beyond the end of the block, as
> undefined?
What is a "vector load"? Perhaps this means "any load to %xmmK or %ymmK",
but it might be better to say, "a fetch of 8 or more bytes to %xmmK or %ymmK".

It is not enough to mark bytes that are "over-fetched" (beyond the end of
an allocated block) as Uninit, while being silent about the portion of the
access that is beyond the end of the block. Those Uninit bits must be
propagated carefully during subsequent instructions, including "not equal".
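
To make the marking step concrete, here is a minimal Python sketch of a 16-byte fetch that runs past the end of a block. It is a toy model, not memcheck code: the names load16 and undef are made up for illustration, and the value stored for an over-fetched byte is only a placeholder.

```python
# Toy model, not memcheck code: load16 and undef are hypothetical names.
def load16(block, offset):
    """Model a 16-byte load at `offset` from an allocation `block`.

    Returns (value, undef), two 16-entry lists. undef[i] is True when
    byte i of the load lies beyond the end of the block.
    """
    value, undef = [], []
    for i in range(16):
        pos = offset + i
        if pos < len(block):
            value.append(block[pos])
            undef.append(False)
        else:
            value.append(0)     # placeholder: the real content is unknown
            undef.append(True)  # over-fetched byte, mark as Uninit
    return value, undef

value, undef = load16(b"hello\0", 0)   # 6-byte block, 16-byte fetch
print(undef.count(True))               # 10 bytes were over-fetched
```

The load itself raises no complaint here; the per-byte flags are what later instructions have to carry along.
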
In the case of st*cpy, a typical sequence is:
-----glibc-2.15/sysdeps/i386/i686/multiarch/strcpy-sse2.S
movdqa (%esi, %ecx), %xmm1
movaps 16(%esi, %ecx), %xmm2
movdqu %xmm1, (%edi, %ecx)
pcmpeqb %xmm2, %xmm0
pmovmskb %xmm0, %edx
add $16, %ecx
sub $48, %ebx
jbe L(CopyFrom1To16BytesCase2OrCase3)
test %edx, %edx
jnz L(CopyFrom1To16BytesUnalignedXmm2)
-----
Notice that any over-fetching into %xmm2 affects the byte-parallel pcmpeqb
and the sign-bit selector pmovmskb (128-to-16 bit reduction), which feeds
the "test; jnz". If the pcmpeqb-pmovmskb pair sets mask bits that come from
bytes that are Defined, then the test-jnz must succeed even though some high
bits of %edx are Uninit. memcheck must *not* say, "some bits are Uninit,
therefore the result is Uninit". Instead, memcheck must recognize that
"some *Defined* bits are nonzero, therefore the result is NotZero."
--