From: Julian S. <js...@ac...> - 2012-02-23 08:18:27
> Another option is to make --partial-loads-ok a tri-state: "no", "yes",
> and "conservative". "no" and "yes" can mean the same as they do now,
> while "conservative" can mean the behavior I want (i.e., allow
> partially-accessible aligned loads but mark the inaccessible bytes as
> undefined). The advantage of the tri-state is that it would be 100%
> backward-compatible. The disadvantage is that it would add a
> maintenance burden, both now and going forward with 128-bit loads and
> beyond. Personally, I would just make "yes" behave the way I want,
I tend to agree, provided it does not cause serious problems when
tested. Another disadvantage of adding a third option is that we would
have to explain its semantics to users, who are in general already
confused enough by the plethora of options available.
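For concreteness, the behaviour being asked for (allow a partially
accessible aligned load, but mark the inaccessible bytes as undefined)
could be modelled roughly as below. This is only a sketch and not
Memcheck code: the LoadResult type, the load64_conservative name and the
one-bit-per-byte addressability mask are all invented for illustration,
and it assumes the usual convention that a V bit of 1 means "undefined".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of an aligned 64-bit load: whether an address
   error should be reported, and the V bits of the loaded value
   (1 = undefined, 0 = defined). */
typedef struct {
    bool     report_error;
    uint64_t vbits;
} LoadResult;

/* addressable_mask: bit i set means byte i of the word is addressable.
   mem_vbits: the V bits currently recorded for the 8 bytes in memory. */
static LoadResult load64_conservative(uint8_t addressable_mask,
                                      uint64_t mem_vbits)
{
    LoadResult res = { false, mem_vbits };

    if (addressable_mask == 0) {
        /* Wholly inaccessible: still an error. The V bits to use after
           reporting are outside this sketch; all-undefined is returned. */
        res.report_error = true;
        res.vbits = ~0ULL;
        return res;
    }

    /* Partially accessible: no error, but every inaccessible byte is
       forced to undefined. */
    for (int i = 0; i < 8; i++) {
        if (!(addressable_mask & (1u << i)))
            res.vbits |= 0xFFULL << (8 * i);
    }
    return res;
}
```

With this model, a load whose upper four bytes are inaccessible
(mask 0x0F) reports no error and returns V bits 0xFFFFFFFF00000000.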
> Ultimately, I am looking for three things, in this order:
>
> 1) Eliminate false negatives for --partial-loads-ok=yes
> 2) Implement --partial-loads-ok=yes for SSE loads
> 3) Correctly propagate validity bits for PMOVMSKB
Sounds sane.
> It is turning out to be quite hard to eliminate Memcheck's false
> positives on my current code base (using the Intel compiler with full
> SSE4.2 optimization).
What specific problems are you having?
In my uncommitted work on this, I added a new IROp
Iop_GetMSBs8x8, /* I64 -> I8 */ with behaviour
+UInt h_generic_calc_GetMSBs8x8 ( ULong xx )
+{
+   UInt r = 0;
+   if (xx & (1ULL << (64-1))) r |= (1<<7);
+   if (xx & (1ULL << (56-1))) r |= (1<<6);
+   if (xx & (1ULL << (48-1))) r |= (1<<5);
+   if (xx & (1ULL << (40-1))) r |= (1<<4);
+   if (xx & (1ULL << (32-1))) r |= (1<<3);
+   if (xx & (1ULL << (24-1))) r |= (1<<2);
+   if (xx & (1ULL << (16-1))) r |= (1<<1);
+   if (xx & (1ULL << ( 8-1))) r |= (1<<0);
+   return r;
+}
and used that to implement pmovmskb, rather than using a helper
function in the front end. This allows Memcheck to instrument it
exactly, since the same operation also describes the V bit
propagation exactly. The back end then generates calls to this
function, which handles both the real computation and Memcheck's
instrumentation of it. Let me know if you want the diff.
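To illustrate why one operation covers both jobs, here is a small
standalone sketch. It is not the actual VEX or Memcheck code: the
Shadowed64/Shadowed8 types and the function names are made up for the
example, and it assumes a V bit of 1 means "undefined". Each result bit
is a copy of exactly one input bit, so applying the same bit-gather to
the shadow word yields the exact definedness of the result.

```c
#include <stdint.h>

/* Same computation as h_generic_calc_GetMSBs8x8, written as a loop:
   result bit i is the top bit of byte i of xx. */
static uint32_t get_msbs_8x8(uint64_t xx)
{
    uint32_t r = 0;
    for (int i = 0; i < 8; i++) {
        if (xx & (1ULL << (8 * i + 7)))
            r |= 1u << i;
    }
    return r;
}

/* Hypothetical value/shadow pairs (vbits: 1 = undefined). */
typedef struct { uint64_t value; uint64_t vbits; } Shadowed64;
typedef struct { uint8_t  value; uint8_t  vbits; } Shadowed8;

/* The real result and its V bits come from the same operation,
   applied once to the value and once to the shadow. */
static Shadowed8 pmovmskb_shadowed(Shadowed64 in)
{
    Shadowed8 out;
    out.value = (uint8_t)get_msbs_8x8(in.value);
    out.vbits = (uint8_t)get_msbs_8x8(in.vbits);
    return out;
}
```

For example, if only byte 0 of the input is undefined
(vbits 0x00000000000000FF), only bit 0 of the 8-bit result is marked
undefined; the other seven result bits stay defined.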
J