|
From: Julian S. <js...@ac...> - 2012-10-12 00:40:33
|
We need to be able to represent conditional loads and stores in IR. The initial motivating case is AVX's VMASKMOVP[SD], which is not implemented due to lack of them, and apparently AVX2 has others that will need them. They would also be very useful for ARM, where conditional loads and stores produce IR which is both slow and confuses Memcheck into false positives, from time to time. Being able to express them directly in IR would give much shorter and more straightforward translations. Also s390 might benefit from them, I think. This seems quite simple to me. Currently we have stores as IRStmts and loads as IRExprs. For conditional stores, we could either add a guard expression field to IRStmt_Store. Or, add a new variant, IRStmt_StoreC, identical to IRStmt_Store but with an added guard field. This would have the advantage of not having to have the vast majority of (normal) stores carry an "always-true" guard field. For conditional loads, basically what we need is a normal load with two extra fields: a guard field, and a "backup" value which is the returned value if the guard is false. At first I thought of having conditional loads as just another IRExpr member. But that gives potential complexity with register allocation if we allow arbitrary expressions for the "backup" value, since if the guard is false then we won't want any of the expression to be evaluated. But that requires doing proper control flow within superblocks in the back ends, which they, and particularly the register allocator, are not set up to do. So an alternative, more conservative proposal, is to have conditional loads be a statement kind: result_temp = Conditional_Load( guard_expr, address_expr, backup_temp ) and the trick is that the backup value is an IRTemp, not an IRExpr, which unambiguously forces us to evaluate it regardless of what 'guard_expr' evaluates to. Hence it sidesteps the regalloc problems, because a Conditional_Load can be compiled into <compute address_expr into rAddr> <move backup_temp to rDest> <compute guard_expr_into condition codes> CLOAD rDest, (rAddr) where CMOV is either a conditional load instruction (on arm) or a pseudo-insn comprising a conditional jump over an unconditional load (on all other targets). But in either case posing no problem for register allocation. Hence in summary, the proposal is to add two new IRStmt kinds: StoreC ( guard_expr, address_expr, data_expr ) result_temp = LoadC ( guard_expr, address_expr, backup_temp ) I think this should get us almost all the benefits of conditional loads/stores with only minor modification of the IR optimiser, instruction selectors, and the register allocator. J |
|
From: Eliot M. <mo...@cs...> - 2012-10-12 00:45:35
|
On 10/11/2012 8:34 PM, Julian Seward wrote: > We need to be able to represent conditional loads and stores in IR. I would think this is useful on the x86 / AMD as well. Best -- Eliot Moss |
|
From: Florian K. <br...@ac...> - 2012-10-12 02:40:54
|
On 10/11/2012 08:34 PM, Julian Seward wrote: > > We need to be able to represent conditional loads and stores in IR. > [...] > Also s390 might benefit from them, I think. Yes. It has both conditional load and store insns as well as some other more obscure insns that could potentially benfit from this by having a simplified IR generation. > For conditional stores, we could either add a guard expression field > to IRStmt_Store. Or, add a new variant, IRStmt_StoreC, identical to > IRStmt_Store but with an added guard field. This would have the > advantage of not having to have the vast majority of (normal) stores > carry an "always-true" guard field. +1 for IRStmt_StoreC. Having a separate statement kind makes sure it gets implemented by the tools, whereas an additional guard expression is too easily overlooked and then isn't handled properly. Remember the guard expression on the dirty helper? > At first I thought of having conditional loads as just another IRExpr > member. But that gives potential complexity with register allocation > if we allow arbitrary expressions for the "backup" value, since if the > guard is false then we won't want any of the expression to be > evaluated. Huh? I think you want to evaluate the backup expression if the guard is false. No? > But that requires doing proper control flow within > superblocks in the back ends, which they, and particularly the > register allocator, are not set up to do. > > So an alternative, more conservative proposal, is to have conditional > loads be a statement kind: > > result_temp = Conditional_Load( guard_expr, address_expr, backup_temp ) > > and the trick is that the backup value is an IRTemp, not an IRExpr, > which unambiguously forces us to evaluate it regardless of what > 'guard_expr' evaluates to. Just to be clear: in a Conditional_Load stmt both the address_expr and the backup_temp get evaluated independent of the value of the guard expression. Right? You probably need to stick in the endianess into the Conditional_Load. While we're at it... Can I have a Ist_PutC -- a conditional Put ? PutC ( guard_expr, offset, data_temp) If guard_expr is true, then data_temp will be stored at offset in guest state. That would help s390. Florian |
|
From: Julian S. <js...@ac...> - 2012-10-12 08:42:12
|
On Friday, October 12, 2012, Florian Krohm wrote: > Remember the guard expression on the dirty helper? Yes (sigh). > > At first I thought of having conditional loads as just another IRExpr > > member. But that gives potential complexity with register allocation > > if we allow arbitrary expressions for the "backup" value, since if the > > guard is false then we won't want any of the expression to be > > evaluated. > > Huh? I think you want to evaluate the backup expression if the guard is > false. No? My mistake (I wrote that at 2.30am). I meant "if the guard is true". > Just to be clear: in a Conditional_Load stmt both the address_expr and > the backup_temp get evaluated independent of the value of the guard > expression. Right? Yes, that's true. Since we don't have proper if-then-else control flow in the back ends, everything has to be always evaluated. If we did have proper if-then-else then we wouldn't need a separate conditional load stmt since we could make Mux0X lazy (like C's ?-: triop) and use that. But doing that would require major rework of the regalloc and code generation framework, and I'm not up for that much hassle right now. > You probably need to stick in the endianess into the Conditional_Load. Yes. > While we're at it... Can I have a Ist_PutC -- a conditional Put ? > > PutC ( guard_expr, offset, data_temp) > > If guard_expr is true, then data_temp will be stored at offset in guest > state. That would help s390. Hmm, I'd prefer not, unless you absolutely need it. What do you need it for? To represent conditional loads into a register I had planned to do tAddr = ... tGuard = ... tBackup = GET(offs) tRes = LoadC( tGuard, tAddr, tBackup ) PUT(offs) = tRes which probably isn't as bad as it seems if you assume that in-context (in the middle of a sequence of insns, one of which loads the target register before this, and one of which hopefully uses and then overwrites it later), the IR optimiser will remove both the GET (PUT-GET forwarding) and the PUT (redundant PUT removal). To some extent it's a question of "hack it up and then see if IRopt turns it into something reasonable, in common cases". J |
|
From: John R. <jr...@bi...> - 2012-10-12 03:03:29
|
Julian Seward wrote: > > We need to be able to represent conditional loads and stores in IR. Just to remove any possible confusion, some architectures such as MIPS implement LoadLocked (LL) and StoreConditional (SC) as instructions for synthesizing mutual exclusion; and the proposed "conditional store in IR" has nothing to do with LL/SC. -- |
|
From: Julian S. <js...@ac...> - 2012-10-12 08:48:05
|
On Friday, October 12, 2012, John Reiser wrote: > Just to remove any possible confusion, some architectures such as MIPS > implement LoadLocked (LL) and StoreConditional (SC) as instructions > for synthesizing mutual exclusion; and the proposed "conditional store > in IR" has nothing to do with LL/SC. Yes, IR already represents LL/SC directly, else multithread anything wouldn't work correctly on ppc32/64, arm or mips32. I did wonder about the similarity in names when writing the proposal. Maybe it would be better to use the qualified "guarded" rather than "conditional" for the current proposal. J |
|
From: Julian S. <js...@ac...> - 2012-10-12 10:36:11
|
On Friday, October 12, 2012, Jakub Jelinek wrote: > I'd vote for IRStmt_StoreC. Yeah, me too. > Conditional loads, being IRExprs, can be implemented already using the > current primitives (see the #305728 bug, two of the patches implements it > there for AVX/AVX2), by doing a Iex_Mux0X of a load and the value that is > set if condition is false, and doing another Iex_Mux0X inside of the > load's address (the hack I used was to load from stack pointer if the > condition was false). That is an ingenious hack. Never occurred to me to do that. > But surely, having a new expression type for that is cleaner Yes. It should give much better code too. J |
|
From: Florian K. <br...@ac...> - 2012-10-12 13:11:29
|
On 10/12/2012 04:36 AM, Julian Seward wrote: > On Friday, October 12, 2012, Florian Krohm wrote: > >> While we're at it... Can I have a Ist_PutC -- a conditional Put ? >> >> PutC ( guard_expr, offset, data_temp) >> >> If guard_expr is true, then data_temp will be stored at offset in guest >> state. That would help s390. > > Hmm, I'd prefer not, unless you absolutely need it. What do you need > it for? I should have been more clear.. I could need it to implement a conditional assignment to a register with these semantics: r4 = guard_expr ? new_value : r4; Sure, I can (and do) implement that with today's machinery but with a PutC I can have a more efficient implementation. Florian |
|
From: Julian S. <js...@ac...> - 2012-11-05 15:33:49
|
I've been working on an implementation of this on and off (mostly off)
for a a while, using the ARM front end as a driver since it is a convenient
source of conditional loads/stores. It works well enough now to run
Firefox on ARM (--tool=none). Mostly this involved iterating the
representation to something usable and then fixing up the IR pipeline
(including ir_opt) to handle it. The resulting definitions are below.
Next step is to fix up all the tools to handle the changes. Might not
be too easy, since many tools look at loads and stores but assume they
are unconditional.
There are two new IRStmt_ kinds now, IRStmt_StoreG and IRStmt_LoadG.
("store guarded", "load guarded"). Since both require quite a few fields
these are just pointers to auxiliary structures as shown below, in the
style of the existing IRDirty, etc.
J
/* --------------- Guarded loads and stores --------------- */
/* Conditional stores are straightforward. They are the same as
normal stores, with an extra 'guard' field :: Ity_I1 that
determines whether or not the store actually happens. If not,
memory is unmodified.
The semantics of this is that 'addr' and 'data' are fully evaluated
even in the case where 'guard' evaluates to zero (false).
*/
typedef
struct {
IREndness end; /* Endianness of the store */
IRExpr* addr; /* store address */
IRExpr* data; /* value to write */
IRExpr* guard; /* Guarding value */
}
IRStoreG;
/* Conditional loads are a little more complex. 'addr' is the
address, 'guard' is the guarding condition. If the load takes
place, the loaded value is placed in 'dst'. If it does not take
place, 'alt' is copied to 'dst'. However, the loaded value is not
placed directly in 'dst' -- it is first subjected to the conversion
specified by 'cvt'.
For example, imagine doing a conditional 8-bit load, in which the
loaded value is zero extended to 32 bits. Hence:
* 'dst' and 'alt' must have type I32
* 'cvt' must be a unary op which converts I8 to I32. In this
example, it would be ILGop_8Uto32.
There is no explicit indication of the type at which the load is
done, since that is inferrable from the arg type of 'cvt'. Note
that the types of 'alt' and 'dst' and the result type of 'cvt' must
all be the same.
Semantically, 'addr' is evaluated even in the case where 'guard'
evaluates to zero (false), and 'alt' is evaluated even when 'guard'
evaluates to one (true). That is, 'addr' and 'alt' are always
evaluated.
*/
typedef
enum {
ILGop_INVALID=0x1D00,
ILGop_Ident32, /* 32 bit, no conversion */
ILGop_16Uto32, /* 16 bit load, Z-widen to 32 */
ILGop_16Sto32, /* 16 bit load, S-widen to 32 */
ILGop_8Uto32, /* 8 bit load, Z-widen to 32 */
ILGop_8Sto32 /* 8 bit load, S-widen to 32 */
}
IRLoadGOp;
typedef
struct {
IREndness end; /* Endianness of the load */
IRLoadGOp cvt; /* Conversion to apply to the loaded value */
IRTemp dst; /* Destination (LHS) of assignment */
IRExpr* addr; /* Address being loaded from */
IRExpr* alt; /* Value if load is not done. */
IRExpr* guard; /* Guarding value */
}
IRLoadG;
extern void ppIRStoreG ( IRStoreG* sg );
extern void ppIRLoadGOp ( IRLoadGOp cvt );
extern void ppIRLoadG ( IRLoadG* lg );
extern IRStoreG* mkIRStoreG ( IREndness end,
IRExpr* addr, IRExpr* data,
IRExpr* guard );
extern IRLoadG* mkIRLoadG ( IREndness end, IRLoadGOp cvt,
IRTemp dst, IRExpr* addr, IRExpr* alt,
IRExpr* guard );
On Friday, October 12, 2012, Florian Krohm wrote:
> On 10/12/2012 04:36 AM, Julian Seward wrote:
> > On Friday, October 12, 2012, Florian Krohm wrote:
> >> While we're at it... Can I have a Ist_PutC -- a conditional Put ?
> >>
> >> PutC ( guard_expr, offset, data_temp)
> >>
> >> If guard_expr is true, then data_temp will be stored at offset in guest
> >> state. That would help s390.
> >
> > Hmm, I'd prefer not, unless you absolutely need it. What do you need
> > it for?
>
> I should have been more clear.. I could need it to implement a
> conditional assignment to a register with these semantics:
>
> r4 = guard_expr ? new_value : r4;
>
> Sure, I can (and do) implement that with today's machinery but with a
> PutC I can have a more efficient implementation.
>
> Florian
>
> ---------------------------------------------------------------------------
> --- Don't let slow site performance ruin your business. Deploy New Relic
> APM Deploy New Relic app performance management and know exactly
> what is happening inside your Ruby, Python, PHP, Java, and .NET app
> Try New Relic at no cost today and get our sweet Data Nerd shirt too!
> http://p.sf.net/sfu/newrelic-dev2dev
> _______________________________________________
> Valgrind-developers mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-developers
|
|
From: Florian K. <br...@ac...> - 2012-11-07 03:16:48
|
On 11/05/2012 10:31 AM, Julian Seward wrote:
>
> Next step is to fix up all the tools to handle the changes. Might not
> be too easy, since many tools look at loads and stores but assume they
> are unconditional.
Right. That reminds me that the guard on the dirty helper is also only
recognised in memcheck...
>
> /* Conditional loads are a little more complex. 'addr' is the
> address, 'guard' is the guarding condition. If the load takes
> place, the loaded value is placed in 'dst'. If it does not take
> place, 'alt' is copied to 'dst'. However, the loaded value is not
> placed directly in 'dst' -- it is first subjected to the conversion
> specified by 'cvt'.
>
> For example, imagine doing a conditional 8-bit load, in which the
> loaded value is zero extended to 32 bits.
You could do the zero extension after the load, couldn't you? In which
case you wouldn't need the IRLoadGOp.
But representing it the way you propose gives you an advantage in insn
selection. Is that right? Or is there another reason? Just curious.
Florian
> typedef
> enum {
> ILGop_INVALID=0x1D00,
> ILGop_Ident32, /* 32 bit, no conversion */
> ILGop_16Uto32, /* 16 bit load, Z-widen to 32 */
> ILGop_16Sto32, /* 16 bit load, S-widen to 32 */
> ILGop_8Uto32, /* 8 bit load, Z-widen to 32 */
> ILGop_8Sto32 /* 8 bit load, S-widen to 32 */
> }
> IRLoadGOp;
>
> typedef
> struct {
> IREndness end; /* Endianness of the load */
> IRLoadGOp cvt; /* Conversion to apply to the loaded value */
> IRTemp dst; /* Destination (LHS) of assignment */
> IRExpr* addr; /* Address being loaded from */
> IRExpr* alt; /* Value if load is not done. */
> IRExpr* guard; /* Guarding value */
> }
> IRLoadG;
>
|
|
From: Julian S. <js...@ac...> - 2012-11-07 12:17:04
|
On Wednesday, November 07, 2012, Florian Krohm wrote: > You could do the zero extension after the load, couldn't you? In which > case you wouldn't need the IRLoadGOp. > But representing it the way you propose gives you an advantage in insn > selection. Is that right? Or is there another reason? Just curious. The IRLoadGOp was a late addition to the plan, when I came to try it out for real on ARM. What ARM has is (eg) a conditional 8 bit load into a 32 bit register, with zero extension. One possibility would be to do a conditional 8 bit load with an 8-bit fallback value (if the load doesn't happen), but then there would be complicated IR logic to either zero extend the loaded value to 32 bits or merge the top 24 bits of the fallback value back on to the lower 8 bits, so the IR-level register remains unchanged. In the end it seemed simplest to simply combine the load and conversion. I could have used an IROp to indicate the conversion instead of using a new enum, but that seems unclean -- obviously almost all of the IROps one could put it are nonsensical -- and it would also require extending IROp with a 32-bit identity operation. So I did a new enum. Overall the new scheme is quite nice and allows isel (as you observe) to simply re-emit the conditional load directly (on ARM). > > On 11/05/2012 10:31 AM, Julian Seward wrote: > > Next step is to fix up all the tools to handle the changes. Might not > > be too easy, since many tools look at loads and stores but assume they > > are unconditional. > > Right. That reminds me that the guard on the dirty helper is also only > recognised in memcheck... One other thing I have been thinking about for a while, although haven't done anything about, is to relax the type rules for condition expressions (guards) in IRLoadG, IRStoreG, IRMux0X, IRDirty, IRExit. Currently they are require, respectively, I1, I1, I8, I1, I1. This is inconsistent, and worse, it leads to chains of pointless IR conversions, like 32to1(1Sto32(32to1(1Uto32(CmpNE32(x, 0)))) which are surprisingly common, especially in arm-derived IR. What I am contemplating is to change all of them to require a value of an integral type (I1, I8, I16, I32 or I64), and test zero/nonzero, a la C style conditionals. Then all the chains of conversions would disappear. It would require a bit more work in the insn selectors since they'd need to check the type of the expression before generating code for it, but that is no big deal. In an ideal world we could also swap the order of '0' and 'X' clauses in Mux0X, so it reads more like C's ?-: operator. But that's probably too big a change given that the reason is only one of aesthetics. J |
|
From: Florian K. <br...@ac...> - 2012-11-08 04:03:11
|
On 11/07/2012 07:14 AM, Julian Seward wrote: > > On Wednesday, November 07, 2012, Florian Krohm wrote: >> You could do the zero extension after the load, couldn't you? In which >> case you wouldn't need the IRLoadGOp. >> But representing it the way you propose gives you an advantage in insn >> selection. Is that right? Or is there another reason? Just curious. > ...snip ... > > In the end it seemed simplest to simply combine the load and conversion. > I could have used an IROp to indicate the conversion instead of using a > new enum, but that seems unclean -- obviously almost all of the IROps > one could put it are nonsensical -- and it would also require extending > IROp with a 32-bit identity operation. > > So I did a new enum. Oh yes, that's definitely better. > One other thing I have been thinking about for a while, although haven't done > anything about, is to relax the type rules for condition expressions (guards) > in IRLoadG, IRStoreG, IRMux0X, IRDirty, IRExit. Currently they are > require, respectively, I1, I1, I8, I1, I1. This is inconsistent, Yes, I had wondered about the I8 for the Mux0X condition also.. > and worse, > it leads to chains of pointless IR conversions, like > > 32to1(1Sto32(32to1(1Uto32(CmpNE32(x, 0)))) > > which are surprisingly common, especially in arm-derived IR. > > What I am contemplating is to change all of them to require a value of an > integral type (I1, I8, I16, I32 or I64), and test zero/nonzero, a la C > style conditionals. Then all the chains of conversions would disappear. Seconded. In particular as ir_opt probably does not eliminate all of those chains (not sure, just guessing). So if we can get away with it by construction then that is a good thing. > In an ideal world we could also swap the order of '0' and 'X' clauses in > Mux0X, so it reads more like C's ?-: operator. But that's probably too > big a change given that the reason is only one of aesthetics. It's not just aesthetics. I could not deal with that operator being somehow backwards when I did the IR generation for s390. Always had to think about it. So I wrote a wrapper function mkite (make if-then-else) for my sanity. Changing it may not be that much work. Florian |