|
From: Maynard J. <may...@us...> - 2011-04-27 22:40:43
|
Hi, I'm implementing an instruction for a new IBM Power processor where one of the register arguments is used as both input and output. Having no straightforward way of implementing this instruction with the existing emulation for the ppc64 architecture, I attempted to add a new Iop and associated cruft. I've added new Iops a few times before, so I *thought* I knew what I was doing. However, my attempt to support this "input/output type" of operation fails: my "dst" register has the same value it did when it was used as input. Is there a trick I'm missing, or is this concept not supported in VEX? Thanks.

-Maynard
|
|
From: Julian S. <js...@ac...> - 2011-04-27 23:04:06
|
> this "input/output type" of operation fails. My "dst" register has the
> same value it did as when it was used as input. Is there a trick I'm
> missing or is this concept not supported in VEX?

Assuming you're referring to front end stuff, viz, how to translate the
instruction into IR. Euh, the vast majority of x86 and x86_64 instructions
have one of the register arguments as both input and output. So it's a
supported concept. Imagine you have some 2-argument add instruction

   add r2, r5

meaning "r2 += r5". Then you'd generate IR like this (for ppc32)

   PUT(8) = Add32( GET(8), GET(20) )

No new Iops required. Iops seem completely unrelated to this discussion:
they are just forms of math. What you appear to be talking about is
simulation of guest register accesses, and that is done by IR Gets and Puts.

J
|
|
From: Florian K. <br...@ac...> - 2011-04-27 23:21:11
|
On 04/27/2011 06:40 PM, Maynard Johnson wrote:
> My "dst" register has the same value it did as when it was used as input.

This sounds familiar. I think something like this happened to me once, and
the problem was in insn selection where my isel_int_expr_wrk was returning
a register that was modified before. And that's not allowed. Perhaps that
is what's happening. But I'm just guessing.

Florian
|
|
From: Julian S. <js...@ac...> - 2011-04-27 23:29:32
|
On Thursday, April 28, 2011, Florian Krohm wrote:
> On 04/27/2011 06:40 PM, Maynard Johnson wrote:
> > My "dst" register has the same value it did as when it was used as input.
>
> This sounds familiar. I think something like this happened to me once,
> and the problem was in insn selection where my isel_int_expr_wrk was
> returning a register that was modified before. And that's not allowed.
> Perhaps that is what's happening. But I'm just guessing.

Maynard needs to say whether he's talking about front end stuff (_toIR.c)
or back end stuff (_isel.c). I can't guess from the original posting.

J
|
|
From: Maynard J. <may...@us...> - 2011-04-28 00:19:50
|
Julian Seward wrote:
> On Thursday, April 28, 2011, Florian Krohm wrote:
>> On 04/27/2011 06:40 PM, Maynard Johnson wrote:
>>> My "dst" register has the same value it did as when it was used as input.
>>
>> This sounds familiar. I think something like this happened to me once,
>> and the problem was in insn selection where my isel_int_expr_wrk was
>> returning a register that was modified before. And that's not allowed.
>> Perhaps that is what's happening. But I'm just guessing.
>
> Maynard needs to say whether he's talking about front end stuff
> (_toIR.c) or back end stuff (_isel.c). I can't guess from the
> original posting.
>
Here are some details to clarify the question . . .
In the frontend (guest_ppc_toIR.c):
---------------------
case 0x184: case 0x1A4: // xvmaddadp
{
DIP("xvmaddadp v%d,v%d,v%d\n", (UInt)XT, (UInt)XA, (UInt)XB);
putVSReg( XT,
qop( Iop_MAdd64Fx2,
rm,
getVSReg( XT ),
getVSReg( XA ),
getVSReg( XB ) ) );
break;
}
---------------------
The new operation is a "Vector multiply add double precision". All three registers are vector regs, each holding two double precision floating point input values. The arithmetic that's done is (XA * XB) + XT. Special rules apply to both the intermediate and final results such that I can't simply break the vector registers in two and operate on the constituent FPs with Iop_MulF64 and Iop_AddF64. So I defined Iop_MAdd64Fx2 and implemented the backend for this new Iop. So the backend looks like this:
------- host_ppc_isel.c ----------
if (e->tag == Iex_Qop) {
if (e->Iex.Qop.op == Iop_MAdd64Fx2) {
HReg dst = iselVecExpr(env, e->Iex.Qop.arg2);
HReg xA = iselVecExpr(env, e->Iex.Qop.arg3);
HReg xB = iselVecExpr(env, e->Iex.Qop.arg4);
set_FPU_rounding_mode( env, e->Iex.Qop.arg1 );
addInstr(env, PPCInstr_VxQop(Pavfp_MADD, True, dst, xA, xB));
return dst;
}
}
---------------------------------
------- host_ppc_defs.c ----------
case Pin_VxQopFP: {
UInt v_dst = vregNo(i->Pin.VxQopFP.xT);
UInt v_srcL = vregNo(i->Pin.VxQopFP.xA);
UInt v_srcR = vregNo(i->Pin.VxQopFP.xB);
if (i->Pin.VxQopFP.op == Pavfp_MADD) {
if (i->Pin.VxQopFP.d_prec)
p = mkFormVX3( p, 60, v_dst, v_srcL, v_srcR, 97 );
}
goto done;
}
---------------------------------
NOTE: Not shown above, but I also updated the "mapRegs" and "getRegUsage" functions of host_ppc_defs.c.
When my testcase executes the xvmaddadp insn and prints out the XT register, it still holds its input value rather than the expected output value. I know the xvmaddadp insn is being executed, because I ran the testcase under valgrind with '--trace-notbelow' and '--trace-flags=10000001' to show the generated assembly code, and I see the xvmaddadp there.
So I'm obviously missing something, but I just can't see what. Hopefully this is enough detail that someone can point me in the right direction. Thanks in advance for any help!
-Maynard
> J
|
|
From: Florian K. <br...@ac...> - 2011-04-28 02:07:18
|
On 04/27/2011 08:19 PM, Maynard Johnson wrote:
> ------- host_ppc_isel.c ----------
> if (e->tag == Iex_Qop) {
> if (e->Iex.Qop.op == Iop_MAdd64Fx2) {
> HReg dst = iselVecExpr(env, e->Iex.Qop.arg2);
> HReg xA = iselVecExpr(env, e->Iex.Qop.arg3);
> HReg xB = iselVecExpr(env, e->Iex.Qop.arg4);
> set_FPU_rounding_mode( env, e->Iex.Qop.arg1 );
> addInstr(env, PPCInstr_VxQop(Pavfp_MADD, True, dst, xA, xB));
> return dst;
> }
> }
> ---------------------------------
I believe this is the culprit. Rewrite this like so:
HReg xT = iselVecExpr(env, e->Iex.Qop.arg2); /* I presume this is xT */
HReg xA = iselVecExpr(env, e->Iex.Qop.arg3);
HReg xB = iselVecExpr(env, e->Iex.Qop.arg4);
set_FPU_rounding_mode( env, e->Iex.Qop.arg1 );
dst = newVreg.... /* allocate a new Vreg here */
addInstr(env, /* copy contents of xT to dst */);
addInstr(env, PPCInstr_VxQop(Pavfp_MADD, True, dst, xA, xB));
return dst;
That should do the trick.
Florian
|
|
From: Julian S. <js...@ac...> - 2011-04-28 07:31:52
|
> I believe this is the culprit. Rewrite this like so:
>
>    HReg xT = iselVecExpr(env, e->Iex.Qop.arg2); /* I presume this is xT */
>    HReg xA = iselVecExpr(env, e->Iex.Qop.arg3);
>    HReg xB = iselVecExpr(env, e->Iex.Qop.arg4);
>    set_FPU_rounding_mode( env, e->Iex.Qop.arg1 );
>    dst = newVreg.... /* allocate a new Vreg here */
>    addInstr(env, /* copy contents of xT to dst */);
>    addInstr(env, PPCInstr_VxQop(Pavfp_MADD, True, dst, xA, xB));
>    return dst;
>
> That should do the trick.

I agree. As you say, Maynard fell foul of the rule that says that you
cannot (emit code to) modify a register returned by any of the isel*expr
functions. As documented in the comment above iselWordExpr_R.

Maynard, look at how mk_iMOVds_RR is used in that file. Then make a vector
equivalent of it and use that to do the copying. Also, have a look at
isMove in host_ppc_defs.c: that is what the regalloc uses to identify these
move instructions later, when it is trying to get rid of them.

The reason for the rule is very simple. The instruction selectors carry
around a mapping from IRTemp to virtual register, which holds the identity
of the virtual register that holds the value for that IRTemp. This mapping
lives in the ISelEnv that is passed everywhere.

When we come to do instruction selection for an Iex_RdTmp (read of an IR
temporary), no instructions are generated; instead we simply look up the
IR temp in the mapping (via lookupIRTemp) and return the identity of the
associated virtual register. That's simple, but it does mean that any
later modification of the virtual register changes the value of the IR
temporary, which can't happen (it's SSA) and so later uses of the IR
temporary would "see" a different value, which is wrong.

Hence the insistence on copying, plus machinery for removing copies when
the copy marks the end of the live range of the source vreg and the
beginning of the live range of the destination vreg.

Plus .. this makes it way simpler to generate code for 2-address machines
(x86, etc) since we can just indiscriminately copy values and assume the
copies will disappear later. This is much easier than having to consider,
everywhere in isel, whether an emitted instruction is going to overwrite
a value in a register that will later be needed.

J
|
|
From: Maynard J. <may...@us...> - 2011-05-11 16:44:16
Attachments:
vg-P7-xvmaddadp.patch
|
Julian Seward wrote:
>
>> I believe this is the culprit. Rewrite this like so:
[snip]
> Plus .. this makes it way simpler to generate code for 2 address
> machines (x86, etc) since we can just indiscriminately copy values and
> assume the copies will disappear later. This is much easier than
> having to consider, everywhere in isel, whether an emitted instruction
> is going to overwrite a value in a register that will later be needed.

OK, finally getting back to this issue. I *think* I did all the things you
suggested above, Julian, but the returned dst register still holds an
exact copy of what it was originally set up with. Attached below is the
code I added for this xvmaddadp insn. Can you see anything obviously (or
not so obviously) wrong? Thanks in advance for the help!

-Maynard

> J
|
|
From: Julian S. <js...@ac...> - 2011-05-11 23:00:03
|
> OK, finally getting back to this issue. I *think* I did all the things you
> suggested above, Julian, but the returned dst register still holds an
> exact copy of what it was originally set up with. Attached below is the
> code I added for this xvmaddadp insn. Can you see anything obviously (or
> not so obviously) wrong? Thanks in advance for the help!

It looks ok, although one comment: it seems to me that Pav_COPY is
unnecessary. I would have thought Pav_MOV is what you need.

Anyway, since it doesn't do what you want, have you looked at the
generated code, using

  --tool=none --trace-flags=10001110 --trace-notbelow=0 ?

J
|
|
From: Maynard J. <may...@us...> - 2011-05-13 23:09:24
|
On 05/11/2011 5:21 PM, Julian Seward wrote:
>
>> OK, finally getting back to this issue. I *think* I did all the things you
>> suggested above, Julian, but the returned dst register still holds an
>> exact copy of what it was originally set up with. Attached below is the
>> code I added for this xvmaddadp insn. Can you see anything obviously (or
>> not so obviously) wrong? Thanks in advance for the help!
>
> It looks ok, although one comment: it seems to me that Pav_COPY is
> unnecessary. I would have thought Pav_MOV is what you need.
I thought I should define a new value since Pav_MOV is used in the Ist_WrTmp in
iselStmt. Nevertheless, I get the same results using Pav_MOV.
>
> Anyway, since it doesn't do what you want, have you looked at the
> generated code, using --tool=none --trace-flags=10001110 --trace-notbelow=0 ?
Julian,
I collected the trace as you suggested and have been poring over it for several
hours today. I think I've spotted a problem, but don't know how to solve it.
I'd really appreciate some help. First, here's the snippet of application code
(an objdump) containing the xvmaddadp insn:
00000000100006cc <.test_xvmadd>:
test_xvmadd():
/home/mpj/ISA2.06/NEW/valg_svn_05.10.2011-P7_stage2/none/tests/ppc64/test_isa_2_06_part2.c:605
100006cc: 7c 00 42 a6 mfvrsave r0
100006d0: 90 01 ff fc stw r0,-4(r1)
100006d4: 64 00 c0 04 oris r0,r0,49156
100006d8: 7c 00 43 a6 mtvrsave r0
/home/mpj/ISA2.06/NEW/valg_svn_05.10.2011-P7_stage2/none/tests/ppc64/test_isa_2_06_part2.c:606
100006dc: 60 00 00 00 nop
100006e0: 39 22 80 f0 addi r9,r2,-32528
100006e4: 7c 20 4e 19 lxvw4x vs33,r0,r9
100006e8: 38 00 00 10 li r0,16
100006ec: 7d a9 06 19 lxvw4x vs45,r9,r0
100006f0: 38 00 00 20 li r0,32
100006f4: 7c 09 06 19 lxvw4x vs32,r9,r0
100006f8: f0 01 6b 0f xvmaddadp vs32,vs33,vs45
100006fc: 7c 09 07 19 stxvw4x vs32,r9,r0
/home/mpj/ISA2.06/NEW/valg_svn_05.10.2011-P7_stage2/none/tests/ppc64/test_isa_2_06_part2.c:607
10000700: 81 81 ff fc lwz r12,-4(r1)
10000704: 7d 80 43 a6 mtvrsave r12
10000708: 4e 80 00 20 blr
------------------------------------------------
And below is the relevant trace output. Search for "MPJ" and you'll find my
comments. The last comment is at the place where I *think* there's an error.
To keep the length of the pasted trace output relatively short, I truncated it
after the "Instruction selection" section. But the output from the
"Register-allocated code" and "Assembly" sections show more or less the same
thing I see in the "Instruction selection" section. Thanks in advance for any
help you can offer.
==== SB 1602 [tid 1] test_xvmadd(0x100006cc) SBs exec'd 51299 ====
------------------------ Front end ------------------------
0x100006CC: mfvrsave r0
------ IMark(0x100006CC, 4) ------
t0 = GET:I64(0)
PUT(0) = 32Uto64(GET:I32(1328))
0x100006D0: stw r0,-4(r1)
------ IMark(0x100006D0, 4) ------
PUT(1280) = 0x100006D0:I64
t2 = GET:I64(248)
t1 = GET:I64(0)
t3 = Add64(GET:I64(8),0xFFFFFFFFFFFFFFFC:I64)
STbe(t3) = 64to32(t1)
0x100006D4: oris r0,r0,0xC004
------ IMark(0x100006D4, 4) ------
PUT(1280) = 0x100006D4:I64
t4 = GET:I64(0)
t6 = GET:I64(192)
t5 = Or64(t4,0xC0040000:I64)
PUT(0) = t5
0x100006D8: mtvrsave r0
------ IMark(0x100006D8, 4) ------
PUT(1280) = 0x100006D8:I64
t7 = GET:I64(0)
PUT(1328) = 64to32(t7)
0x100006DC: ori r0,r0,0x0
------ IMark(0x100006DC, 4) ------
PUT(1280) = 0x100006DC:I64
t8 = GET:I64(0)
t10 = GET:I64(0)
t9 = Or64(t8,0x0:I64)
PUT(0) = t9
0x100006E0: addi r9,r2,-32528
------ IMark(0x100006E0, 4) ------
PUT(1280) = 0x100006E0:I64
t11 = GET:I64(16)
t12 = GET:I64(128)
t13 = Add64(t11,0xFFFFFFFFFFFF80F0:I64)
PUT(72) = t13
0x100006E4: lxvw4x 33,r0,r9
------ IMark(0x100006E4, 4) ------
PUT(1280) = 0x100006E4:I64
t14 = GET:I64(72)
PUT(784) =
64HLtoV128(32HLto64(LDbe:I32(t14),LDbe:I32(Add64(t14,0x4:I64))),32HLto64(LDbe:I32(Add64(t14,0x8:I64)),LDbe:I32(Add64(t14,0xC:I64))))
0x100006E8: li r0,16
------ IMark(0x100006E8, 4) ------
PUT(1280) = 0x100006E8:I64
t15 = GET:I64(0)
t16 = GET:I64(0)
t17 = 0x10:I64
PUT(0) = t17
0x100006EC: lxvw4x 45,r9,r0
------ IMark(0x100006EC, 4) ------
PUT(1280) = 0x100006EC:I64
t18 = Add64(GET:I64(72),GET:I64(0))
PUT(976) =
64HLtoV128(32HLto64(LDbe:I32(t18),LDbe:I32(Add64(t18,0x4:I64))),32HLto64(LDbe:I32(Add64(t18,0x8:I64)),LDbe:I32(Add64(t18,0xC:I64))))
0x100006F0: li r0,32
------ IMark(0x100006F0, 4) ------
PUT(1280) = 0x100006F0:I64
t19 = GET:I64(0)
t20 = GET:I64(0)
t21 = 0x20:I64
PUT(0) = t21
0x100006F4: lxvw4x 32,r9,r0
------ IMark(0x100006F4, 4) ------
PUT(1280) = 0x100006F4:I64
t22 = Add64(GET:I64(72),GET:I64(0))
PUT(768) =
64HLtoV128(32HLto64(LDbe:I32(t22),LDbe:I32(Add64(t22,0x4:I64))),32HLto64(LDbe:I32(Add64(t22,0x8:I64)),LDbe:I32(Add64(t22,0xC:I64))))
0x100006F8: xvmaddadp v32,v33,v45
------ IMark(0x100006F8, 4) ------
PUT(1280) = 0x100006F8:I64
t24 = GET:I32(1324)
t23 = And32(t24,0x3:I32)
PUT(768) =
MAdd64Fx2(Xor32(t23,And32(Shl32(t23,0x1:I8),0x2:I32)),GET:V128(768),GET:V128(784),GET:V128(976))
0x100006FC: stxvw4x 32,r9,r0
------ IMark(0x100006FC, 4) ------
PUT(1280) = 0x100006FC:I64
t28 = Add64(GET:I64(72),GET:I64(0))
t27 = GET:V128(768)
t29 = V128HIto64(t27)
t30 = V128to64(t27)
STbe(t28) = 64HIto32(t29)
STbe(Add64(t28,0x4:I64)) = 64to32(t29)
STbe(Add64(t28,0x8:I64)) = 64HIto32(t30)
STbe(Add64(t28,0xC:I64)) = 64to32(t30)
0x10000700: lwz r12,-4(r1)
------ IMark(0x10000700, 4) ------
PUT(1280) = 0x10000700:I64
t31 = Add64(GET:I64(8),0xFFFFFFFFFFFFFFFC:I64)
PUT(96) = 32Uto64(LDbe:I32(t31))
0x10000704: mtvrsave r12
------ IMark(0x10000704, 4) ------
PUT(1280) = 0x10000704:I64
t32 = GET:I64(96)
PUT(1328) = 64to32(t32)
0x10000708: blr
------ IMark(0x10000708, 4) ------
PUT(1280) = 0x10000708:I64
t37 = 0xFFFFFFFF:I32
t34 = t37
t38 = 0x1:I32
t35 = t38
t33 = And32(t35,t34)
t36 = And64(GET:I64(1288),0xFFFFFFFFFFFFFFFC:I64)
if (CmpEQ32(t33,0x0:I32)) goto {Boring} 0x1000070C:I64
====== AbiHint(Sub64(GET:I64(8),0x120:I64), 288, t36) ======
goto {Return} t36
GuestBytes 100006CC 64 7C 00 42 A6 90 01 FF FC 64 00 C0 04 7C 00 43 A6 60 00 00
00 39 22 80 F0 7C 20 4E 19 38 00 00 10 7D A9 06 19 38 00 00 20 7C 09 06 19 F0 01
6B 0F 7C 09 07 19 81 81 FF FC 7D 80 43 A6 4E 80 00 20 4A7722D0
------------------------ After tree-building ------------------------
IRSB {
t0:I64 t1:I64 t2:I64 t3:I64 t4:I64 t5:I64 t6:I64 t7:I64
t8:I64 t9:I64 t10:I64 t11:I64 t12:I64 t13:I64 t14:I64 t15:I64
t16:I64 t17:I64 t18:I64 t19:I64 t20:I64 t21:I64 t22:I64 t23:I32
t24:I32 t25:F64 t26:F64 t27:V128 t28:I64 t29:I64 t30:I64 t31:I64
t32:I64 t33:I32 t34:I32 t35:I32 t36:I64 t37:I32 t38:I32 t39:I32
t40:I64 t41:I32 t42:I64 t43:I64 t44:I32 t45:I32 t46:V128 t47:I64
t48:I32 t49:I32 t50:I64 t51:I64 t52:I32 t53:I64 t54:I32 t55:I64
t56:I64 t57:I64 t58:I64 t59:V128 t60:I64 t61:I32 t62:I32 t63:I64
t64:I64 t65:I32 t66:I64 t67:I32 t68:I64 t69:I64 t70:I64 t71:I64
t72:V128 t73:I64 t74:I32 t75:I32 t76:I64 t77:I64 t78:I32 t79:I64
t80:I32 t81:I64 t82:V128 t83:I32 t84:I32 t85:I32 t86:V128 t87:V128
t88:V128 t89:I64 t90:I64 t91:I64 t92:I64 t93:I64 t94:I32 t95:I64
t96:I32 t97:I64 t98:I32 t99:I64 t100:I32 t101:I64 t102:I64
t103:I64
t104:I32 t105:I32 t106:I64 t107:I64 t108:I1 t109:I64 t110:I64
------ IMark(0x100006CC, 4) ------
------ IMark(0x100006D0, 4) ------
t40 = 32Uto64(GET:I32(1328))
PUT(1280) = 0x100006D0:I64
t43 = GET:I64(8)
t42 = Add64(t43,0xFFFFFFFFFFFFFFFC:I64)
STbe(t42) = 64to32(t40)
------ IMark(0x100006D4, 4) ------
------ IMark(0x100006D8, 4) ------
------ IMark(0x100006DC, 4) ------
------ IMark(0x100006E0, 4) ------
t13 = Add64(GET:I64(16),0xFFFFFFFFFFFF80F0:I64)
PUT(72) = t13
------ IMark(0x100006E4, 4) ------
>>> MPJ: First lxvw4x
PUT(1280) = 0x100006E4:I64
t46 =
64HLtoV128(32HLto64(LDbe:I32(t13),LDbe:I32(Add64(t13,0x4:I64))),32HLto64(LDbe:I32(Add64(t13,0x8:I64)),LDbe:I32(Add64(t13,0xC:I64))))
PUT(784) = t46
------ IMark(0x100006E8, 4) ------
------ IMark(0x100006EC, 4) ------
>>> MPJ: Second lxvw4x
PUT(1280) = 0x100006EC:I64
t56 = Add64(t13,0x10:I64)
>>> MPJ: Third lxvw4x embedded
t59 =
64HLtoV128(32HLto64(LDbe:I32(t56),LDbe:I32(Add64(t56,0x4:I64))),32HLto64(LDbe:I32(Add64(t56,0x8:I64)),LDbe:I32(Add64(t56,0xC:I64))))
PUT(976) = t59
------ IMark(0x100006F0, 4) ------
PUT(0) = 0x20:I64
------ IMark(0x100006F4, 4) ------
PUT(1280) = 0x100006F4:I64
t69 = Add64(t13,0x20:I64)
t76 = Add64(t69,0x4:I64)
t79 = Add64(t69,0x8:I64)
t81 = Add64(t69,0xC:I64)
------ IMark(0x100006F8, 4) ------
t23 = And32(GET:I32(1324),0x3:I32)
>>> MPJ: Here's the xvmaddadp . . .
t82 =
MAdd64Fx2(Xor32(t23,And32(Shl32(t23,0x1:I8),0x2:I32)),64HLtoV128(32HLto64(LDbe:I32(t69),LDbe:I32(t76)),32HLto64(LDbe:I32(t79),LDbe:I32(t81))),t46,t59)
PUT(768) = t82
------ IMark(0x100006FC, 4) ------
PUT(1280) = 0x100006FC:I64
t92 = V128HIto64(t82)
t93 = V128to64(t82)
STbe(t69) = 64HIto32(t92)
STbe(t76) = 64to32(t92)
STbe(t79) = 64HIto32(t93)
STbe(t81) = 64to32(t93)
------ IMark(0x10000700, 4) ------
PUT(1280) = 0x10000700:I64
t103 = 32Uto64(LDbe:I32(t42))
PUT(96) = t103
------ IMark(0x10000704, 4) ------
PUT(1328) = 64to32(t103)
------ IMark(0x10000708, 4) ------
PUT(1280) = 0x10000708:I64
t106 = And64(GET:I64(1288),0xFFFFFFFFFFFFFFFC:I64)
====== AbiHint(Sub64(t43,0x120:I64), 288, t106) ======
goto {Return} t106
}
------------------------ Instruction selection ------------------------
mflr %vR111
-- ------ IMark(0x100006CC, 4) ------
-- ------ IMark(0x100006D0, 4) ------
-- t40 = 32Uto64(GET:I32(1328))
lwz %vR113,1328(%r31)
sldi %vR112,%vR113,32
srdi %vR112,%vR112,32
mr %vR40,%vR112
-- PUT(1280) = 0x100006D0:I64
li_word %vR114,0x00000000100006D0
std %vR114,1280(%r31)
-- t43 = GET:I64(8)
ld %vR115,8(%r31)
mr %vR43,%vR115
-- t42 = Add64(t43,0xFFFFFFFFFFFFFFFC:I64)
addi %vR116,%vR43,-4
mr %vR42,%vR116
-- STbe(t42) = 64to32(t40)
stw %vR40,0(%vR42)
-- ------ IMark(0x100006D4, 4) ------
-- ------ IMark(0x100006D8, 4) ------
-- ------ IMark(0x100006DC, 4) ------
-- ------ IMark(0x100006E0, 4) ------
-- t13 = Add64(GET:I64(16),0xFFFFFFFFFFFF80F0:I64)
ld %vR118,16(%r31)
addi %vR117,%vR118,-32528
mr %vR13,%vR117
-- PUT(72) = t13
std %vR13,72(%r31)
-- ------ IMark(0x100006E4, 4) ------
-- PUT(1280) = 0x100006E4:I64
li_word %vR119,0x00000000100006E4
std %vR119,1280(%r31)
-- t46 =
64HLtoV128(32HLto64(LDbe:I32(t13),LDbe:I32(Add64(t13,0x4:I64))),32HLto64(LDbe:I32(Add64(t13,0x8:I64)),LDbe:I32(Add64(t13,0xC:I64))))
lwz %vR120,0(%vR13)
lwz %vR121,4(%vR13)
sldi %vR122,%vR120,32
li_word %vR123,0x00000000FFFFFFFF
and %vR121,%vR121,%vR123
or %vR122,%vR122,%vR121
lwz %vR124,8(%vR13)
lwz %vR125,12(%vR13)
sldi %vR126,%vR124,32
li_word %vR127,0x00000000FFFFFFFF
and %vR125,%vR125,%vR127
or %vR126,%vR126,%vR125
subi %r1,%r1,32
mr %vR129,%r1
addi %vR129,%vR129,16
li_word %vR130,0xFFFFFFFFFFFFFFF0
and %vR129,%vR129,%vR130
std %vR122,0(%vR129)
std %vR126,8(%vR129)
li_word %r30,0x0000000000000000 ; lvx %vV128,%r30,%vR129
addi %r1,%r1,32
vmr %vV46,%vV128
-- PUT(784) = t46
li_word %r30,0x0000000000000310 ; stvx %vV46,%r30,%r31
-- ------ IMark(0x100006E8, 4) ------
-- ------ IMark(0x100006EC, 4) ------
-- PUT(1280) = 0x100006EC:I64
li_word %vR131,0x00000000100006EC
std %vR131,1280(%r31)
-- t56 = Add64(t13,0x10:I64)
addi %vR132,%vR13,16
mr %vR56,%vR132
-- t59 =
64HLtoV128(32HLto64(LDbe:I32(t56),LDbe:I32(Add64(t56,0x4:I64))),32HLto64(LDbe:I32(Add64(t56,0x8:I64)),LDbe:I32(Add64(t56,0xC:I64))))
lwz %vR133,0(%vR56)
lwz %vR134,4(%vR56)
sldi %vR135,%vR133,32
li_word %vR136,0x00000000FFFFFFFF
and %vR134,%vR134,%vR136
or %vR135,%vR135,%vR134
lwz %vR137,8(%vR56)
lwz %vR138,12(%vR56)
sldi %vR139,%vR137,32
li_word %vR140,0x00000000FFFFFFFF
and %vR138,%vR138,%vR140
or %vR139,%vR139,%vR138
subi %r1,%r1,32
mr %vR142,%r1
addi %vR142,%vR142,16
li_word %vR143,0xFFFFFFFFFFFFFFF0
and %vR142,%vR142,%vR143
std %vR135,0(%vR142)
std %vR139,8(%vR142)
li_word %r30,0x0000000000000000 ; lvx %vV141,%r30,%vR142
addi %r1,%r1,32
vmr %vV59,%vV141
-- PUT(976) = t59
li_word %r30,0x00000000000003D0 ; stvx %vV59,%r30,%r31
-- ------ IMark(0x100006F0, 4) ------
-- PUT(0) = 0x20:I64
li_word %vR144,0x0000000000000020
std %vR144,0(%r31)
-- ------ IMark(0x100006F4, 4) ------
-- PUT(1280) = 0x100006F4:I64
li_word %vR145,0x00000000100006F4
std %vR145,1280(%r31)
-- t69 = Add64(t13,0x20:I64)
addi %vR146,%vR13,32
mr %vR69,%vR146
-- t76 = Add64(t69,0x4:I64)
addi %vR147,%vR69,4
mr %vR76,%vR147
-- t79 = Add64(t69,0x8:I64)
addi %vR148,%vR69,8
mr %vR79,%vR148
-- t81 = Add64(t69,0xC:I64)
addi %vR149,%vR69,12
mr %vR81,%vR149
-- ------ IMark(0x100006F8, 4) ------
-- t23 = And32(GET:I32(1324),0x3:I32)
lwz %vR151,1324(%r31)
andi. %vR150,%vR151,3
mr %vR23,%vR150
-- t82 =
MAdd64Fx2(Xor32(t23,And32(Shl32(t23,0x1:I8),0x2:I32)),64HLtoV128(32HLto64(LDbe:I32(t69),LDbe:I32(t76)),32HLto64(LDbe:I32(t79),LDbe:I32(t81))),t46,t59)
lwz %vR152,0(%vR69)
lwz %vR153,0(%vR76)
sldi %vR154,%vR152,32
li_word %vR155,0x00000000FFFFFFFF
and %vR153,%vR153,%vR155
or %vR154,%vR154,%vR153
lwz %vR156,0(%vR79)
lwz %vR157,0(%vR81)
sldi %vR158,%vR156,32
li_word %vR159,0x00000000FFFFFFFF
and %vR157,%vR157,%vR159
or %vR158,%vR158,%vR157
subi %r1,%r1,32
mr %vR161,%r1
addi %vR161,%vR161,16
li_word %vR162,0xFFFFFFFFFFFFFFF0
and %vR161,%vR161,%vR162
std %vR154,0(%vR161)
std %vR158,8(%vR161)
li_word %r30,0x0000000000000000 ; lvx %vV160,%r30,%vR161
addi %r1,%r1,32
vmr %vV163,%vV160
slwi %vR167,%vR23,1
andi. %vR166,%vR167,2
xor %vR165,%vR23,%vR166
slwi %vR169,%vR165,1
xor %vR169,%vR165,%vR169
andi. %vR168,%vR169,3
subi %r1,%r1,16
std %vR168,0(%r1)
lfd %vD170,0(%r1)
addi %r1,%r1,16
mtfsf 0xFF,%vD170
xvmaddadp %vV163,%vV46,%vV59
>>> MPJ: I think the problem is right here. In the line below, we copy virtual register V163 to
>>> virtual register V82; but V163 would still hold the original input data, not the result
>>> value from the xvmaddadp, right?
vmr %vV82,%vV163
-- PUT(768) = t82
li_word %r30,0x0000000000000300 ; stvx %vV82,%r30,%r31
-- ------ IMark(0x100006FC, 4) ------
-- PUT(1280) = 0x100006FC:I64
li_word %vR171,0x00000000100006FC
std %vR171,1280(%r31)
-- t92 = V128HIto64(t82)
subi %r1,%r1,32
mr %vR173,%r1
addi %vR173,%vR173,16
li_word %vR174,0xFFFFFFFFFFFFFFF0
and %vR173,%vR173,%vR174
li_word %r30,0x0000000000000000 ; stvx %vV82,%r30,%vR173
ld %vR172,0(%vR173)
addi %r1,%r1,32
mr %vR92,%vR172
-- t93 = V128to64(t82)
subi %r1,%r1,32
mr %vR176,%r1
addi %vR176,%vR176,16
li_word %vR177,0xFFFFFFFFFFFFFFF0
and %vR176,%vR176,%vR177
li_word %r30,0x0000000000000000 ; stvx %vV82,%r30,%vR176
ld %vR175,8(%vR176)
addi %r1,%r1,32
mr %vR93,%vR175
-- STbe(t69) = 64HIto32(t92)
srdi %vR178,%vR92,32
stw %vR178,0(%vR69)
-- STbe(t76) = 64to32(t92)
stw %vR92,0(%vR76)
-- STbe(t79) = 64HIto32(t93)
srdi %vR179,%vR93,32
stw %vR179,0(%vR79)
-- STbe(t81) = 64to32(t93)
stw %vR93,0(%vR81)
-- ------ IMark(0x10000700, 4) ------
-- PUT(1280) = 0x10000700:I64
li_word %vR180,0x0000000010000700
std %vR180,1280(%r31)
-- t103 = 32Uto64(LDbe:I32(t42))
lwz %vR182,0(%vR42)
sldi %vR181,%vR182,32
srdi %vR181,%vR181,32
mr %vR103,%vR181
-- PUT(96) = t103
std %vR103,96(%r31)
-- ------ IMark(0x10000704, 4) ------
-- PUT(1328) = 64to32(t103)
stw %vR103,1328(%r31)
-- ------ IMark(0x10000708, 4) ------
-- PUT(1280) = 0x10000708:I64
li_word %vR183,0x0000000010000708
std %vR183,1280(%r31)
-- t106 = And64(GET:I64(1288),0xFFFFFFFFFFFFFFFC:I64)
ld %vR185,1288(%r31)
li_word %vR186,0xFFFFFFFFFFFFFFFC
and %vR184,%vR185,%vR186
mr %vR106,%vR184
-- ====== AbiHint(Sub64(t43,0x120:I64), 288, t106) ======
-- goto {Return} t106
---------------------------------------------------------------
-Maynard
>
> J
|
|
From: Julian S. <js...@ac...> - 2011-05-16 10:26:41
|
> -- t82 =
> MAdd64Fx2(
> arg1: Xor32(t23,And32(Shl32(t23,0x1:I8),0x2:I32)),
> arg2: 64HLtoV128(32HLto64(LDbe:I32(t69),LDbe:I32(t76)),
> 32HLto64(LDbe:I32(t79),LDbe:I32(t81))),
> arg3: t46,
> arg4: t59)
> lwz %vR152,0(%vR69)
> lwz %vR153,0(%vR76)
> sldi %vR154,%vR152,32
> li_word %vR155,0x00000000FFFFFFFF
> and %vR153,%vR153,%vR155
> or %vR154,%vR154,%vR153
> lwz %vR156,0(%vR79)
> lwz %vR157,0(%vR81)
> sldi %vR158,%vR156,32
> li_word %vR159,0x00000000FFFFFFFF
> and %vR157,%vR157,%vR159
> or %vR158,%vR158,%vR157
> subi %r1,%r1,32
> mr %vR161,%r1
> addi %vR161,%vR161,16
> li_word %vR162,0xFFFFFFFFFFFFFFF0
> and %vR161,%vR161,%vR162
> std %vR154,0(%vR161)
> std %vR158,8(%vR161)
> li_word %r30,0x0000000000000000 ; lvx %vV160,%r30,%vR161
> addi %r1,%r1,32
> vmr %vV163,%vV160
> slwi %vR167,%vR23,1
> andi. %vR166,%vR167,2
> xor %vR165,%vR23,%vR166
> slwi %vR169,%vR165,1
> xor %vR169,%vR165,%vR169
> andi. %vR168,%vR169,3
> subi %r1,%r1,16
> std %vR168,0(%r1)
> lfd %vD170,0(%r1)
> addi %r1,%r1,16
> mtfsf 0xFF,%vD170
> xvmaddadp %vV163,%vV46,%vV59
>
> >>> MPJ: I think the problem is right here. In the line below, we copy
> >>> virtual register V163 to virtual register V82; but V163 would still
> >>> hold the original input data, not the result value from the xvmaddadp,
> >>> right?
>
> vmr %vV82,%vV163
This all looks OK to me; I have a different theory why it failed (see below).
This last vmr is not the move under discussion; that occurs earlier. Here's
the generated code in pieces, each preceded by the isel line that generates
it:
+ if (e->tag == Iex_Qop) {
+ if (e->Iex.Qop.op == Iop_MAdd64Fx2) {
+ HReg xT = iselVecExpr(env, e->Iex.Qop.arg2);
> lwz %vR152,0(%vR69)
> lwz %vR153,0(%vR76)
> sldi %vR154,%vR152,32
> li_word %vR155,0x00000000FFFFFFFF
> and %vR153,%vR153,%vR155
> or %vR154,%vR154,%vR153
> lwz %vR156,0(%vR79)
> lwz %vR157,0(%vR81)
> sldi %vR158,%vR156,32
> li_word %vR159,0x00000000FFFFFFFF
> and %vR157,%vR157,%vR159
> or %vR158,%vR158,%vR157
> subi %r1,%r1,32
> mr %vR161,%r1
> addi %vR161,%vR161,16
> li_word %vR162,0xFFFFFFFFFFFFFFF0
> and %vR161,%vR161,%vR162
> std %vR154,0(%vR161)
> std %vR158,8(%vR161)
> li_word %r30,0x0000000000000000 ; lvx %vV160,%r30,%vR161
> addi %r1,%r1,32
# arg2 (big horrible expression) is now in %vV160
+ HReg xA = iselVecExpr(env, e->Iex.Qop.arg3);
# doesn't generate any code; arg3 is already in %vV46
+ HReg xB = iselVecExpr(env, e->Iex.Qop.arg4);
# doesn't generate any code; arg4 is already in %vV59
+ HReg dst = newVRegV(env);
+ addInstr(env, mk_vMOV_RR( dst, xT ));
> vmr %vV163,%vV160
# allocate dst = %vV163 and copy %vV160 into it. So it's now
# OK to modify %vV163
+ set_FPU_rounding_mode( env, e->Iex.Qop.arg1 );
# guff to set the rounding mode
> slwi %vR167,%vR23,1
> andi. %vR166,%vR167,2
> xor %vR165,%vR23,%vR166
> slwi %vR169,%vR165,1
> xor %vR169,%vR165,%vR169
> andi. %vR168,%vR169,3
> subi %r1,%r1,16
> std %vR168,0(%r1)
> lfd %vD170,0(%r1)
> addi %r1,%r1,16
> mtfsf 0xFF,%vD170
# and do "%vV163 += %vV46 * %vV59" (vector)
> xvmaddadp %vV163,%vV46,%vV59
# result is now in %vV163
> vmr %vV82,%vV163
# this is some other copy caused by the call to iselVecExpr that called
# this one. It's unrelated to this problem. Where it appears in the
# generated code implies it can't have been generated by
+ addInstr(env, mk_vMOV_RR( dst, xT ));
# since it appears after the xvmaddadp %vV163,%vV46,%vV59, not before.
If I had to guess .. I'd say your getRegUsage entry for xvmaddadp is wrong.
You need to convince the register allocator to put %vV163,%vV46,%vV59 in
different real registers, and if your getRegUsage claims for the insn
are wrong, the register allocator will duly screw the code up. What does
the regalloc'd code around the insn look like? What does the getRegUsage
entry for xvmaddadp look like?
J
|
|
From: Maynard J. <may...@us...> - 2011-05-16 21:16:58
|
Julian Seward wrote:
>
[snip]
>
> If I had to guess .. I'd say your getRegUsage entry for xvmaddadp is wrong.
> You need to convince the register allocator to put %vV163,%vV46,%vV59 in
> different real registers, and if your getRegUsage claims for the insn
> are wrong, the register allocator will duly screw the code up. What does
> the regalloc'd code around the insn look like? What does the getRegUsage
> entry for xvmaddadp look like?
Julian,
The register allocation code dump is as follows:
------------------------ Register-allocated code ------------------------
0 mflr %r4
1 ld %r5,16(%r31)
2 addi %r6,%r5,-32528
3 std %r6,72(%r31)
4 li_word %r5,0x00000000100006F4
5 std %r5,1280(%r31)
6 addi %r5,%r6,16
7 lwz %r7,0(%r5)
8 lwz %r8,4(%r5)
9 sldi %r9,%r7,32
10 li_word %r7,0x00000000FFFFFFFF
11 and %r8,%r8,%r7
12 or %r9,%r9,%r8
13 lwz %r7,8(%r5)
14 lwz %r8,12(%r5)
15 sldi %r5,%r7,32
16 li_word %r7,0x00000000FFFFFFFF
17 and %r8,%r8,%r7
18 or %r5,%r5,%r8
19 subi %r1,%r1,32
20 mr %r7,%r1
21 addi %r7,%r7,16
22 li_word %r8,0xFFFFFFFFFFFFFFF0
23 and %r7,%r7,%r8
24 std %r9,0(%r7)
25 std %r5,8(%r7)
26 li_word %r30,0x0000000000000000 ; lvx %v20,%r30,%r7
27 addi %r1,%r1,32
28 vmr %v21,%v20
29 li_word %r30,0x0000000000000310 ; stvx %v21,%r30,%r31
30 li_word %r5,0x00000000100006FC
31 std %r5,1280(%r31)
32 addi %r5,%r6,32
33 lwz %r7,0(%r5)
34 lwz %r8,4(%r5)
35 sldi %r9,%r7,32
36 li_word %r7,0x00000000FFFFFFFF
37 and %r8,%r8,%r7
38 or %r9,%r9,%r8
39 lwz %r7,8(%r5)
40 lwz %r8,12(%r5)
41 sldi %r5,%r7,32
42 li_word %r7,0x00000000FFFFFFFF
43 and %r8,%r8,%r7
44 or %r5,%r5,%r8
45 subi %r1,%r1,32
46 mr %r7,%r1
47 addi %r7,%r7,16
48 li_word %r8,0xFFFFFFFFFFFFFFF0
49 and %r7,%r7,%r8
50 std %r9,0(%r7)
51 std %r5,8(%r7)
52 li_word %r30,0x0000000000000000 ; lvx %v20,%r30,%r7
53 addi %r1,%r1,32
54 vmr %v22,%v20
55 li_word %r30,0x00000000000003D0 ; stvx %v22,%r30,%r31
56 li_word %r5,0x0000000000000030
57 std %r5,0(%r31)
58 li_word %r5,0x0000000010000704
59 std %r5,1280(%r31)
60 addi %r5,%r6,48
61 addi %r6,%r5,4
62 addi %r7,%r5,8
63 addi %r8,%r5,12
64 lwz %r9,1324(%r31)
65 andi. %r10,%r9,3
66 lwz %r9,0(%r5)
67 lwz %r14,0(%r6)
68 sldi %r15,%r9,32
69 li_word %r9,0x00000000FFFFFFFF
70 and %r14,%r14,%r9
71 or %r15,%r15,%r14
72 lwz %r9,0(%r7)
73 lwz %r14,0(%r8)
74 sldi %r16,%r9,32
75 li_word %r9,0x00000000FFFFFFFF
76 and %r14,%r14,%r9
77 or %r16,%r16,%r14
78 subi %r1,%r1,32
79 mr %r9,%r1
80 addi %r9,%r9,16
81 li_word %r14,0xFFFFFFFFFFFFFFF0
82 and %r9,%r9,%r14
83 std %r15,0(%r9)
84 std %r16,8(%r9)
85 li_word %r30,0x0000000000000000 ; lvx %v20,%r30,%r9
86 addi %r1,%r1,32
87 slwi %r9,%r10,1
88 andi. %r14,%r9,2
89 xor %r9,%r10,%r14
90 slwi %r10,%r9,1
91 xor %r10,%r9,%r10
92 andi. %r9,%r10,3
93 subi %r1,%r1,16
94 std %r9,0(%r1)
95 lfd %fr14,0(%r1)
96 addi %r1,%r1,16
97 mtfsf 0xFF,%fr14
98 xvmaddadp %v20,%v21,%v22
99 vmr %v21,%v20
100 li_word %r30,0x0000000000000300 ; stvx %v21,%r30,%r31
101 li_word %r9,0x000000001000070C
102 std %r9,1280(%r31)
103 subi %r1,%r1,32
104 mr %r9,%r1
105 addi %r9,%r9,16
106 li_word %r10,0xFFFFFFFFFFFFFFF0
107 and %r9,%r9,%r10
108 li_word %r30,0x0000000000000000 ; stvx %v21,%r30,%r9
109 ld %r10,0(%r9)
110 addi %r1,%r1,32
111 subi %r1,%r1,32
112 mr %r9,%r1
113 addi %r9,%r9,16
114 li_word %r14,0xFFFFFFFFFFFFFFF0
115 and %r9,%r9,%r14
116 li_word %r30,0x0000000000000000 ; stvx %v21,%r30,%r9
117 ld %r14,8(%r9)
118 addi %r1,%r1,32
119 srdi %r9,%r10,32
120 stw %r9,0(%r5)
121 stw %r10,0(%r6)
122 srdi %r5,%r14,32
123 stw %r5,0(%r7)
124 stw %r14,0(%r8)
125 li_word %r5,0x000000001000073C
126 std %r5,1280(%r31)
127 ld %r5,8(%r31)
128 lwz %r6,-4(%r5)
129 sldi %r5,%r6,32
130 srdi %r5,%r5,32
131 std %r5,96(%r31)
132 stw %r5,1328(%r31)
133 li_word %r5,0x0000000010000744
134 std %r5,1280(%r31)
135 ld %r5,1288(%r31)
136 li_word %r6,0xFFFFFFFFFFFFFFFC
137 and %r7,%r5,%r6
138 mtlr %r4
139 goto: { mr %r3,%r7 ; blr }
-----------------------------------------------
As for my getRegUsage entry for xvmaddadp, it's shown below with the *entire* patch. I realize now that I had not included all of the new xvmadd-related code in the patch I attached in my earlier note. The xvmadd stuff is part of a larger patch I'm working on, so I had tried pulling out the unrelated stuff -- and pulled out too much. The stuff below is the *complete* set of changes I added for xvmadd support.
Thanks much for the help!
-Maynard
--------------
diff -paurX /home/mpj/diff_fileExclusionFilter ../../vg-tmp/VEX/priv/guest_ppc_toIR.c valgrind-P7-update/VEX/priv/guest_ppc_toIR.c
--- ../../vg-tmp/VEX/priv/guest_ppc_toIR.c 2011-05-11 10:33:11.000000000 -0500
+++ valgrind-P7-update/VEX/priv/guest_ppc_toIR.c 2011-05-11 10:39:11.000000000 -0500
@@ -8385,6 +8385,18 @@ dis_vx_arith ( UInt theInstr, UInt opc2
break;
}
+ case 0x184: // xvmaddadp
+ {
+ DIP("xvmaddadp v%d,v%d,v%d\n", (UInt)XT, (UInt)XA, (UInt)XB);
+ putVSReg( XT,
+ qop( Iop_MAdd64Fx2,
+ rm,
+ getVSReg( XT ),
+ getVSReg( XA ),
+ getVSReg( XB ) ) );
+ break;
+ }
+
default:
vex_printf( "dis_vx_arith(ppc)(opc2)\n" );
return False;
@@ -11589,6 +11601,7 @@ DisResult disInstr_PPC_WRK (
case 0x096: case 0x0F4: // xssqrtdp, xstdivdp
case 0x180: case 0x100: // xvadddp, xvaddsp
case 0x1E0: case 0x160: // xvdivdp, xvdivsp
+ case 0x184: // xvmaddadp
if (dis_vx_arith(theInstr, vsxOpc2)) goto decode_success;
goto decode_failure;
case 0x2B0: case 0x2F0: case 0x2D0: // xscvdpsxds, xscvsxddp, xscvuxddp
diff -paurX /home/mpj/diff_fileExclusionFilter ../../vg-tmp/VEX/priv/host_ppc_defs.c valgrind-P7-update/VEX/priv/host_ppc_defs.c
--- ../../vg-tmp/VEX/priv/host_ppc_defs.c 2011-05-16 15:42:08.000000000 -0500
+++ valgrind-P7-update/VEX/priv/host_ppc_defs.c 2011-05-16 15:44:19.000000000 -0500
@@ -639,7 +639,8 @@ HChar* showPPCAvOp ( PPCAvOp op ) {
/* Unary */
case Pav_MOV: return "vmr"; /* Mov */
-
+ case Pav_COPY: return "copy?!"; /* Internal use reg-reg copy */
+
case Pav_AND: return "vand"; /* Bitwise */
case Pav_OR: return "vor";
case Pav_XOR: return "vxor";
@@ -1204,6 +1205,19 @@ PPCInstr* PPCInstr_AvLdVSCR ( HReg src )
return i;
}
+/* VSX */
+PPCInstr* PPCInstr_VxQop( PPCVxFpOp op, Bool double_prec, HReg XT, HReg XA, HReg XB )
+{
+ PPCInstr* i = LibVEX_Alloc(sizeof(PPCInstr));
+ i->tag = Pin_VxQopFP;
+ i->Pin.VxQopFP.d_prec = double_prec;
+ i->Pin.VxQopFP.op = op;
+ i->Pin.VxQopFP.xT = XT;
+ i->Pin.VxQopFP.xA = XA;
+ i->Pin.VxQopFP.xB = XB;
+ return i;
+}
+
/* Pretty Print instructions */
static void ppLoadImm ( HReg dst, ULong imm, Bool mode64 ) {
@@ -1722,6 +1736,17 @@ void ppPPCInstr ( PPCInstr* i, Bool mode
ppHRegPPC(i->Pin.AvLdVSCR.src);
return;
+ case Pin_VxQopFP:
+ vex_printf( "%s ",
+ showPPCVxFpOp( i->Pin.VxQopFP.op,
+ i->Pin.VxQopFP.d_prec ) );
+ ppHRegPPC(i->Pin.VxQopFP.xT);
+ vex_printf(",");
+ ppHRegPPC(i->Pin.VxQopFP.xA);
+ vex_printf(",");
+ ppHRegPPC(i->Pin.VxQopFP.xB);
+ return;
+
default:
vex_printf("\nppPPCInstr: No such tag(%d)\n", (Int)i->tag);
vpanic("ppPPCInstr");
@@ -1989,6 +2014,11 @@ void getRegUsage_PPCInstr ( HRegUsage* u
case Pin_AvLdVSCR:
addHRegUse(u, HRmRead, i->Pin.AvLdVSCR.src);
return;
+ case Pin_VxQopFP:
+ addHRegUse(u, HRmWrite, i->Pin.VxQopFP.xT);
+ addHRegUse(u, HRmRead, i->Pin.VxQopFP.xA);
+ addHRegUse(u, HRmRead, i->Pin.VxQopFP.xB);
+ return;
default:
ppPPCInstr(i, mode64);
@@ -2185,6 +2215,11 @@ void mapRegs_PPCInstr ( HRegRemap* m, PP
case Pin_AvLdVSCR:
mapReg(m, &i->Pin.AvLdVSCR.src);
return;
+ case Pin_VxQopFP:
+ mapReg(m, &i->Pin.VxQopFP.xT);
+ mapReg(m, &i->Pin.VxQopFP.xA);
+ mapReg(m, &i->Pin.VxQopFP.xB);
+ return;
default:
ppPPCInstr(i, mode64);
@@ -2220,6 +2255,13 @@ Bool isMove_PPCInstr ( PPCInstr* i, HReg
return True;
}
+ if (i->tag == Pin_AvUnary) {
+ if (i->Pin.AvUnary.op != Pav_COPY)
+ return False;
+ *src = i->Pin.AvUnary.src;
+ *dst = i->Pin.AvUnary.dst;
+ return True;
+ }
return False;
}
@@ -2639,6 +2681,21 @@ static UChar* mkFormVX ( UChar* p, UInt
return emit32(p, theInstr);
}
+static UChar* mkFormVX3 ( UChar* p, UInt opc1, UInt r1, UInt r2,
+ UInt r3, UInt opc2 )
+{
+ UInt theInstr;
+ vassert(opc1 == 0x3c);
+ vassert(r1 < 0x20);
+ vassert(r2 < 0x20);
+ vassert(r3 < 0x20);
+ vassert(opc2 < 0x3f3);
+
+ theInstr = ((opc1<<26) | (r1<<21) | (r2<<16) | (r3<<11) | (opc2)<<3);
+ return emit32(p, theInstr);
+}
+
+
static UChar* mkFormVXR ( UChar* p, UInt opc1, UInt r1, UInt r2,
UInt r3, UInt Rc, UInt opc2 )
{
@@ -3603,7 +3660,7 @@ Int emit_PPCInstr ( UChar* buf, Int nbuf
UInt v_src = vregNo(i->Pin.AvUnary.src);
UInt opc2;
switch (i->Pin.AvUnary.op) {
- case Pav_MOV: opc2 = 1156; break; // vor vD,vS,vS
+ case Pav_MOV: case Pav_COPY: opc2 = 1156; break; // vor vD,vS,vS
case Pav_NOT: opc2 = 1284; break; // vnor vD,vS,vS
case Pav_UNPCKH8S: opc2 = 526; break; // vupkhsb
case Pav_UNPCKH16S: opc2 = 590; break; // vupkhsh
@@ -3616,6 +3673,7 @@ Int emit_PPCInstr ( UChar* buf, Int nbuf
}
switch (i->Pin.AvUnary.op) {
case Pav_MOV:
+ case Pav_COPY:
case Pav_NOT:
p = mkFormVX( p, 4, v_dst, v_src, v_src, opc2 );
break;
@@ -3956,6 +4014,17 @@ Int emit_PPCInstr ( UChar* buf, Int nbuf
goto done;
}
+ case Pin_VxQopFP: {
+ UInt v_dst = vregNo(i->Pin.VxQopFP.xT);
+ UInt v_srcL = vregNo(i->Pin.VxQopFP.xA);
+ UInt v_srcR = vregNo(i->Pin.VxQopFP.xB);
+ if (i->Pin.VxQopFP.op == Pavfp_MADD) {
+ if (i->Pin.VxQopFP.d_prec)
+ p = mkFormVX3( p, 60, v_dst, v_srcL, v_srcR, 97 );
+ }
+ goto done;
+ }
+
default:
goto bad;
}
diff -paurX /home/mpj/diff_fileExclusionFilter ../../vg-tmp/VEX/priv/host_ppc_defs.h valgrind-P7-update/VEX/priv/host_ppc_defs.h
--- ../../vg-tmp/VEX/priv/host_ppc_defs.h 2011-05-16 15:51:02.000000000 -0500
+++ valgrind-P7-update/VEX/priv/host_ppc_defs.h 2011-05-11 10:04:50.000000000 -0500
@@ -408,6 +408,9 @@ typedef
/* Merge */
Pav_MRGHI, Pav_MRGLO,
+
+ /* For internal use */
+ Pav_COPY
}
PPCAvOp;
@@ -431,7 +434,12 @@ typedef
}
PPCAvFpOp;
+typedef enum {
+ Pavfp_MADD,
+} PPCVxFpOp;
+
extern HChar* showPPCAvFpOp ( PPCAvFpOp );
+extern HChar* showPPCVxFpOp ( PPCVxFpOp op, Bool double_prec);
/* --------- */
@@ -485,7 +493,9 @@ typedef
Pin_AvShlDbl, /* AV shift-left double by imm */
Pin_AvSplat, /* One elem repeated throughout dst */
Pin_AvLdVSCR, /* mtvscr */
- Pin_AvCMov /* AV conditional move */
+ Pin_AvCMov, /* AV conditional move */
+
+ Pin_VxQopFP /* VSX vector qop floating point */
}
PPCInstrTag;
@@ -782,6 +792,14 @@ typedef
struct {
HReg src;
} AvLdVSCR;
+ /* */
+ struct {
+ PPCVxFpOp op;
+ Bool d_prec;
+ HReg xT;
+ HReg xA;
+ HReg xB;
+ } VxQopFP;
} Pin;
}
PPCInstr;
@@ -840,6 +858,10 @@ extern PPCInstr* PPCInstr_AvSplat ( U
extern PPCInstr* PPCInstr_AvCMov ( PPCCondCode, HReg dst, HReg src );
extern PPCInstr* PPCInstr_AvLdVSCR ( HReg src );
+extern PPCInstr* PPCInstr_VxQop(PPCVxFpOp op, Bool double_prec,
+ HReg XT, HReg XA, HReg XB);
+
+
extern void ppPPCInstr ( PPCInstr*, Bool mode64 );
/* Some functions that insulate the register allocator from details
diff -paurX /home/mpj/diff_fileExclusionFilter ../../vg-tmp/VEX/priv/host_ppc_isel.c valgrind-P7-update/VEX/priv/host_ppc_isel.c
--- ../../vg-tmp/VEX/priv/host_ppc_isel.c 2011-05-11 10:17:09.000000000 -0500
+++ valgrind-P7-update/VEX/priv/host_ppc_isel.c 2011-05-11 10:04:50.000000000 -0500
@@ -430,6 +430,15 @@ static PPCInstr* mk_iMOVds_RR ( HReg r_d
return PPCInstr_Alu(Palu_OR, r_dst, r_src, PPCRH_Reg(r_src));
}
+/* Make an vector reg-reg move. */
+
+static PPCInstr* mk_vMOV_RR ( HReg r_dst, HReg r_src )
+{
+ vassert(hregClass(r_dst) == hregClass(r_src));
+ vassert(hregClass(r_src) == HRcVec128);
+ return PPCInstr_AvUnary(Pav_COPY, r_dst, r_src);
+}
+
/* Advance/retreat %r1 by n. */
static void add_to_sp ( ISelEnv* env, UInt n )
@@ -3833,6 +3842,19 @@ static HReg iselVecExpr_wrk ( ISelEnv* e
} /* switch (e->Iex.Binop.op) */
} /* if (e->tag == Iex_Binop) */
+ if (e->tag == Iex_Qop) {
+ if (e->Iex.Qop.op == Iop_MAdd64Fx2) {
+ HReg xT = iselVecExpr(env, e->Iex.Qop.arg2);
+ HReg xA = iselVecExpr(env, e->Iex.Qop.arg3);
+ HReg xB = iselVecExpr(env, e->Iex.Qop.arg4);
+ HReg dst = newVRegV(env);
+ addInstr(env, mk_vMOV_RR( dst, xT ));
+ set_FPU_rounding_mode( env, e->Iex.Qop.arg1 );
+ addInstr(env, PPCInstr_VxQop(Pavfp_MADD, True, dst, xA, xB));
+ return dst;
+ }
+ }
+
if (e->tag == Iex_Const ) {
vassert(e->Iex.Const.con->tag == Ico_V128);
if (e->Iex.Const.con->Ico.V128 == 0x0000) {
>
> J
|
|
From: Julian S. <js...@ac...> - 2011-05-16 21:55:37
|
> + case Pin_VxQopFP:
> + addHRegUse(u, HRmWrite, i->Pin.VxQopFP.xT);
> + addHRegUse(u, HRmRead, i->Pin.VxQopFP.xA);
> + addHRegUse(u, HRmRead, i->Pin.VxQopFP.xB);
> + return;

This isn't right. You also need to say that the insn reads xT.
If you don't do that, bad things will happen -- the register
allocator will re-use registers too early. If you add this line

   addHRegUse(u, HRmRead, i->Pin.VxQopFP.xT);

does it help?

J |
|
From: Maynard J. <may...@us...> - 2011-05-18 12:44:36
|
Julian Seward wrote:
>
>> + case Pin_VxQopFP:
>> + addHRegUse(u, HRmWrite, i->Pin.VxQopFP.xT);
>> + addHRegUse(u, HRmRead, i->Pin.VxQopFP.xA);
>> + addHRegUse(u, HRmRead, i->Pin.VxQopFP.xB);
>> + return;
>
> This isn't right. You also need to say that the insn reads xT.
> If you don't do that, bad things will happen -- the register
> allocator will re-use registers too early. If you add this line
>    addHRegUse(u, HRmRead, i->Pin.VxQopFP.xT);
> does it help?

Unfortunately no. I hope to find time to take a deep dive today. Did you
spot anything suspicious in the register allocation code?

-Maynard

>
> J |
|
From: Julian S. <js...@ac...> - 2011-05-18 13:18:41
|
> > addHRegUse(u, HRmRead, i->Pin.VxQopFP.xT);
> >
> > does it help?
>
> Unfortunately no. I hope to find time to take a deep dive today. Did you
> spot anything suspicious in the register allocation code?

No, that looked ok. Another thing to triple check is the case in the insn
emitter (emit_PPCInstr, or whatever it's called). I've found (at least for
x86 and amd64 emission) that it's very easy to get these encodings wrong.

J |