|
From: Carl E. L. <ce...@li...> - 2013-07-02 23:59:08
|
I am working on implementing Julian's first proposal for the
Transactional Memory instructions on PPC. Here is my test program:
#include <stdio.h>

int
htm_begin (int r3, int r4)
{
  int ret;
  if (__builtin_tbegin (0))
    {
      ret = r3;
      __builtin_tend (0);
    }
  else
    {
      ret = r4;
    }
  return ret;
}

int main (void)
{
  int ret = htm_begin (10, 20);
  printf ("ret = %d, expected = 10\n", ret);
  return 0;
}
Note for test purposes the value of ret is 10 if you do the tbegin/tend
path and ret is 20 if you don't.
Here is the Power assembly for the htm_begin function:
0000000010001370 <.htm_begin>:
10001370: 7c 00 05 1d tbegin.
10001374: 40 82 00 0c bne 10001380 <.htm_begin+0x10>
10001378: 7c 83 23 78 mr r3,r4
1000137c: 4e 80 00 20 blr
10001380: 7c 00 05 5d tend.
10001384: 4e 80 00 20 blr
On input to the function, r3 is equal to 20. The function return value
is returned in r3 as well. The issue is that when I run the test case
on Valgrind it is executing the tbegin/tend code path copying 10 into r3
thus returning 10 from the function. It should jump to the second blr
at address 10001384 thus not changing r3 from the input.
In my Valgrind code, I convert the tbegin to an unconditional jump. The
target address for the unconditional jump is calculated as tgt =
0x10001380. I then have the following Valgrind code for executing the
unconditional jump:
/* The tbegin is being treated as an unconditional jump to the
* failure handler which presumably will be a new basic block.
*/
dres->whatNext = Dis_ResteerU;
dres->jk_StopHere = Ijk_Boring;
dres->continueAt = tgt;
putGST( PPC_GST_CIA, mkSzImm(ty, tgt) ); /* update the PPC current instruction address register*/
return True;
In my Valgrind code, I set whatNext to Dis_ResteerU since we are taking
an unconditional branch. I set continueAt to the target address
0x10001380. I have tried setting jk_StopHere to Ijk_Call and to
Ijk_Boring. But I don't seem to be able to get Valgrind to do the
unconditional jump to the second blr and thus not execute the
tbegin/tend code. The move is executed setting r3 to 10 as reported by
the print in the test program.
I also get the following messages from Valgrind:
==70522== Conditional jump or move depends on uninitialised value(s)
==70522== at 0x10024F88: write (syscall-template.S:81)
==70522== by 0x1000DDA3: _IO_file_write (fileops.c:1254)
==70522== by 0x1001021B: _IO_file_overflow (fileops.c:860)
==70522== by 0x1000E77B: _IO_file_xsputn (fileops.c:1325)
==70522== by 0x1003D43B: vfprintf (vfprintf.c:1660)
==70522== by 0x10009D47: printf (printf.c:33)
==70522== by 0x10001137: main (in /home/carll/HTM/carll-htm.out)
==70522==
I assume I have something wrong with the dres structure values to get
Valgrind to start fetching and executing a new basic block. Please let
me know if you see what I am missing here to get Valgrind to do the
unconditional branch. Thanks.
Carl Love
|
|
From: Peter B. <be...@vn...> - 2013-07-03 03:06:39
|
On Tue, 2013-07-02 at 16:58 -0700, Carl E. Love wrote:
> I am working on implementing Julian's first proposal for the
> Transactional Memory instructions on PPC. Here is my test program:
[snip]
> Note for test purposes the value of ret is 10 if you do the tbegin/tend
> path and ret is 20 if you don't.

Correct, so if the tbegin is to fail, as it should for Julian's (1)
proposal, we should see 20 being printed.

> Here is the power assembly for the htm_begin functiion:
> 0000000010001370 <.htm_begin>:
> 10001370: 7c 00 05 1d tbegin.
> 10001374: 40 82 00 0c bne 10001380 <.htm_begin+0x10>
> 10001378: 7c 83 23 78 mr r3,r4
> 1000137c: 4e 80 00 20 blr
> 10001380: 7c 00 05 5d tend.
> 10001384: 4e 80 00 20 blr

This code looks correct. Note that the failure handler is the
"mr r3,r4" code.

> On input to the function, r3 is equal to 20.

This is not correct. On entry to htm_begin(), r3 (the first incoming
arg register) should contain 10 and r4 should contain 20. You can
see that by the call to htm_begin:

  int ret = htm_begin (10, 20);

> The function return value
> is returned in r3 as well. The issue is that when I run the test case
> on Valgrind it is executing the tbegin/tend code path copying 10 into r3
> thus returning 10 from the function. It should jump to the second blr
> at address 10001384 thus not changing r3 from the input.

That is not correct. The tend. and the second blr are the "success"
path for the tbegin. The "fail" path is the "mr r3,r4" and the first
blr. Nothing in the code above branches to 0x10001384. The bne branches
to 0x10001380 (the tend.), which is the success path. What is happening,
is that r3 contains 10 on entering the function. Your tbegin. is then
incorrectly branching to the success path and returning the unmodified
r3 value (ie, 10).

> In my Valgrind code, I convert the tbegin to an unconditional jump. The
> target address for the unconditional jump is calculated as tgt =
> 0x10001380.

That is not correct. You are branching to the success path, not the
failure handler. What should happen for a failing tbegin. is that
you set cr0 to 0b0010 (ie, 0x2) and continue on to the following
instruction at CIA+4 (ie, the bne). It is up to the assembler code
that follows the tbegin. to branch to the failure code, not the
tbegin. instruction itself. All you have to do is initialize cr0
to show that the tbegin. failed.

> In my Valgrind code, I set whatNext to Dis_ResteerU since we are taking
> an unconditional branch. I set continueAt to the target address
> 0x10001380. I have tried setting jk_StopHere to Ijk_Call and to
> Ijk_Boring. But I don't seem to be able to get Valgrind to do the
> unconditional jump to the second blr and thus not execute the
> tbegin/tend code. The move is executed setting r3 to 10 as reported by
> the print in the test program.

The problem isn't that it isn't branching. The problem is that it is
branching. You don't want it to. As I mentioned in my earlier email
and again above, all you need to do in the function that implements
tbegin., is to set cr0 to 0b0010 (ie, 0x2) and then continue executing
at the next instruction (CIA+4). The following branch will take care
of getting you to the failure handler code.

You will also eventually need to initialize the texasr register with
some type of made up failure code, but that can wait until you have
this simple test case working.

I'll note that I am heading out on vacation tomorrow (the 3rd) and won't
be back to work until the 15th, so if you need more help from me, it
will have to wait until then. In the mean time, I will try and create
a more complicated test case that checks the texasr for a persistent
failure and either retries the transaction or bails out and send that
to you before I go.

Peter
|
|
From: Peter B. <be...@vn...> - 2013-07-03 03:43:48
|
On Tue, 2013-07-02 at 22:06 -0500, Peter Bergner wrote:
> I'll note that I am heading out on vacation tomorrow (the 3rd) and won't
> be back to work until the 15th, so if you need more help from me, it
> will have to wait until then. In the mean time, I will try and create
> a more complicated test case that checks the texasr for a persistent
> failure and either retries the transaction or bails out and send that
> to you before I go.
Here's a more complicated test case, that tests the texasr for
a persistent failure and has a retry limit of 10 if it is not
persistent. You can get it and compile it on our igoo system.
On success on real power8 hardware, it should return 11 like
below. On failure, it should return 9.
Note that this test case requires you to implement the texasr
SPR and initialize it in the tbegin. valgrind function.
Peter
[bergner@igoo HTM]$ pwd
/home/bergner/HTM
[bergner@igoo HTM]$ cat carll-htm-2.c
extern int printf (const char *, ...);
#include <htmintrin.h>
int
__attribute__ ((noinline))
htm_begin (int r3)
{
  int num_retries = 10;
  while (1)
    {
      if (__builtin_tbegin (0))
        {
          r3++;
          __builtin_tend (0);
          break;
        }
      else
        {
          if (num_retries-- <= 0
              || _TEXASR_FAILURE_PERSISTENT (__builtin_get_texasr ()))
            {
              r3--;
              break;
            }
        }
    }
  return r3;
}
int
main (void)
{
  int ret = htm_begin (10);
  printf ("ret = %d\n", ret);
  return 0;
}
[bergner@igoo HTM]$ /home/bergner/gcc/install/gcc-fsf-mainline-htm/bin/gcc -O2 -mhtm -m64 -static -o carll-htm-2.out carll-htm-2.c
[bergner@igoo HTM]$ scp carll-htm-2.out power8:
carll-htm-2.out 100% 3783KB 3.7MB/s 00:01
[bergner@igoo HTM]$ ssh power8 ./carll-htm-2.out
ret = 11
|
|
From: Carl E. L. <ce...@li...> - 2013-07-03 15:30:01
|
On Tue, 2013-07-02 at 22:06 -0500, Peter Bergner wrote:
> On Tue, 2013-07-02 at 16:58 -0700, Carl E. Love wrote:
> > I am working on implementing Julian's first proposal for the
> > Transactional Memory instructions on PPC. Here is my test program:
> [snip]
> > Note for test purposes the value of ret is 10 if you do the tbegin/tend
> > path and ret is 20 if you don't.
>
> Correct, so if the tbegin is to fail, as it should for Julian's (1)
> proposal, we should see 20 being printed.
>
> > Here is the power assembly for the htm_begin functiion:
> > 0000000010001370 <.htm_begin>:
> > 10001370: 7c 00 05 1d tbegin.
> > 10001374: 40 82 00 0c bne 10001380 <.htm_begin+0x10>
> > 10001378: 7c 83 23 78 mr r3,r4
> > 1000137c: 4e 80 00 20 blr
> > 10001380: 7c 00 05 5d tend.
> > 10001384: 4e 80 00 20 blr
>
> This code looks correct. Note that the failure handler is the
> "mr r3,r4" code.
>
> > On input to the function, r3 is equal to 20.
>
> This is not correct. On entry to htm_begin(), r3 (the first incoming
> arg register) should contain 10 and r4 should contain 20. You can
> see that by the call to htm_begin:
>
> int ret = htm_begin (10, 20);

OK, I have it backwards. In the proposal Julian was talking about
changing the tbegin to a jump instruction and keep thinking along those
lines. So, I thought I knew what the code was doing and didn't go back
to rethink my understanding of the assembly code, which was my real
error. Argh!

> > The function return value
> > is returned in r3 as well. The issue is that when I run the test case
> > on Valgrind it is executing the tbegin/tend code path copying 10 into r3
> > thus returning 10 from the function. It should jump to the second blr
> > at address 10001384 thus not changing r3 from the input.
>
> That is not correct. The tend. and the second blr are the "success"
> path for the tbegin. The "fail" path is the "mr r3,r4" and the first
> blr. Nothing in the code above branches to 0x10001384. The bne branches
> to 0x10001380 (the tend.), which is the success path. What is happening,
> is that r3 contains 10 on entering the function. Your tbegin. is then
> incorrectly branching to the success path and returning the unmodified
> r3 value (ie, 10).
>
> > In my Valgrind code, I convert the tbegin to an unconditional jump. The
> > target address for the unconditional jump is calculated as tgt =
> > 0x10001380.
>
> That is not correct. You are branching to the success path, not the
> failure handler. What should happen for a failing tbegin. is that
> you set cr0 to 0b0010 (ie, 0x2) and continue on to the following
> instruction at CIA+4 (ie, the bne). It is up to the assembler code
> that follows the tbegin. to branch to the failure code, not the
> tbegin. instruction itself. All you have to do is initialize cr0
> to show that the tbegin. failed.

OK, the compiler is generating a branch if not equal on the cr0 value.
So, as you say, I just need to set up the condition bit and let the code
take care of itself.

> > In my Valgrind code, I set whatNext to Dis_ResteerU since we are taking
> > an unconditional branch. I set continueAt to the target address
> > 0x10001380. I have tried setting jk_StopHere to Ijk_Call and to
> > Ijk_Boring. But I don't seem to be able to get Valgrind to do the
> > unconditional jump to the second blr and thus not execute the
> > tbegin/tend code. The move is executed setting r3 to 10 as reported by
> > the print in the test program.
>
> The problem isn't that it isn't branching. The problem is that it is
> branching. You don't want it to. As I mentioned in my earlier email
> and again above, all you need to do in the function that implements
> tbegin., is to set cr0 to 0b0010 (ie, 0x2) and then continue executing
> at the next instruction (CIA+4). The following branch will take care
> of getting you to the failure handler code.
>
> You will also eventually need to initialize the texasr register with
> some type of made up failure code, but that can wait until you have
> this simple test case working.

I have created the three TM registers (TFHAR, TFIAR, TEXASR) and added
support to the mtspr and mfspr instructions to access these registers.
The next step after getting the tbegin and tend working is to start
implementing the needed updates to these registers.

> I'll note that I am heading out on vacation tomorrow (the 3rd) and won't
> be back to work until the 15th, so if you need more help from me, it
> will have to wait until then. In the mean time, I will try and create
> a more complicated test case that checks the texasr for a persistent
> failure and either retries the transaction or bails out and send that
> to you before I go.
>
> Peter

Thanks for setting me straight on the assembly code.
|
|
From: Julian S. <js...@ac...> - 2013-07-03 16:29:01
|
> OK, I have it backwards. In the proposal Julian was talking about
> changing the tbegin to a jump instruction and keep thinking along those
> lines.

Yeah. So (judging by Peter's comments) the confusion arises because on
Power, the failure code follows the tbegin immediately. Whereas I was
talking about the Intel scheme, where the address of the failure code
is an operand (encoded as a relative offset) of the XBEGIN instruction,
hence a jump really is necessary.

Anyway .. sounds like you have a first implementation working, yes?

J
|
|
From: Carl E. L. <ce...@li...> - 2013-07-03 17:34:54
|
On Wed, 2013-07-03 at 18:29 +0200, Julian Seward wrote:
> > OK, I have it backwards. In the proposal Julian was talking about
> > changing the tbegin to a jump instruction and keep thinking along those
> > lines.
>
> Yeah. So (judging by Peter's comments) the confusion arises because on
> Power, the failure code follows the tbegin immediately. Whereas I was
> talking about the Intel scheme, where the address of the failure code
> is in a register that is an operand to the XBEGIN instruction, hence a
> jump really is necessary.
>
> Anyway .. sounds like you have a first implementation working, yes?
>
> J
>
I changed my implementation to change the condition code and then allow
the following branch to jump to the failure path. I got that to work and
verified it took the failure path. I have the underlying support to
update the specific TM reporting registers. Now I need to actually add
the code to do the specific register updates to the tbegin instruction
decoding. That should be fairly easy. I have only done the tbegin and
tend instruction decoding. I will need to add the decoding of other
instructions (suspend, restart, abort) but I think they should all be
no-ops as we will be executing the failure path. Should be fast and easy as
well.
But yes I have a very simplistic first implementation of proposal 1) to
just execute the failure path working on the simple example program I
posted. Peter and I will need to talk a bit more about how the compiler
will be generating the code to make sure we are in sync. But I will do
that when he gets back from vacation. Note, I will be off Thursday and
Friday of this week for the Fourth of July national holiday. Peter said
we will still need to implement your second solution as the first one is
only a partial solution. Not sure why, I need to review his message
again and maybe get clarification as to the reasons.
Carl Love
|
|
From: Julian S. <js...@ac...> - 2013-07-03 18:41:39
|
> Peter said
> we will still need to implement your second solution as the first one is
> only a partial solution. Not sure why, I need to review his message
> again and maybe get clarification as to the reasons.

The reason is (I think) that forcing the program onto the failure path
only works provided a failure path actually exists. In some circumstances
with the power hardware (maybe) and with the s390 h/w (definitely), the
hardware can guarantee that the transaction will never fail, so no
fallback (failure) path needs to be provided.

J
|
|
From: Carl E. L. <ce...@li...> - 2013-07-11 14:33:52
|
Julian:
I have the following patch for the Power PC that implements your first suggested
approach for handling the Transactional Memory instructions. I just wanted to throw it out there for
people to look at and comment on. I am working on implementing your second suggestion. I have tested
this patch with a very simple TM example as given below. The patch causes the execution flow to take
the TM failure path as expected. The compiler is generating a branch if not equal to decide if it
should take the TM path or the failure path. For now, the Valgrind patch just assumes the compiler
will always generate the branch if not equal instruction to take one of the two paths. Furthermore,
it is assumed there will always be a failure path. I need to talk with Peter when he gets back about
the code generated by the compiler to determine if the compiler might generate different code
sequences.
Again, this patch is just for discussion as to how the first suggested TM approach might be implemented
on Power.
Carl Love
-------------------------------------------------------------------------------------------------
Test case:
#include <stdio.h>
int
__attribute__ ((noinline))
htm_begin (int r3, int r4)
{
  int ret;
  if (__builtin_tbegin (0))
    {
      ret = r3;
      __builtin_tend (0);
    }
  else
    {
      ret = r4;
    }
  return ret;
}
int main (void)
{
  int ret;
  ret = htm_begin (10, 20);
  printf ("ret = %d, expected = 10\n", ret);
  return 0;
}
-------------------------------------------------------------------------------------------------
Power PC, add Transactional Memory instruction support
The following Transactional Memory instructions are added:
tbegin., tend., tsr., tcheck., tabortwc.,
tabortdc., tabortwci., tabortdci., tabort.
The patch implements the first proposal by Julian on how to handle the
TM instructions. The assumption is that there is always an error handler
for the tbegin instruction. The tbegin support modifies the condition code
register to set the TM failure thus causing the conditional branch instruction
that follows the tbegin to take the failure path. The other TM instructions
are all treated as no-ops as we shouldn't be executing the transactional
code path.
Signed-off-by: Carl Love <ce...@us...>
---
VEX/priv/guest_ppc_helpers.c | 10 +-
VEX/priv/guest_ppc_toIR.c | 272 ++++++++++++++++++++++++++++++++++++++++++-
VEX/pub/libvex_guest_ppc32.h | 8 +-
VEX/pub/libvex_guest_ppc64.h | 10 +-
4 files changed, 292 insertions(+), 8 deletions(-)
diff --git a/VEX/priv/guest_ppc_helpers.c b/VEX/priv/guest_ppc_helpers.c
index f320149..c326453 100644
--- a/VEX/priv/guest_ppc_helpers.c
+++ b/VEX/priv/guest_ppc_helpers.c
@@ -511,7 +511,9 @@ void LibVEX_GuestPPC32_initialise ( /*OUT*/VexGuestPPC32State* vex_state )
vex_state->guest_IP_AT_SYSCALL = 0;
vex_state->guest_SPRG3_RO = 0;
- vex_state->padding = 0;
+ vex_state->padding1 = 0;
+ vex_state->padding2 = 0;
+ vex_state->padding3 = 0;
}
@@ -676,10 +678,14 @@ void LibVEX_GuestPPC64_initialise ( /*OUT*/VexGuestPPC64State* vex_state )
vex_state->guest_IP_AT_SYSCALL = 0;
vex_state->guest_SPRG3_RO = 0;
+ vex_state->guest_TFHAR = 0xa1b2c3d4;
+ vex_state->guest_TFIAR = 0xf7e6d5c4;
+ vex_state->guest_TEXASR = 0x1f2e3d4c5b6a7988;
- vex_state->padding2 = 0;
+ /* vex_state->padding2 = 0;
vex_state->padding3 = 0;
vex_state->padding4 = 0;
+ */
}
diff --git a/VEX/priv/guest_ppc_toIR.c b/VEX/priv/guest_ppc_toIR.c
index d46512a..7b98535 100644
--- a/VEX/priv/guest_ppc_toIR.c
+++ b/VEX/priv/guest_ppc_toIR.c
@@ -232,7 +232,9 @@ static void* fnptr_to_fnentry( VexAbiInfo* vbi, void* f )
#define OFFB_TILEN offsetofPPCGuestState(guest_TILEN)
#define OFFB_NRADDR offsetofPPCGuestState(guest_NRADDR)
#define OFFB_NRADDR_GPR2 offsetofPPCGuestState(guest_NRADDR_GPR2)
-
+#define OFFB_TFHAR offsetofPPCGuestState(guest_TFHAR)
+#define OFFB_TEXASR offsetofPPCGuestState(guest_TEXASR)
+#define OFFB_TFIAR offsetofPPCGuestState(guest_TFIAR)
/*------------------------------------------------------------*/
/*--- Extract instruction fields --- */
@@ -378,6 +380,9 @@ typedef enum {
PPC_GST_TILEN, // For icbi: length of area to invalidate
PPC_GST_IP_AT_SYSCALL, // the CIA of the most recently executed SC insn
PPC_GST_SPRG3_RO, // SPRG3
+ PPC_GST_TFHAR, // Transactional Failure Handler Address Register
+ PPC_GST_TFIAR, // Transactional Failure Instruction Address Register
+ PPC_GST_TEXASR, // Transactional EXception And Summary Register
PPC_GST_MAX
} PPC_GST;
@@ -1308,6 +1313,12 @@ static IRExpr* getVReg ( UInt archreg )
static void putVReg ( UInt archreg, IRExpr* e )
{
vassert(archreg < 32);
+ if (typeOfIRExpr(irsb->tyenv, e) != Ity_V128) {
+ vex_printf("putVReg type of expr is ");
+ ppIRType(typeOfIRExpr(irsb->tyenv, e) );
+ vex_printf(" not Ity_V128 as expected.\n");
+ }
+
vassert(typeOfIRExpr(irsb->tyenv, e) == Ity_V128);
stmt( IRStmt_Put(vectorGuestRegOffset(archreg), e) );
}
@@ -2530,6 +2541,15 @@ static IRExpr* /* :: Ity_I32/64 */ getGST ( PPC_GST reg )
binop( Iop_Shl32, getXER_CA32(), mkU8(29)),
getXER_BC32()));
+ case PPC_GST_TFHAR:
+ return IRExpr_Get( OFFB_TFHAR, ty );
+
+ case PPC_GST_TEXASR:
+ return IRExpr_Get( OFFB_TEXASR, ty );
+
+ case PPC_GST_TFIAR:
+ return IRExpr_Get( OFFB_TFIAR, ty );
+
default:
vex_printf("getGST(ppc): reg = %u", reg);
vpanic("getGST(ppc)");
@@ -2691,6 +2711,18 @@ static void putGST ( PPC_GST reg, IRExpr* src )
stmt( IRStmt_Put( OFFB_TILEN, src) );
break;
+ case PPC_GST_TEXASR:
+ vassert( ty_src == ty );
+ stmt( IRStmt_Put( OFFB_TEXASR, src ) );
+ break;
+ case PPC_GST_TFIAR:
+ vassert( ty_src == ty );
+ stmt( IRStmt_Put( OFFB_TFIAR, src ) );
+ break;
+ case PPC_GST_TFHAR:
+ vassert( ty_src == ty );
+ stmt( IRStmt_Put( OFFB_TFHAR, src ) );
+ break;
default:
vex_printf("putGST(ppc): reg = %u", reg);
vpanic("putGST(ppc)");
@@ -3007,6 +3039,50 @@ static IRTemp getNegatedResult_32(IRTemp intermediateResult)
}
/*------------------------------------------------------------*/
+/* Transactional memory helpers
+ *
+ *------------------------------------------------------------*/
+
+static unsigned long long generate_TMreason( UInt failure_code,
+ UInt persistant,
+ UInt nest_overflow,
+ UInt tm_exact )
+{
+ unsigned long long tm_err_code =
+ ( (ULong) 0) << (63-6) /* Failure code */
+ | ( (ULong) persistant) << (63-7) /* Failure persistant */
+ | ( (ULong) 0) << (63-8) /* Disallowed */
+ | ( (ULong) nest_overflow) << (63-9) /* Nesting Overflow */
+ | ( (ULong) 0) << (63-10) /* Footprint Overflow */
+ | ( (ULong) 0) << (63-11) /* Self-Induced Conflict */
+ | ( (ULong) 0) << (63-12) /* Non-Transactional Conflict */
+ | ( (ULong) 0) << (63-13) /* Transactional Conflict */
+ | ( (ULong) 0) << (63-14) /* Translation Invalidation Conflict */
+ | ( (ULong) 0) << (63-15) /* Implementation-specific */
+ | ( (ULong) 0) << (63-16) /* Instruction Fetch Conflict */
+ | ( (ULong) 0) << (63-30) /* Reserved */
+ | ( (ULong) 0) << (63-31) /* Abort */
+ | ( (ULong) 0) << (63-32) /* Suspend */
+ | ( (ULong) 0) << (63-33) /* Reserved */
+ | ( (ULong) 0) << (63-35) /* Privilege */
+ | ( (ULong) 0) << (63-36) /* Failure Summary */
+ | ( (ULong) tm_exact) << (63-37) /* TFIAR Exact */
+ | ( (ULong) 0) << (63-38) /* ROT */
+ | ( (ULong) 0) << (63-51) /* Reserved */
+ | ( (ULong) 0) << (63-63); /* Transaction Level */
+
+ return tm_err_code;
+}
+
+static void storeTMfailure( Addr64 err_address, ULong tm_reason,
+ Addr64 handler_address )
+{
+ putGST( PPC_GST_TFIAR, mkU64( err_address ) );
+ putGST( PPC_GST_TEXASR, mkU64( tm_reason ) );
+ putGST( PPC_GST_TFHAR, mkU64( handler_address ) );
+}
+
+/*------------------------------------------------------------*/
/*--- Integer Instruction Translation --- */
/*------------------------------------------------------------*/
@@ -6652,6 +6728,18 @@ static Bool dis_proc_ctl ( VexAbiInfo* vbi, UInt theInstr )
DIP("mfctr r%u\n", rD_addr);
putIReg( rD_addr, getGST( PPC_GST_CTR ) );
break;
+ case 0x80: // 128
+ DIP("mfspr r%u (TFHAR)\n", rD_addr);
+ putIReg( rD_addr, getGST( PPC_GST_TFHAR) );
+ break;
+ case 0x81: // 129
+ DIP("mfspr r%u (TFIAR)\n", rD_addr);
+ putIReg( rD_addr, getGST( PPC_GST_TFIAR) );
+ break;
+ case 0x82: // 130
+ DIP("mfspr r%u (TEXASR)\n", rD_addr);
+ putIReg( rD_addr, getGST( PPC_GST_TEXASR) );
+ break;
case 0x100:
DIP("mfvrsave r%u\n", rD_addr);
putIReg( rD_addr, mkWidenFrom32(ty, getGST( PPC_GST_VRSAVE ),
@@ -6796,7 +6884,18 @@ static Bool dis_proc_ctl ( VexAbiInfo* vbi, UInt theInstr )
DIP("mtvrsave r%u\n", rS_addr);
putGST( PPC_GST_VRSAVE, mkNarrowTo32(ty, mkexpr(rS)) );
break;
-
+ case 0x80: // 128
+ DIP("mtspr r%u (TFHAR)\n", rS_addr);
+ putGST( PPC_GST_TFHAR, mkexpr(rS) );
+ break;
+ case 0x81: // 129
+ DIP("mtspr r%u (TFIAR)\n", rS_addr);
+ putGST( PPC_GST_TFIAR, mkexpr(rS) );
+ break;
+ case 0x82: // 130
+ DIP("mtspr r%u (TEXASR)\n", rS_addr);
+ putGST( PPC_GST_TEXASR, mkexpr(rS) );
+ break;
default:
vex_printf("dis_proc_ctl(ppc)(mtspr,SPR)(%u)\n", SPR);
return False;
@@ -17314,6 +17413,164 @@ static Bool dis_av_fp_convert ( UInt theInstr )
return True;
}
+static Bool dis_transactional_memory ( UInt theInstr, UInt nextInstr,
+ VexAbiInfo* vbi,
+ /*OUT*/DisResult* dres,
+ Bool (*resteerOkFn)(void*,Addr64),
+ void* callback_opaque )
+{
+ UInt opc2 = IFIELD( theInstr, 1, 10 );
+ UInt opc1_next = ifieldOPC(nextInstr);
+
+ switch (opc2) {
+ case 0x28E: { //tbegin.
+ /* The current implementation is to just fail the tbegin and execute
+ * the failure path. The failure path is assumed to be functionally
+ * equivalent to the transactional path with the needed data locking
+ * to ensure correctness. The tend is just a noop and shouldn't
+ * actually get executed. The instruction following the tbegin is
+ * expected to be a branch to the failure path.
+ */
+ UInt R = IFIELD( theInstr, 21, 1 );
+
+ DIP("tbegin. %d\n", R);
+ if (opc1_next == 0x10) { // conditional branch
+ ULong tm_reason;
+ UInt failure_code = 0; /* Forcing failure, will not be due to tabort
+ * or treclaim.
+ */
+ UInt persistant = 1; /* set persistant since we are always failing
+ * the tbegin.
+ */
+ UInt nest_overflow = 1; /* Allowed nesting depth overflow, we use this
+ as the reason for failing the transaction */
+ UInt tm_exact = 1; /* have exact address for failure */
+ UChar flag_AA = ifieldBIT1(nextInstr);
+ UInt BD_u16 = ifieldUIMM16(nextInstr) & 0xFFFFFFFC; /* mask off */
+ Addr64 failure_tgt = 0;
+ IRType ty = mode64 ? Ity_I64 : Ity_I32;
+
+ /* Get the address of the failure handler from the conditional
+ * branch in the next instruction location.
+ */
+ if ( flag_AA )
+ failure_tgt = mkSzAddr( ty, extend_s_16to64( BD_u16 ) );
+ else
+ failure_tgt = mkSzAddr( ty, guest_CIA_curr_instr +
+ (Long)extend_s_16to64( BD_u16 ) );
+
+ /* Set the CR0 field to indicate the tbegin failed. Then let
+ * the code do the branch to the failure path.
+ *
+ * 000 || 0 Transaction initiation successful,
+ * unnested (Transaction state of
+ * Non-transactional prior to tbegin.)
+ * 010 || 0 Transaction initiation successful, nested
+ * (Transaction state of Transactional
+ * prior to tbegin.)
+ * 001 || 0 Transaction initiation unsuccessful,
+ * (Transaction state of Suspended prior
+ * to tbegin.)
+ */
+ if (mode64)
+ /* set_CR0(0x10) clears EQ: bne takes transactional path */
+ /* set_CR0(0) sets EQ (CR0 = 0b0010): failure path */
+ set_CR0(mkU64(0x0000));
+ else
+ set_CR0(mkU32(0x0000));
+
+ tm_reason = generate_TMreason( failure_code, persistant,
+ nest_overflow, tm_exact );
+ storeTMfailure( guest_CIA_curr_instr, tm_reason, failure_tgt );
+ return True;
+
+ } else {
+ vex_printf("dis_transactional_memory(ppc): tbegin not followed by a conditional branch instruction, instruction 0x%x\n", nextInstr);
+ return False;
+ }
+ break;
+ }
+
+ case 0x2AE: { //tend.
+ /* The tend. is just a noop. Do nothing */
+ UInt A = IFIELD( theInstr, 25, 1 );
+
+ DIP("tend. %d\n", A);
+ break;
+ }
+
+ case 0x2EE: { //tsr.
+ /* The tsr. is just a noop. Do nothing */
+ UInt L = IFIELD( theInstr, 21, 1 );
+
+ DIP("tsr. %d\n", L);
+ break;
+ }
+
+ case 0x2CE: { //tcheck.
+ /* The tcheck. is just a noop. Do nothing */
+ UInt BF = IFIELD( theInstr, 25, 1 );
+
+ DIP("tcheck. %d\n", BF);
+ break;
+ }
+
+ case 0x30E: { //tabortwc.
+ /* The tabortwc. is just a noop. Do nothing */
+ UInt TO = IFIELD( theInstr, 25, 1 );
+ UInt RA = IFIELD( theInstr, 16, 5 );
+ UInt RB = IFIELD( theInstr, 11, 5 );
+
+ DIP("tabortwc. %d,%d,%d\n", TO, RA, RB);
+ break;
+ }
+
+ case 0x32E: { //tabortdc.
+ /* The tabortdc. is just a noop. Do nothing */
+ UInt TO = IFIELD( theInstr, 25, 1 );
+ UInt RA = IFIELD( theInstr, 16, 5 );
+ UInt RB = IFIELD( theInstr, 11, 5 );
+
+ DIP("tabortdc. %d,%d,%d\n", TO, RA, RB);
+ break;
+ }
+
+ case 0x34E: { //tabortwci.
+ /* The tabortwci. is just a noop. Do nothing */
+ UInt TO = IFIELD( theInstr, 25, 1 );
+ UInt RA = IFIELD( theInstr, 16, 5 );
+ UInt SI = IFIELD( theInstr, 11, 5 );
+
+ DIP("tabortwci. %d,%d,%d\n", TO, RA, SI);
+ break;
+ }
+
+ case 0x36E: { //tabortdci.
+ /* The tabortdci. is just a noop. Do nothing */
+ UInt TO = IFIELD( theInstr, 25, 1 );
+ UInt RA = IFIELD( theInstr, 16, 5 );
+ UInt SI = IFIELD( theInstr, 11, 5 );
+
+ DIP("tabortdci. %d,%d,%d\n", TO, RA, SI);
+ break;
+ }
+
+ case 0x38E: { //tabort.
+ /* The tabort. is just a noop. Do nothing */
+ UInt RA = IFIELD( theInstr, 16, 5 );
+
+ DIP("tabort. %d\n", RA);
+ break;
+ }
+
+ default:
+ vex_printf("dis_transactional_memory(ppc): unrecognized instruction\n");
+ return False;
+ }
+
+ return True;
+}
+
/* The 0x3C primary opcode (VSX category) uses several different forms of
* extended opcodes:
@@ -18403,6 +18660,17 @@ DisResult disInstr_PPC_WRK (
if (dis_int_logic( theInstr )) goto decode_success;
goto decode_failure;
+ case 0x28E: case 0x2AE: // tbegin., tend.
+ case 0x2EE: case 0x2CE: case 0x30E: // tsr., tcheck., tabortwc.
+ case 0x32E: case 0x34E: case 0x36E: // tabortdc., tabortwci., tabortdci.
+ case 0x38E: // tabort.
+ if (dis_transactional_memory( theInstr,
+ getUIntBigendianly( (UChar*)(&guest_code[delta + 4])),
+ abiinfo, &dres,
+ resteerOkFn, callback_opaque))
+ goto decode_success;
+ goto decode_failure;
+
/* 64bit Integer Logical Instructions */
case 0x3DA: case 0x03A: // extsw, cntlzd
if (!mode64) goto decode_failure;
diff --git a/VEX/pub/libvex_guest_ppc32.h b/VEX/pub/libvex_guest_ppc32.h
index d90b7d3..e07e80d 100644
--- a/VEX/pub/libvex_guest_ppc32.h
+++ b/VEX/pub/libvex_guest_ppc32.h
@@ -238,8 +238,14 @@ typedef
threading on AIX. */
/* 1352 */ UInt guest_SPRG3_RO;
+ /* 1360 */ ULong guest_TFHAR; // Transaction Failure Handler Address Register
+ /* 1368 */ ULong guest_TEXASR; // Transaction EXception And Summary Register
+ /* 1376 */ ULong guest_TFIAR; // Transaction Failure Instruction Address Register
+
/* Padding to make it have an 8-aligned size */
- /* 1356 */ UInt padding;
+ /* 1384 */ UInt padding1;
+ /* 1388 */ UInt padding2;
+ /* 1392 */ UInt padding3;
}
VexGuestPPC32State;
diff --git a/VEX/pub/libvex_guest_ppc64.h b/VEX/pub/libvex_guest_ppc64.h
index 1c9502c..531e92d 100644
--- a/VEX/pub/libvex_guest_ppc64.h
+++ b/VEX/pub/libvex_guest_ppc64.h
@@ -279,11 +279,15 @@ typedef
threading on AIX. */
/* 1648 */ ULong guest_SPRG3_RO;
+ /* 1656 */ ULong guest_TFHAR; // Transaction Failure Handler Address Register
+ /* 1664 */ ULong guest_TEXASR; // Transaction EXception And Summary Register
+ /* 1672 */ ULong guest_TFIAR; // Transaction Failure Instruction Address Register
+
/* offsets in comments are wrong ..*/
/* Padding to make it have an 16-aligned size */
- /* 1656 */ ULong padding2;
- /* 16XX */ ULong padding3;
- /* 16XX */ ULong padding4;
+ /* 1656 ULong padding2; */
+ /* 16XX ULong padding3; */
+ /* 16XX ULong padding4; */
}
VexGuestPPC64State;
--
1.7.12.rc1.22.gbfbf4d4
|
|
From: Roland M. <rol...@nr...> - 2013-07-11 15:08:20
|
On Thu, Jul 11, 2013 at 4:33 PM, Carl E. Love <ce...@li...> wrote:
> I have the following patch for the Power PC that implements your first suggested
> approach for handling the Transactional Memory instructions. I just wanted to throw it out there for
> people to look at and comment on. I am working on implementing your second suggestion. I have tested
> this patch with a very simple TM example as given below. The patch causes the execution flow to take
> the TM failure path as expected. The compiler is generating a branch if not equal to decide if it
> should take the TM path or the failure path. For now, the Valgrind patch just assumes the compiler
> will always generate the branch if not equal instruction to take one of the two paths. Furthermore,
> it is assumed there will always be a failure path. I need to talk with Peter when he gets back about
> the code generated by the compiler to determine if the compiler might generate different code
> sequences.

Erm... you don't only have to deal with compiler-generated code... I
expect that some userland code (e.g. futex or similar userland
mutex/lock/barrier code) will make use of this using hand-crafted
assembler...

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) rol...@nr...
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
|
|
From: Peter B. <be...@vn...> - 2013-07-15 04:51:01
|
On Thu, 2013-07-11 at 07:33 -0700, Carl E. Love wrote:
> The patch causes the execution flow to take
> the TM failure path as expected. The compiler is generating a branch if not equal to decide if it
> should take the TM path or the failure path. For now, the Valgrind patch just assumes the compiler
> will always generate the branch if not equal instruction to take one of the two paths. Furthermore,
> it is assumed there will always be a failure path. I need to talk with Peter when he gets back about
> the code generated by the compiler to determine if the compiler might generate different code
> sequences.
The compiler is free to and actually does generate either a beq or a bne
depending on the circumstances. Specifically, the code it can generate
can look like:
tbegin.
...
beq <failure handler>
// fall-through success handler
or
tbegin.
....
bne <success handler>
// fall-through failure handler
Note that the "...." above denotes possible instructions that
might be placed between the tbegin. and the conditional branch,
so you cannot assume the conditional branch immediately follows
the tbegin. Luckily, you don't need to look at the branch at
all. You are only looking at it to compute the address of
the failure handler, but that is not correct. The address of
the failure handler is stored in the TFHAR register and the
tbegin. initializes it to CIA+4 (ie, the next instruction).
> + case 0x80: // 128
> + DIP("mfspr r%u (TFHAR)\n", rD_addr);
> + putIReg( rD_addr, getGST( PPC_GST_TFHAR) );
> + break;
> + case 0x81: // 129
> + DIP("mfspr r%u (TFIAR)\n", rD_addr);
> + putIReg( rD_addr, getGST( PPC_GST_TFIAR) );
> + break;
> + case 0x82: // 130
> + DIP("mfspr r%u (TEXASR)\n", rD_addr);
> + putIReg( rD_addr, getGST( PPC_GST_TEXASR) );
> + break;
Note that the texasr is a 64-bit register in both 32-bit
and 64-bit modes. In 32-bit mode, "mfspr rX,TEXASR"
should just place the lower 32 bits of the texasr into
rX. Is that what this code ends up doing?
You are also missing "mfspr rX,TEXASRU", which is
used by 32-bit code to get the upper 32 bits of the
texasr into rX.
> + case 0x80: // 128
> + DIP("mtspr r%u (TFHAR)\n", rS_addr);
> + putGST( PPC_GST_TFHAR, mkexpr(rS) );
> + break;
> + case 0x81: // 129
> + DIP("mtspr r%u (TFIAR)\n", rS_addr);
> + putGST( PPC_GST_TFIAR, mkexpr(rS) );
> + break;
> + case 0x82: // 130
> + DIP("mtspr r%u (TEXASR)\n", rS_addr);
> + putGST( PPC_GST_TEXASR, mkexpr(rS) );
> + break;
Ditto here.
> + DIP("tbegin. %d\n", R);
> + if (opc1_next == 0x10) { // conditional branch
[snip]
As mentioned above, there is no need to look at the following
branch, so your "if (opc1_next == 0x10)" test can be removed.
Just unconditionally execute the then clause code and remove
the error code in the else clause.
> + /* Get the address of the failure handler from the conditional
> + * branch in the next instruction location.
> + */
> + if ( flag_AA )
> + failure_tgt = mkSzAddr( ty, extend_s_16to64( BD_u16 ) );
> + else
> + failure_tgt = mkSzAddr( ty, guest_CIA_curr_instr +
> + (Long)extend_s_16to64( BD_u16 ) );
To be pedantic, the address of the failure handler is equal to
the address that is in the TFHAR register, not the address from
the conditional branch. That is especially true given that the
compiler is free to generate either:
tbegin.
...
beq <failure handler>
// fall-through success handler
or
tbegin.
....
bne <success handler>
// fall-through failure handler
Remember that tbegin. initializes the TFHAR to CIA+4, so you
just have to set failure_tgt to the address of nextInstr.
> + /* Set the CR0 field to indicate the tbegin failed. Then let
> + * the code do the branch to the failure path.
> + *
> + * 000 || 0 Transaction initiation successful,
> + * unnested (Transaction state of
> + * Non-transactional prior to tbegin.)
> + * 010 || 0 Transaction initiation successful, nested
> + * (Transaction state of Transactional
> + * prior to tbegin.)
> + * 001 || 0 Transaction initiation unsuccessful,
> + * (Transaction state of Suspended prior
> + * to tbegin.)
> + */
> + if (mode64)
> + /* 0x0010 takes transactional path */
> + /* 0x0000 takes the failure path */
> + set_CR0(mkU64(0x0000));
> + else
> + set_CR0(mkU32(0x0000));
Your comment is correct, but you are incorrectly clearing cr0,
which signifies the tbegin. succeeded. You need to initialize
it to 0x2.
Stylistically, using "0x0000" looks like you're trying to stuff
4 4-bit nibbles into cr0, but a CR field like cr0 only holds 4 bits
(i.e., 1 nibble). You could use 0x2 or 0b0010, which both look more
like just 1 nibble of data. Then again, I don't know the Valgrind
code formatting rules, and maybe that is how things are written?
If so, just ignore this comment of mine.
Peter
|
|
From: Carl E. L. <ce...@li...> - 2013-07-15 16:17:02
|
On Sun, 2013-07-14 at 23:50 -0500, Peter Bergner wrote:
> On Thu, 2013-07-11 at 07:33 -0700, Carl E. Love wrote:
> > The patch causes the execution flow to take
> > the TM failure path as expected. The compiler is generating a branch if not equal to decide if it
> > should take the TM path or the failure path. For now, the Valgrind patch just assumes the compiler
> > will always generate the branch if not equal instruction to take one of the two paths. Furthermore,
> > it is assumed there will always be a failure path. I need to talk with Peter when he gets back about
> > the code generated by the compiler to determine if the compiler might generate different code
> > sequences.
>
Remember, this patch was for the proposal to just replace the tbegin
with a branch to execute the failure path for the TM. The tbegin and
tend instructions would not actually get executed. Thus we would need
to go find the address for the failure handler and can not get it from
the TM registers. So, your comments below really show the issues with
trying to make this work, i.e. we can't rely on the branch being the
next instruction, and we have to handle the compiler generating code
using either beq or bne instructions. The patch is just a proof-of-concept
patch for 64-bit mode only; I didn't worry about the additional
complexity of handling 32-bit mode as well. But this gives us an idea
of what the issues with the first proposal are.
I will keep working on the second proposal from Julian where we do let
the CPU execute the TM instructions then pull the needed failure path
address from the TM register.
> The compiler is free to and actually does generate either a beq or a bne
> depending on the circumstances. Specifically, the code it can generate
> can look like:
>
> tbegin.
> ...
> beq <failure handler>
> // fall-through success handler
>
> or
>
> tbegin.
> ....
> bne <success handler>
> // fall-through failure handler
>
> Note that the "...." above denotes possible instructions that
> might be placed between the tbegin. and the conditional branch,
> so you cannot assume the conditional branch immediately follows
> the tbegin. Luckily, you don't need to look at the branch at
> all. You are only looking at it to compute the address of
> the failure handler, but that is not correct. The address of
> the failure handler is stored in the TFHAR register and the
> tbegin. initializes it to CIA+4 (ie, the next instruction).
>
>
> > + case 0x80: // 128
> > + DIP("mfspr r%u (TFHAR)\n", rD_addr);
> > + putIReg( rD_addr, getGST( PPC_GST_TFHAR) );
> > + break;
> > + case 0x81: // 129
> > + DIP("mfspr r%u (TFIAR)\n", rD_addr);
> > + putIReg( rD_addr, getGST( PPC_GST_TFIAR) );
> > + break;
> > + case 0x82: // 130
> > + DIP("mfspr r%u (TEXASR)\n", rD_addr);
> > + putIReg( rD_addr, getGST( PPC_GST_TEXASR) );
> > + break;
>
> Note that the texasr is a 64-bit register in both 32-bit
> and 64-bit modes. In 32-bit mode, "mfspr rX,TEXASR"
> should just place the lower 32 bits of the texasr into
> rX. Is that what this code ends up doing?
>
> You are also missing "mfspr rX,TEXASRU", which is
> used by 32-bit code to get the upper 32 bits of the
> texasr into rX.
>
>
>
> > + case 0x80: // 128
> > + DIP("mtspr r%u (TFHAR)\n", rS_addr);
> > + putGST( PPC_GST_TFHAR, mkexpr(rS) );
> > + break;
> > + case 0x81: // 129
> > + DIP("mtspr r%u (TFIAR)\n", rS_addr);
> > + putGST( PPC_GST_TFIAR, mkexpr(rS) );
> > + break;
> > + case 0x82: // 130
> > + DIP("mtspr r%u (TEXASR)\n", rS_addr);
> > + putGST( PPC_GST_TEXASR, mkexpr(rS) );
> > + break;
>
> Ditto here.
>
>
>
> > + DIP("tbegin. %d\n", R);
> > + if (opc1_next == 0x10) { // conditional branch
> [snip]
>
> As mentioned above, there is no need to look at the following
> branch, so your "if (opc1_next == 0x10)" test can be removed.
> Just unconditionally execute the then clause code and remove
> the error code in the else clause.
>
>
> > + /* Get the address of the failure handler from the conditional
> > + * branch in the next instruction location.
> > + */
> > + if ( flag_AA )
> > + failure_tgt = mkSzAddr( ty, extend_s_16to64( BD_u16 ) );
> > + else
> > + failure_tgt = mkSzAddr( ty, guest_CIA_curr_instr +
> > + (Long)extend_s_16to64( BD_u16 ) );
>
> To be pedantic, the address of the failure handler is equal to
> the address that is in the TFHAR register, not the address from
> the conditional branch. That is especially true given that the
> compiler is free to generate either:
>
> tbegin.
> ...
> beq <failure handler>
> // fall-through success handler
>
> or
>
> tbegin.
> ....
> bne <success handler>
> // fall-through failure handler
>
> Remember that tbegin. initializes the TFHAR to CIA+4, so you
> just have to set failure_tgt to the address of nextInstr.
If we execute the tbegin, which is not done according to proposal 1.
>
>
>
> > + /* Set the CR0 field to indicate the tbegin failed. Then let
> > + * the code do the branch to the failure path.
> > + *
> > + * 000 || 0 Transaction initiation successful,
> > + * unnested (Transaction state of
> > + * Non-transactional prior to tbegin.)
> > + * 010 || 0 Transaction initiation successful, nested
> > + * (Transaction state of Transactional
> > + * prior to tbegin.)
> > + * 001 || 0 Transaction initiation unsuccessful,
> > + * (Transaction state of Suspended prior
> > + * to tbegin.)
> > + */
> > + if (mode64)
> > + /* 0x0010 takes transactional path */
> > + /* 0x0000 takes the failure path */
> > + set_CR0(mkU64(0x0000));
> > + else
> > + set_CR0(mkU32(0x0000));
>
> Your comment is correct, but you are incorrectly clearing cr0,
> which signifies the tbegin. succeeded. You need to initialize
> it to 0x2.
Well we are trying to make it execute the failure path, that was the
first proposal, so yes we do need to clear it.
>
> Stylistically, using "0x0000" looks like you're trying to stuff
> 4 4-bit nibbles into cr0, but a cr register can only hold 4-bits
> (ie, 1 nibble). You could use 0x2 or 0b0010, which both look more
> like just 1 nibble of data. Then again, I don't know the valgrind
> code formatting rules and maybe that is how things are written?
> If so, just ignore this comment of mine.
OK, will take the stylistic comments. :-)
>
>
> Peter
>
>
|
|
From: Peter B. <be...@vn...> - 2013-07-15 17:03:49
|
On Mon, 2013-07-15 at 09:16 -0700, Carl E. Love wrote:
> Remember, this patch was for the proposal to just replace the tbegin
> with a branch to execute the failure path for the TM. The tbegin and
> tend instructions would not actually get executed. Thus we would need
> to go find the address for the failure handler and can not get it from
> the TM registers.

For Julian's proposal (1), I do not think we should replace the tbegin.
with a branch. Instead, we should implement the tbegin. instruction,
but in a way that it always returns failure. Since the hw tbegin.
initializes TFHAR to CIA+4, you don't need to branch anywhere: just set
cr0 to 0x2, initialize the HTM SPRs like you are currently doing, and
then continue on to the next instruction. Nothing more is needed.

> > Remember that tbegin. initializes the TFHAR to CIA+4, so you
> > just have to set failure_tgt to the address of nextInstr.
>
> If we execute the tbegin, which is not done according to proposal 1.

Julian's proposal (1), at a high level, is just to make the transaction
begin instruction fail so that we always execute the failure path. That
doesn't mean we have to replace it with a branch. As I said above, and
as I thought was clear from some of my earlier posts, we should
implement a simple tbegin. instruction and execute it.

> > Your comment is correct, but you are incorrectly clearing cr0,
> > which signifies the tbegin. succeeded. You need to initialize
> > it to 0x2.
>
> Well we are trying to make it execute the failure path, that was the
> first proposal, so yes we do need to clear it.

No. To execute the failure path, you need to set cr0 to 0x2 to signify
a transaction begin failure. If clearing cr0 makes you execute the
failure path, then you have a bug somewhere else that you need to track
down. Probably it is due to you (incorrectly) grabbing the "failure
address" from the branch and the compiler having emitted a
"bne <success_path>" instead of a "beq <failure_path>". So what you
think is the address of the failure handler is really the address of
the success handler. In that case, clearing cr0 would make you execute
the failure path, but that is just two bugs causing you to accidentally
do the right thing.

I will say this one more time so we're all on the same page. For
Julian's proposal (1), we (Power) should implement and execute a
tbegin. instruction. It should:

 1) Set cr0 to 0x2
 2) Initialize TFHAR to CIA+4
 3) Initialize TEXASR
 4) Initialize TFIAR (probably to CIA, i.e., the address of the tbegin.)
 5) Continue executing at the next instruction.

There really isn't anything more it needs to do.

Peter
|