|
From: Carl E. L. <ce...@li...> - 2013-07-19 22:03:29
|
On Thu, 2013-06-20 at 11:27 +0200, Julian Seward wrote:
<snip>
> We can't hope to emulate the conflict checking done by the real CPU
> in any sane (performance or complexity) way. So I see two remaining
> options:
>
> (1) translate "XBEGIN fail-addr" as "goto fail-addr"; that is: push
> simulated execution directly onto the failure path. This is simple
> but will have poor performance, if (as is likely) the failure path
> uses normal locking and is not tuned for speed.
>
> (2) translate "XBEGIN fail-addr" by handing it through to the real CPU,
> in the same kind of way we handle CPUID, RDTSC, etc. Of course we will
> have to put our own failure-handler address so we don't lose control
> of execution if the transaction is aborted, but that's not difficult.
>
> ---
>
> (1) is simple but likely to work. (2) is probably preferable but I don't
> know if there will be hidden problems in it.
>
> This also assumes that the s390 and POWER instructions can be mapped to
> this same structure: TRANSACTION-START(fail-addr) and TRANSACTION-END.
> That's probably the first thing that we should investigate.

I have implemented the second proposal for Power. I looked at the CPUID
implementation some.

On Power, the tbegin instruction sets the three TM registers (TEXASR,
TFIAR, TFHAR) and the condition code field CR0. The condition code value
is used by a branch instruction following the tbegin to determine whether
the transaction path or the failure path is to be taken. The TFHAR
register contains the address of the transaction failure handler code.
The TFIAR contains the address at which a transaction failed. The TEXASR
is a 64-bit register that contains the reasons a transaction failed.

In my implementation, the Power Valgrind support for the tbegin
instruction sets the TFIAR and TFHAR registers to CIA and CIA+4
respectively, where CIA is the Current Instruction Address.

The Power tbegin instruction does not have any registers associated with
it. However, for my Valgrind support I implemented it as an instruction
with a 128-bit destination register. This destination register contains
the TEXASR register in the upper 64 bits and the condition code register
in the lower 32 bits. These two values are extracted from the destination
register and copied into the Power guest state registers. Valgrind issues
the tbegin instruction, then issues the instructions needed to return the
contents of the TEXASR and condition code registers. This implementation
results in the CPU executing the tbegin instruction from the guest
program, followed by some number of instructions for the Valgrind code,
with the occasional guest program instruction mixed in.

From my very preliminary tests on a very simple test case, given below,
the tbegin fails and the failure handler is executed. The value from the
TEXASR register is 0x120000018000001, which according to my decoding of
the bits indicates: persistent failure, footprint overflow, privilege
state, failure summary is complete, transaction level 1. The key thing
here is the "footprint overflow", which is described in the ISA as "an
attempt to perform a storage access in Transactional state which exceeds
the capacity for tracking transactional accesses". Basically, we can't
track all of the guest program instructions between the tbegin and tend
with all of the Valgrind instructions mixed in. I was concerned that this
might happen, as I suspect the number of loads/stores that can be tracked
in the HW must be somewhat limited. The bottom line is that this
implementation for Power does not seem to be viable.

On Power, you can suspend and resume a transaction. I am not sure of the
details of this, but the thought did occur to resume the transaction only
when executing guest program instructions and then suspend again. It
means having to wrap each guest program instruction issue with the resume
and suspend instructions, which would only be executed when executing
guest instructions on a TM code path. It seems like a rather messy thing
to do, with possibly a fair bit of overhead for the rare case of a TM
code path. I don't think the approach is viable on all architectures
even if it did work on Power, which I am not sure about. I don't think
this would be a good approach, just a thought.

The bottom line is that the second approach doesn't appear to be viable
on Power unless we can somehow change the implementation that I did. It
seems like Valgrind really needs to have the TM support in software.
Specifically, Valgrind will have to track the loads/stores between a
tbegin and tend and then try to emulate the correct behavior. This
approach has performance and accuracy questions of its own.

Below is the test program; my preliminary patch to implement the second
approach for Power can be found in
https://bugs.kde.org/show_bug.cgi?id=322593

Let me know if you have thoughts or suggestions on how better to
implement the support on Power or in general. Thanks.

Carl Love

---------------------------------------------------------------

#include <stdio.h>

int __attribute__ ((noinline)) htm_begin (int r3, int r4)
{
  int ret;
  unsigned long long texasr = 0x5678ULL;

  if (__builtin_tbegin (0))
    {
      ret = r3;
      __builtin_tend (0);
    }
  else
    {
      texasr = __builtin_get_texasr ();
      printf ("failure path TEXASR = 0x%llx\n", texasr);
      printf ("failure path\n");
      ret = r4;
    }
  return ret;
}
|
From: Josef W. <Jos...@gm...> - 2013-07-19 23:19:32
|
On 20.07.2013 00:03, Carl E. Love wrote:
> I have implemented the second proposal for Power.
> ...
> decoding of the bits indicate: persistent failure, footprint overflow,
> privilege state, failure summary is complete, transaction level 1. The
> key thing here is the "footprint overflow" which is described in the ISA
> as "an attempt to perform a storage access in Transactional state which
> exceeds the capacity for tracking transactional accesses". Basically,
> we can't track all of the guest program instructions between the tbegin
> and tend with all of the valgrind instructions mixed in.

Hmm. The transaction in your code is quite small. Is there a new
superblock started within the transaction? If this was never translated
before (very probable), Valgrind will try to do a translation, which of
course overflows the transaction storage. And with the rollback, the
partly started translation will never succeed.

> The bottom line is that the second approach doesn't appear to be viable
> on Power unless we can change the implementation that I did somehow. It
> seems like Valgrind really needs to have the TM support in software.
> Specifically, valgrind will have to track the loads/stores between a
> tbegin and tend and then try to emulate the correct behavior.

After collecting read/write sets, the emulation could be done using TM
itself, which would take care of conflicts, ensuring the correct
behavior. But as already mentioned elsewhere, it is probably far simpler
to just force the transaction to run in single-thread mode (stopping
other threads beforehand), simply ignoring all TM transactions.

Josef
|
From: Julian S. <js...@ac...> - 2013-08-16 14:25:08
|
> I have implemented the second proposal for Power. [...]
I'm not sure I understand the details of how TM is presented in the Power
instruction set and architectural state. It seems broadly similar to the
Intel scheme, though, in which there are two basic primitives:
T_BEGIN, which takes the failure handler address as a parameter
T_END
T_BEGIN starts a new transaction. T_END ends it and releases any resources
associated with it. If the transaction fails for any reason, the processor
jumps to the handler address specified by T_BEGIN. Typically, if the
transaction fails, some registers will be set, indicating the reason, before
jumping to the failure handler; although that is secondary to this
discussion.
My vague implementation sketch for this second proposal was:
* when the guest CPU arrives at the T_BEGIN(guest-fail-handler)
instruction, call a dirty helper function which:
- adds guest-fail-handler to a stack of handler addresses
for the current thread
- (on the host) does T_BEGIN(host-fail-handler). Note that
host-fail-handler is part of the valgrind C code and is
definitely != guest-fail-handler
* the dirty helper returns (to JITted code), and continues. This
is (of course) part of the transaction on the host. The guest CPU
therefore continues with the instructions that are part of the
(guest) transaction.
* (if the transaction does not fail)
the guest CPU arrives at T_END. It calls another dirty helper
function which first does T_END on the host, then pops
guest-fail-handler off the stack of handler addresses for
the thread. The transaction is over.
* (if the transaction fails)
the host CPU will jump to host-fail-handler.
This behaves similarly to how synchronous signals are currently
handled:
basically host-fail-handler must longjmp out of the JITted code, over
m_dispatch/dispatch-*.S, back into the scheduler, indicating somehow
that a transaction has failed. The scheduler can then fix up the guest
state, by popping guest-fail-handler off this thread's handler stack,
setting the guest state program counter to that value, and letting the
guest CPU resume.
What is crucial (and I was unable to determine from your description) is
that we cannot pass the guest's failure-handling address through to the
host, since otherwise we will permanently lose control of execution when
the transaction fails.
Whether or not the transactions on the host get nuked due to resource
constraints is orthogonal to the above proposal. In principle, if the
host has enough tracking resources, it could succeed.
Note that none of this involves changing the IR, so none of the tools
have to be aware that transactions are supported.
Does any of the above sync with how you did your Power implementation?
J
|
|
From: Maran P. <ma...@li...> - 2013-08-19 09:11:17
|
On 08/16/2013 07:54 PM, Julian Seward wrote:
>> I have implemented the second proposal for Power. [...]
> I'm not sure I understand the details of how TM is presented in the Power
> instruction set and architectural state. It seems broadly similar to the
> Intel scheme, though, in which there are two basic primitives:
>
> T_BEGIN, which takes the failure handler address as a parameter
> T_END

In s390, there are two types of transactions:
1) normal (nonconstrained) transactions, and
2) constrained transactions.

Though the s390 architecture does not mandate a failure-handler address
for either kind of transaction, it is probably safe to assume a failure
handler (fallback path) will be provided in the case of normal
transactions, similar to the way Power does failure handling. So, the
above scheme could be adopted for s390 nonconstrained transactions.

However, constrained transactions, by definition, provide guaranteed
completion of the transaction, and hence a failure handler is not
available. Also, constrained transactions are too restrictive in terms
of the number of instructions, type of instructions, accessibility of
the storage operands and instructions, etc. So, it is quite possible
that an instrumented transaction block (always) fails which otherwise
could succeed when executed natively. Since a failure handler is not
available, I am not sure if constrained transactions could fit in this
scheme.

1) Either we could explore bringing the transaction block into a basic
block and removing the transaction primitives (TBEGINC (for constrained
transactions) and TEND), which could work as long as valgrind is
single-threaded. Or,
2) Try to simulate a constrained transaction as a nonconstrained
transaction where the host-fail-handler has to make sure that the
transaction succeeds in normal cases.

I have not made a feasibility study of either of the above schemes,
however.

-- Maran
|
From: Carl E. L. <ce...@li...> - 2013-08-16 22:06:14
|
On Fri, 2013-08-16 at 16:24 +0200, Julian Seward wrote:
> > I have implemented the second proposal for Power. [...]
>
> I'm not sure I understand the details of how TM is presented in the Power
> instruction set and architectural state. It seems broadly similar to the
> Intel scheme, though, in which there are two basic primitives:
>
> T_BEGIN, which takes the failure handler address as a parameter
> T_END
>
> T_BEGIN starts a new transaction. T_END ends it and releases any resources
> associated with it. If the transaction fails for any reason, the processor
> jumps to the handler address specified by T_BEGIN. Typically, if the
> transaction fails, some registers will be set, indicating the reason, before
> jumping to the failure handler; although that is secondary to this
> discussion.
On Power, the compiler generates the T_BEGIN instruction followed by a
conditional branch instruction. The result of executing the T_BEGIN
instruction is to set the condition code register. If the T_BEGIN
succeeds, then the subsequent branch instruction will cause the control
flow to follow the successful code path. Otherwise, the branch causes
execution to follow the failure path. The code sequence is essentially
as follows:
tbegin.
beq <failure path>
// success path
Note, there are no restrictions preventing the compiler from putting
additional instructions between the tbegin and the beq in the above code
sequence. We do not tell the CPU where to go on failure as is done in
the Intel T_BEGIN.
Power has three registers for use by the TM instructions. Here is
Peter's description of the registers.
Transaction Failure Handler Address Register (TFHAR):
This register holds the address the hardware will start
executing from upon a transaction failure/abort. It is
initialized by the tbegin. instruction to CIA+4 (in IBM
parlance), which means it contains the address of the
instruction immediately following the tbegin. instruction.
It can be modified by a "mtspr TFHAR,<reg>", but that
should be a fairly rare occurrence. Similar to x86's
common usage, where the xbegin's %reg is set to the
address following the xbegin.
Transaction EXception And Summary Register (TEXASR):
This register is normally used by failure handlers for
determining why a transaction failed, but it also holds
information about the depth of nested transactions we
currently have.
Transaction Failure Instruction Address Register (TFIAR):
This register holds the address of the instruction
that caused the transaction failure (when possible).
>
> My vague implementation sketch for this second proposal was:
>
> * when the guest CPU guest arrives at the T_BEGIN(guest-fail-handler)
> instruction, call a dirty helper function which:
>
> - adds guest-fail-handler to a stack of handler addresses
> for the current thread
Power places the address of the error branch code into the TFHAR. The
nesting level for the TM is updated in the TEXASR register. For now,
let's not consider nested TMs.
>
> - (on the host) does T_BEGIN(host-fail-handler). Note that
> host-fail-handler is part of the valgrind C code and is
> definitely != guest-fail-handler
In my implementation, the host executes the tbegin. I didn't do
anything to set or change the host TFIAR register. I capture the value
from the condition code register and write that into the guest machine's
condition code register. I also capture the value of the register
(TEXASR) containing the reason for the failure and put it in the guest
TEXASR. Thus the guest code control flow will take either the success or
failure path based on the updated guest condition code register which
contains the result of executing the T_BEGIN on the host.
>
> * the dirty helper returns (to JITted code), and continues. This
> is (of course) part of the transaction on the host. The guest CPU
> therefore continues with the instructions that are part of the
> (guest) transaction.
The code that I have as part of the tbegin implementation copies the
condition code register and TEXASR value from the host to the guest
machine registers; this is, I guess, what you would refer to as the dirty
helper. In my case, the code is not in an explicit function but could be
put into an explicit dirty helper function. The code I am talking about
is:
+ /* The TEXASR is returned from the TBEGIN instruction in the upper
+ * 64-bits, the CC register is returned in the lowest 32-bits.
+ */
+ assign( rDst, unop( Iop_TBEGIN, mkU32( R_field ) ) );
+
+ assign( texasr, unop( Iop_128HIto64, mkexpr( rDst ) ) );
+ assign( lDst, unop( Iop_128to64, mkexpr( rDst ) ) );
+ assign( CondCode, unop( Iop_64to32, mkexpr( lDst) ) );
+
+ /* Set the CR0 field to indicate the tbegin failed. Then let
+ * the code do the branch to the success/failure path.
+ *
+ * 000 || 0 Transaction initiation successful,
+ * unnested (Transaction state of
+ * Non-transactional prior to tbegin.)
+ * 010 || 0 Transaction initiation successful, nested
+ * (Transaction state of Transactional
+ * prior to tbegin.)
+ * 001 || 0 Transaction initiation unsuccessful,
+ * (Transaction state of Suspended prior
+ * to tbegin.)
+ */
+
+ /* 0x2 takes transactional path */
+ /* 0x0 takes the failure path */
+
+ putGST( PPC_GST_TFIAR, mkU64( guest_CIA_curr_instr) );
+ putGST( PPC_GST_TEXASR, mkexpr( texasr ) );
+ putGST( PPC_GST_TFHAR, mkU64( guest_CIA_curr_instr+4 ) );
+
+ return True;
+
+ break;
>
> * (if the transaction does not fail)
> the guest CPU arrives at T_END. It calls another dirty helper
> function which first does T_END on the host, then pops
> guest-fail-handler off the stack of handler addresses for
> the thread. The transaction is over.
In my implementation the T_END instruction is actually a noop right now.
Looking at it again as I write this response I see this is an error in
my current implementation. I will fix it.
What happens is that the host executes the guest instructions issued to
it as well as the instructions from Valgrind. The host HW detects that
it can't track all of the instructions being executed. Since I
didn't issue the T_END, the TM is bound to fail eventually. The
host HW rolls the state of the registers back to the state at the
tbegin, updates the condition code register to failure, sets the TEXASR
register. Then the return from the host executing the T_BEGIN
effectively happens again but this time the condition code register and
TEXASR are set for failure and we go down the failure path in the guest
code. So, in my implementation, I see that the HW resources were
exceeded and I only see the results of the failure path.
>
> * (if the transaction fails)
> the host CPU will jump to host-fail-handler.
> This behaves similarly to how synchronous signals are currently
> handled:
> basically host-fail-handler must longjmp out of the JITted code, over
> m_dispatch/dispatch-*.S, back into the scheduler, indicating somehow
> that a transaction has failed. The scheduler can then fix up the guest
> state, by popping guest-fail-handler off this thread's handler stack,
> setting the guest state program counter to that value, and letting the
> guest CPU resume.
I believe what the hardware does once the failure occurs is that all of
the register and memory changes that occurred between the T_BEGIN and
the failure are erased and we go back to the T_BEGIN instruction
(Program counter again points to the T_BEGIN), update the condition code
register, the TEXASR and the TFIAR and continue executing the code
sequence with the condition code set to failure thus following the TM
failure path as if we had never been down the success path. This is
what I understand of how this works at the hardware level.
>
> What is crucial (and I was unable to determine from your description) is
> that we cannot pass the guest's failure-handling address through to the
> host, since otherwise we will permanently lose control of execution when
> the transaction fails.
From what I understand of how Power implements this, we have reset the
state of the HW back to the state when the T_BEGIN started executing,
i.e. the program counter is set back to the T_BEGIN. So, we are not
going to lose control of the execution as we never tell the CPU
explicitly where to go on failure.
>
> Whether or not the transactions on the host get nuked due to resource
> constraints is orthogonal to the above proposal. In principle, if the
> host has enough tracking resources, it could succeed.
Yes, it could, assuming I actually do the T_END. I will fix that and
try again.
An alternative implementation suggestion. I believe Valgrind is single
threaded, correct? When we see a T_BEGIN couldn't we just have the
Valgrind scheduler just continue to execute instructions in the same
thread/CPU until the T_END is seen. We would effectively make the
transactional memory thread sequential so there wouldn't be any
conflicts with other threads. The host would not execute any of the TM
instructions. We would then make the Power suspend and abort
instructions noops. The transaction abort and end instructions would
then allow the Valgrind scheduler to go back to scheduling threads/CPUs
normally. Not sure if this is a viable solution for Valgrind or not. I
just don't know enough of the internals.
>
> Note that none of this involves changing the IR, so none of the tools
> have to be aware that transactions are supported.
From the POWER perspective, all the register/memory updates are erased
so the tools would have no way of knowing.
>
> Does any of the above sync with how you did your Power implementation?
>
> J
>
|
|
From: Peter B. <be...@vn...> - 2013-08-17 00:09:48
|
On Fri, 2013-08-16 at 15:06 -0700, Carl E. Love wrote:
> The code sequence is essentially as follows:
>
> tbegin.
> beq <failure path>
> // success path
>
> Note, the are no restrictions preventing the compiler from putting
> additional instructions between the tbegin and the beq in the above code
> sequence.
True, although if the compiler does move something there, it will be
a redundant computation that would have been computed on both paths
anyway. A more common occurrence is that the compiler reverses the
sense of the branch, so instead of the code above, you might
see:
tbegin.
bne <success path>
// failure path
Logically it makes no difference and shouldn't affect the implementation
in valgrind, since the target address branched to on failure is always
the instruction after tbegin. (modulo being changed by writing to the
TFHAR register) and not the address of the failure handler. It's the
job of the cr0 value and the branch to get you to the failure handler.
> We do not tell the CPU where to go on failure as is done in
> the Intel T_BEGIN.
Well not explicitly we don't, but we do implicitly and it is
just CIA+4. From a GCC code generation point of view, when an
xbegin is emitted on Intel, the failure "label" passed to the
xbegin instruction is always at CIA+4, so from a practical
standpoint, Intel and POWER are identical here.
> In my implementation, the host executes the tbegin. I didn't do
> anything to set or change the host TFIAR register. I capture the value
> from the condition code register and write that into the guest machine's
> condition code register.
How do you capture the cr0 value? I ask, because as part of the code
generation for the __builtin_tbegin builtin, I destroy its value.
The only way to get that value is through use of the __builtin_ttest()
builtin, which returns the 4-bit value that was written to cr0 on
completion of the tbegin...as long as you haven't executed any more HTM
instructions in the mean time.
> > * (if the transaction does not fail)
> > the guest CPU arrives at T_END. It calls another dirty helper
> > function which first does T_END on the host, then pops
> > guest-fail-handler off the stack of handler addresses for
> > the thread. The transaction is over.
>
> In my implementation the T_END instruction is actually a noop right now.
> Looking at it again as I write this response I see this is an error in
> my current implementation. I will fix it.
If the host doesn't execute a tend., your host transaction will abort
100% of the time. No wonder things didn't work for you.
>> * (if the transaction fails)
>> the host CPU will jump to host-fail-handler.
>> This behaves similarly to how synchronous signals are currently
>> handled:
>> basically host-fail-handler must longjmp out of the JITted code, over
>> m_dispatch/dispatch-*.S, back into the scheduler, indicating somehow
>> that a transaction has failed. The scheduler can then fix up the guest
>> state, by popping guest-fail-handler off this thread's handler stack,
>> setting the guest state program counter to that value, and letting the
>> guest CPU resume.
One question I have, is what should we tell the guest program if the
host transaction fails? I'm guessing we want to mirror the transaction
success/failure up into the guest by making it look like the guest
transaction succeeded/failed too. Otherwise, all guest transactions
will seem to succeed regardless of the underlying host transactions.
That's not too realistic, just like the first implementation where
it always fails.
Peter
|
|
From: Carl E. L. <ce...@li...> - 2013-08-19 16:22:41
|
On Fri, 2013-08-16 at 18:39 -0500, Peter Bergner wrote:
>
> > In my implementation, the host executes the tbegin. I didn't do
> > anything to set or change the host TFIAR register. I capture the value
> > from the condition code register and write that into the guest machine's
> > condition code register.
>
> How do you capture the cr0 value? I ask, because as part of the code
> generation for the __builtin_tbegin builtin, I destroy it's value.
> The only way to get that value is through use of the __builtin_ttest()
> builtin, which returns the 4-bit value that was written to cr0 on
> completion of the tbegin...as long as you haven't executed any more HTM
> instructions in the mean time.
>
Valgrind generates instructions that are executed on the host. Below is
the code that generates the instructions for the tbegin instruction. The
first mkFormX() call adds the tbegin instruction to the sequence of instructions
to be executed. The mkFormXFX() call adds the instruction to copy the contents
of the TEXASR register to the variable r_dst. The second mkFormX() adds
the instruction to copy the condition register to the r_cond variable.
Once the sequence of instructions is executed, the values of r_dst (TEXASR)
and the condition code (r_cond) are returned from this routine. The
values are then written into the register state for the guest. The guest is
the memory and register values for the specified program being run under Valgrind.
+ case Pin_TM: {
+ UInt r_dst = iregNo(i->Pin.TM.dst, mode64);
+ UInt r_cond = iregNo(i->Pin.TM.cc_dst, mode64);
+ UInt R_field = i->Pin.TM.R_field->Pri.Imm;
+
+ switch (i->Pin.TM.op) {
+ case Pfp_TBEGIN:
+ vex_printf (" >>>> CALLED Pin_TM, case Pfp_TBEGIN \n");
+ p = mkFormX(p, 31, 0, R_field, 0, 654, 1);
+
+ /* move TEXASR to return */
+ p = mkFormXFX(p, r_dst, 130, 339);
+
+ /* move CR to info to return it */
+ p = mkFormX(p, 31, r_cond, 0, 0, 19, 0);
+ break;
>
> > > * (if the transaction does not fail)
> > > the guest CPU arrives at T_END. It calls another dirty helper
> > > function which first does T_END on the host, then pops
> > > guest-fail-handler off the stack of handler addresses for
> > > the thread. The transaction is over.
> >
> > In my implementation the T_END instruction is actually a noop right now.
> > Looking at it again as I write this response I see this is an error in
> > my current implementation. I will fix it.
>
> If the host doesn't execute a tend., your host transaction will abort
> 100% of the time. No wonder things didn't work for you.
After I sent the message, I went back and checked out my code. When I
was writing the message I was quickly referring to the code to remind me
how this implementation was done. The TEND code in the patch is as
follows:
+ case 0x2AE: { //tend.
+ /* The tend. is just a noop. Do nothing */
+ UInt A = IFIELD( theInstr, 25, 1 );
+
+ DIP("tend. %d\n", A);
+ IRTemp rDst = newTemp(Ity_I128);
+
+ /* Treat all of the TM instructions as unops, arg is unused here */
+ assign( rDst, unop( Iop_TEND, mkU32( 0 ) ) );
+ break;
+ }
The comment says the tend is a noop. That is how it is implemented in
Julian's first proposal. When I read the comment, I was thinking I had forgotten
to issue the tend instruction as I said in my previous message. But when
I really looked at the code, I am generating the Iop for the TEND instruction.
I checked the code generation for handling the TEND and the tend instruction is
being generated/executed on the host CPU. So, it really is the comment that
is in error not the code. I fixed the comment and reran the test. The test
is taking the failure path. Unfortunately, the failure wasn't due to forgetting
to do the tend instruction on the underlying CPU.
Carl Love
|
|
From: Peter B. <be...@vn...> - 2013-08-20 18:10:57
|
On Mon, 2013-08-19 at 09:22 -0700, Carl E. Love wrote:
> The comment says the tend is a noop. That is how it is implemented in
> Julian's first proposal. When I read the comment, I was thinking I had
> forgotten to issue the tend instruction as I said in my previous
> message. But when I really looked at the code, I am generating the Iop
> for the TEND instruction. I checked the code generation for handling
> the TEND and the tend instruction is being generated/executed on the
> host CPU. So, it really is the comment that is in error not the code.
> I fixed the comment and reran the test. The test is taking the failure
> path. Unfortunately, the failure wasn't due to forgetting to do the
> tend instruction on the underlying CPU.

Then what does the TEXASR value say? That should tell you why your host
transaction aborted. And depending on the failure, the TFIAR register
will tell (exactly or close to it) what the failure address is.

Peter
|
From: Carl E. L. <ce...@li...> - 2013-08-20 20:39:36
|
I have created bugzilla 323803 for adding transactional memory support
for Power. This bugzilla is specifically for implementing the first
approach for supporting the TM instructions in Valgrind. Please review
the patch and let me know if you have any comments. Thanks.
Carl Love
|
|
From: Carl E. L. <ce...@li...> - 2013-08-20 22:11:32
|
On Tue, 2013-08-20 at 13:10 -0500, Peter Bergner wrote:
> On Mon, 2013-08-19 at 09:22 -0700, Carl E. Love wrote:
> > The comment says the tend is a noop. That is how it is implemented in
> > Julian's first proposal. When I read the comment, I was thinking I had
> > forgotten to issue the tend instruction as I said in my previous
> > message. But when I really looked at the code, I am generating the Iop
> > for the TEND instruction. I checked the code generation for handling
> > the TEND and the tend instruction is being generated/executed on the
> > host CPU. So, it really is the comment that is in error not the code.
> > I fixed the comment and reran the test. The test is taking the failure
> > path. Unfortunately, the failure wasn't due to forgetting to do the
> > tend instruction on the underlying CPU.
>
> Then what does the TEXASR value say? That should tell you why your
> host transaction aborted. And depending on the failure, the TFIAR
> register will tell (exactly or close to it) what the failure address
> is.

The value from the TEXASR register is 0x120000018000001 which, according
to my decoding of the bits, indicates: persistent failure, footprint
overflow, privilege state, failure summary is complete, transaction
level 1. The key thing here is the "footprint overflow", which is
described in the ISA as "an attempt to perform a storage access in
Transactional state which exceeds the capacity for tracking
transactional accesses". Basically, we can't track all of the guest
program instructions between the tbegin and tend with all of the
Valgrind instructions mixed in.

The value of the TFIAR register was 0x382de231, which is somewhere in
the Valgrind tool code space.