From: Carl E. L. <ce...@li...> - 2013-07-19 22:03:29
On Thu, 2013-06-20 at 11:27 +0200, Julian Seward wrote:

<snip>
>
> We can't hope to emulate the conflict checking done by the real CPU
> in any sane (performance or complexity) way. So I see two remaining
> options:
>
> (1) translate "XBEGIN fail-addr" as "goto fail-addr"; that is: push
> simulated execution directly onto the failure path. This is simple
> but will have poor performance, if (as is likely) the failure path
> uses normal locking and is not tuned for speed.
>
> (2) translate "XBEGIN fail-addr" by handing it through to the real CPU,
> in the same kind of way we handle CPUID, RDTSC, etc. Of course we will
> have to put our own failure-handler address so we don't lose control
> of execution if the transaction is aborted, but that's not difficult.
>
> ---
>
> (1) is simple but likely to work. (2) is probably preferable but I don't
> know if there will be hidden problems in it.
>
> This also assumes that the s390 and POWER instructions can be mapped to
> this same structure: TRANSACTION-START(fail-addr) and TRANSACTION-END.
> That's probably the first thing that we should investigate.

I have implemented the second proposal for Power, using the CPUID
implementation as a rough guide.

On Power, the tbegin instruction sets the three TM registers (TEXASR,
TFIAR, TFHAR) and the condition code field CR0. The condition code
value is used by a branch instruction following the tbegin to determine
whether the transaction path or the failure path is taken. The TFHAR
register contains the address of the transaction failure handler code.
The TFIAR holds the address at which a transaction failed. The TEXASR
is a 64-bit register that contains the reasons a transaction failed.

In my implementation, the Power Valgrind support for the tbegin
instruction sets the TFIAR and TFHAR registers to CIA and CIA+4
respectively, where CIA is the Current Instruction Address. The Power
tbegin instruction does not have any registers associated with it.
However, for my Valgrind support I implemented it as an instruction
that has a 128-bit destination register. This destination register
contains the TEXASR register in the upper 64 bits and the condition
code register in the lower 32 bits. These two values are extracted from
the destination register and copied into the Power guest state
registers. Valgrind issues the tbegin instruction, then issues the
instructions needed to return the contents of the TEXASR and condition
code registers. As a result, the CPU executes the tbegin instruction
from the guest program, followed by some number of instructions from
the Valgrind code with the occasional guest program instruction mixed
in.

My very preliminary tests on a very simple test case, given below, show
that the tbegin fails and the failure handler is executed. The value
from the TEXASR register is 0x120000018000001, which according to my
decoding of the bits indicates: persistent failure, footprint overflow,
privilege state, failure summary is complete, transaction level 1. The
key thing here is the "footprint overflow", which is described in the
ISA as "an attempt to perform a storage access in Transactional state
which exceeds the capacity for tracking transactional accesses".
Basically, the hardware can't track all of the guest program accesses
between the tbegin and tend with all of the Valgrind instructions mixed
in. I was concerned that this might happen, as I suspect the number of
loads/stores that can be tracked in the HW must be somewhat limited.
The bottom line is that this implementation for Power does not seem to
be viable.

On Power, you can suspend and resume a transaction. I am not sure of
the details, but the thought did occur to me to resume the transaction
only when executing guest program instructions and then suspend again.
It means having to wrap each guest program instruction issue with the
resume and suspend instructions, which would only be executed when
executing guest instructions on a TM code path. It seems like a rather
messy thing to do, with possibly a fair bit of overhead for the rare
case of a TM code path. I don't think the approach is viable on all
architectures even if it did work on Power, which I am not sure about.
I don't think this would be a good approach; it is just a thought.

So the second approach doesn't appear to be viable on Power unless we
can somehow change the implementation that I did. It seems like
Valgrind really needs to have the TM support in software. Specifically,
Valgrind will have to track the loads/stores between a tbegin and tend
and then try to emulate the correct behavior. This approach has
performance and accuracy questions of its own.

Below is the test program and my preliminary patch to implement the
second approach. The patch for Power can be found in

   https://bugs.kde.org/show_bug.cgi?id=322593

Let me know if you have thoughts or suggestions on how better to
implement the support on Power or in general. Thanks.

                 Carl Love

---------------------------------------------------------------

#include <stdio.h>

int __attribute__ ((noinline)) htm_begin (int r3, int r4)
{
   int ret;
   unsigned long long texasr = 0x5678ULL;

   if (__builtin_tbegin (0)) {
      /* Transactional path. */
      ret = r3;
      __builtin_tend (0);
   } else {
      /* Failure path: report why the transaction was aborted. */
      texasr = __builtin_get_texasr();
      /* %llx matches the unsigned long long type of texasr. */
      printf("failure path TEXASR = 0x%llx\n", texasr);
      printf("failure path\n");
      ret = r4;
   }
   return ret;
}
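
For reference, here is a minimal sketch of how the TEXASR value reported
above could be decoded in plain C. The bit positions are an assumption
based on my reading of the TEXASR layout in Power ISA 2.07 (IBM
numbering, where bit 0 is the most significant bit), and I have only
cross-checked them against the decoding given in this message. The names
ibm_bits, texasr_fields, texasr_decode and texasr_print are hypothetical
helpers, not part of Valgrind or GCC. Only the fields mentioned in this
message are extracted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Extract bits first..last of a 64-bit register using IBM numbering,
 * where bit 0 is the most significant bit and bit 63 the least. */
uint64_t ibm_bits(uint64_t reg, unsigned first, unsigned last)
{
    assert(first <= last && last <= 63);
    unsigned width = last - first + 1;
    uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    return (reg >> (63 - last)) & mask;
}

/* Subset of TEXASR fields. Bit positions are assumptions, see above. */
struct texasr_fields {
    unsigned failure_persistent;  /* assumed bit 7       */
    unsigned footprint_overflow;  /* assumed bit 10      */
    unsigned privilege;           /* assumed bits 34:35  */
    unsigned failure_summary;     /* assumed bit 36      */
    unsigned transaction_level;   /* assumed bits 52:63  */
};

struct texasr_fields texasr_decode(uint64_t texasr)
{
    struct texasr_fields f;
    f.failure_persistent = (unsigned)ibm_bits(texasr, 7, 7);
    f.footprint_overflow = (unsigned)ibm_bits(texasr, 10, 10);
    f.privilege          = (unsigned)ibm_bits(texasr, 34, 35);
    f.failure_summary    = (unsigned)ibm_bits(texasr, 36, 36);
    f.transaction_level  = (unsigned)ibm_bits(texasr, 52, 63);
    return f;
}

/* Print the decoded fields, one per line. */
void texasr_print(uint64_t texasr)
{
    struct texasr_fields f = texasr_decode(texasr);
    printf("TEXASR             = 0x%016llx\n", (unsigned long long)texasr);
    printf("failure persistent = %u\n", f.failure_persistent);
    printf("footprint overflow = %u\n", f.footprint_overflow);
    printf("privilege          = %u\n", f.privilege);
    printf("failure summary    = %u\n", f.failure_summary);
    printf("transaction level  = %u\n", f.transaction_level);
}
```

Calling texasr_print(0x0120000018000001ULL), which is the value from the
test run padded to 16 hex digits, should report the persistent-failure,
footprint-overflow and failure-summary bits as set, with a privilege
field of 1 and a transaction level of 1, if the assumed layout is right.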