|
From: Carl E. L. <ce...@li...> - 2013-06-28 16:14:08
|
Julian:
I am starting to look at how to implement the new Power ISA 2.07
Transactional memory instructions.
The discussion thread talked about trying to do your suggestion 1) of
pushing the simulated execution directly onto the failure path as a
simple first attempt. I am wondering how I would go about doing that so
I can try implementing support for the transaction begin and end
instructions. Not sure how this will work for the PPC suspend and
resume instructions. Not sure that we will be able to go to a failure
path, stop this path and then come back to it later. But I figure I
need to get started somewhere to really start understanding how this
would work or not work in Valgrind for the power instructions. Anyway,
any input on the specifics of how I push the simulated execution
directly to the failure path would be helpful. Thanks.
Carl Love
Date: Thu, 20 Jun 2013 15:39:42 +0200
From: Julian Seward <js...@ac...>
Subject: Re: [Valgrind-developers] Transactional Memory support
To: Josef Weidendorfer <Jos...@gm...>
Cc: Valgrind Dev Maillist <val...@li...>
Message-ID: <51C...@ac...>
Content-Type: text/plain; charset=ISO-8859-1
On 06/20/2013 02:15 PM, Josef Weidendorfer wrote:
> But the code added by VG +
> tool (just assume cache simulation) will raise the probability for
> transaction failure significantly.
Yes .. I agree with that.
> And doing a rollback + failure path
> is slower than doing the failure path from the beginning. So (1) may
> not be a bad option at all.
OK, worth a try. At least it gives us an ultra-simple baseline
implementation. I expect to have a Haswell box to try with, soon.
That's Intel-specific, though. The question of whether we can fit
Power/s390 into the same framework is also important.
> What about
> (3) if XBEGIN/XEND is found in the same SB, remove them. As VG is
> serializing threads, there is no way for a conflict within a SB
> anyway.
OK for now, but if we continue Philippe Waroquiers' work on multithreaded
Valgrind, then that serialization might go away one day.
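
To make suggestion (3) concrete, here is a rough sketch in C. It works on a toy instruction list, not on VEX IR; `ToyOp` and `toy_elide_tx_pairs` are hypothetical names invented for illustration. It only removes an XBEGIN when its matching XEND sits in the same block, and it relies on the thread serialization mentioned above; side exits from the block are ignored here.

```c
#include <stddef.h>

/* Toy opcodes for one superblock (SB); not VEX IR. */
typedef enum { TOY_OTHER, TOY_XBEGIN, TOY_XEND, TOY_NOP } ToyOp;

/* Replace every XBEGIN whose matching XEND lies in the same block,
   together with that XEND, by a NOP. Unmatched XBEGIN/XEND are left
   untouched. Returns the number of pairs removed. */
static size_t toy_elide_tx_pairs(ToyOp *sb, size_t n)
{
    size_t open[64];              /* indices of currently open XBEGINs */
    size_t depth = 0, removed = 0;

    for (size_t i = 0; i < n; i++) {
        if (sb[i] == TOY_XBEGIN) {
            if (depth == 64)
                return removed;   /* sketch only: give up on deep nesting */
            open[depth++] = i;
        } else if (sb[i] == TOY_XEND && depth > 0) {
            sb[open[--depth]] = TOY_NOP;   /* matching XBEGIN */
            sb[i] = TOY_NOP;               /* this XEND */
            removed++;
        }
    }
    return removed;
}
```

An XBEGIN at the end of a block, whose XEND lives in a later block, stays in place and would still need one of the other strategies.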
> PS: Using TM ourself should be a very nice solution to make memcheck
> fast when we remove serializing of threads at one point. The move
> forward here would be to let the tool decide whether VG core should
> do serialization or not. If TM is not available, memcheck would
> go with thread serialization as now.
Hmm, interesting.
Some time earlier this year I worked out most of the details for making
memcheck multithreaded without using TM, and posted the details to the
list (I think). Philippe points out though that we will in any case need to
retain the ability to serialize so as not to force tool authors to
completely rewrite tools for multithreaded operation.
J
|
|
From: Julian S. <js...@ac...> - 2013-06-28 20:16:28
|
Hi Carl,
> would work or not work in Valgrind for the power instructions. Anyway,
> any input on the specifics of how I push the simulated execution
> directly to the failure path would be helpful. Thanks.
I don't know the specifics of the Power instructions, so I can't answer
that directly. But I can tell you what the idea was for the Intel
instructions -- maybe that would help.
(IIRC) the Intel instructions are
  XBEGIN %reg -- begin a transaction.
              -- %reg holds the failure-path address
  XEND        -- finish the most recently XBEGIN'd transaction
So the idea is very simple: translate XBEGIN %reg as if it was
simply a jump to (the code address in) %reg. Does that help?
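
A minimal sketch of that idea in C, using a toy guest state rather than Valgrind's real guest-state layout or IR. `ToyGuestX86` and `toy_xbegin_always_abort` are hypothetical names. Two details come from Intel's instruction reference rather than from this thread and are worth double-checking there: XBEGIN encodes the fallback target as a displacement relative to the next instruction, and on abort EAX receives a status word in which bit 1 means "a retry may succeed". The sketch leaves that bit clear so a well-written handler does not keep retrying.

```c
#include <stdint.h>

/* Toy guest state: NOT Valgrind's real guest-state layout. */
typedef struct {
    uint64_t rip;   /* guest program counter */
    uint32_t eax;   /* receives the abort status */
} ToyGuestX86;

/* Abort-status bit 1: "the transaction may succeed on retry". */
#define TOY_XABORT_RETRY (1u << 1)

/* Emulate XBEGIN as an immediate abort.
   next_insn: address of the instruction following the XBEGIN
   rel:       signed displacement encoded in the XBEGIN */
static void toy_xbegin_always_abort(ToyGuestX86 *g, uint64_t next_insn,
                                    int32_t rel)
{
    g->eax = 0;   /* no cause bits, and in particular no RETRY bit */
    g->rip = next_insn + (uint64_t)(int64_t)rel;   /* go to fallback path */
}
```

Which cause bits to report is a design choice; reporting none at all is the simplest option, not necessarily the best one.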
J
|
|
From: Christian B. <bor...@de...> - 2013-06-29 12:37:19
|
On 28/06/13 22:16, Julian Seward wrote:
>
> Hi Carl,
>
>> would work or not work in Valgrind for the power instructions. Anyway,
>> any input on the specifics of how I push the simulated execution
>> directly to the failure path would be helpful. Thanks.
>
> I don't know the specifics of the Power instructions, so I can't answer
> that directly. But I can tell you what the idea was for the Intel
> instructions -- maybe that would help.
>
> (IIRC) the Intel instructions are
>
>   XBEGIN %reg -- begin a transaction.
>               -- %reg holds the failure-path address
>
>   XEND        -- finish the most recently XBEGIN'd transaction
>
> So the idea is very simple: translate XBEGIN %reg as if it was
> simply a jump to (the code address in) %reg. Does that help?

s390 has the concept of normal and constrained transactions. We could
do the same logic (jump directly into error path) for normal
transactions. Constrained transactions can do less, but are supposed to
succeed eventually. Therefore we might have no error path in that case.
We could mask out transactions in the facility bits, though.

Christian
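
A rough sketch of what "masking out transactions in the facility bits" could look like, assuming the guest sees an STFLE-style facility list in which bit 0 is the most significant bit of the first doubleword. The helper names are hypothetical, and the facility numbers (73 for transactional execution, 50 for constrained transactional execution) are given from memory and should be checked against the z/Architecture Principles of Operation.

```c
#include <stdint.h>

/* Facility numbers as assumed for this sketch; verify before use. */
enum { TOY_FAC_CONSTRAINED_TE = 50, TOY_FAC_TE = 73 };

/* Clear facility bit `nr`; bit 0 is the MSB of list[0]. */
static void toy_clear_facility(uint64_t *list, unsigned nr)
{
    list[nr / 64] &= ~(UINT64_C(1) << (63 - nr % 64));
}

/* Return 1 if facility bit `nr` is set, else 0. */
static int toy_test_facility(const uint64_t *list, unsigned nr)
{
    return (int)((list[nr / 64] >> (63 - nr % 64)) & 1);
}

/* Hide both transactional-execution facilities from the guest.
   The list must hold at least two doublewords. */
static void toy_mask_transactions(uint64_t *list)
{
    toy_clear_facility(list, TOY_FAC_CONSTRAINED_TE);
    toy_clear_facility(list, TOY_FAC_TE);
}
```

A program that checks the facility list before using transactions would then take its non-transactional code path; one that does not check would still hit the unimplemented instructions.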
|
From: Carl E. L. <ce...@li...> - 2013-06-28 22:38:05
|
On Fri, 2013-06-28 at 22:16 +0200, Julian Seward wrote:
> Hi Carl,
>
> > would work or not work in Valgrind for the power instructions. Anyway,
> > any input on the specifics of how I push the simulated execution
> > directly to the failure path would be helpful. Thanks.
>
> I don't know the specifics of the Power instructions, so I can't answer
> that directly. But I can tell you what the idea was for the Intel
> instructions -- maybe that would help.
>
> (IIRC) the Intel instructions are
>
>   XBEGIN %reg -- begin a transaction.
>               -- %reg holds the failure-path address
>
>   XEND        -- finish the most recently XBEGIN'd transaction
>
> So the idea is very simple: translate XBEGIN %reg as if it was
> simply a jump to (the code address in) %reg. Does that help?

So basically you will unconditionally take the failure path, which
presumably is the code to handle the transaction in a non-transactional
way, i.e. to obtain the necessary lock, do the operations and then
release the lock.

From what I understand of the Power implementation (I have some
questions for the IBM compiler people), the instruction following
the tbegin will be a branch instruction with the address of the failure
path. So I guess we could do something similar. However, I have some
concerns. One concern is, what guarantee do we have that the outcome of
the failure path would be functionally equivalent to the transactional
path? Specifically, did the programmer do the same operations under the
appropriate data lock to guarantee the code sequence is executed
atomically and thus generates the same result? Seems like we are at
the mercy of whatever the programmer chooses to do in the case of a
failure. I need to chat with our compiler people about that question.
I guess in theory the programmer could choose to throw up his hands and
just call exit(1) if the transaction failed. In that case, Valgrind would
never be able to reproduce a successful run of the program on real
hardware where there were no TM issues during the run. Maybe I am still
missing a lot here. Like I said, I am just starting to dive into this
and understand all the issues and ramifications.

>
> J
>
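
The "failure path takes a lock" pattern described above can be sketched in C as follows. Everything here is hypothetical: `toy_tx_begin` stands in for the hardware begin instruction and always reports failure, which is what proposal (1) would make the guest see, and a C11 `atomic_flag` spinlock stands in for whatever lock the program really uses. A real lock-elision scheme would also have to read the lock inside the transaction; that is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag toy_lock = ATOMIC_FLAG_INIT;
static long toy_counter = 0;

/* Stand-in for a hardware transaction begin. Under proposal (1) the
   guest always observes a failed begin, so this returns false. */
static bool toy_tx_begin(void) { return false; }
static void toy_tx_end(void) { }

static void toy_increment(void)
{
    if (toy_tx_begin()) {
        toy_counter++;            /* transactional path: never taken here */
        toy_tx_end();
        return;
    }
    /* Failure path: do the same update under a lock. */
    while (atomic_flag_test_and_set_explicit(&toy_lock, memory_order_acquire))
        ;                         /* spin until the lock is free */
    toy_counter++;
    atomic_flag_clear_explicit(&toy_lock, memory_order_release);
}
```

Both branches perform the same update here, so the result is the same either way. The concern raised above is exactly that nothing forces a programmer to write the two branches equivalently.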
|
From: Eliot M. <mo...@cs...> - 2013-06-28 23:07:15
|
Two quick comments:

1. *Some* PPC txns are expected to complete if tried repeatedly, and thus will not have a failure path.

2. At least on x86 and maybe on PPC some status bits indicating failure and its cause need to be set appropriately when going to the failure path.

Regards - Eliot Moss
Sent via BlackBerry

-----Original Message-----
From: "Carl E. Love" <ce...@li...>
Date: Fri, 28 Jun 2013 15:37:52
To: Julian Seward<js...@ac...>
Cc: <val...@li...>
Subject: Re: [Valgrind-developers] Transactional memory implementation input

On Fri, 2013-06-28 at 22:16 +0200, Julian Seward wrote:
> Hi Carl,
>
> > would work or not work in Valgrind for the power instructions. Anyway,
> > any input on the specifics of how I push the simulated execution
> > directly to the failure path would be helpful. Thanks.
>
> I don't know the specifics of the Power instructions, so I can't answer
> that directly. But I can tell you what the idea was for the Intel
> instructions -- maybe that would help.
>
> (IIRC) the Intel instructions are
>
>   XBEGIN %reg -- begin a transaction.
>               -- %reg holds the failure-path address
>
>   XEND        -- finish the most recently XBEGIN'd transaction
>
> So the idea is very simple: translate XBEGIN %reg as if it was
> simply a jump to (the code address in) %reg. Does that help?

So basically you will unconditionally take the failure path which
presumably is the code to handle the transaction in a non-transactional
way, i.e. to obtain the necessary lock do the operations and then
release the lock.

From what I understand of the power implementation (I have some
questions for the IBM compiler people) is that the instruction following
the tbegin will be a branch instruction with the address of the failure
path. So I guess we could do something similar. However I have some
concerns. One concern is, what guarantee do we have that the outcome of
the failure path would be functionally equivalent to the transactional
path? Specifically, did the programmer do the same operations under the
appropriate data lock to guarantee the code sequence is executed
atomically and thus generated the same result?. Seems like we are at
the mercy of whatever the programmer chooses to do in the case of a
failure. I need to chat with our compiler people about that question.
I guess in theory the programmer could chose throw up his hands and just
call exit(1) if the transaction failed. In that case, Valgrind would
never be able to reproduce a successful run of the program on real
hardware where there was no TM issues during the run. Maybe I am still
missing a lot here. Like I said, I am just starting to dive into this
and understand all the issues and ramifications.

>
> J
>
------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:
Build for Windows Store.
http://p.sf.net/sfu/windows-dev2dev
_______________________________________________
Valgrind-developers mailing list
Val...@li...
https://lists.sourceforge.net/lists/listinfo/valgrind-developers
|
From: Carl E. L. <ce...@li...> - 2013-07-01 16:28:45
|
Repost: I sent the following message but I didn't see it on the email
list. Not sure if it got lost. My apologies if this is a repeat.

On Fri, 2013-06-28 at 22:16 +0200, Julian Seward wrote:
> Hi Carl,
>
> > would work or not work in Valgrind for the power instructions. Anyway,
> > any input on the specifics of how I push the simulated execution
> > directly to the failure path would be helpful. Thanks.
>
> I don't know the specifics of the Power instructions, so I can't answer
> that directly. But I can tell you what the idea was for the Intel
> instructions -- maybe that would help.
>
> (IIRC) the Intel instructions are
>
>   XBEGIN %reg -- begin a transaction.
>               -- %reg holds the failure-path address
>
>   XEND        -- finish the most recently XBEGIN'd transaction
>
> So the idea is very simple: translate XBEGIN %reg as if it was
> simply a jump to (the code address in) %reg. Does that help?

So basically you will unconditionally take the failure path, which
presumably is the code to handle the transaction in a non-transactional
way, i.e. to obtain the necessary lock, do the operations and then
release the lock.

From what I understand of the Power implementation (I have some
questions for the IBM compiler people), the instruction following
the tbegin will be a branch instruction with the address of the failure
path. So I guess we could do something similar. However, I have some
concerns. One concern is, what guarantee do we have that the outcome of
the failure path would be functionally equivalent to the transactional
path? Specifically, did the programmer do the same operations under the
appropriate data lock to guarantee the code sequence is executed
atomically and thus generates the same result? Seems like we are at
the mercy of whatever the programmer chooses to do in the case of a
failure. I need to chat with our compiler people about that question.
I guess in theory the programmer could choose to throw up his hands and
just call exit(1) if the transaction failed. In that case, Valgrind would
never be able to reproduce a successful run of the program on real
hardware where there were no TM issues during the run. Maybe I am still
missing a lot here. Like I said, I am just starting to dive into this
and understand all the issues and ramifications.

>
> J
>
|
From: Peter B. <be...@vn...> - 2013-07-01 20:06:41
|
FYI, I'm the IBM guy adding the GCC HTM support for Power.
Carl has asked me questions wrt valgrind support for HTM, so
I thought I would subscribe to this mailing list to participate
in the conversation.
First off, I'd like to respond to Julian's initial post, but since
that occurred before I joined the mailing list, I'll have to reply
to his post here.
Julian, I agree your (1) proposal is a good and easy starting point,
since valgrind will have to be able to handle the error code path
anyway. However, I think valgrind *will* have to implement something
along the lines of (2) sometime. Implementing (1) now will give us
time to come up with a design for (2) while allowing people to start
executing HTM programs now.
On Mon, 2013-07-01 at 09:28 -0700, Carl E. Love wrote:
> On Fri, 2013-06-28 at 22:16 +0200, Julian Seward wrote:
> > Hi Carl,
> >
> > > would work or not work in Valgrind for the power instructions. Anyway,
> > > any input on the specifics of how I push the simulated execution
> > > directly to the failure path would be helpful. Thanks.
> >
> > I don't know the specifics of the Power instructions, so I can't answer
> > that directly. But I can tell you what the idea was for the Intel
> > instructions -- maybe that would help.
> >
> > (IIRC) the Intel instructions are
> >
> > XBEGIN %reg -- begin a transaction.
> > -- %reg holds the failure-path address
> >
> > XEND -- finish the most recently XBEGIN'd transaction
> >
> > So the idea is very simple: translate XBEGIN %reg as if it was
> > simply a jump to (the code address in) %reg. Does that help?
On Power, a tbegin. instruction sets cr0 to show success or failure
at entering transactional state and updates two new SPR registers:
  Transaction EXception And Summary Register (TEXASR):
      This register is normally used by failure handlers for
      determining why a transaction failed, but it also holds
      information about the depth of nested transactions we
      currently have.
  Transaction Failure Handler Address Register (TFHAR):
      This register holds the address the hardware will start
      executing from upon a transaction failure/abort. It is
      initialized by the tbegin. instruction to CIA+4 (in IBM
      parlance), which means it contains the address of the
      instruction immediately following the tbegin. instruction.
      It can be modified by a "mtspr TFHAR,<reg>", but that
      should be a fairly rare occurrence. Similar to x86's
      common usage, where the xbegin's %reg is set to the
      address following the xbegin.
There is one more HTM SPR register:
  Transaction Failure Instruction Address Register (TFIAR):
      This register holds the address of the instruction
      that caused the transaction failure (when possible).
On Power, to implement (1), we just need to have the tbegin. instruction
return failure by setting cr0 to 0b0010 (ie, 0x2), set the TFHAR to
CIA+4 and then begin executing at the address in the TFHAR.
We'll need to choose a reason why the transaction failed, so we can
initialize the TEXASR and TFIAR for use in the failure handler code.
I highly suggest setting the PERSISTENT flag in the TEXASR, since that
is a hint to the failure handler that this failure is not likely to
go away and retrying the transaction will likely fail. Normally
failure handlers will not retry a hardware transaction if the failure
is marked persistent. A possible failure we could use is that we
hit a resource limit on the number of allowable nested transactions
(heh, one is one too many :).
For the other Power htm instructions, they basically just act as nops
when we're in non-transactional state, so implementing them for (1)
should be straightforward, since we're always in non-transactional
state.
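
The recipe above for (1) can be sketched in C on a toy guest state. `ToyGuestPPC` and the function names are hypothetical and do not correspond to Valgrind's real guest state. The two TEXASR bit positions are placeholders chosen for the sketch, not values taken from the ISA document, and setting TFIAR to the address of the tbegin. itself is an assumption about what a handler would find least surprising.

```c
#include <stdint.h>

/* Toy guest state: NOT Valgrind's real PPC guest-state layout. */
typedef struct {
    uint64_t cia;      /* current instruction address */
    uint8_t  cr0;      /* 4-bit condition register field 0 */
    uint64_t texasr;
    uint64_t tfhar;
    uint64_t tfiar;
} ToyGuestPPC;

/* Placeholder bit positions, NOT the real TEXASR layout. */
#define TOY_TEXASR_PERSISTENT    (1ull << 0)
#define TOY_TEXASR_NEST_OVERFLOW (1ull << 1)

/* Emulate tbegin. so that the transaction fails immediately. */
static void toy_tbegin_always_fail(ToyGuestPPC *g)
{
    g->cr0    = 0x2;             /* 0b0010: failure, as described above */
    g->tfhar  = g->cia + 4;      /* handler address = CIA+4 */
    g->texasr = TOY_TEXASR_PERSISTENT | TOY_TEXASR_NEST_OVERFLOW;
    g->tfiar  = g->cia;          /* assumption: blame the tbegin. itself */
    g->cia    = g->tfhar;        /* continue at the address in TFHAR */
}

/* Under (1) the guest is never transactional, so the other HTM
   instructions are modelled as no-ops that just advance the CIA. */
static void toy_htm_nop(ToyGuestPPC *g)
{
    g->cia += 4;
}
```

With the persistent flag set, a handler that follows the usual convention should go straight to its non-transactional fallback instead of retrying.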
> From what I understand of the power implementation (I have some
> questions for the IBM compiler people) is that the instruction following
> the tbegin will be a branch instruction with the address of the failure
> path. So I guess we could do something similar.
Normally, that is the case, but it doesn't have to immediately follow
the tbegin. ... and it may not even exist, depending on the code we
compiled, so you cannot rely on it being there. That also doesn't
cover the case where the user updates the TFHAR with an address,
so on failure, we don't even branch to the instruction following
the tbegin., but rather to the new address the user stuffed into
the TFHAR.
> However I have some
> concerns. One concern is, what guarantee do we have that the outcome of
> the failure path would be functionally equivalent to the transactional
> path?
There is no guarantee they are functionally equivalent. That can be
due to stupid or untested buggy code or for valid reasons, like a
real hardware transaction shouldn't fail for some specific transactions
and so the programmer omitted the failure handler. Either way, for
(1), I don't think we should sweat it too much, since the former is
totally a user error and they get what they get, while for the latter,
there's nothing we can do until we have (2) working.
> Seems like we are at
> the mercy of whatever the programmer chooses to do in the case of a
> failure. I need to chat with our compiler people about that question.
> I guess in theory the programmer could chose throw up his hands and just
> call exit(1) if the transaction failed. In that case, Valgrind would
> never be able to reproduce a successful run of the program on real
> hardware where there was no TM issues during the run.
Correct, we are at the mercy of whatever the programmer decides to
do when a transaction fails, but if they decide to throw their hands
up in the event of errors, well... that's stupid, but that's their
business. Note that there's nothing special here wrt valgrind that
doesn't also apply to real htm hardware failing for some reason.
The only caveat is the special transactions that normally would never
fail on real hardware, but that will be covered if/when Julian's (2)
proposal is implemented.
As for Julian's (2) proposal, I haven't had time enough to think about
possible solutions, and whether we could (or even should?) possibly rely
on the underlying HTM hardware to help us. I am eager to hear other
people's ideas though, if they have them.
Peter
|
|
From: Josef W. <Jos...@gm...> - 2013-07-01 17:25:39
|
On 29.06.2013 01:06, Eliot Moss wrote:
> Two quick comments:
>
> 1. *Some* PPC txns are expected to complete if tried repeatedly, and thus will not have a failure path.

Hmm. If such transactions are ensured to fall into one SB translation,
we could add a flag "execute this translation only in
thread-serializing mode", and we can remove the transaction
instructions, as they will always succeed.
If larger transactions are expected to always succeed after some number
of retries, I do not see an easy solution, as any kind of locking (and
the serialization above will use locks) may result in deadlocks.

> 2. At least on x86 and maybe on PPC some status bits indicating failure and its cause need to be set appropriately when going to the failure path.

I would assume that there always is a flag for something like "resource
overflow"? In the hope that the failure path does not retry the same
transaction again.

Josef

>
> Regards - Eliot Moss
> Sent via BlackBerry
>
> -----Original Message-----
> From: "Carl E. Love" <ce...@li...>
> Date: Fri, 28 Jun 2013 15:37:52
> To: Julian Seward<js...@ac...>
> Cc: <val...@li...>
> Subject: Re: [Valgrind-developers] Transactional memory implementation input
>
> On Fri, 2013-06-28 at 22:16 +0200, Julian Seward wrote:
>> Hi Carl,
>>
>>> would work or not work in Valgrind for the power instructions. Anyway,
>>> any input on the specifics of how I push the simulated execution
>>> directly to the failure path would be helpful. Thanks.
>>
>> I don't know the specifics of the Power instructions, so I can't answer
>> that directly. But I can tell you what the idea was for the Intel
>> instructions -- maybe that would help.
>>
>> (IIRC) the Intel instructions are
>>
>>   XBEGIN %reg -- begin a transaction.
>>               -- %reg holds the failure-path address
>>
>>   XEND        -- finish the most recently XBEGIN'd transaction
>>
>> So the idea is very simple: translate XBEGIN %reg as if it was
>> simply a jump to (the code address in) %reg. Does that help?
|
From: Peter B. <be...@vn...> - 2013-07-01 20:58:21
|
> On 29.06.2013 01:06, Eliot Moss wrote:
> > 2. At least on x86 and maybe on PPC some status bits indicating failure
> > and its cause need to be set appropriately when going to the failure path.

That is true for PPC too.

On Mon, 2013-07-01 at 19:25 +0200, Josef Weidendorfer wrote:
> I would assume that there always is a flag for something like "resource
> overflow"? In the hope that the failure path does not retry the same
> transaction again.

On PPC, we have a bit in the TEXASR register that says whether the
transaction failure is persistent. If it is persistent, the failure
code should not retry the hardware transaction.

Peter
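
The retry convention described here could look roughly like this in a failure handler. This is a sketch only: `toy_should_retry` is a hypothetical helper, the bit position of the persistent flag is a placeholder rather than the real TEXASR layout, and the bounded retry count for transient failures is an assumption about typical handler design, not something stated in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit position, NOT the real TEXASR layout. */
#define TOY_TEXASR_PERSISTENT (1ull << 0)

/* Decide whether a failure handler should retry the hardware
   transaction or fall back to its non-transactional path. */
static bool toy_should_retry(uint64_t texasr, unsigned attempts,
                             unsigned max_attempts)
{
    if (texasr & TOY_TEXASR_PERSISTENT)
        return false;                /* persistent: do not retry */
    return attempts < max_attempts;  /* transient: bounded retries */
}
```

If the emulated failure always carries the persistent flag, a handler written this way never loops on a transaction that cannot succeed under Valgrind.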
|
From: Maran P. <ma...@li...> - 2013-07-02 11:34:14
|
In the case of s390x, both of the following facilities are provided by the
architecture.
But the completion guarantee provided by s390x in the form of
constrained transactions seems not to be provided by other architectures.
Since constrained transactions do not come with a fall-back path, Julian's
proposal 1 - jump to the fall-back path on a transaction begin - will
most probably not help to support constrained transactions on s390x.

On 07/02/2013 01:57 AM, Peter Bergner wrote:
>> On 29.06.2013 01:06, Eliot Moss wrote:
>>> 2. At least on x86 and maybe on PPC some status bits indicating failure
>>> and its cause need to be set appropriately when going to the failure path.
> That is true for PPC too.
>
> On Mon, 2013-07-01 at 19:25 +0200, Josef Weidendorfer wrote:
>> I would assume that there always is a flag for something like "resource
>> overflow"? In the hope that the failure path does not retry the same
>> transaction again.
> On PPC, we have a bit in the TEXASR register that says whether the
> transaction failure is persistent. If it is persistent, the failure
> code should not retry the hardware transaction.
>
> Peter
|
From: Josef W. <Jos...@gm...> - 2013-07-02 12:28:29
|
On 02.07.2013 13:33, Maran Pakkirisamy wrote:
> In case of s390x, both of the following facilities are provided by the
> architecture.
> But the completion guarantee provided by s390x in the form of
> constrained transactions seem to be not provided by other architectures.
> Since constrained transactions do not come with fall-back path, Julian's
> proposal 1 - jump to the fall back path on a transaction begin - will
> most probably not help to support constrained transactions on s390x.

I was just curious about that and looked it up in the referenced
documentation. A "constrained" transaction can have a maximum of 32
instructions, the code must be in a 256-byte contiguous area, and only
conditional relative forward branches are allowed.

Because of the allowed forward branches, a transaction cannot really
be restricted to one VEX translation (or one needs to add support for
diamond shapes)...

Josef
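
The three limits quoted above can be written down as a small check in C. This is only a sketch: `ToyInsn` and `toy_fits_constrained` are hypothetical, the instruction records are assumed to be in ascending address order, and the architecture imposes further restrictions on constrained transactions that are not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy description of one instruction in a candidate transaction body. */
typedef struct {
    uint64_t addr;       /* address of the instruction */
    uint8_t  len;        /* length in bytes */
    bool     is_branch;  /* true for a relative branch */
    uint64_t target;     /* branch target, meaningful if is_branch */
} ToyInsn;

/* Check: at most 32 instructions, all within 256 contiguous bytes,
   and every branch goes strictly forward. */
static bool toy_fits_constrained(const ToyInsn *insns, size_t n)
{
    if (n == 0 || n > 32)
        return false;

    uint64_t start = insns[0].addr;
    uint64_t end   = insns[n - 1].addr + insns[n - 1].len;
    if (end - start > 256)
        return false;

    for (size_t i = 0; i < n; i++) {
        if (insns[i].is_branch && insns[i].target <= insns[i].addr)
            return false;    /* backward branch or branch to self */
    }
    return true;
}
```

The forward branches accepted by this check are what make a single straight-line translation insufficient, as noted above.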