|
From: Carl E. L. <ce...@li...> - 2012-01-11 18:54:00
|
Julian, Florian and Christian:

I would like to thank you for your responses and time to review the
proposal.  It has been valuable in helping us meet the two primary goals of
the proposal.  The first goal is to identify the minimal set of Iops that
are needed by POWER, s390 and, hopefully, additional future architectures.
The second goal is to identify areas where architecture specific additions
may be needed in the future.

I would like to focus at this point on the common issues between the POWER
and s390 architectures and just note where we have s390 specific additions
that may be needed in the future.  I would also like to defer the s390
specific Iop additions to later when the s390 team is ready to add the DFP
support to Valgrind.

The following is a combined response to the various responses I received
from the three of you.

On Tue, 2011-12-27 at 23:17 +0100, Julian Seward wrote:
> Thanks for making a plausible looking proposal.  Looks like it's
> heading in the right direction.  There are some details to iron
> out, though.  Please look at all of them.
>
> Comments in order of reading the doc:
>
> General: how much has this been checked out by the s390 folks
> (Florian, Christian, Divya) ?

They have reviewed it at a high level.  They have not gotten into the
details as they are not currently working on the Valgrind DFP support.
They seem to be happy with the minimal set of Iops proposed.  However,
there are a couple of s390 instructions that may need additional Iops to be
added.  I put TBD in the Iop name for these s390 instructions.  There was
some discussion of this in the replies by Christian Borntraeger and Florian
Krohm.  There are some specific s390 issues that I would like to defer to
later as mentioned above.  My comments below should make it clear what I
think we can defer until later.

>
> General: please give a reference, including URL, to a publicly available
> standard that defines the basic arithmetic (IEEE 754-2008 ?)

I will add the public links to the two ISA documents.  I have them.

> >
> > The IBM Power and s390 systems support 32-bit, 64-bit and 128-bit
> > decimal floating point (DFP) numbers.  The DFP instructions use the
> > existing floating point registers.  The DFP 128-bit operands are stored in
> > registers i and i+1 where i must be even.  For example the DFP 128-bit add
> > is done as follows:
> >
> >    lfd    12,48(31)   // load the upper 64 bits of the first 128-bit
> >                       // value in reg 12
> >    lfd    13,56(31)   // load the lower 64 bits of the first 128-bit
> >                       // value in reg 13
> >    addi   0,31,64
> >    mr     9,0
> >    lfd    0,0(9)      // load the upper 64 bits of the second 128-bit
> >                       // value in reg 0
> >    lfd    1,8(9)      // load the lower 64 bits of the second 128-bit
> >                       // value in reg 1
> >    daddq  0,12,0      // do the add of the 128-bit DFP numbers, result
> >                       // is stored in registers 0 and 1.  Note, registers
> >                       // 1 and 13 are not explicitly listed but are implied
>
> It feels like there's possibly some confusion between types in the IR
> (that is, IRType) and how values are represented in ppc registers.  These
> concepts are distinct, and are related only in the sense that it is necessary
> to choose types that don't cause the ppc->IR and IR->ppc translations to
> be inefficient.
>
> AFAICS (also, from reading the rest of the doc) you want three new types,
> Ity_D32, Ity_D64 and Ity_D128.  (yes?  that sounds right to me)

The addition of new types is not required.  In the proposal the types D32, D64
and D128 were used to make it clear that the Iops operate on DFP values stored
in floating point registers.  From an implementation standpoint, the Iops
can be implemented using the existing F32, F64 and F128 types.  I did a proof
of concept implementation for a couple of Iops by explicitly adding the D64
and D128 types.  I have implemented most of the 49 POWER instructions just using
the existing floating point types.  Adding the DFP types requires adding more
code to handle unop, binop, triops of DFP values.  If the Iops are implemented
using the F32, F64 and F128 types, then the existing code to handle the various
unop, binop and triop cases can be leveraged minimizing the amount of additional
code.  For example, we don't need new binop code to handle two DFP operands, we
just leverage the existing binop code that handles two floating point operands.
Not adding DFP types will minimize the amount of additional code that is
functionally similar to existing code.  Adding the DFP type may provide
additional capability for type checking.  I am not sure how much value there
is in this additional type checking capability.  Input on this point from the
community would be good.

If there is value in adding the DFP type, I am willing to change my
preliminary POWER7 implementation to add the DFP types rather than using
the floating point types as I currently have.  Having the DFP type may
make the overall code clearer than having Iops with the D32, D64 and D128
in the name but actually operate on F32, F64 and F128 values.

Thoughts on adding DFP type?

>
> Note that many of the back ends already convert F32-typed expression
> trees into 64-bit floating point code (eg, the host_ppc_isel.c) so
> doing so for D32 would be considered "normal".
>
> >
> > Notation    Description
> > ---------------------------------------------------------------------------
> > DFP:        Decimal Floating Point number possibly 32-bit, 64-bit, or
> >             128-bit format
> >
> > D32:        32-bit decimal floating point format.  These values use F64
> >             registers.  The D32 term is used to distinguish the value
> >             from the standard 32-bit floating point value.
> >
> > D64:        64-bit decimal floating point format.  These values use F64
> >             registers.  The D64 term is used to distinguish the value
> >             from the standard 64-bit floating point value.
> >
> > D128:       128-bit decimal floating point format.  These values use a
> >             pair of two 64 bit floating point registers (F64).  The
> >             instruction only references the first register of the
> >             register pair.  The second register is implied.
>
> As per comments above, the comments re the PPC register layouts isn't
> directly relevant to what you need in the IR.  (I don't care; I just want
> to be sure we have our concepts straight here)

Correct, the actual layout of the DFP value in a register is not relevant;
POWER and s390 follow the IEEE 754-2008 specification.  I wanted to make it
clear from the Iop name that the register value being operated on by the
Iop was assumed to be in the DFP format.  Hence the use of DFP in the Iop
name.  As mentioned above, the actual types expected by the Iop could be
either the traditional floating point type or a DFP type if it is decided
that we should really add the DFP types to Valgrind.

> >
> > IRRoundingMode(I32):  Indicates the I32 argument is used
> > to hold the bits that specify the rounding mode to be used by the
> > instruction.

Comment from Florian Krohm:
> IEEE Std 754-2008 says (4.3.3) that
>
>    A decimal format implementation of this standard shall provide
>    roundTiesToAway as a user-selectable rounding-direction attribute.
>
> with:
>
>    roundTiesToAway, the floating-point number nearest to the infinitely
>    precise result shall be delivered; if the two nearest floating-point
>    numbers bracketing an unrepresentable infinitely precise result are
>    equally near, the one with larger magnitude shall be delivered.
>
> We probably should extend IRRoundingMode accordingly.
>
> s390 DFP actually has 9 rounding modes
>
>    According to FPC setting
>    Round toward 0
>    Round away from 0
>    Round toward +inf
>    Round toward -inf
>    Round to nearest with ties away from 0
>    Round to nearest with ties to even
>    Round to nearest with ties toward 0
>    Round to prepare for shorter precision
>
> We can handle the unsupported ones as we do for binary floating
> point and map them to Irrm_NEAREST until problems arise.

The POWER7 and s390 machines support the additional four rounding modes for
DFP (total of 8 not 9 modes) in the IEEE specification.  After some
thought, I think it would be better to explicitly change the rounding mode
specifier from IRRoundingMode(I32) to IRRoundingModeDFP(I32).  The concern
is if we extend the existing rounding mode specifier then there will be
rounding modes for which the existing binary floating point instructions
are not specified to support.  Yes, it is some replication of code to
create a second super-set of rounding modes for DFP but from an overall
consistency, clarity and accuracy standpoint I think it might be best.
From the specification, we will then need to support at least the
roundTiesToAway mode.  The additional modes could be added as needed in the
future.

What are your thoughts on creating a separate DFP rounding mode specifier?

The decision of which additional rounding modes will be supported by s390
can be deferred to when the architecture specific support is added to
Valgrind.

>
> Fine; as per existing code.
> >
> > IRRoundingExceptionModes(I32):  Indicates the I32 argument is used
> > to hold the bits that specify the rounding mode and the bits that specify
> > the exception mode to be used by the instruction.  The s390 instructions
> > specify both modes whereas POWER does not specify either mode.
>
> Hmm.  Have you read (in detail) the limitations re floating point
> described at
> http://valgrind.org/docs/manual/manual-core.html#manual-core.limits
>
> The point is that IR and Valgrind generally doesn't provide support
> for non-default exception modes, and silently assumes that exceptions
> are to be fixed up using the default IEEE fixup actions.  So there's
> no point at the moment in adding exception action information into
> the IR.  None of the other front ends (xxx_to_IR.c) do it.

OK, I had not specifically read the document above but was aware that
Valgrind only had limited rounding and exception support.  I included
the full rounding and exception support for completeness.  I thought it
better to include it at this point than to gloss over it.  I added a
comment to the document ahead of the field definitions to that effect.
So, if there is no chance that the exception support will be added then
we can just drop the exception specification part.

Thoughts?

> >
> > IRRoundingModeAndEponent(I32):  Indicates the I32 argument will
> > contain bits to specify the rounding mode.  POWER also has bits to specify
> > the desired exponent.  The s390 instructions only specify the rounding
> > mode.
>
> Euh, can you elaborate on the encoding/meaning of "desired exponent" ?
> Sounds a bit like wiring a POWER-ism into the IR spec, which isn't good.

The POWER instruction includes an immediate value specifying the exponent
value.  For example if you are doing U.S. dollars you can set the exponent
to 2 so the number will be in the standard form of dollars and cents,
i.e. $234.99.  This is useful if you are calculating interest or taxes in
U.S. dollars.  The process of adjusting the exponent to a specific value,
either immediate or as the quantum of a value in a specified register, is
referred to as quantizing the number.

The immediate exponent value in the instruction is specific to the POWER
processor.  I have spent some time reconsidering this.  We really don't
need it so I will remove the IRRoundingModeAndEponent(I32).  This is
related to the discussion below about the quantize instructions.  I had
specified a quantize immediate Iop.  This is a bit redundant.

> >
> > ARITHMETIC INSTRUCTIONS
> > -----------------------
> > IRRoundingMode(I32) X D64 X D64 -> D64
> >    Iop_AddD64, Iop_SubD64, Iop_MulD64, Iop_DivD64
> >
> > IRRoundingMode(I32) X D128 X D128 -> D128
> >    Iop_AddD128, Iop_SubD128, Iop_MulD128, Iop_DivD128
>
> fine
> >
> > FORMAT CONVERSION INSTRUCTIONS
> > ------------------------------
> > InvOperationModes(I32) x D32 -> D64
> >    Iop_D32toD64
>
> what's InvOperationModes?  It's not specified anywhere in your doc.

InvOperationModes(I32):  Specifies the handling of the SNaN and infinity.
The field specifies the IEEE invalid operation exception control.  These
bits are only specified on the IBM s390 architecture.

Comment from Florian:
> You did not describe InvOperationModes.  But looking at insn LDETR
> (which is D32 -> D64 conversion) I gather that InvOperationModes
> controls whether or not the IEEE-invalid-operation-exception is
> delivered.  It's essentially a Boolean value.  I propose to ignore it
> and deliver the exception unconditionally (which is what we do for
> binary floating point).

Julian do you agree we should remove the InvOperationModes(I32) from
the Iop specification?  I am OK with it.

> > InvOperationModes(I32) x D64 -> D128
> >    Iop_D64toD128
>
> ditto
>
> > IRRoundingExceptionModes(I32) x D64 -> D32
> >    Iop_RoundD64toD32
> >
> > IRRoundingExceptionModes(I32) x D128 -> D64
> >    Iop_RoundD128toD64
> >
> > IRRoundingExceptionModes(I32) x I64 -> D64
> >    Iop_I64StoD64
>
> ok

Comment from Florian:
> These two should be renamed to Iop_D64toD32 and Iop_D128toD64 for
> symmetry in naming with binary floating point ops.

I disagree.  The drsp instruction is analogous to the binary FP frsp
instruction, which (for ppc64) uses the Iop Iop_RoundF64ToF32.  The
distinction is that an Iop that does not have "Round" in the name is
a conversion from one general type (i.e., floating point) to a different
general type (i.e., integer); whereas an Iop that *does* have "Round"
in the name is a narrowing of the same general type.

Comment from Florian:
> For s390 we also need:
> IROp            description                                 s390 insn
>
> Iop_I64StoD128  IRRoundingMode(I32) x signed I64 -> D128    CXGTR
> Iop_I32StoD64   signed I32 -> D64                           CDFTR
> Iop_I32StoD128  signed I32 -> D128                          CXFTR
> Iop_I64UtoD64   IRRoundingMode(I32) x unsigned I64 -> D64   CDLGTR
> Iop_I64UtoD128  IRRoundingMode(I32) x unsigned I64 -> D128  CXLGTR
> Iop_I32UtoD64   unsigned I32 -> D64                         CDLFTR
> Iop_I32UtoD128  unsigned I32 -> D128                        CXLFTR
>
> We need both: conversion to signed and unsigned int
>
> IROp            description                                 s390 insn
>
> Iop_D64toI64S   IRRoundingMode(I32) x D64 -> signed I64     CGDTR(A)
> Iop_D128toI64S  IRRoundingMode(I32) x D128 -> signed I64    CGXTR(A)
> Iop_D64toI32S   IRRoundingMode(I32) x D64 -> signed I32     CFDTR
> Iop_D128toI32S  IRRoundingMode(I32) x D128 -> signed I32    CFXTR
> Iop_D64toI64U   IRRoundingMode(I32) x D64 -> unsigned I64   CLGDTR
> Iop_D128toI64U  IRRoundingMode(I32) x D128 -> unsigned I64  CLGXTR
> Iop_D64toI32U   IRRoundingMode(I32) x D64 -> unsigned I32   CLFDTR
> Iop_D128toI32U  IRRoundingMode(I32) x D128 -> unsigned I32  CLFXTR
>
> Note, the new IRops for conversion to 32-bit wide results and from D128.

Comment from Christian Borntraeger:
> > The PFPO insn is used to convert between binary floating point and
> > decimal floating point.  Since we have 3 formats each, that makes 9
> > conversion ops for each direction:
> >
> > Iop_D32toF32    IRRoundingMode(I32) x D32 -> F32
> > Iop_D32toF64    IRRoundingMode(I32) x D32 -> F64
> > Iop_D32toF128   IRRoundingMode(I32) x D32 -> F128
> > Iop_D64toF32    IRRoundingMode(I32) x D64 -> F32
> > Iop_D64toF64    IRRoundingMode(I32) x D64 -> F64
> > Iop_D64toF128   IRRoundingMode(I32) x D64 -> F128
> > Iop_D128toF32   IRRoundingMode(I32) x D128 -> F32
> > Iop_D128toF64   IRRoundingMode(I32) x D128 -> F64
> > Iop_D128toF128  IRRoundingMode(I32) x D128 -> F128
> >
> > Iop_F32toD32    IRRoundingMode(I32) x F32 -> D32
> > Iop_F32toD64    IRRoundingMode(I32) x F32 -> D64
> > Iop_F32toD128   IRRoundingMode(I32) x F32 -> D128
> > Iop_F64toD32    IRRoundingMode(I32) x F64 -> D32
> > Iop_F64toD64    IRRoundingMode(I32) x F64 -> D64
> > Iop_F64toD128   IRRoundingMode(I32) x F64 -> D128
> > Iop_F128toD32   IRRoundingMode(I32) x F128 -> D32
> > Iop_F128toD64   IRRoundingMode(I32) x F128 -> D64
> > Iop_F128toD128  IRRoundingMode(I32) x F128 -> D128
>
> If you look at pfpo, then the instruction has the same tricky
> behaviour as EXecute.  Since a self checking prefix and 18 Iops is
> pretty expensive I think that pfpo qualifies for a helper.

These conversion modes are specifically supported by s390 but not POWER.
POWER supports a subset of the conversions supported by s390.  This is one
of the areas where I feel we should defer the decision on adding more Iops,
using a helper function or emulating these conversions with the subset of
conversions that have been proposed until the s390 team is ready to add the
s390 support to Valgrind.  This is beyond the scope of what is needed for
me to add the POWER support.  But I think it is important that the issue
has been raised to understand the full scope of what is needed by both
architectures.

> > IRRoundingExceptionModes(I32) x D64 -> I64
> >    Iop_D64toI64
>
> this is underspecified .. you need to decide whether that's a
> conversion to signed or unsigned I64 (or maybe you need both)
> and call them Iop_D64toI64S or Iop_D64toI64U respectively.
> (I think you comment about this further down in the doc.)
> I mention this partly because sorting out such ambiguity in the
> past for the Fxx->Ixx conversions required a lot of hoop
> jumping, so we might as well get it straightened out up front.

OK, it should be a conversion to a signed integer.  Changed the name to
Iop_D64toI64S.

> >
> > ROUNDING INSTRUCTIONS
> > -----------------------
> > IRRoundingMode(I32) x D64 -> D64
> >    Iop_RoundD64
> >
> > IRRoundingMode(I32) x D128 -> D128
> >    Iop_RoundD128
>
> ok
>

Comment from Florian:
> These should be named Iop_RoundD64toInt and Iop_RoundD128toInt for
> symmetry in naming with binary floating point ops.

I am OK with the name change.  Comments?

> >
> > COMPARE INSTRUCTIONS
> > -----------------------
> > D64 x D64 -> IRCmpF64Result(I32)
> >    Iop_CmpD64
> >
> > D128 x D128 -> IRCmpF64Result(I32)
> >    Iop_CmpD128
>
> ok

Comment from Florian:
> OK.  I would use IRCmpD64Result and IRCmpD128Result.  That allows
> us to use a different encoding, which may be desirable.

I am OK with the name change.  Comments?  Thoughts?

> >
> > D64 x D64 -> 1 if the condition is TRUE, 0 otherwise
> >    Iop_CmpEQD64, Iop_CmpLTD64, Iop_CmpGTD64
> >
> > D128 x D128 -> 1 if the condition is TRUE; 0 otherwise;
> >    Iop_CmpEQD128, Iop_CmpLTD128, Iop_CmpGTD128
>
> why are these 6 necessary?  Isn't their functionality a subset of
> Iop_CmpD64 and Iop_CmpD128 ?

Sorry, they were in there but I removed them as they were not needed.  I
took them out of the detailed description below but missed removing all of
them from the summary list.

> >
> > QUANTIZE AND ROUND INSTRUCTIONS
> > -------------------------------
> > IRRoundingMode(I32) x D64 -> D64
> >    Iop_QuantizeID64,
> >
> > IRRoundingMode(I32) x D128 -> D128
> >    Iop_QuantizeID128
> >
> > IRRoundingMode(I32) x D64 x D64 -> D64
> >    Iop_QuantizeD64
> >
> > IRRoundingMode(I32) x D128 x D128 -> D128
> >    Iop_QuantizeD128
>
> I'm not clear what the ID vs D signifies in these names.  Can
> they instead be called Iop_Quantize{Un,Bin}{D64,D128} to denote
> unary vs binary ness (ignoring the rounding mode arg which is
> present in all 4 cases).
>

The first two with the I are for an immediate value of the exponent for the
quantization operation.  Remember above, I had the
IRRoundingModeAndEponent(I32) specification.  It was also there for
specifying the immediate value.  I didn't realize that I had a redundant
way of specifying the immediate exponent value.  So, as said above, we drop
the IRRoundingModeAndEponent(I32) specification.

> What is quantization, anyway (in the context of DFP I mean)?
> Does it have any analogue in traditional IEEE754 FP ?

Quantization is the process of rounding a number to a specified exponent as
mentioned earlier.  For example, the U.S. specifies a dollar amount with at
most two fractional digits, for example $12.98.  When doing a computation
involving money, the quantization instruction would be used to round the
result to two fractional digits.

The immediate value to specify the desired exponent is specific to POWER.
Both s390 and POWER have instructions for changing the exponent of
(quantizing) a value in register A to match the exponent of a value in
register B.  I think we should remove the Iops Iop_QuantizeID64 and
Iop_QuantizeID128 that specify an immediate exponent.  For the POWER
instruction that specifies the immediate exponent, I can just generate a
DFP number with the desired exponent and pass that as the value in register
B as the target exponent.  This will help minimize the number of new Iops.

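The quantize behaviour described above can be tried out with Python's standard `decimal` module, whose `Decimal.quantize` takes the target exponent from a second operand, much like the register form of the instruction. This is only an illustrative sketch: the helper names `quantize_to` and `quantize_immediate` are hypothetical and are not Valgrind Iops, and the second helper shows the idea of replacing the immediate form by building a dummy operand that carries the desired exponent.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def quantize_to(value: Decimal, exponent_source: Decimal,
                rounding: str = ROUND_HALF_EVEN) -> Decimal:
    """Register form: give `value` the exponent of `exponent_source`."""
    # Only the exponent of the second operand matters; its digits are ignored.
    return value.quantize(exponent_source, rounding=rounding)


def quantize_immediate(value: Decimal, exponent: int,
                       rounding: str = ROUND_HALF_EVEN) -> Decimal:
    """Immediate form, expressed through the register form."""
    # Build 1 x 10**exponent purely to carry the desired exponent.
    target = Decimal((0, (1,), exponent))
    return quantize_to(value, target, rounding)


interest = Decimal("234.98765")
print(quantize_to(interest, Decimal("9.99")))   # 234.99
print(quantize_immediate(interest, -2))         # 234.99
```

Both calls produce the same dollars-and-cents result, which is why a separate immediate variant adds little beyond the two-operand form.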
> >
> > IRRoundingMode(I32) x D64 x D64 -> D64
> >    Iop_SignificanceRoundD64
> >
> > IRRoundingMode(I32) x D128 x D128 -> D128
> >    Iop_SignificanceRoundD128
> >
> >
> > EXTRACT AND INSERT INSTRUCTIONS
> > -------------------------------
> > D64 -> I64
> >    Iop_ExtractExpD64
> >
> > D128 -> I64
> >    Iop_ExtractExpD128
>
> The exponent really needs 64 bits?  Can it be 32 bits?  That
> might allow for more efficient code generation for 32 bit targets.

The biased exponent could be stored as a 32 bit biased integer.  The
functionality of the specific instruction that this Iop was intended for
specifies the exponent will be stored in a floating point register in a
signed integer format.  I agree we can make the target be I32 and then just
handle the I32 to I64 conversion as needed for the specific instruction.
The exponent for any DFP number will easily fit in fewer than 31 bits.  I
will change the above two Iops to return an I32 value.

Comment from Florian:
> Do we need to support these at all?  In other words, does GCC issue these
> or do they show up in hand crafted assembler shipped with GCC/GLIBC?
> I don't know but will find out (for s390).

Yes, the libdfp and binutils use them.

> >
> > I64 x I64 -> D64
> >    Iop_InsertExpD64
> >
> > I64 x I128 -> D128
> >    Iop_InsertExpD128
>
> ditto comment

Ditto above; we do need these Iops for instructions that are used.

> >
> > SHIFT SIGNIFICAND INSTRUCTIONS
> > -------------------------------
> > U16 x D64 -> D64
> >    Iop_ShlD64, Iop_ShrD64
> >
> > U16 x D128 -> D128
> >    Iop_ShlD128, Iop_ShrD128
>
> two things: (1) does the shift amount need to be 16 bits?
> For all the other shifting style ops we have, the shift amount
> is encoded in 8 bits (Ity_I8) and I would prefer to stick with
> that for consistency, if possible.  (2) pls put the shift amount
> as the second argument, not the first, as that too is consistent
> with all other shift ops we have (eg, Iop_Shr64)

OK, changed shift to U8 and made it D64 x U8 -> D64; D128 x U8 -> D128

>
> ---------------
>
> > This section gives the detailed mapping of Power and s390
> > instructions to the proposed DFP Iops or how they would be implemented
> > with the existing Iops and the proposed DFP Iops.
>
> I'll comment on this second half of the proposal tomorrow.
>
> J
>

I have not seen any additional comments.  It is probably best to focus on
resolving the above issues first as the second part is just a more detailed
discussion of the above summary of the Iops.

Once we have agreement on the above discussion points, I will update and
post version 2 of the proposal.

             Carl Love
|
From: John R. <jr...@bi...> - 2012-01-12 16:49:41
|
On 01/11/2012 10:52 AM, Carl E. Love wrote:
[snip]
>> AFAICS (also, from reading the rest of the doc) you want three new types,
>> Ity_D32, Ity_D64 and Ity_D128.  (yes?  that sounds right to me)
>
> The addition of new types is not required.  In the proposal the types D32, D64
> and D128 were used to make it clear that the Iops operate on DFP values stored
> in floating point registers.  From an implementation standpoint, the Iops
> can be implemented using the existing F32, F64 and F128 types. I did a proof
> of concept implementation for a couple of Iops by explicitly adding the D64
> and D128 types. I have implemented most of the 49 POWER instructions just using
> the existing floating point types. Adding the DFP types requires adding more
> code to handle unop, binop, triops of DFP values. If the Iops are implemented
> using the F32, F64 and F128 types, then the existing code to handle the various
> unop, binop and triop cases can be leveraged minimizing the amount of additional
> code. For example, we don't need new binop code to handle two DFP operands, we
> just leverage the existing binop code that handles two floating point operands.
> Not adding DFP types will minimize the amount of additional code that is
> functionally similar to existing code. Adding the DFP type may provide
> additional capability for type checking. I am not sure how much value there
> is in this additional type checking capability. Input on this point from the
> community would be good.

For binary floating point, memcheck currently tracks undefined bits very coarsely.
If at least one bit of input is Undefined, then every bit of output is Undefined.
This coarseness is independent of the operation [except perhaps for a Copy operation],
independent of the field (sign, exponent, significand), and independent of the
positional value of the Undefined bit(s) within the field.

I am disappointed by the current coarseness of tracking Undefined bits for
binary floating point, and I am saddened that tracking for decimal floating point
may inherit similar coarseness.  This will produce visible practical limitations
in the Usability of memcheck for decimal floating point.  For instance, consider
US retail transactions.  There is a big difference between an Undefined which
affects "only" the cents ($0.01) versus an Undefined which affects the thousands
of dollars ($1,000.00).  Furthermore, the cents column can be _more_ important.
The decline in effective value of a one-cent coin ("penny": $0.01) has produced
serious proposals for "nickel rounding" (round the total of several items
to a multiple of $0.05), and this has high impact for grocery stores and other
retailers with a large number of small transactions.

So, please introduce a type corresponding to each Decimal Floating Point
data format, the better to encourage more accurate tracking of Undefined
bits in DFP operations.

[By the way, memcheck understands only partially the propagation of
Undefined bits through two's complement binary integer ADD and SUBtract!
Current code believes that an Undefined input bit always "pollutes" any
result bit that has equal or greater significance ("to the left"),
which ignores the fact that Carry does not propagate when there are
enough Defined zero bits at matching positional values.  Most of the
ugliness for memcheck suppressions of false positives in string operations
(strlen, etc.) is a direct result of memcheck not understanding the
details of propagation of Undefined in a data-dependent Carry chain.]
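The difference between the "pollute everything to the left" rule and a carry-aware rule can be seen in a small model. The sketch below is only an illustration under stated assumptions: it is not memcheck's code, the function names are hypothetical, the 8-bit width is arbitrary, and a set bit in a `v` mask means "this bit is Undefined". The tighter rule adds the operands once with every Undefined bit forced to 0 and once with every Undefined bit forced to 1, then flags the result bits that differ.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def left_pollute_add(va: int, vb: int) -> int:
    """Cheap rule: the lowest Undefined input bit pollutes itself and all bits to its left."""
    v = (va | vb) & MASK
    return (v | -v) & MASK


def minmax_add(a: int, va: int, b: int, vb: int) -> int:
    """Tighter rule: compare the sums obtained with Undefined bits all 0 and all 1."""
    a_min, a_max = a & ~va & MASK, (a | va) & MASK
    b_min, b_max = b & ~vb & MASK, (b | vb) & MASK
    differing = ((a_min + b_min) ^ (a_max + b_max)) & MASK
    return (va | vb | differing) & MASK


# a = 0b01000100 with only bit 0 Undefined; b = 0b00000001 fully Defined.
a, va = 0b01000100, 0b00000001
b, vb = 0b00000001, 0b00000000
print(f"{left_pollute_add(va, vb):08b}")    # 11111111
print(f"{minmax_add(a, va, b, vb):08b}")    # 00000011
```

In this example the carry out of bit 0 can reach bit 1 but is stopped there by Defined zero bits, so only the two low result bits are really in doubt.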
--
|
|
From: Christian B. <bor...@de...> - 2012-01-12 20:02:37
|
> For binary floating point, memcheck currently tracks undefined bits very coarsely.
> If at least one bit of input is Undefined, then every bit of output is Undefined.

Being not an expert on floating point, isn't that exactly what we want?  I would
consider any undefined bit in floating point input a bug, no?
|
From: John R. <jr...@bi...> - 2012-01-12 21:03:30
|
On 01/12/2012 12:02 PM, Christian Borntraeger wrote:
>> For binary floating point, memcheck currently tracks undefined bits very coarsely.
>> If at least one bit of input is Undefined, then every bit of output is Undefined.
>
> Being not an expert on floating point, isnt that exactly what we want? I would consider
> any undefined bit in floating point input a bug, no?

No, that is only the least possible "good" property.  I want the best possible
good property: An output bit is marked Undefined if and only if that bit is
Undefined after bit-by-bit examination using the value tables for one-bit
operations involving 0, 1, and U.  The one-bit table for addition is:

   A  B  Ci  Sum  Cout
   -------------------
   0  0  0   0    0
   0  0  U   U    0    *
   0  0  1   1    0
   0  U  0   U    0    *
   0  U  U   U    U
   0  U  1   U    U
   0  1  0   1    0
   0  1  U   U    U
   0  1  1   0    1
   U  0  0   U    0    *
   U  0  U   U    U
   U  0  1   U    U
   U  U  0   U    U
   U  U  U   U    U
   U  U  1   U    U
   U  1  0   U    U
   U  1  U   U    U
   U  1  1   U    1    *
   1  0  0   1    0
   1  0  U   U    U
   1  0  1   0    1
   1  U  0   U    U
   1  U  U   U    U
   1  U  1   U    1    *
   1  1  0   0    1
   1  1  U   U    1    *
   1  1  1   1    1

The rows marked with '*' are cases where CarryOut (Cout) is Defined
even though exactly one input from {A, B, Ci} is Undefined.  Currently
memcheck pessimizes these cases by assuming that Cout also is Undefined.
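The table above follows from two observations: Sum is the XOR of the three inputs, so any Undefined input makes it Undefined, while Cout is a majority vote, so two Defined inputs that agree settle it. A minimal sketch of that rule is below; the names `full_add` and `ripple_add` and the use of the string "U" as the Undefined marker are illustrative choices, not anything taken from memcheck.

```python
U = "U"  # marker for an Undefined bit; 0 and 1 are Defined bits


def full_add(a, b, cin):
    """One-bit add over {0, 1, U}; returns (sum, carry_out)."""
    bits = (a, b, cin)
    # Sum is the XOR of all three inputs, so a single Undefined input poisons it.
    total = U if U in bits else a ^ b ^ cin
    # Carry-out is a majority vote: two Defined inputs that agree decide it.
    if bits.count(0) >= 2:
        carry = 0
    elif bits.count(1) >= 2:
        carry = 1
    else:
        carry = U
    return total, carry


def ripple_add(a_bits, b_bits):
    """Add two equal-length bit lists, least significant bit first."""
    out, carry = [], 0
    for a, b in zip(a_bits, b_bits):
        s, carry = full_add(a, b, carry)
        out.append(s)
    return out, carry


# 0b00U0 + 0b0001: only the bit that was Undefined on input stays Undefined.
print(ripple_add([0, U, 0, 0], [1, 0, 0, 0]))   # ([1, 'U', 0, 0], 0)
```

Enumerating all 27 input combinations with `full_add` reproduces the table, including the six starred rows.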

For example, if the inputs to a Decimal Floating Point ADD are

        12.34   with all bits Defined except the 00.20 bit,
   and  65.65   with all bits Defined
     +  -----   then the result of ADD is
        77.99   with all bits Defined except the 00.60 bits
                (the 00.20 bit, also the 00.40 bit because of Carry)

Thus in particular the sum has no possible Undefined in the
integer part (77.) nor in the "low 5 bits" (the 00.19).

If DFP were to inherit its Undefined behavior from current binary FP,
then the sum would be marked as "all bits undefined", i.e., worthless.
Yet we know that only 2 of its bits can be bad.
--
|
|
From: Julian S. <js...@ac...> - 2012-01-19 18:17:15
|
> If there is value in adding the DFP type, I am willing to change my
> preliminary POWER7 implementation to add the DFP types rather than using
> the floating point types as I currently have.  Having the DFP type may
> make the overall code clearer than having Iops with the D32, D64 and D128
> in the name but actually operate on F32, F64 and F128 values.
>
> Thoughts on adding DFP type?
Yes, please do. It does cause the code to be a little longer, but it's
also clearer and easier to verify as being correct. In the long run the
latter points are much more important.
> The POWER7 and s390 machines support the additional four rounding modes for
> DFP (total of 8 not 9 modes) in the IEEE specification. After some
> thought, I think it would be better to explicitly change the rounding mode
> specifier from IRRoundingMode(I32) to IRRoundingModeDFP(I32). The concern
> is if we extend the existing rounding mode specifier then there will be
> rounding modes for which the existing binary floating point instructions
> are not specified to support. Yes, it is some replication of code to
> create a second super-set of rounding modes for DFP but from an overall
> consistency, clarity and accuracy standpoint I think it might be best.
I agree. +1 for IRRoundingModeDFP. It will have 8 possible values,
yes?
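For a feel of what the eight values do, Python's standard `decimal` module happens to offer eight rounding constants that line up with the modes listed earlier in the thread. The mapping below is an illustration only: the dictionary keys are informal labels rather than proposed IR enumerator names, and pairing "round to prepare for shorter precision" with `ROUND_05UP` is my assumption about the closest equivalent.

```python
from decimal import (Decimal, ROUND_05UP, ROUND_CEILING, ROUND_DOWN,
                     ROUND_FLOOR, ROUND_HALF_DOWN, ROUND_HALF_EVEN,
                     ROUND_HALF_UP, ROUND_UP)

# Informal labels for the eight DFP rounding modes (not Valgrind identifiers).
DFP_ROUNDING_MODES = {
    "nearest, ties to even": ROUND_HALF_EVEN,
    "toward 0": ROUND_DOWN,
    "toward +inf": ROUND_CEILING,
    "toward -inf": ROUND_FLOOR,
    "nearest, ties away from 0": ROUND_HALF_UP,
    "nearest, ties toward 0": ROUND_HALF_DOWN,
    "away from 0": ROUND_UP,
    "prepare for shorter precision": ROUND_05UP,  # assumed correspondence
}


def round_to_cents(text: str, mode_name: str) -> Decimal:
    """Round a decimal string to two fractional digits under the named mode."""
    return Decimal(text).quantize(Decimal("0.01"),
                                  rounding=DFP_ROUNDING_MODES[mode_name])


for name in DFP_ROUNDING_MODES:
    print(f"{name:32} {round_to_cents('2.125', name):>6} "
          f"{round_to_cents('-2.125', name):>6}")
```

The first four entries are the directions binary floating point already uses; the last four are the ones that only make sense for a separate DFP specifier.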
> OK, I had not specifically read the document above but was aware that
> Valgrind only had limited rounding and exception support.  I included
> the full rounding and exception support for completeness.  I thought it
> better to include it at this point than to gloss over it.  I added a
> comment to the document ahead of the field definitions to that effect.
> So, if there is no chance that the exception support will be added then
> we can just drop the exception specification part.
>
> Thoughts?

Drop the exception specification part.  If we ever need to support FP
exceptions then we will need to design a solution which works cleanly
for both regular FP and DFP.

> The immediate exponent value in the instruction is specific to the POWER
> processor. I have spent some time reconsidering this. We really don't
> need it so I will remove the IRRoundingModeAndEponent(I32).
Good.
> Comment from Florian:
> > You did not describe InvOperationModes. But looking at insn LDETR
> > (which is D32 -> D64 conversion) I gather that InvOperationModes
> > controls whether or not the IEEE-invalid-operation-exception is
> > delivered. It's essentially a Boolean value. I propose to ignore it
> > and deliver the exception unconditionally (which is what we do for
> > binary floating point).
>
> Julian do you agree we should remove the InvOperationModes(I32) from
> the Iop specification? I am OK with it.
Yes, please remove.
> > > InvOperationModes(I32) x D64 -> D128
> > > Iop_D64toD128
> >
> > ditto
> >
> > > IRRoundingExceptionModes(I32) x D64 -> D32
> > > Iop_RoundD64toD32
> > >
> > > IRRoundingExceptionModes(I32) x D128 -> D64
> > > Iop_RoundD128toD64
> > >
> > > IRRoundingExceptionModes(I32) x I64 -> D64
> > > Iop_I64StoD64
> >
> > ok
>
> Comment from Florian:
> > These two should be renamed to Iop_D64toD32 and Iop_D128toD64 for
> > symmetry in naming with binary floating point ops.
>
> I disagree. The drsp instruction is analogous to the binary FP frsp
> instruction, which (for ppc64) uses the Iop Iop_RoundF64ToF32. The
> distinction is that an Iop that does not have "Round" in the name is
> a conversion from one general type (i.e., floating point) to a different
> general type (i.e., integer); whereas an Iop that *does* have "Round"
> in the name is a narrowing of the same general type.
Florian is correct, assuming that the types for these operations
(D64 -> D32 etc) are really what you (Carl) intended. Problem is
that it's unclear what behaviour you want. Your options are

  round and convert to a different format
    in which case you want the (eg) Iop_D64toD32 name style
    and the IR types for src and dst will then be different
    (eg D64 and D32 in this case)

  round (to a smaller range) but remain in the same format
    in which case you want the Iop_RoundD64toD32 names
    in this case the src and dst types are the same (D64)
    and D32 merely indicates the range to which the value is
    rounded

so which is it?
> Comment from Florian:
> > For s390 we also need:
> > IROp description s390 insn
> >
> > Iop_I64StoD128 IRRoundingMode(I32) x signed I64 -> D128 CXGTR
> > Iop_I32StoD64 signed I32 -> D64 CDFTR
> > Iop_I32StoD128 signed I32 -> D128 CXFTR
> > Iop_I64UtoD64 IRRoundingMode(I32) x unsigned I64 -> D64 CDLGTR
> > Iop_I64UtoD128 IRRoundingMode(I32) x unsigned I64 -> D128 CXLGTR
> > Iop_I32UtoD64 unsigned I32 -> D64 CDLFTR
> > Iop_I32UtoD128 unsigned I32 -> D128 CXLFTR
> >
> > We need both: conversion to signed and unsigned int
> >
> > IROp description s390 insn
> >
> > Iop_D64toI64S IRRoundingMode(I32) x D64 -> signed I64 CGDTR(A)
> > Iop_D128toI64S IRRoundingMode(I32) x D128 -> signed I64 CGXTR(A)
> > Iop_D64toI32S IRRoundingMode(I32) x D64 -> signed I32 CFDTR
> > Iop_D128toI32S IRRoundingMode(I32) x D128 -> signed I32 CFXTR
> > Iop_D64toI64U IRRoundingMode(I32) x D64 -> unsigned I64 CLGDTR
> > Iop_D128toI64U IRRoundingMode(I32) x D128 -> unsigned I64 CLGXTR
> > Iop_D64toI32U IRRoundingMode(I32) x D64 -> unsigned I32 CLFDTR
> > Iop_D128toI32U IRRoundingMode(I32) x D128 -> unsigned I32 CLFXTR
> >
> > Note, the new IRops for conversion to 32-bit wide results and from D128.
>
> Comment from Christian Borntraeger:
> > > The PFPO insn is used to convert between binary floating point and
> > > decimal floating point. Since we have 3 formats each, that makes 9
> > > conversion ops for each direction:
> > >
> > > Iop_D32toF32 IRRoundingMode(I32) x D32 -> F32
> > > Iop_D32toF64 IRRoundingMode(I32) x D32 -> F64
> > > Iop_D32toF128 IRRoundingMode(I32) x D32 -> F128
> > > Iop_D64toF32 IRRoundingMode(I32) x D64 -> F32
> > > Iop_D64toF64 IRRoundingMode(I32) x D64 -> F64
> > > Iop_D64toF128 IRRoundingMode(I32) x D64 -> F128
> > > Iop_D128toF32 IRRoundingMode(I32) x D128 -> F32
> > > Iop_D128toF64 IRRoundingMode(I32) x D128 -> F64
> > > Iop_D128toF128 IRRoundingMode(I32) x D128 -> F128
> > >
> > > Iop_F32toD32 IRRoundingMode(I32) x F32 -> D32
> > > Iop_F32toD64 IRRoundingMode(I32) x F32 -> D64
> > > Iop_F32toD128 IRRoundingMode(I32) x F32 -> D128
> > > Iop_F64toD32 IRRoundingMode(I32) x F64 -> D32
> > > Iop_F64toD64 IRRoundingMode(I32) x F64 -> D64
> > > Iop_F64toD128 IRRoundingMode(I32) x F64 -> D128
> > > Iop_F128toD32 IRRoundingMode(I32) x F128 -> D32
> > > Iop_F128toD64 IRRoundingMode(I32) x F128 -> D64
> > > Iop_F128toD128 IRRoundingMode(I32) x F128 -> D128
> >
> > If you look at pfpo, then the instruction has the same tricky
> > behaviour as EXecute. Since a self checking prefix and 18 Iops is
> > pretty expensive I think that pfpo qualifies for a helper.
>
> These conversion modes are specifically supported by s390 but not POWER.
> POWER supports a subset of the conversions supported by s390. This is
> one of the areas where I feel we should defer the decision on adding more
> Iops, using a helper function or emulating these conversions with the
> subset of conversions that have been proposed until the s390 team is ready
> to add the s390 support to Valgrind. This is beyond the scope of what is
> needed for me to add the POWER support. But I think it is important that
> the issue has been raised to understand the full scope of what is needed
> by both architectures.
Ok. I agree .. just add the conversions needed for POWER, unless
Christian/Florian want to also add the S390 needed conversions now
(I am unclear if you do, or not)
>
> > > IRRoundingExceptionModes(I32) x D64 -> I64
> > > Iop_D64toI64
> >
> > this is underspecified .. you need to decide whether that's a
> > conversion to signed or unsigned I64 (or maybe you need both)
> > and call them Iop_D64toI64S or Iop_D64toI64U respectively.
> > (I think you comment about this further down in the doc.)
> > I mention this partly because sorting out such ambiguity in the
> > past for the Fxx->Ixx conversions required a lot of hoop
> > jumping, so we might as well get it straightened out up front.
>
> OK, it should be a conversion to a signed integer. Changed the name to
> Iop_D64toI64S.
Good.
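
To illustrate why the S and U suffixes matter: the same 64-bit pattern
denotes two different integers depending on interpretation, so an
integer <-> DFP conversion has to say which one it means. A minimal C
sketch; the type and function names are hypothetical and are not part of
VEX.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: an integer as sign plus magnitude, which is
   the form a decimal coefficient would be built from. */
typedef struct {
    int      negative;   /* 1 if the value is below zero */
    uint64_t magnitude;  /* absolute value */
} IntValue;

/* Interpretation assumed by an ...S op: two's-complement signed I64. */
static IntValue interpret_i64_signed(uint64_t bits)
{
    IntValue v;
    v.negative  = (int)(bits >> 63);
    v.magnitude = v.negative ? (~bits + 1u) : bits;
    return v;
}

/* Interpretation assumed by an ...U op: plain unsigned I64. */
static IntValue interpret_i64_unsigned(uint64_t bits)
{
    IntValue v;
    v.negative  = 0;
    v.magnitude = bits;
    return v;
}
```

With all 64 bits set, the signed reading is -1 while the unsigned reading
is 2^64 - 1, so the two conversions produce different DFP values.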
> > > ROUNDING INSTRUCTIONS
> > > -----------------------
> > > IRRoundingMode(I32) x D64 -> D64
> > > Iop_RoundD64
> > >
> > > IRRoundingMode(I32) x D128 -> D128
> > > Iop_RoundD128
> >
> > ok
>
> Comment from Florian:
> > These should be named Iop_RoundD64toInt and Iop_RoundD128toInt for
> > symmetry in naming with binary floating point ops.
>
> I am OK with the name change. Comments?
/me agrees with Florian.
>
> > > COMPARE INSTRUCTIONS
> > > -----------------------
> > > D64 x D64 -> IRCmpF64Result(I32)
> > > Iop_CmpD64
> > >
> > > D128 x D128 -> IRCmpF64Result(I32)
> > > Iop_CmpD128
> >
> > ok
>
> Comment from Florian:
> > OK. I would use IRCmpD64Result and IRCmpD128Result. That allows
> > us to use a different encoding, which may be desirable.
>
> I am OK with the name change. Comments?
Fine by me.
> > > QUANTIZE AND ROUND INSTRUCTIONS
> > > -------------------------------
> > > IRRoundingMode(I32) x D64-> D64
> > > Iop_QuantizeID64,
> > >
> > > IRRoundingMode(I32) x D128-> D128
> > > Iop_QuantizeID128
> > >
> > > IRRoundingMode(I32) x D64 x D64 -> D64
> > > Iop_QuantizeD64
> > >
> > > IRRoundingMode(I32) x D128 x D128 -> D128
> > > Iop_QuantizeD128
> >
> > I'm not clear what the ID vs D signifies in these names. Can
> > they instead be called Iop_Quantize{Un,Bin}{D64,D128} to denote
> > unary vs binary ness (ignoring the rounding mode arg which is
> > present in all 4 cases).
>
> The first two with the I are for immediate value of the exponent for the
> quantization operation. Remember above, I had the
> IRRoundingModeAndEponent(I32) specification. It was also there for
> specifying the immediate value. I didn't realize that I had a redundant
> way of specifying the immediate exponent value. So, as said above, we drop
> IRRoundingModeAndEponent(I32) specification.
>
> > What is quantization, anyway (in the context of DFP I mean)?
> > Does it have any analogue in traditional IEEE754 FP ?
>
> Quantization is the process of rounding a number to a specified exponent as
> mentioned earlier. For example, the U.S. specifies a dollar amount with at
> most two fractional digits, for example $12.98. When doing a computation
> involving money, the quantization instruction would be used to round the
> result to two fractional digits.
>
> The immediate value to specify the desired exponent is specific to POWER.
> Both s390 and POWER have instructions for changing the exponent of
> (quantizing) a value in register A to match the exponent of a value in
> register B. I think we should remove the Iops Iop_QuantizeID64 and
> Iop_QuantizeID128 that specify an immediate exponent. For the POWER
> instruction that specifies the immediate exponent, I can just generate a
> DFP number with the desired exponent and pass that as the value in
> register B as the target exponent. This will help minimize the number of
> new Iops.
Ok by me.
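
A rough C sketch of the quantize idea discussed above, including the
"build a reference operand from the immediate" trick. It models a decimal
value as coefficient x 10^exp in plain integers. The Dec type and the
function names are hypothetical, the rounding is fixed to "nearest, ties
to even" (the real Iop would take the rounding mode as an argument), and
overflow and special values are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a finite decimal number: coeff * 10^exp. */
typedef struct {
    int64_t coeff;
    int     exp;
} Dec;

static int64_t pow10_i64(unsigned n)
{
    assert(n <= 18);               /* 10^18 still fits in int64_t */
    int64_t p = 1;
    while (n--) p *= 10;
    return p;
}

/* Return a with its exponent changed to match b's exponent.
   Only b.exp is used; b's coefficient is irrelevant. */
static Dec dec_quantize(Dec a, Dec b)
{
    Dec r = { a.coeff, b.exp };
    if (a.exp > b.exp) {
        /* Finer target exponent: append zeros (overflow not checked). */
        r.coeff = a.coeff * pow10_i64((unsigned)(a.exp - b.exp));
    } else if (a.exp < b.exp) {
        /* Coarser target exponent: drop digits, round half to even. */
        int64_t scale = pow10_i64((unsigned)(b.exp - a.exp));
        int64_t q     = a.coeff / scale;      /* truncates toward zero */
        int64_t rem   = a.coeff % scale;
        int64_t arem  = rem < 0 ? -rem : rem;
        int64_t half  = scale / 2;
        if (arem > half || (arem == half && (q % 2 != 0)))
            q += (a.coeff < 0) ? -1 : 1;
        r.coeff = q;
    }
    return r;
}

/* Immediate form: synthesize a reference operand whose only purpose is
   to carry the desired exponent, then reuse the two-operand form. */
static Dec dec_quantize_imm(Dec a, int imm_exp)
{
    Dec ref = { 1, imm_exp };
    return dec_quantize(a, ref);
}
```

For example, quantizing 12.9849 (coeff 129849, exp -4) to exponent -2
yields coeff 1298, exp -2, i.e. 12.98.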
> > > IRRoundingMode(I32) x D64 x D64 -> D64
> > > Iop_SignificanceRoundD64
> > >
> > > IRRoundingMode(I32) x D128 x D128 -> D128
> > > Iop_SignificanceRoundD128
> > >
> > >
> > > EXTRACT AND INSERT INSTRUCTIONS
> > > -------------------------------
> > > D64 -> I64
> > > Iop_ExtractExpD64
> > >
> > > D128 -> I64
> > > Iop_ExtractExpD128
> >
> > The exponent really needs 64 bits? Can it be 32 bits? That
> > might allow for more efficient code generation for 32 bit targets.
>
> The biased exponent could be stored as a 32 bit biased integer. The
> functionality of the specific instruction that this Iop was intended for
> specifies the exponent will be stored in a floating point register in a
> signed integer format. I agree we can make the target be I32 and then
> just handle the I32 to I64 conversion as needed for the specific
> instruction. The exponent for any DFP number will easily fit in less than
> 31 bits.
>
> I will change the above two Iops to return an I32 value.
Good.
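
A small C sketch of why I32 is ample and how the widening step would
look. The bias and range constants are the IEEE 754-2008 decimal64 and
decimal128 values; the function names are hypothetical. In VEX terms the
widening would presumably be an existing sign-extending 32-to-64 Iop
applied to the result of the extract op, but that pairing is an
assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* IEEE 754-2008 biased exponent parameters. */
enum {
    D64_EXP_BIAS        = 398,
    D64_MAX_BIASED_EXP  = 767,
    D128_EXP_BIAS       = 6176,
    D128_MAX_BIASED_EXP = 12287
};

/* Convert a biased decimal64 exponent to the true (unbiased) exponent. */
static int32_t d64_unbias_exp(int32_t biased)
{
    assert(biased >= 0 && biased <= D64_MAX_BIASED_EXP);
    return biased - D64_EXP_BIAS;
}

/* Convert a biased decimal128 exponent to the true (unbiased) exponent. */
static int32_t d128_unbias_exp(int32_t biased)
{
    assert(biased >= 0 && biased <= D128_MAX_BIASED_EXP);
    return biased - D128_EXP_BIAS;
}

/* The instruction-specific step: sign-extend the I32 exponent to I64
   when the architecture wants it in a 64-bit register. */
static int64_t widen_exp_to_i64(int32_t e)
{
    return (int64_t)e;
}
```

Even the largest biased decimal128 exponent (12287) needs only 14 bits.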
>
> Comment from Florian:
> > Do we need to support these at all? In other words, does GCC issue these
> > or do they show up in hand crafted assembler shipped with GCC/GLIBC?
> > I don't know but will find out (for s390).
>
> Yes, the libdfp and binutils use them.
>
> > > I64 x I64 -> D64
> > > Iop_InsertExpD64
> > >
> > > I64 x I128 -> D128
> > > Iop_InsertExpD128
> >
> > ditto comment
>
> ditto above that we do need these Iops for instructions that are used.
So (unclear) as with ExtractExp, you'll change the exp value to be an I32
instead of an I64, yes?
> > > SHIFT SIGNIFICAND INSTRUCTIONS
> > > -------------------------------
> > > U16 x D64 -> D64
> > > Iop_ShlD64, Iop_ShrD64
> > >
> > > U16 x D128 -> D128
> > > Iop_ShlD128, Iop_ShrD128
> >
> > two things: (1) does the shift amount need to be 16 bits?
> > For all the other shifting style ops we have, the shift amount
> > is encoded in 8 bits (Ity_I8) and I would prefer to stick with
> > that for consistency, if possible. (2) pls put the shift amount
> > as the second argument, not the first, as that too is consistent
> > with all other shift ops we have (eg, Iop_Shr64)
>
> OK, changed shift to U8 and made it D64 x U8 -> D64; D128 x U8 -> D128
Good.
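
For reference, a C sketch of what a digit shift on a D64 significand
amounts to, with the shift count as the second, 8-bit argument as agreed
above. It works on a plain integer coefficient of at most 16 decimal
digits rather than on the real DFP encoding; the function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define D64_DIGITS 16
#define D64_COEFF_LIMIT 10000000000000000ULL   /* 10^16 */

/* Shift the coefficient left by n decimal digits. Digits pushed past the
   16th position are lost; zeros enter on the right. */
static uint64_t d64_coeff_shl(uint64_t coeff, uint8_t n)
{
    assert(coeff < D64_COEFF_LIMIT);
    if (n >= D64_DIGITS) return 0;
    for (uint8_t i = 0; i < n; i++)
        coeff = (coeff * 10) % D64_COEFF_LIMIT;  /* coeff*10 < 10^17 fits */
    return coeff;
}

/* Shift the coefficient right by n decimal digits. Digits pushed off the
   right end are lost; zeros enter on the left. */
static uint64_t d64_coeff_shr(uint64_t coeff, uint8_t n)
{
    assert(coeff < D64_COEFF_LIMIT);
    if (n >= D64_DIGITS) return 0;
    for (uint8_t i = 0; i < n; i++)
        coeff /= 10;
    return coeff;
}
```

Since a D64 significand has 16 digits and a D128 significand has 34, any
meaningful shift count fits comfortably in 8 bits.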
> > ---------------
> >
> > > This section gives the detailed mapping of Power and s390
> > > instructions to the proposed DFP Iops or how they would be implemented
> > > with the existing Iops and the proposed DFP Iops.
> >
> > I'll comment on this second half of the proposal tomorrow.
> >
> > J
>
> I have not seen any additional comments. It is probably best to focus on
> resolving the above issues first as the second part is just a more detailed
> discussion of the above summary of the Iops. Once we have agreement on the
> above discussion points, I will update and post version 2 of the proposal.
Yes, good plan.
J
|
|
From: Carl E. L. <ce...@li...> - 2012-01-19 19:36:28
|
On Thu, 2012-01-19 at 19:12 +0100, Julian Seward wrote:
> > If there is value in adding the DFP type, I am willing to change my
> > preliminary POWER7 implementation to add the DFP types rather than using
> > the floating point types as I currently have. Having the DFP type may
> > make the overall code clearer than having Iops with the D32, D64 and D128
> > in the name but actually operate on F32, F64 and F128 values.
> >
> > Thoughts on adding DFP type?
>
> Yes, please do. It does cause the code to be a little longer, but it's
> also clearer and easier to verify as being correct. In the long run the
> latter points are much more important.
>
>
>
> > The POWER7 and s390 machines support the additional four rounding modes for
> > DFP (total of 8 not 9 modes) in the IEEE specification. After some
> > thought, I think it would be better to explicitly change the rounding mode
> > specifier from IRRoundingMode(I32) to IRRoundingModeDFP(I32). The concern
> > is if we extend the existing rounding mode specifier then there will be
> > rounding modes for which the existing binary floating point instructions
> > are not specified to support. Yes, it is some replication of code to
> > create a second super-set of rounding modes for DFP but from an overall
> > consistency, clarity and accuracy standpoint I think it might be best.
>
> I agree. +1 for IRRoundingModeDFP. It will have 8 possible values,
> yes?
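
A sketch in C of the eight DFP rounding modes and what each does when
low-order digits are dropped. The enum names and their numeric values are
hypothetical (the thread does not fix an encoding), and the value is
modelled as a sign plus an integer magnitude rather than a real DFP
encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the eight modes; values are illustrative only. */
typedef enum {
    DFP_RM_NEAREST_TIE_EVEN,
    DFP_RM_ZERO,
    DFP_RM_POS_INF,
    DFP_RM_NEG_INF,
    DFP_RM_NEAREST_TIE_AWAY,
    DFP_RM_NEAREST_TIE_TOWARD_ZERO,
    DFP_RM_AWAY_FROM_ZERO,
    DFP_RM_PREP_SHORTER
} DfpRoundMode;

/* Drop the last `drop` decimal digits of magnitude `mag` (sign given
   separately) and return the rounded magnitude of what remains. */
static uint64_t dfp_round_drop(uint64_t mag, int negative,
                               unsigned drop, DfpRoundMode mode)
{
    assert(drop <= 18);
    uint64_t scale = 1;
    for (unsigned i = 0; i < drop; i++) scale *= 10;

    uint64_t q = mag / scale;        /* truncated result */
    uint64_t r = mag % scale;        /* discarded part   */
    if (r == 0) return q;            /* exact: no rounding needed */

    uint64_t half = scale / 2;       /* scale >= 10 here, so this is exact */
    int up = 0;                      /* 1 = increase the magnitude by one */
    switch (mode) {
    case DFP_RM_NEAREST_TIE_EVEN:
        up = r > half || (r == half && (q & 1)); break;
    case DFP_RM_ZERO:                    up = 0;          break;
    case DFP_RM_POS_INF:                 up = !negative;  break;
    case DFP_RM_NEG_INF:                 up = negative;   break;
    case DFP_RM_NEAREST_TIE_AWAY:        up = r >= half;  break;
    case DFP_RM_NEAREST_TIE_TOWARD_ZERO: up = r > half;   break;
    case DFP_RM_AWAY_FROM_ZERO:          up = 1;          break;
    case DFP_RM_PREP_SHORTER:
        /* Truncate unless the last kept digit is 0 or 5. */
        up = (q % 10 == 0) || (q % 10 == 5); break;
    }
    return q + (uint64_t)up;
}
```

The first four cases correspond to the modes binary floating point
already has; the last four are the DFP-only additions, which is the
argument for a separate IRRoundingModeDFP designator.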
>
>
>
> > OK, I had not specifically read the document above but was aware that
> > Valgrind only had limited rounding and exception support. I included
> > the full rounding and exception support for completeness. I thought it
> > better to include it at this point than to gloss over it. I added a
> > comment to the document ahead of the field definitions to that effect.
> > So, if there is no chance that the exception support will be added then
> > we can just drop the exception specification part.
> >
> > Thoughts?
>
> Drop the exception specification part. If we ever need to support FP
> exceptions then we will need to design a solution which works cleanly
> for both regular FP and DFP.
>
>
>
> > The immediate exponent value in the instruction is specific to the POWER
> > processor. I have spent some time reconsidering this. We really don't
> > need it so I will remove the IRRoundingModeAndEponent(I32).
>
> Good.
>
>
>
> > Comment from Florian:
> > > You did not describe InvOperationModes. But looking at insn LDETR
> > > (which is D32 -> D64 conversion) I gather that InvOperationModes
> > > controls whether or not the IEEE-invalid-operation-exception is
> > > delivered. It's essentially a Boolean value. I propose to ignore it
> > > and deliver the exception unconditionally (which is what we do for
> > > binary floating point).
> >
> > Julian do you agree we should remove the InvOperationModes(I32) from
> > the Iop specification? I am OK with it.
>
> Yes, please remove.
>
>
>
> > > > InvOperationModes(I32) x D64 -> D128
> > > > Iop_D64toD128
> > >
> > > ditto
> > >
> > > > IRRoundingExceptionModes(I32) x D64 -> D32
> > > > Iop_RoundD64toD32
> > > >
> > > > IRRoundingExceptionModes(I32) x D128 -> D64
> > > > Iop_RoundD128toD64
> > > >
> > > > IRRoundingExceptionModes(I32) x I64 -> D64
> > > > Iop_I64StoD64
> > >
> > > ok
> >
> > Comment from Florian:
> > > These two should be renamed to Iop_D64toD32 and Iop_D128toD64 for
> > > symmetry in naming with binary floating point ops.
> >
> > I disagree. The drsp instruction is analogous to the binary FP frsp
> > instruction, which (for ppc64) uses the Iop Iop_RoundF64ToF32. The
> > distinction is that an Iop that does not have "Round" in the name is
> > a conversion from one general type (i.e., floating point) to a different
> > general type (i.e., integer); whereas an Iop that *does* have "Round"
> > in the name is a narrowing of the same general type.
>
> Florian is correct, assuming that the types for these operations
> (D64 -> D32 etc) are really what you (Carl) intended. Problem is
> that it's unclear what behaviour you want. Your options are
>
> round and convert to a different format
> in which case you want the (eg) Iop_D64toD32 name style
> and the IR types for src and dst will then be different
> (eg D64 and D32 in this case)
>
> round (to a smaller range) but remain in the same format
> in which case you want the Iop_RoundD64toD32 names
> in this case the src and dst types are the same (D64)
> and D32 merely indicates the range to which the value is
> rounded
>
> so which is it?
>
>
The intention is the Iop_RoundD128toD64 Iop takes a value with the
physical D128 bit encoding and generates a result with the physical D64
bit encoding. Similarly for the Iop_RoundD64toD32 Iop. So based on
your definitions above, the first one (round and convert to a different
format) is correct and the Iops need to be renamed. The above
clarification was really helpful to understand the naming convention.
Thanks. I will make the Iop name changes.
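
To make the "round and convert to a different format" case concrete, here
is a C sketch of narrowing a value to the 7-digit precision of the 32-bit
format. It uses a plain sign/coefficient/exponent model instead of the
real DFP encodings; the DecNum type and function name are hypothetical.
Rounding is fixed to "nearest, ties to even" (the Iop itself takes a
rounding mode argument), and exponent range checks and special values are
left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a finite decimal: (-1)^negative * coeff * 10^exp. */
typedef struct {
    int      negative;
    uint64_t coeff;
    int      exp;
} DecNum;

/* Round a so its coefficient has at most max_digits decimal digits,
   raising the exponent by the number of digits dropped. */
static DecNum dec_narrow(DecNum a, unsigned max_digits)
{
    assert(max_digits >= 1 && max_digits <= 18);
    uint64_t limit = 1;                 /* 10^max_digits */
    for (unsigned i = 0; i < max_digits; i++) limit *= 10;

    /* Work out how many digits must go before rounding, so that the
       value is rounded exactly once. */
    unsigned drop  = 0;
    uint64_t scale = 1;
    while (a.coeff / scale >= limit) { scale *= 10; drop++; }

    uint64_t q = a.coeff / scale;
    if (drop > 0) {
        uint64_t r    = a.coeff % scale;
        uint64_t half = scale / 2;
        if (r > half || (r == half && (q & 1)))
            q++;
        if (q == limit) {               /* e.g. 9999999.5 -> 10000000 */
            q /= 10;
            drop++;
        }
    }

    DecNum out = { a.negative, q, a.exp + (int)drop };
    return out;
}
```

Calling dec_narrow(x, 7) models the precision loss of a D64 -> D32
conversion; the source and destination are different formats, which is
why the Iop_D64toD32 naming style applies.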
>
>
> > Comment from Florian:
> > > For s390 we also need:
> > > IROp description s390 insn
> > >
> > > Iop_I64StoD128 IRRoundingMode(I32) x signed I64 -> D128 CXGTR
> > > Iop_I32StoD64 signed I32 -> D64 CDFTR
> > > Iop_I32StoD128 signed I32 -> D128 CXFTR
> > > Iop_I64UtoD64 IRRoundingMode(I32) x unsigned I64 -> D64 CDLGTR
> > > Iop_I64UtoD128 IRRoundingMode(I32) x unsigned I64 -> D128 CXLGTR
> > > Iop_I32UtoD64 unsigned I32 -> D64 CDLFTR
> > > Iop_I32UtoD128 unsigned I32 -> D128 CXLFTR
> > >
> > > We need both: conversion to signed and unsigned int
> > >
> > > IROp description s390 insn
> > >
> > > Iop_D64toI64S IRRoundingMode(I32) x D64 -> signed I64 CGDTR(A)
> > > Iop_D128toI64S IRRoundingMode(I32) x D128 -> signed I64 CGXTR(A)
> > > Iop_D64toI32S IRRoundingMode(I32) x D64 -> signed I32 CFDTR
> > > Iop_D128toI32S IRRoundingMode(I32) x D128 -> signed I32 CFXTR
> > > Iop_D64toI64U IRRoundingMode(I32) x D64 -> unsigned I64 CLGDTR
> > > Iop_D128toI64U IRRoundingMode(I32) x D128 -> unsigned I64 CLGXTR
> > > Iop_D64toI32U IRRoundingMode(I32) x D64 -> unsigned I32 CLFDTR
> > > Iop_D128toI32U IRRoundingMode(I32) x D128 -> unsigned I32 CLFXTR
> > >
> > > Note, the new IRops for conversion to 32-bit wide results and from D128.
> >
> > Comment from Christian Borntraeger:
> > > > The PFPO insn is used to convert between binary floating point and
> > > > decimal floating point. Since we have 3 formats each, that makes 9
> > > > conversion ops for each direction:
> > > >
> > > > Iop_D32toF32 IRRoundingMode(I32) x D32 -> F32
> > > > Iop_D32toF64 IRRoundingMode(I32) x D32 -> F64
> > > > Iop_D32toF128 IRRoundingMode(I32) x D32 -> F128
> > > > Iop_D64toF32 IRRoundingMode(I32) x D64 -> F32
> > > > Iop_D64toF64 IRRoundingMode(I32) x D64 -> F64
> > > > Iop_D64toF128 IRRoundingMode(I32) x D64 -> F128
> > > > Iop_D128toF32 IRRoundingMode(I32) x D128 -> F32
> > > > Iop_D128toF64 IRRoundingMode(I32) x D128 -> F64
> > > > Iop_D128toF128 IRRoundingMode(I32) x D128 -> F128
> > > >
> > > > Iop_F32toD32 IRRoundingMode(I32) x F32 -> D32
> > > > Iop_F32toD64 IRRoundingMode(I32) x F32 -> D64
> > > > Iop_F32toD128 IRRoundingMode(I32) x F32 -> D128
> > > > Iop_F64toD32 IRRoundingMode(I32) x F64 -> D32
> > > > Iop_F64toD64 IRRoundingMode(I32) x F64 -> D64
> > > > Iop_F64toD128 IRRoundingMode(I32) x F64 -> D128
> > > > Iop_F128toD32 IRRoundingMode(I32) x F128 -> D32
> > > > Iop_F128toD64 IRRoundingMode(I32) x F128 -> D64
> > > > Iop_F128toD128 IRRoundingMode(I32) x F128 -> D128
> > >
> > > If you look at pfpo, then the instruction has the same tricky
> > > behaviour as EXecute. Since a self checking prefix and 18 Iops is
> > > pretty expensive I think that pfpo qualifies for a helper.
> >
> > These conversion modes are specifically supported by s390 but not POWER.
> > POWER supports a subset of the conversions supported by s390. This is
> > one of the areas where I feel we should defer the decision on adding more
> > Iops, using a helper function or emulating these conversions with the
> > subset of conversions that have been proposed until the s390 team is ready
> > to add the s390 support to Valgrind. This is beyond the scope of what is
> > needed for me to add the POWER support. But I think it is important that
> > the issue has been raised to understand the full scope of what is needed
> > by both architectures.
>
> Ok. I agree .. just add the conversions needed for POWER, unless
> Christian/Florian want to also add the S390 needed conversions now
> (I am unclear if you do, or not)
I have no need to support all these on POWER. I will put the following
into version 2 of the proposal so it is clear we may need additional
Iops in the future for s390.

   "Additional format conversion Iops for converting to/from
   decimal floating point and binary floating point may need to be
   added later for the s390 support."

I think it is important that we state that some additional Iops may be
needed for other architectures. But I would like to defer the final
decision on exactly how these conversions are supported, via additional
Iops or a helper function, to when the s390 support is done.
>
>
>
> >
> > > > IRRoundingExceptionModes(I32) x D64 -> I64
> > > > Iop_D64toI64
> > >
> > > this is underspecified .. you need to decide whether that's a
> > > conversion to signed or unsigned I64 (or maybe you need both)
> > > and call them Iop_D64toI64S or Iop_D64toI64U respectively.
> > > (I think you comment about this further down in the doc.)
> > > I mention this partly because sorting out such ambiguity in the
> > > past for the Fxx->Ixx conversions required a lot of hoop
> > > jumping, so we might as well get it straightened out up front.
> >
> > OK, it should be a conversion to a signed integer. Changed the name to
> > Iop_D64toI64S.
>
> Good.
>
>
>
> > > > ROUNDING INSTRUCTIONS
> > > > -----------------------
> > > > IRRoundingMode(I32) x D64 -> D64
> > > > Iop_RoundD64
> > > >
> > > > IRRoundingMode(I32) x D128 -> D128
> > > > Iop_RoundD128
> > >
> > > ok
> >
> > Comment from Florian:
> > > These should be named Iop_RoundD64toInt and Iop_RoundD128toInt for
> > > symmetry in naming with binary floating point ops.
> >
> > I am OK with the name change. Comments?
>
> /me agrees with Florian.
>
>
>
> >
> > > > COMPARE INSTRUCTIONS
> > > > -----------------------
> > > > D64 x D64 -> IRCmpF64Result(I32)
> > > > Iop_CmpD64
> > > >
> > > > D128 x D128 -> IRCmpF64Result(I32)
> > > > Iop_CmpD128
> > >
> > > ok
> >
> > Comment from Florian:
> > > OK. I would use IRCmpD64Result and IRCmpD128Result. That allows
> > > us to use a different encoding, which may be desirable.
> >
> > I am OK with the name change. Comments?
>
> Fine by me.
>
>
>
> > > > QUANTIZE AND ROUND INSTRUCTIONS
> > > > -------------------------------
> > > > IRRoundingMode(I32) x D64-> D64
> > > > Iop_QuantizeID64,
> > > >
> > > > IRRoundingMode(I32) x D128-> D128
> > > > Iop_QuantizeID128
> > > >
> > > > IRRoundingMode(I32) x D64 x D64 -> D64
> > > > Iop_QuantizeD64
> > > >
> > > > IRRoundingMode(I32) x D128 x D128 -> D128
> > > > Iop_QuantizeD128
> > >
> > > I'm not clear what the ID vs D signifies in these names. Can
> > > they instead be called Iop_Quantize{Un,Bin}{D64,D128} to denote
> > > unary vs binary ness (ignoring the rounding mode arg which is
> > > present in all 4 cases).
> >
> > The first two with the I are for immediate value of the exponent for the
> > quantization operation. Remember above, I had the
> > IRRoundingModeAndEponent(I32) specification. It was also there for
> > specifying the immediate value. I didn't realize that I had a redundant
> > way of specifying the immediate exponent value. So, as said above, we drop
> > IRRoundingModeAndEponent(I32) specification.
> >
> > > What is quantization, anyway (in the context of DFP I mean)?
> > > Does it have any analogue in traditional IEEE754 FP ?
> >
> > Quantization is the process of rounding a number to a specified exponent as
> > mentioned earlier. For example, the U.S. specifies a dollar amount with at
> > most two fractional digits, for example $12.98. When doing a computation
> > involving money, the quantization instruction would be used to round the
> > result to two fractional digits.
> >
> > The immediate value to specify the desired exponent is specific to POWER.
> > Both s390 and POWER have instructions for changing the exponent of
> > (quantizing) a value in register A to match the exponent of a value in
> > register B. I think we should remove the Iops Iop_QuantizeID64 and
> > Iop_QuantizeID128 that specify an immediate exponent. For the POWER
> > instruction that specifies the immediate exponent, I can just generate a
> > DFP number with the desired exponent and pass that as the value in
> > register B as the target exponent. This will help minimize the number of
> > new Iops.
>
> Ok by me.
>
>
>
> > > > IRRoundingMode(I32) x D64 x D64 -> D64
> > > > Iop_SignificanceRoundD64
> > > >
> > > > IRRoundingMode(I32) x D128 x D128 -> D128
> > > > Iop_SignificanceRoundD128
> > > >
> > > >
> > > > EXTRACT AND INSERT INSTRUCTIONS
> > > > -------------------------------
> > > > D64 -> I64
> > > > Iop_ExtractExpD64
> > > >
> > > > D128 -> I64
> > > > Iop_ExtractExpD128
> > >
> > > The exponent really needs 64 bits? Can it be 32 bits? That
> > > might allow for more efficient code generation for 32 bit targets.
> >
> > The biased exponent could be stored as a 32 bit biased integer. The
> > functionality of the specific instruction that this Iop was intended for
> > specifies the exponent will be stored in a floating point register in a
> > signed integer format. I agree we can make the target be I32 and then
> > just handle the I32 to I64 conversion as needed for the specific
> > instruction. The exponent for any DFP number will easily fit in less than
> > 31 bits.
> >
> > I will change the above two Iops to return an I32 value.
>
> Good.
>
>
>
>
> >
> > Comment from Florian:
> > > Do we need to support these at all? In other words, does GCC issue these
> > > or do they show up in hand crafted assembler shipped with GCC/GLIBC?
> > > I don't know but will find out (for s390).
> >
> > Yes, the libdfp and binutils use them.
> >
> > > > I64 x I64 -> D64
> > > > Iop_InsertExpD64
> > > >
> > > > I64 x I128 -> D128
> > > > Iop_InsertExpD128
> > >
> > > ditto comment
> >
> > ditto above that we do need these Iops for instructions that are used.
>
> So (unclear) as with ExtractExp, you'll change the exp value to be an I32
> instead of an I64, yes?
Sorry, missed that. Yes, we should change this to be consistent with
the ExtractExp. So, it will be:

   I32 x I64 -> D64
      Iop_InsertExpD64

   I32 x I128 -> D128
      Iop_InsertExpD128

where the I32 is the exponent and the I64/128 is the significand.

> > > > SHIFT SIGNIFICAND INSTRUCTIONS
> > > > -------------------------------
> > > > U16 x D64 -> D64
> > > > Iop_ShlD64, Iop_ShrD64
> > > >
> > > > U16 x D128 -> D128
> > > > Iop_ShlD128, Iop_ShrD128
> > >
> > > two things: (1) does the shift amount need to be 16 bits?
> > > For all the other shifting style ops we have, the shift amount
> > > is encoded in 8 bits (Ity_I8) and I would prefer to stick with
> > > that for consistency, if possible. (2) pls put the shift amount
> > > as the second argument, not the first, as that too is consistent
> > > with all other shift ops we have (eg, Iop_Shr64)
> >
> > OK, changed shift to U8 and made it D64 x U8 -> D64; D128 x U8 -> D128
>
> Good.
>
>
>
> > > ---------------
> > >
> > > > This section gives the detailed mapping of Power and s390
> > > > instructions to the proposed DFP Iops or how they would be implemented
> > > > with the existing Iops and the proposed DFP Iops.
> > >
> > > I'll comment on this second half of the proposal tomorrow.
> > >
> > > J
> >
> > I have not seen any additional comments. It is probably best to focus on
> > resolving the above issues first as the second part is just a more detailed
> > discussion of the above summary of the Iops. Once we have agreement on the
> > above discussion points, I will update and post version 2 of the proposal.
>
> Yes, good plan.
>
> J
|
|
From: Carl E. L. <ce...@li...> - 2012-01-19 23:37:35
|
Valgrind Community:

The following is version 2 of the proposal to add Iops to Valgrind to
support the Power and s390 Decimal Floating Point (DFP) instructions.

The proposal includes adding 28 new DFP Iops to support operations on
32-bit, 64-bit and 128-bit DFP numbers. The goal is to define a minimal
set of Iops that are needed to support these two architectures and
hopefully any future architectures that add support for DFP. Three new
types will be added to Valgrind to support these Iops: D32, D64 and D128
for the 32-bit, 64-bit and 128-bit DFP values.

The proposed DFP support will not extend the existing exception support
in Valgrind. The exception support will be the same as the current
binary floating point support.

The Instruction Set Architecture (ISA) for s390 (Z-series) can be found
at:

   http://w3.ibm.com/jct03001pt/wps/myportal

Click on the link to the document SA22-7832-08.pdf. The DFP
instructions are described in chapter 20. Note, you will need to obtain
a free IBM login to get to the document. Follow the instructions on the
web page. If you have trouble getting to the document, use the
"feedback" link to let them know.

The POWER 7 ISA can be found at:

   http://www.power.org/home

Then under community at work, click on the link "Power ISA v2.06 Now
Available". The DFP instructions are found in book I, chapter 5.

The s390 architecture has a few instructions that may require the
introduction of additional Iops to support these instructions. The
specifics of how to handle these s390 specific instructions (by
introducing additional Iops, by helper functions, or by emulating them
with existing Iops) have not been decided. These instructions are not
part of the base set of Iops required by both POWER and s390. The
discussion of how to support these s390 specific instructions has been
deferred until the s390 support is implemented. The names of these Iops
are left as "To Be Determined" (TBD) until the s390 support is added.
They are called out here for completeness of all the Iops that may need
to be added for s390 DFP support. The need for the additional Iops is
mentioned in this proposal for completeness sake. The decision on the
Iops is deferred until the s390 support is added.

The list of proposed DFP Iops will be used as a basis to implement the
POWER7 and s390 DFP instruction support. Note that some of the POWER7
and s390 instructions can be implemented using one or more of the new
DFP Iops and the existing integer and binary floating point Iops. The
instructions that can be implemented using the existing Iops and new DFP
Iops have not been explicitly listed in this proposal.

I have done a proof of concept implementation for the POWER DFP
instruction support in Valgrind. The implementation was intended to
validate that the proposed DFP Iops are sufficient for at least the
POWER architecture. I will be working on revising the proof of concept
implementation for POWER to make it consistent with this version of the
proposal and any additional changes the community agrees on. I hope to
post the code for inclusion into Valgrind in the near future.

Notation        Description
---------------------------------------------------------------------------
DFP:            Decimal Floating Point type (D32, D64 or D128). The term
                is used to distinguish the value from the traditional
                binary floating point type.

D32:            32-bit decimal floating point type. These values use
                F64 registers.

D64:            64-bit decimal floating point type. These values use
                F64 registers.

D128:           128-bit decimal floating point type. These values use a
                pair of two 64 bit floating point registers (F64). The
                instruction only references the first register of the
                register pair. The second register is implied.

Quantization:   Rounding a number to a specified exponent is referred to
                as quantizing a number. For example, the U.S. specifies
                a dollar amount with at most two fractional digits, for
                example $12.98.
                When doing a computation involving money, the
                quantization instruction would be used to round the
                result to two fractional digits.

Note, POWER 7 and s390 use two floating point registers to hold the
128-bit DFP value. The D32 and D64 values are also stored in the
floating point registers.

The IEEE 754-2008 specification adds four more rounding modes for DFP
that are not supported by the binary floating point numbers. A new
rounding mode designator will be introduced for DFP to support all of
the specified DFP rounding modes.

IRRoundingModeDFP(I32): Indicates the I32 argument is used to hold the
bits that specify the rounding mode to be used by the instruction. The
possible rounding modes are:

   - Round to nearest, ties to even
   - Round toward Zero
   - Round toward +infinity
   - Round toward -infinity
   - Round to Nearest, Ties away from 0
   - Round to Nearest, Ties toward 0
   - Round away from Zero
   - Round to Prepare for Shorter Precision


Summary of proposed Iops
----------------------------------------------------------------------------

ARITHMETIC INSTRUCTIONS
-----------------------
IRRoundingModeDFP(I32) X D64 X D64 -> D64
   Iop_AddD64, Iop_SubD64, Iop_MulD64, Iop_DivD64

IRRoundingModeDFP(I32) X D128 X D128 -> D128
   Iop_AddD128, Iop_SubD128, Iop_MulD128, Iop_DivD128

FORMAT CONVERSION INSTRUCTIONS
------------------------------
D32 -> D64
   Iop_D32toD64

D64 -> D128
   Iop_D64toD128

IRRoundingModeDFP(I32) x D64 -> D32
   Iop_D64toD32

IRRoundingModeDFP(I32) x D128 -> D64
   Iop_D128toD64

IRRoundingModeDFP(I32) x I64 -> D64
   Iop_I64StoD64

IRRoundingModeDFP(I32) x D64 -> I64
   Iop_D64toI64S

Additional format conversion Iops for converting to/from the DFP and
binary floating point formats may need to be added later for s390
support.


ROUNDING INSTRUCTIONS
-----------------------
   IRRoundingModeDFP(I32) x D64 -> D64
      Iop_RoundD64toInt

   IRRoundingModeDFP(I32) x D128 -> D128
      Iop_RoundD128toInt

COMPARE INSTRUCTIONS
-----------------------
   D64 x D64 -> IRCmpD64Result(I32)
      Iop_CmpD64

   D128 x D128 -> IRCmpD64Result(I32)
      Iop_CmpD128

QUANTIZE AND ROUND INSTRUCTIONS
-------------------------------
   IRRoundingModeDFP(I32) x D64 x D64 -> D64
      Iop_QuantizeD64

   IRRoundingModeDFP(I32) x D128 x D128 -> D128
      Iop_QuantizeD128

   IRRoundingModeDFP(I32) x D64 x D64 -> D64
      Iop_SignificanceRoundD64

   IRRoundingModeDFP(I32) x D128 x D128 -> D128
      Iop_SignificanceRoundD128

EXTRACT AND INSERT INSTRUCTIONS
-------------------------------
   D64 -> I32
      Iop_ExtractExpD64

   D128 -> I32
      Iop_ExtractExpD128

   I32 x I64 -> D64
      Iop_InsertExpD64

   I32 x I128 -> D128
      Iop_InsertExpD128

SHIFT SIGNIFICAND INSTRUCTIONS
-------------------------------
   D64 x U8 -> D64
      Iop_ShlD64, Iop_ShrD64

   D128 x U8 -> D128
      Iop_ShlD128, Iop_ShrD128

This section gives the detailed mapping of the proposed DFP Iops to the
corresponding POWER and s390 instructions. The POWER and s390 instructions
are listed in their respective Instruction Set Architecture documents
referenced at the beginning of this proposal. The POWER and s390
instructions that can be implemented using a sequence of Iops are not
listed.

ARITHMETIC INSTRUCTIONS

Iop            s390     Power    Description of instruction,
               opcode   opcode   implementation details
-------------------------------------------------------------------------------
Iop_AddD64     IRRoundingModeDFP(I32) X D64 X D64 -> D64
               ADTR     dadd     Add two 64-bit DFP numbers. If both
               ADTRA             operands are finite numbers, they are
                                 added algebraically, forming an
                                 intermediate sum. The intermediate sum,
                                 if nonzero, is rounded to the operand
                                 format and the rounded value is then
                                 placed at the result location. The ADTRA
                                 instruction has a field to specify the
                                 desired rounding mode but is otherwise
                                 identical to the ADTR.

Iop_AddD128    IRRoundingModeDFP(I32) X D128 X D128 -> D128
               AXTR     daddq    Add two 128-bit DFP numbers. If both
               AXTRA             operands are finite numbers, they are
                                 added algebraically, forming an
                                 intermediate sum. The intermediate sum,
                                 if nonzero, is rounded to the operand
                                 format and the rounded value is then
                                 placed at the result location. The AXTRA
                                 instruction has a field to specify the
                                 desired rounding mode but is otherwise
                                 identical to the AXTR.

Iop_SubD64     IRRoundingModeDFP(I32) X D64 X D64 -> D64
               SDTR     dsub     The execution of SUBTRACT is identical to
               SDTRA             that of ADD, except that the second
                                 operand, if numeric, participates in the
                                 operation with its sign bit inverted. The
                                 SDTRA instruction has a field to specify
                                 the desired rounding mode but is
                                 otherwise identical to the SDTR.

Iop_SubD128    IRRoundingModeDFP(I32) X D128 X D128 -> D128
               SXTR     dsubq    The execution of SUBTRACT is identical to
               SXTRA             that of ADD, except that the second
                                 operand, if numeric, participates in the
                                 operation with its sign bit inverted. The
                                 SXTRA instruction has a field to specify
                                 the desired rounding mode but is
                                 otherwise identical to the SXTR.

Iop_MulD64     IRRoundingModeDFP(I32) X D64 X D64 -> D64
               MDTR     dmul     If both source operands are finite
               MDTRA             numbers, they are multiplied to form an
                                 intermediate product. The intermediate
                                 product is rounded to the target format.
                                 The MDTRA instruction has a field to
                                 specify the desired rounding mode but is
                                 otherwise identical to the MDTR.

Iop_MulD128    IRRoundingModeDFP(I32) X D128 X D128 -> D128
               MXTR     dmulq    If both source operands are finite
               MXTRA             numbers, they are multiplied to form an
                                 intermediate product. The intermediate
                                 product is rounded to the target format.
                                 The MXTRA instruction has a field to
                                 specify the desired rounding mode but is
                                 otherwise identical to the MXTR.

Iop_DivD64     IRRoundingModeDFP(I32) X D64 X D64 -> D64
               DDTR     ddiv     If the divisor is nonzero and both the
               DDTRA             dividend and divisor are finite numbers,
                                 the first operand is divided by the
                                 second operand to form an intermediate
                                 quotient. The intermediate quotient, if
                                 nonzero, is rounded to the target format.
                                 The DDTRA instruction has a field to
                                 specify the desired rounding mode but is
                                 otherwise identical to the DDTR.

Iop_DivD128    IRRoundingModeDFP(I32) X D128 X D128 -> D128
               DXTR     ddivq    If the divisor is nonzero and both the
               DXTRA             dividend and divisor are finite numbers,
                                 the first operand is divided by the
                                 second operand to form an intermediate
                                 quotient. The intermediate quotient, if
                                 nonzero, is rounded to the target format.
                                 The DXTRA instruction has a field to
                                 specify the desired rounding mode but is
                                 otherwise identical to the DXTR.

FORMAT CONVERSION INSTRUCTIONS

Iop            s390     Power    Description of instruction,
               opcode   opcode   implementation details
-------------------------------------------------------------------------------
TBD            PFPO              The PFPO instruction operation is
                                 specified by the code in general purpose
                                 register 0 and the condition code is set
                                 to indicate the result. The operations
                                 that can be specified for this
                                 instruction include a large list of
                                 format conversions to/from various sizes
                                 of DFP operands and various sizes of
                                 Hexadecimal floating point and Binary
                                 floating point formats. Additional Iops
                                 to support these conversions on s390 may
                                 be needed. The decision on which Iops
                                 are needed will be deferred to when the
                                 s390 support is added.

Iop_D32toD64   D32 -> D64
               LDETR    dctdp    The 32-bit DFP source operand is
                                 converted into a 64-bit DFP result.

Iop_D64toD128  D64 -> D128
               LXETR    dctqpq   The 64-bit DFP source operand is
                                 converted into a 128-bit DFP result.

Iop_D64toD32   IRRoundingModeDFP(I32) x D64 -> D64
               LEDTR    drsp     The 64-bit DFP source operand is rounded
                                 to a DFP 32-bit value, according to the
                                 rounding mode.

Iop_D128toD64  IRRoundingModeDFP(I32) x D128 -> D64
               LDXTR    drdpq    The 128-bit DFP source operand is rounded
                                 to a DFP 64-bit value, according to the
                                 rounding mode.

Iop_I64StoD64  IRRoundingModeDFP(I32) x I64 -> D64
               CDGTR    dcffix   The 64-bit signed binary integer in the
               CDGTRA            second source operand is converted into
                                 a 64-bit DFP result using the rounding
                                 mode specified in the first operand.

Iop_D64toI64S  IRRoundingModeDFP(I32) x D64 -> I64
               CGDTR    dctfix   The 64-bit DFP source operand is rounded
               CGDTRA            to a signed integer, according to the
                                 rounding mode specified by the first
                                 operand, and converted into a signed
                                 64-bit integer result with the same sign
                                 as the source.

May need new Iop   IRRoundingModeDFP(I32) x D64 -> I64
TBD            CLGDTR            The 64-bit DFP source operand is rounded
                                 to a signed integer, according to the
                                 rounding mode specified by the first
                                 operand, and converted into a signed
                                 64-bit binary integer result with the
                                 same sign as the source.
                                 Possible implementation: the 64-bit DFP
                                 source operand is rounded into a 64-bit
                                 signed integer using the proposed
                                 Iop_D64toI64S. Defer decision on adding
                                 a new Iop until the s390 implementation
                                 is started.

May need new Iop   IRRoundingModeDFP(I32) x D64 -> I32
TBD            CLFDTR            The 64-bit DFP source operand is rounded
                                 to a signed integer, according to the
                                 rounding mode specified by the first
                                 operand, and converted into a signed
                                 32-bit integer result with the same sign
                                 as the source.
                                 Possible implementation: the 64-bit DFP
                                 source operand is rounded into a 64-bit
                                 signed integer using the proposed
                                 Iop_D64toI64S, then narrowed to a signed
                                 32-bit integer with Iop_64to32. Defer
                                 decision on adding a new Iop until the
                                 s390 implementation is started.

May need new Iop   IRRoundingModeDFP(I32) x D128 -> I64
TBD            CLGXTR            The 128-bit DFP source operand is rounded
                                 to a signed integer, according to the
                                 rounding mode specified by the first
                                 operand, and converted into a signed
                                 64-bit integer result with the same sign
                                 as the source.
                                 Possible implementation: the 128-bit DFP
                                 source operand is rounded into a 64-bit
                                 DFP value with Iop_D128toD64, and the
                                 64-bit DFP value is then rounded to a
                                 64-bit signed integer using the proposed
                                 Iop_D64toI64S. Defer decision on adding
                                 a new Iop until the s390 implementation
                                 is started.

May need new Iop   IRRoundingModeDFP(I32) x D128 -> I32
TBD            CLFXTR            The 128-bit DFP source operand is rounded
                                 to a signed integer, according to the
                                 rounding mode specified by the first
                                 operand, and converted into a signed
                                 32-bit integer result with the same sign
                                 as the source.
                                 Possible implementation: the 128-bit DFP
                                 source operand is rounded into a 64-bit
                                 DFP value with Iop_D128toD64, the 64-bit
                                 DFP value is then converted to a 64-bit
                                 signed integer using the proposed
                                 Iop_D64toI64S, and the result is
                                 narrowed to a signed 32-bit integer with
                                 Iop_64to32. Defer decision on adding a
                                 new Iop until the s390 implementation is
                                 started.

ROUNDING INSTRUCTIONS

Iop            s390     Power    Description of instruction,
               opcode   opcode   implementation details
-------------------------------------------------------------------------------
Iop_RoundD64toInt   IRRoundingModeDFP(I32) x D64 -> D64
               FIDTR    drintx   The D64 operand, if a finite number, is
                                 rounded to an integer value in the same
                                 DFP format.

Iop_RoundD128toInt  IRRoundingModeDFP(I32) x D128 -> D128
               FIXTR    drintxq  The D128 operand, if a finite number, is
                                 rounded to an integer value in the same
                                 DFP format.

COMPARE INSTRUCTIONS

Iop            s390     Power    Description of instruction,
               opcode   opcode   implementation details
-------------------------------------------------------------------------------
Iop_CmpD64     D64 x D64 -> IRCmpF64Result(I32)
                        dcmpo    Perform the comparison, as specified in
                        dcmpu    the instruction. The instruction sets
                                 the platform condition codes. The
                                 condition codes for the virtual machine
                                 need to be updated based on the result
                                 in this platform specific condition code
                                 register.

Iop_CmpD128    D128 x D128 -> IRCmpF64Result(I32)
                        dcmpoq   Perform the comparison, as specified in
                        dcmpuq   the instruction. The instruction sets
                                 the platform condition codes. The
                                 condition codes for the virtual machine
                                 need to be updated based on the result
                                 in this platform specific condition code
                                 register.

QUANTIZE AND ROUND INSTRUCTIONS

Iop            s390     Power    Description of instruction,
               opcode   opcode   implementation details
-------------------------------------------------------------------------------
Iop_QuantizeD64     IRRoundingModeDFP(I32) x D64 x D64 -> D64
                        dqua     The second D64 operand is converted and
                                 rounded to the form with the same
                                 exponent as that of the first DFP
                                 operand. The result is placed in the
                                 result operand.

Iop_QuantizeD128    IRRoundingModeDFP(I32) x D128 x D128 -> D128
                        dquaq    The second D128 operand is converted and
                                 rounded to the form with the same
                                 exponent as that of the first DFP
                                 operand. The result is placed in the
                                 result operand.

Iop_SignificanceRoundD64   IRRoundingModeDFP(I32) x D64 x D64 -> D64
               RRDTR    drrnd    The second D64 operand is rounded to the
                                 significance specified by the first D64
                                 operand as specified by the rounding
                                 mode.

Iop_SignificanceRoundD128  IRRoundingModeDFP(I32) x D128 x D128 -> D128
               RRXTR    drrndq   The second D128 operand is rounded to the
                                 significance specified by the first D128
                                 operand as specified by the rounding
                                 mode.

EXTRACT AND INSERT INSTRUCTIONS

Iop            s390     Power    Description of instruction,
               opcode   opcode   implementation details
-------------------------------------------------------------------------------
Iop_ExtractExpD64   D64 -> I32
               EEDTR    dxex     The exponent of the D64 operand is
                                 extracted. The extracted exponent is
                                 converted and stored in the floating
                                 point register as a signed 32-bit binary
                                 integer format.

Iop_ExtractExpD128  D128 -> I32
               EEXTR    dxexq    The exponent of the D128 operand is
                                 extracted. The extracted exponent is
                                 converted and stored in the floating
                                 point register as a signed 32-bit binary
                                 integer.

Iop_InsertExpD64    I32 x I64 -> D64
               IEDTR    diex     The exponent is specified by the first
                                 I32 operand, and the signed significand
                                 is given by the second I64 value. The
                                 result is a D64 value consisting of the
                                 specified significand and exponent whose
                                 sign is that of the specified
                                 significand.

Iop_InsertExpD128   I32 x I128 -> D128
               IEXTR    diexq    The exponent is specified by the first
                                 I32 operand, and the signed significand
                                 is given by the second I128 value. The
                                 result is a D128 value consisting of the
                                 specified significand and exponent whose
                                 sign is that of the specified
                                 significand.

SHIFT SIGNIFICAND INSTRUCTIONS

Iop            s390     Power    Description of instruction,
               opcode   opcode   implementation details
-------------------------------------------------------------------------------
Iop_ShlD64     D64 x U8 -> D64
               SLDT     dscli    The D32 or D64 significand is shifted
                                 left by the number of digits specified
                                 by the U8 operand. Digits shifted out of
                                 the leftmost digit are lost. Zeros are
                                 supplied to the vacated positions on the
                                 right. The sign of the result is the
                                 same as the sign of the D64 operand.

Iop_ShlD128    D128 x U8 -> D128
               SLXT     dscliq   The D128 significand is shifted left by
                                 the number of digits specified by the U8
                                 operand. Digits shifted out of the
                                 leftmost digit are lost. Zeros are
                                 supplied to the vacated positions on the
                                 right. The sign of the result is the
                                 same as the sign of the D128 operand.

Iop_ShrD64     D64 x U8 -> D64
               SRDT     dscri    The D32 or D64 significand is shifted
                                 right by the number of digits specified
                                 by the U8 operand. Digits shifted out of
                                 the rightmost digit are lost. Zeros are
                                 supplied to the vacated positions on the
                                 left. The sign of the result is the same
                                 as the sign of the D64 operand.

Iop_ShrD128    D128 x U8 -> D128
               SRXT     dscriq   The D128 significand is shifted right by
                                 the number of digits specified by the U8
                                 operand. Digits shifted out of the
                                 rightmost digit are lost. Zeros are
                                 supplied to the vacated positions on the
                                 left. The sign of the result is the same
                                 as the sign of the D128 operand.

Thank you for your time and effort to review this proposal.

                  Carl Love
|
|
From: Florian K. <br...@ac...> - 2012-01-21 15:56:56
|
On 01/19/2012 06:37 PM, Carl E. Love wrote:
>
> COMPARE INSTRUCTIONS
> -----------------------
> D64 x D64 -> IRCmpD64Result(I32)
> Iop_CmpD64
>
> D128 x D128 -> IRCmpD64Result(I32)
> Iop_CmpD128
>

The latter should yield an IRCmpD128Result, for symmetry with binary
floating point, which also has an IRCmpF128Result.

> Iop_D64toD32 IRRoundingModeDFP(I32) x D64 -> D64
> LEDTR drsp
> The 64-bit DFP source operand is rounded to a
> DFP 32-bit value, according to the rounding
> mode.

-> D32

>
> Iop_D64toI64S IRRoundingModeDFP(I32) x D64 -> I64

I presume this will become a comment in libvex_ir.h. Please include the
signedness:

   IRRoundingModeDFP(I32) x D64 -> signed I64

This is what is being done for the binary floating point ops, too.

>
> Iop_CmpD128 D128 x D128 -> IRCmpF64Result(I32)

-> IRCmpF128Result(I32)

Comment from Julian
> Ok. I agree .. just add the conversions needed for POWER, unless
> Christian/Florian want to also add the S390 needed conversions now
> (I am unclear if you do, or not)

Let's postpone this. We need to do some research to determine which of
the conversions need rounding modes so we get the semantics correct.

   Florian
|
|
From: Julian S. <js...@ac...> - 2012-01-23 09:57:12
|
> The following is version 2 of the proposal to add Iops [...]

I am in agreement with all of Florian's comments on this version 2
proposal (so please incorporate them), plus minor comments of my
own below. Overall though it looks pretty good to me.

J


> IRRoundingModeDFP(I32): Indicates the I32 argument is used to hold
> the bits that specify the rounding mode to be used
> by the instruction. The possible rounding modes
> are:
> - Round to nearest, ties to even
> - Round toward Zero
> - Round toward +infinity
> - Round toward -infinity
> - Round to Nearest, Ties away from 0
> - Round to Nearest, Ties toward 0
> - Round to away from Zero
> - Round to Prepare for Shorter Precision

You need to specify the actual values of these enumeration values
at some point.


> COMPARE INSTRUCTIONS
> -----------------------
> D64 x D64 -> IRCmpD64Result(I32)
> Iop_CmpD64
>
> D128 x D128 -> IRCmpD64Result(I32)
> Iop_CmpD128

As per Florian's comment, make this D128 x D128 -> IRCmpD128Result(I32)
and make IRCmpD128Result be a typedef for IRCmpD64Result, so as to be
consistent with how this is done for binary FP.


> QUANTIZE AND ROUND INSTRUCTIONS
> -------------------------------
> IRRoundingModeDFP(I32) x D64 x D64 -> D64
> Iop_QuantizeD64
>
> IRRoundingModeDFP(I32) x D128 x D128 -> D128
> Iop_QuantizeD128
>
> IRRoundingModeDFP(I32) x D64 x D64 -> D64
> Iop_SignificanceRoundD64
>
> IRRoundingModeDFP(I32) x D128 x D128 -> D128
> Iop_SignificanceRoundD128

Ok. When you add these to libvex_ir.h, a short comment on what
they do would be nice.


> EXTRACT AND INSERT INSTRUCTIONS
> -------------------------------
> D64 -> I32
> Iop_ExtractExpD64
>
> D128 -> I32
> Iop_ExtractExpD128
>
> I32 x I64 -> D64
> Iop_InsertExpD64
>
> I32 x I128 -> D128
> Iop_InsertExpD128

Ditto re comment.

J
|
|
From: Carl E. L. <ce...@li...> - 2012-01-24 17:02:07
|
On Mon, 2012-01-23 at 10:52 +0100, Julian Seward wrote:
> > The following is version 2 of the proposal to add Iops [...]
>
> I am in agreement with all of Florian's comments on this version 2
> proposal (so please incorporate them), plus minor comments of my
> own below. Overall though it looks pretty good to me.
Yes, some good catches where I missed updating all the pieces of the
description. I will fix those.
>
> J
>
>
> > IRRoundingModeDFP(I32): Indicates the I32 argument is used to hold
> > the bits that specify the rounding mode to be used
> > by the instruction. The possible rounding modes
> > are:
> > - Round to nearest, ties to even
> > - Round toward Zero
> > - Round toward +infinity
> > - Round toward -infinity
> > - Round to Nearest, Ties away from 0
> > - Round to Nearest, Ties toward 0
> > - Round to away from Zero
> > - Round to Prepare for Shorter Precision
>
> You need to specify the actual values of these enumeration values
> at some point.
OK, I started on updating my proof of concept implementation by adding
the DFP types and the rounding mode. I didn't get to adding the new
rounding mode yet. I will take a closer look at the existing rounding
mode support and come up with some appropriate values as I implement the
new rounding mode.
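
To make the "actual values" question concrete, here is a rough C sketch of
what the enumeration could look like. Only the eight rounding modes come
from the proposal. The enumerator names (Irrm_DFP_*), the numeric values
0..7 and the two helper functions are placeholders I made up for
illustration; the real values still have to be chosen against the existing
rounding mode support.

```c
/* Sketch only: names and numeric values are hypothetical placeholders,
   not values agreed on in this thread. */
typedef enum {
   Irrm_DFP_NEAREST_TIE_EVEN     = 0,  /* Round to nearest, ties to even */
   Irrm_DFP_ZERO                 = 1,  /* Round toward zero */
   Irrm_DFP_PosINF               = 2,  /* Round toward +infinity */
   Irrm_DFP_NegINF               = 3,  /* Round toward -infinity */
   /* The four modes below are the DFP-only additions. */
   Irrm_DFP_NEAREST_TIE_AWAY_0   = 4,  /* Round to nearest, ties away from 0 */
   Irrm_DFP_NEAREST_TIE_TOWARD_0 = 5,  /* Round to nearest, ties toward 0 */
   Irrm_DFP_AWAY_FROM_ZERO       = 6,  /* Round away from zero */
   Irrm_DFP_PREPARE_SHORTER      = 7   /* Round to prepare for shorter precision */
} IRRoundingModeDFP;

/* Returns 1 if the I32 bit pattern names one of the eight modes. */
int dfp_mode_is_valid(unsigned int bits)
{
   return bits <= (unsigned int)Irrm_DFP_PREPARE_SHORTER;
}

/* Returns 1 for the four modes that binary FP does not provide. */
int dfp_mode_is_dfp_only(IRRoundingModeDFP mode)
{
   return mode >= Irrm_DFP_NEAREST_TIE_AWAY_0;
}
```

Grouping the four common modes first is just one possible layout; it keeps
the "DFP only" test a single comparison.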
Along those lines, I sent out a reply to the proposal on a couple of
additional Iops that it looks like I will need since we are going with
the DFP types. Specifically, Iop_ReinterpD64asI64 and
Iop_ReinterpI64asD64. I was able to leverage the equivalent FP Iop in
the initial proposal. Looks like I may need the DFP equivalent of the
binary FP Iops: Iop_F64HLtoF128, Iop_F128HItoF64, and Iop_F128LOtoF64.
Let me work through a little of the rewrite to work out these details
and update the proposal. I want to try and avoid a series of little
updates to the proposal.
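
As a rough picture of what these extra Iops would do, the C sketch below
models a reinterpretation as a pure relabelling of 64 bits, and the
HL/HI/LO trio as packing and unpacking a register pair. This is only an
illustration under my own assumptions: C has no portable decimal type, so a
double stands in for the 64-bit register contents, and the Bits128 struct
and all function names are hypothetical, not VEX code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit value held in a register pair. */
typedef struct {
   uint64_t hi;   /* first register of the pair */
   uint64_t lo;   /* implied second register */
} Bits128;

/* Reinterpretation: the 64 bits are copied unchanged, no conversion.
   Assumes sizeof(double) == sizeof(uint64_t). */
uint64_t reinterp_f64_as_i64(double value)
{
   uint64_t bits;
   memcpy(&bits, &value, sizeof bits);
   return bits;
}

double reinterp_i64_as_f64(uint64_t bits)
{
   double value;
   memcpy(&value, &bits, sizeof value);
   return value;
}

/* Analogue of an "HLto128" style op: build a 128-bit value from halves. */
Bits128 hl_to_128(uint64_t hi, uint64_t lo)
{
   Bits128 result = { hi, lo };
   return result;
}

/* Analogues of the "HIto64" and "LOto64" style ops. */
uint64_t hi_of_128(Bits128 value) { return value.hi; }
uint64_t lo_of_128(Bits128 value) { return value.lo; }
```

The point of the model is that none of these operations round or convert
anything, which is why they take no rounding mode argument.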
>
>
> > COMPARE INSTRUCTIONS
> > -----------------------
> > D64 x D64 -> IRCmpD64Result(I32)
> > Iop_CmpD64
> >
> > D128 x D128 -> IRCmpD64Result(I32)
> > Iop_CmpD128
>
> As per Florian's comment, make this D128 x D128 -> IRCmpD128Result(I32)
> and make IRCmpD128Result be a typedef for IRCmpD64Result, so as to be
> consistent with how this is done for binary FP.
>
>
>
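
For reference, the typedef arrangement suggested above could be sketched in
C as follows. The type names IRCmpD64Result and IRCmpD128Result are the ones
from this thread; the enumerator names, their values and the helper function
are placeholders invented for the example.

```c
/* Sketch only: enumerator names and values are hypothetical. */
typedef enum {
   Ircr_D_GT = 0,   /* first operand is greater than the second */
   Ircr_D_LT = 1,   /* first operand is less than the second */
   Ircr_D_EQ = 2,   /* operands compare equal */
   Ircr_D_UN = 3    /* unordered: at least one operand is a NaN */
} IRCmpD64Result;

/* A D128 compare reports the same four outcomes, so it can simply
   reuse the D64 result type under a second name. */
typedef IRCmpD64Result IRCmpD128Result;

/* Hypothetical helper: map a three-way sign (-1, 0, +1) plus an
   "unordered" flag onto the shared result encoding. */
IRCmpD128Result cmp_result_from_sign(int sign, int unordered)
{
   if (unordered) return Ircr_D_UN;
   if (sign < 0)  return Ircr_D_LT;
   if (sign > 0)  return Ircr_D_GT;
   return Ircr_D_EQ;
}
```

Because the second name is only a typedef, a D64 result and a D128 result
can be handled by the same condition code update logic.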
> > QUANTIZE AND ROUND INSTRUCTIONS
> > -------------------------------
> > IRRoundingModeDFP(I32) x D64 x D64 -> D64
> > Iop_QuantizeD64
> >
> > IRRoundingModeDFP(I32) x D128 x D128 -> D128
> > Iop_QuantizeD128
> >
> > IRRoundingModeDFP(I32) x D64 x D64 -> D64
> > Iop_SignificanceRoundD64
> >
> > IRRoundingModeDFP(I32) x D128 x D128 -> D128
> > Iop_SignificanceRoundD128
>
> Ok. When you add these to libvex_ir.h, a short comment on what
> they do would be nice.
>
>
> > EXTRACT AND INSERT INSTRUCTIONS
> > -------------------------------
> > D64 -> I32
> > Iop_ExtractExpD64
> >
> > D128 -> I32
> > Iop_ExtractExpD128
> >
> > I32 x I64 -> D64
> > Iop_InsertExpD64
> >
> > I32 x I128 -> D128
> > Iop_InsertExpD128
>
> Ditto re comment.
OK, I will add the comments for both of these to the proposal as well so
we can agree on the wording to put in libvex_ir.h ahead of time.
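
As a starting point for that wording, here is a small C model of what the
quantize and extract/insert exponent operations compute. It is a toy under
stated assumptions: a value is held as an integer coefficient times a power
of ten (not the real DFP encoding), the target exponent is passed directly
instead of being taken from a DFP operand, the rounding mode is fixed to
"round to nearest, ties to even", and overflow is ignored. ToyDec and the
toy_* names are made up for the example.

```c
#include <stdint.h>

/* Toy decimal value: coeff * 10^exp. Not the real DFP bit encoding. */
typedef struct {
   int64_t coeff;
   int32_t exp;
} ToyDec;

/* Model of the extract exponent operation. */
int32_t toy_extract_exp(ToyDec value) { return value.exp; }

/* Model of the insert exponent operation: exponent first, then the
   signed significand, matching the I32 x I64 operand order. */
ToyDec toy_insert_exp(int32_t exp, int64_t coeff)
{
   ToyDec result = { coeff, exp };
   return result;
}

/* Model of quantize: re-express value with exponent target_exp,
   rounding half to even when digits have to be dropped. */
ToyDec toy_quantize(ToyDec value, int32_t target_exp)
{
   ToyDec result = value;

   /* Lowering the exponent is exact: append zero digits. */
   while (result.exp > target_exp) {
      result.coeff *= 10;
      result.exp--;
   }

   if (result.exp < target_exp) {
      int64_t divisor = 1;
      while (result.exp < target_exp) {
         divisor *= 10;
         result.exp++;
      }
      int64_t quotient  = result.coeff / divisor;  /* truncates toward 0 */
      int64_t remainder = result.coeff % divisor;
      int64_t twice_rem = 2 * (remainder < 0 ? -remainder : remainder);

      /* Round away from the truncated value if more than half was
         dropped, or exactly half and the kept value is odd. */
      if (twice_rem > divisor ||
          (twice_rem == divisor && quotient % 2 != 0)) {
         quotient += (result.coeff < 0) ? -1 : 1;
      }
      result.coeff = quotient;
   }
   return result;
}
```

With this model, quantizing 12.984 (coefficient 12984, exponent -3) to
exponent -2 gives coefficient 1298, i.e. the $12.98 from the money example
in the notation section.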
Julian and Florian, thanks for the input. I will get an update out
shortly.
Carl Love
>
> J
>
|