From: Carl E. L. <ce...@li...> - 2011-12-22 16:17:20
This is a re-post as the original posting was rejected by the mailing list.
--------------------------------------------------------------------------
Valgrind community:
Please review and comment on the following proposal to add Iops to
Valgrind to support Decimal Floating Point (DFP) instructions. The
following proposal includes adding 31 new DFP Iops to support operations
on 32-bit, 64-bit and 128-bit DFP numbers. The intent was to come up with
a minimal and sufficient set for the POWER and s390 architectures to start
with. The hope is that the set of proposed Iops will also be sufficient
for other architectures but I can't say for sure since I do not have access
to the DFP instruction set for any other architectures.
There are seven additional Iops that may need to be added in the future when
DFP support is added for the s390 architecture. For these s390 instructions,
there is no Power equivalent. The names of these Iops are left as
"To Be Determined" (TBD) at the time the support is added. They are called
out here for completeness of all the Iops that may need to be added for s390
DFP support.
The list of proposed DFP Iops will be used to implement the 49 POWER7 DFP
instructions and the 78 s390 DFP instructions. Note that some of the POWER7
and s390 instructions can be emulated using existing Iops and the proposed
new Iops. The instructions that can be emulated have not been explicitly
listed in this proposal.
The IBM Power and s390 systems support 32-bit, 64-bit and 128-bit decimal
floating point (DFP) numbers. The DFP instructions use the existing floating
point registers. The DFP 128-bit operands are stored in registers i and i+1
where i must be even. For example the DFP 128-bit add is done as follows:
lfd 12,48(31) // load the upper 64 bits of the first 128-bit value into reg 12
lfd 13,56(31) // load the lower 64 bits of the first 128-bit value into reg 13
addi 0,31,64 // r0 = r31 + 64, the address of the second operand
mr 9,0 // copy that address into r9
lfd 0,0(9) // load the upper 64 bits of the second 128-bit value into reg 0
lfd 1,8(9) // load the lower 64 bits of the second 128-bit value into reg 1
daddq 0,12,0 // add the two 128-bit DFP numbers; the result is stored in
// registers 0 and 1. Note that registers 1 and 13 are not
// explicitly listed but are implied
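As a rough illustration of the register-pair convention described above, here is a minimal Python sketch (not Valgrind code). The helper names `split_d128`, `join_d128`, `write_pair` and `read_pair`, and the dictionary standing in for the floating point register file, are hypothetical and exist only for this example.

```python
MASK64 = (1 << 64) - 1


def split_d128(value128):
    """Return the (upper, lower) 64-bit halves of a 128-bit bit pattern."""
    if not 0 <= value128 < (1 << 128):
        raise ValueError("expected an unsigned 128-bit pattern")
    return (value128 >> 64) & MASK64, value128 & MASK64


def join_d128(upper, lower):
    """Rebuild the 128-bit pattern from its two 64-bit halves."""
    return (upper << 64) | lower


def write_pair(fprs, i, value128):
    """Store a 128-bit pattern in registers i and i+1; i must be even."""
    if i % 2 != 0:
        raise ValueError("a D128 operand must start at an even register")
    fprs[i], fprs[i + 1] = split_d128(value128)


def read_pair(fprs, i):
    """Read the 128-bit pattern held in registers i and i+1."""
    if i % 2 != 0:
        raise ValueError("a D128 operand must start at an even register")
    return join_d128(fprs[i], fprs[i + 1])


# Hypothetical register file: register number -> 64-bit contents.
fprs = {}
pattern = 0x0123456789ABCDEF_FEDCBA9876543210
write_pair(fprs, 12, pattern)   # upper half lands in reg 12, lower in reg 13
```

Only the even register is named in the instruction; the odd partner is implied, which is why the sketch rejects odd register numbers.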
Please note, I will only be adding the IBM POWER series DFP support at this
time. The s390 team will be responsible for doing the Valgrind support for
the s390 at some point in the future.
The following is a list of definitions for the notations used in the description
of the proposed new Iops.
Notation Description
---------------------------------------------------------------------------
DFP: Decimal Floating Point number possibly 32-bit, 64-bit, or
128-bit format
D32: 32-bit decimal floating point format. These values use F64
registers. The D32 term is used to distinguish the value
from the standard 32-bit floating point value.
D64: 64-bit decimal floating point format. These values use F64
registers. The D64 term is used to distinguish the value
from the standard 64-bit floating point value.
D128: 128-bit decimal floating point format. These values use a pair
of two 64 bit floating point registers (F64). The instruction
only references the first register of the register pair. The
second register is implied.
IRRoundingMode(I32): Indicates the I32 argument is used to hold the
bits that specify the rounding mode to be used
by the instruction.
IRRoundingExceptionModes(I32): Indicates the I32 argument is used to hold the
bits that specify the rounding mode and the bits
that specify the exception mode to be used by
the instruction. The s390 instructions specify
both modes whereas POWER does not specify either
mode.
IRRoundingModeAndExponent(I32): Indicates the I32 argument will contain bits to
specify the rounding mode. POWER also has bits
to specify the desired exponent. The s390
instructions only specify the rounding mode.
Summary of proposed Iops:
The references to D32, D64 and D128 indicate that the value in the floating
point register is a DFP value. The DFP instructions use the existing floating
point registers.
ARITHMETIC INSTRUCTIONS
-----------------------
IRRoundingMode(I32) X D64 X D64 -> D64
Iop_AddD64, Iop_SubD64, Iop_MulD64, Iop_DivD64
IRRoundingMode(I32) X D128 X D128 -> D128
Iop_AddD128, Iop_SubD128, Iop_MulD128, Iop_DivD128
FORMAT CONVERSION INSTRUCTIONS
------------------------------
InvOperationModes(I32) x D32 -> D64
Iop_D32toD64
InvOperationModes(I32) x D64 -> D128
Iop_D64toD128
IRRoundingExceptionModes(I32) x D64 -> D32
Iop_RoundD64toD32
IRRoundingExceptionModes(I32) x D128 -> D64
Iop_RoundD128toD64
IRRoundingExceptionModes(I32) x I64 -> D64
Iop_I64StoD64
IRRoundingExceptionModes(I32) x D64 -> I64
Iop_D64toI64
ROUNDING INSTRUCTIONS
-----------------------
IRRoundingMode(I32) x D64 -> D64
Iop_RoundD64
IRRoundingMode(I32) x D128 -> D128
Iop_RoundD128
COMPARE INSTRUCTIONS
-----------------------
D64 x D64 -> IRCmpF64Result(I32)
Iop_CmpD64
D128 x D128 -> IRCmpF64Result(I32)
Iop_CmpD128
D64 x D64 -> 1 if the condition is TRUE, 0 otherwise
Iop_CmpEQD64, Iop_CmpLTD64, Iop_CmpGTD64
D128 x D128 -> 1 if the condition is TRUE, 0 otherwise
Iop_CmpEQD128, Iop_CmpLTD128, Iop_CmpGTD128
QUANTIZE AND ROUND INSTRUCTIONS
-------------------------------
IRRoundingMode(I32) x D64-> D64
Iop_QuantizeID64
IRRoundingMode(I32) x D128-> D128
Iop_QuantizeID128
IRRoundingMode(I32) x D64 x D64 -> D64
Iop_QuantizeD64
IRRoundingMode(I32) x D128 x D128 -> D128
Iop_QuantizeD128
IRRoundingMode(I32) x D64 x D64 -> D64
Iop_SignificanceRoundD64
IRRoundingMode(I32) x D128 x D128 -> D128
Iop_SignificanceRoundD128
EXTRACT AND INSERT INSTRUCTIONS
-------------------------------
D64 -> I64
Iop_ExtractExpD64
D128 -> I64
Iop_ExtractExpD128
I64 x I64 -> D64
Iop_InsertExpD64
I64 x I128 -> D128
Iop_InsertExpD128
SHIFT SIGNIFICAND INSTRUCTIONS
-------------------------------
U16 x D64 -> D64
Iop_ShlD64, Iop_ShrD64
U16 x D128 -> D128
Iop_ShlD128, Iop_ShrD128
This section gives the detailed mapping of Power and s390 instructions to the
proposed DFP Iops or how they would be implemented with the existing Iops and
the proposed DFP Iops.
ARITHMETIC INSTRUCTIONS
Iop s390 Power Description of instruction, implementation
opcode opcode details
-------------------------------------------------------------------------------
Iop_AddD64 IRRoundingMode(I32) X D64 X D64 -> D64
ADTR dadd
ADTRA Add two 64-bit DFP numbers. If both operands
are finite numbers, they are added
algebraically, forming an intermediate
sum. The intermediate sum, if nonzero, is
rounded to the operand format and the rounded
value is then placed at the result location.
The ADTRA instruction has a field to specify
the desired rounding mode but is otherwise
identical to the ADTR.
Iop_AddD128 IRRoundingMode(I32) X D128 X D128 -> D128
AXTR daddq
AXTRA Add two 128-bit DFP numbers. If both operands
are finite numbers, they are added
algebraically, forming an intermediate sum. The
intermediate sum, if nonzero, is rounded to the
operand format and the rounded value is then
placed at the result location.
The AXTRA instruction has a field to specify
the desired rounding mode.
Iop_SubD64 IRRoundingMode(I32) X D64 X D64 -> D64
SDTR dsub
SDTRA
The execution of SUBTRACT is identical to that
of ADD, except that the second operand, if
numeric, participates in the operation with
its sign bit inverted.
The SDTRA instruction has a field to specify
the desired rounding mode but is otherwise
identical to the SDTR.
Iop_SubD128 IRRoundingMode(I32) X D128 X D128 -> D128
SXTR dsubq
SXTRA
The execution of SUBTRACT is identical to that
of ADD, except that the second operand, if
numeric, participates in the operation with
its sign bit inverted.
The SXTRA instruction has a field to specify
the desired rounding mode but is otherwise
identical to the SXTR.
Iop_MulD64 IRRoundingMode(I32) X D64 X D64 -> D64
MDTR dmul
MDTRA
If both source operands are finite numbers,
they are multiplied to form an intermediate
product. The intermediate product is rounded
to the target format.
The MDTRA instruction has a field to specify
the desired rounding mode but is otherwise
identical to the MDTR.
Iop_MulD128 IRRoundingMode(I32) X D128 X D128 -> D128
MXTR dmulq
MXTRA
If both source operands are finite numbers,
they are multiplied to form an intermediate
product. The intermediate product is rounded
to the target format.
The MXTRA instruction has a field to specify
the desired rounding mode but is otherwise
identical to the MXTR.
Iop_DivD64 IRRoundingMode(I32) X D64 X D64 -> D64
DDTR ddiv
DDTRA
If divisor is nonzero and both the dividend and
divisor are finite numbers, the first operand
is divided by the second operand to form an
intermediate quotient. The intermediate
quotient, if nonzero, is rounded to the target
format.
The DDTRA instruction has a field to specify
the desired rounding mode but is otherwise
identical to the DDTR.
Iop_DivD128 IRRoundingMode(I32) X D128 X D128 -> D128
DXTR ddivq
DXTRA
If divisor is nonzero and both the dividend and
divisor are finite numbers, the first operand
is divided by the second operand to form an
intermediate quotient. The intermediate
quotient, if nonzero, is rounded to the target
format.
The DXTRA instruction has a field to specify
the desired rounding mode but is otherwise
identical to the DXTR.
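To make the shape "IRRoundingMode(I32) x D64 x D64 -> D64" concrete, here is a minimal Python sketch that models the arithmetic Iops with the standard `decimal` module. It is only a model of the semantics, not VEX code. The numeric rounding-mode encoding is an assumption that mirrors the four modes of the existing binary floating point IRRoundingMode, the function names `dfp_context` and `dfp_arith` are hypothetical, and the precision and exponent limits are the IEEE 754-2008 decimal64 and decimal128 parameters.

```python
from decimal import (Context, Decimal, ROUND_CEILING, ROUND_DOWN,
                     ROUND_FLOOR, ROUND_HALF_EVEN)

# Assumed encoding: 0 = nearest-even, 1 = toward -inf, 2 = toward +inf, 3 = toward zero.
ROUNDING = {0: ROUND_HALF_EVEN, 1: ROUND_FLOOR, 2: ROUND_CEILING, 3: ROUND_DOWN}

# (precision in digits, Emin, Emax) for decimal64 and decimal128.
FORMATS = {64: (16, -383, 384), 128: (34, -6143, 6144)}


def dfp_context(irrm, bits):
    """Build a decimal context for the given rounding mode and operand width."""
    prec, emin, emax = FORMATS[bits]
    return Context(prec=prec, Emin=emin, Emax=emax, rounding=ROUNDING[irrm])


def dfp_arith(op, irrm, a, b, bits=64):
    """Model Iop_{Add,Sub,Mul,Div}D{64,128}: compute exactly, then round once."""
    ctx = dfp_context(irrm, bits)
    ops = {"add": ctx.add, "sub": ctx.subtract,
           "mul": ctx.multiply, "div": ctx.divide}
    return ops[op](a, b)


nearest = dfp_arith("div", 0, Decimal(1), Decimal(3))   # 16 significant digits
upward = dfp_arith("div", 2, Decimal(1), Decimal(3))    # last digit rounded up
```

The rounding mode is an explicit first argument, so the s390 variants with a rounding-mode field (ADTRA and friends) and the variants without one can share the same Iop.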
FORMAT CONVERSION INSTRUCTIONS
Iop s390 Power Description of instruction, implementation
opcode opcode details
-------------------------------------------------------------------------------
TBD
PFPO
The PFPO instruction operation is specified by
the code in general purpose register 0 and the
condition code is set to indicate the result.
The operations that can be specified for this
instruction include a large list of format
conversions to/from various sizes of DFP
operands and various sizes of Hexadecimal
floating point and Binary floating point
formats.
Iop_D32toD64 InvOperationModes(I32) x D32 -> D64
LDETR dctdp
The 32-bit DFP source operand is converted into
a 64-bit DFP result. The I32 operand is for
specifying the IEEE invalid operation exception
control mode, if required.
Iop_D64toD128 InvOperationModes(I32) x D64 -> D128
LXETR dctqpq
The 64-bit DFP source operand is
converted into a 128-bit DFP result. The I32
operand is for specifying the IEEE invalid
operation exception control mode, if required.
Iop_RoundD64toD32 IRRoundingExceptionModes(I32) x D64 -> D32
LEDTR drsp
The 64-bit DFP source operand is rounded to a
DFP 32-bit value, according to the rounding
mode.
Iop_RoundD128toD64 IRRoundingExceptionModes(I32) x D128 -> D64
LDXTR drdpq
The 128-bit DFP source operand is rounded,
according to the rounding mode.
Iop_I64StoD64 IRRoundingExceptionModes(I32) x I64 -> D64
CDGTR dcffix
CDGTRA
The 64-bit signed binary-integer in the second
source operand is converted into a 64-bit DFP
result using the rounding mode specified in
the first operand.
May need new Iop IRRoundingExceptionModes(I32) x D64 -> I64
TBD CLGDTR
The 64-bit DFP source operand is rounded to a
signed integer, according to the rounding mode
specified by the first operand, and converted
into a signed 64-bit binary integer result with
the same sign as the source.
Implementation: the 64-bit DFP source operand
is rounded into a 64-bit signed integer using
the existing instruction Iop_D64toI64.
CONCERN: HOW DO WE CONVERT FROM SIGNED INT TO
UNSIGNED INT?? IF WE DO THE CONVERSION
TO SIGNED THEN TO UNSIGNED WE LOSE
RANGE FOR AN UNSIGNED VALUE. IF USING
EXISTING IOPS IS NOT VIABLE, WILL
NEED A NEW IOP. TBD WHEN DFP SUPPORT
IS ADDED FOR S390.
May need new Iop IRRoundingExceptionModes(I32) x D64 -> I32
TBD CLFDTR
The 64-bit DFP source operand is rounded to a
signed integer, according to the rounding mode
specified by the first operand, and converted
into a signed 32-bit integer result with
the same sign as the source.
Implementation: the 64-bit DFP source operand
is rounded into a 64-bit signed integer using
the existing instruction Iop_D64toI64, then
Iop_64to32 convert to a signed 32-bit integer.
CONCERN: HOW DO WE CONVERT FROM SIGNED INT TO
UNSIGNED INT?? IF WE DO THE CONVERSION
TO SIGNED THEN TO UNSIGNED WE LOSE
RANGE FOR AN UNSIGNED VALUE. IF USING
EXISTING IOPS IS NOT VIABLE, WILL
NEED A NEW IOP. TBD WHEN DFP SUPPORT
IS ADDED FOR S390.
May need new Iop IRRoundingExceptionModes(I32) x D128 -> I64
TBD CLGXTR
The 128-bit DFP source operand is rounded to a
signed integer, according to the rounding mode
specified by the first operand, and converted
into a signed 64-bit integer result with
the same sign as the source.
Implementation: the 128-bit DFP source operand
is rounded into a 64-bit DFP with the
Iop_RoundD128toD64 instruction, 64-bit DFP is
rounded to a 64-bit signed integer using the
existing instruction Iop_D64toI64.
CONCERN: HOW DO WE CONVERT FROM SIGNED INT TO
UNSIGNED INT?? IF WE DO THE CONVERSION
TO SIGNED THEN TO UNSIGNED WE LOSE
RANGE FOR AN UNSIGNED VALUE. IF
USING EXISTING IOPS IS NOT VIABLE,
WILL NEED A NEW IOP. TBD WHEN DFP
SUPPORT IS ADDED FOR S390.
May need new Iop IRRoundingExceptionModes(I32) x D128 -> I32
TBD CLFXTR
The 128-bit DFP source operand is rounded to a
signed integer, according to the rounding mode
specified by the first operand, and converted
into a signed 32-bit integer result with
the same sign as the source.
Implementation: the 128-bit DFP source operand
is rounded into a 64-bit DFP with the
Iop_RoundD128toD64 instruction, the 64-bit DFP
is then converted to a 64-bit signed integer
using the existing instruction Iop_D64toI64,
then Iop_64to32 convert to a signed 32-bit
integer.
CONCERN: HOW DO WE CONVERT FROM SIGNED INT TO
UNSIGNED INT?? IF WE DO THE CONVERSION
TO SIGNED THEN TO UNSIGNED WE LOSE
RANGE FOR AN UNSIGNED VALUE. IF USING
EXISTING IOPS IS NOT VIABLE, WILL
NEED A NEW IOP. TBD WHEN DFP SUPPORT
IS ADDED FOR S390.
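The range concern raised above can be shown with a small Python sketch. It models a signed and an unsigned conversion separately and then compares "convert to signed, reinterpret as unsigned" with a direct unsigned conversion. The function names are hypothetical, and saturating on out-of-range values is an assumption made only so the example has a defined result; the proposal does not say what the hardware does in that case.

```python
from decimal import Decimal, ROUND_HALF_EVEN

INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1
UINT64_MAX = (1 << 64) - 1


def d64_to_i64s(value, rounding=ROUND_HALF_EVEN):
    """Round to an integer, then saturate to the signed 64-bit range (assumed)."""
    n = int(value.to_integral_value(rounding=rounding))
    return max(INT64_MIN, min(INT64_MAX, n))


def d64_to_i64u(value, rounding=ROUND_HALF_EVEN):
    """Round to an integer, then saturate to the unsigned 64-bit range (assumed)."""
    n = int(value.to_integral_value(rounding=rounding))
    return max(0, min(UINT64_MAX, n))


# 1.2E19 lies between 2**63 and 2**64: fine as unsigned, too large as signed.
x = Decimal("1.2E19")
via_signed = d64_to_i64s(x) & UINT64_MAX   # signed result reinterpreted as unsigned
direct = d64_to_i64u(x)
```

Going through the signed conversion yields `INT64_MAX` here, while the direct unsigned conversion yields 12000000000000000000, so the two-step route cannot cover the upper half of the unsigned range.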
ROUNDING INSTRUCTIONS
Iop s390 Power Description of instruction, implementation
opcode opcode details
-------------------------------------------------------------------------------
Iop_RoundD64 IRRoundingMode(I32) x D64 -> D64
FIDTR drintx
The D64 operand, if a finite number, is rounded
to an integer value, with inexact.
The DFP operand is rounded and stored in the
floating point register as a binary integer
value based on the specified rounding mode and
stored as a DFP.
COMMENT: THE FIDTR INSTRUCTION HAS THE M3
FIELD FOR THE ROUNDING MODE. IT ALSO
HAS THE M4 FIELD WHICH APPEARS TO BE
UNUSED AT PRESENT. IN THE FUTURE,
THE I32 OPERAND COULD BE USED TO HOLD
BOTH M3 AND M4.
Iop_RoundD128 IRRoundingMode(I32) x D128 -> D128
FIXTR drintxq
The D128 operand, if a finite number, is
rounded to a binary integer value, with
inexact. The DFP operand is rounded and stored
in the floating point register as a binary
integer based on the specified rounding mode.
COMMENT: THE FIXTR INSTRUCTION HAS THE M3
FIELD FOR THE ROUNDING MODE. IT ALSO
HAS THE M4 FIELD WHICH APPEARS TO BE
UNUSED AT PRESENT. IN THE FUTURE,
THE I32 OPERAND COULD BE USED TO HOLD
BOTH M3 AND M4.
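A minimal Python sketch of the "round to an integer value, with inexact" behaviour, again using the `decimal` module as a stand-in for the DFP semantics. The function name `round_d64` is hypothetical; the result remains a decimal value, and the second element of the returned pair reports whether the Inexact condition was signalled.

```python
from decimal import Context, Decimal, Inexact, ROUND_DOWN, ROUND_HALF_EVEN


def round_d64(value, rounding):
    """Round a decimal64-style value to an integral value.

    Returns (result, inexact), where inexact is True when digits were
    discarded during rounding.
    """
    ctx = Context(prec=16, Emin=-383, Emax=384, rounding=rounding)
    result = ctx.to_integral_exact(value)
    return result, bool(ctx.flags[Inexact])


tie, tie_inexact = round_d64(Decimal("2.5"), ROUND_HALF_EVEN)    # tie goes to even
whole, whole_inexact = round_d64(Decimal("7"), ROUND_HALF_EVEN)  # already integral
trunc, trunc_inexact = round_d64(Decimal("-2.7"), ROUND_DOWN)    # toward zero
```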
COMPARE INSTRUCTIONS
Iop s390 Power Description of instruction, implementation
opcode opcode details
-------------------------------------------------------------------------------
Iop_CmpD64 D64 x D64 -> IRCmpF64Result(I32)
dcmpo
dcmpu
Perform the comparison, as specified in the
instruction. The instruction sets the platform
condition codes. The condition codes for the
virtual machine need to be updated based on the
result in the platform specific condition code
register.
Iop_CmpD128 D128 x D128 -> IRCmpF64Result(I32)
dcmpoq
dcmpuq
Perform the comparison, as specified in the
instruction. The instruction sets the platform
condition codes. The condition codes for the
virtual machine need to be updated based on the
result in the platform specific condition code
register.
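The following Python sketch models a compare that returns an IRCmpF64Result-style code. The four constants are the values I believe the existing IRCmpF64Result enum uses in VEX; treat them as an assumption and check libvex_ir.h before relying on them. The function names are hypothetical. The sketch also shows that the boolean EQ/LT/GT forms can be derived from the three-way result.

```python
from decimal import Decimal

# Assumed to match the existing IRCmpF64Result encoding.
IRCR_UN, IRCR_LT, IRCR_GT, IRCR_EQ = 0x45, 0x01, 0x00, 0x40


def cmp_d64(a, b):
    """Three-way compare with an 'unordered' outcome when either side is NaN."""
    if a.is_nan() or b.is_nan():
        return IRCR_UN
    if a < b:
        return IRCR_LT
    if a > b:
        return IRCR_GT
    return IRCR_EQ


def cmp_eq_d64(a, b):
    """1 if equal, 0 otherwise, derived from the three-way result."""
    return int(cmp_d64(a, b) == IRCR_EQ)


def cmp_lt_d64(a, b):
    """1 if a < b, 0 otherwise."""
    return int(cmp_d64(a, b) == IRCR_LT)


def cmp_gt_d64(a, b):
    """1 if a > b, 0 otherwise."""
    return int(cmp_d64(a, b) == IRCR_GT)
```

Note that `Decimal("1.0")` and `Decimal("1.00")` compare equal even though their exponents differ; the comparison is on numeric value, not on representation.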
QUANTIZE AND ROUND INSTRUCTIONS
Iop s390 Power Description of instruction, implementation
opcode opcode details
-------------------------------------------------------------------------------
Iop_QuantizeID64 IRRoundingModeAndExponent(I32) x D64 -> D64
QADTR dquai
The D64 source operand is converted and rounded
to the form with the immediate exponent
specified by the rounding and exponent
parameter.
Iop_QuantizeID128 IRRoundingModeAndExponent(I32) x D128-> D128
QAXTR dquaiq
The D128 source operand is converted and
rounded to the form with the immediate exponent
specified by the rounding and exponent
parameter.
Iop_QuantizeD64 IRRoundingMode(I32) x D64 x D64 -> D64
dqua
The second D64 operand is converted
and rounded to the form with the same exponent
as that of the first DFP operand. The result
is placed in the result operand.
Note: s390 does not have the immediate
quantize instruction, so this Iop cannot be
emulated on s390 using the immediate form.
Iop_QuantizeD128 IRRoundingMode(I32) x D128 x D128-> D128
dquaq
The second D128 operand is
converted and rounded to the form with the same
exponent as that of the first DFP operand. The
result is placed in the result operand.
Note: s390 does not have the immediate
quantize instruction, so this Iop cannot be
emulated on s390 using the immediate form.
Iop_SignificanceRoundD64 IRRoundingMode(I32) x D64 x D64 -> D64
RRDTR drrnd
The D32 or D64 operand is rounded to the
requested significance given by the I8 operand
as specified by the rounding mode.
Iop_SignificanceRoundD128 IRRoundingMode(I32) x D128 x D128 -> D128
RRXTR drrndq
The D128 operand is rounded to the requested
significance given by the I8 operand as
specified by the rounding mode.
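Since quantize has no direct counterpart in binary floating point, a short Python sketch may help. It uses `decimal.Context.quantize`, which rounds a value so that it carries a requested exponent. The function names `quantize_d64` and `quantize_imm_d64` are hypothetical, and passing the immediate exponent as a plain integer is a simplification; how the exponent would actually be encoded in the I32 argument is not settled by the proposal.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_EVEN


def _ctx(rounding):
    """decimal64-style context: 16 digits, IEEE 754-2008 exponent range."""
    return Context(prec=16, Emin=-383, Emax=384, rounding=rounding)


def quantize_d64(rounding, exp_source, value):
    """Register form: round `value` so it has the same exponent as `exp_source`."""
    return _ctx(rounding).quantize(value, exp_source)


def quantize_imm_d64(rounding, exponent, value):
    """Immediate form: round `value` to the given (unbiased) exponent."""
    target = Decimal((0, (1,), exponent))   # the value 1 x 10**exponent
    return _ctx(rounding).quantize(value, target)


cents = quantize_d64(ROUND_HALF_EVEN, Decimal("0.01"), Decimal("2.175"))
chopped = quantize_d64(ROUND_DOWN, Decimal("0.01"), Decimal("2.175"))
padded = quantize_d64(ROUND_HALF_EVEN, Decimal("0.001"), Decimal("5"))
tenths = quantize_imm_d64(ROUND_HALF_EVEN, -1, Decimal("2.175"))
```

Quantizing can drop digits (2.175 becomes 2.18 or 2.17 depending on the rounding mode) or add trailing zeros (5 becomes 5.000); in both cases the numeric value changes by at most the rounding error, and only the exponent is forced.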
EXTRACT AND INSERT INSTRUCTIONS
Iop s390 Power Description of instruction, implementation
opcode opcode details
-------------------------------------------------------------------------------
TBD D64 -> I64
ESDTR
This Iop is only needed by s390. It will not
be used for Power and hence may need to be
added when s390 adds DFP support to Valgrind.
Final decision is left to s390 team.
The result is the number of DFP significant
digits of the D32 or D64 operand. The result
is a 64-bit signed value.
TBD D128 -> I64
ESXTR
This Iop is only needed by s390. It will not
be used for Power and hence may need to be
added when s390 adds DFP support to Valgrind.
Final decision is left to s390 team.
The result is the number of DFP significant
digits of the D128 operand. The result is
a 64-bit signed value.
Iop_ExtractExpD64 D64 -> I64
EEDTR dxex
The exponent of the D64 operand is extracted.
The extracted exponent is converted and stored
in the floating point register as a signed
64-bit binary integer format.
Iop_ExtractExpD128 D128 -> I64
EEXTR dxexq
The exponent of the D128 operand is extracted.
The extracted exponent is converted and stored
in the floating point register as a signed
64-bit binary integer.
Iop_InsertExpD64 I64 x I64 -> D64
IEDTR diex
The exponent is specified by the first I64
operand; the signed significand is given by the
second I64 value. The result is a D64 value
consisting of the specified significand and
exponent whose sign is that of the specified
significand.
Iop_InsertExpD128 I64 x I128 -> D128
IEXTR diexq
The exponent is specified by the first I64
operand; the signed significand is given by the
second I128 value. The result is a D128 value
consisting of the specified significand and
exponent whose sign is that of the specified
significand.
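A minimal Python sketch of extract-exponent and insert-exponent for finite values. The names `extract_exp` and `insert_exp` are hypothetical. The sketch works with the unbiased exponent as exposed by `Decimal.as_tuple()`; whether the Iops would carry a biased or an unbiased exponent is not stated in the proposal, so that choice is an assumption of this example.

```python
from decimal import Decimal


def extract_exp(value):
    """Return the (unbiased) exponent of a finite decimal value."""
    if not value.is_finite():
        raise ValueError("sketch handles finite values only")
    return value.as_tuple().exponent


def insert_exp(exponent, significand):
    """Build a decimal from a signed integer significand and an exponent.

    The sign of the result is the sign of the significand.
    """
    sign = 1 if significand < 0 else 0
    digits = tuple(int(d) for d in str(abs(significand)))
    return Decimal((sign, digits, exponent))


e = extract_exp(Decimal("123.45"))   # significand 12345, exponent -2
rebuilt = insert_exp(-2, -12345)     # negative significand gives a negative result
```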
SHIFT SIGNIFICAND INSTRUCTIONS
Iop s390 Power Description of instruction, implementation
opcode opcode details
-------------------------------------------------------------------------------
Iop_ShlD64 U8 x D64 -> D64
SLDT dscli
The D32 or D64 significand is shifted left by
the number of digits specified by the U8
operand. Digits shifted out of the leftmost
digit are lost. Zeros are supplied to the
vacated positions on the right. The sign of
the result is the same as the sign of the D64
operand.
Iop_ShlD128 U8 x D128 -> D128
SLXT dscliq
The D128 significand is shifted left by the
number of digits specified by the U8 operand.
Digits shifted out of the leftmost digit are
lost. Zeros are supplied to the vacated
positions on the right. The sign of the result
is the same as the sign of the D128 operand.
Iop_ShrD64 U8 x D64 -> D64
SRDT dscri
The D32 or D64 significand is shifted right by
the number of digits specified by the U8
operand. Digits shifted out of the rightmost
digit are lost. Zeros are supplied to the
vacated positions on the left. The sign of the
result is the same as the sign of the D64
operand.
Iop_ShrD128 U8 x D128 -> D128
SRXT dscriq
The D128 significand is shifted right by the
number of digits specified by the U8 operand.
Digits shifted out of the rightmost digit are
lost. Zeros are supplied to the vacated
positions on the left. The sign of the result
is the same as the sign of the D128 operand.
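The significand shifts can be modelled with `decimal.Context.shift`, which moves the digits of the coefficient and leaves the sign and exponent alone. This is a sketch of the semantics described above, not VEX code; the names `shl_d64` and `shr_d64` are hypothetical, and the shift count is placed second purely as a choice for this example.

```python
from decimal import Context, Decimal

# decimal64-style context: the coefficient is treated as 16 digits wide.
D64 = Context(prec=16, Emin=-383, Emax=384)


def shl_d64(value, digits):
    """Shift the significand left; digits pushed past the 16th are lost."""
    return D64.shift(value, digits)


def shr_d64(value, digits):
    """Shift the significand right; digits pushed off the right end are lost."""
    return D64.shift(value, -digits)


left = shl_d64(Decimal("-123.45"), 2)            # coefficient 12345 -> 1234500
right = shr_d64(Decimal("-123.45"), 2)           # coefficient 12345 -> 123
overflow = shl_d64(Decimal("1234567890123456"), 2)  # top two digits fall off
```

Because the exponent is untouched, the shift changes the numeric value: -123.45 becomes -12345.00 when shifted left by two digits and -1.23 when shifted right by two.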
Thank you for taking the time to review this proposal.
Carl Love
From: John R. <jr...@bi...> - 2011-12-23 05:36:35
On 12/22/2011 08:13 AM, Carl E. Love wrote:
> Valgrind community:
>
> Please review and comment on the following is a proposal to add Iops to
> Valgrind to support Decimal Floating Point (DFP) instructions. The
> following proposal includes adding 31 new DFP Iops to support operations
> on 32-bit, 64-bit and 128-bit DFP numbers. The intent was to come up with
> a minimal and sufficient set for the POWER and s390 architectures to start
> with. The hope is that the set of proposed Iops will also be sufficient
> for other architectures but I can't say for sure since I do not have access
> to the DFP instruction set for any other architectures.
[snip]
The proposal fails as a specification. It does not provide all the information
that is necessary to implement it, nor does it provide what is needed to
write the test cases in order to verify an implementation.
1. All data formats must be defined at the bit level, either explicitly
or by formal citation of some highly-usable document which is available
at low cost in each of {money, time, space, hassle}.
2. It is astounding that the proposal does not cite the IEEE-754-2008 standard,
either to claim compatibility [under some explicit representation schema]
or to expound in detail on the differences. If the POWER and s390
instructions do not conform to IEEE-754-2008 to a very high degree,
then that would be sufficient reason to reject the proposal. _Somebody_
must claim and explain the degree of conformance by POWER/s390, and the
specification of new Iops with reference to POWER/s390 must be traceable
to IEEE-754-2008. The best motivation for new Iops is by reference to
IEEE-754-2008, together with a separate mapping between POWER/s390
and the IEEE specification. The coverage of DFP on POWER/s390 by
new Iops would then be mostly a consequence, as would the coverage
by new Iops of DFP on [almost] any highly-conforming hardware.
3. The description of the handling of PFPO must give an explicit list of
the codes [values and symbolic names] in general purpose register 0,
and the corresponding operation in each case. This is EXACTLY
the purpose of a specification!
--
From: Julian S. <js...@ac...> - 2011-12-27 22:18:23
Thanks for making a plausible looking proposal. Looks like it's
heading in the right direction. There are some details to iron
out, though. Please look at all of them.
Comments in order of reading the doc:
General: how much has this been checked out by the s390 folks
(Florian, Christian, Divya) ?
General: please give a reference, including URL, to a publically available
standard that defines the basic arithmetic (IEEE 754-2008 ?)
> The IBM Power and s390 systems support 32-bit, 64-bit and 128-bit
> decimal floating point (DFP) numbers. The DFP instructions use the
> existing floating point registers. The DFP 128-bit operands are stored in
> registers i and i+1 where i must be even. For example the DFP 128-bit add
> is done as follows:
>
> lfd 12,48(31) // load the upper 64 bits of the first 128 value in reg 12
> lfd 13,56(31) // load the lower 64 bits of the first 128 value in reg 13
> addi 0,31,64
> mr 9,0
> lfd 0,0(9) // load the upper 64 bits of the 128 second value in reg 0
> lfd 1,8(9) // load the lower 64 bits of the 128 second value in reg 1
> daddq 0,12,0 // do the add of the 128 DFP numbers, result is stored in
> // registers 0 and 1. Note, registers 1 and 13 are not
> // explicitly listed but are implied
It feels like there's possibly some confusion between types in the IR
(that is, IRType) and how values are represented in ppc registers.
These concepts are distinct, and are related only in the sense that
it is necessary to choose types that don't cause the ppc->IR and
IR->ppc translations to be inefficient.
AFAICS (also, from reading the rest of the doc) you want three new
types, Ity_D32, Ity_D64 and Ity_D128. (yes? that sounds right to me)
Note that many of the back ends already convert F32-typed expression
trees into 64-bit floating point code (eg, the host_ppc_isel.c) so
doing so for D32 would be considered "normal".
> Notation Description
> --------------------------------------------------------------------------
> DFP: Decimal Floating Point number possibly 32-bit, 64-bit, or
> 128-bit format
>
> D32: 32-bit decimal floating point format These values use F64
> registers. The D32 term is used to distinguish the value
> from the standard 32-bit floating point value.
>
> D64: 64-bit decimal floating point format. These values use F64
> registers. The D64 term is used to distinguish the value
> from the standard 64-bit floating point value.
>
> D128: 128-bit decimal floating point format. These values use a pair
> of two 64 bit floating point registers (F64). The instruction
> only references the first register of the register pair. The
> second register is implied.
As per comments above, the comments re the PPC register layouts isn't
directly relevant to what you need in the IR. (I don't care; I just want
to be sure we have our concepts straight here)
> IRRoundingMode(I32): Indicates the I32 argument is used to hold the
> bits that specify the rounding mode to be used
> by the instruction.
Fine; as per existing code.
> IRRoundingExceptionModes(I32): Indicates the I32 argument is used to hold the
> bits that specify the rounding mode and the bits
> that specify the exception mode to be used by
> the instruction. The s390 instructions specify
> both modes whereas POWER does not specify either
> mode.
Hmm. Have you read (in detail) the limitations re floating point
described at
http://valgrind.org/docs/manual/manual-core.html#manual-core.limits
The point is that IR and Valgrind generally doesn't provide support
for non-default exception modes, and silently assumes that exceptions
are to be fixed up using the default IEEE fixup actions. So there's
no point at the moment in adding exception action information into the
IR. None of the other front ends (xxx_to_IR.c) do it.
> IRRoundingModeAndEponent(I32): Indicates the I32 argument will contain bits to
> specify the rounding mode. POWER also has bits
> to specify the desired exponent. The s390
> instructions only specify the rounding mode.
Euh, can you elaborate on the encoding/meaning of "desired exponent" ?
Sounds a bit like wiring a POWER-ism into the IR spec, which isn't good.
> ARITHMETIC INSTRUCTIONS
> -----------------------
> IRRoundingMode(I32) X D64 X D64 -> D64
> Iop_AddD64, Iop_SubD64, Iop_MulD64, Iop_DivD64
>
> IRRoundingMode(I32) X D128 X D128 -> D128
> Iop_AddD128, Iop_SubD128, Iop_MulD128, Iop_DivD128
fine
> FORMAT CONVERSION INSTRUCTIONS
> ------------------------------
> InvOperationModes(I32) x D32 -> D64
> Iop_D32toD64
what's InvOperationModes? It's not specified anywhere in your doc.
> InvOperationModes(I32) x D64 -> D128
> Iop_D64toD128
ditto
> IRRoundingExceptionModes(I32) x D64 -> D32
> Iop_RoundD64toD32
>
> IRRoundingExceptionModes(I32) x D128 -> D64
> Iop_RoundD128toD64
>
> IRRoundingExceptionModes(I32) x I64 -> D64
> Iop_I64StoD64
ok
> IRRoundingExceptionModes(I32) x D64 -> I64
> Iop_D64toI64
this is underspecified .. you need to decide whether that's a
conversion to signed or unsigned I64 (or maybe you need both) and
call them Iop_D64toI64S or Iop_D64toI64U respectively. (I think you
comment about this further down in the doc.) I mention this partly
because sorting out such ambiguity in the past for the Fxx->Ixx
conversions required a lot of hoop jumping, so we might as well get
it straightened out up front.
> ROUNDING INSTRUCTIONS
> -----------------------
> IRRoundingMode(I32) x D64 -> D64
> Iop_RoundD64
>
> IRRoundingMode(I32) x D128 -> D128
> Iop_RoundD128
ok
> COMPARE INSTRUCTIONS
> -----------------------
> D64 x D64 -> IRCmpF64Result(I32)
> Iop_CmpD64
>
> D128 x D128 -> IRCmpF64Result(I32)
> Iop_CmpD128
ok
> D64 x D64 -> 1 if the condition is TRUE, 0 otherwise
> Iop_CmpEQD64, Iop_CmpLTD64, Iop_CmpGTD64
>
> D128 x D128 -> 1 if the condition is TRUE; 0 otherwise;
> Iop_CmpEQD128, Iop_CmpLTD128, Iop_CmpGTD128
why are these 6 necessary? Isn't their functionality a subset of
Iop_CmpD64 and Iop_CmpD128 ?
> QUANTIZE AND ROUND INSTRUCTIONS
> -------------------------------
> IRRoundingMode(I32) x D64-> D64
> Iop_QuantizeID64,
>
> IRRoundingMode(I32) x D128-> D128
> Iop_QuantizeID128
>
> IRRoundingMode(I32) x D64 x D64 -> D64
> Iop_QuantizeD64
>
> IRRoundingMode(I32) x D128 x D128 -> D128
> Iop_QuantizeD128
I'm not clear what the ID vs D signifies in these names. Can they
instead be called Iop_Quantize{Un,Bin}{D64,D128} to denote unary vs
binary ness (ignoring the rounding mode arg which is present in all 4
cases).
What is quantization, anyway (in the context of DFP I mean)? Does it
have any analogue in traditional IEEE754 FP ?
> IRRoundingMode(I32) x D64 x D64 -> D64
> Iop_SignificanceRoundD64
>
> IRRoundingMode(I32) x D128 x D128 -> D128
> Iop_SignificanceRoundD128
>
>
> EXTRACT AND INSERT INSTRUCTIONS
> -------------------------------
> D64 -> I64
> Iop_ExtractExpD64
>
> D128 -> I64
> Iop_ExtractExpD128
The exponent really needs 64 bits? Can it be 32 bits? That might
allow for more efficient code generation for 32 bit targets.
> I64 x I64 -> D64
> Iop_InsertExpD64
>
> I64 x I128 -> D128
> Iop_InsertExpD128
ditto comment
> SHIFT SIGNIFICAND INSTRUCTIONS
> -------------------------------
> U16 x D64 -> D64
> Iop_ShlD64, Iop_ShrD64
>
> U16 x D128 -> D128
> Iop_ShlD128, Iop_ShrD128
two things:
(1) does the shift amount need to be 16 bits? For all the other
shifting style ops we have, the shift amount is encoded in 8 bits
(Ity_I8) and I would prefer to stick with that for consistency, if
possible.
(2) pls put the shift amount as the second argument, not the first,
as that too is consistent with all other shift ops we have (eg,
Iop_Shr64)
---------------
> This section give the detailed mapping of Power and s390
> instructions to the proposed DFP Iops or how they would be implemented
> with the existing Iops and the proposed DFP Iops.
I'll comment on this second half of the proposal tomorrow.
J
From: Florian K. <br...@ac...> - 2011-12-28 00:51:36
On 12/27/2011 05:17 PM, Julian Seward wrote:
>
> Thanks for making a plausible looking proposal. Looks like it's
> heading in the right direction. There are some details to iron
> out, though. Please look at all of them.
>
> Comments in order of reading the doc:
>
> General: how much has this been checked out by the s390 folks
> (Florian, Christian, Divya) ?
>
I've been procrastinating this and haven't had a close look.. Will send
my comments in the next two days.
Florian
From: Christian B. <bor...@de...> - 2011-12-28 12:13:21
Just some comments on the generic things.
> General: how much has this been checked out by the s390 folks
> (Florian, Christian, Divya) ?
I have reviewed the proposal IBM-internally before Carl pushed that out
(several things were added and clarified for s390). There might be some
problems still hiding, since we have not yet started with decimal
floating point for valgrind on s390, but most aspects should be covered
or at least mentioned (like PFPO).
> General: please give a reference, including URL, to a publically available
> standard that defines the basic arithmetic (IEEE 754-2008 ?)
Yes decimal floating point on power and s390 follows 754-2008 as far as
I know.
[...]
>> D128: 128-bit decimal floating point format. These
>> values use a pair of two 64 bit floating point registers (F64). The
>> instruction only references the first register of the register pair. The
>> second register is implied.
>
> As per comments above, the comments re the PPC register layouts isn't
> directly relevant to what you need in the IR. (I don't care; I just want
> to be sure we have our concepts straight here)
I think your concepts are the same. Carl was just describing the whole
stack. In the end the register pair thing might boil down to the same
logic as for F128 on s390: two loads. (see get_fpr_pair in
VEX/priv/guest_s390_toIR.c)
|
From: Carl E. L. <ce...@li...> - 2011-12-28 17:46:54
|
On Tue, 2011-12-27 at 23:17 +0100, Julian Seward wrote:
> Thanks for making a plausible looking proposal. Looks like it's
> heading in the right direction. There are some details to iron
> out, though. Please look at all of them.
Julian:
Thanks for taking the time to review the document. I have read over the
comments and they all seem very reasonable and easily addressed. I will
work on them and get back to you soon. Just an FYI, we did run the
proposal by the s390 team for a sanity check. It looked reasonable to
them but as they said, they have not really dug into the details of
doing the work so there may be some lingering issues. I have been
working on the POWER implementation. Currently I have 44 of the 49
POWER instructions implemented. I don't foresee any issues with the
remaining instructions. I wanted to have a good proof of concept
implementation to go with the proposal as a sanity check.
Thanks again and take care. Talk with you soon.
Carl Love
>
> Comments in order of reading the doc:
>
> General: how much has this been checked out by the s390 folks
> (Florian, Christian, Divya) ?
>
> General: please give a reference, including URL, to a publically available
> standard that defines the basic arithmetic (IEEE 754-2008 ?)
>
>
> > The IBM Power and s390 systems support 32-bit, 64-bit and 128-bit
> > decimal floating point (DFP) numbers. The DFP instructions use the
> > existing floating point registers. The DFP 128-bit operands are stored in
> > registers i and i+1 where i must be even. For example the DFP 128-bit add
> > is done as follows:
> >
> > lfd 12,48(31) // load the upper 64 bits of the first 128 value in reg 12
> > lfd 13,56(31) // load the lower 64 bits of the first 128 value in reg 13
> > addi 0,31,64
> > mr 9,0
> > lfd 0,0(9) // load the upper 64 bits of the 128 second value in reg 0
> > lfd 1,8(9) // load the lower 64 bits of the 128 second value in reg 1
> > daddq 0,12,0 // do the add of the 128 DFP numbers, result is stored in
> > // registers 0 and 1. Note, registers 1 and 13 are not
> > // explicitly listed but are implied
>
> It feels like there's possibly some confusion between types in the IR
> (that is, IRType) and how values are represented in ppc registers. These
> concepts are distinct, and are related only in the sense that it is necessary
> to choose types that don't cause the ppc->IR and IR->ppc translations to
> be inefficient.
>
> AFAICS (also, from reading the rest of the doc) you want three new types,
> Ity_D32, Ity_D64 and Ity_D128. (yes? that sounds right to me)
>
> Note that many of the back ends already convert F32-typed expression
> trees into 64-bit floating point code (eg, the host_ppc_isel.c) so
> doing so for D32 would be considered "normal".
>
>
>
> > Notation Description
> >
> > ---------------------------------------------------------------------------
> > DFP: Decimal Floating Point number possibly 32-bit, 64-bit, or
> > 128-bit format
> >
> > D32: 32-bit decimal floating point format These values
> > use F64 registers. The D32 term is used to distinguish the value from the
> > standard 32-bit floating point value.
> >
> > D64: 64-bit decimal floating point format. These values
> > use F64 registers. The D64 term is used to distinguish the value from the
> > standard 64-bit floating point value.
> >
> > D128: 128-bit decimal floating point format. These
> > values use a pair of two 64 bit floating point registers (F64). The
> > instruction only references the first register of the register pair. The
> > second register is implied.
>
> As per comments above, the comments re the PPC register layouts isn't
> directly relevant to what you need in the IR. (I don't care; I just want
> to be sure we have our concepts straight here)
>
>
> > IRRoundingMode(I32): Indicates the I32 argument is used
> > to hold the bits that specify the rounding mode to be used by the
> > instruction.
>
> Fine; as per existing code.
>
>
> > IRRoundingExceptionModes(I32): Indicates the I32 argument is used
> > to hold the bits that specify the rounding mode and the bits that specify
> > the exception mode to be used by the instruction. The s390 instructions
> > specify both modes whereas POWER does not specify either mode.
>
> Hmm. Have you read (in detail) the limitations re floating point
> described at
> http://valgrind.org/docs/manual/manual-core.html#manual-core.limits
>
> The point is that IR and Valgrind generally doesn't provide support
> for non-default exception modes, and silently assumes that exceptions
> are to be fixed up using the default IEEE fixup actions. So there's
> no point at the moment in adding exception action information into
> the IR. None of the other front ends (xxx_to_IR.c) do it.
>
>
> > IRRoundingModeAndEponent(I32): Indicates the I32 argument will
> > contain bits to specify the rounding mode. POWER also has bits to specify
> > the desired exponent. The s390 instructions only specify the rounding
> > mode.
>
> Euh, can you elaborate on the encoding/meaning of "desired exponent" ?
> Sounds a bit like wiring a POWER-ism into the IR spec, which isn't good.
>
>
> > ARITHMETIC INSTRUCTIONS
> > -----------------------
> > IRRoundingMode(I32) X D64 X D64 -> D64
> > Iop_AddD64, Iop_SubD64, Iop_MulD64, Iop_DivD64
> >
> > IRRoundingMode(I32) X D128 X D128 -> D128
> > Iop_AddD128, Iop_SubD128, Iop_MulD128, Iop_DivD128
>
> fine
>
>
> > FORMAT CONVERSION INSTRUCTIONS
> > ------------------------------
> > InvOperationModes(I32) x D32 -> D64
> > Iop_D32toD64
>
> what's InvOperationModes? It's not specified anywhere in your doc.
>
> > InvOperationModes(I32) x D64 -> D128
> > Iop_D64toD128
>
> ditto
>
> > IRRoundingExceptionModes(I32) x D64 -> D32
> > Iop_RoundD64toD32
> >
> > IRRoundingExceptionModes(I32) x D128 -> D64
> > Iop_RoundD128toD64
> >
> > IRRoundingExceptionModes(I32) x I64 -> D64
> > Iop_I64StoD64
>
> ok
>
> > IRRoundingExceptionModes(I32) x D64 -> I64
> > Iop_D64toI64
>
> this is underspecified .. you need to decide whether that's a
> conversion to signed or unsigned I64 (or maybe you need both)
> and call them Iop_D64toI64S or Iop_D64toI64U respectively.
> (I think you comment about this further down in the doc.)
> I mention this partly because sorting out such ambiguity in the
> past for the Fxx->Ixx conversions required a lot of hoop
> jumping, so we might as well get it straightened out up front.
>
>
> > ROUNDING INSTRUCTIONS
> > -----------------------
> > IRRoundingMode(I32) x D64 -> D64
> > Iop_RoundD64
> >
> > IRRoundingMode(I32) x D128 -> D128
> > Iop_RoundD128
>
> ok
>
>
> > COMPARE INSTRUCTIONS
> > -----------------------
> > D64 x D64 -> IRCmpF64Result(I32)
> > Iop_CmpD64
> >
> > D128 x D128 -> IRCmpF64Result(I32)
> > Iop_CmpD128
>
> ok
>
>
> > D64 x D64 -> 1 if the condition is TRUE, 0 otherwise
> > Iop_CmpEQD64, Iop_CmpLTD64, Iop_CmpGTD64
> >
> > D128 x D128 -> 1 if the condition is TRUE; 0 otherwise;
> > Iop_CmpEQD128, Iop_CmpLTD128, Iop_CmpGTD128
>
> why are these 6 necessary? Isn't their functionality a subset of
> Iop_CmpD64 and Iop_CmpD128 ?
>
>
> > QUANTIZE AND ROUND INSTRUCTIONS
> > -------------------------------
> > IRRoundingMode(I32) x D64-> D64
> > Iop_QuantizeID64,
> >
> > IRRoundingMode(I32) x D128-> D128
> > Iop_QuantizeID128
> >
> > IRRoundingMode(I32) x D64 x D64 -> D64
> > Iop_QuantizeD64
> >
> > IRRoundingMode(I32) x D128 x D128 -> D128
> > Iop_QuantizeD128
>
> I'm not clear what the ID vs D signifies in these names. Can
> they instead be called Iop_Quantize{Un,Bin}{D64,D128} to denote
> unary vs binary ness (ignoring the rounding mode arg which is
> present in all 4 cases).
>
> What is quantization, anyway (in the context of DFP I mean)?
> Does it have any analogue in traditional IEEE754 FP ?
>
> > IRRoundingMode(I32) x D64 x D64 -> D64
> > Iop_SignificanceRoundD64
> >
> > IRRoundingMode(I32) x D128 x D128 -> D128
> > Iop_SignificanceRoundD128
> >
> >
> > EXTRACT AND INSERT INSTRUCTIONS
> > -------------------------------
> > D64 -> I64
> > Iop_ExtractExpD64
> >
> > D128 -> I64
> > Iop_ExtractExpD128
>
> The exponent really needs 64 bits? Can it be 32 bits? That
> might allow for more efficient code generation for 32 bit targets.
>
> > I64 x I64 -> D64
> > Iop_InsertExpD64
> >
> > I64 x I128 -> D128
> > Iop_InsertExpD128
>
> ditto comment
>
> > SHIFT SIGNIFICAND INSTRUCTIONS
> > -------------------------------
> > U16 x D64 -> D64
> > Iop_ShlD64, Iop_ShrD64
> >
> > U16 x D128 -> D128
> > Iop_ShlD128, Iop_ShrD128
>
> two things: (1) does the shift amount need to be 16 bits?
> For all the other shifting style ops we have, the shift amount
> is encoded in 8 bits (Ity_I8) and I would prefer to stick with
> that for consistency, if possible. (2) pls put the shift amount
> as the second argument, not the first, as that too is consistent
> with all other shift ops we have (eg, Iop_Shr64)
>
> ---------------
>
> > This section give the detailed mapping of Power and s390
> > instructions to the proposed DFP Iops or how they would be implemented
> > with the existing Iops and the proposed DFP Iops.
>
> I'll comment on this second half of the proposal tomorrow.
>
> J
>
|
|
From: Florian K. <br...@ac...> - 2011-12-29 16:15:38
|
Here are my comments. Everything I say for s390 is based on chapter 20
of this document:
http://publibfi.boulder.ibm.com/epubs/pdf/dz9zr008.pdf

> Notation Description
>
> ------------------------------------------------------------------
> DFP: Decimal Floating Point number possibly 32-bit, 64-bit, or
> 128-bit format
>
> D32: 32-bit decimal floating point format These values use F64
> registers. The D32 term is used to distinguish the value
> from the standard 32-bit floating point value.
>
> D64: 64-bit decimal floating point format. These values use F64
> registers. The D64 term is used to distinguish the value
> from the standard 64-bit floating point value.
>
> D128: 128-bit decimal floating point format. These values use a pair
> of two 64 bit floating point registers (F64). The instruction
> only references the first register of the register pair. The
> second register is implied.
>

So we need three new IRTypes

  Ity_D32
  Ity_D64
  Ity_D128

as Julian suggested in his reply.

> IRRoundingMode(I32): Indicates the I32 argument is used to hold the
> bits that specify the rounding mode to be used
> by the instruction.
>

IEEE Std 754-2008 says (4.3.3) that

  A decimal format implementation of this standard shall provide
  roundTiesToAway as a user-selectable rounding-direction attribute.

with:

  roundTiesToAway, the floating-point number nearest to the infinitely
  precise result shall be delivered; if the two nearest floating-point
  numbers bracketing an unrepresentable infinitely precise result are
  equally near, the one with larger magnitude shall be delivered.

We probably should extend IRRoundingMode accordingly.
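To make the tie rule quoted above concrete, here is a minimal sketch in C that drops trailing decimal digits from a coefficient and rounds to nearest under two tie rules. The enum and function names are hypothetical and exist only for this illustration; they are not part of VEX's IRRoundingMode, and the sketch works on a non-negative magnitude only (so "larger magnitude" simply means adding one).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mode names for this sketch only; not VEX's IRRoundingMode. */
typedef enum { RM_TIES_TO_EVEN, RM_TIES_TO_AWAY } SketchRoundMode;

/* Drop the `drop` least significant decimal digits of a non-negative
   coefficient, rounding to nearest with the given tie rule.
   Assumes drop is small enough that 10^drop fits in 64 bits. */
static uint64_t round_coefficient(uint64_t coeff, unsigned drop,
                                  SketchRoundMode rm)
{
    if (drop == 0)
        return coeff;               /* nothing to discard */

    uint64_t scale = 1;
    for (unsigned i = 0; i < drop; i++)
        scale *= 10;

    uint64_t kept = coeff / scale;  /* digits that survive */
    uint64_t rem  = coeff % scale;  /* digits that are discarded */
    uint64_t half = scale / 2;      /* exact, since scale is a multiple of 10 */

    if (rem > half) return kept + 1;
    if (rem < half) return kept;

    /* Exact tie: the two rules differ only here. */
    if (rm == RM_TIES_TO_AWAY)
        return kept + 1;            /* pick the larger magnitude */
    return (kept & 1) ? kept + 1 : kept;  /* pick the even neighbour */
}
```

For example, rounding the coefficient 125 to two digits gives 12 under ties-to-even but 13 under ties-to-away, while 135 gives 14 under both rules.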
s390 DFP actually has 9 rounding modes

  According to FPC setting
  Round toward 0
  Round away from 0
  Round toward +inf
  Round toward -inf
  Round to nearest with ties away from 0
  Round to nearest with ties to even
  Round to nearest with ties toward 0
  Round to prepare for shorter precision

We can handle the unsupported ones as we do for binary floating point
and map them to Irrm_NEAREST until problems arise.

> IRRoundingExceptionModes(I32): Indicates the I32 argument is used to hold the
> bits that specify the rounding mode and the bits
> that specify the exception mode to be used by
> the instruction. The s390 instructions specify
> both modes whereas POWER does not specify either
> mode.
>

If we keep support for floating point exceptions at the current level,
then these exception modes do not need to be modelled in the IR.
I.e. in the following IRRoundingExceptionModes can be replaced with
IRRoundingMode.

> IRRoundingModeAndEponent(I32): Indicates the I32 argument will contain bits to
> specify the rounding mode. POWER also has bits
> to specify the desired exponent. The s390
> instructions only specify the rounding mode.
>

This is similar to IRRoundingExceptionModes, i.e. replace with
IRRoundingMode.

>
> ARITHMETIC INSTRUCTIONS
> -----------------------
> IRRoundingMode(I32) X D64 X D64 -> D64
> Iop_AddD64, Iop_SubD64, Iop_MulD64, Iop_DivD64
>
> IRRoundingMode(I32) X D128 X D128 -> D128
> Iop_AddD128, Iop_SubD128, Iop_MulD128, Iop_DivD128
>

OK

>
> FORMAT CONVERSION INSTRUCTIONS
> ------------------------------
> InvOperationModes(I32) x D32 -> D64
> Iop_D32toD64
>

You did not describe InvOperationModes. But looking at insn LDETR (which
is D32 -> D64 conversion) I gather that InvOperationModes controls
whether or not the IEEE-invalid-operation-exception is delivered. It's
essentially a Boolean value. I propose to ignore it and deliver the
exception unconditionally (which is what we do for binary floating
point).
> IRRoundingExceptionModes(I32) x D64 -> D32
> Iop_RoundD64toD32
>
> IRRoundingExceptionModes(I32) x D128 -> D64
> Iop_RoundD128toD64
>

These two should be renamed to Iop_D64toD32 and Iop_D128toD64 for
symmetry in naming with binary floating point ops.

> IRRoundingExceptionModes(I32) x I64 -> D64
> Iop_I64StoD64
>

OK. For s390 we also need:

IROp            description                                 s390 insn
Iop_I64StoD128  IRRoundingMode(I32) x signed I64 -> D128    CXGTR
Iop_I32StoD64   signed I32 -> D64                           CDFTR
Iop_I32StoD128  signed I32 -> D128                          CXFTR
Iop_I64UtoD64   IRRoundingMode(I32) x unsigned I64 -> D64   CDLGTR
Iop_I64UtoD128  IRRoundingMode(I32) x unsigned I64 -> D128  CXLGTR
Iop_I32UtoD64   unsigned I32 -> D64                         CDLFTR
Iop_I32UtoD128  unsigned I32 -> D128                        CXLFTR

> IRRoundingExceptionModes(I32) x D64 -> I64
> Iop_D64toI64
>

We need both: conversion to signed and unsigned int

IROp            description                                 s390 insn
Iop_D64toI64S   IRRoundingMode(I32) x D64 -> signed I64     CGDTR(A)
Iop_D128toI64S  IRRoundingMode(I32) x D128 -> signed I64    CGXTR(A)
Iop_D64toI32S   IRRoundingMode(I32) x D64 -> signed I32     CFDTR
Iop_D128toI32S  IRRoundingMode(I32) x D128 -> signed I32    CFXTR
Iop_D64toI64U   IRRoundingMode(I32) x D64 -> unsigned I64   CLGDTR
Iop_D128toI64U  IRRoundingMode(I32) x D128 -> unsigned I64  CLGXTR
Iop_D64toI32U   IRRoundingMode(I32) x D64 -> unsigned I32   CLFDTR
Iop_D128toI32U  IRRoundingMode(I32) x D128 -> unsigned I32  CLFXTR

Note, the new IRops for conversion to 32-bit wide results and from D128.

>
> ROUNDING INSTRUCTIONS
> -----------------------
> IRRoundingMode(I32) x D64 -> D64
> Iop_RoundD64
>
> IRRoundingMode(I32) x D128 -> D128
> Iop_RoundD128
>

These should be named Iop_RoundD64toInt and Iop_RoundD128toInt for
symmetry in naming with binary floating point ops.

>
> COMPARE INSTRUCTIONS
> -----------------------
> D64 x D64 -> IRCmpF64Result(I32)
> Iop_CmpD64
>
> D128 x D128 -> IRCmpF64Result(I32)
> Iop_CmpD128
>

OK. I would use IRCmpD64Result and IRCmpD128Result. That allows us to
use a different encoding, which may be desirable.
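As a rough illustration of what "a different encoding" would mean in practice, the following C sketch maps a four-way compare result onto the s390 condition codes for CDTR/CXTR (0 equal, 1 first operand low, 2 first operand high, 3 unordered). The enum is a local stand-in declared for this sketch; its numeric values are assumed to mirror those of IRCmpF64Result in libvex_ir.h and should be checked against the header before relying on them. The function name is hypothetical.

```c
#include <assert.h>

/* Local stand-in for the IR compare result. The numeric values are an
   assumption (believed to match IRCmpF64Result in libvex_ir.h). */
typedef enum {
    SKETCH_CMP_UN = 0x45,   /* unordered */
    SKETCH_CMP_LT = 0x01,   /* first operand low */
    SKETCH_CMP_GT = 0x00,   /* first operand high */
    SKETCH_CMP_EQ = 0x40    /* operands equal */
} SketchCmpResult;

/* Hypothetical helper: translate the IR-style result into the s390
   condition code. Returns -1 for a value outside the enum. */
static int cmp_result_to_s390_cc(SketchCmpResult r)
{
    switch (r) {
    case SKETCH_CMP_EQ: return 0;
    case SKETCH_CMP_LT: return 1;
    case SKETCH_CMP_GT: return 2;
    case SKETCH_CMP_UN: return 3;
    }
    return -1;
}
```

Whichever encoding is chosen for the DFP result type, the front end only needs a small translation of this shape, so the choice mainly affects how cheaply each back end can produce the value.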
> D64 x D64 -> 1 if the condition is TRUE, 0 otherwise
> Iop_CmpEQD64, Iop_CmpLTD64, Iop_CmpGTD64
>

These ops appear unused.

> D128 x D128 -> 1 if the condition is TRUE; 0 otherwise;
> Iop_CmpEQD128, Iop_CmpLTD128, Iop_CmpGTD128
>

These are unused, too.

>
> QUANTIZE AND ROUND INSTRUCTIONS
> -------------------------------
> EXTRACT AND INSERT INSTRUCTIONS
> -------------------------------

Do we need to support these at all? In other words, does GCC issue these
or do they show up in hand crafted assembler shipped with GCC/GLIBC?
I don't know but will find out (for s390).

>
> FORMAT CONVERSION INSTRUCTIONS
> Iop s390 Power Description of instruction, implementation
> opcode opcode details
>
> ---------------------------------------------------------------------
> TBD
> PFPO
> The PFPO instruction operation is

[snip]

To be done....

> COMPARE INSTRUCTIONS
>
> --------------------------------------------------------------------
>
> Iop_CmpD64 D64 x D64 -> IRCmpF64Result(I32)
> dcmpo
> dcmpu
>
> Iop_CmpD128 D128 x D128 -> IRCmpF64Result(I32)
> dcmpoq
> dcmpuq
>

The s390 insns are CDTR and CXTR, respectively.
The possible condition codes are:

  0 Operands equal
  1 First operand low
  2 First operand high
  3 Operands unordered

which is what IRCmpF64Result provides. So we could reuse it:

  typedef IRCmpF64Result IRCmpD64Result;
  typedef IRCmpF128Result IRCmpD128Result;

or choose a different encoding.

>
> QUANTIZE AND ROUND INSTRUCTIONS
> -----------------------------------------------------------
>
> Iop_QuantizeID64 IRRoundingModeAndEponent(I32) x D64 -> D64
> QADTR dquai
> The D64 source operand is converted and rounded
> to the form with the immediate exponent
> specified by the rounding and exponent
> parameter.
>
> Iop_QuantizeID128 IRRoundingModeAndExponent(I32) x D128-> D128
> QAXTR dquaiq
> The D128 source operand is converted and
> rounded to the form with the immediate exponent
> specified by the rounding and exponent
> parameter.
>

QADTR (QAXTR) have two D64 (D128) operands, a rounding mode and a D64
(D128) result (ignoring the quantum exception control):

  IRRoundingMode(I32) x D64 x D64 -> D64

Did you mix this up perhaps with Iop_QuantizeD64/D128?
But as I said earlier, perhaps we don't need to support these.

Cheers,
Florian
|
|
From: Florian K. <br...@ac...> - 2012-01-04 15:23:44
|
On 12/29/2011 11:15 AM, Florian Krohm wrote:
>>
>> QUANTIZE AND ROUND INSTRUCTIONS
>> -------------------------------
>> EXTRACT AND INSERT INSTRUCTIONS
>> -------------------------------
>
> Do we need to support these at all? In other words, does GCC issue these
> or do they show up in hand crafted assembler shipped with GCC/GLIBC?
> I don't know but will find out (for s390).
>
Yes, we need to support them. They are used in libdfp.
s390 also has an "extract significance" opcode which extracts the
significant digits of a DFP number and converts them to a signed 64-bit
integer. It's also used in libdfp.
Iop_ExtractSigD64 D64 -> I64
Iop_ExtractSigD128 D128 -> I64
>>
>> FORMAT CONVERSION INSTRUCTIONS
>> Iop s390 Power Description of instruction,
> implementation
>> opcode opcode details
>>
>> ---------------------------------------------------------------------
>> TBD
>> PFPO
>> The PFPO instruction operation is
>
> To be done....
>
The PFPO insn is used to convert between binary floating point and
decimal floating point. Since we have 3 formats each, that makes 9
conversion ops for each direction:
Iop_D32toF32 IRRoundingMode(I32) x D32 -> F32
Iop_D32toF64 IRRoundingMode(I32) x D32 -> F64
Iop_D32toF128 IRRoundingMode(I32) x D32 -> F128
Iop_D64toF32 IRRoundingMode(I32) x D64 -> F32
Iop_D64toF64 IRRoundingMode(I32) x D64 -> F64
Iop_D64toF128 IRRoundingMode(I32) x D64 -> F128
Iop_D128toF32 IRRoundingMode(I32) x D128 -> F32
Iop_D128toF64 IRRoundingMode(I32) x D128 -> F64
Iop_D128toF128 IRRoundingMode(I32) x D128 -> F128
Iop_F32toD32 IRRoundingMode(I32) x F32 -> D32
Iop_F32toD64 IRRoundingMode(I32) x F32 -> D64
Iop_F32toD128 IRRoundingMode(I32) x F32 -> D128
Iop_F64toD32 IRRoundingMode(I32) x F64 -> D32
Iop_F64toD64 IRRoundingMode(I32) x F64 -> D64
Iop_F64toD128 IRRoundingMode(I32) x F64 -> D128
Iop_F128toD32 IRRoundingMode(I32) x F128 -> D32
Iop_F128toD64 IRRoundingMode(I32) x F128 -> D64
Iop_F128toD128 IRRoundingMode(I32) x F128 -> D128
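
The list above is simply the cross product of three source widths, three destination widths and two directions. A small C sketch that regenerates the names makes the count of 18 easy to check; the helper names are hypothetical and only the Iop names themselves come from the list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static const int sketch_widths[3] = { 32, 64, 128 };

/* Build the op name for one conversion.
   to_binary != 0 means decimal -> binary (Iop_DxxtoFyy),
   to_binary == 0 means binary -> decimal (Iop_FxxtoDyy). */
static void conv_op_name(char *buf, size_t len,
                         int src_bits, int dst_bits, int to_binary)
{
    snprintf(buf, len, "Iop_%c%dto%c%d",
             to_binary ? 'D' : 'F', src_bits,
             to_binary ? 'F' : 'D', dst_bits);
}

/* Walk every (direction, source, destination) combination.
   Prints the names when `print` is non-zero and returns the count. */
static int list_conv_ops(int print)
{
    char name[32];
    int count = 0;

    for (int dir = 1; dir >= 0; dir--)
        for (int s = 0; s < 3; s++)
            for (int d = 0; d < 3; d++) {
                conv_op_name(name, sizeof name,
                             sketch_widths[s], sketch_widths[d], dir);
                if (print)
                    puts(name);
                count++;
            }
    return count;
}
```

Calling list_conv_ops(1) prints the names in the same order as the list, decimal-to-binary first.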
I haven't studied the details of the formats so I'm not sure that a
rounding mode is needed for some of the conversions. PFPO allows the
specification of a rounding mode in all cases but that doesn't mean it's
needed..
Anybody has insight into this?
The s390 maintainer for GCC confirmed that these conversions are in fact
all used.
Florian
|
|
From: Christian B. <bor...@de...> - 2012-01-04 19:14:33
|
> The PFPO insn is used to convert between binary floating point and
> decimal floating point. Since we have 3 formats each, that makes 9
> conversion ops for each direction:
>
> Iop_D32toF32 IRRoundingMode(I32) x D32 -> F32
> Iop_D32toF64 IRRoundingMode(I32) x D32 -> F64
> Iop_D32toF128 IRRoundingMode(I32) x D32 -> F128
> Iop_D64toF32 IRRoundingMode(I32) x D64 -> F32
> Iop_D64toF64 IRRoundingMode(I32) x D64 -> F64
> Iop_D64toF128 IRRoundingMode(I32) x D64 -> F128
> Iop_D128toF32 IRRoundingMode(I32) x D128 -> F32
> Iop_D128toF64 IRRoundingMode(I32) x D128 -> F64
> Iop_D128toF128 IRRoundingMode(I32) x D128 -> F128
>
> Iop_F32toD32 IRRoundingMode(I32) x F32 -> D32
> Iop_F32toD64 IRRoundingMode(I32) x F32 -> D64
> Iop_F32toD128 IRRoundingMode(I32) x F32 -> D128
> Iop_F64toD32 IRRoundingMode(I32) x F64 -> D32
> Iop_F64toD64 IRRoundingMode(I32) x F64 -> D64
> Iop_F64toD128 IRRoundingMode(I32) x F64 -> D128
> Iop_F128toD32 IRRoundingMode(I32) x F128 -> D32
> Iop_F128toD64 IRRoundingMode(I32) x F128 -> D64
> Iop_F128toD128 IRRoundingMode(I32) x F128 -> D128

If you look at pfpo, then the instruction has the same tricky behaviour
as EXecute. Since a self checking prefix and 18 Iops is pretty expensive
I think that pfpo qualifies for a helper.
|
|
From: Carl E. L. <ce...@li...> - 2012-01-04 16:43:11
|
On Wed, 2012-01-04 at 10:23 -0500, Florian Krohm wrote:
> On 12/29/2011 11:15 AM, Florian Krohm wrote:
>
> >>
> >> QUANTIZE AND ROUND INSTRUCTIONS
> >> -------------------------------
> >> EXTRACT AND INSERT INSTRUCTIONS
> >> -------------------------------
> >
> > Do we need to support these at all? In other words, does GCC issue these
> > or do they show up in hand crafted assembler shipped with GCC/GLIBC?
> > I don't know but will find out (for s390).
> >
>
> Yes, we need to support them. They are used in libdfp.
>
> s390 also has an "extract significance" opcode which extracts the
> significant digits of a DFP number and converts them to a signed 64-bit
> integer. It's also used in libdfp.
>
> Iop_ExtractSigD64 D64 -> I64
> Iop_ExtractSigD128 D128 -> I64
These two were included in the proposal as TBD, specifically as the s390
instructions ESDTR and ESXTR. I am not sure whether the instructions can
be easily emulated with existing Iops and the proposed Iops. I left this
to the s390 team to make the call. The question is, can they be easily
emulated or not? I have not tried to do it yet. Although POWER doesn't
have this instruction, we could do a proof of concept implementation for
Power to see how hard it would be to emulate. I could take a look at
this once I finish the initial implementation of the other POWER
instructions. I think I have about 5 more to do to finish off the 49
POWER instructions. Let me know if you think it would be worth doing a
proof of concept implementation on Power so we can decide if we really
need the Iops or not.
>
> >>
> >> FORMAT CONVERSION INSTRUCTIONS
> >> Iop s390 Power Description of instruction,
> > implementation
> >> opcode opcode details
> >>
> >> ---------------------------------------------------------------------
> >> TBD
> >> PFPO
> >> The PFPO instruction operation is
> >
> > To be done....
> >
>
> The PFPO insn is used to convert between binary floating point and
> decimal floating point. Since we have 3 formats each, that makes 9
> conversion ops for each direction:
>
> Iop_D32toF32 IRRoundingMode(I32) x D32 -> F32
> Iop_D32toF64 IRRoundingMode(I32) x D32 -> F64
> Iop_D32toF128 IRRoundingMode(I32) x D32 -> F128
> Iop_D64toF32 IRRoundingMode(I32) x D64 -> F32
> Iop_D64toF64 IRRoundingMode(I32) x D64 -> F64
> Iop_D64toF128 IRRoundingMode(I32) x D64 -> F128
> Iop_D128toF32 IRRoundingMode(I32) x D128 -> F32
> Iop_D128toF64 IRRoundingMode(I32) x D128 -> F64
> Iop_D128toF128 IRRoundingMode(I32) x D128 -> F128
>
> Iop_F32toD32 IRRoundingMode(I32) x F32 -> D32
> Iop_F32toD64 IRRoundingMode(I32) x F32 -> D64
> Iop_F32toD128 IRRoundingMode(I32) x F32 -> D128
> Iop_F64toD32 IRRoundingMode(I32) x F64 -> D32
> Iop_F64toD64 IRRoundingMode(I32) x F64 -> D64
> Iop_F64toD128 IRRoundingMode(I32) x F64 -> D128
> Iop_F128toD32 IRRoundingMode(I32) x F128 -> D32
> Iop_F128toD64 IRRoundingMode(I32) x F128 -> D64
> Iop_F128toD128 IRRoundingMode(I32) x F128 -> D128
Again, the s390 PFPO instruction is in the proposal. It was left as a
TBD for the s390 team to decide on the needed Iops. I am not sure that
you really need to introduce an Iop for every one of these. If you
introduced just the Iop_D128toF128, you could then emulate the other
conversions using this Iop. For example, if you had a D32 you would use
the existing Iops to convert it to a D128, then use the Iop_D128toF128
to convert it and then narrow the result to the desired size.
You could do something similar with the float to DFP. Does that seem
reasonable?
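
The widen, convert, narrow idea can be sketched as a small C planner that lists the chain of ops for one decimal-to-binary conversion. This is only an illustration under assumptions: Iop_D32toD64, Iop_D64toD128 and Iop_D128toF128 are names from this thread, while Iop_F128toF64 and Iop_F128toF32 are assumed to be the binary narrowing ops available in the IR, and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Append one op name to a space-separated list, never overrunning `out`.
   Assumes len > 0 and that `out` already holds a valid C string. */
static void append_op(char *out, size_t len, const char *op)
{
    if (out[0] != '\0')
        strncat(out, " ", len - strlen(out) - 1);
    strncat(out, op, len - strlen(out) - 1);
}

/* Plan a DFP -> BFP conversion using Iop_D128toF128 as the only
   cross-radix op. Writes the chain into `out` and returns the number
   of ops, or -1 if a width is not 32, 64 or 128. */
static int plan_dfp_to_bfp(int src_bits, int dst_bits, char *out, size_t len)
{
    if ((src_bits != 32 && src_bits != 64 && src_bits != 128) ||
        (dst_bits != 32 && dst_bits != 64 && dst_bits != 128))
        return -1;

    int n = 0;
    out[0] = '\0';

    /* Widen the decimal source up to D128. */
    if (src_bits == 32) { append_op(out, len, "Iop_D32toD64");  n++; }
    if (src_bits <= 64) { append_op(out, len, "Iop_D64toD128"); n++; }

    /* The single decimal -> binary conversion. */
    append_op(out, len, "Iop_D128toF128");
    n++;

    /* Narrow the binary result if needed (op names assumed). */
    if (dst_bits == 64) { append_op(out, len, "Iop_F128toF64"); n++; }
    if (dst_bits == 32) { append_op(out, len, "Iop_F128toF32"); n++; }

    return n;
}
```

One point to weigh with this scheme: when the target is narrower than F128 the value is rounded twice, once by Iop_D128toF128 and once by the narrowing op, and that can in principle differ from a single direct conversion.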
The question is, what is the minimum number of Iops that really need to
be introduced?
Thoughts?
Carl Love
>
> I haven't studied the details of the formats so I'm not sure that a
> rounding mode is needed for some of the conversions. PFPO allows the
> specification of a rounding mode in all cases but that doesn't mean it's
> needed..
> Anybody has insight into this?
>
> The s390 maintainer for GCC confirmed that these conversions are in fact
> all used.
>
> Florian
>
|
|
From: Florian K. <br...@ac...> - 2012-01-05 14:24:35
|
On 01/04/2012 11:41 AM, Carl E. Love wrote:
> On Wed, 2012-01-04 at 10:23 -0500, Florian Krohm wrote:
>> On 12/29/2011 11:15 AM, Florian Krohm wrote:
>>
>> s390 also has an "extract significance" opcode which extracts the
>> significant digits of a DFP number and converts them to a signed 64-bit
>> integer. It's also used in libdfp.
>>
>> Iop_ExtractSigD64 D64 -> I64
>> Iop_ExtractSigD128 D128 -> I64
>
> These two were included in the proposal as TBD Specifically, s390
> instructions ESDTR and ESXTR. Not sure if the instruction can be easily
> emulated with existing Iops and the proposed Iops or not. I left this
> to the s390 team to make the call.

Yes, and as a member of that team I just put my stake in the ground.

> The question is, can they be easily
> emulated or not?

It is a philosophical question. The operation could possibly be emulated
using existing ops. But when you do that you lose the information that
this was an "extract DFP significant" operation. Does that matter? Not
for today's suite of tools. But if you wanted to do some analysis
specific to DFP then it would matter.

I think that we should either make all DFP operations visible in the IR
or none at all, e.g. by using a clean helper. The latter was already
voted against by Julian IIRC on grounds of performance (although that
could probably be remedied by inventing a modified helper call).
I'd prefer making all DFP operations explicitly visible in the IR.

> Again, the s390 PFPO instruction is in the proposal. It was left as a
> TBD for the s390 team to decide on the needed Iops. Not sure that you
> really need to introduce an Iop for everyone of these. If you
> introduced just the Iop_D128toF128,
> you could then emulate the other conversions using this Iop. For
> example, if you had a D32 you would use the existing Iops to convert it
> to a D128, then use the Iop_D128toF128 to convert it and then narrow the
> result to the desired size.
>
> You could do something similar with the float to DFP. Does that seem
> reasonable?

I do not know. I cannot judge how hard it would be to ensure correctness
in all cases.

Florian
|