Re: [Valgrind-developers] Proposal to add Iops for DFP support

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On Thu, 2012-01-19 at 19:12 +0100, Julian Seward wrote:
> > If there is value in adding the DFP type, I am willing to change my
> > preliminary POWER7 implementation to add the DFP types rather then using
> > the floating point types as I currently have.  Having the DFP type may
> > make the overall code clearer then having Iops with the D32, D64 and D128
> > in the name but actually operate on F32, F64 and F128 values.
> > 
> > Thoughts on adding DFP type?
> 
> Yes, please do.  It does cause the code to be a little longer, but it's
> also clearer and easier to verify as being correct.  In the long run the
> latter points are much more important.
> 
> 
> 
> > The POWER7 and s390 machines support the additional four rounding modes for
> > DFP (total of 8 not 9 modes) in the IEEE specification.  After some
> > thought, I think it would be better to explicitly change the rounding mode
> > specifier from IRRoundingMode(I32) to IRRoundingModeDFP(I32).  The concern
> > is if we extend the existing rounding mode specifier then there will be
> > rounding modes for which the existing binary floating point instructions
> > are not specified to support.  Yes, it is some replication of code to
> > create a second super-set of rounding modes for DFP but from an overall
> > consistency, clarity and accuracy standpoint I think it might be best.
> 
> I agree.   +1 for IRRoundingModeDFP.  It will have 8 possible values,
> yes?
> 
> 
> 
> > OK, I had not specifically read the document above but was aware that
> > Valgrind only had limited rounding and exception support.  I included
> > the full rounding and exception support for completeness.  I thought it
> > better to include it at this point then to gloss over it.  I added a
> > comment to the document ahead of the field definitions to that effect.
> > So, if there is no chance that the exception support will be added then
> > we can just drop the exception specification part.
> > 
> > Thoughts?
> 
> Drop the exception specification part.  If we ever need to support FP
> exceptions then we will need to design a solution which works cleanly
> for both regular FP and DFP.
> 
> 
> 
> > The immediate exponent value in the instruction is specific to the POWER
> > processor.  I have spent some time reconsidering this.  We really don't
> > need it so I will remove the  IRRoundingModeAndEponent(I32).
> 
> Good.
> 
> 
> 
> > Comment from Florian:
> > > You did not describe InvOperationModes. But looking at insn LDETR
> > > (which is D32 -> D64 conversion) I gather that InvOperationModes
> > > controls whether or not the IEEE-invalid-operation-exception  is
> > > delivered. It's essentially a Boolean value. I propose to ignore it
> > > and deliver the exception unconditionally (which is what we do for
> > > binary floating point).
> > 
> > Julian do you agree we should remove the InvOperationModes(I32) from
> > the Iop specification?  I am OK with it.
> 
> Yes, please remove.
> 
> 
> 
> > > >         InvOperationModes(I32) x D64 -> D128
> > > >         Iop_D64toD128
> > > 
> > > ditto
> > > 
> > > >         IRRoundingExceptionModes(I32) x D64 -> D32
> > > >         Iop_RoundD64toD32
> > > >         
> > > >         IRRoundingExceptionModes(I32) x D128 -> D64
> > > >         Iop_RoundD128toD64
> > > >         
> > > >         IRRoundingExceptionModes(I32) x I64 -> D64
> > > >         Iop_I64StoD64
> > > 
> > > ok
> > 
> > Comment from Florian:
> > > These two should be renamed to Iop_D64toD32 and Iop_D128toD64 for
> > > symmetry in naming with binary floating point ops.
> > 
> > I disagree.  The drsp instruction is analogous to the binary FP frsp
> > instruction, which (for ppc64) uses the Iop Iop_RoundF64ToF32.  The
> > distinction is that an Iop that does not have "Round" in the name is
> > a conversion from one general type (i.e., floating point) to a different
> > general type (i.e., integer); whereas an Iop that *does* have "Round"
> > in the name is a narrowing of the same general type.
> 
> Florian is correct, assuming that the types for these operations
> (D64 -> D32 etc) are really what you (Carl) intended.  Problem is
> that it's unclear what behaviour you want.  Your options are
> 
>    round and convert to a different format
>      in which case you want the  (eg)  Iop_D64toD32  name style
>      and the IR types for src and dst will then be different
>      (eg D64 and D32 in this case)
> 
>   round (to a smaller range) but remain in the same format
>      in which case you want the Iop_RoundD64toD32 names
>      in this case the src and dst types are the same (D64)
>      and D32 merely indicates the range to which the value is
>      rounded
> 
> so which is it?
> 
> 

The intention is the Iop_RoundD128toD64 Iop takes a value with the
physical D128 bit encoding and generates a result with the physical D64
bit encoding.  Similarly for the Iop_RoundD64toD32 Iop.  So based on
your definitions above, the first one (round and convert to a different
format) is correct and the Iops need to be renamed.  The above
clarification was really helpful to understand the naming convention.
Thanks.  I will make the Iop name changes.

> 
> 
> > Comment from Florian:
> > > For s390 we also need:
> > >   IROp                         description                   s390 insn
> > > 
> > > Iop_I64StoD128   IRRoundingMode(I32) x   signed I64 -> D128     CXGTR
> > > Iop_I32StoD64                            signed I32 -> D64      CDFTR
> > > Iop_I32StoD128                           signed I32 -> D128     CXFTR
> > > Iop_I64UtoD64    IRRoundingMode(I32) x unsigned I64 -> D64      CDLGTR
> > > Iop_I64UtoD128   IRRoundingMode(I32) x unsigned I64 -> D128     CXLGTR
> > > Iop_I32UtoD64                          unsigned I32 -> D64      CDLFTR
> > > Iop_I32UtoD128                         unsigned I32 -> D128     CXLFTR
> > > 
> > > We need both: conversion to signed and unsigned int
> > > 
> > >    IROp                         description                   s390 insn
> > > 
> > > Iop_D64toI64S    IRRoundingMode(I32) x D64  ->   signed I64     CGDTR(A)
> > > Iop_D128toI64S   IRRoundingMode(I32) x D128 ->   signed I64     CGXTR(A)
> > > Iop_D64toI32S    IRRoundingMode(I32) x D64  ->   signed I32     CFDTR
> > > Iop_D128toI32S   IRRoundingMode(I32) x D128 ->   signed I32     CFXTR
> > > Iop_D64toI64U    IRRoundingMode(I32) x D64  -> unsigned I64     CLGDTR
> > > Iop_D128toI64U   IRRoundingMode(I32) x D128 -> unsigned I64     CLGXTR
> > > Iop_D64toI32U    IRRoundingMode(I32) x D64  -> unsigned I32     CLFDTR
> > > Iop_D128toI32U   IRRoundingMode(I32) x D128 -> unsigned I32     CLFXTR
> > > 
> > > Note, the new IRops for conversion to 32-bit wide results and from D128.
> > 
> > Comment from Christian Borntraege:
> > > > The PFPO insn is used to convert between binary floating point and
> > > > decimal floating point. Since we have 3 formats each, that makes 9
> > > > conversion ops for each direction:
> > > > 
> > > > Iop_D32toF32     IRRoundingMode(I32) x D32  -> F32
> > > > Iop_D32toF64     IRRoundingMode(I32) x D32  -> F64
> > > > Iop_D32toF128    IRRoundingMode(I32) x D32  -> F128
> > > > Iop_D64toF32     IRRoundingMode(I32) x D64  -> F32
> > > > Iop_D64toF64     IRRoundingMode(I32) x D64  -> F64
> > > > Iop_D64toF128    IRRoundingMode(I32) x D64  -> F128
> > > > Iop_D128toF32    IRRoundingMode(I32) x D128 -> F32
> > > > Iop_D128toF64    IRRoundingMode(I32) x D128 -> F64
> > > > Iop_D128toF128   IRRoundingMode(I32) x D128 -> F128
> > > > 
> > > > Iop_F32toD32     IRRoundingMode(I32) x F32  -> D32
> > > > Iop_F32toD64     IRRoundingMode(I32) x F32  -> D64
> > > > Iop_F32toD128    IRRoundingMode(I32) x F32  -> D128
> > > > Iop_F64toD32     IRRoundingMode(I32) x F64  -> D32
> > > > Iop_F64toD64     IRRoundingMode(I32) x F64  -> D64
> > > > Iop_F64toD128    IRRoundingMode(I32) x F64  -> D128
> > > > Iop_F128toD32    IRRoundingMode(I32) x F128 -> D32
> > > > Iop_F128toD64    IRRoundingMode(I32) x F128 -> D64
> > > > Iop_F128toD128   IRRoundingMode(I32) x F128 -> D128
> > > 
> > > If you look at pfpo, then the instruction has the same tricky
> > > behaviour as EXecute. Since a self checking prefix and 18 Iops is
> > > 
> > >  pretty expensive I think that pfpo qualifies for a helper.
> > 
> > These conversion modes are specifically support by s390 but not POWER.
> > POWER supports a subset of the conversions supported by s390.  This is
> > one of the areas where I feel we should defer the decision on adding more
> > Iops, using a helper function or emulating these conversions with the
> > subset of conversions that have been proposed until the s390 team is ready
> > to add the s390 support to Valgrind.  This is beyond the scope of what is
> > needed for me to add the POWER support.  But I think it is important that
> > the issue has been raised to understand the full scope of what is needed
> > by both architectures.
> 
> Ok.  I agree .. just add the conversions needed for POWER, unless
> Christian/Florian want to also add the S390 needed conversions now
> (I am unclear if you do, or not)

I have no need to support all these on POWER.  I will put the following
into version 2 of the proposal so it is clear we may need additional
Iops in the future for s390.

        "Additional format conversion Iops for converting to/from
        decimal floating point and binary floating point may need to be
        added later for the s390 support."

I think it is important that we state some additional Iops maybe needed
for other architectures.  But I would like to defer the final decision
on exactly how these conversions are supported via additional Iops or a
helper function to when the s390 support is done.

> 
> 
> 
> > 
> > > >         IRRoundingExceptionModes(I32) x D64 -> I64
> > > >         Iop_D64toI64
> > > 
> > > this is underspecified .. you need to decide whether that's a
> > > conversion to signed or unsigned I64 (or maybe you need both)
> > > and call them Iop_D64toI64S or Iop_D64toI64U respectively.
> > > (I think you comment about this further down in the doc.)
> > > I mention this partly because sorting out such ambiguity in the
> > > past for the Fxx->Ixx conversions required a lot of hoop
> > > jumping, so we might as well get it straightened out up front.
> > 
> > OK, should be convert to signed integer.  Changed the name to
> > Iop_D64toI64S.
> 
> Good.
> 
> 
> 
> > > >         ROUNDING INSTRUCTIONS
> > > >         -----------------------
> > > >         IRRoundingMode(I32) x D64 -> D64
> > > >         Iop_RoundD64
> > > >         
> > > >         IRRoundingMode(I32) x D128 -> D128
> > > >         Iop_RoundD128
> > > 
> > > ok
> > 
> > Comment from Florian:
> > > These should be named Iop_RoundD64toInt and Iop_RoundD128toInt for
> > > symmetry in naming with binary floating point ops.
> > 
> > I am OK with the name change.  Comments?
> 
> /me agrees with Florian.
> 
> 
> 
> > 
> > > >         COMPARE INSTRUCTIONS
> > > >         -----------------------
> > > >         D64 x D64 -> IRCmpF64Result(I32)
> > > >         Iop_CmpD64
> > > >         
> > > >         D128 x D128 -> IRCmpF64Result(I32)
> > > >         Iop_CmpD128
> > > 
> > > ok
> > 
> > Comment from Florian:
> > > OK. I would use IRCmpD64Result and IRCmpD128Result. That allows
> > > us to use a different encoding, which may be desirable.
> > 
> > I am OK with the name change.  Comments?
> 
> Fine by me.
> 
> 
> 
> > > >         QUANTIZE AND ROUND INSTRUCTIONS
> > > >         -------------------------------
> > > >         IRRoundingMode(I32) x D64-> D64
> > > >         Iop_QuantizeID64,
> > > >         
> > > >         IRRoundingMode(I32) x D128-> D128
> > > >         Iop_QuantizeID128
> > > >         
> > > >         IRRoundingMode(I32) x D64 x D64 -> D64
> > > >         Iop_QuantizeD64
> > > >         
> > > >         IRRoundingMode(I32) x D128 x D128 -> D128
> > > >         Iop_QuantizeD128
> > > 
> > > I'm not clear what the ID vs D signifies in these names.  Can
> > > they instead be called Iop_Quantize{Un,Bin}{D64,D128} to denote
> > > unary vs binary ness (ignoring the rounding mode arg which is
> > > present in all 4 cases).
> > 
> > The first two with the I are for immediate value of the exponent for the
> > quantization operation.  Remember above, I had the
> > IRRoundingModeAndEponent(I32) specification.  It was also there for
> > specifying the immediate value.  I didn't realize that I had a redundant
> > way of specifying the immediate exponent value. So, as said above, we drop
> > IRRoundingModeAndEponent(I32) specification.
> > 
> > > What is quantization, anyway (in the context of DFP I mean)?
> > > Does it have any analogue in traditional IEEE754 FP ?
> > 
> > Quantization is the process of rounding a number to a specified exponent as
> > mentioned earlier.  For example, the U.S. specifies a dollar amount with at
> > most two fractional digits, for example $12.98.  When doing a computation
> > involving money, the quantization instruction would be used to round the
> > result to two fractional digits.
> > 
> > The immediate value to specify the desired exponent is specific to POWER. 
> > Both s390 and POWER have instructions for changing the exponent of
> > (quantizing) a value in register A to match the exponent of a value in
> > register B. I think we should remove the Iops Iop_QuantizeID64 and
> > Iop_QuantizeID128 for specify an immediate exponent.  For the POWER
> > instruction that specifies the immediate exponent, I can just generate a
> > DFP number with the desired exponent and pass that as the value in
> > register B as the target exponent.  This will help minimize the number of
> > new Iops.
> 
> Ok by me.
> 
> 
> 
> > > >         IRRoundingMode(I32) x D64 x D64 -> D64
> > > >         Iop_SignificanceRoundD64
> > > >         
> > > >         IRRoundingMode(I32) x D128 x D128 -> D128
> > > >         Iop_SignificanceRoundD128
> > > >         
> > > >         
> > > >         EXTRACT AND INSERT INSTRUCTIONS
> > > >         -------------------------------
> > > >         D64 -> I64
> > > >         Iop_ExtractExpD64
> > > >         
> > > >         D128 -> I64
> > > >         Iop_ExtractExpD128
> > > 
> > > The exponent really needs 64 bits?  Can it be 32 bits?  That
> > > might allow for more efficient code generation for 32 bit targets.
> > 
> > The biased exponent could be stored as a 32 bit biased integer.  The
> > functionality of the specific instruction that this Iop was intended for
> > specifies the exponent will be stored in a floating point register in a
> > signed integer format.  I agree we can make the target be I32 and then
> > just handle the I32 to I64 conversion as needed for the specific
> > instruction.  The exponent for any DFP number will easily fit in less then
> > 31 bits.
> > 
> > I will change the above two Iops to return an I32 value.
> 
> Good.
> 
> 
> 
> 
> > 
> > Comment from Florian:
> > > Do we need to support these at all? In other words, does GCC issue these
> > > or do they show up in hand crafted assembler shipped with GCC/GLIBC?
> > > I don't know but will find out (for s390).
> > 
> > Yes, the libdfp and binutils use them.
> > 
> > > >         I64 x I64  -> D64
> > > >         Iop_InsertExpD64
> > > >         
> > > >         I64 x I128 -> D128
> > > >         Iop_InsertExpD128
> > > 
> > > ditto comment
> > 
> > ditto above that we do need these Iops for instructions that are used.
> 
> So (unclear) as with ExtractExp, you'll change the exp value to be an I32
> instead of an I64, yes?

Sorry, missed that.  Yes, we should change this to be consistent with
the ExtractExp.  So, it will be:

         I32 x I64  -> D64
         Iop_InsertExpD64

         I32 x I128 -> D128
         Iop_InsertExpD128

where the I32 is the exponent and the I64/128 is the significand.

> > > >         SHIFT SIGNIFICAND INSTRUCTIONS
> > > >         -------------------------------
> > > >         U16 x D64 -> D64
> > > >         Iop_ShlD64, Iop_ShrD64
> > > >         
> > > >         U16 x D128 -> D128
> > > >         Iop_ShlD128, Iop_ShrD128
> > > 
> > > two things: (1) does the shift amount need to be 16 bits?
> > > For all the other shifting style ops we have, the shift amount
> > > is encoded in 8 bits (Ity_I8) and I would prefer to stick with
> > > that for consistency, if possible.  (2) pls put the shift amount
> > > as the second argument, not the first, as that too is consistent
> > > with all other shift ops we have (eg, Iop_Shr64)
> > 
> > OK, changed shift to U8 and made it D64 x U8 -> D64; D128 x U8 -> D128
> 
> Good.
> 
> 
> 
> > > ---------------
> > > 
> > > >         This section give the detailed mapping of Power and s390
> > > > 
> > > > instructions to the proposed DFP Iops or how they would be implemented
> > > > with the existing Iops and the proposed DFP Iops.
> > > 
> > > I'll comment on this second half of the proposal tomorrow.
> > > 
> > > J
> > 
> > I have not seen any additional comments.  It is probably best to focus on
> > resolving the above issues first as the second part is just a more detailed
> > discussion of the above summary of the Iops.  Once we have agreement on the
> > above discussion points, I will update and post version 2 of the proposal.
> 
> Yes, good plan.
> 
> J