|
From: Maynard J. <may...@us...> - 2011-11-04 18:39:37
|
Hi, Julian,

My coworker, Carl Love, has started looking into adding support for Decimal Floating Point (DFP) to Valgrind. The DFP feature was first made available for the PowerPC architecture with ISA 2.05 (POWER6), and was expanded with some new instructions in ISA 2.06 (POWER7). So there's now a total of 50 instructions in the DFP category in the latest PowerPC architecture. There are (AFAICT) three different approaches we could take to implement this support:

1. Use existing PowerPC support
2. Define new Iops (hopefully we could get by with fewer than 50)
3. Use the Iex_CCall type of IRExpr to invoke a helper that executes the native DFP instruction

Since you have indicated in the past a reluctance to add new Iops if there's a good alternative, I asked Carl to investigate option #1 first. But this option is quite impractical due to the complexity of DFP, requiring a *LOT* of IR code -- so much IR, in fact, that before Carl had even completed implementing one instruction, his code blew up due to exceeding the limit of IR instructions (Assertion `instrs_in->arr_used <= 10000' failed)!

Next, I asked Carl to look at the feasibility of using an Iex_CCall to a helper. This looks promising, but is non-standard and not as straightforward as implementing via an instruction-specific Iop.

Since the s390 architecture also has a DFP feature, we got in touch with our counterparts in s390 (mainly Christian Borntraeger). They, too, have tentative plans to add support for their DFP to Valgrind. Unfortunately, there seems to be very little overlap between s390 and PowerPC DFP functionality. The s390 DFP feature employs just 10 instructions, while the PowerPC uses 50 (the difference presumably due to CISC vs RISC). So if we were to define new Iops for PowerPC, I don't think any of them would be useful for s390; thus, they would need to define their own set of Iops.

Given the information above, what advice/recommendation can you give us on which approach to take?

Thanks.
-Maynard |
|
From: Christian B. <bor...@de...> - 2011-11-04 20:46:54
|
On 04/11/11 19:40, Maynard Johnson wrote:
> Hi, Julian,
> My coworker, Carl Love, has started looking into adding support for Decimal
> Floating Point (DFP) to Valgrind. The DFP feature was first made available for
> the PowerPC architecture with ISA 2.05 (POWER6), and was expanded with some new
> instructions in ISA 2.06 (POWER7). So there's now a total of 50 instructions in
> the DFP category in the latest PowerPC architecture. There are (AFICT) three
> different approaches we could take to implement this support:
>
> 1. Use existing PowerPC support
> 2. Define new Iops (hopefully could get by with something less than 50)
> 3. Use the Iex_CCall type of IRExpr to invoke a helper that executes the
> native DFP instruction
>
> Since you have indicated in the past a reluctance to add new Iops if there's a
> good alternative, I asked Carl to investigate option #1 first. But this option
> would be quite impractical due to the complexity of DFP, requiring a *LOT* of IR
> code -- so much IR, in fact, that before Carl had even completed implementing
> one instruction, his code blew up due to exceeding the limit of IR instructions
> (Assertion `instrs_in->arr_used <= 10000' failed)!

Agreed, emulating DFP with IR is definitely the wrong way.

> Next, I asked Carl to look at the feasibility of using the Iex_CCall to a
> helper. This looks promising, but is non-standard and not as straightforward as
> implementing via an instruction-specific Iop.

Yes, this would work, but it is also slow and non-standard.

> Since the s390 architecture also has a DFP feature, we got in touch with our
> counterparts in s390 (mainly Christian Borntraeger). They, too, have tentative
> plans to add support for their DFP to Valgrind. Unfortunately, there seems to
> be very little overlap between s390 and PowerPC DFP functionality. The s390 DFP
> feature employs just 10 instructions, while the PowerPC uses 50 (the difference
> presumably due to CISC vs RISC). So if we were to define new Iops for PowerPC,
> I don't think any of them would be useful for s390; thus, they would need to
> define their own set of Iops.

I think that adding Iops is the right way to go. I just realized that we pointed Maynard to the wrong chapter of the POP (only the support instructions), sorry for that. At first glance, s390 has almost the same ops for long (64-bit) and extended (128-bit) as PowerPC, so to me it looks like the PowerPC and s390 Iops could be the same. Maynard, Carl, can you check chapter 20 and verify my observation?

Julian, please be aware that Intel is also planning to provide decimal floating point in the future. They will use a different representation:
- binary integer significand field: encodes the significand as a large binary integer between 0 and 10^p−1.
vs.
- densely packed decimal significand field: encodes decimal digits more directly.
But I think this should not matter for IR at all. I have not seen the specs, but I am confident that we could reuse a lot of the Iops for Intel as well (like add, subtract, multiply, divide, compare, convert from/to signed/unsigned int, ...).

Given the novelty of decimal floating point libraries, a tool like Valgrind would be really helpful.

Christian |
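To make Christian's distinction concrete, here is a minimal sketch of how a significand of at most three digits is stored under each scheme. The densely packed decimal (DPD) case follows Cowlishaw's published 10-bit "declet" table; the binary integer (BID) case simply stores the number. The function names are illustrative only, not taken from Valgrind or any DFP library.

```c
#include <assert.h>
#include <stdint.h>

/* DPD: encode three decimal digits (0..999) into a 10-bit "declet",
   following Cowlishaw's densely-packed-decimal table. */
static uint16_t dpd_encode_declet(unsigned n)
{
    assert(n <= 999);
    unsigned d2 = n / 100, d1 = (n / 10) % 10, d0 = n % 10;

    /* a, e, i flag the "large" digits 8 and 9, which need only one stored bit. */
    unsigned a = d2 >> 3, e = d1 >> 3, i = d0 >> 3;
    unsigned b = (d2 >> 2) & 1, c = (d2 >> 1) & 1, d = d2 & 1;
    unsigned f = (d1 >> 2) & 1, g = (d1 >> 1) & 1, h = d1 & 1;
    unsigned j = (d0 >> 2) & 1, k = (d0 >> 1) & 1, m = d0 & 1;

    /* Output bits, most significant first: p q r s t u v w x y. */
    unsigned p, q, r, s, t, u, v = 1, w, x, y;
    switch ((a << 2) | (e << 1) | i) {
    case 0:  p = b; q = c; r = d; s = f; t = g; u = h; v = 0; w = j; x = k; y = m; break;
    case 1:  p = b; q = c; r = d; s = f; t = g; u = h; w = 0; x = 0; y = m; break;
    case 2:  p = b; q = c; r = d; s = j; t = k; u = h; w = 0; x = 1; y = m; break;
    case 3:  p = b; q = c; r = d; s = 1; t = 0; u = h; w = 1; x = 1; y = m; break;
    case 4:  p = j; q = k; r = d; s = f; t = g; u = h; w = 1; x = 0; y = m; break;
    case 5:  p = f; q = g; r = d; s = 0; t = 1; u = h; w = 1; x = 1; y = m; break;
    case 6:  p = j; q = k; r = d; s = 0; t = 0; u = h; w = 1; x = 1; y = m; break;
    default: p = 0; q = 0; r = d; s = 1; t = 1; u = h; w = 1; x = 1; y = m; break;
    }
    return (uint16_t)((p << 9) | (q << 8) | (r << 7) | (s << 6) | (t << 5) |
                      (u << 4) | (v << 3) | (w << 2) | (x << 1) | y);
}

/* BID: the same digits are just the binary integer n. */
static uint16_t bid_encode(unsigned n)
{
    assert(n <= 999);
    return (uint16_t)n;
}
```

For example, 123 is 0x07B under BID but 0x0A3 under DPD (and 999 is 0x0FF under DPD), so the same decimal value has a different bit pattern depending on which encoding the hardware uses.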
|
From: John R. <jr...@bi...> - 2011-11-05 01:54:24
|
On 11/04/2011 01:46 PM, Christian Borntraeger wrote: [snip] > Given the novelty of decimal floating point libraries a tool like valgrind would > be really helpful. What level of support do you imagine? Memcheck supports binary floating point coarsely: if any bit of any input to a floating-point operation is Undefined, then all the bits of the output are Undefined. -- |
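The coarse rule John describes can be modelled in a few lines. This is a hypothetical sketch, not Memcheck's code; it assumes Memcheck's convention that a shadow (V) bit of 1 means Undefined, and includes a rounding-mode operand because the patch's proposed D64 ops have the shape IRRoundingMode(I32) x F64 x F64 -> F64.

```c
#include <assert.h>
#include <stdint.h>

/* Shadow bits for a 64-bit value: 1 = Undefined, 0 = Defined. */
typedef uint64_t vbits64;

/* "Pessimising" collapse: if any bit is Undefined, every result bit is. */
static vbits64 pcast64(vbits64 v)
{
    return v ? ~(vbits64)0 : 0;
}

/* Shadow of a rounded binary FP op: rm x F64 x F64 -> F64.
   The arithmetic itself is never inspected, only the inputs' definedness. */
static vbits64 shadow_rounded_binop(uint32_t v_rm, vbits64 v_x, vbits64 v_y)
{
    return pcast64((vbits64)v_rm | v_x | v_y);
}
```

Because the rule ignores what the operation computes, a decimal multiply can share the binary multiply's case, which is what the mc_translate.c hunk in Christian's patch does.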
|
From: Christian B. <bor...@de...> - 2011-11-05 12:43:18
|
On 05/11/11 02:54, John Reiser wrote:
> On 11/04/2011 01:46 PM, Christian Borntraeger wrote:
> [snip]
>> Given the novelty of decimal floating point libraries a tool like valgrind would
>> be really helpful.
>
> What level of support do you imagine? Memcheck supports binary floating point
> coarsely: if any bit of any input to a floating-point operation is Undefined,
> then all the bits of the output are Undefined.
>
I think it should be identical to binary floating point, e.g. for DFP 64-bit Multiply
we do the same as for BFP 64-bit Multiply:
cborntra@br96egxr:/space/valgrind$ svn diff
Index: memcheck/mc_translate.c
===================================================================
--- memcheck/mc_translate.c (revision 12031)
+++ memcheck/mc_translate.c (working copy)
@@ -2259,6 +2259,7 @@
case Iop_SubF64:
case Iop_SubF64r32:
case Iop_MulF64:
+ case Iop_MulD64:
case Iop_MulF64r32:
case Iop_DivF64:
case Iop_DivF64r32:
cborntra@br96egxr:/space/valgrind$ svn diff VEX/
Index: VEX/priv/ir_defs.c
===================================================================
--- VEX/priv/ir_defs.c (revision 2201)
+++ VEX/priv/ir_defs.c (working copy)
@@ -260,6 +260,7 @@
case Iop_AddF64: vex_printf("AddF64"); return;
case Iop_SubF64: vex_printf("SubF64"); return;
case Iop_MulF64: vex_printf("MulF64"); return;
+ case Iop_MulD64: vex_printf("MulD64"); return;
case Iop_DivF64: vex_printf("DivF64"); return;
case Iop_AddF64r32: vex_printf("AddF64r32"); return;
case Iop_SubF64r32: vex_printf("SubF64r32"); return;
@@ -2235,6 +2236,7 @@
case Iop_AddF64: case Iop_SubF64:
case Iop_MulF64: case Iop_DivF64:
+ case Iop_MulD64:
case Iop_AddF64r32: case Iop_SubF64r32:
case Iop_MulF64r32: case Iop_DivF64r32:
TERNARY(ity_RMode,Ity_F64,Ity_F64, Ity_F64);
Index: VEX/pub/libvex_ir.h
===================================================================
--- VEX/pub/libvex_ir.h (revision 2201)
+++ VEX/pub/libvex_ir.h (working copy)
@@ -1279,7 +1279,16 @@
/* Vector Reciprocal Estimate and Vector Reciprocal Square Root Estimate
See floating-point equiwalents for details. */
- Iop_Recip32x4, Iop_Rsqrte32x4
+ Iop_Recip32x4, Iop_Rsqrte32x4,
+
+ /* ------ Decimal Floating Point IEEE 754-2008 ------ */
+
+ /* Binary operations, with rounding. */
+ /* :: IRRoundingMode(I32) x F64 x F64 -> F64 */
+ Iop_AddD64, Iop_SubD64, Iop_MulD64, Iop_DivD64
+ /* tbd */
+
+
}
IROp;
|
|
From: Julian S. <js...@ac...> - 2011-11-07 08:25:30
|
Hi Maynard, all,

> > different approaches we could take to implement this support:
> > 1. Use existing PowerPC support
> > 2. Define new Iops (hopefully could get by with something less than 50)
> > 3. Use the Iex_CCall type of IRExpr to invoke a helper that executes
> > the

I suspected this day would come. To be clear, I'm not per se opposed to new Iops. It's just that we already have zillions of them, so I'm a little wary of adding to them en masse, especially considering it's difficult to get rid of them later if they should turn out to be the wrong thing / not well thought out / whatever.

Clearly (1) is impractical, and (3), well, that might be doable, but you lose the ability to do much useful analysis on the resulting IR: Memcheck's V-bit analysis will simply worst-case it, so there's no opportunity to do any more sophisticated analysis. Also, doing one function call per machine operation is going to be slow.

So new Iops look unavoidable in this case.

What I would ask is: can you + the s390 folks make an initial proposal of the new Iops you need, with names, types and a summary of behaviour? The aim would be to come up with a minimal but efficient set of Iops that will support DFP on both Power and s390. Then we can mash it around and see how it looks. Also, some indication of how this relates to the Intel DFP support that Christian mentioned would be useful.

J |
|
From: Julian S. <js...@ac...> - 2011-11-07 08:35:43
|
On Monday, November 07, 2011, Julian Seward wrote: > What I would ask is, can you + the s390 folks make an initial proposal > of the new Iops you need [...] One other thing that occurs to me is, would it be cleaner/necessary/useful to add new IR type(s) for DFP values, or are the existing IR types adequate? /me knows nothing about DFP, so I can't say. Are there any good tutorials out there that I can read? J |
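Whether DFP needs its own IR types depends partly on what a DFP value carries. As a rough illustration only (a toy, not IEEE 754-2008 decimal64: no rounding, overflow or special values), a decimal number can be modelled as an integer coefficient times a power of ten. Addition is then exact for values like 0.1 + 0.2, unlike binary double, where that sum is 0.30000000000000004. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy decimal: value = coeff * 10^exp. No rounding or overflow handling. */
typedef struct {
    int64_t coeff;
    int exp;
} toydec;

/* Exact addition: rescale both operands to the smaller exponent, then add.
   IEEE 754-2008 likewise prefers the smaller operand exponent for an exact sum. */
static toydec toydec_add(toydec a, toydec b)
{
    while (a.exp > b.exp) { a.coeff *= 10; a.exp--; }
    while (b.exp > a.exp) { b.coeff *= 10; b.exp--; }
    toydec r = { a.coeff + b.coeff, a.exp };
    return r;
}
```

Note that 0.1 + 0.20 yields coefficient 30 with exponent -2: numerically equal to 0.3, but a different representation. Real decimal formats preserve this exponent (the "quantum"), which is information a plain binary F64 value does not carry.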
|
From: Christian B. <bor...@de...> - 2011-11-07 08:53:19
|
On 07/11/11 09:30, Julian Seward wrote:
> One other thing that occurs to me is, would it be cleaner/necessary/useful
> to add new IR type(s) for DFP values, or are the existing IR types
> adequate? /me knows nothing about DFP, so I can't say. Are there any
> good tutorials out there that I can read?

On s390, DFP is done in the normal FP registers; only the instruction decides how the content is interpreted. There are even instructions like "load zero" (set register to 0.0) that are valid for all 3 types that s390 supports (binary, decimal and hex floating point).

Andreas, don't you have some charts for decimal floating point?

PS: We will never do hex floating point for Valgrind, since it is not used in Linux. |
|
From: Florian K. <br...@ac...> - 2011-11-05 14:30:13
|
On 11/04/2011 04:46 PM, Christian Borntraeger wrote:
>
> Agreed, emulating DFP with IR is definitely the wrong way.
>

Definitely.

>> Next, I asked Carl to look at the feasibility of using the Iex_CCall to a
>> helper. This looks promising, but is non-standard and not as straightforward as
>> implementing via an instruction-specific Iop.
>
> Yes, this would work but this is also slow and non-standard.
>

I would not be too worried about speed. Apps using DFP ops will be rare. But it would definitely be a deviation from the current modelling approach.

> I think that adding Iops is the right way to do.
> [...]
>
> Julian, please be aware that Intel is also planning to provide decimal
> floating point in the future. They will use a different representation:
> - binary integer significand field encodes the significand as a large binary integer between 0 and 10^p−1.
> vs.
> - densely packed decimal significand field encodes decimal digits more directly.
> but I think this should not matter for IR at all.

It would matter, at least for VEX purists. Think about translating across architectures, e.g. s390x DFP to Intel. I'm not sure whether we've given up on cross-translating nowadays, but VEX was certainly originally designed to support it, and that implies capturing the semantics correctly.

Florian |
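Florian's point shows up clearly if the same number is encoded both ways. This sketch builds the 64-bit IEEE 754-2008 decimal64 word (exponent bias 398) for a small value in the BID layout and in the DPD layout. To keep it short, the DPD variant only accepts a single-digit coefficient, whose leading digit is then 0 and whose last declet equals the digit itself. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEC64_BIAS 398 /* decimal64 exponent bias */

/* BID layout: sign | 10-bit biased exponent | 53-bit binary coefficient
   (valid when the coefficient is below 2^53). */
static uint64_t dec64_bid(int negative, uint64_t coeff, int exp)
{
    assert(coeff < (1ULL << 53));
    assert(exp >= -398 && exp <= 369);
    uint64_t e = (uint64_t)(exp + DEC64_BIAS);
    return ((uint64_t)(negative != 0) << 63) | (e << 53) | coeff;
}

/* DPD layout for a coefficient of 0..9:
   sign | 5-bit combination field | 8-bit exponent continuation | 5 declets. */
static uint64_t dec64_dpd_digit(int negative, unsigned digit, int exp)
{
    assert(digit <= 9);
    assert(exp >= -398 && exp <= 369);
    uint64_t e = (uint64_t)(exp + DEC64_BIAS);
    uint64_t comb = (e >> 8) << 3;  /* 2 exponent MSBs, then leading digit 0 */
    uint64_t econt = e & 0xFF;      /* low 8 exponent bits */
    return ((uint64_t)(negative != 0) << 63) | (comb << 58) | (econt << 50) | digit;
}
```

For the value 1 (coefficient 1, exponent 0), the BID word is 0x31C0000000000001 while the DPD word is 0x2238000000000001, so a 64-bit IR value by itself does not say which interpretation the guest expects.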
|
From: Christian B. <bor...@de...> - 2011-11-05 22:56:39
|
>> - binary integer significand field encodes the significand as a large binary integer between 0 and 10^p−1.
>> vs.
>> - densely packed decimal significand field encodes decimal digits more directly.
>> but I think this should not matter for IR at all.
>
> It would matter. At least for VEX purists. Think about translating
> across architectures, e.g. s390x dfp to intel. I'm not sure whether
> we've given up on cross-translating nowadays but VEX was certainly
> originally designed to support it. And that implies capturing the
> semantics correctly.

That's why there are some helpers with inline assemblies? ;-)

All joking aside, isn't that also true for binary floating point? 754 mandates the format, but IIRC the in-memory representation can differ according to the endianness. |
|
From: Julian S. <js...@ac...> - 2011-11-07 08:12:09
|
On Saturday, November 05, 2011, Christian Borntraeger wrote:
> That why there are some helpers with inline assemblies? ;-)

Nothing to do with me ;-)

> All joking aside, isnt that also true for binary floating point?
> 754 mandates the format, but IIRC the in memory representation can
> differ according to the endianess.

FWIW (not a lot), IR loads and stores contain a bit (IREnd, iirc) which indicates the endianness of the transfer. Without that you can't record the semantics of the load/store properly.

J |
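Julian's point about the endianness bit can be illustrated with a toy 64-bit load that takes an explicit endianness tag. The enum and function here are hypothetical stand-ins for the flag carried by IR loads and stores, not VEX definitions. The same eight bytes yield different values depending on the tag, which is why a load without it has no precise meaning.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the endianness tag on an IR load/store. */
typedef enum { END_LE, END_BE } endness;

/* Assemble a 64-bit value from 8 bytes, most significant byte first. */
static uint64_t load64(const uint8_t *p, endness end)
{
    assert(end == END_LE || end == END_BE);
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        /* Big-endian: byte 0 is most significant; little-endian: byte 7 is. */
        int idx = (end == END_BE) ? i : 7 - i;
        v = (v << 8) | p[idx];
    }
    return v;
}
```

With the bytes 01 02 ... 08 in memory, the big-endian load gives 0x0102030405060708 and the little-endian load gives 0x0807060504030201.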
|
From: Christian B. <bor...@de...> - 2011-11-07 08:26:00
|
On 07/11/11 09:07, Julian Seward wrote:
> On Saturday, November 05, 2011, Christian Borntraeger wrote:
>
>> That why there are some helpers with inline assemblies? ;-)
>
> Nothing to do with me ;-)
>
>> All joking aside, isnt that also true for binary floating point?
>> 754 mandates the format, but IIRC the in memory representation can
>> differ according to the endianess.
>
> FWIW (not a lot) IR loads and stores contain a bit (IREnd, iirc)
> which indicates the endianness of the transfer. Without that you
> can't record the semantics of the load/store properly.

So we could do another bit for the DFP format?

But more importantly, Julian, would it be OK with you if Carl and Maynard start to add Iops for DFP?

Christian |
|
From: Maynard J. <may...@us...> - 2011-11-08 14:21:56
|
On 11/07/2011 2:20 AM, Julian Seward wrote:
>
> Hi Maynard, all,
>
>>> different approaches we could take to implement this support:
>>> 1. Use existing PowerPC support
>>> 2. Define new Iops (hopefully could get by with something less than 50)
>>> 3. Use the Iex_CCall type of IRExpr to invoke a helper that executes
>>> the
>
> I suspected this day would come. To be clear, I'm not per se
> opposed to new Iops. It's just that we already have zillions of
> them, so I'm a little wary of adding en mass to them, especially
> considering it's difficult to get rid of them later if they should
> turn out to be the wrong thing / not well thought out / whatever.
>
> Clearly (1) is impractical and (3), well, that might be doable, but
> you lose the ability to do much useful analysis on the resulting IR:
> Memcheck's V-bit analysis will simply worst-case it, so there's no
> opportunity to do any more sophisticated analysis. Also, doing one
> function call per machine operation is going to be slow.
>
> So new Iops look unavoidable in this case.
>
> What I would ask is, can you + the s390 folks make an initial proposal
> of the new Iops you need, with names, types and a summary of behaviour.
> The aim would be to come up with a minimal but efficient set of Iops
> that will support DFP on both Power and s390. Then we can mash it around
> and see how it looks. Also, some indication of how this relates to
> the Intel DFP support that Christian mentioned, would be useful.

Julian,
Thanks for your response. Carl will develop a proposal for a set of common DFP Iops and have it reviewed by the s390 folks before sending it to you and the list for comments.

As for how PowerPC and System z DFP relate to Intel: to my knowledge, there are no Intel processors that have hardware support for DFP. Perhaps there's an Intel person who watches this mailing list and can comment if there's anything that's been made public that we've missed.

-Maynard

>
> J
> |
|
From: John R. <jr...@bi...> - 2011-11-08 16:50:04
|
> [snip] to my
> knowledge, there are no Intel processors that have hardware support for DFP.

The x87 FPU (about 30 years old) and all x86 CPUs beginning with Pentium (1993) have FBLD and FBSTP, which load or store signed 18-digit packed BCD integers to or from internal binary floating point [which is compatible with IEEE 754-1985: sign, 15-bit biased exponent, 64-bit significand with an explicit leading integer bit; various control and status bits.] In addition, x86 CPUs have (outside 64-bit mode) AAA, AAD, AAM and AAS (ASCII Adjust after Addition, before Division, after Multiplication, after Subtraction), which facilitate BCD arithmetic. These are supported by the CPU flag bit AF (the Auxiliary carry Flag), which is the carry out of bit 3 [the bit with positional value (1<<3)]. Although this is not directly the scheme specified by IEEE 754-2008, such hardware does support decimal floating point arithmetic several times faster than is possible in a software-only implementation.

-- |
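A small model of what AF is for: this sketch simulates ADD AL, BL followed by AAA on two unpacked BCD digits, following the adjustment rule in Intel's instruction reference (if AL's low nibble exceeds 9 or AF is set, add 0x106 to AX, then clear AL's high nibble). The function name is hypothetical, and the 16-bit result packs AH:AL.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "ADD AL, BL" then "AAA" for unpacked BCD digits 0..9.
   Returns the resulting AX (AH in the high byte, AL in the low byte). */
static uint16_t add_then_aaa(uint8_t ah, uint8_t al_digit, uint8_t bl_digit)
{
    assert(al_digit <= 9 && bl_digit <= 9);
    uint8_t al = (uint8_t)(al_digit + bl_digit);             /* ADD AL, BL */
    int af = ((al_digit & 0x0F) + (bl_digit & 0x0F)) > 0x0F; /* carry out of bit 3 */
    uint16_t ax = (uint16_t)((ah << 8) | al);
    if ((al & 0x0F) > 9 || af)
        ax = (uint16_t)(ax + 0x106);                         /* AL += 6, AH += 1 */
    return (uint16_t)(ax & 0xFF0F);                          /* AL &= 0x0F */
}
```

The 9 + 9 case is where AF matters: the raw sum 0x12 has a low nibble of 2, which alone would not trigger the adjustment, but the carry out of bit 3 does, giving AH:AL = 1:8 (decimal 18).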