1 | This directory contains source for a library of binary -> decimal |
---|
2 | and decimal -> binary conversion routines, for single-, double-, |
---|
3 | and extended-precision IEEE binary floating-point arithmetic, and |
---|
4 | other IEEE-like binary floating-point, including "double double", |
---|
5 | as in |
---|
6 | |
---|
7 | T. J. Dekker, "A Floating-Point Technique for Extending the |
---|
8 | Available Precision", Numer. Math. 18 (1971), pp. 224-242 |
---|
9 | |
---|
10 | and |
---|
11 | |
---|
12 | "Inside Macintosh: PowerPC Numerics", Addison-Wesley, 1994 |
---|
13 | |
---|
14 | The conversion routines use double-precision floating-point arithmetic |
---|
15 | and, where necessary, high precision integer arithmetic. The routines |
---|
16 | are generalizations of the strtod and dtoa routines described in |
---|
17 | |
---|
18 | David M. Gay, "Correctly Rounded Binary-Decimal and |
---|
19 | Decimal-Binary Conversions", Numerical Analysis Manuscript |
---|
20 | No. 90-10, Bell Labs, Murray Hill, 1990; |
---|
21 | http://cm.bell-labs.com/cm/cs/what/ampl/REFS/rounding.ps.gz |
---|
22 | |
---|
23 | (based in part on papers by Clinger and Steele & White: see the |
---|
24 | references in the above paper). |
---|
25 | |
---|
26 | The present conversion routines should be able to use any of IEEE binary, |
---|
27 | VAX, or IBM-mainframe double-precision arithmetic internally, but I (dmg) |
---|
28 | have so far only had a chance to test them with IEEE double precision |
---|
29 | arithmetic. |
---|
30 | |
---|
31 | The core conversion routines are strtodg for decimal -> binary conversions |
---|
32 | and gdtoa for binary -> decimal conversions. These routines operate |
---|
33 | on arrays of unsigned 32-bit integers of type ULong, a signed 32-bit |
---|
34 | exponent of type Long, and arithmetic characteristics described in |
---|
35 | struct FPI; FPI, Long, and ULong are defined in gdtoa.h. File arith.h |
---|
36 | is supposed to provide #defines that cause gdtoa.h to define its |
---|
37 | types correctly. File arithchk.c is source for a program that |
---|
38 | generates a suitable arith.h on all systems where I've been able to |
---|
39 | test it. |
---|
40 | |
---|
41 | The core conversion routines are meant to be called by helper routines |
---|
42 | that know details of the particular binary arithmetic of interest and |
---|
43 | convert. The present directory provides helper routines for 5 variants |
---|
44 | of IEEE binary floating-point arithmetic, each indicated by one or |
---|
45 | two letters: |
---|
46 | |
---|
47 | f IEEE single precision |
---|
48 | d IEEE double precision |
---|
49 | x IEEE extended precision, as on Intel 80x87 |
---|
50 | and software emulations of Motorola 68xxx chips |
---|
51 | that do not pad the way the 68xxx does, but |
---|
52 | only store 80 bits |
---|
53 | xL IEEE extended precision, as on Motorola 68xxx chips |
---|
54 | Q quad precision, as on Sun Sparc chips |
---|
55 | dd double double, pairs of IEEE double numbers |
---|
56 | whose sum is the desired value |
---|
57 | |
---|
58 | For decimal -> binary conversions, there are three families of |
---|
59 | helper routines: one for round-nearest (or the current rounding |
---|
60 | mode on IEEE-arithmetic systems that provide the C99 fegetround() |
---|
61 | function, if compiled with -DHonor_FLT_ROUNDS): |
---|
62 | |
---|
63 | strtof |
---|
64 | strtod |
---|
65 | strtodd |
---|
66 | strtopd |
---|
67 | strtopf |
---|
68 | strtopx |
---|
69 | strtopxL |
---|
70 | strtopQ |
---|
71 | |
---|
72 | one with rounding direction specified: |
---|
73 | |
---|
74 | strtorf |
---|
75 | strtord |
---|
76 | strtordd |
---|
77 | strtorx |
---|
78 | strtorxL |
---|
79 | strtorQ |
---|
80 | |
---|
81 | and one for computing an interval (at most one bit wide) that contains |
---|
82 | the decimal number: |
---|
83 | |
---|
84 | strtoIf |
---|
85 | strtoId |
---|
86 | strtoIdd |
---|
87 | strtoIx |
---|
88 | strtoIxL |
---|
89 | strtoIQ |
---|
90 | |
---|
91 | The latter call strtoIg, which makes one call on strtodg and adjusts |
---|
92 | the result to provide the desired interval. On systems where native |
---|
93 | arithmetic can easily make one-ulp adjustments on values in the |
---|
94 | desired floating-point format, it might be more efficient to use the |
---|
95 | native arithmetic. Routine strtodI is a variant of strtoId that |
---|
96 | illustrates one way to do this for IEEE binary double-precision |
---|
97 | arithmetic -- but whether this is more efficient remains to be seen. |
---|
98 | |
---|
99 | Functions strtod and strtof have "natural" return types, float and |
---|
100 | double -- strtod is specified by the C standard, and strtof appears |
---|
101 | in the stdlib.h of some systems, such as (at least some) Linux systems. |
---|
102 | The other functions write their results to their final argument(s): |
---|
103 | to the final two argument for the strtoI... (interval) functions, |
---|
104 | and to the final argument for the others (strtop... and strtor...). |
---|
105 | Where possible, these arguments have "natural" return types (double* |
---|
106 | or float*), to permit at least some type checking. In reality, they |
---|
107 | are viewed as arrays of ULong (or, for the "x" functions, UShort) |
---|
108 | values. On systems where long double is the appropriate type, one can |
---|
109 | pass long double* final argument(s) to these routines. The int value |
---|
110 | that these routines return is the return value from the call they make |
---|
111 | on strtodg; see the enum of possible return values in gdtoa.h. |
---|
112 | |
---|
113 | Source files g_ddfmt.c, misc.c, smisc.c, strtod.c, strtodg.c, and ulp.c |
---|
114 | should use true IEEE double arithmetic (not, e.g., double extended), |
---|
115 | at least for storing (and viewing the bits of) the variables declared |
---|
116 | "double" within them. |
---|
117 | |
---|
118 | One detail indicated in struct FPI is whether the target binary |
---|
119 | arithmetic departs from the IEEE standard by flushing denormalized |
---|
120 | numbers to 0. On systems that do this, the helper routines for |
---|
121 | conversion to double-double format (when compiled with |
---|
122 | Sudden_Underflow #defined) penalize the bottom of the exponent |
---|
123 | range so that they return a nonzero result only when the least |
---|
124 | significant bit of the less significant member of the pair of |
---|
125 | double values returned can be expressed as a normalized double |
---|
126 | value. An alternative would be to drop to 53-bit precision near |
---|
127 | the bottom of the exponent range. To get correct rounding, this |
---|
128 | would (in general) require two calls on strtodg (one specifying |
---|
129 | 126-bit arithmetic, then, if necessary, one specifying 53-bit |
---|
130 | arithmetic). |
---|
131 | |
---|
132 | By default, the core routine strtodg and strtod set errno to ERANGE |
---|
133 | if the result overflows to +Infinity or underflows to 0. Compile |
---|
134 | these routines with NO_ERRNO #defined to inhibit errno assignments. |
---|
135 | |
---|
136 | Routine strtod is based on netlib's "dtoa.c from fp", and |
---|
137 | (f = strtod(s,se)) is more efficient for some conversions than, say, |
---|
138 | strtord(s,se,1,&f). Parts of strtod require true IEEE double |
---|
139 | arithmetic with the default rounding mode (round-to-nearest) and, on |
---|
140 | systems with IEEE extended-precision registers, double-precision |
---|
141 | (53-bit) rounding precision. If the machine uses (the equivalent of) |
---|
142 | Intel 80x87 arithmetic, the call |
---|
143 | _control87(PC_53, MCW_PC); |
---|
144 | does this with many compilers. Whether this or another call is |
---|
145 | appropriate depends on the compiler; for this to work, it may be |
---|
146 | necessary to #include "float.h" or another system-dependent header |
---|
147 | file. |
---|
148 | |
---|
149 | Source file strtodnrp.c gives a strtod that does not require 53-bit |
---|
150 | rounding precision on systems (such as Intel IA32 systems) that may |
---|
151 | suffer double rounding due to use of extended-precision registers. |
---|
152 | For some conversions this variant of strtod is less efficient than the |
---|
153 | one in strtod.c when the latter is run with 53-bit rounding precision. |
---|
154 | |
---|
155 | The values that the strto* routines return for NaNs are determined by |
---|
156 | gd_qnan.h, which the makefile generates by running the program whose |
---|
157 | source is qnan.c. Note that the rules for distinguishing signaling |
---|
158 | from quiet NaNs are system-dependent. For cross-compilation, you need |
---|
159 | to determine arith.h and gd_qnan.h suitably, e.g., using the |
---|
160 | arithmetic of the target machine. |
---|
161 | |
---|
162 | C99's hexadecimal floating-point constants are recognized by the |
---|
163 | strto* routines (but this feature has not yet been heavily tested). |
---|
164 | Compiling with NO_HEX_FP #defined disables this feature. |
---|
165 | |
---|
166 | When compiled with -DINFNAN_CHECK, the strto* routines recognize C99's |
---|
167 | NaN and Infinity syntax. Moreover, unless No_Hex_NaN is #defined, the |
---|
168 | strto* routines also recognize C99's NaN(...) syntax: they accept |
---|
169 | (case insensitively) strings of the form NaN(x), where x is a string |
---|
170 | of hexadecimal digits and spaces; if there is only one string of |
---|
171 | hexadecimal digits, it is taken for the fraction bits of the resulting |
---|
172 | NaN; if there are two or more strings of hexadecimal digits, each |
---|
173 | string is assigned to the next available sequence of 32-bit words of |
---|
174 | fractions bits (starting with the most significant), right-aligned in |
---|
175 | each sequence. |
---|
176 | |
---|
177 | For binary -> decimal conversions, I've provided just one family |
---|
178 | of helper routines: |
---|
179 | |
---|
180 | g_ffmt |
---|
181 | g_dfmt |
---|
182 | g_ddfmt |
---|
183 | g_xfmt |
---|
184 | g_xLfmt |
---|
185 | g_Qfmt |
---|
186 | |
---|
187 | which do a "%g" style conversion either to a specified number of decimal |
---|
188 | places (if their ndig argument is positive), or to the shortest |
---|
189 | decimal string that rounds to the given binary floating-point value |
---|
190 | (if ndig <= 0). They write into a buffer supplied as an argument |
---|
191 | and return either a pointer to the end of the string (a null character) |
---|
192 | in the buffer, if the buffer was long enough, or 0. Other forms of |
---|
193 | conversion are easily done with the help of gdtoa(), such as %e or %f |
---|
194 | style and conversions with direction of rounding specified (so that, if |
---|
195 | desired, the decimal value is either >= or <= the binary value). |
---|
196 | On IEEE-arithmetic systems that provide the C99 fegetround() function, |
---|
197 | if compiled with -DHonor_FLT_ROUNDS, these routines honor the current |
---|
198 | rounding mode. |
---|
199 | |
---|
200 | For an example of more general conversions based on dtoa(), see |
---|
201 | netlib's "printf.c from ampl/solvers". |
---|
202 | |
---|
203 | For double-double -> decimal, g_ddfmt() assumes IEEE-like arithmetic |
---|
204 | of precision max(126, #bits(input)) bits, where #bits(input) is the |
---|
205 | number of mantissa bits needed to represent the sum of the two double |
---|
206 | values in the input. |
---|
207 | |
---|
208 | The makefile creates a library, gdtoa.a. To use the helper |
---|
209 | routines, a program only needs to include gdtoa.h. All the |
---|
210 | source files for gdtoa.a include a more extensive gdtoaimp.h; |
---|
211 | among other things, gdtoaimp.h has #defines that make "internal" |
---|
212 | names end in _D2A. To make a "system" library, one could modify |
---|
213 | these #defines to make the names start with __. |
---|
214 | |
---|
215 | Various comments about possible #defines appear in gdtoaimp.h, |
---|
216 | but for most purposes, arith.h should set suitable #defines. |
---|
217 | |
---|
218 | Systems with preemptive scheduling of multiple threads require some |
---|
219 | manual intervention. On such systems, it's necessary to compile |
---|
220 | dmisc.c, dtoa.c gdota.c, and misc.c with MULTIPLE_THREADS #defined, |
---|
221 | and to provide (or suitably #define) two locks, acquired by |
---|
222 | ACQUIRE_DTOA_LOCK(n) and freed by FREE_DTOA_LOCK(n) for n = 0 or 1. |
---|
223 | (The second lock, accessed in pow5mult, ensures lazy evaluation of |
---|
224 | only one copy of high powers of 5; omitting this lock would introduce |
---|
225 | a small probability of wasting memory, but would otherwise be harmless.) |
---|
226 | Routines that call dtoa or gdtoa directly must also invoke freedtoa(s) |
---|
227 | to free the value s returned by dtoa or gdtoa. It's OK to do so whether |
---|
228 | or not MULTIPLE_THREADS is #defined, and the helper g_*fmt routines |
---|
229 | listed above all do this indirectly (in gfmt_D2A(), which they all call). |
---|
230 | |
---|
231 | By default, there is a private pool of memory of length 2000 bytes |
---|
232 | for intermediate quantities, and MALLOC (see gdtoaimp.h) is called only |
---|
233 | if the private pool does not suffice. 2000 is large enough that MALLOC |
---|
234 | is called only under very unusual circumstances (decimal -> binary |
---|
235 | conversion of very long strings) for conversions to and from double |
---|
236 | precision. For systems with preemptively scheduled multiple threads |
---|
237 | or for conversions to extended or quad, it may be appropriate to |
---|
238 | #define PRIVATE_MEM nnnn, where nnnn is a suitable value > 2000. |
---|
239 | For extended and quad precisions, -DPRIVATE_MEM=20000 is probably |
---|
240 | plenty even for many digits at the ends of the exponent range. |
---|
241 | Use of the private pool avoids some overhead. |
---|
242 | |
---|
243 | Directory test provides some test routines. See its README. |
---|
244 | I've also tested this stuff (except double double conversions) |
---|
245 | with Vern Paxson's testbase program: see |
---|
246 | |
---|
247 | V. Paxson and W. Kahan, "A Program for Testing IEEE Binary-Decimal |
---|
248 | Conversion", manuscript, May 1991, |
---|
249 | ftp://ftp.ee.lbl.gov/testbase-report.ps.Z . |
---|
250 | |
---|
251 | (The same ftp directory has source for testbase.) |
---|
252 | |
---|
253 | Some system-dependent additions to CFLAGS in the makefile: |
---|
254 | |
---|
255 | HU-UX: -Aa -Ae |
---|
256 | OSF (DEC Unix): -ieee_with_no_inexact |
---|
257 | SunOS 4.1x: -DKR_headers -DBad_float_h |
---|
258 | |
---|
259 | If you want to put this stuff into a shared library and your |
---|
260 | operating system requires export lists for shared libraries, |
---|
261 | the following would be an appropriate export list: |
---|
262 | |
---|
263 | dtoa |
---|
264 | freedtoa |
---|
265 | g_Qfmt |
---|
266 | g_ddfmt |
---|
267 | g_dfmt |
---|
268 | g_ffmt |
---|
269 | g_xLfmt |
---|
270 | g_xfmt |
---|
271 | gdtoa |
---|
272 | strtoIQ |
---|
273 | strtoId |
---|
274 | strtoIdd |
---|
275 | strtoIf |
---|
276 | strtoIx |
---|
277 | strtoIxL |
---|
278 | strtod |
---|
279 | strtodI |
---|
280 | strtodg |
---|
281 | strtof |
---|
282 | strtopQ |
---|
283 | strtopd |
---|
284 | strtopdd |
---|
285 | strtopf |
---|
286 | strtopx |
---|
287 | strtopxL |
---|
288 | strtorQ |
---|
289 | strtord |
---|
290 | strtordd |
---|
291 | strtorf |
---|
292 | strtorx |
---|
293 | strtorxL |
---|
294 | |
---|
295 | When time permits, I (dmg) hope to write in more detail about the |
---|
296 | present conversion routines; for now, this README file must suffice. |
---|
297 | Meanwhile, if you wish to write helper functions for other kinds of |
---|
298 | IEEE-like arithmetic, some explanation of struct FPI and the bits |
---|
299 | array may be helpful. Both gdtoa and strtodg operate on a bits array |
---|
300 | described by FPI *fpi. The bits array is of type ULong, a 32-bit |
---|
301 | unsigned integer type. Floating-point numbers have fpi->nbits bits, |
---|
302 | with the least significant 32 bits in bits[0], the next 32 bits in |
---|
303 | bits[1], etc. These numbers are regarded as integers multiplied by |
---|
304 | 2^e (i.e., 2 to the power of the exponent e), where e is the second |
---|
305 | argument (be) to gdtoa and is stored in *exp by strtodg. The minimum |
---|
306 | and maximum exponent values fpi->emin and fpi->emax for normalized |
---|
307 | floating-point numbers reflect this arrangement. For example, the |
---|
308 | P754 standard for binary IEEE arithmetic specifies doubles as having |
---|
309 | 53 bits, with normalized values of the form 1.xxxxx... times 2^(b-1023), |
---|
310 | with 52 bits (the x's) and the biased exponent b represented explicitly; |
---|
311 | b is an unsigned integer in the range 1 <= b <= 2046 for normalized |
---|
312 | finite doubles, b = 0 for denormals, and b = 2047 for Infinities and NaNs. |
---|
313 | To turn an IEEE double into the representation used by strtodg and gdtoa, |
---|
314 | we multiply 1.xxxx... by 2^52 (to make it an integer) and reduce the |
---|
315 | exponent e = (b-1023) by 52: |
---|
316 | |
---|
317 | fpi->emin = 1 - 1023 - 52 |
---|
318 | fpi->emax = 1046 - 1023 - 52 |
---|
319 | |
---|
320 | In various wrappers for IEEE double, we actually write -53 + 1 rather |
---|
321 | than -52, to emphasize that there are 53 bits including one implicit bit. |
---|
322 | Field fpi->rounding indicates the desired rounding direction, with |
---|
323 | possible values |
---|
324 | FPI_Round_zero = toward 0, |
---|
325 | FPI_Round_near = unbiased rounding -- the IEEE default, |
---|
326 | FPI_Round_up = toward +Infinity, and |
---|
327 | FPI_Round_down = toward -Infinity |
---|
328 | given in gdtoa.h. |
---|
329 | |
---|
330 | Field fpi->sudden_underflow indicates whether strtodg should return |
---|
331 | denormals or flush them to zero. Normal floating-point numbers have |
---|
332 | bit fpi->nbits in the bits array on. Denormals have it off, with |
---|
333 | exponent = fpi->emin. Strtodg provides distinct return values for normals |
---|
334 | and denormals; see gdtoa.h. |
---|
335 | |
---|
336 | Compiling g__fmt.c, strtod.c, and strtodg.c with -DUSE_LOCALE causes |
---|
337 | the decimal-point character to be taken from the current locale; otherwise |
---|
338 | it is '.'. |
---|
339 | |
---|
340 | Source files dtoa.c and strtod.c in this directory are derived from |
---|
341 | netlib's "dtoa.c from fp" and are meant to function equivalently. |
---|
342 | When compiled with Honor_FLT_ROUNDS #defined (on systems that provide |
---|
343 | FLT_ROUNDS and fegetround() as specified in the C99 standard), they |
---|
344 | honor the current rounding mode. Because FLT_ROUNDS is buggy on some |
---|
345 | (Linux) systems -- not reflecting calls on fesetround(), as the C99 |
---|
346 | standard says it should -- when Honor_FLT_ROUNDS is #defined, the |
---|
347 | current rounding mode is obtained from fegetround() rather than from |
---|
348 | FLT_ROUNDS, unless Trust_FLT_ROUNDS is also #defined. |
---|
349 | |
---|
350 | Compile with -DUSE_LOCALE to use the current locale; otherwise |
---|
351 | decimal points are assumed to be '.'. With -DUSE_LOCALE, unless |
---|
352 | you also compile with -DNO_LOCALE_CACHE, the details about the |
---|
353 | current "decimal point" character string are cached and assumed not |
---|
354 | to change during the program's execution. |
---|
355 | |
---|
356 | Please send comments to David M. Gay (dmg at acm dot org, with " at " |
---|
357 | changed at "@" and " dot " changed to "."). |
---|