|
From: Irek S. <isz...@gm...> - 2013-03-05 23:22:07
|
On Tue, Mar 5, 2013 at 11:56 PM, Philippe Waroquiers <phi...@sk...> wrote:
> On Tue, 2013-03-05 at 18:54 +0100, Lionel Cons wrote:
>> (1) in https://bugs.kde.org/show_bug.cgi?id=197915#c9 is a joke:
>>
>> > Julian Seward 2010-07-12 15:58:25 UTC
>> > As per comment #0, adding support for 80-bit floats is low priority,
>> > because (1) AIUI the majority of floating point code is portable
>> > and restricts itself to 64-bit values,
>>
>> The majority of _consumer_ software uses double (aka 64-bit float),
>> but the majority of _scientific_ software (for example the whole NIH
>> bioinformatics software stack or 99.9% of CERN's simulation software)
>> relies on long double, i.e. 80-bit or 128-bit floats (depending on
>> platform; AMD64 uses 80 bits). valgrind is useless for such software.
>
> I am not too sure about the proportion of consumer software versus
> scientific software. Assuming there is more consumer software, the
> note (1) above is not such a joke at the end :).
>
> Reading the gcc manual, wouldn't it be a good idea to have the
> scientific software rewritten (or at least compilable) so as to use
> sse? The gcc manual says for `sse':
>     The resulting code should be considerably faster in the
>     majority of cases and avoid the numerical instability

What does 'numerical instability' mean? Unless 387 has bugs I don't know
about, I think this refers to the general limitations of floating point
being implemented with a fixed-width datatype - limitations which
wider-than-64-bit datatypes can partially work around (at the very least,
the amount of work needed to compensate with algorithms like Kahan
summation is greatly reduced: a Kahan variant for 64-bit floating point
costs roughly 370% overhead to compensate, while an 80-bit floating-point
datatype reduces this to less than 120%).

Remember that SSE, together with MMX, falls into the category of
multimedia accelerators (i.e. designed for throughput and low-precision
calculations); it is not designed for high-precision calculations. This
is not an option. Otherwise you have to compensate with the Kahan
summation algorithm or similar, which reduces the performance win to
absurdity.

> problems of 387 code, but may break some existing code that
> expects temporaries to be 80bit.

It's not only the temporaries which have to be 80-bit. I think Lionel is
referring to the fact that scientific software usually requires the
maximum precision available. 64-bit double usually will NOT work, at
least not without extra work.

> This is the default choice for the x86-64 compiler.
>
> And as a bonus, you can run it under Valgrind on x86/amd64 :).
> If this scientific code is fully portable,

Where does the claim come from that this is more portable? If the
platform doesn't have a long double which is wider than double, that is
very unfortunate and requires a LOT of extra calculations; but if the
platform (like AMD64) supports a wider datatype, then of course it is
used at all costs.

> then you could also decide
> to run it under Valgrind e.g. on ppc32/ppc64 systems.
>
> Note: at my work, we are using Ada/gnat/gcc on x86/amd64.
> The application code is compiled with sse.
> We contemplated recompiling and/or changing the Ada runtime to use
> sse only and fully avoid the 80 bits. However, as we found very
> little impact (at least for our apps) of running the 80-bit runtime
> on Valgrind, we have kept the default gnat runtime (which uses
> 80-bit floats here and there). As long as these 80-bit computations
> are "ok" when the computation is in reality done with 64-bit floats,
> there is not much impact. YMMV.

Uh. The impact here is DATA CORRUPTION, caused by calculations meant to
use 80-bit datatypes being done with 64-bit datatypes and the remaining
bits filled with garbage.

If this can't be solved, or is not going to be solved, then valgrind
should ABORT instead of causing a silent corruption of data.

Irek |
|
From: <pa...@fr...> - 2013-03-06 09:19:50
|
----- Original Message -----

[snip]

I'll start with a few general comments. I've been working on engineering software for most of my career, and with very few exceptions, double precision is perfectly adequate (and the fastest).

The real world (at least, the Intel/AMD part of it) is somewhat complicated by there being two floating point units, the venerable x87 and SSE. By default, 32-bit compiles use x87 and 64-bit compiles use SSE. SSE is fairly straightforward: if you use doubles, all calculations are done at 64-bit precision. x87 uses 80 bits for internal calculations and converts the final result to 64 bits. GCC does have an option, -ffloat-store, that inhibits this behaviour (and decent performance!). But there is more. You can compile in mixed x87/sse mode (though I've found this to be somewhat error-prone if you also play with the floating point control and status flags). Lastly, if you do compile with sse on 32-bit, by default you will link with the x87-based libm part of the standard C library. Again, GCC has an option to link with an sse version of libm, -msselibm.

What is the upshot of all this? If you compile on both 32-bit and 64-bit platforms without changing the default options, you will get different results.

Now, let Valgrind enter the picture. All calculations are done at 64 bits, so there should be little or no change on 64-bit platforms using sse. However, 32-bit platforms using x87 will probably change to give the same results obtained with 64-bit and sse.

> Uh. The impact here is DATA CORRUPTION, caused by calculations meant
> to use 80-bit datatypes being done with 64-bit datatypes and the
> remaining bits filled with garbage.
>
> If this can't be solved or is not going to be solved then valgrind
> should ABORT instead of causing a silent corruption of data.

Is this really true? I would expect that Valgrind does all calculations at 64-bit double precision and converts. This will result in truncation and/or under/overflow.

My feeling is that unless the usually small numerical differences change your control flow, you should just ignore the differences. The aim of testing with Valgrind isn't to validate numerical results; it is to validate memory use or performance or threading.

A+
Paul |
|
From: Irek S. <isz...@gm...> - 2013-03-06 09:50:40
|
On Wed, Mar 6, 2013 at 10:19 AM, <pa...@fr...> wrote:
> ----- Original Message -----
>
> [snip]
>
> I'll start with a few general comments. I've been working on
> engineering software for most of my career, and with very few
> exceptions, double precision is perfectly adequate (and the fastest).
>
> The real world (at least, the Intel/AMD part of it) is somewhat
> complicated by there being two floating point units, the venerable x87
> and SSE. By default, 32-bit compiles use x87 and 64-bit compiles use
> SSE. SSE is fairly straightforward: if you use doubles, all
> calculations are done at 64-bit precision. x87 uses 80 bits for
> internal calculations and converts the final result to 64 bits. GCC
> does have an option, -ffloat-store, that inhibits this behaviour (and
> decent performance!). But there is more. You can compile in mixed
> x87/sse mode (though I've found this to be somewhat error-prone if you
> also play with the floating point control and status flags). Lastly,
> if you do compile with sse on 32-bit, by default you will link with
> the x87-based libm part of the standard C library. Again, GCC has an
> option to link with an sse version of libm, -msselibm.
>
> What is the upshot of all this? If you compile on both 32-bit and
> 64-bit platforms without changing the default options, you will get
> different results.
>
> Now, let Valgrind enter the picture. All calculations are done at 64
> bits, so there should be little or no change on 64-bit platforms using
> sse. However, 32-bit platforms using x87 will probably change to give
> the same results obtained with 64-bit and sse.
>
>> Uh. The impact here is DATA CORRUPTION, caused by calculations meant
>> to use 80-bit datatypes being done with 64-bit datatypes and the
>> remaining bits filled with garbage.
>>
>> If this can't be solved or is not going to be solved then valgrind
>> should ABORT instead of causing a silent corruption of data.
>
> Is this really true? I would expect that Valgrind does all
> calculations at 64-bit double precision and converts. This will result
> in truncation and/or under/overflow.
>
> My feeling is that unless the usually small numerical differences
> change your control flow, then just ignore the differences. The aim of
> testing with Valgrind isn't to validate numerical results, it is to
> validate memory use or performance or threading.

Unfortunately the assessment that there are only "small changes" or
"unimportant differences" is plain wrong. In complex calculations (i.e.
not one-line "hello sin() world" demo apps) which rely on the wider
datatypes, the lack of precision will cause major malfunctions and
render the applications defunct. There are enough examples of this kind
of problem (and enough complaints about valgrind on the NIH lists).

Irek |
|
From: John R. <jr...@bi...> - 2013-03-06 15:50:20
|
On 03/06/2013, Irek Szczesniak wrote:
> Unfortunately the assessment that there are only "small changes" or
> "unimportant differences" is plain wrong. In complex calculations
> (i.e. not one-line "hello sin() world" demo apps) which rely on the
> wider datatypes, the lack of precision will cause major malfunctions
> and render the applications defunct. There are enough examples of
> this kind of problem (and enough complaints about valgrind on the
> NIH lists).

Discard any so-called "high-precision scientific software" which does
not verify experimentally at run time the actual precision used for
operations and operands. Such software is junk. Who would run an
experiment without controls?

Adequate software will diagnose the situation, emit a message which
gives both the expected and the actually-encountered precision, and then
exit without attempting computation in a degraded environment. Better
software will perform the diagnosis, emit the message, and offer options
such as: quit, proceed "full speed" at lower precision, or proceed but
use the available precision to "emulate" higher precision.

Now about ranting: please identify "the NIH lists". [valgrind-users] and
[valgrind-developers] are world-wide forums. A reasonable guess is the
National Institutes of Health of the United States of America,
http://www.nih.gov, but finding "the NIH lists" requires further
digging. [Lacking a citation, one can also read NIH as "Not Invented
Here."]

While I have encountered cases where more than 53 bits of precision
truly are required, many, many projects use wider precision as a crutch
to avoid actual numerical analysis. Some of those projects still get
junk results without knowing it. Do the numerical analysis!

I too would like memcheck to support 80-bit and 128-bit floating point.
However, there are competing products which do so already. Purify from
Rational Software is one; Insure++ from Parasoft may be another. Yes,
they cost money; but a license can cost less than the burdened cost of
one laboratory assistant for one month. So, buy a license, and if
necessary share it with co-workers or neighboring projects.

-- |
|
From: <pa...@fr...> - 2013-03-06 17:10:29
|
----- Original Message -----
> Unfortunately the assessment that there are only "small changes" or
> "unimportant differences" is plain wrong. In complex calculations
> (i.e. not one-line "hello sin() world" demo apps) which rely on the
> wider datatypes, the lack of precision will cause major malfunctions
> and render the applications defunct. There are enough examples of
> this kind of problem (and enough complaints about valgrind on the
> NIH lists).

Hi

There are two issues here: emulation of long double, and exact emulation
of x87 double. Adding the first of those to Valgrind seems reasonable to
me, but I would be most put out if Valgrind were made much slower by
adding exact x87 emulation.

Other than the 2 cases that you cite, can you name any other major uses
of long double? As a somewhat non-scientific measure, I used google to
search for 'double C++ -"long double"' and it had about 13 million hits
[that's a search for double and C++ excluding long double]. I then
searched for '"long double" C++' and had about 300 thousand hits. I for
one do not accept your assertion that most scientific and engineering
software is written using long double precision.

And to be a bit more precise:
64-bit double has 53 bits of precision, or about 1.1e-16 ULP.
80-bit floating point has 64 bits, or about 5.4e-20 ULP.

I've run many thousands of electronic circuit simulations (hardly a
"demo app"; there is a small amount of code using long double) under
Valgrind, and my observation is that over 90% have no detectable
difference, around 10% have insignificant differences, and a handful
have major differences.

A+
Paul |
|
From: Julian S. <js...@ac...> - 2013-03-06 18:02:22
|
>> Unfortunately the assessment that there are only "small changes" or
>> "unimportant differences" is plain wrong. [...]

The 64-bit approximation was done to make the initial VEX implementation
of x87 floating point easier. Simulating x87 FP even without having to
do 80-bit arithmetic was a major PITA, and the 64-bit approximation
scheme seemed a reasonable tradeoff for portable software.

It would be possible, at some effort, to do proper 80-bit arithmetic.
What it would take is a few new IR primops (Iop_AddF80 etc), hacking the
front ends to produce those, and having the instruction selectors
generate the right code. Oh, and producing a bunch of test cases to
verify it all.

Although the inner standards-compliant engineer in me is more
sympathetic to the current implementation's bias in favour of portable
(non-80-bit) code, I can see that having x87 FP be 80-bit precise would
sometimes be helpful.

Even then it's not simple. The x87 control word has a bit-pair that
controls the default x87 FP precision. Currently V ignores all attempts
to change it, and just does its thing at 64 bits. If 80-bit arithmetic
becomes supported, and V ignores attempts to change the precision away
from 80 bits, will we have people complaining that the default precision
can't be changed away from 80 bits?

J |
|
From: Dallman, J. <joh...@si...> - 2013-03-07 10:03:48
|
Julian Seward [mailto:js...@ac...] wrote:
> Even then it's not simple. The x87 control word has a bit-pair that
> controls the default x87 FP precision. Currently V ignores all
> attempts to change it, and just does its thing at 64 bits. If 80-bit
> arithmetic becomes supported, and V ignores attempts to change the
> precision away from 80 bits, will we have people complaining that the
> default precision can't be changed away from 80 bits?

Yes. Loudly. If x87 is stuck at 80-bit evaluation, Valgrind will become
significantly less useful to me. The reasons go like this:

The x87 hardware powers up at its 80-bit evaluation setting, and Linux
does not change this. This means that arithmetic done with ordinary
doubles, not long doubles, in code that is trying quite hard to be
portable, is evaluated to 80-bit precision while it stays in x87
registers, but rounded to 64-bit when saved back into memory. In any
code in C or any higher-level language that's the slightest bit
complicated, this rounding happens at essentially random places, subject
to change whenever the code is changed. Coding around this isn't really
practical if you have lots of floating-point code.

This means that if I leave the x87 set to 80-bit evaluation, I get quite
a lot of differences from platforms with normal 64-bit doubles. Saying
"they're more correct" doesn't help, because the random rounding means
that usually isn't true. If I set the x87 to 64-bit evaluation, then
Linux produces very similar results to other platforms, and that matters
a lot. So Valgrind's current approach matches what I need.

But if x87 becomes always 80-bit, with no way to change back to 64-bit,
then I'm going to have a problem. Because of the arithmetic differences,
evaluation will sometimes take different paths, which means different
memory allocations, and so on. Valgrind execution becomes a lot less
similar to real use, which makes memcheck testing significantly less
useful.
-- John Dallman ----------------- Siemens Industry Software Limited is a limited company registered in England and Wales. Registered number: 3476850. Registered office: Faraday House, Sir William Siemens Square, Frimley, Surrey, GU16 8QD. |
|
From: Roland M. <rol...@nr...> - 2013-03-06 16:43:05
|
On Wed, Mar 6, 2013 at 10:19 AM, <pa...@fr...> wrote:
[snip]
> Is this really true? I would expect that Valgrind does all calculations at 64bit double precision and converts. This will result in truncation and/or under/overflow.
>
> My feeling is that unless the usually small numerical differences change your control flow, then just ignore the differences. The aim of testing with Valgrind isn't to validate numerical results, it is to validate memory use or performance or threading.
Erm... the problem is that this can cause major issues in numerical
simulations... ;-((
Below is a small example (yes yes... I know it's not 100% portable...
it should at least use |LDBL_DECIMAL_DIG|... but I'm not going to
twist my mind over that unless there is demand for it) for AMD64/64-bit:
-- snip --
#include <math.h>
#include <limits.h>
#include <stdlib.h>
#include <stdio.h>
int main(int ac, char *av[])
{
    long double i;
    unsigned long long numiterations = 0ULL;

    puts("# start.");
    for (i = 1.L; i < 1.00000000001L; i = nextafterl(i, 5.L))
    {
        numiterations++;
    }
    printf("# done (after %llu/0x%llx iterations).\n",
           numiterations, numiterations);
    return EXIT_SUCCESS;
}
-- snip --
(this is more or less reduced from real-world simulation code which
uses |nextafterl()| & co. to iterate over some variations of the input
values to weed out stuff like singularities etc.)
The expected output looks like this:
-- snip --
# start.
# done (after 92233720/0x57f5ff8 iterations)
-- snip --
On valgrind it just hangs (e.g. if you pass a |double| cast to
|long double| into |nextafterl()|'s first argument and then cast the
|long double| result back to |double|, the result is usually identical
to the input value... as a result the algorithm is stuck in an endless
loop).
----
Bye,
Roland
--
__ . . __
(o.\ \/ /.o) rol...@nr...
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
|
|
From: John R. <jr...@bi...> - 2013-03-06 17:27:21
|
> long double i;
>
> for(i=1.L ; i < 1.00000000001L ; i=nextafterl(i, 5.L))
>
> On valgrind it just hangs (e.g. if you pass a |double| cast to
> |long double| into |nextafterl()|'s first argument and then cast the
> |long double| result back to |double|, the result is usually identical
> to the input value... as a result the algorithm is stuck in an endless
> loop).

Noticing the unexpected i == nextafter(i, ...) is one of the checks
performed by higher-quality software when verifying the execution
environment before launching into lengthy computations that depend on
that environment.

-- |
|
From: Roland M. <rol...@nr...> - 2013-03-06 17:36:14
|
On Wed, Mar 6, 2013 at 6:28 PM, John Reiser <jr...@bi...> wrote:
>> long double i;
>
>> for(i=1.L ; i < 1.00000000001L ; i=nextafterl(i, 5.L))
>
>> On valgrind it just hangs (e.g. if you pass in a |double| casted to
>> |long double| into |nextafterl()|'s first argument and then cast the
>> |long double| result to |double| then the result is usually identical
>> to the input value... as result the algorithm is stuck in an endless
>> loop).
>
> Noticing the unexpected i == nextafter(i, ...) is one of the checks
> performed by higher-quality software when verifying the execution environment
> before launching into lengthy computations that depend on that environment.
Huh? My example never used "==" anywhere (first guess is that the
mail body encoding used "==" to represent a single '=' character)...
and using "==" to compare a floating-point value against another
value is likely not going to generate the results you may expect...
... which is more or less a deja-vu, since we had that topic on the
AT&T AST/ksh93 development list:
---------- Forwarded message ----------
From: Roland Mainz <rol...@nr...>
Date: Fri, Feb 8, 2013 at 12:48 PM
Subject: [ast-users] Floating-point arithmetic, rounding and testing
for equality... / was: Re: Fwd: Floating point oddities with pow()?
Accuracy problem?
Cc: ast-users <ast...@re...>, ast...@re...
[snip]
> What we're facing here seems that different implementations of a libm
> function may result different results at the last digits, right? Is
> there a function to test how accurate the libm functions are, and is
> there a libm function which tests for equality in a specified
> precision?
Testing whether two _floating-point_ values are equal is a tricky
business (e.g. the C99/ksh93 operator "==" is usually of little use...
;-( ) because of rounding errors or (AFAIK in your case, when the
i387 version of IEEE 754-2008 floating-point values is used) extra
precision, caused by the issue that |long double| is 80-bit on x86/AMD64
but some machine instructions have extra bits available during
calculation (like the new FMA instructions vs. a "manual"
multiply-accumulate operation... or |pow()| vs. a manual "power of"...)
or the registers have more bits than they store in memory.
Another issue may be (or often is) the base10<---->base2 conversion,
which causes rounding errors when strings are converted to IEEE
754-2008 floating-point values and back. As a "fix", C99 ([1]) added the
printf "%a" format to represent floating-point values in a hexadecimal
floating-point representation, which can be used (with your example) to
"visualise" the issue that the difference is usually off by one or two
bits at the trailing end.
Keeping the email short (due to lack of time... I may try a more
detailed answer when I have time)... the standards only define
|isgreater()|, |isgreaterequal()|, |islessequal()|, |islessgreater()|,
|isunordered()| but intentionally no |isequal()|, because the same
operation using different implementations is usually unlikely to
produce _exactly_ the same result.
A quick&&dirty implementation for a |isequal()| (maybe a better term
may be |isnearequal()|) may look like this (ksh93 syntax):
-- snip --
# test whether values in variables "a" and "b" are
# "equal" with a precision of 0.000001
if (( fabs(a - b) < 0.000001 )) ; then
-- snip --
(yes yes... I'm using the wrong mathematical terms (e.g. "precision" ?
"resolution" ? "distance" ?) in this case... please correct me... ;-(
)
[1]=ksh93 supports C99 "hexfloat" via its printf builtin (note that
you have to use $ float varname ; ... ; printf "%a\n" varname # and
NOT $ float varname ; ... ; printf "%a\n" ${varname} # because
${varname} means that the internal IEEE 754-2008 floating-point value
is converted to a base10 floating-point string first and THEN
passed to the printf builtin utility) and via $ typeset -lX varname #
----
Bye,
Roland
--
__ . . __
(o.\ \/ /.o) rol...@nr...
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
_______________________________________________
ast-users mailing list
ast...@li...
http://lists.research.att.com/mailman/listinfo/ast-users
--
__ . . __
(o.\ \/ /.o) rol...@nr...
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
|