From: Daniel J Sebald <daniel.sebald@ie...>  2007-03-29 22:41:07

I've placed a patch on SourceForge that fixes a pathological example of range setting involving NaN. Yes, I know, no big deal. But I've changed things to remove a routine about which Hans put forth the following question:

/* {{{ dbl_raise() used by quantize_normal_tics */
/* FIXME HBB 20000426: is this really useful? */
static double dbl_raise(double x, int y)

The issue in dbl_raise is a multiplying loop for raising a value to an integer power. I'm guessing the library function pow() does such a thing more efficiently, in addition to handling the pathological NaN. So, the following mimics the behavior of dbl_raise():

exponent = floor(log(arg)/log(12.0));
power = floor(pow(12.0, abs(exponent)) + 0.5);
if (exponent < 0)
    power = 1.0 / power;
/* approx number of "duodecades" */
xnorm = arg / power;

The line of logic is that we know 10^E, where E is a positive integer, is an integer value; therefore rounding to the nearest integer should remove any arithmetic precision error that pow() might introduce. That fixes the original issue I had in mind. And I've generated all.dem output in PostScript format. Everything agrees, except the dates and times inside the PS files.

In some way it also removes the question that Hans raised in the code. I'm sure you can't recall that far back, Hans, but a question I have is just exactly what was this approach supposed to achieve? I'm assuming it had something to do with nice tic range generation. But at the same time I wonder if there isn't a more straightforward computation. One thing I wonder about is the following. Say I modify the above code to be:

/* order of magnitude of argument: */
exponent = floor(log(arg)/log(12.0));
power = floor(pow(12.0, abs(exponent)) + 0.5);
/* approx number of "duodecades" */
if (exponent < 0)
    xnorm = arg * power;
else
    xnorm = arg / power;

Wouldn't you think that this is a better formulation in terms of precision? That is, we've a direct division in one case, and avoid a double division in the other case.
But if I run this version of gnuplot on all.dem, the result is differences other than just the times and dates. So which is the preferred result formulation?

Dan
From: Daniel J Sebald <daniel.sebald@ie...>  2007-03-29 22:48:35

> But if I run this version of gnuplot on all.dem, the result is
> differences other than just the times and dates. So which is the
> preferred result formulation?

Hold on a bit, I may have overlooked a small change...
From: Daniel J Sebald <daniel.sebald@ie...>  2007-03-29 22:59:52

Daniel J Sebald wrote:
>> But if I run this version of gnuplot on all.dem, the result is
>> differences other than just the times and dates. So which is the
>> preferred result formulation?
>
> Hold on a bit, I may have overlooked a small change...

Yeah, changing the return value to

if (exponent < 0)
    return (tics / power);
else
    return (tics * power);

results in the same exact PostScript results for all.dem. So there apparently is no consequence to

if (exponent < 0)
    power = 1 / power;

in terms of precision.

Dan
From: Hans-Bernhard Bröker <broeker@t...>  2007-03-31 11:32:15

Daniel J Sebald wrote:
> The issue in dbl_raise is a multiplying loop for raising a value to an
> integer power. I'm guessing the library function pow() does such a thing
> more efficiently in addition to handling the pathological NaN.

Maybe so, but at the same time, there's a solid probability that it is just too inaccurate to work. Tics generation is extremely sensitive to rounding error, mainly in extreme cases: very large or small arguments, arguments already slightly disturbed by earlier rounding errors. A while ago we had a long-standing bug where tic generation was broken just by turning on -O2 in GCC, but only on some platforms. Suffice it to say that this is a kind of problem I'd rather not have to face again.

> In some way it also removes the question that Hans raised in the code.
> I'm sure you can't recall that far back, Hans, but a question I have is
> just exactly what was this approach supposed to achieve?

Basically, it's that pow() isn't required to treat integer exponents specially, so it can easily destroy more precision than dbl_raise() in its existing form does.

> One thing I wonder about is the following. Say I modify the above code to be:
[...]
> Wouldn't you think that this is a better formulation in terms of precision?
> That is, we've a direct division in one case, and avoid a double division
> in another case. But if I run this version of gnuplot on all.dem, the
> result is differences other than just the times and dates.

Well, you've just proved that it's not a better formulation, haven't you?
From: Daniel J Sebald <daniel.sebald@ie...>  2007-03-31 20:55:07

Hans-Bernhard Bröker wrote:
> Daniel J Sebald wrote:
>
>> The issue in dbl_raise is a multiplying loop for raising a value to an
>> integer power. I'm guessing the library function pow() does such a thing
>> more efficiently in addition to handling the pathological NaN.
>
> Maybe so, but at the same time, there's a solid probability that it is
> just too inaccurate to work.
>
> Tics generation is extremely sensitive to rounding error, mainly in
> extreme cases: very large or small arguments, arguments already slightly
> disturbed by earlier rounding errors.
>
> A while ago we had a long-standing bug where tic generation was broken
> just by turning on -O2 in GCC, but only on some platforms. Suffice it
> to say that this is a kind of problem I'd rather not have to face again.
>
>> In some way it also removes the question that Hans raised in the code.
>> I'm sure you can't recall that far back, Hans, but a question I have is
>> just exactly what was this approach supposed to achieve?
>
> Basically, it's that pow() isn't required to treat integer exponents
> specially, so it can easily destroy more precision than dbl_raise() in
> its existing form does.

Right. But the premise, I guess, is that for the application in question we're interested in raising an integer to an integer power, which results in an integer. So, we are free to round the pow() result to an integer and can feel fairly certain the result is the same as dbl_raise().

Your statement may be true for relatively small numbers. To confirm that for larger numbers, one would have to look at the pow() code. I mean, the way gnuplot is programmed, what if someone enters numbers on the order of 1e30? What's the precision of 10*10*...*10*10 as a floating point number? It's beyond the largest value the mantissa can hold, so we know some precision is likely lost since the underlying math is binary floating point.
Also, there may even be something about the existing code that is questionable, though probably the log10() function behaves nicely in this regard. This bit:

floor(log10(arg))

We aren't guaranteed that will come out to be exactly the exponent desired. Say the argument is 1e15, for which we'd expect the resulting value to be 15. But if, by chance, because of the way the particular log10() function is implemented in a library, it comes out to 14.9999992, the floor() would produce the incorrect result. We can't round in this case.

This kind of precision issue probably works best if the number system is binary float. Then we could use frexp() and ldexp() and get the exponent directly and do highly accurate tests and checks. But I don't see any creative way of changing the problem into that realm. Maybe the best we could hope for is some kind of internal consistency between log10() and pow(), by which I mean it would be nice that if

E = floor(log10(arg))

then

pow(10,E) <= arg < pow(10,E+1)

is always true. For what it is worth, this extra test:

exponent = floor(log10(arg));
if (pow(10,exponent+1) <= arg)
    exponent++;

does nothing to change the PostScript results.

Dan
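The frexp() remark above can be illustrated with a small C sketch (hypothetical, not proposed gnuplot code): in base 2 the exponent comes back exactly, with none of the log10()/floor() rounding hazard.

```c
#include <math.h>
#include <assert.h>

/* In binary, frexp() hands back the exponent exactly: arg = m * 2^e
 * with 0.5 <= |m| < 1.  No such primitive exists for base 10, which
 * is why the log10()/floor() route is stuck with rounding hazards. */
static int binary_exponent(double arg)
{
    int e;

    frexp(arg, &e);
    return e - 1;    /* exponent of the leading bit: 2^(e-1) <= arg < 2^e */
}
```

For example, binary_exponent(8.0) is exactly 3, with no possibility of a 2.9999992-style result.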
From: Hans-Bernhard Bröker <broeker@t...>  2007-03-31 21:42:46

Daniel J Sebald wrote:
> Hans-Bernhard Bröker wrote:
>> Daniel J Sebald wrote:
>>> In some way it also removes the question that Hans raised in the
>>> code. I'm sure you can't recall that far back, Hans, but a question
>>> I have is just exactly what was this approach supposed to achieve?
>> Basically, it's that pow() isn't required to treat integer exponents
>> specially, so it can easily destroy more precision than dbl_raise() in
>> its existing form does.
> Right. But the premise, I guess, is that for the application in
> question we're interested in raising an integer to an integer power,
> which results in an integer.

No. We're interested in raising a double to an integer power, which results in a double. For simple cases, the input and output doubles will have integer values, but they're not integers by design. If we only had to worry about cases where the power is small enough for the result to be integer, a table of all 10 powers-of-ten from 10^0 to 10^9 would suffice. But that's not the case.

> Your statement may be true for relatively small numbers. To confirm
> that for larger numbers, one would have to look at the pow() code. I
> mean, the way gnuplot is programmed, what if someone enters numbers on
> the order of 1e30? What's the precision of 10*10*...*10*10 as a floating
> point number?

As good as it can be. It's still likely to be better than that of the typical naive implementation of pow(x,y): exp(y*log(x))

For the fun of it, be sure to make some plots of this function:

powdiff(base,power) = (base**power / exp(power*log(base))) - 1.0

E.g.

set samples 301 ; plot [0:60] powdiff(10,x)

to see just how badly wrong this can go.

> floor(log10(arg))
>
> We aren't guaranteed that will come out to be exactly the exponent
> desired.

No, we're not. Which is why this result is used only as a guide, not as the single piece of information, by quantize_tics. There's a reason that there are cases of that switch outside the expected range of [2:20].
> Maybe the best we could hope for is some kind of internal consistency
> between log10() and pow(), by which I mean it would be nice that if
>
> E = floor(log10(arg))
>
> then
>
> pow(10,E) <= arg < pow(10,E+1)
>
> is always true.

No, the best we can do is avoid having to rely on such assumptions. The code in quantize_tics() does that.
From: Daniel J Sebald <daniel.sebald@ie...>  2007-03-31 22:44:52

Hans-Bernhard Bröker wrote:
> Daniel J Sebald wrote:
>> Hans-Bernhard Bröker wrote:
>>> Daniel J Sebald wrote:
>>>> In some way it also removes the question that Hans raised in the
>>>> code. I'm sure you can't recall that far back, Hans, but a question
>>>> I have is just exactly what was this approach supposed to achieve?
>>> Basically, it's that pow() isn't required to treat integer exponents
>>> specially, so it can easily destroy more precision than dbl_raise()
>>> in its existing form does.
>> Right. But the premise, I guess, is that for the application in
>> question we're interested in raising an integer to an integer power,
>> which results in an integer.
>
> No. We're interested in raising a double to an integer power, which
> results in a double. For simple cases, the input and output doubles
> will have integer values, but they're not integers by design.

This equation

double power = dbl_raise(10.0, floor(log10(arg)));

from a purely mathematical standpoint, is raising an integer, 10, to an integer, floor(), power. Generally, that's a rational number. Now, looking more closely at dbl_raise(x,y), we have

int i = abs(y);

An integer raised to a positive integer power is an integer. (I forgot to clarify it is a positive integer power last time; maybe that's where the confusion is.) We are free to round the value and remove any finite math effects.

> If we only had to worry about cases where the power is small enough for
> the result to be integer, a table of all 10 powers-of-ten from 10^0 to
> 10^9 would suffice. But that's not the case.
[snip]
> As good as it can be. It's still likely to be better than that of the
> typical naive implementation of pow(x,y):

If the ultimate result can't be represented with a binary float, that's a different matter. But I'm saying that isn't unique to pow() because neither can

10*10*10*10*10*10*10*10*10*10*10*10*10*10*10*10*10*10*10*10*10*10*10*10*10

be contained in binary float.
It reaches a point along the way in the multiplication loop where the ability to represent the large integer is gone, and each multiplication results in poorer and poorer resolution relative to the increasing product. Who knows? Perhaps the pow() function has better numerical behavior and ends up being more accurate in some circumstances.

> For the fun of it, be sure to make some plots of this function:
>
> powdiff(base,power) = (base**power / exp(power*log(base))) - 1.0
>
> E.g.
>
> set samples 301 ; plot [0:60] powdiff(10,x)
>
> to see just how badly wrong this can go.
>
>> floor(log10(arg))

Interesting plot, and that is the issue at hand.

>> We aren't guaranteed that will come out to be exactly the exponent
>> desired.
>
> No, we're not. Which is why this result is used only as a guide, not as
> the single piece of information, by quantize_tics. There's a reason
> that there are cases of that switch outside the expected range of [2:20].
>
>> Maybe the best we could hope for is some kind of internal consistency
>> between log10() and pow(), by which I mean it would be nice that if
>>
>> E = floor(log10(arg))
>>
>> then
>>
>> pow(10,E) <= arg < pow(10,E+1)
>>
>> is always true.
>
> No, the best we can do is avoid having to rely on such assumptions. The
> code in quantize_tics() does that.

You're looking at this in one level broader scope than I am. I'm just focused on getting xnorm to be accurate given whatever the value of arg is, i.e., the correct value of the exponent E rather than possibly being off by one... which is a little subjective in itself, meaning that E being one less than the mathematically correct value won't result in an overly bad number of tics anyway.

Dan
From: Daniel J Sebald <daniel.sebald@ie...>  2007-04-07 22:58:31

Move the patch into CVS or dump it? (Discussing things and not resolving them, even for as trivial a patch as this, isn't productive.)

Dan

Daniel J Sebald wrote:
> [...]

--
Dan Sebald
phone: 608 256 7718
email: daniel DOT sebald AT ieee DOT org
URL: http://webpages DOT charter DOT net/dsebald/
From: Ethan A Merritt <merritt@u.washington.edu>  2007-04-08 02:06:52

On Saturday 07 April 2007 15:58, Daniel J Sebald wrote:
> Move the patch into CVS or dump it? (Discussing things and not resolving
> them, even for as trivial a patch as this, isn't productive.)

Was there a real problem that this was supposed to solve?
If not, drop it.

	Ethan

> [...]

--
Ethan A Merritt
Biomolecular Structure Center
University of Washington, Seattle 98195-7742
From: Daniel J Sebald <daniel.sebald@ie...>  2007-04-08 04:28:41

Ethan A Merritt wrote:
> On Saturday 07 April 2007 15:58, Daniel J Sebald wrote:
>> Move the patch into CVS or dump it? (Discussing things and not resolving
>> them, even for as trivial a patch as this, isn't productive.)
>
> Was there a real problem that this was supposed to solve?
> If not, drop it.

Recap: The patch mimics dbl_raise() behavior, but uses an existing library function pow() that is probably more efficient than a multiplicative loop, and it handles the NaN value for which gnuplot appears to hang for five minutes. I.e., slightly cleaner code, and it handles NaN.

Feel free to close the patch, but at the same time please remove

/* FIXME HBB 20000426: is this really useful? */

which only invites people to investigate by suggesting there is a problem.

Dan
From: Daniel J Sebald <daniel.sebald@ie...>  2007-04-08 08:06:34

Ethan A Merritt wrote:
> On Saturday 07 April 2007 21:28, you wrote:
>> Ethan A Merritt wrote:
>>> On Saturday 07 April 2007 15:58, Daniel J Sebald wrote:
>>>> Move the patch into CVS or dump it? (Discussing things and not
>>>> resolving them, even for as trivial a patch as this, isn't productive.)
>>>
>>> Was there a real problem that this was supposed to solve?
>>> If not, drop it.
>>
>> Recap: The patch mimics dbl_raise() behavior, but uses an existing
>> library function pow() that is probably more efficient
>
> I was not paying close attention, but didn't HBB show that your
> proposed "more efficient" alternative function could be off by
> 15 orders of magnitude in the worst case?

Well, I'm not sure. This might be an apples and oranges kind of thing. Here is the script HBB suggested:

powdiff(base,power) = (base**power / exp(power*log(base))) - 1.0
set samples 301 ; plot [0:60] powdiff(10,x)

Note that this sampling contains both "integers" and "floats". What we are ultimately interested in for our use of pow(,) or dbl_raise() is raising an integer value (speaking math-wise, not computer-wise) to an integer power (again, math-wise), but that is getting off course. Let's pause to look at the behavior of gnuplot on some simple examples, say 10^10:

gnuplot> print 10**10
1410065408
gnuplot> print 10**10.0
10000000000.0
gnuplot> print 10.0**10
10000000000.0
gnuplot> print exp(10*log(10))
10000000000.0
gnuplot> pow(base,power) = base**power
gnuplot> print pow(10,10)
1410065408
gnuplot> print pow(10,10.0)
10000000000.0

OK, so we see that gnuplot treats integers (computer-wise) with integer math, which is much more limiting in size compared to floating point math. But this is not how the innards of the pow(), floor(), etc. functions work:

#include <math.h>
double floor( double arg );

The documentation of these C functions might say "largest integer not greater than arg", but that means the math vernacular, not computer vernacular. That is why I say apples and oranges.
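As an aside, the 1410065408 printed above is just 32-bit integer wraparound: 10^10 mod 2^32. A hypothetical C illustration (assuming a 32-bit int build, which is what the printed value matches):

```c
#include <stdint.h>
#include <assert.h>

/* Integer exponentiation with 32-bit wraparound, mimicking what a
 * 32-bit-int gnuplot build does for integer ** integer. */
static int32_t ipow32(int32_t base, int n)
{
    uint32_t v = 1;

    while (n-- > 0)
        v *= (uint32_t) base;    /* unsigned multiply: well-defined wrap */
    return (int32_t) v;
}
```

10^9 still fits in a signed 32-bit int, but 10^10 wraps: 10000000000 - 2*4294967296 = 1410065408, the value gnuplot printed.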
Given gnuplot's behavior, can we come up with a comparison at the command line that reflects the behavior of the C routines? I've followed the parsing to internal.c and see that the ** operator does in fact use the C library pow(,) function. So we've got that. The expression exp(power*log(base)) loses accuracy, but that isn't really related to the use of pow() I've applied in the range application. So let's not look at that any further. Try this:

powint(base,power) = power==0 ? 1.0 : base * powint(base,power - 1)
powdiff(base,power) = (base**power / powint(base,power)) - 1.0
set samples 61 ; plot [0:60] powdiff(10,x)

You may see something different than what I'm seeing, but what I see is that near x=25 there appears to be some discrepancy. So let's pick a few points there:

gnuplot> print 10.0**23 - powint(10.0,23)
0.0
gnuplot> print 10.0**24 - powint(10.0,24)
0.0
gnuplot> print 10.0**25 - powint(10.0,25)
2147483648.0
gnuplot> print 10.0**26 - powint(10.0,26)
17179869184.0
gnuplot> print 10.0**27 - powint(10.0,27)
137438953472.0
gnuplot> print 10.0**28 - powint(10.0,28)
0.0
gnuplot> print 10.0**29 - powint(10.0,29)
0.0

So, something does become flaky here. Let's see if we can figure out where exactly the discrepancy is:

gnuplot> print 10.0**23 - 1e23
0.0
gnuplot> print 10.0**24 - 1e24
0.0
gnuplot> print 10.0**25 - 1e25
0.0
gnuplot> print 10.0**26 - 1e26
0.0
gnuplot> print 10.0**27 - 1e27
0.0
gnuplot> print 10.0**28 - 1e28
0.0
gnuplot> print powint(10.0,23.0) - 1e23
0.0
gnuplot> print powint(10.0,24.0) - 1e24
0.0
gnuplot> print powint(10.0,25.0) - 1e25
-2147483648.0
gnuplot> print powint(10.0,26.0) - 1e26
-17179869184.0
gnuplot> print powint(10.0,27.0) - 1e27
-137438953472.0
gnuplot> print powint(10.0,28.0) - 1e28
0.0

Now that's odd. It would seem that the power function is correct and multiplying isn't.
Let's go one step further to confirm this:

gnuplot> print 10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0
1e+25
gnuplot> print 10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0 - 1e25
-2147483648.0

Very strange. Not concrete proof, but my conjecture about not knowing which is more accurate, outright multiplying via dbl_raise() vs. pow(), does seem a pertinent question.

I was long-winded, Ethan, but the answer to your question is "I don't think so."

Dan
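The experiment is easy to repeat in C (a sketch; powint_loop is a made-up name, and the exact one-ulp discrepancy assumes IEEE doubles with round-to-nearest, which is what the session above reflects):

```c
#include <math.h>
#include <assert.h>

/* Left-to-right multiplying loop, like dbl_raise() or the gnuplot
 * powint() above.  Rounding starts at 10^23, the first power of ten
 * that no longer fits in a 53-bit mantissa. */
static double powint_loop(double base, int n)
{
    double v = 1.0;

    while (n-- > 0)
        v *= base;
    return v;
}
```

At n = 25 the loop result and the correctly rounded literal 1e25 differ by exactly one ulp, which at that magnitude is 2^31 = 2147483648 -- the number in the session above.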
From: Daniel J Sebald <daniel.sebald@ie...>  2007-04-08 08:53:01

Daniel J Sebald wrote:
> gnuplot> print
> 10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0
> - 1e25
> -2147483648.0

And this seems to be accurate:

gnuplot> print 1e25/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10 - 1.0
0.0

Ummmm, so why is the division accurate while the multiplication above appears not to be? Is there something going on with the translation between exponential notation and the machine IEEE float representation somewhere around 10^25?

Dan
From: Ethan A Merritt <merritt@u.washington.edu>  2007-04-08 16:50:38

On Sunday 08 April 2007 01:52, Daniel J Sebald wrote:
> Daniel J Sebald wrote:
>
>> gnuplot> print
>> 10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0*10.0
>> - 1e25
>> -2147483648.0
>
> And this seems to be accurate:
>
> gnuplot> print
> 1e25/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10/10
> - 1.0
> 0.0
>
> Ummmm, so why is the division accurate while the multiplication above
> appears not to be?

That is exactly what you should expect. Sequential division reduces the inaccuracy of the original number by a factor of 10 at each step. Simple subtraction exposes the size of the original discrepancy.

Note that 2147483648.0, although a large number in absolute terms, corresponds to an error in the 15th decimal place of the decimal representation of 10^25. More to the point, that corresponds to about 50 bits of precision. Since double-precision IEEE format only provides 52 bits of precision in the mantissa, this is within one bit of the expected rounding error. (And that one bit may be my mistake; I'm doing this on my fingers, as it were.)

Note also that if you care about this level of precision then you also must pay attention to the IEEE rounding mode.

> Is there something going on with the translation from exponential
> notation and the machine IEEE float representation somewhere around 10^25?

Yes. That's where you hit the limit of the IEEE representation. To do better than that you'd have to switch to some other floating point representation. There are infinite-precision math libraries available, but gnuplot isn't using one.

--
Ethan A Merritt
From: Daniel J Sebald <daniel.sebald@ie...>  2007-04-09 03:12:07

Ethan A Merritt wrote:
> Note that 2147483648.0, although a large number in absolute terms,
> corresponds to an error in the 15th decimal place of the decimal
> representation of 10^25. More to the point, that corresponds to about
> 50 bits of precision. Since double-precision IEEE format only provides
> 52 bits of precision in the mantissa, this is within one bit of the
> expected rounding error. (And that one bit may be my mistake; I'm doing
> this on my fingers, as it were.)

Right, the example

powint(base,power) = power==0 ? 1.0 : base * powint(base,power - 1)
powdiff(base,power) = (base**power / powint(base,power)) - 1.0
set samples 61 ; plot [0:60] powdiff(10,x)

suggests that. I guess my feeling is to not discount the pow() function offhand, unless someone has seen some awful implementations out there. It is a standard library function; you'd think designers would put effort into its accuracy, and users would complain otherwise.

Dan
From: Hans-Bernhard Bröker <HBBroeker@t...> - 2007-04-09 21:05:23

Daniel J Sebald wrote:
> Note that this sampling contains both "integers" and "floats".

Let's try and use clear terminology. Those aren't integers, they're floats with integer values. Yes, there is a difference.

> But this is not how the innards of the pow(), floor(), etc. functions work:

Again, careful of the wording. There are two floor() functions you could be talking about: the gnuplot operator (standard.c:f_floor()), or the C standard library function. They behave differently, because they take arguments of different types. Our floor() takes actual integers or complex floating-point arguments; the C standard function takes doubles.

> The documentation of these C functions might say "largest integer not greater
> than arg", but that means the math vernacular, not computer vernacular.

Which is why that's not what it says. What the only documentation we can rely on, i.e. the C Standard, says about floor is (C99 7.12.9.2p2):

    The floor functions compute the largest integer value not greater than x.

The key word is "integer value". This is not an integer, but a floating-point number of integer value.

> Given gnuplot's behavior, can we come up with a comparison at the command line
> that reflects the behavior of C routines? I've followed the parsing to
> internal.c and see that the ** operator does in fact use the C library pow(,)
> function.

It's not quite as simple as that. f_power uses pow() (and some trigonometry) for floating-point arguments, but iterated multiplication for integer**integer.

> Very strange. Not concrete proof, but my conjecture about not knowing which is
> more accurate, outright multiplying via dbl_raise() vs. pow(), does seem a
> pertinent question.

Not really, because utmost accuracy is at most half the picture here. Trying to "make sure we get exactly the right result for xnorm" is a futile effort, because we a) don't need it exactly right, and b) we can't get it, anyway.

The real problems aren't with pow() vs. dbl_raise() -- pow() would just fail in a different way on the [NaN:NaN] range example that retriggered this discussion. They're with the overall result of quantize_{normal|duodecimal}_tics(), for inputs that don't quite lie on the integer values the users think they see, because of rounding or sampling inaccuracies that took place before scaling even began. E.g. if the actual y range is [0:100+epsilon], how should the tic interval and auto-extended range endpoint depend on the sign and magnitude of epsilon, at small values?

And that's before we consider that we have exactly zero control over the quality of a given C compiler's math library, and thus their pow() function. dbl_raise() may not be the shiniest tool in the shed, but at least it's _our_ tool. Its behaviour may vary a good deal less from one platform to the next than that of pow().
From: Daniel J Sebald <daniel.sebald@ie...> - 2007-04-09 22:35:31

Hans-Bernhard Bröker wrote:
>> Given gnuplot's behavior, can we come up with a comparison at the
>> command line that reflects the behavior of C routines? I've followed
>> the parsing to internal.c and see that the ** operator does in fact use
>> the C library pow(,) function.
>
> It's not quite as simple as that. f_power uses pow() (and some
> trigonometry) for floating-point arguments, but iterated multiplication
> for integer**integer.

Hence why I used 10.0 in the plots.

>> Very strange. Not concrete proof, but my conjecture about not knowing
>> which is more accurate, outright multiplying via dbl_raise() vs.
>> pow(), does seem a pertinent question.
>
> Not really, because utmost accuracy is at most half the picture here.
> Trying to "make sure we get exactly the right result for xnorm" is a
> futile effort, because we a) don't need it exactly right, and b) we
> can't get it, anyway.

I agree.

> The real problems aren't with pow() vs. dbl_raise() -- pow() would just
> fail in a different way on the [NaN:NaN] range example that retriggered
> this discussion.

In a different way, but also an acceptable way if one looks at the result: a blank plot. GIGO, not GIHFFM (garbage in, hang for five minutes).

> They're with the overall result of
> quantize_{normal|duodecimal}_tics(), for inputs that don't quite lie on
> the integer values the users think they see, because of rounding or
> sampling inaccuracies that took place before scaling even began. E.g.
> if the actual y range is [0:100+epsilon], how should the tic interval
> and auto-extended range endpoint depend on the sign and magnitude of
> epsilon, at small values?

This issue has come up before, as part of a bug report. And I think I have an acceptable solution for that situation as part of a patch, although the belief seems to be that this is impossible to address.

> And that's before we consider that we have exactly zero control over the
> quality of a given C compiler's math library, and thus their pow()
> function. dbl_raise() may not be the shiniest tool in the shed, but at
> least it's _our_ tool. Its behaviour may vary a good deal less from one
> platform to the next than that of pow().

Assuming the way the compiler handles floating point multiply for large overflow numbers, as Ethan described, is more consistent than the pow() function, then yes. On the other hand, I'm not sure it is good practice to second-guess library functions without evidence of a bug.

Anyway, if we think that the dbl_raise() is the preferred route, all that need be done is remove the comment "FIXME, is this needed?" and give a short explanation why it is needed.

Dan
From: Ethan Merritt <merritt@u.washington.edu> - 2007-04-09 23:00:30

On Monday 09 April 2007 15:35, Daniel J Sebald wrote:
> > The real problems aren't with pow() vs. dbl_raise() -- pow() would just
> > fail in a different way on the [NaN:NaN] range example that retriggered
> > this discussion.
>
> In a different way, but also an acceptable way if one looks at the result: a
> blank plot. GIGO, not GIHFFM (garbage in, hang for five minutes).

Could we please decouple discussion of dbl_raise() from possible bugs in parsing 'set xrange'? If it bothers you that much to have xrange accept NaN on input, then please submit a patch to prevent it.

> > And that's before we consider that we have exactly zero control over the
> > quality of a given C compiler's math library, and thus their pow()
> > function. dbl_raise() may not be the shiniest tool in the shed, but at
> > least it's _our_ tool. Its behaviour may vary a good deal less from one
> > platform to the next than that of pow().
>
> Assuming the way the compiler handles floating point multiply for large overflow
> numbers, as Ethan described, is more consistent than the pow() function, then
> yes.

That is *not* an overflow. "Overflow" is when you exceed the range of numbers that can be represented, i.e. not enough bits in the exponent. What you are seeing is a limitation of the precision, i.e. not enough bits in the mantissa.

> On the other hand, I'm not sure it is good practice to second-guess
> library functions without evidence of a bug.

It's not even a question of library functions. It's a consequence of the IEEE floating-point format. Presumably you wouldn't see this on, say, a VAX if you chose one of the native double-precision representations rather than IEEE. They give you the option of spending more bits on the precision (mantissa) and fewer on the range (exponent). Of course then you'd hit true overflow sooner. You can't win.

-- Ethan A Merritt
From: Daniel J Sebald <daniel.sebald@ie...> - 2007-04-09 23:20:49

Ethan Merritt wrote:
> On Monday 09 April 2007 15:35, Daniel J Sebald wrote:
>
>>> The real problems aren't with pow() vs. dbl_raise() -- pow() would just
>>> fail in a different way on the [NaN:NaN] range example that retriggered
>>> this discussion.
>>
>> In a different way, but also an acceptable way if one looks at the result: a
>> blank plot. GIGO, not GIHFFM (garbage in, hang for five minutes).
>
> Could we please decouple discussion of dbl_raise() from possible
> bugs in parsing 'set xrange'? If it bothers you that much to have
> xrange accept NaN on input, then please submit a patch to prevent it.

What bothers me is a FIXME that is a red herring. Like I said, I would have probably just written a patch to disallow NaN in set.c. Of course, if, as you say, there are some implementations such as VAX that aren't IEEE standard, then will checking for a value of NaN even compile on VAX systems?

>>> And that's before we consider that we have exactly zero control over the
>>> quality of a given C compiler's math library, and thus their pow()
>>> function. dbl_raise() may not be the shiniest tool in the shed, but at
>>> least it's _our_ tool. Its behaviour may vary a good deal less from one
>>> platform to the next than that of pow().
>>
>> Assuming the way the compiler handles floating point multiply for large overflow
>> numbers, as Ethan described, is more consistent than the pow() function, then
>> yes.
>
> That is *not* an overflow.
> "Overflow" is when you exceed the range of numbers that can be represented,
> i.e. not enough bits in the exponent.
> What you are seeing is a limitation of the precision, i.e. not enough
> bits in the mantissa.

OK, mantissa overflow, i.e., precision, is what I meant.

>> On the other hand, I'm not sure it is good practice to second-guess
>> library functions without evidence of a bug.
>
> It's not even a question of library functions. It's a consequence of the
> IEEE floating-point format. Presumably you wouldn't see this on, say,
> a VAX if you chose one of the native double-precision representations
> rather than IEEE. They give you the option of spending more bits on the
> precision (mantissa) and fewer on the range (exponent).
> Of course then you'd hit true overflow sooner. You can't win.

So this means the dbl_raise() implementation is better than pow()?

Dan
From: Ethan Merritt <merritt@u.washington.edu> - 2007-04-09 23:44:54

On Monday 09 April 2007 16:20, Daniel J Sebald wrote:
> > It's not even a question of library functions. It's a consequence of the
> > IEEE floating-point format. Presumably you wouldn't see this on, say,
> > a VAX if you chose one of the native double-precision representations
> > rather than IEEE. They give you the option of spending more bits on the
> > precision (mantissa) and fewer on the range (exponent).
> > Of course then you'd hit true overflow sooner. You can't win.
>
> So this means the dbl_raise() implementation is better than pow()?

I have no idea, because I can't for the life of me figure out what the issue is here. What problem are you trying to fix? If there's no problem, just forget the whole thing.

-- Ethan A Merritt
From: Daniel J Sebald <daniel.sebald@ie...> - 2007-04-09 23:54:50

Ethan Merritt wrote:
> I have no idea, because I can't for the life of me figure out what
> the issue is here. What problem are you trying to fix?
> If there's no problem, just forget the whole thing.

Repeat:

> Anyway, if we think that the dbl_raise() is the preferred route, all that need
> be done is remove the comment "FIXME, is this needed?" and give a short
> explanation why it is needed.

This will prevent anyone in the future from looking at this and asking "What needs to be fixed?", i.e., the same question you are asking, "because I can't for the life of me figure out what the issue is here." That's the trap I fell into.

Dan
From: Ethan Merritt <merritt@u.washington.edu> - 2007-04-09 23:59:01

I take that to mean you agree that we have no basis on which to say that dbl_raise is "better" than pow. Therefore the comment is correct, and nothing needs to be changed. That's the end of it so far as I'm concerned.

	Ethan

On Monday 09 April 2007 16:54, Daniel J Sebald wrote:
> Ethan Merritt wrote:
> > I have no idea, because I can't for the life of me figure out what
> > the issue is here. What problem are you trying to fix?
> > If there's no problem, just forget the whole thing.
>
> Repeat:
>
> > Anyway, if we think that the dbl_raise() is the preferred route, all that need
> > be done is remove the comment "FIXME, is this needed?" and give a short
> > explanation why it is needed.
>
> This will prevent anyone in the future from looking at this and asking "What
> needs to be fixed?", i.e., the same question you are asking, "because I can't
> for the life of me figure out what the issue is here." That's the trap I fell into.
>
> Dan

-- 
Ethan A Merritt
Courier Deliveries: 1959 NE Pacific
Dept of Biochemistry, Health Sciences Building
University of Washington, Seattle WA 98195-7742
From: Daniel J Sebald <daniel.sebald@ie...> - 2007-04-10 00:07:05

Ethan Merritt wrote:
> I take that to mean you agree that we have no basis on which to say
> that dbl_raise is "better" than pow.

Other than that pow() handles NaN while dbl_raise() hangs for five minutes, right.

> Therefore the comment is correct, and nothing needs to be changed.

"Is this really useful?" is a comment?

Dan