From: Leon N M. <leo...@gm...> - 2010-09-04 14:01:42
|
> 1. it seems the fs. routine's precision itself is 7 digits, which is good
> 2. it seems f/, most notably with larger operands (e.g. +/- 1e15), introduces errors, so the result's precision drops to 6 digits or less

Yes, the number of decimal places it prints should be related to the size of the number -- I just trusted an algorithm in the paper I linked to handle it correctly. 24-bit significands can hold about as much as 7 decimal digits, so that's about the most you should see.

> 3. what is interesting, the fs. conversion took seconds (atmega @25MHz), e.g.:

This is partially a mistake on my part -- there are a number of things like "10 s>f" in the code which I should really replace with "[ 10 s>f swap ] literal literal", or something like that, so that they don't have to be calculated repeatedly at run time. I meant to do it before release but forgot -- I'll do it today. Even with that change, it still may be slow.

-Leon |
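The "about 7 decimal digits" figure for a 24-bit significand can be checked with a line of arithmetic. The following is a minimal sketch in Python, not part of the amforth code; the variable names are illustrative only.

```python
import math

SIGNIFICAND_BITS = 24  # significand width quoted in the message above

# N binary digits carry N * log10(2) decimal digits of information.
decimal_digits = SIGNIFICAND_BITS * math.log10(2)

# Every integer up to 2**24 fits exactly in a 24-bit significand.
largest_exact_integer = 2 ** SIGNIFICAND_BITS

print(f"{decimal_digits:.2f} decimal digits")
print(largest_exact_integer)
```

The result is a little over 7, so an eighth printed digit is not fully backed by the stored value.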
From: pito <pi...@vo...> - 2010-09-04 15:58:52
|
I did replace all those conversions within .fs with constants:

1 s>f 2 s>f f/ fconstant _half
1 s>f fconstant _1
2 s>f fconstant _2
4 s>f fconstant _4
8 s>f fconstant _8
10 s>f fconstant _10

It is now 3x faster:

> measure
-3.1415851E-15 545 ms ok

P. |
From: pito <pi...@vo...> - 2010-09-04 16:49:48
|
Nope, fs. is still slow, even with fconstants:

> measure
3.1415851E-15 545 ms ok
> measure
3.1415915E15 1646 ms ok
> measure
-3.1415915E15 1646 ms ok
> measure
-3.1415851E-15 545 ms ok

P. |
From: pito <pi...@vo...> - 2010-09-04 17:23:17
|
FYI: f+ f- f* f/ duration (@25MHz):

> : measure timer-start 400 0 do _pi fdup fdrop fdrop loop timer-stop 400 s>f f/ fs. ." sec"; ok
> measure
2.6214392E-5 sec ok
> : measure timer-start 400 0 do _pi fdup f+ fdrop loop timer-stop 400 s>f f/ fs. ." sec"; ok
> measure
1.12721896E-3 sec ok
> : measure timer-start 400 0 do _pi fdup f- fdrop loop timer-stop 400 s>f f/ fs. ." sec"; ok
> measure
4.0108027E-3 sec ok
> : measure timer-start 400 0 do _pi fdup f* fdrop loop timer-stop 400 s>f f/ fs. ." sec"; ok
> measure
4.6923761E-3 sec ok
> : measure timer-start 400 0 do _pi fdup f/ fdrop loop timer-stop 400 s>f f/ fs. ." sec"; ok
> measure
1.33169138E-2 sec ok |
From: Leon N M. <leo...@gm...> - 2010-09-04 17:53:31
|
The current method repeatedly divides (for large numbers) or multiplies (for small numbers) by 10 until the float is in the range [1,10). f/ is slow, which makes it slow to print large numbers. f* is faster because I saw a way to do the math with integers using m* -- I didn't see a similarly good way to do this with division (partly because division is not left-distributive over addition: a/(b+c) != a/b + a/c, whereas a*(b+c) = a*b + a*c does hold for multiplication). That's the short reason why the small numbers print faster than the large numbers.

I've got an idea or two to speed up fs. (e.g. replacing repeated division with repeated multiplication and one division), but f/ is what really needs optimization. I'm probably going to hold off on any of that until I get input working.

-Leon |
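The normalization loop Leon describes can be sketched as follows. This is an illustration in Python using ordinary double-precision floats, not the amforth source; `normalize` and its return values are hypothetical names. It counts how many divisions and multiplications each branch performs, which is what makes large numbers slower to print when f/ costs more than f*.

```python
def normalize(x):
    """Scale a positive float into [1, 10).

    Returns (mantissa, exponent, divisions, multiplications).
    Assumes x > 0; sign and zero handling are left out of this sketch.
    """
    exponent = 0
    divisions = 0
    multiplications = 0
    # Too large: divide by 10 until the value drops below 10.
    while x >= 10.0:
        x /= 10.0
        exponent += 1
        divisions += 1
    # Too small: multiply by 10 until the value reaches 1.
    while x < 1.0:
        x *= 10.0
        exponent -= 1
        multiplications += 1
    return x, exponent, divisions, multiplications


for value in (3.1415915e15, 3.1415851e-15):
    mantissa, exponent, divs, muls = normalize(value)
    print(f"{value!r}: exponent {exponent}, {divs} divisions, {muls} multiplications")
```

A value near 1e15 needs 15 divisions while a value near 1e-15 needs 15 multiplications, so the two cases differ only in which operation is repeated.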
From: pito <pi...@vo...> - 2010-09-04 19:01:38
|
Leon, I did the following - I put print"#" in f* and print"%" in f/:

> 987654321. d>f _1e9 f*
# ok
> fs.
%%%%%%%%%%%%%%%%%9.###8###7###6###5###3###6###4E17 ok
> 987654321. d>f _1e-9 f*
# ok
> fs.
#9.###8###7###6###5###3###2###6E-1 ok
> 987654321. d>f _1e-9 f/
% ok
> fs.
%%%%%%%%%%%%%%%%%9.###8###7###6###5###4###5###9E17 ok
> 987654321. d>f _1e-18 f/
% ok
> fs.
%%%%%%%%%%%%%%%%%%%%%%%%%%9.###8###7###6###5###5###0###7E26 ok
> 987654321. d>f _1e-18 f* _1e-12 f*
## ok
> fs.
######################9.###8###7###6###4###9###8E-22 ok
> 987654321. d>f _1e9 f* _1e12 f*
## ok
> fs.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%9.###8###7###6###5###3###4###5E29 ok

So when the result has a positive exponent E+XX, XX divisions are made. Therefore I replaced the f/ (13ms) with f* (4ms) in the following routine:

\ if it's too large, make it smaller
begin
  fdup _10 f>= \ [ 10 s>f ]
while
  _.1 f* \ _10 f/ \ [ 10 s>f ] <<<<<<<<< Here
  fnswap 1+ nfswap
  \ ." f>="
repeat

So now:

> 987654321. d>f _1e9 f* _1e12 f*
## ok
> fs.
#############################9.###8###7###6###5###1###2###5E29 ok
> 987654321. d>f _1e-18 f* _1e-12 f*
## ok
> fs.
######################9.###8###7###6###4###9###8E-22 ok

So now the conversion with positive exponents (previously f/) is 3x faster:

> : measure timer-start _1 _pi f* _1e15 f* fs. timer-stop fs. ." sec"; ok
> measure
3.1415874:E15 4.9283066E-1 sec ok
> : measure timer-start _1 _pi f* _1e-15 f* fs. timer-stop fs. ." sec"; ok
> measure
3.1415851E-15 5.4525948E-1 sec ok
> : measure timer-start _-1 _pi f* _1e15 f* fs. timer-stop fs. ." sec"; ok
> measure
-3.1415874:E15 4.9283066E-1 sec ok
> : measure timer-start _-1 _pi f* _1e-15 f* fs. timer-stop fs. ." sec"; ok
> measure
-3.1415851E-15 5.4525948E-1 sec ok

So all fs. are now ~0.5sec. I do not know why there is a colon in the number (:E15), but it is probably an induced bug (:-)).

Pito |
From: Leon N. M. <leo...@gm...> - 2010-09-04 19:13:15
|
I've avoided replacing division with multiplication by the reciprocal because it introduces rounding errors (.1 in decimal is infinitely repeating in binary). That said, we already have some rounding errors, so it may be worth doing -- although these new rounding errors do look slightly more severe. I'll think about it -- the test you ran definitely gives me some good information.

-Leon |
From: pito <pi...@vo...> - 2010-09-04 19:31:25
|
Yes, 0.1 is $3dcccccc = 0.09999999403953552; from that the result will be 3.14158984478e+/-15 (it should be 3.1415926535e+/-15). Our result is 3.1415851, or 3.1415874 - so quite precise. However, I would simply recommend printing only 4 decimal places. For atmega users -3.1415E-27 might be ok, I guess. And nobody will be nervous (:-)).

Q: can we set the number of decimal places to be printed somehow?

Pito |
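The bit pattern and the drift Pito quotes can be reproduced with a short Python sketch. It assumes $3dcccccc is an IEEE 754 single-precision pattern, which matches the decimal value given in the message; the helper names are made up for this illustration, and the drift estimate ignores the extra rounding that each f* would add.

```python
import math
import struct


def bits_to_float32(bits):
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def float32_to_bits(x):
    """Round x to single precision (nearest) and return its bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


tenth_truncated = bits_to_float32(0x3DCCCCCC)  # constant quoted in the message
tenth_rounded = bits_to_float32(0x3DCCCCCD)    # neighbouring pattern, one step up

print(tenth_truncated)  # 0.09999999403953552, as quoted above
print(tenth_rounded)
print(hex(float32_to_bits(0.1)))

# Effect of using the truncated constant 15 times in a row on pi:
drifted = math.pi * (tenth_truncated / 0.1) ** 15
print(f"{drifted:.11f}")
```

The truncated constant is slightly below 0.1, so fifteen multiplications pull the mantissa down by a little under one part in a million, which lands close to the 3.14158984478 figure in the message.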
From: pito <pi...@vo...> - 2010-09-04 20:07:08
|
The f* is ~4.7ms and f/ is ~91ms (not 13ms as I had written, sorry) at 25MHz clock. So we went from 1.7sec to 0.5sec when printing E15, that is 1.7-0.5 = ~1.2sec less. Check: 15 * (0.091-0.0047) = ~1.29sec less. P. |
From: Leon N. M. <leo...@gm...> - 2010-09-04 19:47:44
|
At the moment, there's no way to print a fixed number of digits, but it won't be hard to do -- a simple way would be to replace the last begin-while-repeat loop with a do-loop over the number of digits you want and strip out the stuff dealing with M. It's something I hope to get around to eventually -- it's in the ANS standard as the SET-PRECISION word. I'll probably make it possible to either have a set number of significant digits or use the current method.

-Leon |
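The fixed-digit loop Leon outlines might look roughly like this. It is a Python sketch under stated assumptions, not the amforth implementation: the mantissa is assumed to be already normalized into [1, 10), digits are truncated rather than rounded, and `mantissa_digits` and `format_sci` are hypothetical names.

```python
def mantissa_digits(mantissa, ndigits):
    """Peel off ndigits decimal digits from a mantissa in [1, 10)."""
    digits = []
    for _ in range(ndigits):          # fixed-count loop, like a Forth do-loop
        d = int(mantissa)             # current leading digit, 0..9
        digits.append(d)
        mantissa = (mantissa - d) * 10.0  # shift the next digit into place
    return digits


def format_sci(mantissa, exponent, ndigits):
    """Format a positive normalized value as d.dddE<exponent> (truncating)."""
    digits = mantissa_digits(mantissa, ndigits)
    head = digits[0]
    tail = "".join(str(d) for d in digits[1:])
    return f"{head}.{tail}E{exponent}"


print(format_sci(3.1415926, -27, 5))
```

With five significant digits this prints the mantissa in the 4-decimal-place form suggested earlier in the thread.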
From: pito <pi...@vo...> - 2010-09-04 20:10:30
|
Leon, did you see the ans float from the 4th forth? There is an ans float library and zen float library as well. Lot of routines for both.. Pito |
From: pito <pi...@vo...> - 2010-09-04 20:31:03
|
A point on fs.: for small systems, floats with 3-4 decimal places are perfect when measuring standard stuff, except when measuring frequency or time. What would be nice to have is engineering notation. I have got it on my HP-25 and, frankly, I do not understand how they did it in such a small footprint. I am not sure whether its processor calculates in BCD directly; if so, it is easy. I once did it on a PIC in C, but the code was big as I needed log(). Pito |
From: Leon N. M. <leo...@gm...> - 2010-09-04 22:19:27
|
Engineering notation is also on the todo list (see FE. in the standard). I don't know about the HP-25, but many older HP calculators used the Saturn microprocessor, which could handle 16 BCD digits: http://en.wikipedia.org/wiki/Saturn_(microprocessor)

I've seen the 4th stuff -- definitely a lot of good material (I wish I'd seen it sooner).

-Leon |
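Engineering notation does not strictly need log(): the same scale-until-in-range idea used by fs. works if the steps are factors of 1000. The sketch below is in Python with double-precision floats and is only an illustration; `engineering` is a hypothetical helper, not the FE. word from the standard, and it handles positive inputs only.

```python
def engineering(x):
    """Return (mantissa, exponent) with 1 <= mantissa < 1000 and the
    exponent a multiple of 3. Assumes x > 0."""
    exponent = 0
    # Too large: step down by factors of 1000.
    while x >= 1000.0:
        x /= 1000.0
        exponent += 3
    # Too small: step up by factors of 1000.
    while x < 1.0:
        x *= 1000.0
        exponent -= 3
    return x, exponent


for value in (47000.0, 0.0022, 3.1415e15):
    mantissa, exponent = engineering(value)
    print(f"{mantissa:g}E{exponent}")
```

Stepping by 1000 also cuts the number of scaling operations to roughly a third of the one-decade-at-a-time loop.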
From: Leon N M. <leo...@gm...> - 2010-09-05 05:45:05
|
I did what I mentioned earlier and replaced the repeated division in FS. by repeated multiplication and one division. I didn't benchmark it rigorously, but it seems to be nearly as fast as the method where we multiply by 1/10, and it doesn't have the accuracy problems. For example:

The new method:

> _pi _1e3 f* fs.
3.1415925E3 ok
> _pi _1e3 f/ fs.
3.1415923E-3 ok

The multiply by 1/10 method:

> _pi _1e3 f* fs.
3.1415915E3 ok
> _pi _1e3 f/ fs.
3.1415923E-3 ok

So the new method restricts the error to the least significant digit printed, whereas the multiply by 1/10 method does not. Unless this turns out to be much slower than I expect, I'll run with this new method until f/ can be improved.

Any objections if I make the floating point code version 4.1 dependent so that I can use 2>R, 2R> and 2LITERAL?

-Leon |
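The three scaling strategies discussed in this thread can be compared in a small simulation. This Python sketch emulates single precision by rounding every intermediate result to an IEEE 754 float, which is an assumption: amforth's float words may round differently. The function names are invented for the illustration, and the reciprocal constant is the $3dcccccc pattern quoted earlier in the thread.

```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


TEN = f32(10.0)
# Truncated 0.1 constant quoted earlier in the thread ($3dcccccc).
TENTH_TRUNCATED = struct.unpack(">f", struct.pack(">I", 0x3DCCCCCC))[0]


def scale_by_division(x, n):
    """n divisions by 10, rounding after each one."""
    for _ in range(n):
        x = f32(x / TEN)
    return x


def scale_by_reciprocal(x, n):
    """n multiplications by the truncated 0.1 constant."""
    for _ in range(n):
        x = f32(x * TENTH_TRUNCATED)
    return x


def scale_by_single_division(x, n):
    """Build 10**n by repeated multiplication, then divide once."""
    power = f32(1.0)
    for _ in range(n):
        power = f32(power * TEN)
    return f32(x / power)


x = f32(3.1415926535e15)
reference = x / 1e15  # double-precision reference for the scaled value

for name, fn in (
    ("repeated division", scale_by_division),
    ("multiply by 1/10", scale_by_reciprocal),
    ("multiply, divide once", scale_by_single_division),
):
    result = fn(x, 15)
    rel_err = abs(result - reference) / reference
    print(f"{name:24s} {result:.7f}  relative error {rel_err:.1e}")
```

In this model the single-division variant accumulates few roundings, since powers of ten up to 10^10 are exact in single precision, while the 1/10 variant carries the constant's own error into every step. That is in line with the accuracy difference Leon reports, though the exact digits on an ATmega may differ.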
From: pito <pi...@vo...> - 2010-09-05 07:55:48
|
Thanks! I will play with it today. From my perspective, no objection to working with 4.1.

Pito

> Any objections if I make the floating point code version 4.1 dependent so that
> I can use 2>R, 2R> and 2LITERAL?
> -Leon |
From: pito <pi...@vo...> - 2010-09-05 08:10:28
|
Bug in latest source: \THREE VERSIONS, PICK YOUR POISON <<<<<< |
From: pito <pi...@vo...> - 2010-09-05 09:22:24
|
FS. duration (with Leon's latest fs., latest float$constants, time in seconds, @25MHz):

> _1 _pi _1e15 f* f* timer-start fs. timer-stop fs.
3.141593E15 6.5011706E-1 ok
> _1 _pi _1e-15 f* f* timer-start fs. timer-stop fs.
3.1415899E-15 6.1865978E-1 ok
> _-1 _pi _1e15 f* f* timer-start fs. timer-stop fs.
-3.141593E15 6.5011706E-1 ok
> _-1 _pi _1e-15 f* f* timer-start fs. timer-stop fs.
-3.1415899E-15 6.2914553E-1 ok

Pito |
From: pito <pi...@vo...> - 2010-09-05 09:52:35
|
FYI: calc with f/ instead of f*:

> _1 _pi f* _1e15 f/ timer-start fs. timer-stop fs.
3.1415911E-15 6.5011706E-1 ok
> _1 _pi f* _1e-15 f/ timer-start fs. timer-stop fs.
3.1415932E15 7.0254588E-1 ok
> _-1 _pi f* _1e15 f/ timer-start fs. timer-stop fs.
-3.1415911E-15 6.3963132E-1 ok
> _-1 _pi f* _1e-15 f/ timer-start fs. timer-stop fs.
-3.1415932E15 7.0254588E-1 ok

Duration of f/ (@25MHz, in seconds):

> _1 _pi f* _1e15 timer-start f/ timer-stop fswap fs. fs.
3.1415911E-15 9.4371824E-2 ok
> _1 _pi f* _1e-15 timer-start f/ timer-stop fswap fs. fs.
3.1415932E15 8.388607E-2 ok
> _-1 _pi f* _1e15 timer-start f/ timer-stop fswap fs. fs.
-3.1415911E-15 8.388607E-2 ok
> _-1 _pi f* _1e-15 timer-start f/ timer-stop fswap fs. fs.
-3.1415932E15 8.388607E-2 ok |