From: John D. <uf...@gm...> - 2006-12-07 16:01:53
Hello Al,

"-ffloat-store" is the slowest possible way to ensure 64-bit rounding. It writes to memory after every math operation. To set 64-bit precision in the FPU itself, use the code from the link I mentioned:

http://www.wrcad.com/linux_numerics.txt

    #ifdef linux
    #include <fpu_control.h>
    #endif

    int main(int argc, char **argv)
    {
    #ifdef linux
        /*
         * This puts the X86 FPU in 64-bit precision mode. The default
         * under Linux is to use 80-bit mode, which produces subtle
         * differences from FreeBSD and other systems, eg,
         * (int)(1000*atof("0.3")) is 300 in 64-bit mode, 299 in 80-bit
         * mode.
         */
        fpu_control_t cw;
        _FPU_GETCW(cw);
        cw &= ~_FPU_EXTENDED;
        cw |= _FPU_DOUBLE;
        _FPU_SETCW(cw);
    #endif
        /* rest of main() follows */
    }

This is much faster. Also, I would never use "-ffast-math" as it is not IEEE compliant.

Please run your tests again without "-ffloat-store" and with the code snippet above. I would expect the code to be much faster. The 64-bit rounding is then worth it.

Regards,
Juan

On 12/7/06, al davis <ad...@fr...> wrote:
>
> On Wednesday 06 December 2006 22:07, John Doe wrote:
> > But for the same compiler on the same machine, the results
> > are much closer across different optimization levels with 64
> > bit rounding. When I run the same regressions on windows,
> > freebsd, and linux, the results are off in a much less
> > significant decimal place. I want my regression to show
> > significant differences due to algorithmic changes, and not
> > just the fact that I chose -O2 versus -O3.
>
> I ran some tests on gnucap ...
>
> Two computers
> 1: intel 1.8ghz, Debian testing, 1 g mem, light load
> 2: AMD64x2, 2.4 ghz, Debian unstable, 2 g mem, heavy load
>
> Three compiler option settings
> All "-O2"
> 1. as is
> 2. "-ffast-math"
> 3. "-ffloat-store"
>
> Configuration #1 "std" took all defaults except "-O2"
>
> Configuration #2 was the same except for the "-ffast-math"
> option, which turns on all available floating point
> optimizations, including those considered dangerous.
>
> Configuration #3, same as std except for the "-ffloat-store"
> option.
> This option forces storage of intermediate results,
> therefore rounding to 64 bits.
>
> Two circuit files, one with 147000 nodes, other with 590000
> nodes. The larger circuit swapped unacceptably on the small
> machine so I tested only the smaller circuit there. These were
> used to compare speed.
>
>                   AMD large   AMD small   Intel small
> std               39 sec      9.5 sec     11.2 sec
> -ffast-math       39 sec      9.5 sec     11.2 sec
> -ffloat-store     50 sec      12 sec      13 sec
>
> The "small" circuit takes 30 minutes to run on ng-spice, on the
> AMD, with equivalent results. Note that the time is 9.5
> SECONDS on gnucap, 30 MINUTES in ng-spice. The algorithms are
> different.
>
> Also, complete gnucap test suite, 345 test files.
>
> Test suite showed
>
> AMD-64 ---
> no difference between AMD "std" and "-ffloat-store".
>
> 13 test differences between AMD "std" and "-ffast-math"
> One difference was that an overflow was not properly trapped
> with -ffast-math.
>
> Intel ---
> intel with -ffloat-store had 4 trivial test differences compared
> to AMD std
>
> intel standard had 48 test differences, one is significant,
> compared to AMD std. The significantly different test still
> gave correct answers with trivial differences, but had
> different time steps.
>
> intel with -ffast-math had 43 test differences compared to AMD
> std, one is significant. It had the same time stepping as the
> standard version. One test had an overflow that was not
> properly trapped.
>
> My conclusion about speed: The AMD-64 and Intel processor speed
> difference corresponds to clock speed.
>
> The AMD gives more consistent results, apparently because the
> math really is 64 bit, all the time. "-ffast-math" causes
> problems and does not improve speed. "-ffloat-store" results
> in a significant speed penalty (28% on the big circuit) with no
> change in results. The standard setting is therefore the best
> choice.
>
> The Intel has more differences.
> With the "-ffloat-store"
> option, only 4 tests had any difference compared to the AMD,
> and these were trivial. I think this confirms that it was
> doing essentially the same 64 bit rounding. The standard
> setting resulted in 48 tests with different results, trivial
> in all but one. I am assuming this is because of the excess
> precision you mention. The "-ffast-math" option gave 43
> differences compared to the reference. I do not consider the
> difference between these 43 and the 48 with no options to be
> significant. There were 25 trivial differences comparing intel
> with -ffast-math to intel with no options. One was the numeric
> overflow case.
>
> As to which option is best, I am not sure. The "-ffast-math"
> option causes problems and does not improve speed, so it should
> not be used. Whether the "-ffloat-store" option should be used
> could be debated. It doesn't give improved accuracy, but it
> does give a more predictable error, essentially matching
> another 64 bit system. The option does give a speed penalty,
> 16% in my test.
>
> The particular test that resulted in different time stepping
> gives believable but incorrect results in ng-spice, with no
> warnings. It is a negative resistance oscillator using the
> switch element as the negative resistance device. On
> resistance is 1 ohm. Off resistance is 1e9. Gnucap handles
> the fast switching correctly, automatically. Spice hops past,
> giving a glitch that is really trapezoidal ringing, making it
> appear to work.
>
> One important point here is that differences in algorithms have
> much more effect than differences in compiler optimization.
>
> > When I do AMD-64 in 64bit mode, it is going to prefer the
> > 64-bit SSE instructions over the 80-bit 387 instructions.
> > Now I am going to get closer results to a machine with a
> > sparc chip than when I compiled the program on the same
> > machine in 32-bit mode.
> >
> > If my result is rounded to 64-bit in the floating point
> > register, less damage is done when that number is written
> > back to memory and read back in. I am happier with that than
> > having an 80-bit number written from register to memory, read
> > back in and zero extended.
>
> I think I just confirmed what you said. The results were as I
> expected.
>
> > An excellent paper on this issue is:
> > http://www.wrcad.com/linux_numerics.txt
>
> I have read this paper, a long time ago.
>
> > When an EDA customer gets a new update to their tools,
> > they're going to validate and they want an explanation why
> > the results no longer match their golden files. EDA
> > companies are keenly aware of this, and often provide
> > extended precision, but only as a non-default option.
>
> I understand this, but algorithm changes often mask the numeric
> differences seen here. I especially recall that when I was
> reworking the gnucap transient step control. Changing the
> algorithm changed the stepping, making all text regressions
> look completely different, especially those designed to point
> out differences.
>
> Speaking of step size control ... Spice calculates the
> truncation error incorrectly. It significantly underestimates
> the error. How much depends on the actual step size. (that
> is, the step size it is using, not the wanted step size) This
> was true even in Spice-2. The same error was reproduced in
> Spice-3. After over 20 years, it has never changed, even
> though it has been known to be incorrect for most of that time.
>
> Gnucap openly defies tradition in many cases, including this
> one. Should ng-spice fix the bug?
>
> Consider this ... Fixing the bug will result in many more time
> steps in most cases, causing it to run slower. There are
> options for setting tolerances. Most users don't know the true
> meaning of the options, but rather just set them so they get
> decent results.
> Fixing the bug could make it mathematically
> correct, but change the meaning of the options. Golden files
> will no longer match, with significant differences. It will
> run slower, on average maybe 5x slower. The bug also exists,
> not fixed, in most other spice derivatives, so the current
> behavior matches expectations.
>
> Since it is a calculation error, fixing it is consistent with
> keeping to the spirit of Spice. It is an easy fix, but was a
> complicated analysis to prove it mathematically. That might
> say to fix it. On the other hand, people know what to expect
> the way it is, saying not to fix it.
>
> NG-spice has some hacks to the algorithm, compared to the
> original, perhaps as an attempt to fix it without understanding
> what the real problem is, but the real bug remains.
>
> Should it be fixed?
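
P.S. For anyone who wants to see the 300-versus-299 effect from the comment in the snippet at the top of this message, here is a minimal sketch in C. The helper names (truncate_direct, truncate_after_store, evaluation_method) are made up for illustration and are not taken from gnucap, ng-spice, or the wrcad paper. The sketch assumes IEEE 754 doubles. The direct version may give 299 or 300 depending on whether the compiler keeps the product in an 80-bit x87 register. The second version stores the product in a volatile double first, which is roughly what "-ffloat-store" does for every variable, so the product is rounded to 64 bits before truncation.

```c
#include <float.h>   /* FLT_EVAL_METHOD (C99) */
#include <stdlib.h>  /* atof */

/* Hypothetical helper: truncate 1000*x with no explicit rounding step.
 * On x87 builds the product may stay in an 80-bit register, so the
 * result is platform dependent (299 or 300 for "0.3"). */
int truncate_direct(const char *text)
{
    return (int)(1000 * atof(text));
}

/* Hypothetical helper: force the product through a 64-bit double in
 * memory before truncating. The volatile qualifier keeps the compiler
 * from holding the value in an extended-precision register. */
int truncate_after_store(const char *text)
{
    volatile double product = 1000 * atof(text);
    return (int)product;
}

/* Reports how the compiler evaluates floating point expressions:
 * 0 = in the operand type, 1 = at least double, 2 = long double,
 * -1 = indeterminate. */
int evaluation_method(void)
{
    return FLT_EVAL_METHOD;
}
```

Calling truncate_after_store("0.3") should give 300 on both the Intel and the AMD machine, while truncate_direct("0.3") is the call that can differ between them. If evaluation_method() returns 2, the build is using extended precision for intermediates, which is the case where the two helpers can disagree.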