Re: [Ngspice-devel] ngspice and GNUcap


Hello Al,

"-ffloat-store" is the slowest possible way to ensure 64 bit.  It writes to
memory after every math operation.  To ensure 64 bit in the FPU, use the code
from the link I mentioned:
http://www.wrcad.com/linux_numerics.txt

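      /* __linux__ is defined even in strict ISO mode; the FPU control
         macros below exist only on x86. */
      #if defined(__linux__) && (defined(__i386__) || defined(__x86_64__))
      #include <fpu_control.h>
      #endif

      int main(int argc, char **argv)
      {
      #if defined(__linux__) && (defined(__i386__) || defined(__x86_64__))
        /*
        This puts the X86 FPU in 64-bit precision mode.  The default
        under Linux is to use 80-bit mode, which produces subtle
        differences from FreeBSD and other systems, eg,
        (int)(1000*atof("0.3")) is 300 in 64-bit mode, 299 in 80-bit
        mode.
        */
        fpu_control_t cw;
        _FPU_GETCW(cw);
        cw &= ~_FPU_EXTENDED;   /* clear the precision-control bits */
        cw |= _FPU_DOUBLE;      /* select double precision */
        _FPU_SETCW(cw);
      #endif
        /* ... rest of main() ... */
      }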

This is much faster.  Also, I would never use "-ffast-math" as it is not IEEE
compliant.

Please run your tests again without "-ffloat-store" and with the code snippet
above.  I expect the code to be much faster that way, so the 64-bit rounding
is then worth it.
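
To make the 300 versus 299 example from the comment above concrete, here is a
small sketch.  The function names are mine, not from the wrcad paper.  It
assumes IEEE doubles and uses long double as a stand-in for the 80-bit
registers, so the "extended" result is 299 only where long double really is
wider than double (x86 Linux, for example).

```c
#include <float.h>
#include <stdlib.h>

/* Nonzero when long double carries more precision than double. */
int long_double_is_wider(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

/* Product rounded to a 64-bit double before truncation.  The volatile
   store forces the rounding even if the FPU computes in extended mode. */
int scaled_trunc_double(const char *text)
{
    volatile double product = 1000.0 * atof(text);
    return (int)product;
}

/* Product kept in long double before truncation (stand-in for 80-bit). */
int scaled_trunc_extended(const char *text)
{
    long double product = 1000.0L * (long double)atof(text);
    return (int)product;
}
```

With "0.3" the first function gives 300, because the product rounds up to
exactly 300.0 as a double.  The second keeps the tiny deficit below 300 and
truncates to 299 when long double is wider.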

Regards,

Juan

On 12/7/06, al davis <ad...@fr...> wrote:
>
> On Wednesday 06 December 2006 22:07, John Doe wrote:
> > But for the same compiler on the same machine, the results
> > are much closer across different optimization levels with 64
> > bit rounding. When I run the same regressions on windows,
> > freebsd, and linux, the results are off in a much less
> > significant decimal place. I want my regression to show
> > significant differences due to algorithmic changes, and not
> > just the fact that I chose -O2 versus -O3.
>
> I ran some tests on gnucap ...
>
> Two computers
> 1: intel           1.8ghz, Debian testing, 1 g mem, light load
> 2: AMD64x2, 2.4 ghz, Debian unstable, 2 g mem, hvy load
>
> Three compiler option settings
> All "-O2"
> 1. as is
> 2. "-ffast-math"
> 3. "-ffloat-store"
>
> Configuration #1 "std" took all defaults except "-O2"
>
> Configuration #2 was the same except for the "-ffast-math"
> option, which turns on all available floating point
> optimizations, including those considered dangerous.
>
> Configuration #3, same as std except for the "-ffloat-store"
> option.  This option forces storage of intermediate results,
> therefore rounding to 64 bits.
>
>
> Two circuit files, one with 147000 nodes, the other with 590000
> nodes.  The larger circuit swapped unacceptably on the small
> machine so I tested only the smaller circuit there.  These were
> used to compare speed.
>
>                  AMD large   AMD small   Intel small
> std              39 sec      9.5 sec     11.2 sec
> -ffast-math      39 sec      9.5 sec     11.2 sec
> -ffloat-store    50 sec      12 sec      13 sec
>
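
The penalty figures quoted later in this message follow directly from these
timings.  A minimal sketch of the arithmetic (the helper name is made up for
illustration):

```c
#include <assert.h>

/* Slowdown of a measured run relative to a baseline run, in percent. */
double slowdown_percent(double baseline_sec, double measured_sec)
{
    assert(baseline_sec > 0.0);
    return 100.0 * (measured_sec - baseline_sec) / baseline_sec;
}
```

slowdown_percent(39.0, 50.0) is about 28 (AMD, large circuit),
slowdown_percent(9.5, 12.0) is about 26 (AMD, small circuit), and
slowdown_percent(11.2, 13.0) is about 16 (Intel, small circuit).
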
> The "small" circuit takes 30 minutes to run on ng-spice, on the
> AMD, with equivalent results.  Note that the time is 9.5
> SECONDS on gnucap, 30 MINUTES in ng-spice.  The algorithms are
> different.
>
> Also, complete gnucap test suite, 345 test files.
>
>
>
> Test suite showed
>
> AMD-64---
> no difference between AMD "std" and "-ffloat-store".
>
> 13 test differences between AMD "std" and "-ffast-math"
> One difference was that an overflow was not properly trapped
> with -ffast-math.
>
>
> Intel ---
> Intel with -ffloat-store had 4 trivial test differences compared
> to AMD std
>
> Intel standard had 48 test differences compared to AMD std, one
> of them significant.  The significantly different test still
> gave correct answers with trivial differences, but had
> different time steps.
>
> Intel with -ffast-math had 43 test differences compared to AMD
> std, one of them significant.  It had the same time stepping as
> the standard version.  One test had an overflow that was not
> properly trapped.
>
> My conclusion about speed:  The AMD-64 and Intel processor speed
> difference corresponds to clock speed.
>
> The AMD gives more consistent results, apparently because the
> math really is 64 bit, all the time.  "-ffast-math" causes
> problems and does not improve speed.  "-ffloat-store" results
> in a significant speed penalty (28% on the big circuit) with no
> change in results.  The standard setting is therefore the best
> choice.
>
> The Intel has more differences.  With the "-ffloat-store"
> option, only 4 tests had any difference compared to the AMD,
> and these were trivial.  I think this confirms that it was
> doing essentially the same 64 bit rounding.  The standard
> setting resulted in 48 tests with trivially different results
> in all but one.  I am assuming this is because of the excess
> precision you mention.  The "-ffast-math" option gave 43
> differences compared to the reference.  I do not consider the
> gap between these 43 and the 48 with no options to be
> significant.  Comparing Intel with -ffast-math to Intel with no
> options gave 25 trivial differences.  One was the numeric
> overflow case.
>
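
One way to sort regression differences into "trivial" and "significant" is a
relative tolerance on each printed value.  This is only a sketch.  The gnucap
test suite's actual criteria are not stated in this thread, and the names and
the tolerance are my assumptions.

```c
#include <assert.h>

enum diff_class { DIFF_NONE, DIFF_TRIVIAL, DIFF_SIGNIFICANT };

static double magnitude(double x)
{
    return x < 0.0 ? -x : x;
}

/* Classify the difference between a golden value and a new value.
   rel_tol is a hypothetical threshold chosen by whoever runs the suite. */
enum diff_class classify_diff(double reference, double observed, double rel_tol)
{
    assert(rel_tol > 0.0);
    if (reference == observed)
        return DIFF_NONE;
    /* The values differ, so at least one is nonzero and scale > 0. */
    double a = magnitude(reference);
    double b = magnitude(observed);
    double scale = a > b ? a : b;
    double rel = magnitude(reference - observed) / scale;
    return rel <= rel_tol ? DIFF_TRIVIAL : DIFF_SIGNIFICANT;
}
```

A check like this only covers value differences.  A run with different time
steps, like the one significant case above, prints different rows altogether
and needs a separate comparison.
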
> As to which option is best, I am not sure.  The "-ffast-math"
> option causes problems and does not improve speed, so it should
> not be used.  Whether the "-ffloat-store" option should be used
> could be debated.  It doesn't give improved accuracy, but it
> does give a more predictable error, essentially matching
> another 64 bit system.  The option does give a speed penalty,
> 16% in my test.
>
> The particular test that resulted in different time stepping
> gives believable but incorrect results in ng-spice, with no
> warnings.  It is a negative resistance oscillator using the
> switch element as the negative resistance device.  On
> resistance is 1 ohm.  Off resistance is 1e9.  Gnucap handles
> the fast switching correctly, automatically.  Spice hops past,
> giving a glitch that is really trapezoidal ringing, making it
> appear to work.
>
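
For readers who have not met trapezoidal ringing: on a stiff decay
y' = -lambda*y, a trapezoidal step multiplies y by
(1 - h*lambda/2) / (1 + h*lambda/2), which approaches -1 when h*lambda is
large, so the solution flips sign every step instead of decaying.  A scalar
sketch, not the oscillator circuit from the test; lambda and h are arbitrary
illustration values.

```c
/* One trapezoidal-rule step for the test equation y' = -lambda * y. */
double trapezoidal_step(double y, double lambda, double h)
{
    double half = 0.5 * h * lambda;
    return y * (1.0 - half) / (1.0 + half);
}
```

With lambda = 1e9 and h = 1e-6 the factor is -499/501, so starting from 1.0
the values alternate in sign and shrink very slowly.  With h = 1e-10 the
factor is about 0.905 and the decay is smooth.
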
> One important point here is that differences in algorithms have
> much more effect than differences in compiler optimization.
>
> > When I do AMD-64 in 64bit mode, it is going to prefer the
> > 64-bit SSE instructions over the 80-bit 387 instructions.
> > Now I am going to get closer results to a machine with a
> > sparc chip than when I compiled the program on the same
> > machine in 32-bit mode.
> >
> > If my result is rounded to 64-bit in the floating point
> > register, less damage is done when that number is written
> > back to memory and read back in. I am happier with that than
> > having an 80-bit number written from register to memory, read
> > back in and zero extended.
>
> I think I just confirmed what you said.  The results were as I
> expected.
>
> > An excellent paper on this issue is:
> > http://www.wrcad.com/linux_numerics.txt
>
> I have read this paper, a long time ago.
>
> > When an EDA customer gets a new update to their tools,
> > they're going to validate and they want an explanation why
> > the results no longer match their golden files. EDA
> > companies are keenly aware of this, and often provide
> > extended precision, but only as a non-default option.
>
> I understand this, but algorithm changes often mask the numeric
> differences seen here.  I especially recall that when I was
> reworking the gnucap transient step control.  Changing the
> algorithm changed the stepping, making all text regressions
> look completely different, especially those designed to point
> out differences.
>
> Speaking of step size control ...   Spice calculates the
> truncation error incorrectly.  It significantly underestimates
> the error.  How much depends on the actual step size.  (that
> is, the step size it is using, not the wanted step size)  This
> was true even in Spice-2.  The same error was reproduced in
> Spice-3.  After over 20 years, it has never changed, even
> though it has been known to be incorrect for most of that time.
>
> Gnucap openly defies tradition in many cases, including this
> one.  Should ng-spice fix the bug?
>
> Consider this ...  Fixing the bug will result in many more time
> steps in most cases, causing it to run slower.  There are
> options for setting tolerances.  Most users don't know the true
> meaning of the options, but rather just set it so they get
> decent results.  Fixing the bug could make it mathematically
> correct, but change the meaning of the options.  Golden files
> will no longer match, with significant differences.  It will
> run slower, on average maybe 5x slower.  The bug also exists,
> not fixed, in most other spice derivatives, so the current
> behavior matches expectations.
>
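
As background on why the error estimate and the step count are tied together:
for a method of order p the local truncation error scales, to leading order,
as h^(p+1).  The sketch below is that textbook model only.  It is not the
Spice or Gnucap formula, and the function name is mine.

```c
#include <assert.h>

/* Factor by which the local truncation error changes when the step is
   scaled by step_ratio, assuming error ~ h^(order + 1). */
double lte_ratio(double step_ratio, int order)
{
    assert(step_ratio > 0.0 && order >= 1);
    double ratio = 1.0;
    for (int i = 0; i < order + 1; ++i)
        ratio *= step_ratio;
    return ratio;
}
```

For trapezoidal integration (order 2), lte_ratio(0.5, 2) is 0.125: halving
the step cuts the error by a factor of 8.  Read the other way, an estimator
that reports too small an error lets the step controller accept larger steps
than the tolerance really allows.
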
> Since it is a calculation error, fixing it is consistent with
> keeping in the spirit of Spice.  It is an easy fix, but was a
> complicated analysis to prove it mathematically.  That might
> say to fix it.  On the other hand, people know what to expect
> the way it is, saying not to fix it.
>
> NG-spice has some hacks to the algorithm, compared to the
> original, perhaps as an attempt to fix it without understanding
> what the real problem is, but the real bug remains.
>
> Should it be fixed?