#622 Incorrect simulation results on aarch64

Milestone: v1.0 (example)
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2024-01-05
Created: 2023-02-17
Private: No

When running the ngspice-related tests from KiCad/qa_eeschema, the results on aarch64 differ significantly from those on x86_64 and are obviously incorrect.

The failing circuit is the DualNMOS test from https://gitlab.com/kicad/code/kicad/-/issues/13162

Finding the initial transient solution produces many warnings, but only on aarch64, not on x86_64:

Note: Starting dynamic gmin stepping
Trying gmin =   1.0000E-03 Note: One successful gmin step
Trying gmin =   1.0000E-04 Note: One successful gmin step
Trying gmin =   1.0000E-05 Warning: Further gmin increment
...
Trying gmin =   2.1981E-05 Warning: Further gmin increment
Trying gmin =   2.1990E-05 Warning: Further gmin increment
Trying gmin =   2.1992E-05 Warning: Last gmin step failed
Warning: Dynamic gmin stepping failed
Note: Starting true gmin stepping
Trying gmin =   1.0000E-03 Warning: Further gmin increment
...
Trying gmin =   9.9986E-03 Warning: Further gmin increment
Trying gmin =   9.9996E-03 Warning: Last gmin step failed
Warning: True gmin stepping failed
Note: Starting source stepping
Supplies reduced to   0.0000% Note: One successful source step
...
Supplies reduced to   7.5887% Note: One successful source step
Supplies reduced to  11.4330% Supplies reduced to   7.5887% Note: One successful source step
Supplies reduced to   8.1653% Supplies reduced to   7.5887% Warning: source stepping failed
Note: Transient op started
Note: Transient op finished successfully

See the attached image for a plot of V(out) on x86_64 and aarch64.
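The warnings above come from ngspice's operating-point convergence aids: it tries dynamic gmin stepping, then true gmin stepping, then source stepping. As a rough idea of what gmin stepping does, the Python sketch below solves a single hypothetical resistor-diode node. It first adds a large conductance gmin from the node to ground, which makes the problem nearly linear, then shrinks gmin by a fixed factor and starts each Newton solve from the previous solution. The circuit values, the fixed factor of 10 and the crude step clamp are assumptions for illustration only; the "dynamic" variant in the log adapts its step (note the gmin values creeping toward 2.199E-05), and real simulators use device-specific junction voltage limiting.

```python
import math

VS, R = 5.0, 1e3         # hypothetical source (V) and series resistor (ohm)
IS, VT = 1e-14, 0.025    # hypothetical diode saturation current (A), thermal voltage (V)


def residual(v, gmin):
    """KCL at the diode node: net current leaving the node, including the gmin shunt."""
    return (v - VS) / R + IS * (math.exp(v / VT) - 1.0) + gmin * v


def newton(v, gmin, max_step=0.1, tol=1e-12, max_iter=100):
    """Newton-Raphson with a crude step clamp; returns the node voltage or None."""
    for _ in range(max_iter):
        slope = 1.0 / R + IS / VT * math.exp(v / VT) + gmin
        dv = -residual(v, gmin) / slope
        dv = max(-max_step, min(max_step, dv))  # crude voltage limiting (assumption)
        v += dv
        if abs(dv) < tol:
            return v
    return None


def gmin_stepping(gmin_start=1.0, factor=10.0, gmin_stop=1e-12):
    """Solve with a large gmin, then shrink it, reusing each solution as the next start."""
    v, gmin = 0.0, gmin_start
    while gmin >= gmin_stop:
        v = newton(v, gmin)
        if v is None:
            return None  # a real simulator would retry with a smaller factor
        gmin /= factor
    return newton(v, 0.0)  # final solve without the extra conductance


v_op = gmin_stepping()
print(f"operating point: {v_op:.4f} V, residual {residual(v_op, 0.0):.2e} A")
```

Each stage only has to move the solution a short distance from a known-good starting point, which is why the log shows ngspice retrying with gmin values ever closer to the last successful one before giving up.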

1 Attachments

Discussion

  • Stefan Brüns

    Stefan Brüns - 2023-02-17

    KiCad device models:
    https://gitlab.com/kicad/code/kicad/-/tree/master/qa/data/eeschema

    Failing circuit netlist:

    i13162 KiCad schematic
    
    .include "./TL072-dual.lib"
    .include "./VDMOS_models.lib"
    .save all
    .probe alli
    .ic v(Tj1)={envtemp} v(Tj2)={envtemp}
    .temp {envtemp}
    .param envtemp=25
    .tran 200u 500m
    .option RELTOL=.01 ABSTOL=1N VNTOL=10u
    .control
    set controlswait
    if $?sharedmode
    rusage
    else
    run
    rusage
    settype temperature  tj1 tj2 tcase1 tcase2
    plot tj1 tj2 tcase1 tcase2
    plot in out xlimit 5.2 5.3
    end
    .endc
    Rload1 out GND 8
    C7 Net-_C7-Pad1_ GND 300m
    Vamb1 Net-_R11-Pad1_ GND {envtemp}
    R14 Net-_R11-Pad1_ Net-_C7-Pad1_ 3
    C6 Net-_C6-Pad1_ GND 300m
    R13 Net-_C7-Pad1_ Tcase2 200m
    R12 GND Tj2 1G
    R9 GND Tj1 1G
    R10 Net-_C6-Pad1_ Tcase1 200m
    R11 Net-_R11-Pad1_ Net-_C6-Pad1_ 3
    R4 Net-_U1A--_ GND 1k
    C3 Net-_U1B-+_ GND 1u
    R2 Net-_U1B-+_ GND 10k
    XU1 Net-_R16-Pad2_ Net-_U1A--_ Net-_U1A-+_ GND Net-_U1B-+_ Net-_U1B--_ Net-_R17-Pad2_ VCC TL072c
    R5 Net-_M2-D_ Net-_U1A--_ 19.5k
    R17 Net-_M2-G_ Net-_R17-Pad2_ 100
    R6 Net-_M2-S_ Net-_U1B--_ 100k
    Vin1 in GND dc 0 ac 1 sin(0 0.5 100 20m)
    C1 VCC GND 1u
    R16 Net-_M1-G_ Net-_R16-Pad2_ 100
    C2 Net-_U1A-+_ in 330n
    R3 Net-_U1A-+_ Net-_U1B--_ 100k
    R1 VCC Net-_U1B-+_ 390k
    R7 Net-_M1-S_ Net-_M2-D_ 100m
    C4 Net-_M2-D_ out 10m
    R15 out GND 1k
    C5 out Net-_M2-D_ 1u
    M2 Net-_M2-D_ Net-_M2-G_ Net-_M2-S_ Tj2 Tcase2 IRFP240 thermal
    M1 VCC Net-_M1-G_ Net-_M1-S_ Tj1 Tcase1 IRFP240 thermal
    V1 VCC GND 36
    R8 Net-_M2-S_ GND 800m
    .end
    
     
  • Stefan Brüns

    Stefan Brüns - 2023-02-17

    The error occurs with both ngspice-38 and ngspice-39.

     
  • Holger Vogt

    Holger Vogt - 2023-02-18

    Finding the operating point of this circuit is delicate. You can see this from the line
    .option RELTOL=.01 ABSTOL=1N VNTOL=10u
    where some base parameters have been changed to achieve op convergence. The reason is the inclusion of self-heating in the power devices, which may lead to instabilities.
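    To illustrate why self-heating can make the operating point hard to find, here is a toy electrothermal loop in Python. It is a hypothetical sketch, not how ngspice handles the VDMOS thermal nodes (in the netlist these are ordinary circuit nodes solved together with the electrical ones). The device power is assumed to rise linearly with temperature, P(T) = p0 * (1 + alpha * (T - t0)), and the junction temperature is T = t_amb + r_th * P(T); all names and numbers are made up. Plain fixed-point iteration converges only while the loop gain r_th * p0 * alpha stays below 1.

```python
def junction_temperature(t_amb, r_th, p0, alpha, t0=25.0, tol=1e-9, max_iter=200):
    """Toy electrothermal loop with a hypothetical device model.

    Iterates T <- t_amb + r_th * P(T), where P(T) = p0 * (1 + alpha * (T - t0)).
    Returns (T, iterations) on convergence, or (None, max_iter) if it runs away.
    """
    t = t_amb
    for n in range(1, max_iter + 1):
        power = p0 * (1.0 + alpha * (t - t0))  # power rises with temperature
        t_next = t_amb + r_th * power          # thermal resistance to ambient
        if abs(t_next - t) <= tol:
            return t_next, n
        t = t_next
    return None, max_iter


# Loop gain 3.2 * 10 * 0.01 = 0.32 < 1: the iteration settles.
print(junction_temperature(25.0, 3.2, 10.0, 0.01))
# Loop gain 3.2 * 10 * 0.05 = 1.6 > 1: the iteration runs away.
print(junction_temperature(25.0, 3.2, 10.0, 0.05))
```

    The point is only qualitative: as the loop gain approaches 1, the solution becomes increasingly sensitive to small perturbations, which is consistent with the self-heating terms destabilizing the op search.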

    I have run the circuit on Linux, Windows, and macOS, but I do not have an aarch64 machine. When I remove the cited line, I get the buggy result (tested on macOS).

    So you may play with the parameters of the cited .option line. When I remove the line but change the simulation command to
    .tran 200u 10 uic
    and thus skip the op calculation, the result is OK again, even without the .option line.
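    The .option line loosens the Newton-Raphson convergence tolerances. Roughly, the classic SPICE3-style test accepts a node voltage when |v_new - v_old| <= RELTOL * max(|v_new|, |v_old|) + VNTOL, and a branch current with ABSTOL in place of VNTOL. The Python sketch below is a simplified version of that test (ngspice's real check covers more cases); RELTOL=1e-3, VNTOL=1e-6 and ABSTOL=1e-12 are the usual SPICE defaults, and the two sample voltages are made up.

```python
def converged(new, old, reltol, floor):
    """Simplified SPICE-style convergence test for one unknown.

    floor is VNTOL for node voltages or ABSTOL for branch currents.
    """
    tol = reltol * max(abs(new), abs(old)) + floor
    return abs(new - old) <= tol


# Usual SPICE defaults vs. the values from the i13162 netlist.
DEFAULT = {"RELTOL": 1e-3, "VNTOL": 1e-6, "ABSTOL": 1e-12}
NETLIST = {"RELTOL": 1e-2, "VNTOL": 10e-6, "ABSTOL": 1e-9}

# Hypothetical successive Newton iterates for one node voltage (volts).
v_old, v_new = 11.00, 11.05
for name, opts in (("default", DEFAULT), ("netlist", NETLIST)):
    ok = converged(v_new, v_old, opts["RELTOL"], opts["VNTOL"])
    print(f"{name}: converged={ok}")  # default: False, netlist: True
```

    Raising RELTOL from 1e-3 to 1e-2 widens the relative part of the acceptance window tenfold. That helps the operating point converge, but it also lets a less accurate solution through.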

    It is questionable why KiCad chose such a circuit for its QA procedures.

     

    Last edit: Holger Vogt 2023-02-18
    • Stefan Brüns

      Stefan Brüns - 2023-02-18

      So, the op determination fails on aarch64 due to problems with numerical stability? Is this a problem of the circuit (no error margin when there should be some), or of the calculation (e.g. catastrophic error propagation)?

      I have seen differences between x87 FP, x86_64, PPC and AArch64 math over the years, so this is not totally unexpected, though most of the time adjusting error margins was sufficient.

      I am not specifically interested in making this circuit work on all architectures, but rather in deterministic behavior across architectures.

      I agree the complexity of the circuit used is out of scope for a regression test in KiCad. Probably a significantly simpler circuit (single voltage source and resistor?) would have sufficed to trigger the error. Unfortunately, I don't have an affected KiCad nightly at hand, so I can't really provide a better (simpler) test case.
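      One well-known source of such cross-architecture differences is the fused multiply-add (FMA), which computes a*b + c with a single rounding instead of two. FMA is part of the AArch64 base instruction set, and GCC contracts a*b + c into it by default (-ffp-contract=fast unless a strict ISO -std mode is selected), whereas a baseline x86_64 build has no FMA instruction to contract into. Whether this is what derails the operating point here has not been established in this thread. The Python sketch below emulates both behaviors, using exact Fraction arithmetic to model the single rounding; the operands are chosen only to expose the difference.

```python
from fractions import Fraction


def mul_add_two_roundings(a, b, c):
    """a*b + c with the product rounded to double first (no FMA)."""
    product = a * b     # first rounding
    return product + c  # second rounding


def mul_add_one_rounding(a, b, c):
    """a*b + c rounded once, as a fused multiply-add would do.

    Fraction arithmetic is exact; float() rounds the exact result once.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

print(mul_add_two_roundings(a, b, c))  # 0.0 (exact product 1 - 2**-54 rounds to 1.0)
print(mul_add_one_rounding(a, b, c))   # -2**-54, about -5.55e-17
```

      A last-bit difference like this is harmless on its own, but in an iteration that is already at the edge of convergence it can be enough to send the solver down a different path.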

       
  • Stefan Brüns

    Stefan Brüns - 2023-02-18

    Btw, creating a fully emulated aarch64 machine on an x86_64 machine is fairly trivial with virt-manager. It shouldn't take more than 5 minutes to set one up (putting aside any download and installation time running in the background).
    Performance on a current x86_64 host is fairly decent, similar to an RPi3, so it is sufficient to build ngspice and simulate moderately sized circuits.

     
  • Stefan Brüns

    Stefan Brüns - 2024-01-05

    As of ngspice-42, this apparently also affects x86_64.

    Output from kicad-7.0.10, on aarch64, ngspice-40:

    [ 1917s] + /usr/bin/ctest --output-on-failure --force-new-ctest-process -j8 --tests-regex qa_eeschema
    [ 1917s] Test project /home/abuild/rpmbuild/BUILD/kicad-7.0.10/build
    [ 1917s]     Start 4: qa_eeschema
    [ 1923s] 1/1 Test #4: qa_eeschema ......................***Failed    6.32 sec
    [ 1923s] Running 104 test cases...
    [ 1923s] /home/abuild/rpmbuild/BUILD/kicad-7.0.10/qa/unittests/eeschema/./test_netlist_exporter_spice.h(242): error: in "DualNMOSAmp": check abs( yVector[i] - refValue ) <= maxError has failed [10.510899464823588 > 0.011140000000000001]
    [ 1923s] Failure occurred in a following context:
    [ 1923s]     X vector name: time, X value: 0.029999999999999999
    [ 1923s]     Y vector name: V(out), Ref value: 0.55700000000000005, Actual value: 11.067899464823588
    [ 1923s] /home/abuild/rpmbuild/BUILD/kicad-7.0.10/qa/unittests/eeschema/./test_netlist_exporter_spice.h(242): error: in "DualNMOSAmp": check abs( yVector[i] - refValue ) <= maxError has failed [11.06536531291361 > 0.028740000000000002]
    [ 1923s] Failure occurred in a following context:
    [ 1923s]     X vector name: time, X value: 0.035000000000000003
    [ 1923s]     Y vector name: V(out), Ref value: -1.4370000000000001, Actual value: 9.6283653129136102
    [ 1923s] /home/abuild/rpmbuild/BUILD/kicad-7.0.10/qa/unittests/eeschema/./test_netlist_exporter_spice.h(84): error: in "DualNMOSAmp": 
    

    ngspice-42, x86_64:

    [ 2960s] /home/abuild/rpmbuild/BUILD/kicad-7.0.10/qa/unittests/eeschema/./test_netlist_exporter_spice.h(242): error: in "DualNMOSAmp": check abs( yVector[i] - refValue ) <= maxError has failed [0.35642603373163484 > 0.011140000000000001]
    [ 2960s] Failure occurred in a following context:
    [ 2960s]     X vector name: time, X value: 0.029999999999999999
    [ 2960s]     Y vector name: V(out), Ref value: 0.55700000000000005, Actual value: 0.20057396626836521
    [ 2960s] /home/abuild/rpmbuild/BUILD/kicad-7.0.10/qa/unittests/eeschema/./test_netlist_exporter_spice.h(242): error: in "DualNMOSAmp": check abs( yVector[i] - refValue ) <= maxError has failed [0.32961448774402657 > 0.028740000000000002]
    [ 2960s] Failure occurred in a following context:
    [ 2960s]     X vector name: time, X value: 0.035000000000000003
    [ 2960s]     Y vector name: V(out), Ref value: -1.4370000000000001, Actual value: -1.1073855122559735
    [ 2960s] /home/abuild/rpmbuild/BUILD/kicad-7.0.10/qa/unittests/eeschema/./test_netlist_exporter_spice.h(84): error: in "DualNMOSAmp":
    

    ngspice-42, aarch64:

    [ 3018s] /home/abuild/rpmbuild/BUILD/kicad-7.0.10/qa/unittests/eeschema/./test_netlist_exporter_spice.h(242): error: in "DualNMOSAmp": check abs( yVector[i] - refValue ) <= maxError has failed [10.448457439819462 > 0.011140000000000001]
    [ 3018s] Failure occurred in a following context:
    [ 3018s]     X vector name: time, X value: 0.029999999999999999
    [ 3018s]     Y vector name: V(out), Ref value: 0.55700000000000005, Actual value: 11.005457439819462
    [ 3018s] /home/abuild/rpmbuild/BUILD/kicad-7.0.10/qa/unittests/eeschema/./test_netlist_exporter_spice.h(242): error: in "DualNMOSAmp": check abs( yVector[i] - refValue ) <= maxError has failed [11.135914448610931 > 0.028740000000000002]
    [ 3018s] Failure occurred in a following context:
    [ 3018s]     X vector name: time, X value: 0.035000000000000003
    [ 3018s]     Y vector name: V(out), Ref value: -1.4370000000000001, Actual value: 9.6989144486109318
    [ 3018s] /home/abuild/rpmbuild/BUILD/kicad-7.0.10/qa/unittests/eeschema/./test_netlist_exporter_spice.h(84): error: in "DualNMOSAmp": 
    
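    The failing assertion in test_netlist_exporter_spice.h is abs( yVector[i] - refValue ) <= maxError. The Python sketch below re-applies that comparison to the values printed in the three logs above. The logged tolerances happen to equal 2% of |refValue| (0.02 × 0.557 = 0.01114, 0.02 × 1.437 = 0.02874); that is an observation from the numbers, not something confirmed from the KiCad source.

```python
# (run, time in s, reference V(out), actual V(out), maxError), copied from the logs.
SAMPLES = [
    ("ngspice-40 aarch64", 0.030, 0.557, 11.067899464823588, 0.011140000000000001),
    ("ngspice-40 aarch64", 0.035, -1.437, 9.6283653129136102, 0.028740000000000002),
    ("ngspice-42 x86_64", 0.030, 0.557, 0.20057396626836521, 0.011140000000000001),
    ("ngspice-42 x86_64", 0.035, -1.437, -1.1073855122559735, 0.028740000000000002),
    ("ngspice-42 aarch64", 0.030, 0.557, 11.005457439819462, 0.011140000000000001),
    ("ngspice-42 aarch64", 0.035, -1.437, 9.6989144486109318, 0.028740000000000002),
]


def check(actual, ref, max_error):
    """Same comparison as the KiCad test: absolute deviation within maxError."""
    deviation = abs(actual - ref)
    return deviation <= max_error, deviation


for run, t, ref, actual, max_error in SAMPLES:
    ok, deviation = check(actual, ref, max_error)
    status = "ok" if ok else "FAIL"
    print(f"{run:20s} t={t:.3f}s  dev={deviation:.4f}  max={max_error:.5f}  {status}")
```

    Laid out this way, the ngspice-42 x86_64 run is off by roughly 0.33 to 0.36 V, while both aarch64 runs are off by more than 10 V: x86_64 is now affected too, but far less dramatically.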
     
