When running the ngspice-related tests from KiCad/qa_eeschema, the results on aarch64 differ significantly from those on x86_64 and are obviously incorrect.
The failing circuit is the DualNMOS test from https://gitlab.com/kicad/code/kicad/-/issues/13162
The search for the initial transient solution emits a lot of warnings, but only on aarch64, not on x86_64:
Note: Starting dynamic gmin stepping
Trying gmin = 1.0000E-03 Note: One successful gmin step
Trying gmin = 1.0000E-04 Note: One successful gmin step
Trying gmin = 1.0000E-05 Warning: Further gmin increment
...
Trying gmin = 2.1981E-05 Warning: Further gmin increment
Trying gmin = 2.1990E-05 Warning: Further gmin increment
Trying gmin = 2.1992E-05 Warning: Last gmin step failed
Warning: Dynamic gmin stepping failed
Note: Starting true gmin stepping
Trying gmin = 1.0000E-03 Warning: Further gmin increment
...
Trying gmin = 9.9986E-03 Warning: Further gmin increment
Trying gmin = 9.9996E-03 Warning: Last gmin step failed
Warning: True gmin stepping failed
Note: Starting source stepping
Supplies reduced to 0.0000% Note: One successful source step
...
Supplies reduced to 7.5887% Note: One successful source step
Supplies reduced to 11.4330% Supplies reduced to 7.5887% Note: One successful source step
Supplies reduced to 8.1653% Supplies reduced to 7.5887% Warning: source stepping failed
Note: Transient op started
Note: Transient op finished successfully
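For readers unfamiliar with the gmin stepping attempts in the log above, here is a minimal sketch of the general idea: add a shunt conductance, solve, then shrink the conductance while reusing the previous solution as the starting guess. This is not ngspice's implementation. The diode-plus-resistor circuit, the component values, and the function names are assumptions chosen for illustration, and the adaptive step adjustment that the "Further gmin increment" warnings refer to is left out.

```python
import math

# Hypothetical test circuit: VS -> R -> diode to ground (values are assumptions).
VS, R = 5.0, 1e3            # supply voltage [V], series resistor [ohm]
IS, VT = 1e-14, 0.025865    # diode saturation current [A], thermal voltage [V]


def newton_solve(v, gmin, max_iter=500, tol=1e-9):
    """Newton iteration on the diode node with an extra shunt conductance gmin."""
    for _ in range(max_iter):
        e = math.exp(v / VT)
        f = (v - VS) / R + IS * (e - 1.0) + gmin * v   # sum of currents leaving the node
        df = 1.0 / R + IS * e / VT + gmin              # derivative of f with respect to v
        dv = -f / df
        v += dv
        if abs(dv) < tol:
            return v, True
    return v, False


def gmin_stepping(start=1e-3, target=1e-12, factor=10.0):
    """Reduce gmin from start to target, warm-starting each solve."""
    v, gmin = 0.0, start
    while True:
        v, ok = newton_solve(v, gmin)
        if not ok:
            return v, False      # a real simulator would fall back to another method here
        if gmin <= target:
            return v, True
        gmin = max(gmin / factor, target)


v_op, converged = gmin_stepping()
print(f"converged={converged}, V(diode)={v_op:.4f} V")
```

On a circuit this simple every step succeeds. The log above shows the opposite case, where the steps keep failing and ngspice moves on to source stepping.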
See attached image for a plot of V(out) on x86_64 and aarch64.
KiCad device models:
https://gitlab.com/kicad/code/kicad/-/tree/master/qa/data/eeschema
Failing circuit netlist:
Error occurs with both ngspice-38 and ngspice-39.
The circuit is delicate when it comes to finding the operating point. You can see this from the line
.option RELTOL=.01 ABSTOL=1N VNTOL=10u
where some base parameters have been changed to achieve op convergence. The reason is the inclusion of self heating in the power devices, which may lead to instabilities.
I have run the circuit on Linux, Windows, and macOS, but I do not have an aarch64 machine. When I remove the cited line, I get the buggy result (tested on macOS).
So you may play with the parameters of the cited .option line. When I remove the line, but change the simulation command to
.tran 200u 10 uic
and thus skip the op calculation, the result is o.k. again, even without the .option line.
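To repeat this experiment on other machines, the two changes can be applied to a netlist file with a small script. The following is only a sketch: the function name `make_variant` and the placeholder netlist are hypothetical, only the `.option` and `.tran` lines come from this thread, and continuation lines starting with `+` are not handled.

```python
def make_variant(netlist: str, drop_option: bool = True, use_uic: bool = True) -> str:
    """Return a netlist copy without the tolerance .option line and/or with 'uic' on .tran."""
    out = []
    for line in netlist.splitlines():
        key = line.strip().lower()
        # Drop only the .option line that sets the tolerances discussed above.
        if drop_option and key.startswith(".option") and "reltol" in key:
            continue
        # Append 'uic' so the operating-point calculation is skipped.
        if use_uic and key.startswith(".tran") and key.split()[-1] != "uic":
            line = line.rstrip() + " uic"
        out.append(line)
    return "\n".join(out) + "\n"


# Placeholder netlist: the circuit itself is omitted, this is not the real test file.
sample = (
    "* placeholder title line\n"
    ".option RELTOL=.01 ABSTOL=1N VNTOL=10u\n"
    ".tran 200u 10\n"
    ".end\n"
)
print(make_variant(sample))
```

Calling `make_variant(text, use_uic=False)` gives the variant that only removes the .option line, which is the one that produced the buggy result on macOS.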
It is questionable why KiCad has chosen such a circuit for their qa procedures.
Last edit: Holger Vogt 2023-02-18
So, the op determination fails on aarch64 due to problems with numerical stability? Is this a problem of the circuit (no error margin when there should be some), or of the calculation (e.g. catastrophic error propagation)?
I have seen differences between x87 FP, x86_64, PPC and AArch64 math over the years, so this is not totally unexpected, though most of the time adjusting error margins was sufficient.
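One common source of such differences is a fused multiply-add (FMA), which rounds a*b + c once instead of twice. Whether FMA is involved in this particular failure is an assumption and has not been checked here. The sketch below emulates a fused operation with exact rational arithmetic, so it behaves the same on any host. The helper name `fma_emulated` is made up for this example.

```python
from fractions import Fraction


def fma_emulated(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to double (what a hardware FMA does)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = a * b + c            # product is rounded to 1.0 first, then c is added
fused = fma_emulated(a, b, c)  # exact product 1 - 2**-60 survives until the final rounding

print("unfused:", unfused)
print("fused:  ", fused)
```

The unfused result is exactly 0.0 and the fused one is -2**-60. A difference of this size is harmless in most circuits, but an iteration that sits right at a convergence limit can go either way because of it.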
I am not specifically interested in making this circuit work on all architectures, more on deterministic behavior across architectures.
I agree that a circuit this complex is out of scope for a regression test in KiCad. Probably a significantly simpler circuit (single voltage source and resistor?) would have sufficed to trigger the error. Unfortunately I don't have an affected KiCad nightly at hand, so I can't really provide a better (simpler) test case.
Btw, creating a fully emulated aarch64 machine on an x86_64 machine is fairly trivial with virt-manager. It shouldn't take more than 5 minutes to set one up (putting aside any download and installation time running in the background).
Performance on a current x86_64 host is fairly decent, similar to an RPi3, so sufficient to build ngspice and simulate moderately sized circuits.
This now (ngspice-42) apparently also affects x86_64.
Output from kicad-7.0.10, on aarch64, ngspice-40:
ngspice-42, x86_64:
ngspice-42, aarch64: