As can be seen on the snapshots page, the dilithium-ntt regression test fails for some targets (ds390, mcs51-large, some z80 variants) on an aarch64 host.
The .ihx is the same. It looks like we simply hit the simulator timeout on hosts that do not have good single-thread performance. For z80/z80n, the test takes about 30 s on a Ryzen AI Max 395+, but about 7 min on Power 9.
The bottleneck is 32x32->64 multiplication. We could roughly double its speed by having a dedicated library function for it. If we then also double the timeout, it should work.
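To illustrate the idea, here is a minimal sketch of what a dedicated 32x32->64 multiplication could look like in portable C. The name `mulu32x32_64` is hypothetical and this is not the actual library routine; it only assumes that splitting the operands into 16-bit halves needs four 16x16->32 partial products, whereas widening both operands to 64 bits first and calling a generic 64x64 multiply does work on upper halves that are known to be zero. The signed case is left out.

```c
#include <stdint.h>

/* Hypothetical helper for illustration only, not the SDCC library function.
   Computes the full 64-bit product of two unsigned 32-bit operands
   from four 16x16->32 partial products. */
uint64_t mulu32x32_64(uint32_t a, uint32_t b)
{
    uint16_t al = (uint16_t)a;          /* low half of a  */
    uint16_t ah = (uint16_t)(a >> 16);  /* high half of a */
    uint16_t bl = (uint16_t)b;          /* low half of b  */
    uint16_t bh = (uint16_t)(b >> 16);  /* high half of b */

    uint32_t ll = (uint32_t)al * bl;    /* contributes at bit 0  */
    uint32_t lh = (uint32_t)al * bh;    /* contributes at bit 16 */
    uint32_t hl = (uint32_t)ah * bl;    /* contributes at bit 16 */
    uint32_t hh = (uint32_t)ah * bh;    /* contributes at bit 32 */

    /* The sum of the two middle terms can exceed 32 bits,
       so it is accumulated in 64 bits. */
    uint64_t mid = (uint64_t)lh + hl;

    return ((uint64_t)hh << 32) + (mid << 16) + ll;
}
```

A caller would use `mulu32x32_64(x, y)` in place of `(uint64_t)x * y`. Whether this form is actually faster depends on the target and on how the library implements the generic 64-bit multiply.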
Apparently, the same happens on a ppc64 host.
Both the optimization (a 32x32->64 multiplication in the library) and a 50% timeout increase are now in the pqc branch.
Last edit: Philipp Klaus Krause 3 days ago