When doing ./xammsearch -p d on ARM64, I see MU/NU tuning choosing 4x5, and getting 3413 MFLOP, but when full timing is run, perf has dropped terribly. May be due to not killing all result files --> try again.
After rerunning have same strange result: MUNU & K-Clean timings have 3.4Gflop, but kernel timings have 2Gflop. I wonder if kernel timings ignore KRUNTIME, and only use actual runtime K, which prevents compiler from unrolling?
May be a problem with not generating genstring, or bad timing files. Killed all timings and restarted ./xammsearch -p d.
Seems to be bad timing files. Unfortunately, this will affect all AMMM installs on ARM64 machines :(
Ticket moved from /p/math-atlas/support-requests/969/
This report was just a result of:
https://sourceforge.net/p/math-atlas/bugs/243/
so keeping that one open, closing this one.