[atlas-devel] 3.9.21
From: <wh...@cs...> - 2010-01-11 18:40:15
Guys,

I have finally released 3.9.21. The main new feature is that ATLAS now probes for the number of threads to use by problem size. On the platforms I've tested so far, it doesn't have a huge effect on performance, except for some limited small problem sizes (e.g., an N=200 QR got 47% faster on my 8-core system, but an N=300 wasn't noticeably faster). Originally, I had hand-tuned crossover points, and I had tuned them for the exact machines I have now run the new auto-tuning system on. The hope is that for other systems, the performance improvements will be greater.

I will also undoubtedly need to extend this autotuner further, including trying to find additional ways to speed it up (right now it can add an hour to an install if you don't have arch defs).

Let me know if you see noticeable speedup or slowdown from 3.9.20. Changelog is below.

Cheers,
Clint

ATLAS 3.9.21 released 01/11/10, changes from 3.9.20
   * Fixed error in threaded SYMM, where recursion had bad pointer
   * Created ability to tune threaded/serial crossover points, see
     ATLAS/tune/blas/gemm/txover.c
   * Improved CacheEdge detection
   * Fixed bug in configure for --shared on archs w/o f77 compiler
   * Updated lanbtst to work with new QR naming scheme, and to compile
     correctly for lanbtime (was not using lapack's ILAENV in this case)

**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************
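
For readers unfamiliar with the idea of a threaded/serial crossover mentioned in the announcement, the following is a minimal sketch of the general technique only. It is not taken from ATLAS/tune/blas/gemm/txover.c, and every name in it (xover_entry, find_crossover, threads_for_size) is hypothetical. It assumes you already have serial and threaded timings for a few ascending problem sizes, picks the smallest tested size from which the threaded version stays faster, and then looks up a thread count by problem size in a small table.

```c
#include <stddef.h>

/* Hypothetical table entry (not an ATLAS type):
 * use `nthreads` threads for problems with N >= min_n. */
typedef struct {
    int min_n;
    int nthreads;
} xover_entry;

/* Given timings for ascending problem sizes, return the smallest tested
 * size such that the threaded run is faster at that size and at every
 * larger tested size.  Returns -1 if threading does not win at the
 * largest tested size. */
int find_crossover(const int *sizes, const double *t_serial,
                   const double *t_threaded, size_t len)
{
    int xover = -1;
    /* Walk from the largest size down; stop at the first size where
     * the serial run is at least as fast. */
    for (size_t i = len; i-- > 0; ) {
        if (t_threaded[i] < t_serial[i])
            xover = sizes[i];
        else
            break;
    }
    return xover;
}

/* Look up the thread count for problem size n.  The table must be
 * sorted by ascending min_n; sizes below the first entry run serially. */
int threads_for_size(const xover_entry *tab, size_t len, int n)
{
    int nt = 1;
    for (size_t i = 0; i < len; i++) {
        if (n >= tab[i].min_n)
            nt = tab[i].nthreads;
        else
            break;
    }
    return nt;
}
```

In a real tuner the timings would come from running the kernel at install time, which is where the extra install cost comes from; the sketch takes them as inputs so that the selection logic can be read on its own.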