math-atlas-errors Mailing List for Automatically Tuned Linear Algebra Software
Brought to you by: rwhaley, tonyc040457
From: R. C. W. <rcw...@ls...> - 2016-07-28 20:13:06
Folks,

I am happy to announce the release of ATLAS 3.10.3. Please forward this e-mail to any relevant mailing list, or to any parties you know of that are repackaging the stable release.

ATLAS 3.10.3 should be noticeably faster than 3.10.2 on modern hardware, but the 3.11 series is almost always much faster on such systems. While I was able to backport support for modern architectures, and even provide some reasonable kernels for modern ISA extensions, the 3.11 series allows for much larger block factors and improved storage formats that are required to get decent performance on many modern machines (including all AVX-enabled Intel chips). So, if you can use it, 3.11 is still the best for modern machines by a long way.

I had hoped to have ATLAS 4.0 out by now, but various setbacks have delayed the release, necessitating 3.10.3, since 3.10.2 was not installing well on modern machines. 3.10.3 fixes these three bugs:
http://math-atlas.sourceforge.net/errata3.10.2.html#herkNaN
http://math-atlas.sourceforge.net/errata3.10.2.html#syr2kNaN
http://math-atlas.sourceforge.net/errata3.10.2.html#rotmg

I have tested 3.10.3 and found it to work on the following OSes:
1. Linux
2. Windows64 (cygwin64 builds now work!)
3. AIX
4. OS X
For OSes 2-4, see the special sections in the install guide for additional help:
http://math-atlas.sourceforge.net/atlas_install/node53.html
Other OSes (e.g., Windows32, Solaris) passed 3.10.2 testing and hopefully still work.

Also note that clang can now be used to build ATLAS by adding:
   --force-clang=/path/to/clang
to your configure line. For the open version of clang, performance still tends to lag gcc, but is strongly improved from the last release. Apple's clang appears to be substantially faster, but I may be mistaken.

New architecture support available in 3.10.3 includes:
1. ARM32: a7, a9, a15 (auto-detect of SOFT/HARD ABI)
2. ARM64: xgene1, a53, a57
3. Intel: Corei3 & Corei4 (Skylake)
4. IBM: Z series, POWER8 (including little/big endian)

Support for modern vector extensions in atlas_simd.h:
1. Intel AVX2
2. IBM VSX & Z-series VX
3. ARM64 Advanced SIMD
4. ARM32 NEON (only if the -Si ieee 0 flag is thrown)

Regards,
Clint

ATLAS 3.10.3 released 07/28/16, highlights of changes from 3.10.2
* Updated F77 L1BLAS testers to those used in LAPACK 3.6.1
* Fixed bug in rotmg revealed by LAPACK 3.6.1 testers
* Fixed bug in hprk/sprk that could cause NaN propagation in HERK/SYRK due to reading uninitialized memory in BETA=0 case
* Fixed bug in threaded SYR2K/HER2K that could cause NaN propagation due to reading uninitialized memory
* Extended matrix/vector norm functions to detect NaNs
* Extended configure:
  + --force-clang=/path/to/clang : will use clang for all C compilers, even goodgcc (assumes gcc flag & inline-assembly compatibility)
  + --cripple-atlas-performance : install despite failing throttle check
  + Can now use arch string rather than enum # for -A arg
  + --force-tids now affects ATLrun.sh as well as threaded build
  + ARM32 autodetects SOFTFP/HARDFP ABI
* Backport of config & archdefs for:
  + POWER[7,8]le, IBMz[10,13,19], Corei[3,4], ARM[7,9,15,17], ARM64[xgene,a53,a57]
  + archdefs for NEON ARMa[7,15]
  + config support for IBM Z[9,196,12]
* Backport & extension of atlas_simd.h & atlas_cplxsimd.h
  + New SIMD kernels for: VSX, VXZ, AVX2, AdvancedSIMD, NEON
* Fixed mflop test of PrintMMLine, which sometimes failed to print a valid mflop due to negative values from prior runs
* Removed ATL_dmm6x1x60_sse2_32.c from z index files (not a valid cplx kernel)
* Forced MinGW compilers to be ignored unless -Si nocygwin 1 is set
* Added support for WOW64 detection & basic use; numerous changes to make it work on cygwin64
* Fixed uninitialized nM in s[1,2]nxtune.c's RecDoubleNX

--
**********************************************************************
** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley **
**********************************************************************

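The changelog entry on making norm functions detect NaNs touches a real pitfall: every ordered comparison involving a NaN is false, so a straightforward max-abs loop silently skips NaNs and reports a finite norm. A minimal sketch in C of the pitfall and one possible fix; the names `naive_amax` and `nan_aware_amax` are hypothetical and are not ATLAS's routines:

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Naive max |x[i]|: silently skips NaNs, because any comparison
   involving NaN evaluates to false. */
double naive_amax(const double *x, size_t n)
{
    double amax = 0.0;
    for (size_t i = 0; i < n; i++)
        if (fabs(x[i]) > amax)
            amax = fabs(x[i]);
    return amax;
}

/* Hypothetical NaN-aware variant: returns NaN as soon as one is seen,
   so a caller cannot mistake a poisoned vector for a finite one. */
double nan_aware_amax(const double *x, size_t n)
{
    assert(n == 0 || x != NULL);
    double amax = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (isnan(x[i]))
            return x[i];          /* propagate the NaN to the caller */
        if (fabs(x[i]) > amax)
            amax = fabs(x[i]);
    }
    return amax;
}
```

Returning early is one design choice; a routine that must finish scanning (e.g., to keep timing uniform) could instead set a flag and return NaN at the end.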
From: R. C. W. <rcw...@ls...> - 2014-07-11 14:34:08
Guys,

It became obvious that I had too much work still to do on the developer series to stabilize it this summer, so I have issued 3.10.2, which fixes several bugs in the old stable. There are quite a few bug fixes, but probably the most important are:
http://math-atlas.sourceforge.net/errata3.10.1.html#syrknan
http://math-atlas.sourceforge.net/errata3.10.1.html#cqtau
http://math-atlas.sourceforge.net/errata3.10.1.html#sztyp
These bugs may affect even a successfully installed 3.10.1, so you may want to scope them to see whether you want to install 3.10.2.

3.10.2 doesn't have any performance wins that I know of over 3.10.1. Since the developer series has changed the GEMM kernel usage, backporting kernel support doesn't really work, so if you are using modern machines (AVX or later vectorization on x86), the developer series can more than double performance over the stable. Due to the ongoing threading rewrite, the developer series can also be much more efficient when your number of cores exceeds 4 or so. So, for best performance, you may want to see if the developer series works on your platform. If it does, you can run all the provided tests that I run during stabilization, which should make it almost as reliable as the stable release for a given platform (though this takes some knowledge; I can provide some help if needed).

Regards,
Clint

ATLAS 3.10.2 released 07/10/14, highlights of changes from 3.10.1
* Fixed all errataed bugs:
  + Failure to init workspace can cause NaNs in SYRK
  + Complex row-major Q-type factorizations produce bad TAU
  + Failure to cast causes integer overflow on 64-bit platforms
  + Missing IBM S390 assembly file
* Fixed Make.bin to have threaded latime built to do parallel cache flushing
* Extended extract string lengths as patched by SAGE folks
* Backported fixes & some arch support to configure framework, including a host of Itanium and UST1 stuff provided by SAGE folks

NOTE: 3.10.2 is terribly out of date, and was released only because the threading rewrite is taking too long. If possible, you should use a developer release after testing that it works for your particular platform. In particular, developer releases are *much* faster for any x86 that uses AVX or a later SIMD ISA, or any machine with ncores >= 8. The developer release also supports ARM architectures better (though performance is not hugely better if you can get the stable installed).

From: <wh...@cs...> - 2009-08-13 23:08:20
Guys,

If ATLAS chooses an NB > 84, then it is possible for SGEMM and CGEMM to produce the wrong answer for values of M that are multiples of 14. If you use the architectural defaults on a 64-bit AMD K10h, then you definitely have this error, and if you don't use arch defs on other x86 platforms you could potentially have it (if the search chooses NB > 84 and uses this cleanup routine). For details and the fix, see:
http://math-atlas.sourceforge.net/errata.html#mu14nb120

Regards,
Clint
**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************

From: <wh...@cs...> - 2009-07-08 22:20:49
Guys,

There's a bug in complex GEMM that can cause an uninitialized memory read when c/z GEMM is called with large K and small M and N. The fix is described at:
http://math-atlas.sourceforge.net/errata.html#JITB02

Cheers,
Clint

From: Clint W. <wh...@cs...> - 2008-06-06 20:12:30
There is a performance bug that causes a fairly large performance drop on all architectures. This bug is present in both ATLAS 3.8.0 and 3.8.1. The explanation and fix are available at:
http://math-atlas.sourceforge.net/errata.html#JITcpBug

Regards,
Clint

From: Clint W. <wh...@cs...> - 2008-05-23 20:08:00
An error has been found where ATLAS 3.8.1 will occasionally read C even when BETA={0.0,0.0} in complex GEMM. Details, including the fix, are available at:
http://math-atlas.sourceforge.net/errata.html#JITNaN

Regards,
Clint

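Under the reference BLAS semantics, C need not be set on input when BETA is zero, so an implementation must not form BETA*C: a stale NaN or Inf left in C would survive, since 0*NaN and 0*Inf are both NaN. A minimal sketch of that rule, using a naive real column-major GEMM for brevity (`naive_dgemm` is a hypothetical name, not ATLAS code); the same test applies to a complex BETA of (0.0, 0.0):

```c
#include <assert.h>
#include <stddef.h>

/* Naive column-major C = alpha*A*B + beta*C, no transposes.
   A is M x K, B is K x N, C is M x N. Illustrative only. */
void naive_dgemm(size_t M, size_t N, size_t K, double alpha,
                 const double *A, size_t lda, const double *B, size_t ldb,
                 double beta, double *C, size_t ldc)
{
    assert(lda >= M && ldb >= K && ldc >= M);
    for (size_t j = 0; j < N; j++) {
        for (size_t i = 0; i < M; i++) {
            double sum = 0.0;
            for (size_t p = 0; p < K; p++)
                sum += A[i + p * lda] * B[p + j * ldb];
            /* If beta == 0, C is write-only: reading it and computing
               beta*C would turn garbage NaN/Inf into a NaN result. */
            if (beta == 0.0)
                C[i + j * ldc] = alpha * sum;
            else
                C[i + j * ldc] = alpha * sum + beta * C[i + j * ldc];
        }
    }
}
```

Branching on BETA once per element keeps the sketch short; an optimized kernel would typically hoist the test out of the loops and pick a separate "store only" path.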
From: Clint W. <wh...@cs...> - 2008-01-15 12:59:43
I thought I had already sent a message about this, but the error archive doesn't show it. Two errors have been found in 3.8.0, and errata fixing them have been issued. The most important error is:
http://math-atlas.sourceforge.net/errata.html#RMAAT
This causes GEMM to compute the wrong answer for row-major C = alpha*A*A' or C = alpha*A'*A.

The second error only affects configure on the Pentium-M, which is misidentified as a CoreDuo. The errata and fix are at:
http://math-atlas.sourceforge.net/errata.html#PM-CD

Regards,
Clint

From: Clint W. <wh...@cs...> - 2006-11-21 00:36:30
Guys,

A user just discovered an error that affects all ATLAS releases at least as far back as 3.3. Essentially, complex GEMM (CGEMM & ZGEMM) can produce the wrong answer for the special case where B = transpose(A). Here's the errata entry describing the fix:
http://math-atlas.sourceforge.net/errata.html#AAT

Best regards,
Clint

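Because A*transpose(A) is symmetric (in the complex case it is complex symmetric, not Hermitian, since no conjugation is involved), a cheap sanity check for this case is to call a GEMM with the same array as both operands and test the result for symmetry. A minimal sketch with a naive column-major reference; `ref_gemm_nt` and `is_symmetric` are hypothetical helpers, and real double is used instead of complex for brevity:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Naive column-major C = A * op(B), op(B) = B^T when transB is true.
   A is M x K, op(B) is K x N, C is M x N. Illustrative reference only. */
void ref_gemm_nt(bool transB, size_t M, size_t N, size_t K,
                 const double *A, size_t lda,
                 const double *B, size_t ldb,
                 double *C, size_t ldc)
{
    assert(lda >= M && ldb >= (transB ? N : K) && ldc >= M);
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < M; i++) {
            double sum = 0.0;
            for (size_t p = 0; p < K; p++) {
                /* op(B)(p,j) is B(j,p) when transposed, else B(p,j) */
                double b = transB ? B[j + p * ldb] : B[p + j * ldb];
                sum += A[i + p * lda] * b;
            }
            C[i + j * ldc] = sum;
        }
}

/* A*A^T must be symmetric, so asymmetry beyond tol signals a bug. */
bool is_symmetric(size_t n, const double *C, size_t ldc, double tol)
{
    for (size_t j = 0; j < n; j++)
        for (size_t i = j + 1; i < n; i++)
            if (fabs(C[i + j * ldc] - C[j + i * ldc]) > tol)
                return false;
    return true;
}
```

For example, passing the same pointer as both A and B with `transB = true` computes A*A^T; the symmetry test then flags an implementation that mishandles the aliased operands.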
From: <rw...@cs...> - 2004-04-17 20:05:24
Guys,

The first serious error in 3.6.0 has been found. It affects only UltraSPARC systems. It should cause a problem on any UltraSPARC install, though I have so far reproduced it only when using the USIII/gcc defaults. It is an out-of-bounds read for some K-cleanup cases, specifically when K%NB == 8. It can cause a seg fault (but often won't, if your leading dimension > K). The fix is discussed at:
http://math-atlas.sourceforge.net/errata.html#USCU

Regards,
Clint

From: <rw...@cs...> - 2003-12-23 15:45:11
Hi,

The previously mentioned ATHLONSSE1 threaded bug is fixed in the newest ATLAS stable, 3.6.0, which has just been released. I include the 3.6.0 announcement below.

Regards,
Clint

From rw...@cs... Tue Dec 23 10:43:04 2003
To: mat...@li...
Cc: rw...@cs...
Subject: ATLAS 3.6.0 released
Date: Tue, 23 Dec 2003 10:43:02 -0500 (EST)
From: rw...@cs... (R Clint Whaley)

Hi,

I am pleased to announce the release of ATLAS 3.6.0, the new ATLAS stable:
http://math-atlas.sourceforge.net/

It's been two years since the last stable, so there have been a large number of changes. I will provide a few highlights here, and you can examine the ChangeLog for full details (the ChangeLog can be found on the SourceForge download page or in ATLAS/doc/ChangeLog).

The first thing is, of course, speedups. ATLAS 3.6 is *much* faster than 3.4 for most common architectures. In particular, Opteron, Athlon-64, Itanium 2, Pentium 4, Pentium 3, and UltraSparc II/III are all significantly faster in 3.6 than they were under 3.4. To give a flavor of this speedup, I have posted a few initial timings at:
http://math-atlas.sourceforge.net/timing/

ATLAS also supports arch defaults for several new architectures. In addition to the machines mentioned above, ATLAS has defaults for IBM Power 4, and you can also build ATLAS using Intel's compiler on most Intel platforms. Another new feature in 3.6 is an improved SYRK/HERK implementation, with associated Cholesky speedup.

Regards,
Clint Whaley

From: <rw...@cs...> - 2003-12-16 16:58:33
Pearu Peterson of scipy.org has reported, and I have confirmed, a bug that occurs intermittently on multiprocessor Athlons. The fix for now is to not use the threaded interface on Athlons. This is fixed in ATLAS 3.6, the new stable, which should be released before the new year. I will send mail on this list when it is available.

Regards,
Clint

From: <rw...@cs...> - 2003-11-28 17:49:42
Ladies & Gentlemen,

Pearu of the Scipy project (www.scipy.org) reported a bug in the row-major complex Cholesky code, and I have verified it and issued an errata for it:
http://math-atlas.sourceforge.net/errata.html#clltR
This bug will affect all codes that depend on row-major complex Cholesky (i.e., the problem could show up in the solve or inversion of row-major complex matrices). If you are not using row-major operations at all, you can safely ignore this errata. If you use row-major operations, it is probably safest to apply the fix. If you are using the developer series, this error will be fixed in 3.5.13, which should be released this weekend (I'm waiting on some contributor code for the release).

Regards,
Clint

From: R C. W. <rw...@cs...> - 2003-07-28 15:53:57
When you make the change I sent out last week, you also need to make a similar change to slvtst. I have added this fix to the original errata entry:
http://math-atlas.sourceforge.net/errata.html#rLLt

Regards,
Clint

From: R C. W. <rw...@cs...> - 2003-07-24 22:26:28
A user discovered an error in row-major complex Cholesky, as shown at:
http://math-atlas.sourceforge.net/errata.html#rLLt
While you are scoping this patch, you should scope the other ATLAS errors as well. There are one or two that are not bad enough to generate an atlas-error message, but that might be of interest (e.g., an error when using Solaris make on install, an error in the clapack.h include file, etc.):
http://math-atlas.sourceforge.net/errata.html

Regards,
Clint

FYI: I've also been posting ATLAS timings here:
http://math-atlas.sourceforge.net/timing/

From: R C. W. <rw...@cs...> - 2003-02-05 17:33:46
A very alert user has discovered a possible array-bounds read in ATLAS's NRM2 implementation. Only systems that use the bad kernels will be affected. Details are given at:
http://math-atlas.sourceforge.net/errata.html#NRM2

Regards,
Clint

From: R C. W. <rw...@cs...> - 2002-06-18 17:34:35
There's an error in 3.4.1 CIAMAX that affects AltiVec systems only. Details at:
http://math-atlas.sourceforge.net/errata.html#AVCIAMAX

Cheers,
Clint

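When testing an IAMAX fix, note that the BLAS complex IAMAX does not use the true modulus: it returns the first index maximizing |Re(x_i)| + |Im(x_i)|. A minimal reference sketch (`ref_icamax` is a hypothetical name) that returns a 0-based index, as the CBLAS interface does, while the Fortran ICAMAX returns the 1-based equivalent; returning 0 for n == 0 is a simplification:

```c
#include <assert.h>
#include <complex.h>
#include <math.h>
#include <stddef.h>

/* Reference semantics of complex IAMAX: index of the FIRST element
   maximizing |Re(x_i)| + |Im(x_i)| (not |x_i|). 0-based result. */
size_t ref_icamax(size_t n, const float complex *x, size_t incx)
{
    if (n == 0 || incx == 0)
        return 0;
    assert(x != NULL);
    size_t imax = 0;
    float best = fabsf(crealf(x[0])) + fabsf(cimagf(x[0]));
    for (size_t i = 1; i < n; i++) {
        float v = fabsf(crealf(x[i * incx])) + fabsf(cimagf(x[i * incx]));
        if (v > best) {   /* strict '>' keeps the first index on ties */
            best = v;
            imax = i;
        }
    }
    return imax;
}
```

For example, with x = {1+1i, -3, 2-1.5i} the answer is index 2 (3.5 beats 3), even though -3 has the larger modulus, so a vectorized kernel compared against a modulus-based check would look wrong when it is actually right.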
From: R C. W. <rw...@cs...> - 2002-06-15 03:09:55
Real LU can get the wrong answer for problems where M=2, N > 2, and both LU and LLt can return wrong error codes for singular matrices. Details and fix at:
http://math-atlas.sourceforge.net/errata.html#LPIERR

Cheers,
Clint

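The error codes in question follow the LAPACK GETRF convention: INFO = i > 0 means U(i,i) is exactly zero, and the factorization is still completed. A minimal unblocked sketch of LU with partial pivoting that follows that convention and handles wide shapes such as M=2, N > 2; `ref_dgetf2` is a hypothetical name modeled loosely on LAPACK's DGETF2, not ATLAS's implementation:

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Unblocked right-looking LU with partial pivoting, column-major M x N.
   ipiv is 1-based; returns 0 on success or k+1 for the FIRST exactly
   zero pivot U(k,k), in which case the factorization still completes. */
int ref_dgetf2(size_t M, size_t N, double *A, size_t lda, int *ipiv)
{
    assert(lda >= M);
    int info = 0;
    size_t kmax = M < N ? M : N;
    for (size_t k = 0; k < kmax; k++) {
        /* pivot row p maximizes |A(i,k)| for i >= k (first on ties) */
        size_t p = k;
        for (size_t i = k + 1; i < M; i++)
            if (fabs(A[i + k * lda]) > fabs(A[p + k * lda]))
                p = i;
        ipiv[k] = (int)p + 1;
        if (A[p + k * lda] == 0.0) {
            if (info == 0)
                info = (int)k + 1;   /* record only the first zero pivot */
            continue;                /* column is already zero below k */
        }
        if (p != k)                  /* swap entire rows k and p */
            for (size_t j = 0; j < N; j++) {
                double t = A[k + j * lda];
                A[k + j * lda] = A[p + j * lda];
                A[p + j * lda] = t;
            }
        for (size_t i = k + 1; i < M; i++)
            A[i + k * lda] /= A[k + k * lda];
        /* rank-1 update across ALL remaining columns, including j >= M */
        for (size_t j = k + 1; j < N; j++)
            for (size_t i = k + 1; i < M; i++)
                A[i + j * lda] -= A[i + k * lda] * A[k + j * lda];
    }
    return info;
}
```

The update loop over all N columns is what a wide (M < N) problem depends on; stopping it at min(M,N) columns would leave the trailing part of U unfactored.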
From: R C. W. <rw...@cs...> - 2002-05-30 00:22:32
An alert user has pointed out several errors in 3.4.0; see the errata for details and fixes:
http://math-atlas.sourceforge.net/errata.html

Cheers,
Clint

From: <cla...@cm...> - 2001-09-26 17:30:49
Hi.

I'm new to this list, so please have some patience with the neophyte. I did read the errata file and built myself a patched version of ATLAS 3.2.1 with assorted LAPACK on my Linux box with an Athlon 800 using 3DNow2 extensions; I'm happily using the resulting cblas library. I also performed successful builds on an Octane and on a Linux/Celeron laptop. All that is good.

However, I also need an NT, PIII SSE1 (512K L2 cache) version, and that's causing trouble. I did use the architectural defaults provided in the stable ATLAS release, and I did rename the default tarball file appropriately as described in the errata. The install job progressed smoothly for a long while, but in the end, during the matrix-vector tuning, the timing produced NaN for the MFLOP count and, !KABOOM!, I got the attached error report.

Any clues would be appreciated. I think there is little in the error report that is instructive except for the NaN MFLOP results. How did that happen? Is there a timer setting that might be wrong? Any help is appreciated.

Thanks in advance!

Regards,
Claude.
--
----
Claude Lacoursière                   | Tel: +1.(514)-287-1166
Critical Mass Labs                   | Fax: +1.(514)-287-3360
(formerly MathEngine Canada Inc.)    | Cel: +1.(514)-574-7409
420 rue Notre-Dame Ouest, Suite 505  | http://www.cm-labs.com
Montreal, Qc H2Y 1V3 CANADA          | cla...@cm...
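One general way a MFLOP figure turns into NaN or Inf is the rate arithmetic itself: rate = flops / (seconds × 10^6) is +Inf when a nonzero flop count is timed as zero seconds, and NaN when the timer itself returns NaN. This does not diagnose the failure above; it is a sketch of guarding the computation, assuming a hypothetical `gemv_mflops` helper and the usual 2*M*N flop count for a matrix-vector multiply:

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: MFLOP rate of an M x N matrix-vector multiply
   (about 2*M*N flops: one multiply and one add per matrix element)
   timed over `seconds`. Returns -1.0 as an explicit sentinel instead
   of dividing when the timing is zero, negative, NaN, or infinite. */
double gemv_mflops(double M, double N, double seconds)
{
    assert(M >= 0.0 && N >= 0.0);
    if (!(seconds > 0.0) || !isfinite(seconds))
        return -1.0;
    return (2.0 * M * N) / (seconds * 1.0e6);
}
```

Writing the test as `!(seconds > 0.0)` rather than `seconds <= 0.0` matters: the second form is false for NaN, so a NaN timing would slip through to the division.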