OpenBLAS - Browse /v0.3.31 at SourceForge.net


Name	Modified	Size	Downloads / Week
OpenBLAS-0.3.31-woa64-64-dll.zip	2026-01-17	5.0 MB	31
OpenBLAS-0.3.31-woa64-64-static.zip	2026-01-17	8.0 MB	2
OpenBLAS-0.3.31-woa64-dll.zip	2026-01-17	5.3 MB	1
OpenBLAS-0.3.31-woa64-static.zip	2026-01-17	8.2 MB	2
OpenBLAS-0.3.31-x64-64.zip	2026-01-16	40.0 MB	7
OpenBLAS-0.3.31-x64.zip	2026-01-16	40.4 MB	17
OpenBLAS-0.3.31-x86.zip	2026-01-16	22.1 MB	4
OpenBLAS-0.3.31.tar.gz	2026-01-15	25.2 MB	6
OpenBLAS-0.3.31.zip	2026-01-15	43.2 MB	5
OpenBLAS 0.3.31 version source code.tar.gz	2026-01-15	25.2 MB	1
OpenBLAS 0.3.31 version source code.zip	2026-01-15	43.5 MB	19
README.md	2026-01-15	12.7 kB	6
Totals: 12 Items		266.3 MB	101

general:

reverted a matrix partitioning optimization from 0.3.30 that could lead to race conditions and subsequent invalid results in GEMM
added the bfloat16 extensions BGEMM and BGEMV
added a BLAS interface for the ?GEMM_BATCH extensions
added the BLAS extensions ?GEMM_BATCH_STRIDED and their CBLAS interface
added basic infrastructure for the half-precision floating-point (FP16) format, using the SH prefix
reimplemented the LAPACK SLAED3/DLAED3 functions using multithreading, thereby improving the performance of the SSYEVD/DSYEVD eigensolvers for symmetric matrices on all platforms
limited the number of retries for initial memory allocation to avoid infinite hanging on low-memory systems
fixed a thread lockup encountered with Python 3.9 or older and NumPy
introduced a problem size threshold for multithreading in STRMV/DTRMV
introduced a problem size threshold for multithreading in CHER/CHER2/CHPR/CHPR2 and ZHER/ZHER2/ZHPR/ZHPR2
improved the problem size thresholds for multithreading in SGER/DGER
improved autodetection of the Fortran compiler
fixed passing of the INTERFACE64=1 option to the flang-new compiler
fixed a potential deadlock in multithreaded code after calling fork()
fixed builds using CMake on FreeBSD
fixed builds using CMake from within Cygwin on Windows
fixed builds using CMake and the NVHPC compiler on ARM64
fixed CMake build error from misdetecting compiler or OpenMP versions
improved contents of the CMake-generated OpenBLASConfig.cmake file
added support for cross-compilation to RISCV targets via CMake
fixed cross-compilation to x86 targets from non-x86 architectures
fixed failure to install cblas.h if NO_CBLAS=0 was specified
fixed missing user-defined prefixes and postfixes on functions in lapack.h and lapacke.h
included fixes from the Reference-LAPACK project:
fix ordering bug in ?LAED/?LASD (Reference-LAPACK PR 1140)
revert changes in ?GEEV from PR 1129 (Reference-LAPACK PR 1142)
fix workspace allocation in LAPACKE_?TRSEN (Reference-LAPACK PR 1144)
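The strided batched GEMM extensions listed above compute many independent matrix products whose operands sit at fixed offsets inside single buffers. The following is a minimal Python sketch of those semantics only, assuming column-major storage, no transposition, and strides counted in elements; the function name `gemm_batch_strided` and its argument order are hypothetical and do not reproduce the actual OpenBLAS or CBLAS signature.

```python
from typing import List


def gemm_batch_strided(m: int, n: int, k: int, alpha: float,
                       a: List[float], lda: int, stride_a: int,
                       b: List[float], ldb: int, stride_b: int,
                       beta: float,
                       c: List[float], ldc: int, stride_c: int,
                       batch_count: int) -> None:
    """Sketch: C_p = alpha * A_p * B_p + beta * C_p for p in range(batch_count).

    Column-major, no transposes. Matrix p of A starts at a[p * stride_a] and
    its element (i, j) is a[p * stride_a + i + j * lda]; likewise for B and C.
    c is updated in place.
    """
    for p in range(batch_count):
        oa, ob, oc = p * stride_a, p * stride_b, p * stride_c
        for j in range(n):
            for i in range(m):
                acc = 0.0
                for l in range(k):
                    acc += a[oa + i + l * lda] * b[ob + l + j * ldb]
                idx = oc + i + j * ldc
                # BLAS convention: when beta == 0, C is not read on input.
                if beta == 0.0:
                    c[idx] = alpha * acc
                else:
                    c[idx] = alpha * acc + beta * c[idx]
```

With two 2x2 matrices packed back to back in `a` (stride 4) and a single `b` (stride 0), this sketch writes both products into consecutive 4-element slices of `c`. Reusing an operand through a zero stride is a property of this sketch; whether OpenBLAS accepts it is not stated in these notes.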

riscv:

added optimized SBGEMM kernels for ZVL128B and ZVL256B targets
added optimized SHGEMM kernels for ZVL128B and ZVL256B targets
added optimized SBGEMV and SHGEMV kernels for ZVL128B/ZVL256B
improved performance of the GEMV kernel for ZVL256B
improved the performance of the CROT and ZROT kernels for ZVL128B and x280
improved the detection of RVV1.0 capability
improved performance of the matrix packing helper functions for ZVL128B and ZVL256B
improved performance of OMATCOPY for ZVL128B and ZVL256B
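The SBGEMM/SBGEMV and SHGEMM/SHGEMV kernels above operate on 16-bit inputs in bfloat16 and FP16 format respectively. bfloat16 keeps the float32 exponent range but stores only 7 mantissa bits, while FP16 stores 10 mantissa bits but has a largest finite value of 65504. The Python sketch below shows how a value changes when rounded to each format; the helper names are hypothetical, the bfloat16 conversion assumes round-to-nearest-even from float32, and none of it is taken from the OpenBLAS conversion code.

```python
import math
import struct


def float_to_bf16_bits(x: float) -> int:
    """Return the bfloat16 bit pattern nearest to x (ties to even).

    x is first rounded to float32 by struct; finite values outside the
    float32 range raise OverflowError. NaNs map to one canonical quiet NaN.
    """
    if math.isnan(x):
        return 0x7FC0
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Adding 0x7FFF plus the lowest kept bit before dropping the low 16 bits
    # rounds to nearest and breaks ties toward an even result.
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF


def bf16_bits_to_float(h: int) -> float:
    """Widen a bfloat16 bit pattern back to a Python float (exact)."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


def round_to_fp16(x: float) -> float:
    """Round x to IEEE half precision via struct's 'e' format.

    Values that would round beyond 65504 raise OverflowError.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]
```

In this sketch 0.1 becomes 0.10009765625 in bfloat16 and 0.0999755859375 in FP16: FP16 is the more precise of the two near 1.0, but only bfloat16 can hold a value such as 1e10 at all.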

arm:

fixed spurious executable stack in the getarch utility

arm64:

fixed spurious executable stack in the getarch utility
fixed compiler warnings arising from the timer macro RPCC
fixed cache size detection for Qualcomm Oryon under Windows on Arm
fixed argument handling in the default SVE kernel for SDOT/DDOT
building the BFLOAT16 kernels is now enabled by default
improved the overall performance of GEMM, SYMM and HEMM on A64FX
improved the performance of SDOT/DDOT on A64FX
improved the multithreading performance of SDOT/DDOT on A64FX by introduction of a throttling table matching thread count to problem size
improved the performance of SGER/DGER on A64FX and NEOVERSEV1
improved the multithreading performance of GEMM on A64FX and NEOVERSEV1
improved the performance of the GEMV kernel for SVE-capable targets
improved the multithreading performance of SGEMM on NEOVERSEV1 and V2
added optimized SAXPY/DAXPY SVE kernels for A64FX and NEOVERSEV1
added optimized BGEMM and BGEMV kernels for NEOVERSEV1
added an optimized BGEMM kernel for NEOVERSEN2
added support for the NEOVERSEV2 cpu
added dedicated support for the Apple M4 cpu as VORTEXM4
added optimized SGEMM/SSYMM/STRMM/SSYRK/SSYR2K for SME-capable targets (ARMV9SME and VORTEXM4)
improved the precision of the SNRM2 kernel
added cpu autodetection and compiler settings for Ampere One processors
fixed cpu autodetection for Apple M-series systems running Linux
fixed building on macOS with AppleClang, gfortran and Xcode v16 or newer
fixed several errors in the C code replacements for the single- and double-precision complex LAPACK functions, which are used only when compiling with Microsoft C and NOFORTRAN=1 under MS Windows
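The throttling table for SDOT/DDOT on A64FX, like the general problem-size thresholds for STRMV/DTRMV, SGER/DGER and the Hermitian rank updates, rests on the idea that small problems run faster on fewer threads. Below is a minimal Python sketch of a size-indexed thread table; the bounds and thread counts are invented for illustration and are not OpenBLAS's tuned values, and `threads_for_dot` is a hypothetical name.

```python
from bisect import bisect_right

# Hypothetical table (NOT OpenBLAS's tuned values): a vector shorter than
# SIZE_BOUNDS[i] (and not shorter than the previous bound) uses at most
# THREADS_BELOW_BOUND[i] threads; at or beyond the last bound, all threads.
SIZE_BOUNDS = (10_000, 100_000, 1_000_000)
THREADS_BELOW_BOUND = (1, 4, 16)


def threads_for_dot(n: int, max_threads: int) -> int:
    """Choose a thread count for a length-n dot product from the table."""
    if max_threads < 1:
        raise ValueError("max_threads must be at least 1")
    # Index of the first bound strictly greater than n.
    i = bisect_right(SIZE_BOUNDS, n)
    wanted = THREADS_BELOW_BOUND[i] if i < len(THREADS_BELOW_BOUND) else max_threads
    # Never exceed the threads the caller actually has.
    return max(1, min(wanted, max_threads))
```

A single on/off threshold, as in the STRMV/DTRMV entry, is the one-bound special case of such a table; a graded table lets mid-sized problems use some threads without paying for all of them.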

power:

added initial support for the POWER11 architecture
improved performance of DGEMM and DGEMV on POWER10
fixed the default compiler flags to use "-O3" instead of the possibly unsafe "-Ofast"
fixed building under MacOS (for old G4 Macs) with CMake
fixed potential miscompilation of DGEMV and other assembly kernels by gcc 15.1
fixed compilation with recent versions of flang

loongarch64:

fixed warnings and potential inaccuracies arising from incorrect saving of registers
fixed enumeration of logical cores on big NUMA servers
fixed building with LLVM and the INTERFACE64=1 option

x86:

fixed building the GEMM3M kernels for the GENERIC target
fixed several errors in the C code replacements for the single- and double-precision complex LAPACK functions, which are used only when compiling with Microsoft C and NOFORTRAN=1 under MS Windows

x86_64:

added cpu autodetection for Intel Lunar Lake (Core Ultra 200V)
changed all ?MIN and ?MAX assembly kernels to use unaligned operations
fixed several errors in the C code replacements for the single- and double-precision complex LAPACK functions, which are used only when compiling with Microsoft C and NOFORTRAN=1 under MS Windows
fixed potential crashes in builds for Cooper Lake, Sapphire Rapids or Zen5 cpus under MS Windows

zarch:

added support for building with CMake

sparc:

fixed a potential crash in the DNRM2 kernel
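The NRM2 entries above (SNRM2 precision on arm64, the DNRM2 crash on sparc) concern a routine that is easy to get wrong: summing squares directly overflows or underflows long before the norm itself does. The Python sketch below contrasts the naive formula with the running-scale scheme of the classic reference-BLAS DNRM2. It illustrates the numerical issue only; it does not describe the OpenBLAS kernels or the cause of the sparc crash, and it gives no special treatment to infinite inputs.

```python
import math
from typing import Iterable


def nrm2_naive(x: Iterable[float]) -> float:
    """Textbook formula: the squares can overflow to inf or underflow to 0."""
    return math.sqrt(sum(v * v for v in x))


def nrm2_scaled(x: Iterable[float]) -> float:
    """Euclidean norm with a running scale factor, as in the classic
    reference-BLAS DNRM2: every squared ratio stays within [0, 1]."""
    scale = 0.0  # largest |v| seen so far
    ssq = 1.0    # sum of (|v| / scale)**2 once a nonzero value has been seen
    for v in x:
        if v != 0.0:
            a = abs(v)
            if scale < a:
                r = scale / a
                ssq = 1.0 + ssq * r * r  # rescale the old sum to the new maximum
                scale = a
            else:
                r = a / scale
                ssq += r * r
    return scale * math.sqrt(ssq)
```

For `[1e200, 1e200]` the naive version returns `inf` while the scaled version returns about 1.414e200; for `[1e-200, 1e-200]` the naive version returns 0.0.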


md5sums:
05050271d9196f65bc4ac3a89c6a3b05 OpenBLAS-0.3.31.tar.gz
5480a9052e083e7abc9a3298fbf9079b OpenBLAS-0.3.31.zip
e9a72628979846f456ac04c440b0ede5 OpenBLAS-0.3.31-x86.zip
c6d0e83e9a543386291ade73022dc249 OpenBLAS-0.3.31-x64.zip
437f0c0611f7a473d3bd38e1e25ec967 OpenBLAS-0.3.31-x64-64.zip
a0c1f8b37fad9bd866dc924d3bc090a4 OpenBLAS-0.3.31-woa64-static.zip
fb16c99278818db855c26a3c786c470f OpenBLAS-0.3.31-woa64-dll.zip
53d3bb3e234437d6d8e43d76840c0bd6 OpenBLAS-0.3.31-woa64-64-static.zip
27474c9090dca9ca8231d2ee0d966272 OpenBLAS-0.3.31-woa64-64-dll.zip
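To check a download against the md5sums list above, the digest can be computed with Python's standard `hashlib` module, as in this sketch (the names `EXPECTED_MD5`, `md5_of` and `verify_download` are hypothetical). MD5 detects accidental corruption but offers no protection against deliberate tampering.

```python
import hashlib
from pathlib import Path

# Digests copied from the md5sums list in this README.
EXPECTED_MD5 = {
    "OpenBLAS-0.3.31.tar.gz": "05050271d9196f65bc4ac3a89c6a3b05",
    "OpenBLAS-0.3.31.zip": "5480a9052e083e7abc9a3298fbf9079b",
    "OpenBLAS-0.3.31-x86.zip": "e9a72628979846f456ac04c440b0ede5",
    "OpenBLAS-0.3.31-x64.zip": "c6d0e83e9a543386291ade73022dc249",
    "OpenBLAS-0.3.31-x64-64.zip": "437f0c0611f7a473d3bd38e1e25ec967",
    "OpenBLAS-0.3.31-woa64-static.zip": "a0c1f8b37fad9bd866dc924d3bc090a4",
    "OpenBLAS-0.3.31-woa64-dll.zip": "fb16c99278818db855c26a3c786c470f",
    "OpenBLAS-0.3.31-woa64-64-static.zip": "53d3bb3e234437d6d8e43d76840c0bd6",
    "OpenBLAS-0.3.31-woa64-64-dll.zip": "27474c9090dca9ca8231d2ee0d966272",
}


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str) -> bool:
    """True if the file's MD5 matches the published value for its file name."""
    name = Path(path).name
    expected = EXPECTED_MD5.get(name)
    if expected is None:
        raise KeyError(f"no published checksum for {name}")
    return md5_of(path) == expected
```

For example, `verify_download("OpenBLAS-0.3.31-x64.zip")` should return `True` for an intact copy of that archive saved under its original name.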

Source: README.md, updated 2026-01-15

OpenBLAS Files

OpenBLAS is an optimized BLAS library based on GotoBLAS2.
