Name Modified Size
OpenBLAS-0.3.31-woa64-64-dll.zip 2026-01-17 5.0 MB
OpenBLAS-0.3.31-woa64-64-static.zip 2026-01-17 8.0 MB
OpenBLAS-0.3.31-woa64-dll.zip 2026-01-17 5.3 MB
OpenBLAS-0.3.31-woa64-static.zip 2026-01-17 8.2 MB
OpenBLAS-0.3.31-x64-64.zip 2026-01-16 40.0 MB
OpenBLAS-0.3.31-x64.zip 2026-01-16 40.4 MB
OpenBLAS-0.3.31-x86.zip 2026-01-16 22.1 MB
OpenBLAS-0.3.31.tar.gz 2026-01-15 25.2 MB
OpenBLAS-0.3.31.zip 2026-01-15 43.2 MB
OpenBLAS 0.3.31 version source code.tar.gz 2026-01-15 25.2 MB
OpenBLAS 0.3.31 version source code.zip 2026-01-15 43.5 MB
README.md 2026-01-15 12.7 kB
Totals: 12 items, 266.3 MB, 101 downloads/week

general:

  • reverted a matrix partitioning optimization from 0.3.30 that could lead to race conditions and subsequent invalid results in GEMM
  • added the bfloat16 extensions BGEMM and BGEMV
  • added a BLAS interface for the ?GEMM_BATCH extensions
  • added the BLAS extensions ?GEMM_BATCH_STRIDED and their CBLAS interface
  • added the basic infrastructure for half-precision float (FP16) format using SH prefix
  • reimplemented the LAPACK SLAED3/DLAED3 function using multithreading, thereby improving the performance of the SSYEVD/DSYEVD eigensolver for symmetric matrices on all platforms
  • limited the number of retries for initial memory allocation to avoid infinite hanging on low-memory systems
  • fixed a thread lockup situation encountered with Python 3.9 or older and NumPy
  • introduced a problem size threshold for multithreading in STRMV/DTRMV
  • introduced a problem size threshold for multithreading in CHER/CHER2/CHPR/CHPR2 and ZHER/ZHER2/ZHPR/ZHPR2
  • improved the problem size thresholds for multithreading in SGER/DGER
  • improved autodetection of the Fortran compiler
  • fixed passing of the INTERFACE64=1 option to the flang-new compiler
  • fixed a potential deadlock in multithreaded code after calling fork()
  • fixed builds using CMake on FreeBSD
  • fixed builds using CMake from within Cygwin on Windows
  • fixed builds using CMake and the NVHPC compiler on ARM64
  • fixed CMake build errors caused by misdetection of the compiler or OpenMP version
  • improved contents of the CMake-generated OpenBLASConfig.cmake file
  • added support for cross-compilation to RISCV targets via CMake
  • fixed cross-compilation to x86 targets from non-x86 architectures
  • fixed failure to install cblas.h if NO_CBLAS=0 was specified
  • fixed missing user-defined pre- and postfixes on functions in lapack.h and lapacke.h
  • included fixes from the Reference-LAPACK project:
      • fix ordering bug in ?LAED/?LASD (Reference-LAPACK PR 1140)
      • revert changes in ?GEEV from PR 1129 (Reference-LAPACK PR 1142)
      • fix workspace allocation in LAPACKE_?TRSEN (Reference-LAPACK PR 1144)
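
The notes above mention the new ?GEMM_BATCH_STRIDED extensions without giving their signature. As a rough illustration of what a strided batched GEMM conventionally computes (C_i = alpha * A_i * B_i + beta * C_i, with the i-th matrices found at fixed offsets in one flat buffer), here is a minimal pure-Python sketch. The function name, argument order, and the row-major, no-transpose layout are assumptions made for this example; they are not the OpenBLAS or CBLAS interface.

```python
def gemm_batch_strided(m, n, k, alpha, a, stride_a, b, stride_b,
                       beta, c, stride_c, batch):
    """Reference semantics only (hypothetical signature, not the OpenBLAS API).

    For each i in range(batch), computes C_i = alpha * A_i @ B_i + beta * C_i,
    where A_i is m x k, B_i is k x n, C_i is m x n, all stored row-major
    and tightly packed, starting at offset i * stride_* in the flat lists.
    """
    for i in range(batch):
        a0, b0, c0 = i * stride_a, i * stride_b, i * stride_c
        for row in range(m):
            for col in range(n):
                acc = 0.0
                for p in range(k):
                    acc += a[a0 + row * k + p] * b[b0 + p * n + col]
                idx = c0 + row * n + col
                c[idx] = alpha * acc + beta * c[idx]
    return c


# Two independent 2x2 products stored back to back (stride 4 elements each).
a = [1.0, 2.0, 3.0, 4.0,   0.0, 1.0, 1.0, 0.0]
b = [1.0, 0.0, 0.0, 1.0,   5.0, 6.0, 7.0, 8.0]
c = [0.0] * 8
result = gemm_batch_strided(2, 2, 2, 1.0, a, 4, b, 4, 0.0, c, 4, 2)
print(result)
```

The point of the strided form is that a single call covers the whole batch and the caller passes one base pointer plus a stride per operand, rather than an array of pointers.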

riscv:

  • added optimized SBGEMM kernels for ZVL128B and ZVL256B targets
  • added optimized SHGEMM kernels for ZVL128B and ZVL256B targets
  • added optimized SBGEMV and SHGEMV kernels for ZVL128B/ZVL256B
  • improved performance of the GEMV kernel for ZVL256B
  • improved the performance of the CROT and ZROT kernels for ZVL128B and x280
  • improved the detection of RVV1.0 capability
  • improved performance of the matrix packing helper functions for ZVL128B and ZVL256B
  • improved performance of OMATCOPY for ZVL128B and ZVL256B

arm:

  • fixed spurious executable stack in the getarch utility

arm64:

  • fixed spurious executable stack in the getarch utility
  • fixed compiler warnings arising from the timer macro RPCC
  • fixed cache size detection for Qualcomm Oryon under Windows on Arm
  • fixed argument handling in the default SVE kernel for SDOT/DDOT
  • building the BFLOAT16 kernels is now enabled by default
  • improved the overall performance of GEMM, SYMM and HEMM on A64FX
  • improved the performance of SDOT/DDOT on A64FX
  • improved the multithreading performance of SDOT/DDOT on A64FX by introduction of a throttling table matching thread count to problem size
  • improved the performance of SGER/DGER on A64FX and NEOVERSEV1
  • improved the multithreading performance of GEMM on A64FX and NEOVERSEV1
  • improved the performance of the GEMV kernel for SVE-capable targets
  • improved the multithreading performance of SGEMM on NEOVERSEV1 and NEOVERSEV2
  • added optimized SAXPY/DAXPY SVE kernels for A64FX and NEOVERSEV1
  • added optimized BGEMM and BGEMV kernels for NEOVERSEV1
  • added an optimized BGEMM kernel for NEOVERSEN2
  • added support for the NEOVERSEV2 cpu
  • added dedicated support for the Apple M4 cpu as VORTEXM4
  • added optimized SGEMM/SSYMM/STRMM/SSYRK/SSYR2K for SME-capable targets (ARMV9SME and VORTEXM4)
  • improved the precision of the SNRM2 kernel
  • added cpu autodetection and compiler settings for Ampere One processors
  • fixed cpu autodetection for Apple M systems running Linux
  • fixed building on macOS with AppleClang, gfortran and Xcode 16 or newer
  • fixed several errors in the C code replacements for the complex and double precision complex LAPACK functions that get used (only) when compiling with Microsoft C and NOFORTRAN=1 under MS Windows
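
Several entries in these notes concern limiting thread counts for small problems, including the SDOT/DDOT throttling table for A64FX listed above. The general idea can be sketched as a lookup from problem size to a thread cap. The thresholds and caps below are invented for illustration, and the names `THROTTLE_TABLE` and `threads_for` are hypothetical; the actual OpenBLAS table and its values are not given in the notes.

```python
from bisect import bisect_right

# Hypothetical (minimum problem size, maximum threads) pairs, sorted by size.
# These numbers are made up for the example, not taken from OpenBLAS.
THROTTLE_TABLE = [(0, 1), (10_000, 2), (100_000, 4), (1_000_000, 8)]
_SIZES = [size for size, _ in THROTTLE_TABLE]


def threads_for(n, available):
    """Return how many threads to use for a vector of length n.

    Picks the last table row whose minimum size is <= n, then clamps the
    result to the number of threads actually available (at least 1).
    """
    idx = max(bisect_right(_SIZES, n) - 1, 0)
    cap = THROTTLE_TABLE[idx][1]
    return max(1, min(cap, available))


print(threads_for(500, 16))        # small problem: stays single-threaded
print(threads_for(5_000_000, 4))   # large problem: limited by available threads
```

A table like this keeps small calls from paying thread start-up and synchronization costs that exceed the arithmetic they would save.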

power:

  • added initial support for the POWER11 architecture
  • improved performance of DGEMM and DGEMV on POWER10
  • fixed the default compiler flags to use "-O3" instead of the possibly unsafe "-Ofast"
  • fixed building under MacOS (for old G4 Macs) with CMake
  • fixed potential miscompilation of DGEMV and other assembly kernels by GCC 15.1
  • fixed compilation with recent versions of flang

loongarch64:

  • fixed warnings and potential inaccuracies arising from incorrect saving of registers
  • fixed enumeration of logical cores on big NUMA servers
  • fixed building with LLVM and the INTERFACE64=1 option

x86:

  • fixed building the GEMM3M kernels for the GENERIC target
  • fixed several errors in the C code replacements for the complex and double precision complex LAPACK functions that get used (only) when compiling with Microsoft C and NOFORTRAN=1 under MS Windows

x86_64:

  • added cpu autodetection for Intel Lunar Lake (Core Ultra 200V)
  • changed all ?MIN and ?MAX assembly kernels to use unaligned operations
  • fixed several errors in the C code replacements for the complex and double precision complex LAPACK functions that get used (only) when compiling with Microsoft C and NOFORTRAN=1 under MS Windows
  • fixed potential crashes in builds for Cooper Lake, Sapphire Rapids or Zen5 cpus under MS Windows

zarch:

  • added support for building with CMake

sparc:

  • fixed a potential crash in the DNRM2 kernel


md5sums:

  • 05050271d9196f65bc4ac3a89c6a3b05  OpenBLAS-0.3.31.tar.gz
  • 5480a9052e083e7abc9a3298fbf9079b  OpenBLAS-0.3.31.zip
  • e9a72628979846f456ac04c440b0ede5  OpenBLAS-0.3.31-x86.zip
  • c6d0e83e9a543386291ade73022dc249  OpenBLAS-0.3.31-x64.zip
  • 437f0c0611f7a473d3bd38e1e25ec967  OpenBLAS-0.3.31-x64-64.zip
  • a0c1f8b37fad9bd866dc924d3bc090a4  OpenBLAS-0.3.31-woa64-static.zip
  • fb16c99278818db855c26a3c786c470f  OpenBLAS-0.3.31-woa64-dll.zip
  • 53d3bb3e234437d6d8e43d76840c0bd6  OpenBLAS-0.3.31-woa64-64-static.zip
  • 27474c9090dca9ca8231d2ee0d966272  OpenBLAS-0.3.31-woa64-64-dll.zip
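
One way to check a downloaded archive against the md5sums above is sketched below, using only Python's standard `hashlib`. The helper names `md5_of_file` and `verify` are made up for this example, and the local file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's MD5 matches the published value (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the source tarball was saved in the current directory:
# verify("OpenBLAS-0.3.31.tar.gz", "05050271d9196f65bc4ac3a89c6a3b05")
```

A mismatch usually means an incomplete or corrupted download, so the file should be fetched again before building.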

Source: README.md, updated 2026-01-15