Name    Modified    Size
llama-b9158-xcframework.zip < 14 hours ago 202.6 MB
llama-b9158-bin-win-vulkan-x64.zip < 14 hours ago 32.6 MB
llama-b9158-bin-win-sycl-x64.zip < 14 hours ago 111.6 MB
llama-b9158-bin-win-opencl-adreno-arm64.zip < 14 hours ago 10.1 MB
llama-b9158-bin-win-hip-radeon-x64.zip < 14 hours ago 319.4 MB
llama-b9158-bin-win-cuda-13.1-x64.zip < 14 hours ago 137.1 MB
llama-b9158-bin-win-cuda-12.4-x64.zip < 14 hours ago 218.3 MB
llama-b9158-bin-win-cpu-x64.zip < 14 hours ago 15.9 MB
llama-b9158-bin-win-cpu-arm64.zip < 14 hours ago 9.5 MB
llama-b9158-bin-ubuntu-x64.tar.gz < 14 hours ago 14.0 MB
llama-b9158-bin-ubuntu-vulkan-x64.tar.gz < 14 hours ago 31.5 MB
llama-b9158-bin-ubuntu-vulkan-arm64.tar.gz < 14 hours ago 24.8 MB
llama-b9158-bin-ubuntu-sycl-fp32-x64.tar.gz < 14 hours ago 44.3 MB
llama-b9158-bin-ubuntu-sycl-fp16-x64.tar.gz < 14 hours ago 44.5 MB
llama-b9158-bin-ubuntu-s390x.tar.gz < 14 hours ago 12.5 MB
llama-b9158-bin-ubuntu-rocm-7.2-x64.tar.gz < 14 hours ago 129.3 MB
llama-b9158-bin-ubuntu-openvino-2026.0-x64.tar.gz < 14 hours ago 12.3 MB
llama-b9158-bin-ubuntu-arm64.tar.gz < 14 hours ago 11.0 MB
llama-b9158-bin-macos-x64.tar.gz < 14 hours ago 8.5 MB
llama-b9158-bin-macos-arm64.tar.gz < 14 hours ago 8.5 MB
llama-b9158-bin-macos-arm64-kleidiai.tar.gz < 14 hours ago 8.5 MB
llama-b9158-bin-android-arm64.tar.gz < 14 hours ago 65.2 MB
llama-b9158-bin-910b-openEuler-x86-aclgraph.tar.gz < 14 hours ago 11.6 MB
llama-b9158-bin-910b-openEuler-aarch64-aclgraph.tar.gz < 14 hours ago 10.9 MB
llama-b9158-bin-310p-openEuler-x86.tar.gz < 14 hours ago 11.6 MB
llama-b9158-bin-310p-openEuler-aarch64.tar.gz < 14 hours ago 10.9 MB
cudart-llama-bin-win-cuda-13.1-x64.zip < 14 hours ago 402.6 MB
cudart-llama-bin-win-cuda-12.4-x64.zip < 14 hours ago 391.4 MB
b9158 source code.tar.gz < 17 hours ago 33.8 MB
b9158 source code.zip < 17 hours ago 35.2 MB
README.md < 17 hours ago 4.5 kB
Totals: 31 items, 2.4 GB
HIP: RDNA3 mma FA, faster AMD transpose, tune AMD (#22880)

Adds RDNA3 support to the CUDA mma FA kernel. To make the RDNA3 tensor cores work with FP16 accumulation for VKQ, the tiles need to be 32 logical units long in the direction of the attention head; for head sizes 80 and 112, which are not evenly divisible by 32, the regular length of 16 with FP32 accumulation is used instead. The longer tiles also enable more efficient transposition for a warp size of 32, which is why they are also used for RDNA4. However, this scrambles the data layout of the accumulators along the attention head dimension; to prevent accidental misuse, I added another entry to ggml_cuda_mma::data_layout. I also tuned the kernel parameters for RDNA3, RDNA4, and CDNA1 in general, during which I discovered that the kernel can be made to work for head sizes up to 256 on CDNA. For RDNA3/4 I was not able to get better performance than the tile kernel for head sizes > 128.

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

Source: README.md, updated 2026-05-14