Name    Modified    Size
llama-b9158-xcframework.zip < 14 hours ago 202.6 MB
llama-b9158-bin-win-vulkan-x64.zip < 14 hours ago 32.6 MB
llama-b9158-bin-win-sycl-x64.zip < 14 hours ago 111.6 MB
llama-b9158-bin-win-opencl-adreno-arm64.zip < 14 hours ago 10.1 MB
llama-b9158-bin-win-hip-radeon-x64.zip < 14 hours ago 319.4 MB
llama-b9158-bin-win-cuda-13.1-x64.zip < 14 hours ago 137.1 MB
llama-b9158-bin-win-cuda-12.4-x64.zip < 14 hours ago 218.3 MB
llama-b9158-bin-win-cpu-x64.zip < 14 hours ago 15.9 MB
llama-b9158-bin-win-cpu-arm64.zip < 14 hours ago 9.5 MB
llama-b9158-bin-ubuntu-x64.tar.gz < 14 hours ago 14.0 MB
llama-b9158-bin-ubuntu-vulkan-x64.tar.gz < 14 hours ago 31.5 MB
llama-b9158-bin-ubuntu-vulkan-arm64.tar.gz < 14 hours ago 24.8 MB
llama-b9158-bin-ubuntu-sycl-fp32-x64.tar.gz < 14 hours ago 44.3 MB
llama-b9158-bin-ubuntu-sycl-fp16-x64.tar.gz < 14 hours ago 44.5 MB
llama-b9158-bin-ubuntu-s390x.tar.gz < 14 hours ago 12.5 MB
llama-b9158-bin-ubuntu-rocm-7.2-x64.tar.gz < 14 hours ago 129.3 MB
llama-b9158-bin-ubuntu-openvino-2026.0-x64.tar.gz < 14 hours ago 12.3 MB
llama-b9158-bin-ubuntu-arm64.tar.gz < 14 hours ago 11.0 MB
llama-b9158-bin-macos-x64.tar.gz < 14 hours ago 8.5 MB
llama-b9158-bin-macos-arm64.tar.gz < 14 hours ago 8.5 MB
llama-b9158-bin-macos-arm64-kleidiai.tar.gz < 14 hours ago 8.5 MB
llama-b9158-bin-android-arm64.tar.gz < 14 hours ago 65.2 MB
llama-b9158-bin-910b-openEuler-x86-aclgraph.tar.gz < 14 hours ago 11.6 MB
llama-b9158-bin-910b-openEuler-aarch64-aclgraph.tar.gz < 14 hours ago 10.9 MB
llama-b9158-bin-310p-openEuler-x86.tar.gz < 14 hours ago 11.6 MB
llama-b9158-bin-310p-openEuler-aarch64.tar.gz < 14 hours ago 10.9 MB
cudart-llama-bin-win-cuda-13.1-x64.zip < 14 hours ago 402.6 MB
cudart-llama-bin-win-cuda-12.4-x64.zip < 14 hours ago 391.4 MB
b9158 source code.tar.gz < 17 hours ago 33.8 MB
b9158 source code.zip < 17 hours ago 35.2 MB
README.md < 17 hours ago 4.5 kB
Totals: 31 items, 2.4 GB
HIP: RDNA3 mma FA, faster AMD transpose, tune AMD (#22880)

Adds RDNA3 support to the CUDA mma FA kernel. To make the RDNA3 tensor cores work with FP16 accumulation for VKQ, the tiles need to be 32 logical units long in the direction of the attention head; for head sizes 80 and 112, which are not evenly divisible by 32, the regular length of 16 with FP32 accumulation is used instead. The longer tiles also enable more efficient transposition for a warp size of 32, which is why they are also used for RDNA4. However, this scrambles the data layout of the accumulators along the attention head dimension; to prevent accidental misuse, I added another entry to ggml_cuda_mma::data_layout. I also tuned the kernel parameters for RDNA3, RDNA4, and CDNA1 in general, during which I discovered that the kernel can be made to work for head sizes up to 256 on CDNA. For RDNA3/4 I was not able to get better performance than the tile kernel for head sizes > 128.

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

Source: README.md, updated 2026-05-14