Name Modified Size Downloads / Week
llama-b9279-xcframework.zip < 19 hours ago 203.8 MB
llama-b9279-bin-win-vulkan-x64.zip < 19 hours ago 32.7 MB
llama-b9279-bin-win-sycl-x64.zip < 19 hours ago 111.6 MB
llama-b9279-bin-win-opencl-adreno-arm64.zip < 19 hours ago 10.1 MB
llama-b9279-bin-win-hip-radeon-x64.zip < 19 hours ago 319.6 MB
llama-b9279-bin-win-cuda-13.1-x64.zip < 19 hours ago 158.4 MB
llama-b9279-bin-win-cuda-12.4-x64.zip < 19 hours ago 259.9 MB
llama-b9279-bin-win-cpu-x64.zip < 19 hours ago 15.9 MB
llama-b9279-bin-win-cpu-arm64.zip < 19 hours ago 9.5 MB
llama-b9279-bin-ubuntu-x64.tar.gz < 19 hours ago 14.0 MB
llama-b9279-bin-ubuntu-vulkan-x64.tar.gz < 19 hours ago 31.5 MB
llama-b9279-bin-ubuntu-vulkan-arm64.tar.gz < 19 hours ago 24.8 MB
llama-b9279-bin-ubuntu-sycl-fp32-x64.tar.gz < 19 hours ago 44.7 MB
llama-b9279-bin-ubuntu-sycl-fp16-x64.tar.gz < 19 hours ago 44.8 MB
llama-b9279-bin-ubuntu-s390x.tar.gz < 19 hours ago 12.5 MB
llama-b9279-bin-ubuntu-rocm-7.2-x64.tar.gz < 19 hours ago 129.6 MB
llama-b9279-bin-ubuntu-openvino-2026.0-x64.tar.gz < 19 hours ago 12.4 MB
llama-b9279-bin-ubuntu-arm64.tar.gz < 19 hours ago 11.1 MB
llama-b9279-bin-macos-x64.tar.gz < 19 hours ago 8.5 MB
llama-b9279-bin-macos-arm64.tar.gz < 19 hours ago 8.5 MB
llama-b9279-bin-macos-arm64-kleidiai.tar.gz < 19 hours ago 8.5 MB
llama-b9279-bin-android-arm64.tar.gz < 19 hours ago 65.2 MB
llama-b9279-bin-910b-openEuler-x86-aclgraph.tar.gz < 19 hours ago 11.7 MB
llama-b9279-bin-910b-openEuler-aarch64-aclgraph.tar.gz < 19 hours ago 11.0 MB
llama-b9279-bin-310p-openEuler-x86.tar.gz < 19 hours ago 11.8 MB
llama-b9279-bin-310p-openEuler-aarch64.tar.gz < 19 hours ago 11.0 MB
cudart-llama-bin-win-cuda-13.1-x64.zip < 19 hours ago 402.6 MB
cudart-llama-bin-win-cuda-12.4-x64.zip < 19 hours ago 391.4 MB
b9279 source code.tar.gz 2026-05-21 33.9 MB
b9279 source code.zip 2026-05-21 35.3 MB
README.md 2026-05-21 5.7 kB
Totals: 31 Items   2.4 GB 0
vulkan: fuse snake activation (mul, sin, sqr, mul, add) (#22855)

* vulkan: fuse snake activation (mul, sin, sqr, mul, add)

  Add snake.comp shader with F32 / F16 / BF16 pipelines and ggml_vk_snake_dispatch_fused. The matcher recognizes the naive 5 op decomposition emitted by audio decoders (BigVGAN, Vocos) for snake activation y = x + sin(a*x)^2 * inv_b and rewrites it to a single elementwise kernel.

  test_snake_fuse from the CUDA PR now also compares CPU naive vs Vulkan fused across F32 / F16 / BF16.

* vulkan: address jeffbolznv review for fused snake activation

  Rename T / C to ne0 / ne1 in the shader and push constants to match the standard naming convention used across the Vulkan backend.

  Tighten ggml_vk_can_fuse_snake: require x and dst to be contiguous (the shader uses idx = i0 + i1 * ne0) and require a / inv_b to be tightly packed on the broadcast dim (the shader reads data_a[i1]).

* vulkan: tighten snake fusion type checks for all operands (address jeffbolznv review)

* vulkan: reject snake fusion when ne[2] or ne[3] > 1 (address jeffbolznv review)

* vulkan: address 0cc4m review for fused snake activation

  snake.comp is renamed to follow the ggml DATA_A_* / A_TYPE convention. A_TYPE now applies to the activation tensor data_a instead of the broadcast multiplier, and the bindings become data_a (A_TYPE), data_b (float), data_c (float) and data_d (D_TYPE). A header at the top of the shader maps each buffer to its role in y = x + sin(b * x)^2 * c.

  On the C++ side, ggml_vk_can_fuse_snake reuses the existing snake_pattern constant instead of duplicating the op list, sin_node is extracted as a named local alongside the other chain nodes, and the broadcast operands a and inv_b are now required to be GGML_TYPE_F32 to match the hardcoded float bindings on data_b and data_c (the previous a->type == x->type would silently reject any future BF16 or F16 chain once the supports_op gate for SIN / SQR is lifted).

  ggml_vk_snake_dispatch_fused gets an explicit GGML_TYPE_F32 case and GGML_ABORT on default in place of the silent f32 fallback, and a stale comment about data_a[i1] / data_inv_b[i1] is refreshed to match the new binding names.
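
To make the fusion concrete, here is a minimal Python sketch of what the release notes describe: the naive five-op chain (mul, sin, sqr, mul, add) next to a single-pass version of y = x + sin(a*x)^2 * inv_b. It is not the Vulkan shader or any ggml code. The function names snake_naive and snake_fused are hypothetical, and the layout (x stored contiguously as ne1 rows of ne0 elements, with a and inv_b holding one float per row) is an assumption drawn from the idx = i0 + i1 * ne0 and data_a[i1] remarks above.

```python
import math

def snake_naive(x, a, inv_b, ne0, ne1):
    """Five separate elementwise passes, one per op in the chain."""
    n = ne0 * ne1
    t0 = [a[i // ne0] * x[i] for i in range(n)]       # mul: a * x (a broadcast per row)
    t1 = [math.sin(v) for v in t0]                    # sin
    t2 = [v * v for v in t1]                          # sqr
    t3 = [t2[i] * inv_b[i // ne0] for i in range(n)]  # mul: sin(a*x)^2 * inv_b
    return [x[i] + t3[i] for i in range(n)]           # add: x + ...

def snake_fused(x, a, inv_b, ne0, ne1):
    """One pass over the data, no intermediate buffers."""
    out = [0.0] * (ne0 * ne1)
    for i1 in range(ne1):                 # row index, selects a[i1] and inv_b[i1]
        for i0 in range(ne0):             # element index within the row
            idx = i0 + i1 * ne0           # flat index into contiguous x / out
            s = math.sin(a[i1] * x[idx])
            out[idx] = x[idx] + s * s * inv_b[i1]
    return out

# Illustrative data only: 2 rows of 4 elements.
x = [0.0, 0.5, 1.0, 1.5, -1.0, -0.5, 0.25, 2.0]
a = [1.0, 2.0]
inv_b = [1.0, 0.5]
naive = snake_naive(x, a, inv_b, ne0=4, ne1=2)
fused = snake_fused(x, a, inv_b, ne0=4, ne1=2)
```

The naive version materializes four temporaries the size of x, while the fused loop reads x once and writes the output once, which is the motivation for matching the pattern and replacing it with one kernel. The flat indexing also shows why the fusion check insists on contiguous x and dst: idx = i0 + i1 * ne0 is only valid when rows are tightly packed.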


Source: README.md, updated 2026-05-21