llama.cpp b8639
Name Modified Size Info Downloads / Week
llama-b8639-xcframework.zip < 11 hours ago 175.8 MB
llama-b8639-bin-win-vulkan-x64.zip < 11 hours ago 56.3 MB
llama-b8639-bin-win-sycl-x64.zip < 11 hours ago 135.1 MB
llama-b8639-bin-win-opencl-adreno-arm64.zip < 11 hours ago 33.1 MB
llama-b8639-bin-win-hip-radeon-x64.zip < 11 hours ago 360.1 MB
llama-b8639-bin-win-cuda-13.1-x64.zip < 11 hours ago 167.7 MB
llama-b8639-bin-win-cuda-12.4-x64.zip < 11 hours ago 249.2 MB
llama-b8639-bin-win-cpu-x64.zip < 11 hours ago 39.1 MB
llama-b8639-bin-win-cpu-arm64.zip < 11 hours ago 31.9 MB
llama-b8639-bin-ubuntu-x64.tar.gz < 11 hours ago 31.6 MB
llama-b8639-bin-ubuntu-vulkan-x64.tar.gz < 11 hours ago 48.7 MB
llama-b8639-bin-ubuntu-vulkan-arm64.tar.gz < 11 hours ago 40.9 MB
llama-b8639-bin-ubuntu-s390x.tar.gz < 11 hours ago 35.4 MB
llama-b8639-bin-ubuntu-rocm-7.2-x64.tar.gz < 11 hours ago 159.2 MB
llama-b8639-bin-ubuntu-openvino-2026.0-x64.tar.gz < 11 hours ago 76.2 MB
llama-b8639-bin-ubuntu-arm64.tar.gz < 11 hours ago 27.8 MB
llama-b8639-bin-macos-x64.tar.gz < 11 hours ago 102.3 MB
llama-b8639-bin-macos-arm64.tar.gz < 11 hours ago 40.2 MB
llama-b8639-bin-910b-openEuler-x86-aclgraph.tar.gz < 11 hours ago 71.8 MB
llama-b8639-bin-910b-openEuler-aarch64-aclgraph.tar.gz < 11 hours ago 64.3 MB
llama-b8639-bin-310p-openEuler-x86.tar.gz < 11 hours ago 71.8 MB
llama-b8639-bin-310p-openEuler-aarch64.tar.gz < 11 hours ago 64.2 MB
cudart-llama-bin-win-cuda-13.1-x64.zip < 11 hours ago 402.6 MB
cudart-llama-bin-win-cuda-12.4-x64.zip < 11 hours ago 391.4 MB
b8639 source code.tar.gz < 13 hours ago 29.6 MB
b8639 source code.zip < 13 hours ago 30.9 MB
README.md < 13 hours ago 4.5 kB
Totals: 27 items, 2.9 GB, 0 downloads / week
ggml-webgpu: add vectorized flash attention (#20709)

* naive vectorized version
* add vectorized flash attention
* update vec version
* remove unused path and shader
* remove unused helper functions
* add comments
* remove pad path
* ggml-webgpu: fix flash-attn vec nwg=1 path and tighten vec specialization
* change back to vec4
* enable multi split
* enable vec path when:
  - Q->ne[1] < 20
  - Q->ne[0] % 32 == 0
  - V->ne[0] % 4 == 0
  - K->type == f16
* update flash_attn_vec_split.wgsl to reduce redundant workgroup barrier usage and use select
* enable vec path for q4 and q8
* flash-attn vec nwg=1 fast path (skip tmp/reduce staging)
* use packed f16 K loads in flash-attn vec split
* use packed f16 K loads in flash-attn vec split on host side
* tune flash-attn vec f16 VEC_NE by head dim
* cleanup
* cleanup
* keep host side clean
* cleanup host side
* change back to original host wait/submit behavior
* formatting
* reverted param-buffer pool refactor
* add helper functions
* ggml-webgpu: move flash-attn vec pipeline caching back into shader lib
* ggml-webgpu: remove duplicate functions
* ggml-webgpu: reserve flash-attn vec scratch in dst buffer allocation
* ggml-webgpu: revert unrelated change
* ggml-webgpu: revert deleted comment
* disable uniformity check
* remove unnecessary change
* Update ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_split.wgsl
* Update ggml/src/ggml-webgpu/ggml-webgpu.cpp

---------

Co-authored-by: Reese Levine <reeselevine1@gmail.com>

macOS/iOS:

Linux:

Windows:

openEuler:

Source: README.md, updated 2026-04-02