b8658

Name    Modified    Size
llama-b8658-xcframework.zip < 19 hours ago 175.8 MB
llama-b8658-bin-win-vulkan-x64.zip < 19 hours ago 56.7 MB
llama-b8658-bin-win-sycl-x64.zip < 19 hours ago 135.5 MB
llama-b8658-bin-win-opencl-adreno-arm64.zip < 19 hours ago 33.4 MB
llama-b8658-bin-win-hip-radeon-x64.zip < 19 hours ago 360.4 MB
llama-b8658-bin-win-cuda-13.1-x64.zip < 19 hours ago 168.0 MB
llama-b8658-bin-win-cuda-12.4-x64.zip < 19 hours ago 249.6 MB
llama-b8658-bin-win-cpu-x64.zip < 19 hours ago 39.5 MB
llama-b8658-bin-win-cpu-arm64.zip < 20 hours ago 32.2 MB
llama-b8658-bin-ubuntu-x64.tar.gz < 20 hours ago 31.7 MB
llama-b8658-bin-ubuntu-vulkan-x64.tar.gz < 20 hours ago 48.8 MB
llama-b8658-bin-ubuntu-vulkan-arm64.tar.gz < 20 hours ago 41.0 MB
llama-b8658-bin-ubuntu-s390x.tar.gz < 20 hours ago 35.1 MB
llama-b8658-bin-ubuntu-rocm-7.2-x64.tar.gz < 20 hours ago 167.9 MB
llama-b8658-bin-ubuntu-openvino-2026.0-x64.tar.gz < 20 hours ago 77.1 MB
llama-b8658-bin-ubuntu-arm64.tar.gz < 20 hours ago 27.9 MB
llama-b8658-bin-macos-x64.tar.gz < 20 hours ago 104.5 MB
llama-b8658-bin-macos-arm64.tar.gz < 20 hours ago 40.3 MB
llama-b8658-bin-910b-openEuler-x86-aclgraph.tar.gz < 20 hours ago 72.6 MB
llama-b8658-bin-910b-openEuler-aarch64-aclgraph.tar.gz < 20 hours ago 65.0 MB
llama-b8658-bin-310p-openEuler-x86.tar.gz < 20 hours ago 72.7 MB
llama-b8658-bin-310p-openEuler-aarch64.tar.gz < 20 hours ago 64.9 MB
cudart-llama-bin-win-cuda-13.1-x64.zip < 20 hours ago 402.6 MB
cudart-llama-bin-win-cuda-12.4-x64.zip < 20 hours ago 391.4 MB
b8658 source code.tar.gz 2026-04-03 29.7 MB
b8658 source code.zip 2026-04-03 30.9 MB
README.md 2026-04-03 3.8 kB
Totals: 27 items, 3.0 GB
server: save and clear idle slots on new task (`--clear-idle`) (#20993)

* server: clear idle slots KV from VRAM (LLAMA_KV_KEEP_ONLY_ACTIVE)
* server: move idle slot KV clearing to slot release

  The save "cost" is now paid by the finishing request.
* server: add --kv-clear-idle flag, enable by default
* server: skip clearing last idle slot, clear on launch
* server: test --no-kv-clear-idle flag
* server: simplify on-release clearing loop
* server: remove on-release KV clearing, keep launch-only
* cont : clean-up
* tests: update log strings after --clear-idle rename
* tests: use debug tags instead of log message matching
* test: fix Windows CI by dropping temp log file unlink

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
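The commit describes the final behavior only in outline: when a new task is launched, the KV data of idle slots is saved and then cleared from VRAM, and releasing a slot no longer clears anything ("keep launch-only"). The following is a toy Python model of that policy, not llama.cpp code. The `Slot` and `ToyServer` classes, the host-side `saved` dict as the save target, and the use of a token list to stand in for a slot's KV cache are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    # Hypothetical slot: `kv` stands in for the KV cache resident in VRAM.
    slot_id: int
    kv: Optional[List[int]] = None
    busy: bool = False


@dataclass
class ToyServer:
    # Toy model of the `--clear-idle` policy; not llama.cpp's implementation.
    slots: List[Slot]
    clear_idle: bool = True
    # Assumed save target: host-side copies of KV evicted from idle slots.
    saved: Dict[int, List[int]] = field(default_factory=dict)

    def launch(self, slot_id: int, tokens: List[int]) -> None:
        """Start a new task on `slot_id`, first saving and clearing idle slots."""
        if self.clear_idle:
            for s in self.slots:
                if s.slot_id != slot_id and not s.busy and s.kv is not None:
                    self.saved[s.slot_id] = s.kv  # save before clearing
                    s.kv = None                    # free the "VRAM"
        slot = self.slots[slot_id]
        slot.busy = True
        slot.kv = tokens  # simplified: the new task's KV replaces the old contents

    def release(self, slot_id: int) -> None:
        """Finish a task; the slot keeps its KV (clearing happens only at launch)."""
        self.slots[slot_id].busy = False

    def resident_slots(self) -> List[int]:
        """Ids of slots whose KV is still held in "VRAM"."""
        return [s.slot_id for s in self.slots if s.kv is not None]


if __name__ == "__main__":
    srv = ToyServer([Slot(0), Slot(1), Slot(2)])
    srv.launch(0, [1, 2, 3])
    srv.release(0)
    srv.launch(1, [4, 5])
    print(srv.resident_slots(), srv.saved)
```

Running it prints `[1] {0: [1, 2, 3]}`: slot 0 kept its cache after release, but that cache was saved and evicted once slot 1 started a new task, so only slot 1 remains resident. With `clear_idle=False` the model keeps every slot's KV resident.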

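macOS/iOS:
* llama-b8658-bin-macos-arm64.tar.gz
* llama-b8658-bin-macos-x64.tar.gz
* llama-b8658-xcframework.zip

Linux:
* llama-b8658-bin-ubuntu-x64.tar.gz
* llama-b8658-bin-ubuntu-arm64.tar.gz
* llama-b8658-bin-ubuntu-s390x.tar.gz
* llama-b8658-bin-ubuntu-vulkan-x64.tar.gz
* llama-b8658-bin-ubuntu-vulkan-arm64.tar.gz
* llama-b8658-bin-ubuntu-rocm-7.2-x64.tar.gz
* llama-b8658-bin-ubuntu-openvino-2026.0-x64.tar.gz

Windows:
* llama-b8658-bin-win-cpu-x64.zip
* llama-b8658-bin-win-cpu-arm64.zip
* llama-b8658-bin-win-cuda-12.4-x64.zip (with cudart-llama-bin-win-cuda-12.4-x64.zip)
* llama-b8658-bin-win-cuda-13.1-x64.zip (with cudart-llama-bin-win-cuda-13.1-x64.zip)
* llama-b8658-bin-win-vulkan-x64.zip
* llama-b8658-bin-win-sycl-x64.zip
* llama-b8658-bin-win-hip-radeon-x64.zip
* llama-b8658-bin-win-opencl-adreno-arm64.zip

openEuler:
* llama-b8658-bin-310p-openEuler-x86.tar.gz
* llama-b8658-bin-310p-openEuler-aarch64.tar.gz
* llama-b8658-bin-910b-openEuler-x86-aclgraph.tar.gz
* llama-b8658-bin-910b-openEuler-aarch64-aclgraph.tar.gz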

Source: README.md, updated 2026-04-03