Home / b8140
Name Modified Size Downloads / Week
llama-b8140-xcframework.zip < 11 hours ago 168.6 MB
llama-b8140-bin-win-vulkan-x64.zip < 11 hours ago 47.9 MB
llama-b8140-bin-win-sycl-x64.zip < 11 hours ago 120.9 MB
llama-b8140-bin-win-opencl-adreno-arm64.zip < 11 hours ago 25.5 MB
llama-b8140-bin-win-hip-radeon-x64.zip < 11 hours ago 369.6 MB
llama-b8140-bin-win-cuda-13.1-x64.zip < 11 hours ago 148.7 MB
llama-b8140-bin-win-cuda-12.4-x64.zip < 11 hours ago 220.3 MB
llama-b8140-bin-win-cpu-x64.zip < 11 hours ago 31.3 MB
llama-b8140-bin-win-cpu-arm64.zip < 11 hours ago 24.7 MB
llama-b8140-bin-ubuntu-x64.tar.gz < 11 hours ago 24.9 MB
llama-b8140-bin-ubuntu-vulkan-x64.tar.gz < 11 hours ago 41.8 MB
llama-b8140-bin-ubuntu-s390x.tar.gz < 11 hours ago 25.8 MB
llama-b8140-bin-ubuntu-rocm-7.2-x64.tar.gz < 11 hours ago 137.4 MB
llama-b8140-bin-macos-x64.tar.gz < 11 hours ago 86.9 MB
llama-b8140-bin-macos-arm64.tar.gz < 11 hours ago 30.7 MB
llama-b8140-bin-910b-openEuler-x86-aclgraph.tar.gz < 11 hours ago 62.4 MB
llama-b8140-bin-910b-openEuler-aarch64-aclgraph.tar.gz < 11 hours ago 56.3 MB
llama-b8140-bin-310p-openEuler-x86.tar.gz < 11 hours ago 62.4 MB
llama-b8140-bin-310p-openEuler-aarch64.tar.gz < 11 hours ago 56.3 MB
cudart-llama-bin-win-cuda-13.1-x64.zip < 11 hours ago 402.6 MB
cudart-llama-bin-win-cuda-12.4-x64.zip < 11 hours ago 391.4 MB
b8140 source code.tar.gz < 13 hours ago 29.1 MB
b8140 source code.zip < 13 hours ago 30.1 MB
README.md < 13 hours ago 3.8 kB
Totals: 24 Items   2.6 GB 0
hexagon refactor all Ops to use local context struct (#19819)

* hexagon: refactor set/get/sum-rows ops to use local context
* hexagon: refactor ROPE and Softmax Ops to use local context
  Improves performance a bit by precomputing things and saving in the context.
* hexagon: refactor activation ops to use local context struct
* hexagon: refactor unary ops to use local context struct and DMA/VTCM
* hexagon: use aligned hvx_scale function
* hexagon: remove unused fields from op_context
* hexagon: rewrite ROPE to use DMA and VTCM scratchpad
* hex-rope: keep N rows in scratchpad (instead of just two)
* hex-rope: introduce rowidx cache
* hex-rope: remove unused fields
* hex-rope: rewrite dma prefetch logic to allow for multi-row fetch/compute
  also removes the need for fastdiv.
* hex-rope: minor formatting
* hex-rope: use indices and unroll the loops
* hex-rope: more updates to cleanup rope-block handling
* hexagon: cleanup supported type/dims checks
* hexagon: all reduce funcs replicated across lanes
  There is no need to explicitly replicate the first value.
* snapdragon: update adb and windows scripts to use ubatch-size 256
  Updated Ops support handles larger ubatches.

macOS/iOS:

Linux:

Windows:

openEuler:

Source: README.md, updated 2026-02-24