Home / b7844
| Name | Modified | Size |
| --- | --- | --- |
| llama-b7844-xcframework.zip | < 12 hours ago | 174.2 MB |
| llama-b7844-bin-win-vulkan-x64.zip | < 12 hours ago | 46.7 MB |
| llama-b7844-bin-win-sycl-x64.zip | < 12 hours ago | 119.7 MB |
| llama-b7844-bin-win-opencl-adreno-arm64.zip | < 12 hours ago | 24.6 MB |
| llama-b7844-bin-win-hip-radeon-x64.zip | < 12 hours ago | 361.7 MB |
| llama-b7844-bin-win-cuda-13.1-x64.zip | < 12 hours ago | 145.3 MB |
| llama-b7844-bin-win-cuda-12.4-x64.zip | < 12 hours ago | 220.6 MB |
| llama-b7844-bin-win-cpu-x64.zip | < 12 hours ago | 30.2 MB |
| llama-b7844-bin-win-cpu-arm64.zip | < 12 hours ago | 23.8 MB |
| llama-b7844-bin-ubuntu-x64.tar.gz | < 12 hours ago | 23.8 MB |
| llama-b7844-bin-ubuntu-vulkan-x64.tar.gz | < 12 hours ago | 40.6 MB |
| llama-b7844-bin-ubuntu-s390x.tar.gz | < 12 hours ago | 24.5 MB |
| llama-b7844-bin-macos-x64.tar.gz | < 12 hours ago | 83.2 MB |
| llama-b7844-bin-macos-arm64.tar.gz | < 12 hours ago | 29.5 MB |
| llama-b7844-bin-910b-openEuler-x86-aclgraph.tar.gz | < 12 hours ago | 59.5 MB |
| llama-b7844-bin-910b-openEuler-aarch64-aclgraph.tar.gz | < 12 hours ago | 53.8 MB |
| llama-b7844-bin-310p-openEuler-x86.tar.gz | < 12 hours ago | 59.5 MB |
| llama-b7844-bin-310p-openEuler-aarch64.tar.gz | < 12 hours ago | 53.8 MB |
| cudart-llama-bin-win-cuda-13.1-x64.zip | < 12 hours ago | 402.6 MB |
| cudart-llama-bin-win-cuda-12.4-x64.zip | < 12 hours ago | 391.4 MB |
| b7844 source code.tar.gz | < 14 hours ago | 28.8 MB |
| b7844 source code.zip | < 14 hours ago | 29.8 MB |
| README.md | < 14 hours ago | 3.1 kB |

Totals: 23 items, 2.4 GB
[CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full (#19042)

With pipeline parallelism, the CPU-side CUDA command buffer fills up during prompt processing and stalls the CPU. As a result, not enough work is submitted to the GPU, which causes bubbles in the GPU timeline. The fix sets the CUDA environment variable CUDA_SCALE_LAUNCH_QUEUES to 4x to increase the command buffer size.

* Set the env variable in the CUDA backend registry allocation
* Add link to PR in code comment
* Remove warning logs and update documentation

macOS/iOS:

- macOS Apple Silicon (arm64)
- macOS Intel (x64)
- iOS XCFramework

Linux:

- Ubuntu x64 (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu s390x (CPU)

Windows:

- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)

openEuler:

- openEuler x86 (310p)
- openEuler x86 (910b, ACL Graph)
- openEuler aarch64 (310p)
- openEuler aarch64 (910b, ACL Graph)

Source: README.md, updated 2026-01-27