Home / b8190
Name Modified Size Downloads / Week
llama-b8190-xcframework.zip < 24 hours ago 169.4 MB
llama-b8190-bin-win-vulkan-x64.zip < 24 hours ago 48.3 MB
llama-b8190-bin-win-sycl-x64.zip < 24 hours ago 121.0 MB
llama-b8190-bin-win-opencl-adreno-arm64.zip < 24 hours ago 25.6 MB
llama-b8190-bin-win-hip-radeon-x64.zip < 24 hours ago 345.0 MB
llama-b8190-bin-win-cuda-13.1-x64.zip < 24 hours ago 148.9 MB
llama-b8190-bin-win-cuda-12.4-x64.zip < 24 hours ago 220.4 MB
llama-b8190-bin-win-cpu-x64.zip < 24 hours ago 31.4 MB
llama-b8190-bin-win-cpu-arm64.zip < 24 hours ago 24.7 MB
llama-b8190-bin-ubuntu-x64.tar.gz < 24 hours ago 25.1 MB
llama-b8190-bin-ubuntu-vulkan-x64.tar.gz < 24 hours ago 42.3 MB
llama-b8190-bin-ubuntu-s390x.tar.gz < 24 hours ago 26.2 MB
llama-b8190-bin-ubuntu-rocm-7.2-x64.tar.gz < 24 hours ago 145.2 MB
llama-b8190-bin-macos-x64.tar.gz < 24 hours ago 88.5 MB
llama-b8190-bin-macos-arm64.tar.gz < 24 hours ago 30.7 MB
llama-b8190-bin-910b-openEuler-x86-aclgraph.tar.gz < 24 hours ago 63.1 MB
llama-b8190-bin-910b-openEuler-aarch64-aclgraph.tar.gz < 24 hours ago 57.0 MB
llama-b8190-bin-310p-openEuler-x86.tar.gz < 24 hours ago 63.1 MB
llama-b8190-bin-310p-openEuler-aarch64.tar.gz < 24 hours ago 57.0 MB
cudart-llama-bin-win-cuda-13.1-x64.zip < 24 hours ago 402.6 MB
cudart-llama-bin-win-cuda-12.4-x64.zip < 24 hours ago 391.4 MB
b8190 source code.tar.gz 2026-03-03 29.1 MB
b8190 source code.zip 2026-03-03 30.1 MB
README.md 2026-03-03 4.0 kB
Totals: 24 Items   2.6 GB 0
ggml webgpu: fix workgroup dispatch limit for large batch sizes (#19965)

* ggml-webgpu: fix workgroup dispatch limit for large batch sizes

  WebGPU limits the number of workgroups in a dispatch to 65535 per dimension. Large MUL_MAT operations with batch sizes exceeding this limit would fail.

  * add compute_2d_workgroups() helper to split the total workgroup count across X/Y dimensions
  * update mul_mat_reg_tile.wgsl to reconstruct linear workgroup ID from 2D dispatch
  * update mul_mat_subgroup_matrix.wgsl to reconstruct linear workgroup ID from 2D dispatch
  * update mul_mat.wgsl to compute global index from 2D workgroup coordinates
  * refactor all three mul_mat dispatch paths to use the shared helper

* ggml-webgpu: add bounds checking for over-dispatched workgroups

  2D workgroup dispatch can over-dispatch when the total number of workgroups doesn't divide evenly into the 65535 per-dimension limit. Extra workgroups would compute invalid batch indices, causing memory corruption.

  * add batch_idx bound check to mul_mat_reg_tile.wgsl and mul_mat_subgroup_matrix.wgsl to prevent over-dispatched workgroups from accessing invalid memory
  * fixes test failures with large batch sizes (e.g., bs=[128, 1024])

* ggml-webgpu: add back TODO for splitting large sizes into batches

* Optimize 2d workgroup provisioning

* Set some parameters that increase speed

---------

Co-authored-by: Reese Levine <reeselevine1@gmail.com>
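
The 2D dispatch split and the bounds check described in these notes can be illustrated with a small host-side sketch. This is not the upstream code: `split_workgroups_2d`, `linear_workgroup_id`, `Dispatch2D`, and `kMaxWgPerDim` are hypothetical names chosen for illustration, the real `compute_2d_workgroups()` may use a different signature and splitting strategy, and the actual ID reconstruction happens in the WGSL shaders rather than in C++.

```cpp
#include <cstdint>

// Assumption: the per-dimension dispatch limit quoted in the release notes.
constexpr uint32_t kMaxWgPerDim = 65535;

// Hypothetical 2D dispatch grid (Z is left at 1 and omitted here).
struct Dispatch2D {
    uint32_t x;
    uint32_t y;
};

// Hypothetical helper: spread `total` workgroups over X/Y so that neither
// dimension exceeds `max_per_dim`. Assumes total <= max_per_dim * max_per_dim.
inline Dispatch2D split_workgroups_2d(uint64_t total,
                                      uint32_t max_per_dim = kMaxWgPerDim) {
    if (total <= max_per_dim) {
        return { static_cast<uint32_t>(total), 1u };
    }
    // Fewest rows that keep each row within the limit (ceiling division).
    const uint32_t y = static_cast<uint32_t>((total + max_per_dim - 1) / max_per_dim);
    // Balance the rows so that x * y overshoots `total` by less than y.
    const uint32_t x = static_cast<uint32_t>((total + y - 1) / y);
    return { x, y };
}

// Mirrors what a shader would do with its 2D workgroup coordinates: rebuild
// the linear workgroup ID and report whether it is in range. Workgroups for
// which this returns false were over-dispatched and must exit early.
inline bool linear_workgroup_id(Dispatch2D grid, uint32_t wg_x, uint32_t wg_y,
                                uint64_t total, uint64_t & linear) {
    linear = static_cast<uint64_t>(wg_y) * grid.x + wg_x;
    return linear < total;
}
```

Under this particular splitting rule, a total of 131072 workgroups becomes a 43691 x 3 grid, which is 131073 workgroups. The single extra workgroup is the kind of over-dispatch that the bounds check has to reject before any memory access.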

macOS/iOS:

Linux:

Windows:

openEuler:

Source: README.md, updated 2026-03-03