
llama.cpp release b7739
| Name | Modified | Size |
|------|----------|------|
| llama-b7739-xcframework.zip | < 13 hours ago | 164.6 MB |
| llama-b7739-bin-win-vulkan-x64.zip | < 13 hours ago | 46.1 MB |
| llama-b7739-bin-win-sycl-x64.zip | < 13 hours ago | 119.1 MB |
| llama-b7739-bin-win-opencl-adreno-arm64.zip | < 13 hours ago | 24.2 MB |
| llama-b7739-bin-win-hip-radeon-x64.zip | < 13 hours ago | 360.1 MB |
| llama-b7739-bin-win-cuda-13.1-x64.zip | < 13 hours ago | 142.8 MB |
| llama-b7739-bin-win-cuda-12.4-x64.zip | < 13 hours ago | 217.2 MB |
| llama-b7739-bin-win-cpu-x64.zip | < 13 hours ago | 29.6 MB |
| llama-b7739-bin-win-cpu-arm64.zip | < 13 hours ago | 23.4 MB |
| llama-b7739-bin-ubuntu-x64.tar.gz | < 13 hours ago | 23.2 MB |
| llama-b7739-bin-ubuntu-vulkan-x64.tar.gz | < 13 hours ago | 40.0 MB |
| llama-b7739-bin-ubuntu-s390x.tar.gz | < 13 hours ago | 24.3 MB |
| llama-b7739-bin-macos-x64.tar.gz | < 13 hours ago | 82.7 MB |
| llama-b7739-bin-macos-arm64.tar.gz | < 13 hours ago | 29.1 MB |
| llama-b7739-bin-910b-openEuler-x86.tar.gz | < 13 hours ago | 57.7 MB |
| llama-b7739-bin-910b-openEuler-aarch64.tar.gz | < 13 hours ago | 52.4 MB |
| llama-b7739-bin-310p-openEuler-x86.tar.gz | < 13 hours ago | 57.7 MB |
| llama-b7739-bin-310p-openEuler-aarch64.tar.gz | < 13 hours ago | 52.4 MB |
| cudart-llama-bin-win-cuda-13.1-x64.zip | < 13 hours ago | 402.6 MB |
| cudart-llama-bin-win-cuda-12.4-x64.zip | < 13 hours ago | 391.4 MB |
| b7739 source code.tar.gz | < 15 hours ago | 28.7 MB |
| b7739 source code.zip | < 15 hours ago | 29.7 MB |
| README.md | < 15 hours ago | 4.2 kB |

Totals: 23 items, 2.4 GB
CUDA: Factor out and re-use `block_reduce` function (#18785)

- CUDA: Refactor and expose `two_stage_warp_reduce_*` function.
- Use `two_stage_warp_reduce` also in the softmax kernel; move shared memory out of it. Moving smem out of the `__device__` function into the `__global__` function allows explicit smem reuse, as either the compiler or the CUDA runtime seems not to free it afterwards (`cudaFuncSetAttribute` fails when not accounting for it once per call to `two_stage_warp_reduce`).
- Update ggml/src/ggml-cuda/common.cuh (co-authored by Aman Gupta).
- Use `two_stage_warp_reduce` in `group_norm_f32`.
- Use `two_stage_warp_reduce` in `rms_norm_f32`.
- Fix smem size calculation, which expects bytes.
- Make `two_stage_warp_reduce` accept all values `warp_reduce` accepts; also integrate it into the `norm_f32` function.
- Use `two_stage_warp_reduce` in `l2_norm_f32`.
- Use type traits for block reduction for better legibility; also address other review requests by @am17an, such as variable renaming.
- Make norm tests cover all CUDA paths.
- Mark `columns % WARP_SIZE != 0` as supported for `RMS_NORM_BACK`. Unit tests passed locally; let's see if they pass in the CI as well.
- Use `enum class` for `block_reduce_method`; this is more type-safe than a plain enum.
- Rename variables as suggested in code review by @am17an.
- Rename `two_stage_warp_reduce` -> `block_reduce`.
- Fix trailing whitespace in common.cuh.
- Make the condition of the `static_assert` type-dependent. This delays evaluation until the template is actually instantiated; otherwise, some compilers may evaluate the assert when parsing the template, resulting in build errors as observed here: https://github.com/ggml-org/llama.cpp/actions/runs/20960323123/job/60235530068?pr=18785
- Inline definitions.

Co-authored-by: Aman Gupta <amangupta052@gmail.com>
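For context, the two-stage pattern the changelog refers to is the standard CUDA idiom of reducing within each warp via shuffle intrinsics, then reducing the per-warp partials in shared memory. The sketch below is illustrative only: the function names, signatures, and the sum operation are assumptions for exposition, not the actual llama.cpp `block_reduce` API. It does mirror one detail from the changelog: shared memory is passed in from the `__global__` caller rather than declared inside the `__device__` helper, so the caller can account for and reuse it explicitly.

```cuda
#include <cuda_runtime.h>

#define WARP_SIZE 32

// Stage 1: butterfly reduction within a single warp using shuffle
// intrinsics; every lane ends up holding the warp-wide sum.
__device__ float warp_reduce_sum(float x) {
    for (int offset = WARP_SIZE / 2; offset > 0; offset >>= 1) {
        x += __shfl_xor_sync(0xffffffff, x, offset);
    }
    return x;
}

// Stage 2: each warp writes its partial sum to caller-provided shared
// memory, then warp 0 reduces the partials. smem must hold at least
// blockDim.x / WARP_SIZE floats and is owned by the __global__ caller.
__device__ float block_reduce_sum(float x, float * smem) {
    const int lane   = threadIdx.x % WARP_SIZE;
    const int warp   = threadIdx.x / WARP_SIZE;
    const int nwarps = (blockDim.x + WARP_SIZE - 1) / WARP_SIZE;

    x = warp_reduce_sum(x);
    if (lane == 0) {
        smem[warp] = x;
    }
    __syncthreads();

    // Only warp 0 participates in the second stage; lanes past the
    // number of warps contribute the identity element (0 for a sum).
    x = (threadIdx.x < nwarps) ? smem[lane] : 0.0f;
    if (warp == 0) {
        x = warp_reduce_sum(x);
    }
    return x; // result is valid in warp 0
}
```

A kernel using this helper would declare the shared buffer itself (e.g. `__shared__ float smem[32];`) and pass it in, which is what lets the same allocation be reused across multiple reductions in one kernel.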

macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Intel (x64)
- iOS XCFramework

Linux:
- Ubuntu x64 (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu s390x (CPU)

Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) + CUDA 12.4 DLLs
- Windows x64 (CUDA 13) + CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)

openEuler:
- openEuler x86 (310p)
- openEuler x86 (910b)
- openEuler aarch64 (310p)
- openEuler aarch64 (910b)

Source: README.md, updated 2026-01-15