
llama.cpp release b7739
| Name | Modified | Size |
|------|----------|------|
| llama-b7739-xcframework.zip | < 13 hours ago | 164.6 MB |
| llama-b7739-bin-win-vulkan-x64.zip | < 13 hours ago | 46.1 MB |
| llama-b7739-bin-win-sycl-x64.zip | < 13 hours ago | 119.1 MB |
| llama-b7739-bin-win-opencl-adreno-arm64.zip | < 13 hours ago | 24.2 MB |
| llama-b7739-bin-win-hip-radeon-x64.zip | < 13 hours ago | 360.1 MB |
| llama-b7739-bin-win-cuda-13.1-x64.zip | < 13 hours ago | 142.8 MB |
| llama-b7739-bin-win-cuda-12.4-x64.zip | < 13 hours ago | 217.2 MB |
| llama-b7739-bin-win-cpu-x64.zip | < 13 hours ago | 29.6 MB |
| llama-b7739-bin-win-cpu-arm64.zip | < 13 hours ago | 23.4 MB |
| llama-b7739-bin-ubuntu-x64.tar.gz | < 13 hours ago | 23.2 MB |
| llama-b7739-bin-ubuntu-vulkan-x64.tar.gz | < 13 hours ago | 40.0 MB |
| llama-b7739-bin-ubuntu-s390x.tar.gz | < 13 hours ago | 24.3 MB |
| llama-b7739-bin-macos-x64.tar.gz | < 13 hours ago | 82.7 MB |
| llama-b7739-bin-macos-arm64.tar.gz | < 13 hours ago | 29.1 MB |
| llama-b7739-bin-910b-openEuler-x86.tar.gz | < 13 hours ago | 57.7 MB |
| llama-b7739-bin-910b-openEuler-aarch64.tar.gz | < 13 hours ago | 52.4 MB |
| llama-b7739-bin-310p-openEuler-x86.tar.gz | < 13 hours ago | 57.7 MB |
| llama-b7739-bin-310p-openEuler-aarch64.tar.gz | < 13 hours ago | 52.4 MB |
| cudart-llama-bin-win-cuda-13.1-x64.zip | < 13 hours ago | 402.6 MB |
| cudart-llama-bin-win-cuda-12.4-x64.zip | < 13 hours ago | 391.4 MB |
| b7739 source code.tar.gz | < 15 hours ago | 28.7 MB |
| b7739 source code.zip | < 15 hours ago | 29.7 MB |
| README.md | < 15 hours ago | 4.2 kB |

Totals: 23 items, 2.4 GB
CUDA: Factor out and re-use `block_reduce` function (#18785)

- CUDA: Refactor and expose `two_stage_warp_reduce_*` function.
- Use `two_stage_warp_reduce` also in the softmax kernel; move shared memory out of it. Moving smem out of the `__device__` function into the `__global__` function allows explicit smem reuse, as either the compiler or the CUDA runtime seems not to free it afterwards (`cudaFuncSetAttribute` fails when not accounting for it once per call to `two_stage_warp_reduce`).
- Update ggml/src/ggml-cuda/common.cuh (co-authored by Aman Gupta).
- Use `two_stage_warp_reduce` in `group_norm_f32`.
- Use `two_stage_warp_reduce` in `rms_norm_f32`.
- Fix smem size calculation, which expects bytes.
- Make `two_stage_warp_reduce` accept all values `warp_reduce` accepts; also integrate it into the `norm_f32` function.
- Use `two_stage_warp_reduce` in `l2_norm_f32`.
- Use type traits for block reduction for better legibility; also address other review requests by @am17an, such as variable renaming.
- Make norm tests cover all CUDA paths.
- Mark `columns % WARP_SIZE != 0` as supported for `RMS_NORM_BACK`. Unit tests passed locally; let's see if they pass in the CI as well.
- Use `enum class` for `block_reduce_method`; this is more type-safe than a plain enum.
- Rename variables as suggested in code review by @am17an.
- Rename `two_stage_warp_reduce` -> `block_reduce`.
- Fix trailing whitespace in common.cuh.
- Make the condition of the `static_assert` type-dependent. This delays evaluation until the template is actually instantiated; otherwise, some compilers may evaluate the assert when parsing the template, resulting in build errors as observed here: https://github.com/ggml-org/llama.cpp/actions/runs/20960323123/job/60235530068?pr=18785
- Inline definitions.

Co-authored-by: Aman Gupta <amangupta052@gmail.com>
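For context, the two-stage pattern the changelog refers to is the standard CUDA idiom of reducing within each warp via shuffle intrinsics, then reducing the per-warp partials in shared memory. The sketch below is illustrative only: the function names, signatures, and the sum operation are assumptions for exposition, not the actual llama.cpp `block_reduce` API. It does mirror one detail from the changelog: shared memory is passed in from the `__global__` caller rather than declared inside the `__device__` helper, so the caller can account for and reuse it explicitly.

```cuda
#include <cuda_runtime.h>

#define WARP_SIZE 32

// Stage 1: butterfly reduction within a single warp using shuffle
// intrinsics; every lane ends up holding the warp-wide sum.
__device__ float warp_reduce_sum(float x) {
    for (int offset = WARP_SIZE / 2; offset > 0; offset >>= 1) {
        x += __shfl_xor_sync(0xffffffff, x, offset);
    }
    return x;
}

// Stage 2: each warp writes its partial sum to caller-provided shared
// memory, then warp 0 reduces the partials. smem must hold at least
// blockDim.x / WARP_SIZE floats and is owned by the __global__ caller.
__device__ float block_reduce_sum(float x, float * smem) {
    const int lane   = threadIdx.x % WARP_SIZE;
    const int warp   = threadIdx.x / WARP_SIZE;
    const int nwarps = (blockDim.x + WARP_SIZE - 1) / WARP_SIZE;

    x = warp_reduce_sum(x);
    if (lane == 0) {
        smem[warp] = x;
    }
    __syncthreads();

    // Only warp 0 participates in the second stage; lanes past the
    // number of warps contribute the identity element (0 for a sum).
    x = (threadIdx.x < nwarps) ? smem[lane] : 0.0f;
    if (warp == 0) {
        x = warp_reduce_sum(x);
    }
    return x; // result is valid in warp 0
}
```

A kernel using this helper would declare the shared buffer itself (e.g. `__shared__ float smem[32];`) and pass it in, which is what lets the same allocation be reused across multiple reductions in one kernel.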

macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Intel (x64)
- iOS XCFramework

Linux:
- Ubuntu x64 (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu s390x (CPU)

Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) + CUDA 12.4 DLLs
- Windows x64 (CUDA 13) + CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)

openEuler:
- openEuler x86 (310p)
- openEuler x86 (910b)
- openEuler aarch64 (310p)
- openEuler aarch64 (910b)

Source: README.md, updated 2026-01-15