llama.cpp - Browse /b8121 at SourceForge.net

Name	Modified	Size	Downloads / Week
llama-b8121-xcframework.zip	< 12 hours ago	168.5 MB	0
llama-b8121-bin-win-vulkan-x64.zip	< 12 hours ago	47.7 MB	0
llama-b8121-bin-win-sycl-x64.zip	< 12 hours ago	120.6 MB	0
llama-b8121-bin-win-opencl-adreno-arm64.zip	< 12 hours ago	25.3 MB	0
llama-b8121-bin-win-hip-radeon-x64.zip	< 12 hours ago	369.3 MB	0
llama-b8121-bin-win-cuda-13.1-x64.zip	< 12 hours ago	148.5 MB	0
llama-b8121-bin-win-cuda-12.4-x64.zip	< 12 hours ago	220.0 MB	0
llama-b8121-bin-win-cpu-x64.zip	< 12 hours ago	31.0 MB	0
llama-b8121-bin-win-cpu-arm64.zip	< 12 hours ago	24.4 MB	0
llama-b8121-bin-ubuntu-x64.tar.gz	< 12 hours ago	24.6 MB	0
llama-b8121-bin-ubuntu-vulkan-x64.tar.gz	< 12 hours ago	41.5 MB	0
llama-b8121-bin-ubuntu-s390x.tar.gz	< 12 hours ago	25.6 MB	0
llama-b8121-bin-macos-x64.tar.gz	< 12 hours ago	86.0 MB	0
llama-b8121-bin-macos-arm64.tar.gz	< 12 hours ago	30.4 MB	0
llama-b8121-bin-910b-openEuler-x86-aclgraph.tar.gz	< 12 hours ago	61.6 MB	0
llama-b8121-bin-910b-openEuler-aarch64-aclgraph.tar.gz	< 12 hours ago	55.6 MB	0
llama-b8121-bin-310p-openEuler-x86.tar.gz	< 12 hours ago	61.6 MB	0
llama-b8121-bin-310p-openEuler-aarch64.tar.gz	< 12 hours ago	55.6 MB	0
cudart-llama-bin-win-cuda-13.1-x64.zip	< 12 hours ago	402.6 MB	0
cudart-llama-bin-win-cuda-12.4-x64.zip	< 12 hours ago	391.4 MB	0
b8121 source code.tar.gz	< 14 hours ago	29.0 MB	0
b8121 source code.zip	< 14 hours ago	30.1 MB	0
README.md	< 14 hours ago	3.7 kB	0
Totals: 23 Items		2.5 GB	0

Improve CUDA graph capture (#19754)

* Improve CUDA graph capture

  Currently, CUDA graphs are eagerly enabled on the first call to ggml_backend_cuda_graph_compute. If the graph properties keep changing (4+ consecutive updates), the graph is permanently disabled. This is suboptimal because:

  - The first call always incurs CUDA graph capture overhead even if the graph is unstable
  - Once permanently disabled, CUDA graphs never re-enable even after the graph stabilizes (e.g., switching from prompt processing to decode)

  The new approach delays CUDA graph activation until warmup completes: the same cgraph must be called at least twice with matching properties before CUDA graph capture begins. This avoids wasted capture overhead on volatile graphs and allows graphs to become eligible once they stabilize.

  This also fixes issues such as https://github.com/ggml-org/llama.cpp/discussions/19708

* Update ggml/src/ggml-cuda/ggml-cuda.cu

  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Remove EM dashes

* Update ggml/src/ggml-cuda/ggml-cuda.cu

  Co-authored-by: Aman Gupta <amangupta052@gmail.com>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
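
The warmup rule described in the commit message can be illustrated with a minimal C++ sketch. This is not the code from ggml-cuda.cu: `GraphProps`, `GraphWarmup`, `observe`, and the fields used to compare graphs are hypothetical names chosen for illustration. The threshold of two matching calls follows the commit text, but whether capture starts on the second matching call or only after it is an assumption here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified summary of a compute graph's properties.
// The real backend compares more detail; these fields are illustrative only.
struct GraphProps {
    const void *  graph_id;   // identity of the cgraph being computed
    std::size_t   n_nodes;    // number of nodes in the graph
    std::uint64_t shape_hash; // stand-in for per-node shapes and ops

    bool operator==(const GraphProps & other) const {
        return graph_id == other.graph_id &&
               n_nodes == other.n_nodes &&
               shape_hash == other.shape_hash;
    }
};

// Tracks how many consecutive calls presented identical properties.
// Capture is only considered once the graph has been stable for
// `warmup_calls` calls (assumed to be 2, per the commit message).
class GraphWarmup {
public:
    explicit GraphWarmup(int warmup_calls = 2) : warmup_calls_(warmup_calls) {}

    // Record one compute call; returns true if capture may be used for it.
    bool observe(const GraphProps & props) {
        if (has_last_ && props == last_) {
            ++stable_calls_;
        } else {
            // Properties changed: restart warmup instead of disabling forever.
            last_         = props;
            has_last_     = true;
            stable_calls_ = 1;
        }
        return stable_calls_ >= warmup_calls_;
    }

private:
    int        warmup_calls_;
    GraphProps last_{};
    bool       has_last_     = false;
    int        stable_calls_ = 0;
};
```

In this sketch a change in properties only resets the counter, so a graph that was volatile during prompt processing becomes eligible again once decode calls start repeating with the same shape, which is the behavior the commit message describes as the goal.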