
b7895

| Name | Modified | Size |
|------|----------|------|
| llama-b7895-xcframework.zip | 2026-01-30 | 174.6 MB |
| llama-b7895-bin-win-vulkan-x64.zip | 2026-01-30 | 47.2 MB |
| llama-b7895-bin-win-sycl-x64.zip | 2026-01-30 | 120.2 MB |
| llama-b7895-bin-win-opencl-adreno-arm64.zip | 2026-01-30 | 25.0 MB |
| llama-b7895-bin-win-hip-radeon-x64.zip | 2026-01-30 | 365.2 MB |
| llama-b7895-bin-win-cuda-13.1-x64.zip | 2026-01-30 | 146.7 MB |
| llama-b7895-bin-win-cuda-12.4-x64.zip | 2026-01-30 | 216.6 MB |
| llama-b7895-bin-win-cpu-x64.zip | 2026-01-30 | 30.6 MB |
| llama-b7895-bin-win-cpu-arm64.zip | 2026-01-30 | 24.2 MB |
| llama-b7895-bin-ubuntu-x64.tar.gz | 2026-01-30 | 24.4 MB |
| llama-b7895-bin-ubuntu-vulkan-x64.tar.gz | 2026-01-30 | 41.2 MB |
| llama-b7895-bin-ubuntu-s390x.tar.gz | 2026-01-30 | 25.2 MB |
| llama-b7895-bin-macos-x64.tar.gz | 2026-01-30 | 84.8 MB |
| llama-b7895-bin-macos-arm64.tar.gz | 2026-01-30 | 30.0 MB |
| llama-b7895-bin-910b-openEuler-x86-aclgraph.tar.gz | 2026-01-30 | 61.4 MB |
| llama-b7895-bin-910b-openEuler-aarch64-aclgraph.tar.gz | 2026-01-30 | 55.4 MB |
| llama-b7895-bin-310p-openEuler-x86.tar.gz | 2026-01-30 | 61.4 MB |
| llama-b7895-bin-310p-openEuler-aarch64.tar.gz | 2026-01-30 | 55.4 MB |
| cudart-llama-bin-win-cuda-13.1-x64.zip | 2026-01-30 | 402.6 MB |
| cudart-llama-bin-win-cuda-12.4-x64.zip | 2026-01-30 | 391.4 MB |
| b7895 source code.tar.gz | 2026-01-30 | 28.9 MB |
| b7895 source code.zip | 2026-01-30 | 29.9 MB |
| README.md | 2026-01-30 | 4.3 kB |

Totals: 23 items, 2.4 GB
lookup, lookahead: fix crash when n_ctx not specified (#18729)

* lookup, lookahead: fix crash when n_ctx not specified

Since PR #16653 (Dec 15, 2025), the default n_ctx is 0 to enable automatic GPU memory fitting. This causes llama-lookup and llama-lookahead to crash when run without an explicit -c flag:

`GGML_ASSERT(batch.seq_id[batch.n_tokens] && "llama_batch size exceeded")`

Root cause: both examples use params.n_ctx directly for batch initialization, but params.n_ctx remains 0 even after the context is properly initialized to n_ctx_train internally.

Bug history:
- Nov 2023: lookahead.cpp created (PR #4207) with the params.n_ctx pattern
- Dec 2023: lookup.cpp created (PR #4484) with the same pattern
- Nov 2024: default n_ctx changed to 4096 (PR #10136); bug dormant
- Dec 2025: default n_ctx changed to 0 (PR #16653); bug activated

The bug was dormant for 2+ years because params.n_ctx defaulted to 512, then 4096. PR #16653 changed it to 0 for GPU auto-fitting, triggering the crash.

Fix: use llama_n_ctx(ctx) to get the actual runtime context size, matching the pattern already used elsewhere in lookup.cpp (line 72) and in speculative.cpp/speculative-simple.cpp.

Tested: llama-lookup now works without the -c flag (12.5% acceptance on Gemma-3-1B).

Note: llama-lookahead has a separate pre-existing issue with sequence initialization (n_seq_max=1 vs the W+G+1 needed) that is unrelated to this fix.

* lookahead: fix n_seq_max and kv_unified configuration

Lookahead decoding requires:
- W + G + 1 = 31 sequences for parallel Jacobi decoding
- a unified KV cache for coupled sequences in batch splitting

These requirements were broken after PR #14482 changed the validation logic. Consolidates the fix from PR #18730 per maintainer request.

Commit message drafted with Claude.

macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Intel (x64)
- iOS XCFramework

Linux:
- Ubuntu x64 (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu s390x (CPU)

Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)

openEuler:
- openEuler x86 (310p)
- openEuler x86 (910b, ACL Graph)
- openEuler aarch64 (310p)
- openEuler aarch64 (910b, ACL Graph)

Source: README.md, updated 2026-01-30