b8113

| Name | Modified | Size |
| --- | --- | --- |
| llama-b8113-xcframework.zip | < 7 hours ago | 168.5 MB |
| llama-b8113-bin-win-vulkan-x64.zip | < 7 hours ago | 47.7 MB |
| llama-b8113-bin-win-sycl-x64.zip | < 7 hours ago | 120.6 MB |
| llama-b8113-bin-win-opencl-adreno-arm64.zip | < 7 hours ago | 25.3 MB |
| llama-b8113-bin-win-hip-radeon-x64.zip | < 7 hours ago | 369.3 MB |
| llama-b8113-bin-win-cuda-13.1-x64.zip | < 7 hours ago | 148.5 MB |
| llama-b8113-bin-win-cuda-12.4-x64.zip | < 7 hours ago | 220.0 MB |
| llama-b8113-bin-win-cpu-x64.zip | < 7 hours ago | 31.0 MB |
| llama-b8113-bin-win-cpu-arm64.zip | < 7 hours ago | 24.4 MB |
| llama-b8113-bin-ubuntu-x64.tar.gz | < 7 hours ago | 24.7 MB |
| llama-b8113-bin-ubuntu-vulkan-x64.tar.gz | < 7 hours ago | 41.5 MB |
| llama-b8113-bin-ubuntu-s390x.tar.gz | < 7 hours ago | 25.7 MB |
| llama-b8113-bin-macos-x64.tar.gz | < 7 hours ago | 86.1 MB |
| llama-b8113-bin-macos-arm64.tar.gz | < 7 hours ago | 30.4 MB |
| llama-b8113-bin-910b-openEuler-x86-aclgraph.tar.gz | < 7 hours ago | 61.6 MB |
| llama-b8113-bin-910b-openEuler-aarch64-aclgraph.tar.gz | < 7 hours ago | 55.6 MB |
| llama-b8113-bin-310p-openEuler-x86.tar.gz | < 7 hours ago | 61.6 MB |
| llama-b8113-bin-310p-openEuler-aarch64.tar.gz | < 7 hours ago | 55.6 MB |
| cudart-llama-bin-win-cuda-13.1-x64.zip | < 7 hours ago | 402.6 MB |
| cudart-llama-bin-win-cuda-12.4-x64.zip | < 7 hours ago | 391.4 MB |
| b8113 source code.tar.gz | < 14 hours ago | 29.0 MB |
| b8113 source code.zip | < 14 hours ago | 30.1 MB |
| README.md | < 14 hours ago | 4.9 kB |

Totals: 23 items, 2.5 GB, 0 downloads/week
common : fix Step-3.5-Flash format detection and thinking support (#19635)

* common : fix Step-3.5-Flash format detection and thinking support

Step-3.5-Flash uses the same XML-style tool call format as Qwen3-Coder (<tool_call><function=...><parameter=...>) but its Jinja template lacks the bare <function> and plural <parameters> markers that the detection logic previously required. This caused it to fall through to Hermes 2 Pro, which doesn't call func_args_not_string(), so arguments stayed as JSON strings and templates using arguments|items crashed.

Additionally, the Qwen3-Coder-XML format handler had no thinking support. Models like Step-3.5-Flash that unconditionally emit <think> in their generation prompt need the same thinking_forced_open handling that Nemotron v3 and Hermes 2 Pro already have, otherwise reasoning_content is never separated from content in API responses.

Changes:
- Relax Qwen3-Coder XML detection to only require the 3 shared markers
- Tighten Nemotron v3 branch to also require bare <function> and plural <parameters>, preventing Step-3.5-Flash from being misrouted via <think>
- Add thinking_forced_open support to Qwen3-Coder-XML init function
- Add <think>/</think> to preserved tokens
- Fix build_grammar_xml_tool_call to handle thinking_forced_open in the grammar root rule, allowing </think> before tool calls
- Add Step-3.5-Flash chat template and format detection test

Builds on: https://github.com/ggml-org/llama.cpp/pull/19283

* chat : route Step-3.5-Flash to Nemotron v3 PEG parser, add tests

Step-3.5-Flash uses the same XML tool call format as Qwen3-Coder and Nemotron 3 Nano (<tool_call>/<function=...>/<parameter=...>) but with unconditional <think> output. Route it to the Nemotron v3 PEG parser for streaming and schema-aware parameter parsing.

Detection: templates with <think> + XML tool tags use Nemotron v3 PEG parser; templates without <think> (Qwen3-Coder) use GBNF grammar.

Tests cover: basic messages, tool calls with/without thinking content, parallel tool calls, code string parameters, optional </parameter> closing tags, and JSON schema response format.

* chat : remove dead thinking code from qwen3_coder_xml

Remove thinking handling code that became unreachable after routing Step-3.5-Flash to the Nemotron v3 PEG parser. Qwen3-Coder has no <think> in its template, so the thinking_forced_open logic, preserved tokens, and grammar prefix were dead paths.
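
The final detection rule described in these release notes (XML tool tags plus <think> go to the Nemotron v3 PEG parser, XML tool tags without <think> go to the Qwen3-Coder GBNF grammar) can be sketched roughly as below. This is an illustrative Python approximation, not the llama.cpp implementation, which is written in C++. The function name `route_chat_template`, the returned labels, and the plain substring matching are all assumptions made for the example. Only the three marker strings and the role of <think> come from the notes above.

```python
# Illustrative sketch only: the real detection lives in llama.cpp's C++ chat code.
# The three markers shared by Qwen3-Coder, Nemotron 3 Nano, and Step-3.5-Flash,
# as listed in the release notes.
XML_TOOL_MARKERS = ("<tool_call>", "<function=", "<parameter=")


def route_chat_template(template: str) -> str:
    """Return a hypothetical label for the parser a Jinja template is routed to."""
    has_xml_tools = all(marker in template for marker in XML_TOOL_MARKERS)
    if not has_xml_tools:
        # Some other format handler (for example Hermes 2 Pro); not modelled here.
        return "other"
    if "<think>" in template:
        # Templates that emit <think> alongside XML tool tags (Step-3.5-Flash).
        return "nemotron_v3_peg"
    # XML tool tags without <think> (Qwen3-Coder).
    return "qwen3_coder_xml_gbnf"


# Made-up template fragments, used only to exercise the three branches.
step_like = "<think>\n{{ reasoning }}<tool_call>\n<function=NAME>\n<parameter=ARG>"
qwen_like = "<tool_call>\n<function=NAME>\n<parameter=ARG>"
plain = "{{ messages[0].content }}"

for name, tmpl in [("step_like", step_like), ("qwen_like", qwen_like), ("plain", plain)]:
    print(name, "->", route_chat_template(tmpl))
```

The ordering matters in this sketch: the shared XML markers are checked first, so a template that contains <think> but no XML tool tags is not sent to the PEG parser.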

macOS/iOS:

Linux:

Windows:

openEuler:

Source: README.md, updated 2026-02-19