Name | Modified | Size | Downloads / Week |
---|---|---|---|
koboldcpp.exe | 2025-08-10 | 586.4 MB | |
koboldcpp-mac-arm64 | 2025-08-10 | 27.6 MB | |
koboldcpp-linux-x64-oldpc | 2025-08-10 | 477.1 MB | |
koboldcpp-linux-x64-nocuda | 2025-08-10 | 88.6 MB | |
koboldcpp-linux-x64 | 2025-08-10 | 580.3 MB | |
koboldcpp-oldpc.exe | 2025-08-10 | 411.9 MB | |
koboldcpp-nocuda.exe | 2025-08-10 | 79.4 MB | |
koboldcpp-1.97.3 source code.tar.gz | 2025-08-09 | 31.3 MB | |
koboldcpp-1.97.3 source code.zip | 2025-08-09 | 31.7 MB | |
README.md | 2025-08-09 | 3.8 kB | |
Totals: 10 Items | | 2.3 GB | 30 |
koboldcpp-1.97.3
- Merged support for the GLM4.5 family of models
- Merged support for GPT-OSS models (note that this model performs poorly if OpenAI instruct templates are not obeyed. To use it in raw story mode, append `<|start|>assistant<|channel|>final<|message|>` to memory)
- Merged support for Voxtral (Voxtral Small 24B is better than Voxtral Mini 3B, but both are not great. See https://github.com/ggml-org/llama.cpp/pull/14862#issuecomment-3135794073)
- Added a `/ping` stub endpoint to permit usage on Runpod serverless (a quick probe example follows this list).
- Allow MoE layers to be easily kept on CPU with the `--moecpu (layercount)` flag. Using this flag without a number will keep all MoE layers on CPU (see the launch sketch after this list).
- Clearer indication of support for each multimodal modality (Vision/Audio)
- Increased max length of terminal prints allowed in debugmode.
- Do not attempt context shifting for any mrope models.
- Adjusted some adapter instruct templates, tweaked mistral template.
- Handle empty objects returned by tool calls; also removed misinterpretation of the tool calls instruct tag within ChatML autoguess.
- Allow multiple tool calls to be chained, and allow them to be triggered by any role.
- Fixed URL parameter parsing in WebSearch.
- Increased regex stack size limit for MSVC builds (fix for mistral models).
- Updated Kobold Lite, multiple fixes and improvements:
  - Added 2 more save slots
  - Added a (+/-) modifier field for Adventure mode rolls
  - Fixed deleting the wrong image when multiple selected images are identical
  - Added a button to insert a textDB separator
  - Improved mid-streaming rendering
  - Slightly lowered the default rep pen
  - Simplified the Mistral template, added a GPT-OSS Harmony template
- Merged new model support, fixes and improvements from upstream
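For Runpod-style serverless health checks, the new `/ping` endpoint can be probed directly. A minimal sketch, assuming the server is up on the default port 5001:

```
# Liveness probe against the /ping stub endpoint
curl http://localhost:5001/ping
```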
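The `--moecpu` flag is passed at launch like any other parameter. A hypothetical invocation (the model filename is a placeholder, not a shipped file):

```
# Keep 20 MoE layers on the CPU
./koboldcpp-linux-x64 --model glm-4.5-air.Q4_K_M.gguf --moecpu 20

# With no count given, all MoE layers stay on the CPU
./koboldcpp-linux-x64 --model glm-4.5-air.Q4_K_M.gguf --moecpu
```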
Hotfix 1.97.1 - More template fixes; the generated token's ID is now shown in the debugmode terminal log; fixed a Flux loading speed regression; fixed a Vulkan BSOD.
Hotfix 1.97.2 - Fixed a CLBlast regression, limited the Vulkan BSOD fix to Nvidia only, updated Lite, merged upstream fixes.
Hotfix 1.97.3 - Fixed a regression with GPT-OSS that resulted in incoherence.
Known issues: OldPC CUDA builds are currently broken if flash attention is used. The last working version is 1.94.2.
Download and run koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux); each is a one-file PyInstaller build for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU or do not need CUDA, you can use the nocuda version, which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, Windows users can try YellowRoseCx's koboldcpp_rocm fork here, and Linux users can download our rolling ROCm binary here.
If you're on a modern macOS device (M-series), you can use the koboldcpp-mac-arm64 binary.
Click here for .gguf conversion and quantization tools
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI. Once loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client), as in the sketch below.
For more information, be sure to run the program from the command line with the `--help` flag. You can also refer to the readme and the wiki.
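A minimal end-to-end sketch, assuming the Linux binary, a placeholder model file, and the KoboldAI-compatible generate endpoint that KoboldCpp serves:

```
# Launch with a model on the default port 5001
./koboldcpp-linux-x64 --model mymodel.gguf --port 5001

# Once the model has loaded, open http://localhost:5001 in a browser,
# or send a generation request from another terminal:
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Once upon a time", "max_length": 64}'
```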