Name | Modified | Size | Downloads / Week |
---|---|---|---|
koboldcpp.exe | 2025-08-10 | 586.4 MB | |
koboldcpp-mac-arm64 | 2025-08-10 | 27.6 MB | |
koboldcpp-linux-x64-oldpc | 2025-08-10 | 477.1 MB | |
koboldcpp-linux-x64-nocuda | 2025-08-10 | 88.6 MB | |
koboldcpp-linux-x64 | 2025-08-10 | 580.3 MB | |
koboldcpp-oldpc.exe | 2025-08-10 | 411.9 MB | |
koboldcpp-nocuda.exe | 2025-08-10 | 79.4 MB | |
koboldcpp-1.97.3 source code.tar.gz | 2025-08-09 | 31.3 MB | |
koboldcpp-1.97.3 source code.zip | 2025-08-09 | 31.7 MB | |
README.md | 2025-08-09 | 3.8 kB | |
Totals: 10 Items | | 2.3 GB | 30 |
koboldcpp-1.97.3
- Merged support for the GLM4.5 family of models
- Merged support for GPT-OSS models (note that this model performs poorly if OpenAI instruct templates are not obeyed. To use it in raw story mode, append `<|start|>assistant<|channel|>final<|message|>` to memory)
- Merged support for Voxtral (Voxtral Small 24B is better than Voxtral Mini 3B, but both are not great. See https://github.com/ggml-org/llama.cpp/pull/14862#issuecomment-3135794073)
- Added a `/ping` stub endpoint to permit usage on Runpod serverless (a quick probe example follows this list).
- Allow MoE layers to be easily kept on CPU with the `--moecpu (layercount)` flag. Using this flag without a number will keep all MoE layers on CPU (see the launch sketch after this list).
- Clearer indication of support for each multimodal modality (Vision/Audio)
- Increased max length of terminal prints allowed in debugmode.
- Do not attempt context shifting for any mrope models.
- Adjusted some adapter instruct templates, tweaked mistral template.
- Handle empty objects returned by tool calls; also removed misinterpretation of the tool calls instruct tag within ChatML autoguess.
- Allow multiple tool calls to be chained, and allow them to be triggered by any role.
- Fixed URL parameter parsing in WebSearch.
- Increased regex stack size limit for MSVC builds (fix for mistral models).
- Updated Kobold Lite, multiple fixes and improvements:
  - Added 2 more save slots
  - Added a (+/-) modifier field for Adventure mode rolls
  - Fixed deleting the wrong image when multiple selected images are identical
  - Added a button to insert a textDB separator
  - Improved mid-streaming rendering
  - Slightly lowered the default rep pen
  - Simplified the Mistral template, added a GPT-OSS Harmony template
- Merged new model support, fixes and improvements from upstream
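For Runpod-style serverless health checks, the new `/ping` endpoint can be probed directly. A minimal sketch, assuming the server is up on the default port 5001:

```
# Liveness probe against the /ping stub endpoint
curl http://localhost:5001/ping
```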
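The `--moecpu` flag is passed at launch like any other parameter. A hypothetical invocation (the model filename is a placeholder, not a shipped file):

```
# Keep 20 MoE layers on the CPU
./koboldcpp-linux-x64 --model glm-4.5-air.Q4_K_M.gguf --moecpu 20

# With no count given, all MoE layers stay on the CPU
./koboldcpp-linux-x64 --model glm-4.5-air.Q4_K_M.gguf --moecpu
```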
Hotfix 1.97.1 - More template fixes; the generated token's ID is now shown in the debugmode terminal log; fixed a Flux loading speed regression; fixed a Vulkan BSOD.
Hotfix 1.97.2 - Fixed a CLBlast regression, limited the Vulkan BSOD fix to Nvidia only, updated Lite, merged upstream fixes.
Hotfix 1.97.3 - Fixed a regression with GPT-OSS that resulted in incoherence.
Known issues: OldPC CUDA builds are currently broken if flash attention is used. The last working version is 1.94.2.
Download and run koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux); each is a one-file PyInstaller build for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU or do not need CUDA, you can use the nocuda version, which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, Windows users can try YellowRoseCx's koboldcpp_rocm fork here, and Linux users can download our rolling ROCm binary here.
If you're on a modern macOS device (M-series), you can use the koboldcpp-mac-arm64 binary.
Click here for .gguf conversion and quantization tools
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI. Once loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client), as in the sketch below.
For more information, be sure to run the program from the command line with the `--help` flag. You can also refer to the readme and the wiki.
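A minimal end-to-end sketch, assuming the Linux binary, a placeholder model file, and the KoboldAI-compatible generate endpoint that KoboldCpp serves:

```
# Launch with a model on the default port 5001
./koboldcpp-linux-x64 --model mymodel.gguf --port 5001

# Once the model has loaded, open http://localhost:5001 in a browser,
# or send a generation request from another terminal:
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Once upon a time", "max_length": 64}'
```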