koboldcpp-1.94.1 release files:

  Name                                  Modified     Size
  koboldcpp-nocuda.exe                  2025-06-25   75.6 MB
  koboldcpp.exe                         2025-06-25   574.8 MB
  koboldcpp-mac-arm64                   2025-06-25   27.3 MB
  koboldcpp-linux-x64-oldpc             2025-06-25   468.4 MB
  koboldcpp-linux-x64-nocuda            2025-06-25   84.8 MB
  koboldcpp-linux-x64                   2025-06-25   568.6 MB
  koboldcpp-oldpc.exe                   2025-06-25   403.1 MB
  koboldcpp-1.94.1 source code.tar.gz   2025-06-22   30.9 MB
  koboldcpp-1.94.1 source code.zip      2025-06-22   31.4 MB
  README.md                             2025-06-22   5.6 kB
  Totals: 10 items, 2.3 GB

koboldcpp-1.94.1

are we comfy yet?


  • NEW: Added unpacked mini-launcher: When unpacking KoboldCpp to a directory, a 5 MB pyinstaller mini-launcher is now generated in that same directory, letting you easily start the unpacked KoboldCpp without installing Python or other dependencies. You can copy the unpacked directory and use it anywhere (thanks @henk717)
  • NEW: Chroma Image Generation Support: Merged support for the Chroma model, a new architecture based on Flux Schnell (thanks @stduhpf)
  • This model also requires a T5-XXL encoder and a Flux VAE to work; be sure to load all three files!
  • Chroma requires descriptive prompts and negative prompts to work well! Simple prompts will produce poor results.
  • NEW: Added PhotoMaker Face Cloning: Use --sdphotomaker to load PhotoMaker along with any SDXL-based model. Then open the KoboldCpp SDUI and upload any reference image in the PhotoMaker input to clone the face! Works in all modes (inpaint/img2img/text2img).
  • Swapping .gguf models in admin mode now allows overriding the config with a different one as well (both are customizable).
  • Improved GBNF grammar performance by attempting a culled grammar search first (thanks @Reithan)
  • Allow changing the main GPU with --maingpu when loading multi-gpu setups. The main GPU uses more VRAM and has a larger performance impact. By default it is the first GPU.
  • Added configurable soft resolution limits and VAE tiling limits (thanks @wbruna), also fixed VAE tiling artifacts.
  • Added --sdclampedsoft, which provides "soft" total-resolution clamping instead (e.g. 640 would allow 640x640, 512x768 and 768x512 images); can be combined with --sdclamped, which provides hard clamping (no dimension can exceed it)
  • Added --sdtiledvae which replaces --sdnotile: Allows specifying a size beyond which VAE tiling is applied.
  • Use --embeddingsmaxctx to limit the maximum context length for embedding models (helps if you run out of memory)
  • Added --embeddingsgpu to allow offloading embeddings model layers to GPU. This is NOT recommended as it doesn't provide much speedup, since embedding models already use the GPU for processing even without dedicated offload.
  • Display available RAM on startup, display version number in terminal window title
  • ComfyUI emulation now covers the /upload/image endpoint which allows Img2Img comfyui workflows. Files are stored temporarily in memory only.
  • Added more performance stats for token speeds and timings.
  • Updated Kobold Lite, multiple fixes and improvements
  • Fixed Chub.ai importer again
  • Added card importer for char-archive.evulid.cc
  • Added option to import image from webcam
  • Allow markdown when streaming current turn
  • Improved CSS import sanitizer (thanks @PeterPeet)
  • Word Frequency Search (inspired from @trincadev MyGhostWriter)
  • Allow usermods and CSS to be loaded from file.
  • Added WebSearch for corpo mode
  • Added Img2Img support for ComfyUI backends
  • Added ability to use custom OpenAI endpoint for TextDB embedding model
  • Minor linting and splitter/merge tool by @ehoogeveen-medweb
  • Fixed lookahead scanning for Author's note insertion point
  • Merged new model support, fixes and improvements from upstream
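Several of the new flags above can be combined in a single launch. A hypothetical sketch of assembling such a command line: --sdclampedsoft, --sdtiledvae and --maingpu come from these notes, while --model, --sdmodel, --sdt5xxl and --sdvae are assumed from earlier releases, and all filenames are placeholders — verify everything against --help for your build.

```python
import shlex

def build_launch_cmd(binary="./koboldcpp-linux-x64"):
    # Sketch only: filenames are placeholders, and flags not listed in
    # these release notes are assumptions carried over from earlier builds.
    return [
        binary,
        "--model", "text-model.gguf",        # placeholder text model
        "--sdmodel", "chroma.safetensors",   # Chroma checkpoint (placeholder)
        "--sdt5xxl", "t5xxl.safetensors",    # T5-XXL encoder required by Chroma
        "--sdvae", "flux-vae.safetensors",   # Flux VAE required by Chroma
        "--sdclampedsoft", "640",            # soft total-resolution clamp
        "--sdtiledvae", "768",               # tile VAE beyond this size
        "--maingpu", "0",                    # pick the primary GPU
    ]

print(shlex.join(build_launch_cmd()))
```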

Hotfix 1.94.1: Minor bugfixes; fixed Ollama-compatible vision; added AVX/AVX2 detection for backend auto-selection; cleaned up oldpc builds to include only oldpc files.

Download and run koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller build for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version, which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here if you are a Windows user, or download our rolling ROCm binary here if you use Linux. If you're on a modern macOS (M-series) machine, you can use the koboldcpp-mac-arm64 binary. Click here for .gguf conversion and quantization tools.
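The binary-selection guidance above can be sketched as a small helper. This is a hypothetical illustration: the filenames come from this release, while the `system`, `has_nvidia` and `old_pc` parameters are assumptions for the sketch.

```python
def pick_binary(system: str, has_nvidia: bool, old_pc: bool = False) -> str:
    """Map the selection guidance to a release filename (sketch).

    'system' follows platform.system(): 'Windows', 'Linux' or 'Darwin'.
    """
    if system == "Darwin":
        return "koboldcpp-mac-arm64"       # modern M-series Macs
    if system == "Windows":
        if not has_nvidia:
            return "koboldcpp-nocuda.exe"  # also the Vulkan choice for AMD
        return "koboldcpp-oldpc.exe" if old_pc else "koboldcpp.exe"
    # Linux
    if not has_nvidia:
        return "koboldcpp-linux-x64-nocuda"
    return "koboldcpp-linux-x64-oldpc" if old_pc else "koboldcpp-linux-x64"

print(pick_binary("Linux", has_nvidia=True))
```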

Deprecation Reminder: Binary filenames have been renamed: The files named koboldcpp_cu12.exe, koboldcpp_oldcpu.exe, koboldcpp_nocuda.exe, koboldcpp-linux-x64-cuda1210, and koboldcpp-linux-x64-cuda1150 have been removed. Please switch to the new filenames.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI. Once loaded, you can connect like this (or use the full KoboldAI client): http://localhost:5001
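Once the server is listening on port 5001, any HTTP client can drive it. A minimal standard-library sketch: the `/api/v1/generate` path and payload fields follow the common KoboldAI API convention and the `results[0].text` response shape is an assumption — check your build's API documentation if it differs.

```python
import json
import urllib.request

def build_generate_request(prompt, max_length=80,
                           base_url="http://localhost:5001"):
    """Build a POST for the KoboldAI-compatible /api/v1/generate endpoint."""
    payload = {"prompt": prompt, "max_length": max_length}
    return urllib.request.Request(
        f"{base_url}/api/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request("Once upon a time,")
# With a running server, uncomment to fetch the completion:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["results"][0]["text"])
```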

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.

Source: README.md, updated 2025-06-22