<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to Home</title><link>https://sourceforge.net/p/koboldcpp/wiki/Home/</link><description>Recent changes to Home</description><atom:link href="https://sourceforge.net/p/koboldcpp/wiki/Home/feed" rel="self"/><language>en</language><lastBuildDate>Mon, 05 May 2025 15:14:44 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/koboldcpp/wiki/Home/feed" rel="self" type="application/rss+xml"/><item><title>Home modified by Henk</title><link>https://sourceforge.net/p/koboldcpp/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v65
+++ v66
@@ -483,7 +483,7 @@
 This option allows the placeholder tags `{{user}}` `{{char}}` `{{[INPUT]}}` and `{{[OUTPUT]}}` to be used by character card or scenario authors, which will be dynamically replaced with the correct value at runtime. For example, `{{char}}` will get replaced with the chatbot's selected nickname.

 ### **Thinking Tags**  
-In some modern reasoning models like Deepseek R1, the model performs a chain-of-thought process that outputs thinking steps before deriving a final answer. In Context &amp;gt; Tokens &amp;gt; Thinking, you can specify regex to handle output from reasoning models, to hide, remove, or ignore the Chain-Of-Thought tags like `&amp;lt;think&amp;gt;`.
+In some modern reasoning models like Deepseek R1, the model performs a chain-of-thought process that outputs thinking steps before deriving a final answer. In Settings &amp;gt; Tokens &amp;gt; Thinking, you can specify a regex to handle output from reasoning models, to hide, remove, or ignore Chain-Of-Thought tags like `&amp;lt;think&amp;gt;`.
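
For illustration only, here is a minimal Python sketch of what such a regex can do (this is not Lite's internal code; the pattern and names are just an example):

```python
import re

# Strip one chain-of-thought block, even when the closing tag is missing.
THINK_PATTERN = re.compile(r"&amp;lt;think&amp;gt;.*?(&amp;lt;/think&amp;gt;|\Z)", re.DOTALL)

def strip_think(text):
    return THINK_PATTERN.sub("", text).strip()

print(strip_think("&amp;lt;think&amp;gt;2+2 is 4, double checking...&amp;lt;/think&amp;gt;The answer is 4."))
# prints: The answer is 4.
```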

 ### **Persist Autosave Session**  
 This option autosaves your story and settings, which will be restored the next time you start KoboldCpp again. However, to avoid data loss you are still recommended to manually export your saved story .json files from time to time.
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Henk</dc:creator><pubDate>Mon, 05 May 2025 15:14:44 -0000</pubDate><guid>https://sourceforge.net6686caebe4185e2b9b3770fc1f5abef6c560c844</guid></item><item><title>Home modified by Henk</title><link>https://sourceforge.net/p/koboldcpp/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v64
+++ v65
@@ -90,10 +90,10 @@
   - You could, but why would you want to? The basic `make` should work without issues with build essentials. Finding appropriate libraries for GPU acceleration may be difficult.

 ## KoboldCpp General Usage and Troubleshooting  
-### **I don't want to use the GUI launcher. How to use the 'command line/terminal' with extra parameters to launch koboldcpp?**  
+### **I don't want to use the GUI launcher. How do I use the command line terminal with extra parameters to launch koboldcpp?**  
 Here are some easy ways to start koboldcpp from the command line. Pick one that suits you best.  
 - Windows: Go to Start &amp;gt; Run (or WinKey+R) and input the full path of your koboldcpp.exe followed by the launch flags. e.g. `C:\mystuff\koboldcpp.exe --usecublas --gpulayers 10`. Alternatively, you can also create a desktop shortcut to the koboldcpp.exe file, and set the desired values in the `Properties &amp;gt; Target` box. Lastly, you can also start command prompt in your koboldcpp.exe directory (with `cmd`), and pass the desired flags to it from the terminal window.  
-- Linux/OSX: Navigate to the koboldcpp directory, and build koboldcpp with `make` (as described in 'How do I compile KoboldCpp'). Then run the command `python3 koboldcpp.py --model (path you your model)`, plus whatever flags you need e.g. `--useclblast` or `--stream`
+- Linux/OSX: Navigate to the koboldcpp directory, and build koboldcpp with `make` (as described in 'How do I compile KoboldCpp'). Then run the command `python3 koboldcpp.py --model (path to your model)`, plus whatever flags you need e.g. `--useclblast` or `--stream`

 ### **How do I see the available commands and how to use them?**  
 You can launch KoboldCpp from the command line with the `--help` parameter to view the available command list. See the section on "How do I use the command line terminal"
@@ -179,10 +179,13 @@
 This parameter launches the KoboldAI Lite UI alone without loading a model. The Kobold Lite UI can be used to connect to an external KoboldCpp instance, or other AI services such as the AI Horde.

 ### **What is `--config`? What are .kcpps files?**  
-`.kcpps` files are configuration files that store your KoboldCpp launcher preferences and settings. You can save and load them into the GUI, or run them directly with the `--config` flag.
+`.kcpps` files are configuration files that store your KoboldCpp launcher preferences and settings. You can save and load them into the GUI, or run them directly with the `--config` flag. You can export configs from the command line with the `--exportconfig` flag.

 ### **What are .kcppt files?**  
-`.kcppt` files are configuration *templates* that store KoboldCpp launcher preferences and settings. You can save and load them into the GUI, or run them directly with the `--config` flag. The difference between this and .kcpps files is that .kcppt files are intended to be shared, thus they will not include device specific settings like the GPU to use, instead those are decided by the other user.
+`.kcppt` files are configuration *templates* that store KoboldCpp launcher preferences and settings. You can save and load them into the GUI, or run them directly with the `--config` flag. The difference between this and .kcpps files is that .kcppt files are intended to be shared, so they will not include device-specific settings like the GPU to use; instead, those are decided by the other user. You can export templates from the command line with the `--exporttemplate` flag.
+
+### **Can I remove the BOS token?**  
+By default, a beginning-of-sequence (BOS) token is added to the start of all inputs before generation. Some models (e.g. Qwen) do not handle this well. If you want to prevent the BOS token from being automatically added, launch with the `--nobostoken` flag.

 ### **What is `--multiuser` mode?**  
 Multiuser mode allows multiple people to share a single KoboldCpp instance, connecting different devices to a common endpoint (over LAN, a port forwarded public IP, or through an internet tunnel). It's enabled by default. It automatically handles queuing requests and dispatching them to the correct clients. An optional extra number allows you to specify the maximum number of simultaneous users. Set `--multiuser 0` to disable this.
@@ -201,6 +204,9 @@

 ### **What is `--preloadstory`**  
 You can pass a Kobold Lite JSON file with this parameter when launching the KoboldCpp server. The save file will automatically be served and loaded to any new Kobold Lite clients who connect to your server, effectively giving you a preconfigured story that you can easily share over the network.
+
+### **Can I save stories to the koboldcpp server remotely? (Server Side Saves)**  
+Server-side (networked) save slots can be used. You can specify a database file when launching KoboldCpp using `--savedatafile`. Then, you will be able to save and load persistent stories over the network to that KoboldCpp server, and access them from any other browser or device connected to it over the network. This can also be combined with `--password` to require an API key to save/load the stories.

 ### **What is `--chatcompletionsadapter`**  
 You can pass an optional ChatCompletions Adapter JSON file to force custom instruct tags when launching the KoboldCpp server. This is useful when using the OpenAI compatible Chat Completions API with third party clients. The adapter file takes the following JSON format, all fields are optional.
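
As a rough sketch of creating such a file (the field names here are assumptions modeled on the sample instruct adapters bundled with KoboldCpp, shown Alpaca-style; the filename is just an example):

```python
import json

# Hypothetical Alpaca-style adapter; all fields are optional.
adapter = {
    "system_start": "",
    "system_end": "",
    "user_start": "### Instruction:\n",
    "user_end": "\n",
    "assistant_start": "### Response:\n",
    "assistant_end": "\n",
}

with open("my_adapter.json", "w") as f:
    json.dump(adapter, f, indent=2)
```

You would then launch with `--chatcompletionsadapter my_adapter.json`.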
@@ -277,7 +283,13 @@
 Further reading: https://github.com/LostRuins/koboldcpp/discussions/514 and https://github.com/LostRuins/koboldcpp/pull/224

 ### **What is LLaVA and mmproj**  
-`--mmproj` can be used to load a multimodal projector onto a model (e.g. LLaVA), allowing the model to have AI vision capabilities, to perceive and react to images you send it. You can get projectors for some popular architectures [at this link](https://huggingface.co/koboldcpp/mmproj/tree/main), though they are optimized for the LLaVA finetune.
+`--mmproj` can be used to load a multimodal projector onto a model (e.g. LLaVA), allowing the model to have AI vision capabilities, to perceive and react to images you send it. You can get projectors for some popular architectures [at this link](https://huggingface.co/koboldcpp/mmproj/tree/main). Make sure you pick the correct projector for your architecture (e.g. a Gemma3 12B model MUST use the Gemma3 12B mmproj GGUF). Once loaded, it can be enabled by clicking on any image in Lite and selecting Multimodal Vision for AI Vision; then simply chat with the model normally and it will recognize whatever images you upload. You can adjust the maximum resolution of each image with `--visionmaxres`.
+
+### **With Chat Completions, how do I control how many tokens the AI outputs?**  
+You can control the AI output length by setting the `max_tokens` field in the API request to `/v1/completions` and `/v1/chat/completions`. However, some third party clients do not set this field. In those cases, you can use the flag `--defaultgenamount` to control the maximum number of tokens generated by default when the field is not specified.
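
A minimal sketch of setting `max_tokens`, assuming a local KoboldCpp instance on the default port 5001 and the standard OpenAI-style response shape:

```python
import json
from urllib import request

payload = {
    "max_tokens": 64,  # cap the reply at 64 generated tokens
    "messages": [{"role": "user", "content": "Summarize what a GGUF file is."}],
}
req = request.Request(
    "http://localhost:5001/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```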
+
+### **What are Embeddings and how can they be used?**  
+GGUF embedding models can now be loaded with `--embeddingsmodel` and accessed from `/v1/embeddings` or `/api/extra/embeddings`. This can be used to encode text for search or storage within a vector database. This feature is not directly supported in the inbuilt KoboldAI Lite UI and is intended for third party vector DB solutions.
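
A sketch of encoding one text for a vector DB, assuming a local instance on the default port 5001 and an OpenAI-style request/response shape:

```python
import json
from urllib import request

payload = {"input": "KoboldCpp runs GGUF models locally."}
req = request.Request(
    "http://localhost:5001/v1/embeddings",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with request.urlopen(req) as resp:
    vector = json.load(resp)["data"][0]["embedding"]
print(len(vector))  # dimensionality depends on the loaded embeddings model
```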

 ### **Flash Attention**  
 `--flashattention` can be used to enable flash attention when running with CUDA/CuBLAS, which can be faster and more memory efficient.
@@ -293,11 +305,19 @@
 ### **Overriding MoE models**  
 `--moeexperts` - Override the number of experts to use in MoE models

+### **What is Admin mode? Can I switch models at runtime?**  
+You can switch models, settings and configs at runtime, which also allows for remote model swapping. Launch with `--admin` to enable this feature, and provide `--admindir`, a directory containing `.kcpps` launch configs. Optionally, provide `--adminpassword` to secure admin functions. You can prepare `.kcpps` configs for different layers, backends, models, etc., and swap between them at runtime from the Admin panel in Lite.
KoboldCpp will then terminate the current instance and relaunch with the new config.
+
 ### **What is Whisper?**  
 Whisper is a speech-to-text model that can be used for transcription and voice control within Kobold Lite. Load a Whisper GGML model with `--whispermodel`. In Kobold Lite, the microphone is used when enabled in the settings panel. You can use Push-To-Talk (PTT) or automatic Voice Activity Detection (VAD), aka Hands-Free Mode. Everything runs locally within your browser, including resampling and wav format conversion, and interfaces directly with the KoboldCpp transcription endpoint.

 ### **What is OuteTTS Text To Speech?**  
-OuteTTS is a text-to-speech model that can be used for narration by generating audio within Kobold Lite. You need two models, an OuteTTS GGUF and a WavTokenizer GGUF which you can find [here](https://github.com/LostRuins/koboldcpp/wiki#getting-an-ai-model-file). Once downloaded, load them in the Audio tab or using `--ttsmodel` and `--ttswavtokenizer`. You can also use `--ttsgpu` to load them on the GPU instead, and `--ttsthreads` to set a custom thread count used.
+OuteTTS is a text-to-speech model that can be used for narration by generating audio within Kobold Lite. You need two models, an OuteTTS GGUF and a WavTokenizer GGUF which you can find [here](https://github.com/LostRuins/koboldcpp/wiki#getting-an-ai-model-file). Once downloaded, load them in the Audio tab or using `--ttsmodel` and `--ttswavtokenizer`. 
+  - You can also use `--ttsgpu` to load them on the GPU instead.
  - Use `--ttsthreads` to set a custom thread count.
  - Use `--ttsmaxlen` to limit the maximum number of audio tokens generated per request.
+  - Check the API documentation to see how to change speakers or use voice cloning.

 ### **Can I use SSL?**  
 You can now import your own SSL cert to use with KoboldCpp and serve it over HTTPS with `--ssl [cert.pem] [key.pem]` or via the GUI. The `.pem` files must be unencrypted. You can also generate your own self-signed certificate with OpenSSL, e.g. `openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365 -config openssl.cnf -nodes`.
@@ -366,6 +386,9 @@
 ### **What is `--prompt`**  
 This flag can be used to run KoboldCpp directly from the command line without running the server; the output of the prompt will be generated and printed to the terminal before exiting. When running with `--prompt`, all other console outputs are suppressed, except for that prompt's response, which is piped directly to stdout. You can control the output length with `--promptlimit`. These 2 flags can also be combined with `--benchmark`, allowing benchmarking with a custom prompt and returning the response. Note that this mode is only intended for quick testing and simple usage; no sampler settings will be configurable.

+### **Can I chat with KoboldCpp interactively from the Command Line / Command Prompt directly?**  
+The flag `--cli` launches KoboldCpp with an interactive command line interface without running the server, allowing you to use it without a GUI, just like llama.cpp. Simply run it with `--cli` to enter terminal mode, where you can chat interactively from the command line shell.
+
 ### **Can I generate images with KoboldCpp?**  
 Yes, KoboldCpp now natively supports Local Image Generation, thanks to stable-diffusion.cpp. It provides a ComfyUI and A1111 compatible txt2img endpoint which you can use within the embedded Kobold Lite, or in many other compatible frontends such as SillyTavern. 
   - Just select a compatible SD3, Flux, SD1.5 or SDXL `.safetensors` model to load, either through the GUI launcher or with `--sdmodel`
@@ -459,6 +482,9 @@
 ### **Placeholder Tags**  
 This option allows the placeholder tags `{{user}}` `{{char}}` `{{[INPUT]}}` and `{{[OUTPUT]}}` to be used by character card or scenario authors, which will be dynamically replaced with the correct value at runtime. For example, `{{char}}` will get replaced with the chatbot's selected nickname.
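
Conceptually, the substitution is a simple string replacement; a toy Python sketch (not Lite's actual implementation):

```python
def fill_placeholders(text, user, char):
    # Swap in the player's name and the chatbot's selected nickname.
    return text.replace("{{user}}", user).replace("{{char}}", char)

print(fill_placeholders("{{char}} waves at {{user}}.", "Alice", "KoboldGPT"))
# prints: KoboldGPT waves at Alice.
```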

+### **Thinking Tags**  
+In some modern reasoning models like Deepseek R1, the model performs a chain-of-thought process that outputs thinking steps before deriving a final answer. In Context &amp;gt; Tokens &amp;gt; Thinking, you can specify regex to handle output from reasoning models, to hide, remove, or ignore the Chain-Of-Thought tags like `&amp;lt;think&amp;gt;`.
+
 ### **Persist Autosave Session**  
 This option autosaves your story and settings, which will be restored the next time you start KoboldCpp again. However, to avoid data loss you are still recommended to manually export your saved story .json files from time to time.

@@ -521,7 +547,11 @@
 This is the opposite problem to the above; sometimes the AI has many interesting things to say, but they get trimmed away because it responded across multiple lines or even multiple paragraphs. Enabling 'Multiline Replies' allows such responses to be used. Remember - the AI learns from examples. A boring prompt or dull messages from the user can lead to dull AI replies.

 ### **What is AI Vision?**  
-AI Vision is an attempt to provide multimodality by allow the model to recognize and interpret uploaded or generated images. This uses AI Horde or a local A1111 endpoint to perform image interrogation, similar to LLaVa, although not as precise. Click on any image and you can enable it within Lite. This functionality is not provided by KCPP itself.
+AI Vision is an attempt to provide multimodality by allowing the model to recognize and interpret uploaded or generated images. Three modes are supported to provide vision.
  - Interrogate (AI Horde): This uses AI Horde to perform image interrogation online, giving you a basic description of the image.
  - Interrogate (Local): This uses a local A1111 or Forge endpoint to perform image interrogation and generate a simple description.
  - Multimodal Vision: This is *true* vision; it requires a multimodal projector (mmproj) and allows the model to recognize and interpret images naturally in great detail.
Click on any image and you can enable it within the dropdown box in KoboldAI Lite.

 ### **What file formats does Kobold Lite support?**  
 Kobold Lite supports many file formats, automatically determined when the file is loaded. These include:
@@ -589,7 +619,7 @@
 When WebSearch is enabled, KoboldCpp now optionally functions as a WebSearch proxy with a new `/api/extra/websearch` endpoint, allowing your queries to be augmented with web searches. It works with all models, but needs to be enabled both on Lite and on Kcpp with `--websearch` or in the GUI. The websearch is executed locally from the KoboldCpp instance, and is powered by DuckDuckGo.

 ### **Can I Talk To or Search my Documents**  
-KoboldCpp offers a TextDB Document Lookup in KoboldAI Lite - This is a very rudimentary form of browser-based RAG. You can access it from the Context &amp;gt; TextDB tab. It's powered by a text-based minisearch engine, you can paste a very large text document which is chunked and stored into the database, and at runtime it will find relevant snippets to add to the context depending on the query/instruction you send to the AI. You can use the historical context as a document, or paste a custom text document to use. Note that this is NOT an embedding model, it uses lunr and minisearch for retrieval scoring instead. 
+KoboldCpp offers a TextDB Document Lookup in KoboldAI Lite - this is a very rudimentary form of browser-based RAG. You can access it from the Context &amp;gt; TextDB tab. It's powered by a text-based minisearch engine: you can paste a very large text document, which is chunked and stored into the database, and at runtime it will find relevant snippets to add to the context depending on the query/instruction you send to the AI. You can use the historical context as a document, or paste a custom text document to use. Note that this is NOT an embedding model or true vector database; it uses lunr and minisearch for retrieval scoring instead. 
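
Conceptually, the chunk-then-score idea looks like the following Python sketch (Lite actually uses lunr/minisearch in the browser; the toy keyword-overlap score here just stands in for its real ranking):

```python
def chunk(doc, size=400):
    # Split the document into fixed-size character chunks.
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def score(query, text):
    # Toy keyword-overlap score instead of a real BM25-style ranking.
    words = set(query.lower().split())
    return sum(1 for w in text.lower().split() if w in words)

def lookup(query, doc, top_k=3):
    # Return the chunks most relevant to the query, to be added to context.
    return sorted(chunk(doc), key=lambda c: score(query, c), reverse=True)[:top_k]
```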

 ### **Are my chats private? What is with the Share button?**  
 KoboldCpp is capable of running fully locally offline without internet, and does not send your inputs to anywhere else. Generated content using the API is displayed in the terminal console, which is cleared when the application is closed. Likewise, the Kobold Lite UI will store your content only locally within the browser; it is not sent to any other external server. KoboldCpp and Kobold Lite are fully open source under the AGPLv3, and you can compile from source or review it on github.  
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Henk</dc:creator><pubDate>Mon, 05 May 2025 15:14:44 -0000</pubDate><guid>https://sourceforge.net52572bd5fb860e61ce6646d4ccd9ecc08ecbada5</guid></item><item><title>Home modified by Henk</title><link>https://sourceforge.net/p/koboldcpp/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v63
+++ v64
@@ -283,7 +283,7 @@
 `--flashattention` can be used to enable flash attention when running with CUDA/CuBLAS, which can be faster and more memory efficient.

 ### **Quantized KV Cache**  
-You can now utilize the Quantized KV Cache feature in KoboldCpp with `--quantkv [level]`, where `level 0=f16, 1=q8, 2=q4`. Note that quantized KV cache is only available if `--flashattention` is used, and is **NOT** compatible with Context Shifting, which will be disabled if `--quantkv` is used.
+You can now utilize the Quantized KV Cache feature in KoboldCpp with `--quantkv [level]`, where `level 0=f16, 1=q8, 2=q4`. Note that fully quantized KV cache is only available if `--flashattention` is used; otherwise only the K cache can be quantized.

 ### **Speculative Decoding (Draft Models)**  
 You can explore speculative decoding by loading a draft model. This is intended to be a smaller fast model with the same vocab as the big model, that tries to speed up inference by guessing tokens. Use `--draftmodel` to select the speculative decoding model.
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Henk</dc:creator><pubDate>Mon, 05 May 2025 15:14:44 -0000</pubDate><guid>https://sourceforge.net432d9c1a9ae83afd39e270c45c3c018e5fdb45e1</guid></item><item><title>Home modified by Henk</title><link>https://sourceforge.net/p/koboldcpp/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v62
+++ v63
@@ -607,8 +607,8 @@

 ### Other established resources  
 [Local LLM guide from /lmg/, with good beginner models](https://rentry.org/local_LLM_guide)  
-[SillyTavern documentation regarding KoboldAI](https://docs.sillytavern.app/usage/api-connections/koboldai/)  
-[PygmalionAI documentation regarding KoboldAI](https://docs.pygmalion.chat/local-installation-(cpu)/pygcpp/#android)  
+[SillyTavern documentation regarding KoboldAI](https://docs.sillytavern.app/usage/api-connections/koboldcpp/)  
+[PygmalionAI documentation regarding KoboldAI](https://docs.pygmalion.chat/en/backend/kobold-cpp)  
 [KoboldAI Discord Server](https://koboldai.org/discord)  
 Also check out /lmg/, r/KoboldAI and r/LocalLLaMA/  

&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Henk</dc:creator><pubDate>Mon, 05 May 2025 15:14:44 -0000</pubDate><guid>https://sourceforge.netb8817ecca69006d427a4442a8cf602f16e9df426</guid></item><item><title>Home modified by Henk</title><link>https://sourceforge.net/p/koboldcpp/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v61
+++ v62
@@ -114,7 +114,7 @@
 - 32 layers with LLAMA 7B
 - 18 layers with LLAMA 13B
 - 8 layers with LLAMA 30B
-You can specify `--gpulayers -1` and allow KoboldCpp to guess how many layers it should offload, though this is often not the most accurate.
+You can specify `--gpulayers -1` and allow KoboldCpp to guess how many layers it should offload, though this guess is often not the most accurate, and it does not work reliably for multi-GPU setups. You are recommended to determine the optimal layer count through trial and error for best results.

 ### **How can I run KoboldCpp on my android phone (Termux)?**  
 Inference directly on a mobile device is probably not optimal as it's likely to be slow and memory limited. Consider running it remotely instead, as described in the "Running remotely over network" section. If you still want to proceed, the best way on Android is to build and run KoboldCpp within Termux. Also, check out the guide below "Installing KoboldCpp on Android via Termux".
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Henk</dc:creator><pubDate>Mon, 05 May 2025 15:14:43 -0000</pubDate><guid>https://sourceforge.net2fd1489bdfd7df220ff2bf07b7dda254362c31fb</guid></item><item><title>Home modified by Henk</title><link>https://sourceforge.net/p/koboldcpp/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v60
+++ v61
@@ -253,7 +253,7 @@
 - NTK-Aware Scaling, set with `frequency base`, the second parameter of `--ropeconfig`, e.g. `--ropeconfig 1.0 32000` for approx 2x scale, or `--ropeconfig 1.0 82000` for approx 4x scale. Experiment to find optimal values. If `--ropeconfig` is not set, NTK-Aware scaling is the default, automatically set based on your `--contextsize` value.

 ### **What is mmap**  
-mmap, or memory-mapped file I/O, maps files or devices into memory. It is a method of reducing the amount of RAM needed for loading the model, as parts can be read from disk into RAM on demand. mmap is enabled by default, but if it causes issues, you can disable it with `--nommap`
+mmap, or memory-mapped file I/O, maps files or devices into memory. It is a method of reducing the amount of RAM needed for loading the model, as parts can be read from disk into RAM on demand. You can enable it with `--usemmap`

 ### **What is mlock**  
 mlock is a technique used to force a model to remain in RAM after it has been loaded. On some systems, especially when RAM is scarce, the OS may trigger memory swapping too frequently, reducing performance. Setting `--usemlock` will prevent that from happening. mlock is disabled by default.
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Henk</dc:creator><pubDate>Mon, 05 May 2025 15:14:43 -0000</pubDate><guid>https://sourceforge.netbb2af5be825fbb8c2fa8cb719e6f39e10f51a31a</guid></item><item><title>Home modified by Henk</title><link>https://sourceforge.net/p/koboldcpp/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v59
+++ v60
@@ -12,7 +12,7 @@
 - An incomplete list of supported architectures follows, but there are *many hundreds of other GGUF models*. In general, if it's GGUF, it should work.
   - Llama / Llama2 / Llama3 / Alpaca / GPT4All / Vicuna / Koala / Pygmalion / Metharme / WizardLM / Mistral / Mixtral / Miqu / Qwen / Qwen2 / Yi / Gemma / Gemma2 / GPT-2 / Cerebras / Phi-2 / Phi-3 / GPT-NeoX / Pythia / StableLM / Dolly / RedPajama / GPT-J / RWKV4 / MPT / Falcon / Starcoder / Deepseek and many, **many** more.
 - The best place to get GGUF text models is **Huggingface**. For image models, **CivitAI** has a good selection. Here are some to get you started.
-  - A quick and easy text model to start with is [Airoboros Mistral](https://huggingface.co/TheBloke/airoboros-mistral2.2-7B-GGUF/resolve/main/airoboros-mistral2.2-7b.Q4_K_S.gguf) (smaller and weaker) or [Tiefighter 13B](https://huggingface.co/KoboldAI/LLaMA2-13B-Tiefighter-GGUF/resolve/main/LLaMA2-13B-Tiefighter.Q4_K_S.gguf) (larger model) or [Beepo 22B](https://huggingface.co/concedo/Beepo-22B-GGUF/resolve/main/Beepo-22B-Q4_K_S.gguf) (largest and most powerful).
+  - A quick and easy text model to start with is [Airoboros Mistral 7B](https://huggingface.co/TheBloke/airoboros-mistral2.2-7B-GGUF/resolve/main/airoboros-mistral2.2-7b.Q4_K_S.gguf) (smaller and weaker) or [Tiefighter 13B](https://huggingface.co/KoboldAI/LLaMA2-13B-Tiefighter-GGUF/resolve/main/LLaMA2-13B-Tiefighter.Q4_K_S.gguf) (larger model) or [Beepo 22B](https://huggingface.co/concedo/Beepo-22B-GGUF/resolve/main/Beepo-22B-Q4_K_S.gguf) (largest and most powerful).
   - Other good text generation models to try are [L3-8B-Stheno-v3.2](https://huggingface.co/bartowski/L3-8B-Stheno-v3.2-GGUF/resolve/main/L3-8B-Stheno-v3.2-Q4_K_S.gguf) and [Fimbulvetr-11B-v2](https://huggingface.co/mradermacher/Fimbulvetr-11B-v2-GGUF/resolve/main/Fimbulvetr-11B-v2.Q4_K_S.gguf)
   - Image Generation: [Anything v3](https://huggingface.co/admruul/anything-v3.0/resolve/main/Anything-V3.0-pruned-fp16.safetensors) or [Deliberate V2](https://huggingface.co/Yntec/Deliberate2/resolve/main/Deliberate_v2.safetensors) or [Dreamshaper SDXL](https://huggingface.co/Lykon/dreamshaper-xl-v2-turbo/resolve/main/DreamShaperXL_Turbo_v2_1.safetensors)
   - Image Recognition MMproj: [Pick the correct one for your model architecture here](https://huggingface.co/koboldcpp/mmproj/tree/main)
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Henk</dc:creator><pubDate>Mon, 05 May 2025 15:14:43 -0000</pubDate><guid>https://sourceforge.net23ec4aeda195ee7aa4bf77b1b55e55e4b3a9a1bd</guid></item><item><title>Home modified by Henk</title><link>https://sourceforge.net/p/koboldcpp/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v58
+++ v59
@@ -296,8 +296,8 @@
 ### **What is Whisper?**  
 Whisper is a speech-to-text model that can be used for transcription and voice control within Kobold Lite. Load a Whisper GGML model with `--whispermodel`. In Kobold Lite, the microphone is used when enabled in the settings panel. You can use Push-To-Talk (PTT) or automatic Voice Activity Detection (VAD), aka Hands-Free Mode. Everything runs locally within your browser, including resampling and wav format conversion, and interfaces directly with the KoboldCpp transcription endpoint.

-### **What is OuteTTS?**  
-Whisper is a text-to-speech model that can be used for narration by generating audio within Kobold Lite. You need two models, an OuteTTS GGUF and a WavTokenizer GGUF which you can find [here](https://github.com/LostRuins/koboldcpp/wiki#getting-an-ai-model-file). Once downloaded, load them in the Audio tab or using `--ttsmodel` and `--ttswavtokenizer`. You can also use `--ttsgpu` to load them on the GPU instead.
+### **What is OuteTTS Text To Speech?**  
+OuteTTS is a text-to-speech model that can be used for narration by generating audio within Kobold Lite. You need two models, an OuteTTS GGUF and a WavTokenizer GGUF which you can find [here](https://github.com/LostRuins/koboldcpp/wiki#getting-an-ai-model-file). Once downloaded, load them in the Audio tab or using `--ttsmodel` and `--ttswavtokenizer`. You can also use `--ttsgpu` to load them on the GPU instead, and `--ttsthreads` to set a custom thread count used.

 ### **Can I use SSL?**  
 You can now import your own SSL cert to use with KoboldCpp and serve it over HTTPS with `--ssl [cert.pem] [key.pem]` or via the GUI. The `.pem` files must be unencrypted. You can also generate your own self-signed certificate with OpenSSL, e.g. `openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365 -config openssl.cnf -nodes`.
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Henk</dc:creator><pubDate>Mon, 05 May 2025 15:14:43 -0000</pubDate><guid>https://sourceforge.net78ed534ee4bb8ea8281d073833aca41e2fdebbb1</guid></item><item><title>Home modified by Henk</title><link>https://sourceforge.net/p/koboldcpp/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v57
+++ v58
@@ -12,7 +12,7 @@
 - An incomplete list of supported architectures follows, but there are *many hundreds of other GGUF models*. In general, if it's GGUF, it should work.
   - Llama / Llama2 / Llama3 / Alpaca / GPT4All / Vicuna / Koala / Pygmalion / Metharme / WizardLM / Mistral / Mixtral / Miqu / Qwen / Qwen2 / Yi / Gemma / Gemma2 / GPT-2 / Cerebras / Phi-2 / Phi-3 / GPT-NeoX / Pythia / StableLM / Dolly / RedPajama / GPT-J / RWKV4 / MPT / Falcon / Starcoder / Deepseek and many, **many** more.
 - The best place to get GGUF text models is **Huggingface**. For image models, **CivitAI** has a good selection. Here are some to get you started.
-  - A quick and easy text model to start with is [BookAdventures 8B](https://huggingface.co/KoboldAI/Llama-3.1-8B-BookAdventures-GGUF/resolve/main/Llama-3.1-8B-BookAdventures.Q4_K_S.gguf) or [Tiefighter 13B](https://huggingface.co/KoboldAI/LLaMA2-13B-Tiefighter-GGUF/resolve/main/LLaMA2-13B-Tiefighter.Q4_K_S.gguf) (larger model).
+  - A quick and easy text model to start with is [Airoboros Mistral](https://huggingface.co/TheBloke/airoboros-mistral2.2-7B-GGUF/resolve/main/airoboros-mistral2.2-7b.Q4_K_S.gguf) (smaller and weaker) or [Tiefighter 13B](https://huggingface.co/KoboldAI/LLaMA2-13B-Tiefighter-GGUF/resolve/main/LLaMA2-13B-Tiefighter.Q4_K_S.gguf) (larger model) or [Beepo 22B](https://huggingface.co/concedo/Beepo-22B-GGUF/resolve/main/Beepo-22B-Q4_K_S.gguf) (largest and most powerful).
   - Other good text generation models to try are [L3-8B-Stheno-v3.2](https://huggingface.co/bartowski/L3-8B-Stheno-v3.2-GGUF/resolve/main/L3-8B-Stheno-v3.2-Q4_K_S.gguf) and [Fimbulvetr-11B-v2](https://huggingface.co/mradermacher/Fimbulvetr-11B-v2-GGUF/resolve/main/Fimbulvetr-11B-v2.Q4_K_S.gguf)
   - Image Generation: [Anything v3](https://huggingface.co/admruul/anything-v3.0/resolve/main/Anything-V3.0-pruned-fp16.safetensors) or [Deliberate V2](https://huggingface.co/Yntec/Deliberate2/resolve/main/Deliberate_v2.safetensors) or [Dreamshaper SDXL](https://huggingface.co/Lykon/dreamshaper-xl-v2-turbo/resolve/main/DreamShaperXL_Turbo_v2_1.safetensors)
   - Image Recognition MMproj: [Pick the correct one for your model architecture here](https://huggingface.co/koboldcpp/mmproj/tree/main)
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Henk</dc:creator><pubDate>Mon, 05 May 2025 15:14:43 -0000</pubDate><guid>https://sourceforge.netf06ac1cf660616a9fc1e93b891278d3c4bde7b41</guid></item><item><title>Home modified by Henk</title><link>https://sourceforge.net/p/koboldcpp/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v56
+++ v57
@@ -1,4 +1,5 @@
 # The KoboldCpp FAQ and Knowledgebase
+[**NEED A GGUF MODEL? CLICK HERE AND READ!**](https://github.com/LostRuins/koboldcpp/wiki#what-models-does-koboldcpp-support-what-architectures-are-supported)  
 Welcome to the KoboldCpp knowledgebase! If you have issues with KoboldCpp, please check if your question is answered here or in one of the linked references first. If not, you can open an issue on Github, or contact us on our [KoboldAI Discord Server](https://koboldai.org/discord). You can find me there as Concedo, or just ask around (we have plenty of people around to help).

 ## Introduction
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Henk</dc:creator><pubDate>Mon, 05 May 2025 15:14:43 -0000</pubDate><guid>https://sourceforge.net068051dbaa23d4a09db893e0cdbbf8ca131b45e5</guid></item></channel></rss>