Name | Modified | Size
---|---|---
textgen-portable-3.10-windows-cuda11.7.zip | 2025-08-12 | 729.6 MB
textgen-portable-3.10-windows-cuda12.4.zip | 2025-08-12 | 841.7 MB
textgen-portable-3.10-windows-vulkan.zip | 2025-08-12 | 202.6 MB
textgen-portable-3.10-windows-cpu.zip | 2025-08-12 | 193.5 MB
textgen-portable-3.10-linux-cuda12.4.zip | 2025-08-12 | 846.8 MB
textgen-portable-3.10-linux-cuda11.7.zip | 2025-08-12 | 774.3 MB
textgen-portable-3.10-macos-x86_64.zip | 2025-08-12 | 164.8 MB
textgen-portable-3.10-linux-cpu.zip | 2025-08-12 | 231.7 MB
textgen-portable-3.10-linux-vulkan.zip | 2025-08-12 | 240.8 MB
textgen-portable-3.10-macos-arm64.zip | 2025-08-12 | 176.7 MB
README.md | 2025-08-12 | 2.1 kB
v3.10 - Multimodal support! source code.tar.gz | 2025-08-12 | 24.9 MB
v3.10 - Multimodal support! source code.zip | 2025-08-12 | 25.0 MB
See the Multimodal Tutorial
Changes
- Add multimodal support to the UI and API (see the example request after this list)
  - With the llama.cpp loader (#7027). This was possible thanks to PR https://github.com/ggml-org/llama.cpp/pull/15108 to llama.cpp. Thanks @65a.
  - With ExLlamaV3 through a new ExLlamaV3 loader (#7174). Thanks @Katehuuh.
- Add speculative decoding to the new ExLlamaV3 loader.
- Use ExLlamav3 instead of ExLlamav3_HF by default for EXL3 models, since it supports multimodal and speculative decoding.
- Support loading chat templates from `chat_template.json` files (EXL3/EXL2/Transformers models)
- Default max_tokens to 512 in the API instead of 16
- Better organize the right sidebar in the UI
- llama.cpp: Pass `--swa-full` to llama-server when `streaming-llm` is checked to make it work for models with SWA.
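Since multimodal input is now exposed through the API and `max_tokens` defaults to 512, here is a minimal request sketch. It assumes the default local OpenAI-compatible endpoint at `http://127.0.0.1:5000/v1/chat/completions` and OpenAI-style `image_url` content parts; the exact payload shape accepted by the new multimodal path is an assumption, so treat this as an illustration rather than a reference.

```python
# Sketch of a multimodal chat completion request.
# Assumptions: local server on port 5000 with the OpenAI-compatible API enabled,
# and OpenAI-style image_url content parts.
import base64
import requests

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }
    ],
    # If omitted, max_tokens now defaults to 512 (previously 16).
    "max_tokens": 512,
}

r = requests.post("http://127.0.0.1:5000/v1/chat/completions", json=payload)
print(r.json()["choices"][0]["message"]["content"])
```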
Bug fixes
- Fix getting the ctx-size for newer EXL3/EXL2/Transformers models
- Fix the exllamav2 loader ignoring add_bos_token
- Fix the color of italic text in chat messages
- Fix edit window and buttons in Messenger theme (#7100). Thanks @mykeehu.
Backend updates
- Bump llama.cpp to https://github.com/ggml-org/llama.cpp/commit/f4586ee5986d6f965becb37876d6f3666478a961
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4` for newer GPUs or `cuda11.7` for older GPUs and systems with older drivers.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
  - Intel CPU: Use `macos-x86_64`.
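If you want to script that choice, the sketch below maps the current machine to one of the archive names from the file list above. The filename pattern comes from the table, but the function and its defaults are illustrative only, and the backend suffix (`cuda12.4`, `cuda11.7`, `vulkan`, or `cpu`) still has to be chosen by hand on Windows/Linux since it depends on your GPU and drivers.

```python
# Sketch: map the current OS/architecture to a portable build name from the table above.
# The backend (cuda12.4, cuda11.7, vulkan, or cpu) must be picked manually on Windows/Linux.
import platform

def portable_build_name(version: str = "3.10", backend: str = "cpu") -> str:
    system = platform.system()
    if system == "Darwin":
        arch = "arm64" if platform.machine() == "arm64" else "x86_64"
        return f"textgen-portable-{version}-macos-{arch}.zip"
    if system == "Windows":
        return f"textgen-portable-{version}-windows-{backend}.zip"
    if system == "Linux":
        return f"textgen-portable-{version}-linux-{backend}.zip"
    raise RuntimeError(f"Unsupported platform: {system}")

print(portable_build_name(backend="cuda12.4"))  # e.g. textgen-portable-3.10-linux-cuda12.4.zip
```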
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder in the new install with the one from your existing install. All your settings and models will carry over.
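
The same two steps can be scripted. The sketch below assumes both portable installs are already unzipped side by side; the folder names are placeholders, so adjust them to wherever your installs actually live.

```python
# Sketch: carry user_data from an existing portable install into a freshly unzipped one.
# The folder names below are placeholders, not the real extracted directory names.
import shutil
from pathlib import Path

old_install = Path("textgen-portable-old")   # existing install with your settings/models
new_install = Path("textgen-portable-3.10")  # freshly unzipped install

# Remove the empty user_data shipped with the new build, then copy over the old one.
shutil.rmtree(new_install / "user_data", ignore_errors=True)
shutil.copytree(old_install / "user_data", new_install / "user_data")
```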