| Name | Modified | Size |
|---|---|---|
| textgen-portable-3.19-windows-cuda12.4.zip | 2025-11-29 | 910.5 MB |
| textgen-portable-3.19-windows-vulkan.zip | 2025-11-29 | 213.8 MB |
| textgen-portable-3.19-windows-cpu.zip | 2025-11-29 | 199.5 MB |
| textgen-portable-3.19-linux-cuda12.4.zip | 2025-11-29 | 1.6 GB |
| textgen-portable-3.19-linux-rocm.zip | 2025-11-29 | 618.4 MB |
| textgen-portable-3.19-macos-x86_64.zip | 2025-11-29 | 166.4 MB |
| textgen-portable-3.19-linux-vulkan.zip | 2025-11-29 | 293.2 MB |
| textgen-portable-3.19-linux-cpu.zip | 2025-11-29 | 249.6 MB |
| textgen-portable-3.19-macos-arm64.zip | 2025-11-29 | 192.9 MB |
| README.md | 2025-11-29 | 1.2 kB |
| v3.19 source code.tar.gz | 2025-11-29 | 24.9 MB |
| v3.19 source code.zip | 2025-11-29 | 25.0 MB |
| Totals: 12 items | | 4.5 GB |
Qwen3-Next llama.cpp support!
## Changes
- Add a slider for `--ubatch-size` to the llama.cpp loader and change its defaults for better MoE performance (#7316). Thanks, @GodEmperor785.
  - This significantly improves prompt processing speeds for MoE models in both full-GPU and GPU+CPU configurations.
## Bug fixes
- fix(deps): upgrade coqui-tts to >=0.27.0 for transformers 4.55 compatibility (#7329). Thanks, @aidevtime.
## Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/ff55414c42522adbeaa1bd9c52c0e9db16942484, adding Qwen3-Next support
- Update ExLlamaV3 to 0.0.16
## Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
**Which version to download:**

- **Windows/Linux**:
  - NVIDIA GPU: Use `cuda12.4`.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- **Mac**:
  - Apple Silicon: Use `macos-arm64`.
**Updating a portable install:**

- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
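As a rough sketch of that second step, using placeholder directory names (`textgen-old` for the existing install, `textgen-new` for the freshly unzipped one, and a mock `settings.yaml` standing in for your real settings):

```shell
# Illustration only: the directory names and file contents below are
# placeholders, not the actual folder names inside the release archives.
mkdir -p textgen-old/user_data textgen-new/user_data
echo "my-settings" > textgen-old/user_data/settings.yaml

# Replace the fresh user_data folder in the new install with the one
# from the existing install, carrying settings and models over:
rm -rf textgen-new/user_data
cp -r textgen-old/user_data textgen-new/user_data
```

After this, `textgen-new` contains your previous `user_data` and the old directory can be deleted once you have verified the new install works.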