| Name | Modified | Size |
|---|---|---|
| textgen-portable-3.18-windows-cuda12.4.zip | 2025-11-19 | 909.2 MB |
| textgen-portable-3.18-windows-vulkan.zip | 2025-11-19 | 213.4 MB |
| textgen-portable-3.18-windows-cpu.zip | 2025-11-19 | 199.2 MB |
| textgen-portable-3.18-linux-cuda12.4.zip | 2025-11-19 | 1.6 GB |
| textgen-portable-3.18-linux-rocm.zip | 2025-11-19 | 614.9 MB |
| textgen-portable-3.18-linux-vulkan.zip | 2025-11-19 | 292.2 MB |
| textgen-portable-3.18-linux-cpu.zip | 2025-11-19 | 248.7 MB |
| textgen-portable-3.18-macos-arm64.zip | 2025-11-19 | 192.1 MB |
| README.md | 2025-11-19 | 1.2 kB |
| v3.18 source code.tar.gz | 2025-11-19 | 24.9 MB |
| v3.18 source code.zip | 2025-11-19 | 25.0 MB |
## Changes
- Add `--cpu-moe` flag for llama.cpp to move MoE model experts to the CPU, reducing VRAM usage (see the sketch after this list).
- Add ROCm portable builds for AMD GPUs on Linux. This was made possible by PR https://github.com/oobabooga/llama-cpp-binaries/pull/7 by @ShortTimeNoSee. Thanks, @ShortTimeNoSee.
- Remove deprecated macOS 13 wheels (no longer supported by GitHub Actions).
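
As a minimal sketch of the new flag in use: the launcher script name below is an assumption (use whichever start script ships in your zip); only `--cpu-moe` itself comes from the changelog above.

```bash
# Minimal sketch: start a portable build with MoE experts offloaded
# to the CPU. The launcher script name is an assumption; use the
# start script that ships in your zip. --cpu-moe is forwarded to llama.cpp.
./start_linux.sh --cpu-moe
```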
## Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/10e9780154365b191fb43ca4830659ef12def80f
- Update ExLlamaV3 to 0.0.15
- Update peft to 0.18.*
- Update triton-windows to 3.5.1.post21
## Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
**Which version to download:**
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4`.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
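
A concrete sketch of the download-unzip-run flow, using the Linux CUDA build as the example. The extracted folder name and launcher script name are assumptions; check the actual contents of your zip.

```bash
# Sketch of the unzip-and-run flow for the Linux CUDA build.
# Folder and launcher script names are assumptions; adjust them
# to whatever your zip actually contains.
unzip textgen-portable-3.18-linux-cuda12.4.zip -d textgen-portable-3.18
cd textgen-portable-3.18
./start_linux.sh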
**Updating a portable install:**
- Download and unzip the latest version.
- Replace the `user_data` folder with the one from your existing install. All your settings and models will be carried over.
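
A sketch of that replacement step on Linux/macOS. The folder names are illustrative assumptions; point the paths at wherever you unzipped the old and new versions.

```bash
# Replace the new install's user_data folder with the one from the
# existing install. Folder names below are illustrative assumptions.
rm -rf textgen-portable-3.18/user_data
cp -r textgen-portable-3.17/user_data textgen-portable-3.18/
```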