Name | Modified | Size
---|---|---
textgen-portable-3.12-windows-cuda12.4.zip | 2025-09-02 | 841.0 MB
textgen-portable-3.12-windows-cuda11.7.zip | 2025-09-02 | 729.0 MB
textgen-portable-3.12-windows-vulkan.zip | 2025-09-02 | 207.6 MB
textgen-portable-3.12-windows-cpu.zip | 2025-09-02 | 194.4 MB
textgen-portable-3.12-linux-cuda12.4.zip | 2025-09-02 | 846.5 MB
textgen-portable-3.12-linux-cuda11.7.zip | 2025-09-02 | 774.1 MB
textgen-portable-3.12-linux-vulkan.zip | 2025-09-02 | 246.3 MB
textgen-portable-3.12-linux-cpu.zip | 2025-09-02 | 232.9 MB
textgen-portable-3.12-macos-arm64.zip | 2025-09-02 | 177.9 MB
textgen-portable-3.12-macos-x86_64.zip | 2025-09-02 | 165.1 MB
README.md | 2025-09-02 | 2.7 kB
v3.12 source code.tar.gz | 2025-09-02 | 24.9 MB
v3.12 source code.zip | 2025-09-02 | 25.0 MB
## Changes
- Characters can now think in `chat-instruct` mode! This was possible thanks to many simplifications and improvements to jinja2 template handling.
- Add support for the Seed-OSS-36B-Instruct template.
- Better handle the growth of the chat input textarea (before/after screenshots omitted).
- Make the `--model` flag work with absolute paths for GGUF models, like `--model /tmp/gemma-3-270m-it-IQ4_NL.gguf` (see the example after this list)
- Make venv portable installs work with Python 3.13
- Optimize LaTeX rendering during streaming for long replies
- Give streaming instruct messages more vertical space
- Preload the instruct and chat fonts for smoother startup
- Improve right sidebar borders in light mode
- Remove the `--flash-attn` flag (it's always on now in llama.cpp)
- Suppress "Attempted to select a non-interactive or hidden tab" console warnings, reducing the UI CPU usage during streaming
- Statically link the MSVC runtime into the llama.cpp binaries on Windows, removing the Visual C++ Redistributable dependency
- Make the llama.cpp terminal output with `--verbose` less verbose
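As a quick illustration of the `--model` change above: from a source install launched via `server.py` (the portable start scripts should pass through the same flags), an absolute GGUF path can now be given directly. A minimal sketch, assuming the model file from the changelog entry exists at that path:

```sh
# Per the changelog above, absolute paths to GGUF files now work with
# --model, and --verbose output is quieter as of this release.
python server.py --model /tmp/gemma-3-270m-it-IQ4_NL.gguf --verbose
```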
## Bug fixes
- llama.cpp: Fix stderr deadlock while loading some models
- llama.cpp: Fix obtaining the maximum sequence length for GPT-OSS
- Fix the UI failing to launch if the Notebook prompt is too long
- Fix LaTeX rendering for equations with asterisks
- Fix italic and quote colors in headings
## Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/9961d244f2df6baf40af2f1ddc0927f8d91578c8
## Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
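For instance, a first run on Linux might look like the sketch below. The extracted folder name and start script are assumptions based on the file list above rather than verified details; the bundled README.md documents the actual entry point.

```sh
# Hypothetical quick start for the Linux CPU build; the directory and
# script names are assumptions, so check the shipped README.md first.
unzip textgen-portable-3.12-linux-cpu.zip
cd textgen-portable-3.12-linux-cpu
./start_linux.sh
```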
**Which version to download:**

- **Windows/Linux:**
  - **NVIDIA GPU:** Use `cuda12.4` for newer GPUs or `cuda11.7` for older GPUs and systems with older drivers (a quick check is sketched after this list).
  - **AMD/Intel GPU:** Use `vulkan` builds.
  - **CPU only:** Use `cpu` builds.
- **Mac:**
  - **Apple Silicon:** Use `macos-arm64`.
  - **Intel CPU:** Use `macos-x86_64`.
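If you are unsure which CUDA build your NVIDIA driver supports, one quick check (assuming the driver is installed, which provides `nvidia-smi`):

```sh
# The header of nvidia-smi's output includes a "CUDA Version" field showing
# the newest CUDA runtime the driver supports: 12.4 or higher points to the
# cuda12.4 build, anything older to cuda11.7.
nvidia-smi
```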
**Updating a portable install:**

1. Download and unzip the latest version.
2. Replace its `user_data` folder with the one from your existing install (sketched below). All your settings and models will be moved.
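A minimal sketch of that second step on Linux/macOS, assuming the old and new versions were unzipped to the hypothetical folders `textgen-old/` and `textgen-new/`:

```sh
# Carry settings and models over from the previous portable install.
# Folder names are placeholders for your actual unzip locations.
rm -rf textgen-new/user_data              # discard the fresh default folder
cp -r textgen-old/user_data textgen-new/  # keep your settings and models
```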