| Name | Modified | Size |
|---|---|---|
| LocalAI.dmg | 2025-11-26 | 12.0 MB |
| local-ai-launcher-linux.tar.xz | 2025-11-26 | 16.2 MB |
| LocalAI-v3.8.0-checksums.txt | 2025-11-26 | 473 Bytes |
| LocalAI-v3.8.0-source.tar.gz | 2025-11-26 | 10.0 MB |
| local-ai-v3.8.0-darwin-amd64 | 2025-11-26 | 77.5 MB |
| local-ai-v3.8.0-darwin-arm64 | 2025-11-26 | 75.4 MB |
| local-ai-v3.8.0-linux-amd64 | 2025-11-26 | 76.0 MB |
| local-ai-v3.8.0-linux-arm64 | 2025-11-26 | 73.4 MB |
| README.md | 2025-11-26 | 23.9 kB |
| v3.8.0 source code.tar.gz | 2025-11-26 | 10.0 MB |
| v3.8.0 source code.zip | 2025-11-26 | 10.4 MB |
| **Totals: 11 items** | | **361.0 MB** |
Welcome to LocalAI 3.8.0!
LocalAI 3.8.0 focuses on smoothing out the user experience and exposing more power to the user without requiring restarts or complex configuration files. This release introduces a new onboarding flow and a universal model loader that handles everything from Hugging Face URLs to local files.
We've also improved the chat interface, addressed long-standing requests around OpenAI API compatibility (specifically SSE streaming standards), and exposed more granular controls for backend management and for individual backends such as llama.cpp.
📌 TL;DR
| Feature | Summary |
|---|---|
| Universal Model Import | Import directly from Hugging Face, Ollama, OCI, or local paths. Auto-detects backends and handles chat templates. |
| UI & Index Overhaul | New onboarding wizard, auto-model selection on boot, and a cleaner tabular view for model management. |
| MCP Live Streaming | New: Agent actions and tool calls are now streamed live via the Model Context Protocol—see reasoning in real-time. |
| Hot-Reloadable Settings | Modify watchdogs, API keys, P2P settings, and defaults without restarting the container. |
| Chat enhancements | Chat history and parallel conversations are now persisted in local storage. |
| Strict SSE Compliance | Fixed streaming format to exactly match OpenAI specs (resolves issues with LangChain/JS clients). |
| Advanced Config | Fine-tune context_shift, cache_ram, and parallel workers via YAML options. |
| Logprobs & Logitbias | Added token-level probability support for improved agent/eval workflows. |
Feature Breakdown
🚀 Universal Model Import (URL-based)
We have refactored how models are imported: you no longer need to manually write configuration files for common use cases. The new importer accepts URLs from Hugging Face, Ollama, and OCI registries, as well as local file paths, directly from the web interface.
https://github.com/user-attachments/assets/230576c2-2abe-4b20-97c0-935d4ed6e7e7
- Auto-Detection: The system attempts to identify the correct backend (e.g., `llama.cpp` vs `diffusers`) and applies native chat templates (e.g., `llama-3`, `mistral`) automatically by reading the model metadata.
- Customization during Import: You can override defaults immediately, for example, forcing a specific quantization on a GGUF file or selecting `vLLM` over `transformers`.
- Multimodal Support: Vision components (`mmproj`) are detected and configured automatically.
- File Safety: We added a safeguard to prevent the deletion of model files (blobs) if they are shared by multiple model configurations.
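For API-driven workflows, earlier LocalAI versions expose a `/models/apply` endpoint for URL-based installs; a minimal sketch below assumes the 3.8.0 importer accepts the same URI schemes shown in the UI (the payload, the example model URI, and whether this exact route backs the new importer are assumptions — check the API docs for your version):

```python
# Illustrative sketch: install a model from a URI via the LocalAI HTTP API.
# /models/apply exists in earlier LocalAI releases; the route/payload used
# by the new 3.8.0 importer may differ -- verify against your docs.
import requests

BASE_URL = "http://localhost:8080"  # assumes a local LocalAI instance

resp = requests.post(
    f"{BASE_URL}/models/apply",
    json={
        # Any of the URI schemes from the release notes should work here:
        # huggingface://..., ollama://..., oci://..., or a local path.
        # This model URI is a placeholder example.
        "url": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # typically a job handle/status you can poll
```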
🎨 Complete UI Overhaul
The web interface has been redesigned for better usability and clearer management.
https://github.com/user-attachments/assets/260a566d-8ffb-4659-b8d2-966c18b3688d
- Onboarding Wizard: A guided flow helps first-time users import or install a model in under 30 seconds.
- Auto-Focus & Selection: The input field captures focus automatically, and a default model is loaded on startup so you don't start in a "no model selected" state.
- Tabular Management: Models and backends are now organized in a cleaner list view, making it easier to see what is installed.
https://github.com/user-attachments/assets/a522d85b-9c31-4fe8-a9fa-b97c17b0bb4d
🤖 Agentic Ecosystem & MCP Live Streaming
LocalAI 3.8.0 significantly upgrades support for agentic workflows using the Model Context Protocol (MCP).
- Live Action Streaming: We have added a new endpoint to stream agent results as they happen. Instead of waiting for the final output, you can now watch the agent "think": tool calls, reasoning steps, and intermediate actions are streamed live in the UI.
https://github.com/user-attachments/assets/857d62d1-183c-4e2f-ad15-63450702e3d4
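The release notes don't name the new route, so the endpoint below is a placeholder (recent LocalAI versions expose MCP-backed completions under `/mcp/v1/chat/completions`; whether the live stream shares it is an assumption). The sketch just shows how such an SSE stream of agent events can be consumed from Python:

```python
# Hypothetical sketch: consume a Server-Sent Events stream of agent actions.
# The route below is a placeholder -- consult the LocalAI API docs for the
# actual endpoint added in 3.8.0.
import json
import requests

with requests.post(
    "http://localhost:8080/mcp/v1/chat/completions",  # placeholder route
    json={
        "model": "my-agent-model",  # placeholder model name
        "messages": [{"role": "user", "content": "Plan my day"}],
        "stream": True,
    },
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue  # skip SSE keep-alives and comments
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        event = json.loads(payload)
        # Tool calls, reasoning steps, and intermediate actions arrive as
        # individual events instead of one final blob.
        print(event)
```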
Configuring MCP via the interface is now simplified:
https://github.com/user-attachments/assets/161387c0-5126-416d-ae3b-e2a2e4858b89
🔁 Runtime System Settings
A new Settings > System panel exposes configuration options that previously required environment variables or a restart.
https://github.com/user-attachments/assets/a1e3e205-1ec6-4d02-a86a-370bc24a74e5
- Immediate Effect: Toggling Watchdogs, P2P, and Gallery availability applies instantly.
- API Key Management: You can now generate, rotate, and expire API keys via the UI.
- Network: CORS and CSRF settings are now accessible here (note: these specific network settings still require a restart to take effect).
Note: To persist runtime settings on older LocalAI deployments, you need to mount the `/configuration` directory from the container image.
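Since the API remains OpenAI-compatible, a key generated in the new panel is used like any OpenAI key. A minimal sketch (the model name and key value are placeholders):

```python
# Point the official OpenAI client at LocalAI and authenticate with a key
# generated from the new Settings > System panel.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # your LocalAI instance
    api_key="sk-local-...",               # placeholder: key created in the UI
)

reply = client.chat.completions.create(
    model="my-model",  # placeholder: any model installed in LocalAI
    messages=[{"role": "user", "content": "Hello!"}],
)
print(reply.choices[0].message.content)
```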
⚙️ Advanced llama.cpp Configuration
For power users running large context windows or high-throughput setups, we've exposed additional underlying llama.cpp options in the YAML config. You can now tune context shifting, RAM limits for the KV cache, and parallel worker slots.
```yaml
options:
- context_shift:false  # disable llama.cpp context shifting
- cache_ram:-1         # RAM limit for the prompt/KV cache (-1 = no limit)
- use_jinja:true       # use the model's native Jinja chat template
- parallel:2           # number of parallel worker slots
- grpc_servers:localhost:50051,localhost:50052  # distribute across gRPC workers
```
📊 Logprobs & Logitbias Support
This release adds full support for `logit_bias` and `logprobs`, following the OpenAI specification. This is critical for advanced agentic logic, Self-RAG, and evaluating model confidence / hallucination rates.
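Because this follows the OpenAI spec, the standard client parameters apply. A minimal sketch (model name, key, and the biased token id are placeholders):

```python
# Request token-level log-probabilities and bias specific tokens, using the
# standard OpenAI parameters that LocalAI now honors.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-local-...")

resp = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
    logprobs=True,
    top_logprobs=5,          # top alternatives reported per generated token
    logit_bias={15: -100},   # token-id -> bias in [-100, 100]; ids are tokenizer-specific
)

# Inspect the per-token probabilities, e.g. to score model confidence.
for token in resp.choices[0].logprobs.content:
    print(token.token, token.logprob)
```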
🛠️ Fixes & Improvements
OpenAI Compatibility:
* SSE Streaming: Fixed a critical issue where streaming responses were slightly non-compliant (e.g., sending empty content chunks or missing `finish_reason`). This resolves integration issues with openai-node, LangChain, and LlamaIndex (see the sketch after this list).
* Top_N Behavior: In the reranker, top_n can now be omitted or set to 0 to return all results, rather than defaulting to an arbitrary limit.
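In practice, a compliant stream can now be consumed chunk-by-chunk with the standard clients; a minimal sketch (model name and key are placeholders):

```python
# Streaming with the official OpenAI client: chunks now carry content deltas
# and a proper finish_reason on the final chunk, per the OpenAI spec.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-local-...")

stream = client.chat.completions.create(
    model="my-model",  # placeholder
    messages=[{"role": "user", "content": "Write a haiku about local AI."}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue  # e.g. usage-only chunks carry no choices
    choice = chunk.choices[0]
    if choice.delta.content:
        print(choice.delta.content, end="", flush=True)
    if choice.finish_reason is not None:
        print(f"\n[finish_reason={choice.finish_reason}]")
```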
General Fixes:
* Model Preview: When downloading, you can now see the actual filename and size before committing to the download.
* Tool Handling: Fixed crashes when tool content is missing or malformed.
* TTS: Fixed dropdown selection states for TTS models.
* Browser Storage: Chat history is now persisted in your browser's local storage. You can switch between parallel chats, rename them, and export them to JSON.
* True Cancellation: Clicking "Stop" during a stream now correctly propagates a cancellation context to the backend (works for llama.cpp, vLLM, transformers, and diffusers). This immediately stops generation and frees up resources (see the sketch below).
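The same applies to API clients: abandoning a stream closes the connection, and the backend now actually stops generating. A minimal sketch (model name, key, and the stop condition are placeholders):

```python
# Closing a stream mid-generation now propagates cancellation to the backend
# and frees its resources, instead of letting generation run to completion.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-local-...")

stream = client.chat.completions.create(
    model="my-model",  # placeholder
    messages=[{"role": "user", "content": "Count to one million."}],
    stream=True,
)
received = 0
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        received += len(chunk.choices[0].delta.content)
    if received > 200:   # arbitrary client-side stop condition
        stream.close()   # closes the HTTP connection; the server aborts generation
        break
```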
🚀 The Complete Local Stack for Privacy-First AI
| Project | Description |
|---|---|
| LocalAI | The free, Open Source OpenAI alternative. Drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required. |
| LocalAGI | Local AI agent management platform. Drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI. |
| LocalRecall | RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Works alongside LocalAI and LocalAGI. |
❤️ Thank You
Over 35,000 stars and growing. LocalAI is a true FOSS movement — built by contributors, powered by community.
If you believe in privacy-first AI:
- ✅ Star the repo
- 💬 Contribute code, docs, or feedback
- 📣 Share with others
Your support keeps this stack alive.
✅ Full Changelog
## What's Changed

### Bug fixes :bug:
* fix(reranker): respect `top_n` in the request by @mkhludnev in https://github.com/mudler/LocalAI/pull/7025
* fix(chatterbox): pin numpy by @mudler in https://github.com/mudler/LocalAI/pull/7198
* fix(reranker): support omitting top_n by @mkhludnev in https://github.com/mudler/LocalAI/pull/7199
* fix(api): SSE streaming format to comply with specification by @Copilot in https://github.com/mudler/LocalAI/pull/7182
* fix(edit): propagate correctly opts when reloading by @mudler in https://github.com/mudler/LocalAI/pull/7233
* fix(reranker): llama-cpp sort score desc, crop top_n by @mkhludnev in https://github.com/mudler/LocalAI/pull/7211
* fix: handle tool errors by @mudler in https://github.com/mudler/LocalAI/pull/7271
* fix(reranker): tests and top_n check fix [#7212] by @mkhludnev in https://github.com/mudler/LocalAI/pull/7284
* fix the tts model dropdown to show the currently selected model by @ErixM in https://github.com/mudler/LocalAI/pull/7306
* fix: do not delete files if used by other configured models by @mudler in https://github.com/mudler/LocalAI/pull/7235
* fix(llama.cpp): handle corner cases with tool content by @mudler in https://github.com/mudler/LocalAI/pull/7324

### Exciting New Features 🎉
* feat(llama.cpp): allow to set cache-ram and ctx_shift by @mudler in https://github.com/mudler/LocalAI/pull/7009
* chore: show success toast when system prompt is updated by @shohidulbari in https://github.com/mudler/LocalAI/pull/7131
* feat(llama.cpp): consolidate options and respect tokenizer template when enabled by @mudler in https://github.com/mudler/LocalAI/pull/7120
* feat: respect context and add request cancellation by @mudler in https://github.com/mudler/LocalAI/pull/7187
* feat(ui): add wizard when p2p is disabled by @mudler in https://github.com/mudler/LocalAI/pull/7218
* feat(ui): chat stats, small visual enhancements by @mudler in https://github.com/mudler/LocalAI/pull/7223
* chore: display file names in model preview by @shohidulbari in https://github.com/mudler/LocalAI/pull/7251
* feat: import models via URI by @mudler in https://github.com/mudler/LocalAI/pull/7245
* chore(importers): small logic enhancements by @mudler in https://github.com/mudler/LocalAI/pull/7262
* feat(ui): allow to cancel ops by @mudler in https://github.com/mudler/LocalAI/pull/7264
* feat: migrate to echo and enable cancellation of non-streaming requests by @mudler in https://github.com/mudler/LocalAI/pull/7270
* feat(mcp): add LocalAI endpoint to stream live results of the agent by @mudler in https://github.com/mudler/LocalAI/pull/7274
* chore: do not use placeholder image by @mudler in https://github.com/mudler/LocalAI/pull/7279
* chore: guide the user to import models by @mudler in https://github.com/mudler/LocalAI/pull/7280
* chore(ui): import vendored libs by @mudler in https://github.com/mudler/LocalAI/pull/7281
* feat(importers): add transformers and vLLM by @mudler in https://github.com/mudler/LocalAI/pull/7278
* feat: restyle index by @mudler in https://github.com/mudler/LocalAI/pull/7282
* feat: add support to logitbias and logprobs by @mudler in https://github.com/mudler/LocalAI/pull/7283
* feat(ui): small refinements by @mudler in https://github.com/mudler/LocalAI/pull/7285
* feat(index): minor enhancements by @mudler in https://github.com/mudler/LocalAI/pull/7288
* chore: scroll in thinking mode, better buttons placement by @mudler in https://github.com/mudler/LocalAI/pull/7289
* chore: small ux enhancements by @mudler in https://github.com/mudler/LocalAI/pull/7290
* feat(ui): add backend reinstall button by @mudler in https://github.com/mudler/LocalAI/pull/7305
* feat(importer): unify importing code with CLI by @mudler in https://github.com/mudler/LocalAI/pull/7299
* feat(ui): runtime settings by @mudler in https://github.com/mudler/LocalAI/pull/7320
* feat(importers): Add diffuser backend importer with ginkgo tests and UI support by @Copilot in https://github.com/mudler/LocalAI/pull/7316
* feat(ui): add chat history by @mudler in https://github.com/mudler/LocalAI/pull/7325
* feat(inpainting): add inpainting endpoint, wire ImageGenerationFunc and return generated image URL by @gmaOCR in https://github.com/mudler/LocalAI/pull/7328

### 🧠 Models
* chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/6972
* chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/6982
* chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/6989
* chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/7017
* chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/7024
* chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/7039
* chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/7040
* chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/7068
* chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/7077
* chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/7127
* chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/7133
* chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/7162
* chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/7205
* chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/7216
* chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/7237
* chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/7248

### 📖 Documentation and examples
* feat: docs revamp by @mudler in https://github.com/mudler/LocalAI/pull/7313
* fix: Update Installer Options URL by @filipeaaoliveira in https://github.com/mudler/LocalAI/pull/7330

### 👒 Dependencies
* chore(deps): bump github.com/mudler/cogito from 0.4.0 to 0.5.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7054
* chore(deps): bump github.com/onsi/ginkgo/v2 from 2.26.0 to 2.27.2 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7056
* chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.0.0 to 1.1.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7053
* chore(deps): bump github.com/valyala/fasthttp from 1.55.0 to 1.68.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7057
* chore(deps): bump github.com/mudler/edgevpn from 0.31.0 to 0.31.1 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7055
* chore(deps): bump github.com/containerd/containerd from 1.7.28 to 1.7.29 in the go_modules group across 1 directory by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7149
* chore(deps): bump appleboy/ssh-action from 1.2.2 to 1.2.3 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7224
* chore(deps): bump github.com/mudler/cogito from 0.5.0 to 0.5.1 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7226
* chore(deps): bump github.com/jaypipes/ghw from 0.19.1 to 0.20.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7227
* chore(deps): bump github.com/docker/docker from 28.5.1+incompatible to 28.5.2+incompatible by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7228
* chore(deps): bump github.com/testcontainers/testcontainers-go from 0.38.0 to 0.40.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7230
* chore(deps): bump github.com/ebitengine/purego from 0.9.0 to 0.9.1 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7229
* chore(deps): bump fyne.io/fyne/v2 from 2.7.0 to 2.7.1 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7293
* chore(deps): bump go.yaml.in/yaml/v2 from 2.4.2 to 2.4.3 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7294
* chore(deps): bump github.com/alecthomas/kong from 1.12.1 to 1.13.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7296
* chore(deps): bump google.golang.org/protobuf from 1.36.8 to 1.36.10 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7295
* chore(deps): bump golang.org/x/crypto from 0.43.0 to 0.45.0 in the go_modules group across 1 directory by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/7319

### Other Changes
* docs: :arrow_up: update docs version mudler/LocalAI by @localai-bot in https://github.com/mudler/LocalAI/pull/6996
* chore: :arrow_up: Update ggml-org/whisper.cpp to `999a7e0cbf8484dc2cea1e9f855d6b39f34f7ae9` by @localai-bot in https://github.com/mudler/LocalAI/pull/6997
* chore: :arrow_up: Update ggml-org/llama.cpp to `2f68ce7cfd20e9e7098514bf730e5389b7bba908` by @localai-bot in https://github.com/mudler/LocalAI/pull/6998
* chore: :arrow_up: Update ggml-org/llama.cpp to `cd5e3b57541ecc52421130742f4d89acbcf77cd4` by @localai-bot in https://github.com/mudler/LocalAI/pull/7023
* chore: display warning only when directory is present by @mudler in https://github.com/mudler/LocalAI/pull/7050
* chore: :arrow_up: Update ggml-org/llama.cpp to `c5023daf607c578d6344c628eb7da18ac3d92d32` by @localai-bot in https://github.com/mudler/LocalAI/pull/7069
* chore: :arrow_up: Update ggml-org/llama.cpp to `ad51c0a720062a04349c779aae301ad65ca4c856` by @localai-bot in https://github.com/mudler/LocalAI/pull/7098
* chore: :arrow_up: Update ggml-org/llama.cpp to `a44d77126c911d105f7f800c17da21b2a5b112d1` by @localai-bot in https://github.com/mudler/LocalAI/pull/7125
* chore: :arrow_up: Update ggml-org/llama.cpp to `7f09a680af6e0ef612de81018e1d19c19b8651e8` by @localai-bot in https://github.com/mudler/LocalAI/pull/7156
* chore: use air to live reload in dev environment by @shohidulbari in https://github.com/mudler/LocalAI/pull/7186
* chore: :arrow_up: Update ggml-org/llama.cpp to `65156105069fa86a4a81b6cb0e8cb583f6420677` by @localai-bot in https://github.com/mudler/LocalAI/pull/7184
* chore: :arrow_up: Update ggml-org/llama.cpp to `333f2595a3e0e4c0abf233f2f29ef1710acd134d` by @localai-bot in https://github.com/mudler/LocalAI/pull/7201
* chore: :arrow_up: Update ggml-org/llama.cpp to `b8595b16e69e3029e06be3b8f6635f9812b2bc3f` by @localai-bot in https://github.com/mudler/LocalAI/pull/7210
* chore: :arrow_up: Update ggml-org/whisper.cpp to `a1867e0dad0b21b35afa43fc815dae60c9a139d6` by @localai-bot in https://github.com/mudler/LocalAI/pull/7231
* chore: :arrow_up: Update ggml-org/llama.cpp to `13730c183b9e1a32c09bf132b5367697d6c55048` by @localai-bot in https://github.com/mudler/LocalAI/pull/7232
* chore: :arrow_up: Update ggml-org/llama.cpp to `7d019cff744b73084b15ca81ba9916f3efab1223` by @localai-bot in https://github.com/mudler/LocalAI/pull/7247
* feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/7267
* chore: :arrow_up: Update ggml-org/whisper.cpp to `d9b7613b34a343848af572cc14467fc5e82fc788` by @localai-bot in https://github.com/mudler/LocalAI/pull/7268
* chore(deps): bump llama.cpp to `c4abcb2457217198efdd67d02675f5fddb7071c2` by @mudler in https://github.com/mudler/LocalAI/pull/7266
* chore: :arrow_up: Update ggml-org/llama.cpp to `9b17d74ab7d31cb7d15ee7eec1616c3d825a84c0` by @localai-bot in https://github.com/mudler/LocalAI/pull/7273
* feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/7276
* chore: :arrow_up: Update ggml-org/llama.cpp to `662192e1dcd224bc25759aadd0190577524c6a66` by @localai-bot in https://github.com/mudler/LocalAI/pull/7277
* feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/7286
* chore: :arrow_up: Update ggml-org/llama.cpp to `80deff3648b93727422461c41c7279ef1dac7452` by @localai-bot in https://github.com/mudler/LocalAI/pull/7287
* chore(docs): improve documentation and split into sections bigger topics by @mudler in https://github.com/mudler/LocalAI/pull/7292
* chore: :arrow_up: Update ggml-org/whisper.cpp to `b12abefa9be2abae39a73fa903322af135024a36` by @localai-bot in https://github.com/mudler/LocalAI/pull/7300
* chore: :arrow_up: Update ggml-org/llama.cpp to `cb623de3fc61011e5062522b4d05721a22f2e916` by @localai-bot in https://github.com/mudler/LocalAI/pull/7301
* chore(deps): bump llama.cpp to `10e9780154365b191fb43ca4830659ef12def80f` by @mudler in https://github.com/mudler/LocalAI/pull/7311
* chore: :arrow_up: Update ggml-org/llama.cpp to `7d77f07325985c03a91fa371d0a68ef88a91ec7f` by @localai-bot in https://github.com/mudler/LocalAI/pull/7314
* chore: :arrow_up: Update ggml-org/whisper.cpp to `19ceec8eac980403b714d603e5ca31653cd42a3f` by @localai-bot in https://github.com/mudler/LocalAI/pull/7321
* chore(docs): add documentation about import by @mudler in https://github.com/mudler/LocalAI/pull/7315
* chore: :arrow_up: Update ggml-org/llama.cpp to `dd0f3219419b24740864b5343958a97e1b3e4b26` by @localai-bot in https://github.com/mudler/LocalAI/pull/7322
* chore(chatterbox): bump l4t index to support more recent pytorch by @mudler in https://github.com/mudler/LocalAI/pull/7332
* chore: :arrow_up: Update ggml-org/llama.cpp to `23bc779a6e58762ea892eca1801b2ea1b9050c00` by @localai-bot in https://github.com/mudler/LocalAI/pull/7331
* Revert "chore(chatterbox): bump l4t index to support more recent pytorch" by @mudler in https://github.com/mudler/LocalAI/pull/7333

## New Contributors
- @shohidulbari made their first contribution in https://github.com/mudler/LocalAI/pull/7131
- @mkhludnev made their first contribution in https://github.com/mudler/LocalAI/pull/7025
- @ErixM made their first contribution in https://github.com/mudler/LocalAI/pull/7306
- @filipeaaoliveira made their first contribution in https://github.com/mudler/LocalAI/pull/7330
Full Changelog: https://github.com/mudler/LocalAI/compare/v3.7.0...v3.8.0