Download Latest Version LocalAI-v4.4.3-source.tar.gz (19.3 MB)
Email in envelope

Get an email when there's a new version of LocalAI

Home / v4.4.0
Name Modified Size InfoDownloads / Week
Parent folder
LocalAI-v4.4.0-checksums.txt 2026-06-10 378 Bytes
local-ai-v4.4.0-darwin-arm64 2026-06-10 147.0 MB
local-ai-v4.4.0-linux-amd64 2026-06-10 150.5 MB
local-ai-v4.4.0-linux-arm64 2026-06-10 142.1 MB
LocalAI-v4.4.0-source.tar.gz 2026-06-10 19.2 MB
local-ai-launcher-linux.tar.xz 2026-06-10 16.6 MB
LocalAI.dmg 2026-06-10 12.2 MB
README.md 2026-06-10 30.4 kB
v4.4.0 source code.tar.gz 2026-06-10 19.2 MB
v4.4.0 source code.zip 2026-06-10 20.4 MB
Totals: 10 Items   527.3 MB 1

๐ŸŽ‰ LocalAI 4.4.0 Release! ๐Ÿš€




LocalAI 4.4.0 is out!

This is a big, multimodal-and-distributed release. Two brand-new audio backends land - parakeet.cpp (NVIDIA NeMo Parakeet ASR) and CrispASR (a multi-architecture ASR and TTS engine) - alongside native object detection + segmentation (rfdetr-cpp), video understanding in llama-cpp, and LTX-2 video generation in stablediffusion-ggml. Distributed mode grows up: prefix-cache-aware routing is on by default, and file transfers become resumable. There's a new intelligent middleware layer for request routing, PII filtering and cloud-model proxying, a security hardening pass that closes a credential-leak class across every outbound HTTP client, an interactive local-ai chat CLI, RAG source citations for agents, and a long run of reasoning / tool-call streaming fixes.


๐Ÿ“Œ TL;DR

Area Summary
๐ŸŽ™๏ธ Two new ASR backends parakeet-cpp (NeMo FastConformer TDT/CTC/RNNT, streaming, word/segment timestamps) and crispasr (many ASR architectures + TTS in one binary).
๐Ÿงญ Intelligent Middleware Capability-based model routing, PII detection/redaction, cloud-model proxies + a MITM proxy for subscription-auth Claude Code / Codex.
๐Ÿ›ฐ๏ธ Distributed v4 Prefix-cache-aware routing (on by default), NATS JWT auth + TLS/mTLS, worker registration-token enforcement, resumable HTTP file transfers, boot-time model prefetch, ds4 layer-split inference.
๐ŸŽฅ Video, both ways Video input (understanding) in llama-cpp via mtmd, and video generation via LTX-2 in stablediffusion-ggml.
๐Ÿ‘๏ธ Detection + Segmentation New native rfdetr-cpp backend (RF-DETR), 32 prebuilt GGUFs, bbox + per-detection PNG masks.
๐Ÿ” Outbound HTTP hardening pkg/httpclient refuses cross-host credential-leaking redirects across every outbound client (GHSA-3mj3-57v2-4636).
๐Ÿ—ฃ๏ธ TTS per-request control instructions + a generic params map plumbed end to end (Qwen3-TTS VoiceDesign / CustomVoice, Chatterbox).
๐Ÿ’ป local-ai chat Interactive terminal chat against a running server, with /models, /model, /clear.
๐Ÿ“š RAG citations Agent answers now append a clickable Sources: block from the Knowledge Base.
๐Ÿง  Models Gemma 4 QAT family + QAT-matched MTP speculative-decoding bundles, Ideogram4, LTX-2.3 22B GGUFs.

๐Ÿš€ New Features & Major Enhancements

๐ŸŽ™๏ธ Audio Gets Serious: Two New ASR Backends

This release doubles down on speech-to-text with two independent, cgo-less Go backends (purego, CGO_ENABLED=0), each shipping a full CI matrix, gallery importer and docs.

parakeet-cpp - NVIDIA NeMo Parakeet (#10084). Wraps parakeet.cpp (github.com), a C++/ggml port of NeMo Parakeet (FastConformer TDT/CTC/RNNT/hybrid) that matches the upstream PyTorch models on CPU. Text transcription, OpenAI-compatible word timestamps, and cache-aware streaming (16 kHz PCM chunks, <EOU>/<EOB> utterance boundaries). GGUFs for all 10 Parakeet models ร— 5 quants ship in mudler/parakeet-cpp-gguf. Follow-ups in this cycle made it production-grade:

  • Dynamic batching (#10112) - concurrent transcription requests are batched for throughput.
  • Real, NeMo-faithful segment timestamps (#10207) - words are grouped into segments exactly like NeMo's get_segment_offsets (sentence-punctuation boundaries by default, opt-in segment_gap_threshold silence splitting in encoder frames). Streaming FinalResult segments now carry start/end when the library exposes the ABI v4 JSON entry points.
  • nemotron-3.5-asr multilingual streaming (#10199) + per-request language selection.

crispasr - many architectures + TTS in one backend (#10099). Wraps CrispASR (a whisper.cpp/ggml fork, MIT) through its session C-ABI. One backend serves ASR or TTS depending on the loaded model, with the architecture auto-detected from the GGUF (or forced via backend:). The gallery gains 36 -crispasr entries (32 ASR + 4 TTS):

  • ASR (e2e-verified across Whisper / Parakeet / Moonshine): parakeet, canary, cohere, qwen3, voxtral, granite, fastconformer-ctc, wav2vec2, hubert, data2vec, glm-asr, kyutai-stt, firered-asr, moonshine, mimo-asr, and more.
  • TTS (all four e2e-verified to valid 24 kHz mono WAV): vibevoice, chatterbox, qwen3-tts CustomVoice, orpheus - via backend: / codec: / speaker: / voice: model options.

๐Ÿงญ Intelligent Middleware: Routing, PII Filtering & Cloud Proxies

A new middleware layer (#9802) analyzes, routes, filters and transforms chat requests before they hit a model.

  • Capability-based routing. Requests are classified (e.g. via an ArchRouter-style model) and scored across the capabilities they may require, then routed to the smallest model that satisfies them - easy requests go to small specialized models, hard or uncertain ones to larger general-purpose models. Classified embeddings are reused via cosine similarity so similar requests skip re-classification.
  • PII filtering. Private information is detected per-pattern and can be redacted, rerouted, or blocked, with a streaming PII filter that preserves a buffered-emit invariant on /v1/chat/completions, Anthropic /v1/messages, and /v1/completions. A per-model PII pattern editor lives in the model config UI.
  • Cloud model proxies + MITM. Cloud models and a MITM proxy can take part in routing/filtering - send easy requests to local models and hard ones to the cloud, and use Claude Code / Codex subscriptions (OAuth) through the PII filter via the MITM proxy (subject to provider ToS). Emits proxy_connect + proxy_traffic audit events and restores its listener from runtime_settings.json on restart.

Usage stats are recorded end to end and surfaced in REST, the UI, and MCP. Outbound clients used by this path were also the trigger for the security pass below.


๐Ÿ›ฐ๏ธ Distributed Mode v4

Distributed mode keeps maturing across routing, security and resilience.

Prefix-cache-aware routing, on by default (#10071). Routing now biases toward the replica that already holds the relevant KV/prefix cache, as a load-guarded hint that never routes worse than today's round-robin. A generic prefix tree (pkg/radixtree) maps cumulative prompt-prefix hashes to nodes; core/services/nodes/prefixcache turns the rendered prompt into a deterministic xxhash chain and makes a filter-then-score decision (narrow to load-eligible replicas, then prefer the longest-prefix match), feeding a preferredNodeID into the existing atomic SELECT ... FOR UPDATE pick. Observations sync across frontends over NATS. Round-robin is the floor; disable with --distributed-prefix-cache=false.

NATS JWT auth + TLS/mTLS (#10159). Previously anyone with access to the NATS port could publish backend-install messages or agent jobs (an SSRF / accidental-exposure risk). This adds JWT authentication and TLS/mTLS options, with workers acquiring and auto-refreshing their NATS credentials. Complemented by worker file-transfer registration-token enforcement (#10183).

Resumable file transfers (#10109). Large model GGUFs over flaky/throttled links no longer restart from byte 0. The worker's PUT /v1/files/<key> honors Content-Range (308/416 resume semantics, X-Content-SHA256 binding, final-hash verification) and the master-side stager HEAD-probes for the last accepted offset and resumes, switching to an outer time budget (LOCALAI_FILE_TRANSFER_BUDGET, default 1h) with exponential backoff.

ds4 layer-split distributed inference (#10098). Manual layer-split inference for the ds4 backend: a coordinator owns layers 0:K and listens; workers dial in and own higher ranges, each loading only its slice of the GGUF (a new dependency-free ds4-worker binary, driven via local-ai worker ds4-distributed). Fully back-compatible when ds4_role is absent.

Operational glue. Boot-time gallery prefetch via LOCALAI_PREFETCH_MODELS (#10108); a gated X-LocalAI-Node response header for attribution (#9976); plus fixes: self-heal stale "model not loaded" routing (#10181), stage directory-based models to remote nodes (#10175), in-flight tracking for non-LLM methods - VAD, diarize, voice (#10238), reconciler survives frontend restarts (#9981), cross-replica OpCache sync (#9983), and the reinstall/upgrade UI no longer sticks on "reinstalling" (#10214).


๐ŸŽฅ Video, Both Directions

Video input / understanding in llama-cpp (#10216). Video-capable multimodal models (e.g. SmolVLM2-Video) can now be sent a video in a chat request, mirroring the existing image and audio paths. Tracks the upstream mtmd video landing (ggml-org/llama.cpp#24269); grpc-server.cpp forwards request->videos() into the mtmd files vector on both the template and non-template paths, and the React chat UI accepts video/*, renders an inline <video controls> player, and emits video_url content parts. allow_video is auto-gated by whether the loaded mmproj supports it. ffmpeg/ffprobe (already in the runtime image) extract frames.

Video generation via LTX-2 (#9980). stablediffusion-ggml wires audio_vae_path and embeddings_connectors_path through to the upstream LTX-2 fields, with a new gallery/ltx-ggml.yaml template (T2V / I2V / FLF2V recipes) and six LTX-2.3 22B GGUF gallery entries (dev + distilled, UD-Q4_K_M / Q4_K_M / Q8_0), each bundling the text encoder + video VAE + audio VAE + embeddings connectors. Follow-up fixes wired the diffusion_model flag and vae_decode_only:false for the i2v/flf2v paths (#9986, [#9987]) and muxed LTX-2 audio into the output MP4 (#9990).


๐Ÿ‘๏ธ Native Object Detection + Segmentation: rfdetr-cpp

A new Go native gRPC backend (#10028) dlopens librfdetr.so (built from mudler/rf-detr.cpp (github.com)) and exposes the RF-DETR pipeline through LocalAI's Detect RPC. Supports all 5 detection variants (Nanoโ€ฆLarge) and 3 segmentation variants (SegNano/SegSmall/SegMedium) at F32/F16/Q8_0/Q4_K, with 32 prebuilt GGUFs on HuggingFace. Detection returns bbox + class_name + confidence; segmentation adds per-detection PNG-encoded masks. Matches PyTorch on CPU (sub-pixel bbox match, mask IoU 0.99+), with an HF gallery importer that auto-routes GGUF repos to the native backend.

๐Ÿ”— PR: [#10028]. Also new: Ideogram4 support in stablediffusion-ggml (#10201).


๐Ÿ—ฃ๏ธ TTS: Per-Request Instructions & Params

The OpenAI-compatible /v1/audio/speech instructions field was silently dropped at the HTTPโ†’gRPC boundary, so style/voice could only come from static YAML. PR [#10172] plumbs a generic per-request instructions string and an optional backend-specific params map end to end (proto, schema, core/backend/tts.go), unlocking per-line emotion/style (Qwen3-TTS CustomVoice, Chatterbox) and describe-a-voice (Qwen3-TTS VoiceDesign) from a single model config. Fully backward compatible - empty instructions falls back to YAML.

:::bash
curl http://localhost:8080/v1/audio/speech -H "Content-Type: application/json" -d '{
  "model": "qwen-tts-design",
  "input": "Hello world, this is a test.",
  "instructions": "A calm, low-pitched elderly storyteller with a warm tone."
}'

Also: Qwen3-TTS request-language normalization for flexible matching (#10174), and LocalVQE v1.3 with input/output spectrogram views in the Audio Transform UI (#10113).


๐Ÿง  Reasoning & Tool-Call Streaming Hardening

A focused run of correctness fixes for reasoning models and streaming tool calls:

  • reasoning_effort honored per request and forwarded to the backend so jinja models can act on it (#10082, [#10184]).
  • <think> parsing: stop <think> leaking into content in pure-content mode (#9991), stop a prefilled <think> from swallowing tag-less answers (#10225), and don't auto-enable self-spec MTP for draft-only assistant GGUFs (#10208).
  • Streaming + tools: stop tool-call double-emission when the autoparser is active (#10055), stop tool-call JSON leaking into content on tokenizer-template models (#10057), validate auto-detected XML tool-call names with a robust glm-4.5/Hermes guard (#10059), and stop healing-marker stubs / prefill-misclassified content from corrupting the stream (#9999, [#10000]).

๐Ÿ’ป local-ai chat + ๐Ÿ“š RAG Citations + ๐Ÿ›ฐ๏ธ Realtime

  • Interactive CLI chat (#10226). A new opt-in local-ai chat command connects to a running server over the OpenAI-compatible API, streams completions, and supports /models, /model <name>, /clear, /exit. Keeps local-ai run focused on the server lifecycle. (Fixes [#1535].)
  • RAG source citations (#10228). When an agent answers from the Knowledge Base, the response now appends a clickable Sources: block listing the original documents - deduplicated per source, with the citation-free version saved to long-term memory. (Closes [#9331].)
  • Configurable WebRTC ICE candidates (#10231). New LOCALAI_WEBRTC_NAT_1TO1_IPS / LOCALAI_WEBRTC_ICE_INTERFACES knobs fix /v1/realtime calls dropping a few seconds in under Docker host networking (unroutable docker0/veth candidates).
  • "Fits in my GPU" filter (#10017) on the Install Models page, plus a single shared /api/operations poller across UI consumers (#10029) and a React bundle code-split (#10042).

๐Ÿงฉ Backend Capability Registration & Startup Speed

  • Backend capability registration fixes so the right backend is picked for the right job: register 5 backends missing from BackendCapabilities (#10107), and add face/speaker-recognition constants registering insightface + speaker-recognition (#10110).
  • Faster startup (#10213): skip vocab arrays and mmap GGUF headers during config parsing.

Click for the full changelog below! ## What's Changed ### Bug fixes :bug: * fix(config): register 5 backends missing from BackendCapabilities by @Dennisadira in https://github.com/mudler/LocalAI/pull/10107 * fix(config): register parakeet-cpp as a transcript backend (#9718) by @Dennisadira in https://github.com/mudler/LocalAI/pull/10106 * fix(parakeet-cpp): cublas/hipblas/vulkan builds were silently CPU-only by @localai-bot in https://github.com/mudler/LocalAI/pull/10120 * fix(nemo): pin texterrors to 1.1.6 for GLIBCXX compatibility by @fqscfqj in https://github.com/mudler/LocalAI/pull/10134 * fix(parakeet-cpp): convert audio before the non-batched transcribe path by @localai-bot in https://github.com/mudler/LocalAI/pull/10161 * fix(distributed): stage directory-based models to remote nodes by @localai-bot in https://github.com/mudler/LocalAI/pull/10175 * fix(config): add face/speaker recognition constants and register insightface + speaker-recognition by @Dennisadira in https://github.com/mudler/LocalAI/pull/10110 * fix(distributed): self-heal stale 'model not loaded' routing by @localai-bot in https://github.com/mudler/LocalAI/pull/10181 * fix(docs): use relearn notice shortcode instead of unsupported alert by @localai-bot in https://github.com/mudler/LocalAI/pull/10206 * fix(mtp): don't auto-enable self-spec MTP for draft-only assistant GGUFs by @localai-bot in https://github.com/mudler/LocalAI/pull/10208 * fix(config): skip vocab arrays and mmap GGUF headers to speed up startup by @Dennisadira in https://github.com/mudler/LocalAI/pull/10213 * fix: distributed backend reinstall/upgrade UI stuck on 'reinstalling' by @localai-bot in https://github.com/mudler/LocalAI/pull/10214 * fix(reasoning): stop prefilled <think> from swallowing tag-less answers by @localai-bot in https://github.com/mudler/LocalAI/pull/10225 * fix(cli): handle chat output errors by @Oceankj in https://github.com/mudler/LocalAI/pull/10229 * fix(distributed): track in-flight for non-LLM inference methods (VAD, diarize, voice, ...) by @localai-bot in https://github.com/mudler/LocalAI/pull/10238 ### Exciting New Features ๐ŸŽ‰ * feat: prefix-cache-aware routing for distributed mode by @localai-bot in https://github.com/mudler/LocalAI/pull/10071 * feat(ds4): layer-split distributed inference by @localai-bot in https://github.com/mudler/LocalAI/pull/10098 * feat(crispasr): add CrispASR backend โ€” multi-architecture ASR + TTS by @localai-bot in https://github.com/mudler/LocalAI/pull/10099 * feat(worker): add LOCALAI_PREFETCH_MODELS for boot-time gallery prefetch by @localai-bot in https://github.com/mudler/LocalAI/pull/10108 * feat(distributed): resumable file uploads via HTTP Content-Range by @localai-bot in https://github.com/mudler/LocalAI/pull/10109 * feat(localvqe/audio): v1.3 release and add spectrograms to audio transform UI by @richiejp in https://github.com/mudler/LocalAI/pull/10113 * feat(parakeet-cpp): dynamic batching for concurrent transcription requests by @localai-bot in https://github.com/mudler/LocalAI/pull/10112 * feat(distributed): Add NATS JWT authentication and TLS/mTLS options by @richiejp in https://github.com/mudler/LocalAI/pull/10159 * feat(tts): support per-request instructions and params by @localai-bot in https://github.com/mudler/LocalAI/pull/10172 * feat(qwen3-tts-cpp): normalize request language for flexible matching by @localai-bot in https://github.com/mudler/LocalAI/pull/10174 * feat(distributed): enforce registration token for worker file transfer by @richiejp in https://github.com/mudler/LocalAI/pull/10183 * feat: forward reasoning_effort to the backend so jinja models honor it by @localai-bot in https://github.com/mudler/LocalAI/pull/10184 * Harden gallery-agent Hugging Face fetches against transient rate limiting by @Copilot in https://github.com/mudler/LocalAI/pull/10187 * feat(parakeet-cpp): nemotron-3.5-asr multilingual streaming model + request language support by @localai-bot in https://github.com/mudler/LocalAI/pull/10199 * feat: support Ideogram4 in stablediffusion-ggml backend + gallery by @localai-bot in https://github.com/mudler/LocalAI/pull/10201 * feat(parakeet-cpp): real segment timestamps (NeMo-faithful) by @localai-bot in https://github.com/mudler/LocalAI/pull/10207 * feat(llama-cpp): video input support (mtmd [#24269]) by @localai-bot in https://github.com/mudler/LocalAI/pull/10216 * feat(agents): surface KB source citations in RAG responses by @petechentw in https://github.com/mudler/LocalAI/pull/10228 * feat(cli): add interactive chat mode by @Oceankj in https://github.com/mudler/LocalAI/pull/10226 * feat(realtime): make WebRTC ICE candidates configurable by @localai-bot in https://github.com/mudler/LocalAI/pull/10231 ### ๐Ÿง  Models * chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/10163 * chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/10200 * chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/10209 * feat(gallery): add Gemma 4 QAT family + MTP speculative-decoding pairs by @localai-bot in https://github.com/mudler/LocalAI/pull/10215 ### ๐Ÿ“– Documentation and examples * docs: :arrow_up: update docs version mudler/LocalAI by @localai-bot in https://github.com/mudler/LocalAI/pull/10091 * docs: :arrow_up: update docs version mudler/LocalAI by @localai-bot in https://github.com/mudler/LocalAI/pull/10114 * docs: fix documentation typos by @Zhao73 in https://github.com/mudler/LocalAI/pull/10125 * docs(llama.cpp): note tensor split now works with quantized KV cache by @mudler in https://github.com/mudler/LocalAI/pull/10135 * docs: position LocalAI as a composable engine, not a bundle by @localai-bot in https://github.com/mudler/LocalAI/pull/10136 * docs: architecture & feature diagrams (blueprint style) by @localai-bot in https://github.com/mudler/LocalAI/pull/10137 * docs: fix distributed-mode diagram (workers use NATS, not PostgreSQL) by @localai-bot in https://github.com/mudler/LocalAI/pull/10138 ### ๐Ÿ‘’ Dependencies * chore: :arrow_up: Update ikawrakow/ik_llama.cpp to `3f40e73c367ad9f0c1b1819f28c7348c26aa340d` by @localai-bot in https://github.com/mudler/LocalAI/pull/10097 * chore: :arrow_up: Update antirez/ds4 to `ba00a8a88c4c5810a3d1fed6b7b8fa2b44b82fdc` by @localai-bot in https://github.com/mudler/LocalAI/pull/10095 * chore: :arrow_up: Update leejet/stable-diffusion.cpp to `d2797b86670622b6538123b4aeb5fbb6be2653c5` by @localai-bot in https://github.com/mudler/LocalAI/pull/10094 * chore: :arrow_up: Update ggml-org/llama.cpp to `d6588daa800058dfa54f1d7ea695b1a810c8ae18` by @localai-bot in https://github.com/mudler/LocalAI/pull/10093 * chore: :arrow_up: Update mudler/parakeet.cpp to `cb45f68068081af01e7092e91b038ee353eb56be` by @localai-bot in https://github.com/mudler/LocalAI/pull/10116 * chore: :arrow_up: Update ggml-org/whisper.cpp to `fe69461618ffc50ba8afa65c25cc6c6e34d4537f` by @localai-bot in https://github.com/mudler/LocalAI/pull/10117 * chore: :arrow_up: Update leejet/stable-diffusion.cpp to `be65ac7511b30379b003626c15224798929e33d4` by @localai-bot in https://github.com/mudler/LocalAI/pull/10118 * chore: :arrow_up: Update ggml-org/llama.cpp to `399739d5c5978351f39e3454bfbfbab4f369088f` by @localai-bot in https://github.com/mudler/LocalAI/pull/10119 * chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/10131 * chore: :arrow_up: Update ggml-org/whisper.cpp to `23ee03506a91ac3d3f0071b40e66a430eebdfa1d` by @localai-bot in https://github.com/mudler/LocalAI/pull/10130 * chore: :arrow_up: Update leejet/stable-diffusion.cpp to `7948df8ac1070f5f6881b8d34675821893eb97d6` by @localai-bot in https://github.com/mudler/LocalAI/pull/10127 * chore: :arrow_up: Update mudler/parakeet.cpp to `8a7c48209d7882a7ce79a6b306270e4703194543` by @localai-bot in https://github.com/mudler/LocalAI/pull/10129 * chore: :arrow_up: Update ggml-org/llama.cpp to `5dcb71166686799f0d873eab7386234302d05ecf` by @localai-bot in https://github.com/mudler/LocalAI/pull/10128 * chore: :arrow_up: Update CrispStrobe/CrispASR to `05e60432bcb5bc2113f8c395a41e86497c11504a` by @localai-bot in https://github.com/mudler/LocalAI/pull/10115 * chore(deps): bump github.com/mudler/edgevpn from 0.32.2 to 0.34.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/10153 * chore: :arrow_up: Update mudler/parakeet.cpp to `9edf17c3ada66e0f881dcff155492867db7ac4cf` by @localai-bot in https://github.com/mudler/LocalAI/pull/10141 * chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.65.0 to 0.66.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/10151 * chore(deps): bump grpcio from 1.80.0 to 1.81.0 in /backend/python/vllm by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/10157 * chore(deps): bump github.com/google/go-containerregistry from 0.21.5 to 0.21.6 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/10149 * chore: :arrow_up: Update leejet/stable-diffusion.cpp to `2d40a8b2adcdf8b5b0ca0535f3bb7801b6ba13e5` by @localai-bot in https://github.com/mudler/LocalAI/pull/10144 * chore(deps): bump securego/gosec from 2.22.9 to 2.27.1 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/10147 * chore: :arrow_up: Update ggml-org/whisper.cpp to `610e664ba7cfe3af46125ed1b5a1184fccb51bcd` by @localai-bot in https://github.com/mudler/LocalAI/pull/10140 * chore(deps): bump grpcio from 1.80.0 to 1.81.0 in /backend/python/transformers by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/10158 * chore: :arrow_up: Update ggml-org/llama.cpp to `5c394fdc8b564eff6faacc50a139529d875f0e36` by @localai-bot in https://github.com/mudler/LocalAI/pull/10143 * chore: :arrow_up: Update antirez/ds4 to `477c0e82e2699b35a65fd0a1ed6fe66b41087dfe` by @localai-bot in https://github.com/mudler/LocalAI/pull/10142 * chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/10169 * chore: :arrow_up: Update ggml-org/llama.cpp to `94a220cd6745e6e3f8de62870b66fd5b9bc92700` by @localai-bot in https://github.com/mudler/LocalAI/pull/10168 * chore: :arrow_up: Update leejet/stable-diffusion.cpp to `1f9ee88e09c258053fa59d5e05e23dfb10fa0b13` by @localai-bot in https://github.com/mudler/LocalAI/pull/10166 * chore: :arrow_up: Update CrispStrobe/CrispASR to `13d54e110e1538e0f0bc3af0680b9ab246cfb48d` by @localai-bot in https://github.com/mudler/LocalAI/pull/10145 * chore: :arrow_up: Update predict-woo/qwen3-tts.cpp to `136e5d36c17083da0321fd96512dc7b263f94a44` by @localai-bot in https://github.com/mudler/LocalAI/pull/10165 * chore: :arrow_up: Update mudler/parakeet.cpp to `b11fe5bca78ad8b342dd559a43d76df3984bb447` by @localai-bot in https://github.com/mudler/LocalAI/pull/10167 * chore: :arrow_up: Update ikawrakow/ik_llama.cpp to `1520eda980564241434b791ce2bbbd128c4be9ea` by @localai-bot in https://github.com/mudler/LocalAI/pull/10180 * chore: :arrow_up: Update ggml-org/llama.cpp to `7c158fbb4aec1bdc9c81d6ca0e785139f4826fae` by @localai-bot in https://github.com/mudler/LocalAI/pull/10179 * chore: :arrow_up: Update ggml-org/whisper.cpp to `99613cb720b65036237d44b52f753b51f75c2797` by @localai-bot in https://github.com/mudler/LocalAI/pull/10178 * chore: :arrow_up: Update vllm-project/vllm cu130 wheel to `0.22.1` by @localai-bot in https://github.com/mudler/LocalAI/pull/10188 * chore: bump LocalAGI + localrecall (fix pgvector hybrid search seqscan, [#10186]) by @localai-bot in https://github.com/mudler/LocalAI/pull/10192 * chore: :arrow_up: Update mudler/parakeet.cpp to `843600590f96a31467a5199f827c253f34c110f7` by @localai-bot in https://github.com/mudler/LocalAI/pull/10198 * chore: :arrow_up: Update ikawrakow/ik_llama.cpp to `6b9de3dbaa21ae95ea80638e5ee836795cc48c93` by @localai-bot in https://github.com/mudler/LocalAI/pull/10190 * chore: :arrow_up: Update mudler/parakeet.cpp to `abd0087dcc92ec5ad1f96f9fd86c49eb26a5ce67` by @localai-bot in https://github.com/mudler/LocalAI/pull/10204 * chore: :arrow_up: Update ggml-org/whisper.cpp to `a8ec021f2750a473ff4a8f3883bc9fdf5feafa84` by @localai-bot in https://github.com/mudler/LocalAI/pull/10202 * chore(turboquant): bump to 7d9715f1 + fix compilation against rebased fork by @localai-bot in https://github.com/mudler/LocalAI/pull/10205 * chore: :arrow_up: Update ggml-org/llama.cpp to `31e82494c0a3913c919c1027fa70500fbf4c07dd` by @localai-bot in https://github.com/mudler/LocalAI/pull/10191 * chore: :arrow_up: Update mudler/parakeet.cpp to `e270af73b94c9a5c37ec516230219ed4580e1db6` by @localai-bot in https://github.com/mudler/LocalAI/pull/10212 * chore: :arrow_up: Update leejet/stable-diffusion.cpp to `b3d56d0ba1bd437886079e339118e8e75bb79ee7` by @localai-bot in https://github.com/mudler/LocalAI/pull/10211 * chore: :arrow_up: Update ggml-org/llama.cpp to `9e3b928fd8c9d14dbf15a8768b9fdd7e5c721d66` by @localai-bot in https://github.com/mudler/LocalAI/pull/10210 * chore: :arrow_up: Update antirez/ds4 to `c463029c205c2ec8d7ab6c0df4a3f52979091286` by @localai-bot in https://github.com/mudler/LocalAI/pull/10189 * chore: :arrow_up: Update CrispStrobe/CrispASR to `f7838a306687f22c281d29c250f879a4ab3df2d7` by @localai-bot in https://github.com/mudler/LocalAI/pull/10177 * chore: :arrow_up: Update antirez/ds4 to `512d07cb08f234b704b5a5959aa9e2d4c466eeb0` by @localai-bot in https://github.com/mudler/LocalAI/pull/10224 * chore: :arrow_up: Update ikawrakow/ik_llama.cpp to `2768b6251548b78b6610e95edad13f888ad95982` by @localai-bot in https://github.com/mudler/LocalAI/pull/10219 * chore: :arrow_up: Update leejet/stable-diffusion.cpp to `19bdfe22d255d5b4dff39d449318b9bc5ea2317f` by @localai-bot in https://github.com/mudler/LocalAI/pull/10222 * chore: :arrow_up: Update CrispStrobe/CrispASR to `97cad527d247edefc904e6c40c4cf5ee78bed055` by @localai-bot in https://github.com/mudler/LocalAI/pull/10221 * chore: :arrow_up: Update ggml-org/whisper.cpp to `df7638d8229a243af8a4b5a8ae557e0d74e0a0ae` by @localai-bot in https://github.com/mudler/LocalAI/pull/10220 * chore: :arrow_up: Update ikawrakow/ik_llama.cpp to `e6f8112f3ba126eed3ff5b30cdd08085414a7516` by @localai-bot in https://github.com/mudler/LocalAI/pull/10233 * chore: :arrow_up: Update antirez/ds4 to `91bafb5acd5a6cf00b1e55ef68bf40ddd207bee7` by @localai-bot in https://github.com/mudler/LocalAI/pull/10234 * chore: :arrow_up: Update ggml-org/llama.cpp to `039e20a2db9e87b2477c76cc04905f3e1acad77f` by @localai-bot in https://github.com/mudler/LocalAI/pull/10223 * chore: :arrow_up: Update CrispStrobe/CrispASR to `c29f6653a516a3001d923944dad8892072cc7334` by @localai-bot in https://github.com/mudler/LocalAI/pull/10236 ### Other Changes * refactor(routing): extract replica picker into pkg/clusterrouting by @localai-bot in https://github.com/mudler/LocalAI/pull/10123 * test(react-ui): add page render-smoke specs, reset the coverage gate by @richiejp in https://github.com/mudler/LocalAI/pull/10122

๐Ÿ™Œ New Contributors

Enjoy!


Full Changelog: https://github.com/mudler/LocalAI/compare/v4.3.0...v4.4.0

Source: README.md, updated 2026-06-10