Name	Modified	Size	Downloads / Week
README.md	2025-06-27	4.1 kB	0
v1.7.1 source code.tar.gz	2025-06-27	32.9 MB	0
v1.7.1 source code.zip	2025-06-27	33.7 MB	0
Totals: 3 Items		66.6 MB	0

What's new in 1.7.1 (2025-06-27)

These are the changes in inference v1.7.1.

New features

FEAT: [UI] enhance audio & rerank model registration params. by @yiboyasss in https://github.com/xorbitsai/inference/pull/3656
FEAT: support async client by @zhcn000000 in https://github.com/xorbitsai/inference/pull/3645
FEAT: [UI] add max_tokens display in rerank model. by @yiboyasss in https://github.com/xorbitsai/inference/pull/3671
FEAT: [UI] add model_ability options for LLM registration. by @yiboyasss in https://github.com/xorbitsai/inference/pull/3663
FEAT: support QwenLong-L1 by @Jun-Howie in https://github.com/xorbitsai/inference/pull/3691
FEAT: [UI] model registration supports packages. by @yiboyasss in https://github.com/xorbitsai/inference/pull/3702
FEAT: support MLU device by @nan9126 in https://github.com/xorbitsai/inference/pull/3693
FEAT: vllm v1 auto enabling by @qinxuye in https://github.com/xorbitsai/inference/pull/3637
FEAT: distributed inference for MLX by @qinxuye in https://github.com/xorbitsai/inference/pull/3700

Enhancements

ENH: add enable_flash_attn param for loading qwen3 embedding & rerank by @qinxuye in https://github.com/xorbitsai/inference/pull/3640
ENH: add more abilities for builtin model families API by @qinxuye in https://github.com/xorbitsai/inference/pull/3658
ENH: improve local cluster startup reliability via child-process readiness signaling by @Checkmate544 in https://github.com/xorbitsai/inference/pull/3642
ENH: FishSpeech support pcm by @codingl2k1 in https://github.com/xorbitsai/inference/pull/3680
ENH: Add 4-sample micro-batching to Qwen-3 reranker to reduce GPU memory by @yasu-oh in https://github.com/xorbitsai/inference/pull/3666
ENH: Limit default n_parallel for llama.cpp backend by @codingl2k1 in https://github.com/xorbitsai/inference/pull/3712
BLD: pin flash-attn & flashinfer-python version and limit sgl-kernel version by @amumu96 in https://github.com/xorbitsai/inference/pull/3669
BLD: Update Dockerfile by @XiaoXiaoJiangYun in https://github.com/xorbitsai/inference/pull/3695
REF: remove unused code by @qinxuye in https://github.com/xorbitsai/inference/pull/3664
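
The micro-batching entry above (pull/3666) can be illustrated with a small sketch. This is not the code from that pull request. It only shows the general idea of scoring documents in chunks of four, so that fewer samples are resident on the device at once. The names `rerank_in_micro_batches` and `toy_score_batch` are hypothetical, and the toy scorer stands in for a real model forward pass.

```python
from typing import Callable, List, Sequence


def rerank_in_micro_batches(
    query: str,
    documents: Sequence[str],
    score_batch: Callable[[str, Sequence[str]], List[float]],
    micro_batch_size: int = 4,
) -> List[float]:
    """Score documents against a query, at most micro_batch_size at a time."""
    if micro_batch_size < 1:
        raise ValueError("micro_batch_size must be at least 1")
    scores: List[float] = []
    for start in range(0, len(documents), micro_batch_size):
        chunk = documents[start:start + micro_batch_size]
        # Only this chunk is handed to the scorer, which bounds peak memory.
        scores.extend(score_batch(query, chunk))
    return scores


def toy_score_batch(query: str, docs: Sequence[str]) -> List[float]:
    """Hypothetical stand-in scorer: fraction of query words found in each doc."""
    words = set(query.lower().split())
    denom = max(len(words), 1)
    return [len(words & set(d.lower().split())) / denom for d in docs]
```

Because each document is scored independently of the others in its chunk, the scores should match what a single large batch would produce, with the trade-off being more forward passes in exchange for lower peak memory.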

Bug fixes

BUG: fix TTS error: No such file or directory by @robin12jbj in https://github.com/xorbitsai/inference/pull/3625
BUG: Fix max_tokens value in Qwen3 Reranker by @yasu-oh in https://github.com/xorbitsai/inference/pull/3665
BUG: fix custom embedding by @qinxuye in https://github.com/xorbitsai/inference/pull/3677
BUG: [UI] rename the command-line argument from download-hub to download_hub. by @yiboyasss in https://github.com/xorbitsai/inference/pull/3685
BUG: fix jina-clip-v2 for text only or image only by @qinxuye in https://github.com/xorbitsai/inference/pull/3690
BUG: internvl chat error using vllm engine by @amumu96 in https://github.com/xorbitsai/inference/pull/3722
BUG: fix the parsing logic of streaming tool calls by @amumu96 in https://github.com/xorbitsai/inference/pull/3721
BUG: fix <think> wrongly added when set chat_template_kwargs {"enable_thinking": False} by @qinxuye in https://github.com/xorbitsai/inference/pull/3718
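
For readers consuming streamed tool calls on the client side, the sketch below shows one common way to reassemble them. It assumes OpenAI-style chunk deltas, where each `tool_calls` fragment carries an `index`, the `id` and function `name` arrive once, and the `arguments` string arrives in pieces. It is not the parsing logic changed in pull/3721, and `merge_tool_call_deltas` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, Iterable, List


def merge_tool_call_deltas(deltas: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    calls: Dict[int, Dict[str, Any]] = {}
    for delta in deltas:
        # Deltas that carry only text content have no "tool_calls" key.
        for fragment in delta.get("tool_calls") or []:
            call = calls.setdefault(
                fragment["index"], {"id": None, "name": "", "arguments": ""}
            )
            if fragment.get("id"):
                call["id"] = fragment["id"]
            function = fragment.get("function") or {}
            if function.get("name"):
                # Assumption: the name is sent whole, not split across chunks.
                call["name"] = function["name"]
            if function.get("arguments"):
                # Argument JSON is streamed as string pieces; concatenate them.
                call["arguments"] += function["arguments"]
    return [calls[i] for i in sorted(calls)]


def parse_arguments(call: Dict[str, Any]) -> Dict[str, Any]:
    """Decode the accumulated argument string once the stream has finished."""
    return json.loads(call["arguments"]) if call["arguments"] else {}
```

The argument string is only valid JSON after the last fragment has arrived, so decoding is deferred to `parse_arguments` rather than attempted per chunk.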

Documentation

DOC: add doc for paraformer by @leslie2046 in https://github.com/xorbitsai/inference/pull/3631
DOC: Flexible model (traditional ML models) by @qinxuye in https://github.com/xorbitsai/inference/pull/3714

Full Changelog: https://github.com/xorbitsai/inference/compare/v1.7.0...v1.7.1

Source: README.md, updated 2025-06-27