# What's new in 2.4.0 (2026-03-29)

These are the changes in Xinference v2.4.0.
## New features
- FEAT: introducing OTEL by @leslie2046 in https://github.com/xorbitsai/inference/pull/4666
- FEAT: [UI] add Xagent link by @yiboyasss in https://github.com/xorbitsai/inference/pull/4693
- FEAT: [UI] remove featured/all toggle and prioritize featured models by @yiboyasss in https://github.com/xorbitsai/inference/pull/4694
- feat(vllm): support v0.18.0 by @llyycchhee in https://github.com/xorbitsai/inference/pull/4718
- FEAT: add gpu load metrics by @leslie2046 in https://github.com/xorbitsai/inference/pull/4712
- feat: Upgrade the base image to version 0.17.1 and add support for aarch64 version images by @zwt-1234 in https://github.com/xorbitsai/inference/pull/4726
- feat(ci): fix aarch64 build by @zwt-1234 in https://github.com/xorbitsai/inference/pull/4735
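For the new OTEL support, OpenTelemetry exporters are conventionally configured through standard environment variables defined by the OTel specification. A minimal config sketch, assuming Xinference picks up these standard variables (how the feature is actually enabled is not documented here, so treat this as illustrative):

```shell
# Standard OTel env vars (spec-defined, not Xinference-specific):
export OTEL_SERVICE_NAME="xinference"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"

# Start a local Xinference instance as usual.
xinference-local --host 0.0.0.0 --port 9997
```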
## Enhancements
- ENH: update model "qwen3.5" JSON by @qinxuye in https://github.com/xorbitsai/inference/pull/4689
- ENH: update model "qwen3.5" JSON by @llyycchhee in https://github.com/xorbitsai/inference/pull/4707
- ENH: update models JSON [llm] by @XprobeBot in https://github.com/xorbitsai/inference/pull/4710
- ENH: update models JSON [llm] by @XprobeBot in https://github.com/xorbitsai/inference/pull/4713
- enh: adapt normalize param of vllm>0.16.0 for embedding models by @la1ty in https://github.com/xorbitsai/inference/pull/4729
- BLD: Requirements dependency version adjustment by @zwt-1234 in https://github.com/xorbitsai/inference/pull/4736
- bld: Requirements dependency version adjustment by @zwt-1234 in https://github.com/xorbitsai/inference/pull/4737
- bld: Requirements dependency version adjustment by @zwt-1234 in https://github.com/xorbitsai/inference/pull/4738
- REF: parallelize supervisor model registration listing by @leslie2046 in https://github.com/xorbitsai/inference/pull/4690
## Bug fixes
- BUG: Fix async client FormData handling and response lifecycle issues by @qinxuye in https://github.com/xorbitsai/inference/pull/4687
- BUG: MLX backend accumulates intermediate generation steps into final output (tested on 1.17.0, 2.0.0, 2.1.0) [#4615] by @nasircsms in https://github.com/xorbitsai/inference/pull/4617
- fix(worker): inject parent site-packages into child venv via .pth file by @nasircsms in https://github.com/xorbitsai/inference/pull/4692
- BUG: launch multi gpu qwen3.5 error by @llyycchhee in https://github.com/xorbitsai/inference/pull/4700
- fix(tool_call): add qwen3.5 by @llyycchhee in https://github.com/xorbitsai/inference/pull/4703
- fix(qwen3.5): support tool calls by @llyycchhee in https://github.com/xorbitsai/inference/pull/4709
- FIX: qwen3.5 reasoning parse by @llyycchhee in https://github.com/xorbitsai/inference/pull/4719
- fix(qwen3.5): support XML-like tool call format in non-streaming mode by @amumu96 in https://github.com/xorbitsai/inference/pull/4715
- FIX: webui crash when gpu_utilization is none by @leslie2046 in https://github.com/xorbitsai/inference/pull/4728
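The `.pth` worker fix above relies on a standard CPython mechanism: the `site` module reads `.pth` files in a site directory and appends each listed path to `sys.path`, which is how a child venv can see its parent's `site-packages`. A minimal sketch of the mechanism itself (the directories are throwaway stand-ins, not the paths Xinference uses):

```python
import os
import site
import sys
import tempfile

# Two throwaway directories: one plays the child venv's site-packages,
# the other the parent interpreter's site-packages.
child_site = tempfile.mkdtemp()
parent_site = tempfile.mkdtemp()

# A .pth file in the child site dir whose single line is the parent path.
with open(os.path.join(child_site, "_parent.pth"), "w") as f:
    f.write(parent_site + "\n")

# site.addsitedir() adds the directory itself to sys.path and then
# processes every .pth file it contains, adding each listed path.
site.addsitedir(child_site)
print(parent_site in sys.path)  # True
```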
## Documentation
- DOC: add v2.3.0 release notes by @qinxuye in https://github.com/xorbitsai/inference/pull/4688
- DOC: add xagent in readme by @qinxuye in https://github.com/xorbitsai/inference/pull/4699
## New Contributors
- @nasircsms made their first contribution in https://github.com/xorbitsai/inference/pull/4617
- @octo-patch made their first contribution in https://github.com/xorbitsai/inference/pull/4704
- @la1ty made their first contribution in https://github.com/xorbitsai/inference/pull/4729
Full Changelog: https://github.com/xorbitsai/inference/compare/v2.3.0...v2.4.0