| Name | Modified | Size |
|---|---|---|
| README.md | 2026-04-28 | 3.4 kB |
| v2.0.3_ fine-tuning validation, runtime stability, and streaming improvements source code.tar.gz | 2026-04-28 | 3.0 MB |
| v2.0.3_ fine-tuning validation, runtime stability, and streaming improvements source code.zip | 2026-04-28 | 3.1 MB |

Totals: 3 items, 6.1 MB
# VoxCPM v2.0.3

This release focuses on fine-tuning usability, runtime stability, safer LoRA loading, and faster streaming inference.
## Highlights

- Added `voxcpm validate` for pre-flight JSONL training manifest validation.
- Added optional `ref_audio` support in the fine-tuning data pipeline.
- Improved runtime device handling with explicit `--device` support and safer MPS dtype behavior.
- Improved VoxCPM2 streaming VAE decoding by avoiding redundant overlap decoding.
- Hardened legacy LoRA checkpoint loading with `weights_only=True`.
- Fixed LoRA rank mismatch handling in `lora_ft_webui.py`.
## New Features

- Add `voxcpm validate --manifest train.jsonl` to catch training data issues before fine-tuning.
  - Validates JSONL format, required `text`/`audio` fields, audio existence/readability, sample rate, duration stats, text length stats, and optional `ref_audio`.
- Add optional `ref_audio` support for fine-tuning manifests.
  - Training packing now supports `[103, ref_audio, 104, text, 101, target_audio, 102]`.
  - Loss is applied only to the target audio segment.
- Add `--device` CLI argument for model inference commands.
  - Supports `auto`, `cpu`, `mps`, `cuda`, and indexed CUDA devices such as `cuda:0`.
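A minimal sketch of the packing layout and its loss mask. Only the delimiter IDs (`103`/`104`/`101`/`102`) and the rule "loss only on the target audio segment" come from the release notes; the function name, the 0/1-mask representation, and the choice to leave delimiters unmasked are assumptions for illustration.

```python
# Sketch: pack one fine-tuning example as
#   [103, ref_audio, 104, text, 101, target_audio, 102]
# and build a parallel loss mask that is 1 only over target audio tokens.
REF_START, REF_END = 103, 104  # wrap the optional reference audio
TGT_START, TGT_END = 101, 102  # wrap the target audio

def pack_example(text_ids, target_audio_ids, ref_audio_ids=None):
    tokens, mask = [], []
    if ref_audio_ids:  # ref_audio is optional in the manifest
        tokens += [REF_START, *ref_audio_ids, REF_END]
        mask += [0] * (len(ref_audio_ids) + 2)
    tokens += text_ids
    mask += [0] * len(text_ids)
    tokens += [TGT_START, *target_audio_ids, TGT_END]
    # supervise only the target audio tokens, not the delimiters (assumption)
    mask += [0] + [1] * len(target_audio_ids) + [0]
    return tokens, mask
```

When `ref_audio_ids` is omitted, the sequence degenerates to `[text, 101, target_audio, 102]`, so the same packer serves manifests with and without `ref_audio`.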
## Performance

- Improve VoxCPM2 streaming VAE decode with a stateful `StreamingVAEDecoder`.
  - Streaming decode now processes only the newest latent patch and carries causal convolution state internally.
  - This removes redundant overlap decoding and reduces streaming VAE decode overhead.
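A toy illustration of the stateful-decode idea: each call consumes only the newest latent patch and carries causal history internally, so no patch is decoded twice yet the output matches a one-shot decode. The 1-D causal moving-sum below is a stand-in for the real VAE, and the class name is illustrative, not VoxCPM's actual `StreamingVAEDecoder`.

```python
# Toy stateful streaming decoder: carries the causal receptive-field tail
# between calls instead of re-decoding an overlap window each time.
class ToyStreamingDecoder:
    def __init__(self, kernel_size=3):
        self.kernel_size = kernel_size
        # causal state: the last kernel_size - 1 inputs from prior calls,
        # zero-initialized (equivalent to left zero-padding)
        self.history = [0.0] * (kernel_size - 1)

    def decode_patch(self, patch):
        """Decode only the newest patch, using carried state."""
        padded = self.history + list(patch)
        out = [sum(padded[i:i + self.kernel_size]) for i in range(len(patch))]
        # keep just the tail the next call's causal window needs
        self.history = padded[-(self.kernel_size - 1):]
        return out
```

Because the carried tail exactly covers the causal receptive field, decoding patch by patch produces the same samples as decoding the whole sequence at once, with no overlap recomputation.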
## Fixes

- Fix CUDA Graph dynamic-shape accumulation by using the uncompiled feature encoder for prefill.
- Fix CPU SDPA attention mask broadcasting by using an explicit broadcastable mask shape.
- Fix non-string text validation order to raise the intended `ValueError` instead of `AttributeError`.
- Fix file descriptor leaks when loading `config.json` in local model loaders.
- Fix MPS audio quality issues by promoting low-precision dtypes to `float32` on Apple Silicon by default.
- Fix `VOXCPM_MPS_DTYPE` override validation to match supported dtype aliases.
- Fix LoRA rank mismatch in `lora_ft_webui.py` by reloading the model when the checkpoint rank differs.
- Fix Web Demo control text handling by stripping parentheses before constructing the model prompt.
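The descriptor-leak fix follows a standard Python pattern: open the file inside a context manager so the handle is closed even if JSON parsing raises. This is a minimal sketch of that pattern; the function name is illustrative, not VoxCPM's actual loader.

```python
import json
from pathlib import Path

def load_config(model_dir):
    """Load config.json from a local model directory without leaking a fd.

    Code shaped like `json.load(open(path))` leaves the descriptor open
    until garbage collection; `with` guarantees it is closed on all paths.
    """
    path = Path(model_dir) / "config.json"
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```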
## Security

- Legacy LoRA `.ckpt`/`.pth` loading now uses `torch.load(..., weights_only=True)`.
- This reduces the risk of arbitrary pickle payload execution while preserving tensor-only checkpoint compatibility.
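To see why `weights_only=True` matters: a checkpoint is a pickle, and a pickle can run arbitrary code at load time via `__reduce__`. The stdlib-only sketch below demonstrates the attack shape with a benign flag in place of a real payload; `torch.load` with `weights_only=True` uses a restricted unpickler that rejects such callables instead of executing them.

```python
import builtins
import pickle

class Payload:
    def __reduce__(self):
        # Classic pickle attack shape: unpickling calls exec(...).
        # Here it sets a benign flag, but it could be os.system or worse.
        return (exec, ("import builtins; builtins.PWNED = True",))

blob = pickle.loads.__self__ and pickle.dumps(Payload())  # serialize the payload
pickle.loads(blob)                       # merely *loading* runs the code
assert getattr(builtins, "PWNED", False) # the flag was set during unpickling
```

A restricted loader allows only tensor-and-container types through, which is why tensor-only legacy checkpoints keep working while payloads like the one above fail to load.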
## Documentation

- Document vLLM-Omni as a production serving option for VoxCPM2.
- Update Web Demo usage to `python app.py --port 8808`.
- Update the ModelScope local download example.
- Clarify the Python requirement as `>=3.10,<3.13`.
- Add ComfyUI_RH_VoxCPM to the ecosystem list.
## Tests

- Added coverage for training manifest validation, including sample-rate mismatch, missing audio, relative paths, `ref_audio`, and CLI exit codes.
- Added runtime device selection tests.
- Added LoRA checkpoint safety tests for tensor-only checkpoints and malicious pickle payloads.
- Added CLI tests for `--device` defaults and argument forwarding.
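A sketch of what the `--device` CLI tests look like. The real parser lives inside VoxCPM, so this builds a stand-in `argparse` parser with the documented default and choices; the builder and test names are illustrative.

```python
import argparse

def build_parser():
    # Stand-in for the real CLI: --device defaults to "auto" and accepts
    # the values documented in the release notes, including indexed
    # devices such as "cuda:0".
    parser = argparse.ArgumentParser(prog="voxcpm")
    parser.add_argument(
        "--device",
        default="auto",
        help="auto, cpu, mps, cuda, or an indexed device such as cuda:0",
    )
    return parser

def test_device_default():
    assert build_parser().parse_args([]).device == "auto"

def test_device_forwarding():
    assert build_parser().parse_args(["--device", "cuda:0"]).device == "cuda:0"

test_device_default()
test_device_forwarding()
```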
## Contributors
Thanks to the contributors included in this release:
- @KevinAHM
- @kuishou68
- @sharziki
- @SuperMarioYL
- @Oumnya
- @linyueqian
- @shaun0927
- @gluttony-10
**Full Changelog**: https://github.com/OpenBMB/VoxCPM/compare/2.0.2...2.0.3