MOSS‑TTS Family: open‑source speech and sound generation models
Capable of understanding text, audio, vision, and video
Qwen3-omni is a natively end-to-end, omni-modal LLM
A 0.1B Omni model trained from scratch
Provides convenient access to the Anthropic REST API from any Python 3 application
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Controllable & emotion-expressive zero-shot TTS
Code for the paper "Hybrid Spectrogram and Waveform Source Separation"