Qwen3-Omni is a natively end-to-end, omni-modal LLM
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Capable of understanding text, audio, vision, and video
Toolkit for conversational AI
Repo of Qwen2-Audio, a chat & pretrained large audio language model
Chat & pretrained large audio language model proposed by Alibaba Cloud
Private Open AI on Kubernetes
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)