Repo of Qwen2-Audio chat & pretrained large audio language model
Chat & pretrained large audio language model proposed by Alibaba Cloud
Large Audio Language Model built for natural interactions
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Qwen3-Omni is a natively end-to-end, omni-modal LLM
AudioMuse-AI is an Open Source Dockerized environment
Capable of understanding text, audio, vision, video
Build multimodal language agents for fast prototyping and production
Data Infrastructure providing an approach to multimodal AI workloads
Code and models for the ICML 2024 paper NExT-GPT
Large language model for a live-stream sales host ("selling anchor")
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Data Lake for Deep Learning. Build, manage, and query datasets
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)