Repo of Qwen2-Audio chat & pretrained large audio language model
Chat & pretrained large audio language model proposed by Alibaba Cloud
Large Audio Language Model built for natural interactions
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
AudioMuse-AI is an open-source Dockerized environment
Capable of understanding text, audio, vision, video
Scalable data preprocessing and curation toolkit for LLMs
Streamlines and simplifies prompt design for both developers
Code and models for ICML 2024 paper, NExT-GPT
Data Infrastructure providing an approach to multimodal AI workloads
Build multimodal language agents for fast prototyping and production
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Large language model for livestream sales anchors
Data Lake for Deep Learning. Build, manage, and query datasets
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)