Repo of Qwen2-Audio chat & pretrained large audio language model
Chat & pretrained large audio language model proposed by Alibaba Cloud
Large Multimodal Models for Video Understanding and Editing
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
CTC-based forced aligner for audio-text in 158 languages