AudioMuse-AI is an open-source Dockerized environment
Audio foundation model excelling in audio understanding
AI tool converting video/audio into structured documents instantly
Repo of Qwen2-Audio chat & pretrained large audio language model
Sample code and notebooks for Generative AI on Google Cloud
Open-source framework for intelligent speech interaction
Chat & pretrained large audio language model proposed by Alibaba Cloud
Large Audio Language Model built for natural interactions
LLM-based Reinforcement Learning audio edit model
Multi-modal large language model designed for audio understanding
Official Python inference and LoRA trainer package
Spring AI Alibaba examples for building and testing AI apps
A Python tool that uses GPT-4, FFmpeg, and OpenCV
Framework for building real-time voice and multimodal AI agents
Multimodal Diffusion with Representation Alignment
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Implementation of AudioLM audio generation model in PyTorch
A Systematic Framework for Interactive World Modeling
The official Python library for the OpenAI API
The most powerful and modular diffusion model GUI, API, and backend
A Family of Open-Source Music Foundation Models
Streaming Real-time Audio-Driven Avatar Generation
48 kHz stereo neural audio codec for general audio
Multimodal-Driven Architecture for Customized Video Generation
Unofficial Python API and agentic skill for Google NotebookLM