AudioMuse-AI is an open-source Dockerized environment
Audio foundation model excelling in audio understanding
Repo of Qwen2-Audio chat & pretrained large audio language model
AI tool converting video/audio into structured documents instantly
Open-source framework for intelligent speech interaction
Chat & pretrained large audio language model proposed by Alibaba Cloud
Sample code and notebooks for Generative AI on Google Cloud
Large Audio Language Model built for natural interactions
LLM-based Reinforcement Learning audio edit model
Multi-modal large language model designed for audio understanding
Official Python inference and LoRA trainer package
Spring AI Alibaba examples for building and testing AI apps
A Python tool that uses GPT-4, FFmpeg, and OpenCV
Framework for building real-time voice and multimodal AI agents
Multimodal Diffusion with Representation Alignment
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Implementation of AudioLM audio generation model in PyTorch
A Systematic Framework for Interactive World Modeling
A Family of Open Sourced Music Foundation Models
The most powerful and modular diffusion model GUI, API, and backend
The official Python library for the OpenAI API
48 kHz stereo neural audio codec for general audio
Multimodal-Driven Architecture for Customized Video Generation
Unofficial Python API and agentic skill for Google NotebookLM
Open Source Speech Language Model