A text-to-speech, speech-to-text and speech-to-speech library
Audio foundation model excelling in audio understanding
Repo of Qwen2-Audio chat & pretrained large audio language model
Large Audio Language Model built for natural interactions
GUI for a Vocal Remover that uses Deep Neural Networks
The official Python library for the Fish Audio API
Audio Plugin for Audio to MIDI transcription using deep learning
Official Python inference and LoRA trainer package
A Family of Open Sourced Music Foundation Models
A Python library for audio
Python Audio Analysis Library: Feature Extraction, Classification
The open-source voice synthesis studio powered by Qwen3-TTS
Tokenizer-Free TTS for Multilingual Speech Generation
Taming Stable Diffusion for Lip Sync
A Python library for audio data augmentation
Code for openai.fm, a demo for the OpenAI Speech API
Instant voice cloning by MIT and MyShell. Audio foundation model
Miso TTS is an 8-billion-parameter, highly emotive text-to-speech model
TTS model capable of streaming conversational audio in realtime
48 kHz stereo neural audio codec for general audio
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
A lightweight audio-to-MIDI converter with pitch bend detection
Video translation and dubbing tool powered by LLMs
MOSS-TTS-Nano is an open-source, multilingual, tiny speech generation model
Generate audiobooks from e-books, voice cloning & 1107+ languages