A text-to-speech, speech-to-text and speech-to-speech library
Audio foundation model excelling in audio understanding
Open-source framework for intelligent speech interaction
Repo of Qwen2-Audio chat & pretrained large audio language model
Large Audio Language Model built for natural interactions
LLM-based reinforcement learning audio editing model
Multi-modal large language model designed for audio understanding
GUI for a Vocal Remover that uses Deep Neural Networks
The official Python library for the Fish Audio API
Official Python inference and LoRA trainer package
A Family of Open Sourced Music Foundation Models
A Python library for audio
Python Audio Analysis Library: Feature Extraction, Classification
Tokenizer-Free TTS for Multilingual Speech Generation
Taming Stable Diffusion for Lip Sync
A Python library for audio data augmentation
Speech recognition module for Python
Instant voice cloning by MIT and MyShell. Audio foundation model
Miso TTS is an 8-billion-parameter, highly emotive text-to-speech model
TTS model capable of streaming conversational audio in realtime
48 kHz stereo neural audio codec for general audio
A lightweight audio-to-MIDI converter with pitch bend detection
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
MOSS-TTS-Nano is an open-source multilingual tiny speech generation
Generate audiobooks from e-books, voice cloning & 1107+ languages