SOTA Open Source TTS
Open speech-to-speech models and pipelines by Hugging Face
Open-source multi-speaker long-form text-to-speech model
A general fine-tuning kit geared toward image/video/audio diffusion
Comprehensive Gradio WebUI for audio processing
Generate audiobooks from EPUBs, PDFs and text with captions
TTS model capable of streaming conversational audio in real time
Download videos from websites like YouTube and many others
The music player of today
Chat & pretrained large audio language model proposed by Alibaba Cloud
PersonaPlex code
Qwen3-Omni is a natively end-to-end, omni-modal LLM
A Python module to download Twitter Spaces
An open-source music player with a simple UI
Open source AI model for generating full songs from lyrics prompts
Generate blog articles from video or audio
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Video player for improving quality of hand-drawn images
Capable of understanding text, audio, vision, and video
Generate audiobooks from e-books
Interface for OuteTTS models
Streaming Real-time Audio-Driven Avatar Generation
Robust Speech Recognition via Large-Scale Weak Supervision
Fast multimodal LLM for real-time voice interaction and AI apps
Multilingual speech recognition and audio understanding model