Multi-modal large language model designed for audio understanding
Automagically synchronize subtitles with video
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Industrial-level controllable zero-shot text-to-speech system
AI tool converting video/audio into structured documents instantly
Taming Stable Diffusion for Lip Sync
Synchronized Translation for Videos
A Python tool that uses GPT-4, FFmpeg, and OpenCV
Fast multimodal LLM for real-time voice interaction and AI apps
A simple native web interface that uses ChatTTS to synthesize text
Video editing with Python
A Telegram bot that integrates with OpenAI's official ChatGPT APIs
TorchMultimodal is a PyTorch library
Towards Human-Level Text-to-Speech through Style Diffusion
YouTube video and audio download tool
A high-quality MP3 encoder
Converts videos to .AMV (used for cheap MP3 players)
A Conversational Speech Generation Model
A deep learning toolkit for Text-to-Speech, battle-tested in research
Audio transcription software for Linux (GStreamer) with a foot pedal
Audio transcription software for Linux (VLC) with a foot pedal
State-of-the-art deep-learning-based audio codec