A text-to-speech, speech-to-text and speech-to-speech library
Automatic Speech Recognition with Word-level Timestamps
Synchronized Translation for Videos
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
SoundTranscriber can be used to generate automatic transcription / aut
Generate blog articles from video or audio
An opinionated CLI to transcribe audio files with Whisper on-device
A Family of Open-Source Music Foundation Models
A Web UI for easy subtitling using the Whisper model
A lightweight audio-to-MIDI converter with pitch bend detection
Comprehensive Gradio WebUI for audio processing
A nearly-live implementation of OpenAI's Whisper
Qwen3-ASR is an open-source series of ASR models
Multilingual speech recognition and audio understanding model
Voice Recognition to Text Tool
AI tool converting video/audio into structured documents instantly
A2M is a desktop app that converts audio to MIDI in one click.
AI-powered tool for generating, optimizing, and translating subtitles
GLM-4-Voice | End-to-End Chinese-English Conversational Model
The official Python library for the Groq API
Translate a video from one language to another and embed the dubbing
A Python tool that uses GPT-4, FFmpeg, and OpenCV
Get your documents ready for gen AI
Build AI-powered semantic search applications
Audio transcription software for Linux (VLC) with a foot pedal