Uses Qwen3-ASR, a local LLM, Whisper, and TEN-VAD
Multilingual Automatic Speech Recognition with word-level timestamps
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
An open-source text-to-speech system built by inverting Whisper
A2M is a desktop app that converts audio to MIDI in one click.
Task of transcribing piano recordings into MIDI files
A CLI script to generate subtitle files (SRT/VTT/TXT) for any video