Speech-to-text, text-to-speech, and speaker recognition
Offline speech recognition API for Android, iOS, Raspberry Pi
Automatic Speech Recognition with Word-level Timestamps
A PyTorch-based Speech Toolkit
Self-hosted AI audio transcription
A Web UI for easy subtitle generation using the Whisper model
Translate videos from one language to another and embed dubbing
Multi-modal large language model designed for audio understanding
End-to-end speech processing toolkit
Java library designed to integrate Speech-to-Text into applications
Chinese voice dialogue robot/smart speaker project
Speech Recognition Toolkit
Cross Audio-Visual Recognition using 3D Architectures
Beamforming and Speech Recognition Toolkit
Speaker Recognition System - MATLAB source code
Voice to Text Sentiment Analysis
(audio, video, image) Multimedia Multimodal Information Retrieval
A Sound Recognition Framework developed for the J2ME platform