Enhances Tesseract OCR output using LLMs (local or API)
Handwritten Text Recognition (HTR) system implemented with TensorFlow
OCR software, free and offline
A GUI tool for extracting hard-coded subtitles (hardsubs) from videos
Open source annotation tool for machine learning practitioners
Open-source industrial-grade ASR models
Readest is a modern, feature-rich ebook reader
Voice Recognition to Text Tool
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Offline OCR command-line program for Windows that recognizes text in images
StreamSpeech is a seamless model for offline speech recognition
OCR expert VLM powered by Hunyuan's native multimodal architecture
Deploy your private Gemini application for free with one click
Foundational Models for State-of-the-Art Speech and Text Translation
Translate a video from one language to another and embed dubbing
A gallery that showcases on-device ML/GenAI use cases
Open speech-to-speech models and pipelines by Hugging Face
WPPConnect is an open source project
Workflow and speech recognition app
A framework to enable multimodal models to operate a computer
Towards Studio-Grade Character Animation via In-Context Learning of 3D
AzioSpeech Recognition and Translation
LLM for live-stream sales hosts
Powerful Android AI agent with tools, automation, and Linux shell
Bailing is a voice dialogue robot similar to GPT-4o