Automatic Speech Recognition with Word-level Timestamps
Uses Qwen3-ASR, a local LLM, Whisper, and TEN-VAD
Faster Whisper transcription with CTranslate2
Crowdsourcing platform for full-text transcription and tagging
Library for OCR-related tasks powered by Deep Learning
Voice Recognition to Text Tool
Training data (data labeling, annotation, workflow) for all data types
Han Language Processing
Enhances Tesseract OCR output using LLMs (local or API)
Book_4_Matrix Power | The Iris Book: From Addition, Subtraction
Toolkit for conversational AI
Formula recognition based on LaTeX-OCR and ONNXRuntime
CLI tool to extract (meta)data from PDF and manipulate PDF files
Replace OpenAI GPT with another LLM in your app
Framework for building real-time voice and multimodal AI agents
Accurate × Fast × Comprehensive
Omnilingual ASR: Open-Source Multilingual Speech Recognition
Open source AI VTuber platform with voice chat and Live2D avatars
OCRmyPDF adds an OCR text layer to scanned PDF files
The no-nonsense RAG chunking library
Open source annotation tool for machine learning practitioners
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
2D and 3D face alignment library built using PyTorch
A high-quality tool for converting PDF to Markdown and JSON
Fast multimodal LLM for real-time voice interaction and AI apps