Document (PDF, Word, PPTX, ...) extraction and parsing API
Generate audiobooks from EPUBs, PDFs and text with captions
OCR model for complex documents with layout-aware structured outputs
Enhances Tesseract OCR output using LLMs (local or API)
Open source healthcare AI
A Repo For Document AI
Stable Diffusion web UI
OCR software, free and offline
Faster Whisper transcription with CTranslate2
Comprehensive Gradio WebUI for audio processing
Visual Causal Flow
A full spaCy pipeline and models for scientific/biomedical documents
Persian NLP Toolkit
Open speech-to-speech models and pipelines by Hugging Face
Stable Diffusion web UI
Contexts Optical Compression
Use Microsoft Edge's online text-to-speech service from Python
An opinionated CLI to transcribe audio files with Whisper on-device
A TTS that fits in your CPU (and pocket)
Easy-to-use and high-performance NLP and LLM framework
Deep Research framework, combining language models with tools
Modular AI image and video generation web UI with extensible tools
Public opinion analysis system
Qwen3-ASR is an open-source series of ASR models
Automated translation solution for visual novels