Document (PDF, Word, PPTX, ...) extraction and parsing API
Generate audiobooks from EPUBs, PDFs and text with captions
OCR model for complex documents with layout-aware structured outputs
Enhances Tesseract OCR output using LLMs (local or API)
Open source healthcare AI
A Repo For Document AI
PDF to Markdown with vision models
OCR software, free and offline
Stable Diffusion web UI
Python ETL framework for stream processing, real-time analytics, LLM
Misc; latest version of waifu2x; 2D video to stereo 3D video
Faster Whisper transcription with CTranslate2
Comprehensive Gradio WebUI for audio processing
A full spaCy pipeline and models for scientific/biomedical documents
Persian NLP Toolkit
Automatic subtitle synchronization tool
Open speech-to-speech models and pipelines by Hugging Face
Stable Diffusion web UI
Visual Causal Flow
Translate the video from one language to another and embed dubbing
Contexts Optical Compression
Cut videos with a text editor
Use Microsoft Edge's online text-to-speech service from Python
Automated YouTube Shorts pipeline
A TTS that fits in your CPU (and pocket)