Large Language Model Text Generation Inference
Document (PDF, Word, PPTX, ...) extraction and parsing API
Module for automatic summarization of text documents and HTML pages
High-performance inference server and API layer for text embedding models
AI tool that removes hardcoded subtitles and text from videos locally
Persian NLP Toolkit
Robust Speech Recognition via Large-Scale Weak Supervision
Han Language Processing
A full spaCy pipeline and models for scientific/biomedical documents
Underthesea - Vietnamese NLP Toolkit
Open source healthcare AI
OCR model for complex documents with layout-aware structured outputs
OCRmyPDF adds an OCR text layer to scanned PDF files
Contexts Optical Compression
Comprehensive Gradio WebUI for audio processing
Toolkit for conversational AI
The most accurate natural language detection library for Python
A Repo For Document AI
Agent harness to make your slop code well-engineered and beautiful
Easy-to-use and powerful NLP library with an awesome model zoo
Generate audiobooks from EPUBs, PDFs and text with captions
A high-quality PDF-to-Markdown tool based on large language models
Easy-to-use and high-performance NLP and LLM framework
Enhances Tesseract OCR output using LLMs (local or API)
Open source libraries and APIs to build custom preprocessing pipelines