A GUI tool for extracting hard-coded subtitles (hardsubs) from videos
CLI tool to extract (meta)data from PDF and manipulate PDF files
ExtractThinker is a Document Intelligence library for LLMs
Structured data extraction and instruction calling with ML, LLM
Document (PDF, Word, PPTX ...) extraction and parsing API
Zero-copy PDF text extraction library written in Zig
ContextGem: Effortless LLM extraction from documents
A high-quality tool for converting PDF to Markdown and JSON
Open source NLP guide with models, methods, and real use cases
No-code LLM Platform to launch APIs and ETL Pipelines
Make websites accessible for AI agents
Document content and metadata extraction microservice
A Simple and Universal Swarm Intelligence Engine
Python Audio Analysis Library: Feature Extraction, Classification
AI-ready web crawler that extracts and structures website content
dude (uncomplicated data extraction): A simple framework
End-to-end pipeline converting generative videos
Python & command-line tool to gather text on the Web
Did you say you like data?
Synthetic data curation for post-training and data extraction
A cross-platform GUI wrapper for yt-dlp written in PySide6
NLP Cloud serves high performance pre-trained or custom models for NER
PDF scientific paper translation with preserved formats
The highest-scoring AI memory system ever benchmarked
NSFW Windows app to batch download images and videos