CLI tool to extract (meta)data from PDF and manipulate PDF files
ExtractThinker is a Document Intelligence library for LLMs
Structured data extraction and instruction calling with ML, LLM
Fast, local-first web content extraction for LLMs
MD/.JSON Document OCR and structured data extraction API
Turn entire websites into LLM-ready markdown or structured data
No-code LLM Platform to launch APIs and ETL Pipelines
PDF Parser for AI-ready data. Automate PDF accessibility
ContextGem: Effortless LLM extraction from documents
Flexible Node.js AI-assisted crawler library
Crawl a website starting from a URL, find relevant pages
Clean network diagrams. One-time setup, zero upkeep
Automatic extraction of relevant features from time series
Model Context Protocol server that integrates AgentQL's data
AI-ready web crawler that extracts and structures website content
Unreal Engine Archives Explorer
Library for extracting streaming site data without official APIs
Fast and efficient unstructured data extraction
BlockArrays for Julia
Synthetic data curation for post-training and data extraction
Extract and convert data from any document, images, PDFs, Word doc
Extract internal monitoring data from application logs
Declarative web scraping
Enhance any agent's browser use skill
Document content and metadata extraction microservice