dude uncomplicated data extraction: A simple framework
CLI tool to extract (meta)data from PDF and manipulate PDF files
ExtractThinker is a Document Intelligence library for LLMs
Did you say you like data?
Structured data extraction and instruction calling with ML, LLM
AI-ready web crawler that extracts and structures website content
No-code LLM Platform to launch APIs and ETL Pipelines
ContextGem: Effortless LLM extraction from documents
Make websites accessible for AI agents
Document content and metadata extraction microservice
Zero-copy PDF text extraction library written in Zig
A high-quality tool for converting PDF to Markdown and JSON
Synthetic data curation for post-training and data extraction
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine
A Python tool to help extract information from structured PDFs
Document (PDF, Word, PPTX ...) extraction and parsing API
An on-premises, OCR-free unstructured data extraction
Python tool for crawling and extracting structured data from news sites
Burp Suite extension for JavaScript static analysis
Python module for parsing semi-structured text into python tables
Python3 web crawler practice
A Simple and Universal Swarm Intelligence Engine
End-to-end pipeline converting generative videos
Python & command-line tool to gather text on the Web
Superlinked is a Python framework for AI Engineers