Box Extract
Box Extract is an AI-powered data extraction solution that identifies, retrieves, and converts structured information from unstructured content (documents, spreadsheets, PDFs, images, and other file types) into metadata that can be stored, searched, and used to automate business processes. It combines large language models, integrated OCR, chain-of-thought prompting, extraction-specific retrieval-augmented generation, and agentic reasoning to understand document meaning and structure with high accuracy, without requiring custom model training or heavy configuration. Users can choose between Standard and Enhanced Extract Agents, which handle everything from basic fields such as names, dates, and amounts to complex items such as risky clauses, tables, and graphs. Users can also build Custom Extract Agents with configurable metadata templates that run at scale across folders and repositories.
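To make "converting extracted content into metadata" concrete, the sketch below coerces raw extracted strings into typed values according to a template. It is a minimal illustration under stated assumptions: the template format, the field names, and the `apply_template` helper are all hypothetical and are not Box's metadata template schema or API, which Box configures through its own product.

```python
from datetime import date
from typing import Any

# Hypothetical template: field name -> spec with a type and, for enums, options.
# This format is illustrative only and is NOT Box's metadata template schema.
TEMPLATE: dict[str, dict[str, Any]] = {
    "vendorName": {"type": "string"},
    "invoiceDate": {"type": "date"},
    "totalAmount": {"type": "float"},
    "currency": {"type": "enum", "options": ["USD", "EUR", "GBP"]},
}


def apply_template(raw: dict[str, str], template: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Coerce raw extracted strings into typed metadata values.

    Fields missing from `raw` become None; values that fail to parse
    raise ValueError so bad extractions are caught early.
    """
    metadata: dict[str, Any] = {}
    for name, spec in template.items():
        value = raw.get(name)
        if value is None:
            metadata[name] = None
            continue
        kind = spec["type"]
        if kind == "string":
            metadata[name] = value.strip()
        elif kind == "float":
            # Drop thousands separators such as "1,234.50".
            metadata[name] = float(value.replace(",", ""))
        elif kind == "date":
            # This sketch expects ISO 8601 dates (YYYY-MM-DD).
            metadata[name] = date.fromisoformat(value.strip())
        elif kind == "enum":
            if value not in spec["options"]:
                raise ValueError(f"{name}: {value!r} not in {spec['options']}")
            metadata[name] = value
        else:
            raise ValueError(f"unknown field type {kind!r}")
    return metadata


if __name__ == "__main__":
    extracted = {
        "vendorName": " Acme Corp ",
        "invoiceDate": "2024-03-15",
        "totalAmount": "1,234.50",
        "currency": "USD",
    }
    print(apply_template(extracted, TEMPLATE))
```

Typing the values at this stage is what makes the metadata searchable and usable in automation: a date can be range-filtered and an amount compared, whereas raw strings cannot.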
TableXtract
TableXtract is an AI-powered tool for extracting tables from PDFs, images, and scanned documents and converting them into Excel, CSV, or JSON, which automates data entry and reduces the time spent on manual work. To use it, upload a document (PDF, JPG, PNG, etc.); the AI recognizes and extracts the tables while preserving their structure, and you can then download them in your preferred format. Use cases include extracting financial data from reports, converting tables in research articles into spreadsheets, and transcribing tables from receipts and invoices.
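The export step turns a recognized table into a file format. The sketch below shows what CSV and JSON output look like for the same table, assuming the table has already been recognized as a header row followed by data rows. The `to_csv` and `to_json` helpers and the sample data are hypothetical and are not TableXtract's API; Excel output is left out because it needs a third-party library such as openpyxl.

```python
import csv
import io
import json

# A recognized table: a header row followed by data rows (all cells as text).
Table = list[list[str]]


def to_csv(table: Table) -> str:
    """Serialize the table as CSV; csv.writer quotes cells that contain commas."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(table)
    return buffer.getvalue()


def to_json(table: Table) -> str:
    """Serialize the table as a JSON array of objects keyed by the header row."""
    header, *rows = table
    records = [dict(zip(header, row)) for row in rows]
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    # Made-up sample data for illustration.
    table = [
        ["Item", "Qty", "Price"],
        ["Widget, large", "2", "9.99"],
        ["Gadget", "1", "24.50"],
    ]
    print(to_csv(table))
    print(to_json(table))
```

CSV keeps the grid shape, which suits spreadsheets, while the JSON form keys each row by column name, which is easier to consume from code.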
DeepTagger
DeepTagger is a no-code, AI-powered document processing platform that turns documents (PDFs, images, Word files, etc.) into structured, usable data through an intuitive “highlight-and-label” interface. You upload your files, highlight the pieces of data you care about, and train the model on those examples rather than on templates; you then run predictions, export the results, and refine accuracy. It handles complex and nested structures (e.g., line items within invoices, tables within tables), supports scanned documents and low-quality images through OCR, and offers features such as splitting multi-document PDFs, intent and context understanding, and position-aware extraction (if the same phrase appears many times, DeepTagger can distinguish which instance to pull). Pricing is usage-based, with a free tier that processes up to 200 documents; higher tiers unlock batch prediction, nested schemas, priority support, multi-tenant architecture, and enterprise-grade compliance.
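Position-aware extraction matters when a value pattern matches several times but only one occurrence is wanted. The description does not say how DeepTagger does this, and DeepTagger learns from labeled examples rather than hand-written rules. The sketch below only illustrates the general idea with a simple, hypothetical heuristic: return the first match of a value pattern that starts after a given label. The `extract_near_label` function, the regex, and the sample text are assumptions for illustration.

```python
import re
from typing import Optional


def extract_near_label(text: str, label: str, value_pattern: str) -> Optional[str]:
    """Return the first match of `value_pattern` that starts after `label`.

    Toy heuristic for position-aware extraction: the label's character
    offset anchors the search, so identical-looking values elsewhere in
    the document are ignored.
    """
    label_match = re.search(re.escape(label), text)
    if label_match is None:
        return None
    anchor = label_match.end()
    # re.finditer yields matches in document order, so the first one
    # past the anchor is the closest occurrence after the label.
    for match in re.finditer(value_pattern, text):
        if match.start() >= anchor:
            return match.group()
    return None


if __name__ == "__main__":
    invoice = "Subtotal: $100.00\nTax: $8.00\nTotal Due: $108.00\nPrevious balance: $0.00"
    money = r"\$\d+\.\d{2}"
    print(extract_near_label(invoice, "Total Due:", money))
```

Four amounts match the same pattern here, and only position relative to the label tells them apart, which is the situation position-aware extraction is meant to resolve.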
Parsebridge
Parsebridge is a PDF parsing API that transforms PDFs into clean, structured Markdown, extracting text, tables, and data for developers who need reliable document parsing at scale. Complex PDFs with tables, multi-column layouts, nested structures, and scanned pages are handled in a single API call, turning the parts that usually break other parsers into usable Markdown. Merged cells, nested headers, and complex layouts are parsed correctly instead of coming back garbled. You can test it live without an account by pasting a PDF URL or uploading a PDF to preview the Markdown for page one. It currently supports PDF files only, focusing on extraction quality for PDF documents, with files up to 100 MB supported. Under the hood, Parsebridge uses Docling, an open-source parser known for table extraction and layout preservation, while the platform handles infrastructure, OCR, scaling, and the API layer on top.
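Since the output is Markdown, extracted tables end up as Markdown pipe tables. The sketch below shows that target format by rendering a header and data rows as a GitHub-flavored Markdown table, escaping pipe characters inside cells. The `rows_to_markdown` helper is hypothetical and is not Parsebridge's or Docling's code; it does none of the layout analysis or OCR a real parser performs. Pipe tables also have no notion of merged cells, so a parser must flatten them first (for example by repeating the value); this sketch assumes the rows are already flat.

```python
from typing import Sequence


def escape_cell(cell: str) -> str:
    """Escape pipes and collapse newlines so a cell fits on one table row."""
    return cell.replace("|", "\\|").replace("\n", " ").strip()


def rows_to_markdown(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Render a header and flat data rows as a GitHub-flavored Markdown table."""
    width = len(header)
    lines = [
        "| " + " | ".join(escape_cell(c) for c in header) + " |",
        "|" + "|".join(" --- " for _ in range(width)) + "|",
    ]
    for row in rows:
        # Pad short rows and truncate long ones so every line has `width` cells.
        cells = (list(row) + [""] * width)[:width]
        lines.append("| " + " | ".join(escape_cell(c) for c in cells) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(rows_to_markdown(["Region", "Q1", "Q2"], [["North", "10", "12"], ["South | East", "7"]]))
```

Keeping every row at the header's width matters because Markdown renderers rely on a consistent column count to recognize the table.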