DigiParser
DigiParser is a document workflow automation platform that simplifies data extraction from documents like invoices, contracts, forms, resumes, and receipts.
It uses advanced OCR and machine learning to extract, validate, and process data, converting documents into structured JSON or CSV formats. Users can create custom parsers for their documents, automate workflows, and integrate the extracted data into tools like Zapier, QuickBooks, Xero, Salesforce, Google Sheets, etc.
DigiParser supports team collaboration with flexible billing options, allowing multiple team members to work on different parsers. With features like schema customization, review stages, and workflow automation, it ensures high accuracy in data extraction while saving time and reducing manual work.
Learn more
PrecisionOCR
PrecisionOCR is a ready-to-use, secure, HIPAA-compliant, cloud-based platform for extracting medical meaning from unstructured documents using Optical Character Recognition (OCR).
PrecisionOCR uses custom Optical Character Recognition and AI algorithms to convert PDFs/JPEGs/PNGs into structured, searchable documents. Organizations can work with our team to build OCR report extractors which look for specific types of information to extract or highlight to reduce the noise that comes from extracting all of the data within a document.
Natural language processing (NLP) and machine learning (ML) power the semi-automated and automated transformation of source material such as pdfs or images into structured data records that integrate seamlessly with EMR data using HL7s FHIR standards. Data can be automatically stored along side patient records.
Our OCR document classification is also available along with multiple ways to integrate including API and CLI support.
Learn more
DeepTagger
DeepTagger is a no-code, AI-powered document processing platform that turns any documents (PDFs, images, Word, etc.) into structured, usable data through an intuitive “highlight-and-label” interface. You upload your files; highlight the pieces of data you care about; train the model via examples rather than templates; then run predictions, export results, and refine accuracy. It handles complex/nested structures (e.g., line items within invoices, tables within tables), supports scanned documents and low-quality images via strong OCR, and offers features like splitting multi-document PDFs, intent/context understanding, and position-aware extraction (so if the same phrase appears many times, DeepTagger can distinguish which instance to pull). Pricing is usage-based with a free tier processing up to 200 documents; higher tiers unlock features like batch prediction, nested schemas, priority support, multi-tenant architecture, and enterprise-grade compliance.
Learn more
NuExtract
NuExtract is a large language model specialized in extracting structured information from documents of any format, including raw text, scanned images, PDFs, PowerPoints, spreadsheets, and more, supporting over a dozen languages and mixed‑language inputs. It delivers JSON‑formatted output that faithfully follows user‑defined templates, with built‑in verification and null‑value handling to minimize hallucinations. Users define extraction tasks by creating a template, either by describing the desired fields or importing existing schemas—and can improve accuracy by adding document, output examples in the example set. The NuExtract Platform provides an intuitive workspace for designing templates, testing extractions in a playground, managing teaching examples, and fine‑tuning settings such as model temperature and document rasterization DPI. Once validated, projects can be deployed via a RESTful API endpoint that processes documents in real time.
Learn more