DocStrange is an open-source document understanding and extraction library designed to convert complex files into structured, LLM-ready outputs such as Markdown, JSON, CSV, and HTML. Developed by Nanonets, the project combines OCR, layout detection, table understanding, and structured extraction into one end-to-end pipeline, which reduces the need to stitch together multiple separate services. It is built for developers who need high-quality parsing from scans, photos, PDFs, office files, and other document sources while preserving privacy and control over the processing flow. One of its key differentiators is deployment flexibility: it offers a cloud API for managed usage as well as a fully private offline mode that runs locally on a GPU. The platform also supports synchronous extraction, streaming responses, and asynchronous processing for larger documents, which makes it adaptable to both interactive workflows and heavier back-end pipelines.

Features

  • Extraction from PDFs, images, Word files, Excel files, PowerPoint files, and URLs
  • Output generation in Markdown, JSON, CSV, and HTML formats
  • End-to-end OCR, layout analysis, and table extraction pipeline
  • Private offline GPU mode in addition to managed cloud API access
  • Streaming support for real-time extraction results
  • Asynchronous processing for larger multi-page documents

Project Samples

Project Activity

See All Activity >

License

MIT License

Follow DocStrange

DocStrange Web Site

Other Useful Business Software
Train ML Models With SQL You Already Know Icon
Train ML Models With SQL You Already Know

BigQuery automates data prep, analysis, and predictions with built-in AI assistance.

Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.
Try Free
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of DocStrange!

Additional Project Details

Programming Language

TypeScript

Related Categories

TypeScript Large Language Models (LLM)

Registered

2026-03-09