DocStrange is an open-source document understanding and extraction library designed to convert complex files into structured, LLM-ready outputs such as Markdown, JSON, CSV, and HTML. Developed by Nanonets, the project combines OCR, layout detection, table understanding, and structured extraction into one end-to-end pipeline, which reduces the need to stitch together multiple separate services. It is built for developers who need high-quality parsing from scans, photos, PDFs, office files, and other document sources while preserving privacy and control over the processing flow. One of its key differentiators is deployment flexibility: it offers a cloud API for managed usage as well as a fully private offline mode that runs locally on a GPU. The platform also supports synchronous extraction, streaming responses, and asynchronous processing for larger documents, which makes it adaptable to both interactive workflows and heavier back-end pipelines.

Features

  • Extraction from PDFs, images, Word files, Excel files, PowerPoint files, and URLs
  • Output generation in Markdown, JSON, CSV, and HTML formats
  • End-to-end OCR, layout analysis, and table extraction pipeline
  • Private offline GPU mode in addition to managed cloud API access
  • Streaming support for real-time extraction results
  • Asynchronous processing for larger multi-page documents
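
As a rough illustration of what the four output formats listed above look like for the same content, the sketch below renders one small extracted table as JSON, CSV, Markdown, and HTML using only the Python standard library. It does not use DocStrange's API: the `rows` structure and the `to_*` helper names are assumptions made up for this example, and real extraction results may be shaped differently.

```python
import csv
import html
import io
import json

# Hypothetical extracted table. This shape is an assumption for illustration,
# not DocStrange's actual result type.
rows = [
    {"item": "Widget", "qty": 2, "price": 9.5},
    {"item": "Gadget", "qty": 1, "price": 24.0},
]


def to_json(rows):
    """Serialize the rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Write a header line followed by one CSV line per row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_markdown(rows):
    """Render the rows as a pipe-delimited Markdown table."""
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


def to_html(rows):
    """Render the rows as a minimal HTML table, escaping cell text."""
    headers = list(rows[0])
    head = "".join(f"<th>{html.escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(row[h]))}</td>" for h in headers)
        + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


if __name__ == "__main__":
    for render in (to_json, to_csv, to_markdown, to_html):
        print(render(rows))
        print()
```

Markdown and JSON tend to suit LLM prompts and programmatic consumers, while CSV and HTML are convenient for spreadsheets and browser display.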

License

MIT License

Additional Project Details

Programming Language

TypeScript

Related Categories

TypeScript Large Language Models (LLM)

Registered

2026-03-09