OCRBase is a self-hostable document OCR and structured extraction system built to turn PDFs into machine-usable outputs at scale, aiming to bridge the gap between raw text extraction and production-ready pipelines. Instead of treating OCR as a one-off script, it presents an API-driven workflow where documents are submitted as jobs and processed through a queue-based architecture that can handle high throughput. The core output is designed for downstream automation, producing structured results like JSON according to user-defined schemas while also providing readable formats like Markdown for human review or indexing. It includes real-time job progress updates via WebSockets, which makes it easier to integrate into UIs, dashboards, or ingestion systems where users need feedback on long-running document processing.

Features

  • OCR pipeline using PaddleOCR-VL-0.9B for text extraction
  • Schema-driven structured extraction that returns JSON outputs
  • Queue-based processing designed for high-volume document workloads
  • Type-safe TypeScript SDK including React hooks for integration
  • Real-time WebSocket updates for job progress and completion
  • Self-hostable deployment model built around Docker and Bun

Project Samples

Project Activity

See All Activity >

Categories

PDF

License

MIT License

Follow OCRBase

OCRBase Web Site

Other Useful Business Software
MongoDB Atlas runs apps anywhere Icon
MongoDB Atlas runs apps anywhere

Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
Start Free
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of OCRBase!

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

TypeScript

Related Categories

TypeScript PDF Software

Registered

2026-01-27