Nanonets-OCR-s is an image-to-markdown OCR model that converts document images into structured, semantically tagged markdown. It goes beyond plain text extraction: it recognizes content types and wraps them in meaningful tags, which makes the output well suited to Large Language Models (LLMs) and automated workflows. Mathematical equations are converted into LaTeX, with inline and display math kept distinct. Images such as logos, charts, and graphs receive descriptive <img> tags so that downstream systems can interpret them. Signatures and watermarks are detected and isolated in dedicated tags, which matters for legal and business documents. Form elements such as checkboxes and radio buttons are converted into standardized Unicode symbols for consistent handling. Complex tables are extracted and formatted in both markdown and HTML to support different document-processing pipelines.

Features

  • Converts mathematical formulas into LaTeX, differentiating inline ($...$) and display ($$...$$) equations
  • Generates structured image descriptions within <img> tags, including captions when available
  • Detects and isolates signatures within <signature> tags for precise legal document processing
  • Extracts watermark text wrapped inside <watermark> tags to preserve document authenticity
  • Converts checkboxes and radio buttons into standardized Unicode symbols (☐, ☑, ☒) for form data consistency
  • Accurately extracts complex tables and outputs them in both markdown and HTML formats
  • Applies semantic tagging to diverse document elements for enhanced readability and machine processing
  • Supports large token limits (up to 15,000 tokens) for handling lengthy or complex documents
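The tag and symbol conventions listed above lend themselves to simple post-processing with ordinary string handling. What follows is a minimal Python sketch that is not part of the project. It assumes the model's markdown contains literal <img>...</img>, <signature>...</signature>, and <watermark>...</watermark> pairs without attributes, uses $...$ and $$...$$ as math delimiters, and marks form fields with ☐, ☑, and ☒. The function name parse_ocr_markdown, the ParsedPage class, and the state labels "unchecked", "checked", and "crossed" are hypothetical. Check them against real model output before relying on them.

```python
import re
from dataclasses import dataclass, field

# Tag names from the feature list; assumed to appear literally as <name>...</name>.
TAG_NAMES = ("img", "signature", "watermark")

# Hypothetical labels for the Unicode form symbols (assumed meanings).
CHECKBOX_STATES = {"☐": "unchecked", "☑": "checked", "☒": "crossed"}

# Display math ($$...$$) may span lines; inline math ($...$) is kept to one line.
DISPLAY_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
INLINE_MATH = re.compile(r"\$([^$\n]+)\$")
# A checkbox symbol followed by its label, up to the next symbol or end of line.
CHECKBOX = re.compile(r"([☐☑☒])\s*([^☐☑☒\n]*)")


@dataclass
class ParsedPage:
    tags: dict[str, list[str]] = field(default_factory=dict)
    checkboxes: list[tuple[str, str]] = field(default_factory=list)
    display_math: list[str] = field(default_factory=list)
    inline_math: list[str] = field(default_factory=list)


def parse_ocr_markdown(md: str) -> ParsedPage:
    """Collect tagged regions, checkbox states, and math from OCR markdown."""
    page = ParsedPage()

    # Inner text of each semantic tag, in document order.
    for name in TAG_NAMES:
        pattern = re.compile(rf"<{name}>(.*?)</{name}>", re.DOTALL)
        page.tags[name] = [text.strip() for text in pattern.findall(md)]

    # (state, label) pairs for every checkbox or radio-button symbol.
    for match in CHECKBOX.finditer(md):
        symbol, label = match.groups()
        page.checkboxes.append((CHECKBOX_STATES[symbol], label.strip()))

    # Extract display math first, then blank it out so that its $$
    # delimiters are not mistaken for inline $...$ pairs.
    page.display_math = [m.strip() for m in DISPLAY_MATH.findall(md)]
    remaining = DISPLAY_MATH.sub(" ", md)
    page.inline_math = [m.strip() for m in INLINE_MATH.findall(remaining)]

    return page
```

Removing display math before scanning for inline math is the main design choice here. A single regex that handles both delimiters is possible but harder to read. Tables are left untouched, because markdown and HTML tables are better handled by a dedicated parser.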

Categories

OCR, AI Models

Additional Project Details

Programming Language

Python, JavaScript

Related Categories

Python OCR Software, Python AI Models, JavaScript OCR Software, JavaScript AI Models

Registered

2025-06-26