Showing 44 open source projects for "linux ocr"

View related business solutions
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Let your crypto work for you

    Put idle assets to work with competitive interest rates, borrow without selling, and trade with precision. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • Train ML Models With SQL You Already Know Icon
    Train ML Models With SQL You Already Know

    BigQuery automates data prep, analysis, and predictions with built-in AI assistance.

    Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.
    Try Free
  • 1
    Umi-OCR

    Umi-OCR

    OCR software, free and offline

    Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch...
    Downloads: 76 This Week
    Last Update:
    See Project
  • 2
    DeepSeek-OCR

    DeepSeek-OCR

    Contexts Optical Compression

    DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 3
    GLM-OCR

    GLM-OCR

    Accurate × Fast × Comprehensive

    GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B),...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    DeepSeek-OCR 2

    DeepSeek-OCR 2

    Visual Causal Flow

    DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...
    Downloads: 6 This Week
    Last Update:
    See Project
  • Streamline Azure Security with Palo Alto Networks VM-Series Icon
    Streamline Azure Security with Palo Alto Networks VM-Series

    Centrally manage physical and virtualized firewalls with Panorama

    Improve your security posture and reduce incident response time. Use the VM-Series to natively analyze Azure traffic and dynamically drive policy updates based on workload changes.
    Learn more
  • 5
    OCRmyPDF

    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 95 This Week
    Last Update:
    See Project
  • 6
    LLM-Aided OCR Project

    LLM-Aided OCR Project

    Enhances Tesseract OCR output using LLMs (local or API)

    LLM Aided OCR is an open-source system designed to improve optical character recognition accuracy by combining traditional OCR tools with large language models. The project addresses common OCR challenges such as distorted text, unusual fonts, historical documents, and complex layouts that often produce inaccurate results with standard OCR pipelines. The system first extracts raw text using OCR engines and then applies language models to analyze and correct recognition errors based on...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 7
    PaddleOCR

    PaddleOCR

    Awesome multilingual OCR toolkits based on PaddlePaddle

    ...PaddleOCR is easy to install and easy to use on Windows, Linux, MacOS and other systems.
    Downloads: 40 This Week
    Last Update:
    See Project
  • 8
    HunyuanOCR

    HunyuanOCR

    OCR expert VLM powered by Hunyuan's native multimodal architecture

    HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent‑Hunyuan. It’s designed to unify the entire OCR pipeline, detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation, into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    MinerU

    MinerU

    A high-quality tool for convert PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 0 This Week
    Last Update:
    See Project
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 10
    dots.ocr

    dots.ocr

    Multilingual Document Layout Parsing in a Single Vision-Language Model

    dots.ocr is a cutting-edge multilingual document parsing system built on a unified vision-language model that combines layout detection, text recognition, and structural understanding into a single architecture. Unlike traditional OCR pipelines that rely on multiple specialized components, dots.ocr integrates these processes end-to-end, reducing error propagation and improving consistency across tasks. The model is designed to recognize virtually any human script, making it highly effective...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Paper2GUI

    Paper2GUI

    Convert AI papers to GUI

    ...It already supports 40+ AI models, covering AI painting, speech synthesis, video frame complementing, video super-resolution, object detection, and image stylization. , OCR recognition and other fields. Support Windows, Mac, Linux systems. Paper2GUI: 一款面向普通人的 AI 桌面 APP 工具箱,免安装即开即用,已支持 40+AI 模型,内容涵盖 AI 绘画、语音合成、视频补帧、视频超分、目标检测、图片风格化、OCR 识别等领域。支持 Windows、Mac、Linux 系统。
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Self-Operating Computer

    Self-Operating Computer

    A framework to enable multimodal models to operate a computer

    ...Notably, it was the first known project to implement a multimodal model capable of viewing and controlling a computer screen. The framework supports features like Optical Character Recognition (OCR) and Set-of-Mark (SoM) prompting to enhance visual grounding capabilities. It is designed to be compatible with macOS, Windows, and Linux (with X server installed), and is released under the MIT license.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 13
    Docling

    Docling

    Get your documents ready for gen AI

    Docling is an open-source document processing toolkit built to prepare diverse content types for modern generative AI and data workflows. The project focuses on converting and parsing many document formats into a unified structured representation that downstream systems can easily consume. It supports advanced PDF understanding, including layout detection, table extraction, and reading order analysis, enabling high-fidelity document intelligence pipelines. Docling is designed to run...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 14
    Papermerge

    Papermerge

    Open Source Document Management System for Digital Archives

    Papermerge is an open source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of having piles of paper documents all over your desk, office or drawers - you can quickly scan them and configure your scanner to directly upload to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats. Instantly find relevant information using full text, tags and metadata-based search. Papermerge is free and...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 15
    OpenRecall

    OpenRecall

    OpenRecall is a fully open-source, privacy-first alternative

    ...The platform supports multiple operating systems, including Windows, macOS, and Linux, making it widely accessible across different environments.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    deepdoctection

    deepdoctection

    A Repo For Document AI

    DeepDoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. deepdoctection is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models but enables you to build pipelines using highly acknowledged libraries for object detection, OCR and selected NLP tasks and provides an integrated frameworks for...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Chandra

    Chandra

    OCR model for complex documents with layout-aware structured outputs

    Chandra is an advanced OCR model designed to extract and structure information from complex documents such as tables, forms, handwritten notes, and mathematical content. It focuses on preserving full document layout, meaning that extracted text is accompanied by positional metadata like bounding boxes for each element. Chandra supports multiple output formats including Markdown, HTML, and JSON, making it suitable for downstream processing and integration into data pipelines. It is capable of...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    NeMo Retriever Library

    NeMo Retriever Library

    Document content and metadata extraction microservice

    NeMo Retriever Library is a scalable microservice framework designed for extracting, structuring, and enriching content from documents to support downstream generative AI applications. It processes various document types by splitting them into components such as text, tables, charts, and images, and then applies OCR and contextual analysis to convert them into structured data formats. The system is built on NVIDIA NIM microservices, enabling high-performance parallel processing and efficient...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Sparrow

    Sparrow

    Structured data extraction and instruction calling with ML, LLM

    Sparrow is an open-source platform designed to extract structured information from documents, images, and other unstructured data sources using machine learning and large language models. The system focuses on transforming complex documents such as invoices, receipts, forms, and scanned pages into structured formats like JSON that can be processed by downstream applications. It combines several components, including OCR pipelines, vision-language models, and LLM-based reasoning modules to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Paperless-ngx

    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 21
    autoMate

    autoMate

    AI tool for automating desktop tasks via natural language input

    autoMate is an AI-powered local automation tool designed to enable users to control and automate their computers using natural language instructions instead of traditional scripting or rule-based systems. It combines large language models with computer vision techniques to interpret user intent and understand on-screen content, allowing it to interact with graphical interfaces similarly to a human user. autoMate follows an observe-decide-act workflow, where it analyzes the screen, plans...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    SimpleHTR

    SimpleHTR

    Handwritten Text Recognition (HTR) system implemented with TensorFlow

    SimpleHTR is an open-source implementation of a handwriting text recognition system based on deep learning techniques. The project focuses on converting images of handwritten text into machine-readable digital text using neural networks. The system uses a combination of convolutional neural networks and recurrent neural networks to extract visual features and model sequential character patterns in handwriting. It also employs connectionist temporal classification (CTC) to align predicted...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    AI Engineering Hub

    AI Engineering Hub

    In-depth tutorials on LLMs, RAGs and real-world AI agent applications

    The AI Engineering Hub repository is a large open-source collection of hands-on projects, tutorials, and real-world AI engineering resources designed to help developers learn and build with modern AI technologies, especially large language models (LLMs), retrieval-augmented generation (RAG), and agent-based systems. It includes more than 90 production-ready projects across skill levels, organized into beginner, intermediate, and advanced categories to guide users progressively from simple...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    docext

    docext

    An on-premises, OCR-free unstructured data extraction

    docext is a document intelligence toolkit that uses vision-language models to extract structured information from documents such as PDFs, forms, and scanned images. The system is designed to operate entirely on-premises, allowing organizations to process sensitive documents without relying on external cloud services. Unlike traditional document processing pipelines that rely heavily on optical character recognition, docext leverages multimodal AI models capable of understanding both visual...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    MiniCPM-o

    MiniCPM-o

    A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

    MiniCPM-o 2.6 is a cutting-edge multimodal large language model (MLLM) designed for high-performance tasks across vision, speech, and video. Capable of running on end-side devices such as smartphones and tablets, it provides powerful features like real-time speech conversation, video understanding, and multimodal live streaming. With 8 billion parameters, MiniCPM-o 2.6 surpasses its predecessors in versatility and efficiency, making it one of the most robust models available. It supports...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB