Showing 15 open source projects for "tesseract pdf"

  • 1
    Tesseract OCR

    Open Source OCR Engine

    Tesseract can recognize more than 100 languages out of the box and can be trained to recognize others. It supports several output formats, including plain text, HTML, and PDF, and has Unicode (UTF-8) support.
    Downloads: 3,092 This Week
  • 2
    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 18 This Week
  • 3
    Extractous

    Fast and efficient unstructured data extraction

    ...It also supports OCR for images and scanned documents through Tesseract, making it useful for document ingestion pipelines that include image-based or scanned inputs.
    Downloads: 0 This Week
  • 4
    Provides optical character recognition (OCR) solutions for the Vietnamese language.
    Downloads: 170 This Week
  • 5
    A GUI to ease the process of producing a multipage PDF from a scan. gscan2pdf should work on almost any Linux/BSD machine.
    Downloads: 166 This Week
  • 6
    A multiuser PHP/MySQL web interface for HylaFAX for viewing faxes online, downloading and emailing them as PDFs, and categorizing and archiving all sent and received faxes.
    Downloads: 40 This Week
  • 7
    gImageReader

    A graphical frontend to tesseract-ocr

    gImageReader is a simple Gtk/Qt front-end to tesseract. Features include:
    - Import PDF documents and images from disk, scanning devices, clipboard and screenshots
    - Process multiple images and documents in one go
    - Manual or automatic recognition area definition
    - Recognize to plain text or to hOCR documents
    - Recognized text displayed directly next to the image
    - Post-process the recognized text, including spellchecking
    - Generate PDF documents from hOCR documents
    **Note**: This page is only a mirror for the downloads. ...
    Downloads: 102 This Week
  • 8
    Linux-Intelligent-Ocr-Solution

    Easy-OCR solution and Tesseract trainer for GNU/Linux

    Linux-Intelligent-Ocr-Solution (Lios) is free and open-source software for converting print into text using either a scanner or a camera. It can also extract text from scanned images from other sources, such as PDFs, individual images, folders of images, or screenshots. The program is designed to be fully accessible to visually impaired users. A Tesseract Trainer GUI is also shipped with this package.
    Forum: https://groups.google.com/forum/#!forum/lios
    Video Tutorial: https://www.youtube.com/playlist?list=PLn29o8rxtRe1zS1r2-yGm1DNMOZCgdU0i
    Tesseract Training Tutorial (beta): https://www.youtube.com/watch?...
    Downloads: 5 This Week
  • 9

    neocr

    Provides OCR solutions for Nepali, based on Tesseract 4.0.

    NeOCR is free software based on Tesseract (Open Source OCR Engine) for the Windows operating system. It provides an easy-to-use interface for recognizing text in images and PDF documents and converting it to editable formats (.txt, .doc, .docx). It is accessible to blind and visually impaired people (tested with NVDA and Narrator).
    Downloads: 8 This Week
  • 10

    OCR Template Creator

    Create templates for images or PDF files to be OCR'ed and stored in a database

    An OCR application that lets you create tag/value templates in a web GUI to automate the processing of PDFs or images of documents, receipts, contracts, etc. It interfaces with Tesseract.
    Downloads: 0 This Week
  • 11
    pdfsandwich generates "sandwich" OCR PDF files: PDF files that contain only images (and no editable text) are processed by optical character recognition (OCR), and the recognized text is added invisibly to each page "behind" the images. pdfsandwich is a command-line tool intended for OCR of scanned books or journals. It can recognize the page layout even for multicolumn text.
    Downloads: 211 This Week
  • 12
    yagf

    YAGF is a tesseract and cuneiform wrapper and helper

    YAGF is a graphical front-end for the cuneiform and tesseract OCR tools. With YAGF you can open already scanned image files or obtain new images via XSane (scanning results are automatically passed to YAGF). Once you have a scanned image, you can prepare it for recognition, select particular image areas for recognition, set the recognition language, and so on. Recognized text is displayed in an editor window where it can be corrected, saved to disk, or copied to the clipboard. YAGF also provides some...
    Downloads: 7 This Week
  • 13

    edocias

    Electronic Document Index And Search

    EDocIAS (Electronic Document Index And Search) is a PHP-based tool for indexing and searching files of various types. Third-party tools (tesseract, xpdf, etc.) can be configured to support any type of file.
    Downloads: 1 This Week
  • 14
    This project aims to create a single, easy-to-use GUI wrapper for Ghostscript and Tesseract that converts scanned PDF documents to plain text or HTML.
    Downloads: 0 This Week
  • 15

    OCR Reader

    OCR Reader is a lightweight Windows utility designed to extract text from PDF files and images using OCR (Tesseract engine). The tool supports template-based parsing, allowing structured output into CSV or TXT without manual coding.
    Core components:
    - Tesseract OCR engine
    - Poppler (PDF rendering)
    - Template-based extraction system
    Homepage: https://martan1484.github.io/OCR_Reader
    Downloads: 0 This Week