Showing 29 open source projects for "ocr"

  • 1
    MinerU

    A high-quality tool for converting PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 17 This Week
  • 2
    Stirling-PDF

    Web application that allows you to perform operations on PDF files

    Stirling PDF is a powerful, locally hosted web-based PDF manipulation tool offering a wide range of editing, conversion, and utility features. It allows users to merge, split, compress, convert, OCR, and perform other operations on PDF files directly from a browser without uploading data to third-party servers. The tool is privacy-conscious, self-hostable via Docker, and built with modularity in mind to allow future expansion and integration.
    Downloads: 16 This Week
  • 3
    Papermerge

    Open Source Document Management System for Digital Archives

    ...Papermerge supports multiple users. Each user can be assigned different permissions to perform only a specific kind of action, e.g. view only documents from a specific folder. OCR technology is a vital part of Papermerge. It extracts text from scanned documents and from PDF, JPEG, and TIFF files.
    Downloads: 22 This Week
  • 4
    MyBox

    Easy Tools for PDF, Image, File, Network, Data, and Media

    Tags: javafx-desktop-apps, pdf, image, ocr, icc, barcode, color-palette, text, bytes, markdown, html, archive, compress, digest, video, audio, editor, converter, media. Source: https://github.com/Mararsh/MyBox Self-contained packages need neither a Java environment nor installation. Jar packages need Java 16 or higher.
    Downloads: 1 This Week
  • 5
    bitfarm-Archiv Document Management - DMS
    bitfarm-Archiv is a powerful Document Management (DMS), Enterprise Content Management (ECM) and Knowledge Management System (KMS) with Workflow Components. Help us! As we live in the internet age, the best way you can help is to write a short statement about your scenario and your use of the DMS, along with your experiences, and post it on your own website or in a blog or forum. It would help us most if you also added a hyperlink to our site http://www.bitfarm-archiv.com. By this...
    Downloads: 8 This Week
  • 6

    UniversalTextExtractor

    Command-line toolset for extracting text from files

    Command-line toolset for extracting text from files (documents, images, archives) into SQLite with OCR support. Simple, expandable, one shell script only.
    Downloads: 0 This Week
  • 7
    Stirling-PDF

    #1 Locally hosted web application that allows you to work on PDFs

    This is a robust, locally hosted web-based PDF manipulation tool using Docker. It enables you to carry out various operations on PDF files, including splitting, merging, converting, reorganizing, adding images, rotating, compressing, and more. This locally hosted web application has evolved to encompass a comprehensive set of features, addressing all your PDF requirements. Stirling PDF does not initiate any outbound calls for record-keeping or tracking purposes. All files and PDFs...
    Downloads: 120 This Week
  • 8
    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20, AI-driven data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...
    Downloads: 8 This Week
  • 9
    Super PDF Editor (a Batch PDF Processor)

    Create, Edit, Delete, Organize, Convert, Export, Secure & Sign PDF.

    ...The easy-to-use software is complete with editing tools for modifying PDF files your way. Most comprehensive, powerful, process-based and lightning-fast batch processor software. OCR PDF. PDF Imposition, Reverse Pages, Resize Page, Scale Page, Booklet, N-up Pages, Merge, Split by page, Extract Page, Rotate Page, Replace Page, Insert Page, Delete Page. Export To Word, Excel. Password Protection, Remove Password, Watermark/Background. Your Privacy, Our Priority: Protect Your Data with Complete Confidence. ...
    Downloads: 46 This Week
  • 10
    docconv

    Converts PDF, DOC, DOCX, XML, HTML, RTF, etc. to plain text

    ...Make sure that the full path to the executable is in your PATH environment variable. To add image support to the docconv library you first need to install and build gosseract. Now you can add -tags ocr to any go command when building/fetching/testing docconv to include support for processing images. Documents can be sent as a multipart POST request and the plain text (body) and meta information are then returned as a JSON object.
    Downloads: 0 This Week
  • 11
    Teedy

    Lightweight document management system

    ...As a content-oriented document management system, the user interface is not cluttered with buttons and menus and works both on desktop and mobile. Document searching has never been easier thanks to the powerful full-text search engine in Teedy. You can search in images (embedded OCR), DOCX, ODT, TXT, PDF, and more. Verify or validate your documents with people in your organization using workflows.
    Downloads: 1 This Week
  • 12
    Data Entry System v2

    Framework with web data entry, OCR & designer

    Framework with web data entry, verification, OCR & project designer. It works with Docker or a dedicated Debian server. Fast and optimized version.
    Downloads: 0 This Week
  • 13
    Super-PDF-Editor

    World's most comprehensive, powerful, process-based PDF editor

    World's most comprehensive, powerful, process-based and lightning-fast PDF reader, editor and batch processor. PDF editing with 60+ feature-rich tools and functions, such as OCR of PDFs and images, producing output such as searchable PDF, Text, hOCR, Box, and UNLV. Image enhancement before OCR for better OCR performance. PDF imposition, etc. Super PDF Editor is best for bulk PDF processing, especially for the printing industry. Easy PDF imposition, booklet, n-up pages, and more. OCR works on PDF files, scanned PDF files and any other PDF files. ...
    Downloads: 13 This Week
  • 14
    Super-PDF-Editor-Lite

    World's most comprehensive, powerful, process-based PDF editor

    ...Create a processing log file. Extract Page, Split Page, Rotate Page, Merge Page, Duplicate Page, Move Page, Printing, and Compress Page. Image enhancement before OCR for better OCR performance. PDF imposition, etc. Super PDF Editor is best for bulk PDF processing, especially for the printing industry. Easy PDF imposition, booklet, n-up pages, and more. OCR works on PDF files, scanned PDF files and any other PDF files. OCR also works on image files and supports multiple image formats. Auto and manual image enhancement for better OCR accuracy and quality. ...
    Downloads: 6 This Week
  • 15
    OpenKM Document Management - DMS

    Document Management System and Content Management System

    OpenKM is an electronic document management system and record management system EDRMS (DMS, RMS, CMS). It provides a modern and flexible architecture that meets today's IT demands, based on open technology (Java, Tomcat, GWT, Lucene, Hibernate, Spring and jBPM), in a powerful and scalable multiplatform application. OpenKM is a Web 2.0 application that works with Internet Explorer, Firefox, Safari and Opera. It can be configured with major DBMSs like Oracle, PostgreSQL and MySQL among...
    Downloads: 515 This Week
  • 16
    e-Dokyumento

    e-Dokyumento is a web-based Document Management System (DMS)

    e-Dokyumento is an open-source, web-based Document Management System (DMS) that automates the basic office document workflow, such as receiving, filing, routing, and approving, through capturing (scanning), digitizing (OCR reading), storing, tagging, and electronically routing and approving (e-signature) electronic documents. Demo: https://e-dokyumento.herokuapp.com/ https://edokyu.seillig.com/ (refer to Readme.md for the accounts) Docker Hub: https://hub.docker.com/r/nelsonmaligro/edokyumento Install using the ISO: 1. ...
    Downloads: 12 This Week
  • 17
    OCR Image Simply

    Simple Windows application to OCR images

    Probably the simplest Windows application to OCR images, using Tesseract 3.05.02. Languages recognized: German, English, French, Italian, Polish, Spanish. Just download the ZIP file, unzip the archive, and feel free to use it anywhere. Published under the MIT license. A description can be found at: https://coolautomations.com/ocr-as-simple-as-it-can-be/
    Downloads: 3 This Week
  • 18

    Merge PDF Files

    It is a Windows library that merges standard PDFs into a final PDF

    ...You can pass the input PDFs (by file name or as byte arrays) and receive the final PDF (saved to a file or returned as a byte array). The library calls can be synchronous or asynchronous. As a benchmark, the library was used to create a PDF from single-page scanned images processed by an OCR SDK (not included in our library; you can use any on the market): 20,000 images (the OCR SDK creates single-page, text-searchable PDFs, running 50 threads) in 80 minutes. The final searchable PDF was 800Mb in size. If you download the library, we provide a sample that covers all possible scenarios (synchronous and asynchronous).
    Downloads: 4 This Week
  • 19
    JATI - Just Another Tesseract Interface

    Another interface for tesseract OCR to convert image to text.

    Tesseract OCR is an open source, highly accurate image-to-text converter. However, Tesseract OCR provides only a command-line interface. JATI is just another interface to the Tesseract OCR engine, providing a GUI to convert an image to text. It can do batch conversion, including converting only a portion of the image into text.
    Downloads: 7 This Week
  • 20
    MyOCR

    Start Your Own Captcha Solving Business Portal

    Captcha Solutions OCR Captcha Solver Reseller Website to Start Your Own Captcha Solving Business Portal
    Downloads: 0 This Week
  • 21
    DJVU++

    The complete DjVu solution, with OCR technology (Arabic, English).

    ...DjVu++ supports multiple formats: convert PDF documents into DjVu format with smaller file size and the same performance; convert DjVu into PDF format; combine images into a single DjVu document. Perform OCR operations on multiple image formats.
    Downloads: 3 This Week
  • 22

    edocias

    Electronic Document Index And Search

    EDocIAS (Electronic Document Index And Search) is a PHP-based tool for indexing and searching files of various types. Third-party tools (tesseract, xpdf, etc.) can be configured to support any type of file.
    Downloads: 0 This Week
  • 23
    ocr2data
    Full OCR stack for document digitization, analysis and OCR that provides external connection via API, standard document exchange formats and database.
    Downloads: 0 This Week
  • 24
    Note as of 2013-09-13: I'm moving this project over to GitHub due to this: http://www.gluster.org/2013/08/how-far-the-once-mighty-sourceforge-has-fallen/ Feel free to rejoin the more updated versions on https://github.com/mnott/PDFOCRWrapper Thanks. Matthias -- This is a wrapper written in Java that recursively iterates over a directory structure and calls an OCR engine on each PDF found, provided it has not yet been called for that PDF. It works well with the ABBYY OCR Engine for Linux.
    Downloads: 0 This Week
  • 25
    Open source Java scanner application for all platforms. This application makes use of JSane. It also includes OCR for Thai and English characters. This project is supported and funded by Thai Life Insurance Company - A Thai Company for the Thai people (http://
    Downloads: 0 This Week