Showing 144 open source projects for "character recognition source code"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 1
    Tesseract OCR

    Tesseract OCR

    Open Source OCR Engine

    Tesseract is an open source OCR or optical character recognition engine and command line program. OCR is a technology that allows for the recognition of text characters within a digital image. With the latest version of Tesseract, there is a greater focus on line recognition, however it still supports the legacy Tesseract OCR engine which recognizes character patterns.
    Downloads: 3,399 This Week
    Last Update:
    See Project
  • 2
    SimpleHTR

    SimpleHTR

    Handwritten Text Recognition (HTR) system implemented with TensorFlow

    ...It also employs connectionist temporal classification (CTC) to align predicted character sequences with input images without requiring character-level segmentation. The repository provides code for training models, performing inference on handwritten text images, and evaluating recognition accuracy. SimpleHTR is commonly used as an educational example for understanding how modern handwriting recognition systems operate.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Umi-OCR

    Umi-OCR

    OCR software, free and offline

    Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines.
    Downloads: 35 This Week
    Last Update:
    See Project
  • 4
    PaddleOCR

    PaddleOCR

    Awesome multilingual OCR toolkits based on PaddlePaddle

    PaddleOCR offers exceptional, multilingual, and practical Optical Character Recognition (OCR) tools that can help users train better models and apply them into practice. Inspired by PaddlePaddle, PaddleOCR is an ultra lightweight OCR system, with multilingual recognition, digit recognition, vertical text recognition, as well as long text recognition. It features a PPOCR series of high-quality pre-trained models, which includes: ultra lightweight ppocr_mobile series models, general...
    Downloads: 81 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 5
    DeepSeek-OCR

    DeepSeek-OCR

    Contexts Optical Compression

    DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body text, interpreting tables, or recognizing handwritten versus printed words. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    InsightFace

    InsightFace

    State-of-the-art 2D and 3D Face Analysis Project

    State-of-the-art deep face analysis library. InsightFace is an open-source 2D&3D deep face analysis library. InsightFace is an integrated Python library for 2D&3D face analysis. InsightFace efficiently implements a wide variety of state-of-the-art algorithms for face recognition, face detection, and face alignment, which are optimized for both training and deployment. Research institutes and industrial organizations can get benefits from InsightFace library.
    Downloads: 456 This Week
    Last Update:
    See Project
  • 7
    Tesseract.js

    Tesseract.js

    A pure Javascript Multilingual OCR

    Tesseract.js is a pure Javascript port of the popular Tesseract OCR engine. Tesseract.js' library supports more than 100 languages, automatic text orientation and script detection, a simple interface for reading paragraph, word, and character bounding boxes. Tesseract.js can run either in a browser and on a server with NodeJS. Tesseract.js is a javascript library that gets words in almost any spoken language out of images. The main Tesseract.js functions (ex. recognize, detect) take an image...
    Downloads: 25 This Week
    Last Update:
    See Project
  • 8
    Open Semantic Search

    Open Semantic Search

    Open source semantic search and text analytics for large document sets

    ...Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and enrich the data with extracted information such as named entities and metadata. It also supports optical character recognition to extract text from images and scanned documents, including images embedded inside PDF files. It integrates text mining and analytics capabilities that allow users to examine relationships, topics, and structured data within document collections.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 9
    GLM-OCR

    GLM-OCR

    Accurate × Fast × Comprehensive

    GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B), enabling deployment in high-concurrency services and edge environments. ...
    Downloads: 23 This Week
    Last Update:
    See Project
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 10
    whisper.cpp

    whisper.cpp

    Port of OpenAI's Whisper model in C/C++

    whisper.cpp is a lightweight, C/C++ reimplementation of OpenAI’s Whisper automatic speech recognition (ASR) model—designed for efficient, standalone transcription without external dependencies. The entire high-level implementation of the model is contained in whisper.h and whisper.cpp. The rest of the code is part of the ggml machine learning library. The command downloads the base.en model converted to custom ggml format and runs the inference on all .wav samples in the folder samples....
    Downloads: 365 This Week
    Last Update:
    See Project
  • 11
    Pot Desktop

    Pot Desktop

    A cross-platform software for text translation and recognition

    Pot-Desktop is a cross-platform productivity tool aimed at helping users quickly translate, perform OCR (optical character recognition), and synthesize speech for selected text or images — all with minimal friction. It supports picking text via mouse selection (“highlight-and-translate”), clipboard listening, or screenshot-based OCR; this makes it ideal for reading webpages, documents, images — or any on-screen text — and instantly getting translations or text extraction. The tool supports...
    Downloads: 52 This Week
    Last Update:
    See Project
  • 12
    HunyuanOCR

    HunyuanOCR

    OCR expert VLM powered by Hunyuan's native multimodal architecture

    HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent‑Hunyuan. It’s designed to unify the entire OCR pipeline, detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation, into a single model inference instead of a cascade of separate tools.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    DeepSeek-OCR 2

    DeepSeek-OCR 2

    Visual Causal Flow

    DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents with rich spatial structure. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 14
    Self-Operating Computer

    Self-Operating Computer

    A framework to enable multimodal models to operate a computer

    The Self-Operating Computer Framework is an innovative system that enables multimodal models to autonomously operate a computer by interpreting the screen and executing mouse and keyboard actions to achieve specified objectives. This framework is compatible with various multimodal models and currently integrates with GPT-4o, o1, Gemini Pro Vision, Claude 3, and LLaVa. Notably, it was the first known project to implement a multimodal model capable of viewing and controlling a computer screen....
    Downloads: 8 This Week
    Last Update:
    See Project
  • 15
    Open-LLM-VTuber

    Open-LLM-VTuber

    Open source AI VTuber platform with voice chat and Live2D avatars

    Open-LLM-VTuber is an open source platform designed to create AI-powered VTuber characters that can interact with users through voice and animated avatars. It enables hands-free conversations with large language models by combining speech recognition, language processing, and text-to-speech synthesis into a single system. Users can speak directly to the AI character, and the system can respond with a generated voice while animating a Live2D avatar to simulate a talking virtual personality. ...
    Downloads: 26 This Week
    Last Update:
    See Project
  • 16
    OCRmyPDF

    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 108 This Week
    Last Update:
    See Project
  • 17
    LLM-Aided OCR Project

    LLM-Aided OCR Project

    Enhances Tesseract OCR output using LLMs (local or API)

    LLM Aided OCR is an open-source system designed to improve optical character recognition accuracy by combining traditional OCR tools with large language models. The project addresses common OCR challenges such as distorted text, unusual fonts, historical documents, and complex layouts that often produce inaccurate results with standard OCR pipelines. The system first extracts raw text using OCR engines and then applies language models to analyze and correct recognition errors based on context. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Flock

    Flock

    Flock is a workflow-based low-code platform for building chatbots

    Flock is a workflow-based low-code platform designed for building AI applications such as chatbots, retrieval-augmented generation systems, and multi-agent workflows. The platform uses a visual workflow architecture where different nodes represent processing steps such as input processing, model inference, retrieval operations, and tool execution. Developers can connect these nodes to create complex pipelines that orchestrate multiple language models and external services. Built on...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 19
    Agently

    Agently

    AI Agent Application Development Framework

    Build AI agent native application in very little code. Easy to interact with AI agents in code using structure data and chained-calls syntax. Enhance AI Agent using plugins instead of rebuilding a whole new agent. Agently is a development framework that helps developers build AI agent native applications really fast. You can use and build AI agents in your code in an extremely simple way.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Text2Code for Jupyter notebook

    Text2Code for Jupyter notebook

    A proof-of-concept jupyter extension which converts english queries

    Text2Code for Jupyter notebook project is a proof-of-concept extension for Jupyter Notebook that allows users to generate Python code directly from natural language queries written in English. The tool is designed to simplify data analysis workflows by enabling users to describe their intended operation in plain language instead of manually writing code. When a user enters a textual command, the extension interprets the request and generates a corresponding Python code snippet that can be...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Python Client For NLP Cloud

    Python Client For NLP Cloud

    NLP Cloud serves high performance pre-trained or custom models for NER

    NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, dialogue summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, blog post generation, source code generation, question answering, automatic speech recognition, machine translation, language detection, semantic search, semantic similarity, tokenization, POS tagging, embeddings, and dependency parsing. It is ready for production, served through a REST API. You can either use the NLP Cloud pre-trained models, fine-tune your own models, or deploy your own models.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 22
    Xorbits Inference

    Xorbits Inference

    Replace OpenAI GPT with another LLM in your app

    Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop. Xorbits Inference(Xinference) is a powerful and versatile library designed to serve language, speech recognition, and multimodal models. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 23
    pixelmatch

    pixelmatch

    The smallest, simplest JavaScript pixel-level image comparison library

    The smallest, simplest and fastest JavaScript pixel-level image comparison library, originally created to compare screenshots in tests. Features accurate anti-aliased pixels detection and perceptual color difference metrics. Inspired by Resemble.js and Blink-diff. Unlike these libraries, pixelmatch is around 150 lines of code, has no dependencies, and works on raw typed arrays of image data, so it's blazing fast and can be used in any environment (Node or browsers). Compares two images,...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 24
    Automaton

    Automaton

    The first AI that can earn its own existence, replicate, and evolve

    Automaton is an open-source project designed to provide a flexible framework for building and simulating computational automata and related formal systems. The repository focuses on giving developers and researchers a programmable environment to experiment with state machines, language recognition models, and algorithmic behaviors that mirror theoretical computer science constructs. Its architecture emphasizes modularity so users can extend or customize automaton types without rewriting core...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 25
    LLPlayer

    LLPlayer

    The media player for language learning, with dual subtitles

    LLPlayer is an open-source media player designed specifically for language learning through video content. Unlike traditional media players, the application focuses on advanced subtitle-related features that help learners understand and interact with foreign language media more effectively. The player supports dual subtitles so users can simultaneously view text in both the original language and their native language while watching videos. It can also automatically generate subtitles in real...
    Downloads: 43 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB