Showing 557 open source projects for "video text ocr"

  • 1
    Video-subtitle-extractor

    A GUI tool for extracting hard-coded subtitles (hardsubs) from videos

    ...Text recognition runs locally with a built-in OCR engine, so there is no need to set up or call any API, or to access online OCR services such as Baidu and Ali. GPU acceleration is supported and, according to the project, yields higher accuracy and faster extraction. In the CLI version, users do not need to set the subtitle area manually; the project detects it automatically with a text detection model.
    Downloads: 43 This Week
  • 2
    Tesseract OCR

    Open Source OCR Engine

    Tesseract is an open source optical character recognition (OCR) engine and command-line program. OCR is a technology that recognizes text characters within a digital image. The latest version of Tesseract places a greater focus on line recognition; however, it still supports the legacy Tesseract OCR engine, which recognizes character patterns.
    Downloads: 3,706 This Week
  • 3
    Umi-OCR

    OCR software, free and offline

    Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines.
    Downloads: 41 This Week
  • 4
    DeepSeek-OCR

    Contexts Optical Compression

    DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body text, interpreting tables, or recognizing handwritten versus printed words. ...
    Downloads: 5 This Week
  • 5
    GLM-OCR

    Accurate × Fast × Comprehensive

    GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B), enabling deployment in high-concurrency services and edge environments. ...
    Downloads: 6 This Week
  • 6
    Zerox OCR

    PDF to Markdown with vision models

    A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation, after all, with weird layouts, tables, charts, etc., so vision models just make sense. ZeroX is an open-source machine learning framework designed for fast experimentation and production deployment, optimized for speed and ease of use.
    Downloads: 0 This Week
  • 7
    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 108 This Week
  • 8
    PaddleOCR-json

    Offline OCR command-line program for image text recognition on Windows

    PaddleOCR-json is an OCR engine based on the PaddleOCR project that provides a command-line interface and tools for extracting text from images and exporting results in structured JSON format. It wraps the PaddleOCR models, which are capable of detecting and recognizing text in a wide variety of languages and layouts, into a self-contained executable that can be run locally without needing a deep learning environment configured manually.
    Downloads: 8 This Week
  • 9
    DeepSeek-OCR 2

    Visual Causal Flow

    DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...
    Downloads: 12 This Week
  • 10
    PaddleOCR

    Awesome multilingual OCR toolkits based on PaddlePaddle

    PaddleOCR offers practical, multilingual Optical Character Recognition (OCR) tools that help users train better models and put them into practice. Inspired by PaddlePaddle, PaddleOCR is an ultra-lightweight OCR system with multilingual recognition, digit recognition, vertical text recognition, and long text recognition. It features the PPOCR series of high-quality pre-trained models, which includes ultra-lightweight ppocr_mobile series models, general ppocr_server series models, and ultra-lightweight compressed ppocr_mobile_slim series models. ...
    Downloads: 49 This Week
  • 11
    LLM-Aided OCR Project

    Enhances Tesseract OCR output using LLMs (local or API)

    LLM Aided OCR is an open-source system designed to improve optical character recognition accuracy by combining traditional OCR tools with large language models. The project addresses common OCR challenges such as distorted text, unusual fonts, historical documents, and complex layouts that often produce inaccurate results with standard OCR pipelines.
    Downloads: 0 This Week
  • 12
    DocTR

    Library for OCR-related tasks powered by Deep Learning

    ...Easy integration (available templates for browser demo & API deployment). End-to-end OCR is achieved in docTR using a two-stage approach: text detection (localizing words), then text recognition (identifying all characters in each word). As such, you can select the architecture used for text detection and the one used for text recognition from the list of available implementations.
    Downloads: 6 This Week
  • 13
    Pot Desktop

    A cross-platform software for text translation and recognition

    Pot-Desktop is a cross-platform productivity tool aimed at helping users quickly translate, perform OCR (optical character recognition), and synthesize speech for selected text or images — all with minimal friction. It supports picking text via mouse selection (“highlight-and-translate”), clipboard listening, or screenshot-based OCR; this makes it ideal for reading webpages, documents, images — or any on-screen text — and instantly getting translations or text extraction. ...
    Downloads: 8 This Week
  • 14
    Tesseract.js

    A pure JavaScript multilingual OCR

    Tesseract.js is a pure JavaScript port of the popular Tesseract OCR engine. The library supports more than 100 languages, automatic text orientation and script detection, and a simple interface for reading paragraph, word, and character bounding boxes. Tesseract.js can run either in a browser or on a server with Node.js. It is a JavaScript library that gets words in almost any spoken language out of images.
    Downloads: 16 This Week
  • 15
    Scribe.js

    JavaScript OCR and text extraction for images and PDFs

    Scribe.js is a JavaScript library that provides Optical Character Recognition (OCR) and text extraction capabilities for both images and PDF documents, aimed at developers who want to build OCR features directly into their applications. The library can take image files (such as PNG or JPEG) and recognize the text they contain, and it can also extract text from PDF files that either already contain text or are image-based scans, using modern web standards and WebAssembly under the hood. ...
    Downloads: 1 This Week
  • 16
    Video-subtitle-remover (VSR)

    AI tool that removes hardcoded subtitles and text from videos locally

    Video Subtitle Remover is an AI-based application designed to remove hardcoded subtitles from videos and generate new files without the embedded text. Video Subtitle Remover analyzes video frames and detects subtitle regions, then replaces the removed areas using an AI algorithm that fills the space with reconstructed visual content. This process aims to maintain the original resolution and visual continuity of the video after subtitle removal. ...
    Downloads: 45 This Week
  • 17
    EasyOCR

    Ready-to-use OCR with 80+ supported languages

    Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc. EasyOCR is a Python module for extracting text from images. It is a general OCR that can read both natural scene text and dense text in documents. The project currently supports 80+ languages and is expanding. Second-generation models: multiple times smaller size, multiple times faster inference, additional characters, and accuracy comparable to the first-generation models. ...
    Downloads: 29 This Week
  • 18
    TTime

    Screenshots, word marking, OCR, AI, translation software

    TTime is a desktop productivity tool that combines translation, OCR, and screen capture capabilities into a unified application designed for fast and efficient text processing workflows. It allows users to translate text through multiple methods, including direct input, screenshot-based capture, and real-time word selection, making it versatile for both casual use and professional tasks. The software integrates a wide range of translation engines and OCR services, including cloud-based providers and offline options, ensuring flexibility across different environments and connectivity conditions. ...
    Downloads: 0 This Week
  • 19
    Readest

    Readest is a modern, feature-rich ebook reader

    ...Although the repository is not as widely documented or popular as some, the idea is that Readest supports features to help with reading comprehension — likely combining OCR / text retrieval, translation, note-taking, or summarization for reading materials (eBooks, articles, PDFs). The goal appears to be to let users feed in arbitrary reading material and then interact with it (highlighting, translation, lookup, maybe TTS or summarization) more comfortably. Because of that, it's oriented towards learners, researchers, or people dealing with multilingual documents — especially when they need to rapidly digest or reference large amounts of text.
    Downloads: 24 This Week
  • 20
    Video Diffusion - Pytorch

    Implementation of Video Diffusion Models

    ...Any new developments for text-to-video synthesis will be centralized at Imagen-pytorch. For conditioning on text, they derived text embeddings by first passing the tokenized text through BERT-large. You can also directly pass in the descriptions of the video as strings, if you plan on using BERT-base for text conditioning. This repository also contains a handy Trainer class for training on a folder of gifs.
    Downloads: 1 This Week
  • 21
    Olive Video Editor

    Free open-source non-linear video editor

    ...Olive 0.2 provides powerful and flexible node-based compositing. Node editing is a form of visual programming that gives you full control over how Olive renders your video. Rather than a "fixed" pipeline where one effect occurs after the other, nodes allow you to connect anything to anything else, which gives a great deal of flexibility for creating effects. You'll be able to create virtually any effect without writing a single line of code (or waiting for us to implement it for you). Additionally, these nodes can be copied and pasted as text, so they can be shared extremely easily. ...
    Downloads: 92 This Week
  • 22
    Qwen3-VL

    Qwen3-VL, the multimodal large language model series by Alibaba Cloud

    ...It also brings advanced perception capabilities, including spatial grounding, object recognition, OCR across 32 languages, and robust handling of challenging inputs like low-light or distorted text.
    Downloads: 11 This Week
  • 23
    Paper2GUI

    Convert AI papers to GUI

    ...Making cutting-edge AI technology simple and convenient for everyone to use. Paper2GUI is an AI desktop app toolbox for ordinary people that can be used immediately without installation. It already supports 40+ AI models, covering AI painting, speech synthesis, video frame interpolation, video super-resolution, object detection, image stylization, OCR, and other fields. It supports Windows, Mac, and Linux.
    Downloads: 5 This Week
  • 24
    Make-A-Video - Pytorch (wip)

    Implementation of Make-A-Video, new SOTA text to video generator

    Implementation of Make-A-Video, the new SOTA text-to-video generator from Meta AI, in Pytorch. The authors combine pseudo-3D convolutions (axial convolutions) and temporal attention and show much better temporal fusion. Pseudo-3D convolutions are not a new concept; they have been explored before in other contexts, for example for protein contact prediction as "dimensional hybrid residual networks".
    Downloads: 0 This Week
  • 25
    paperless-gpt

    Use LLMs and LLM Vision (OCR) to handle paperless-ngx

    paperless-gpt is an AI-powered extension for document management systems that enhances the capabilities of paperless-ngx by integrating large language models and vision-based OCR to automate document processing and organization. It is designed to transform scanned or uploaded documents into structured, searchable, and intelligently categorized data without requiring manual tagging or sorting. The system uses OCR combined with LLM reasoning to extract text, classify documents, and generate metadata such as tags, titles, and categories automatically. ...
    Downloads: 1 This Week