Search Results for "character recognition code"

115 projects for "character recognition code" with 1 filter applied:

BSD

1

DeepSeek-OCR

Contexts Optical Compression

DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body text, interpreting tables, or recognizing handwritten versus printed words. ...

Downloads: 4 This Week

Last Update: 2026-01-27
2

Open Semantic Search

Open source semantic search and text analytics for large document sets

...Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and enrich the data with extracted information such as named entities and metadata. It also supports optical character recognition to extract text from images and scanned documents, including images embedded inside PDF files. It integrates text mining and analytics capabilities that allow users to examine relationships, topics, and structured data within document collections.

Downloads: 3 This Week

Last Update: 2 days ago
3

whisper.cpp

Port of OpenAI's Whisper model in C/C++

whisper.cpp is a lightweight C/C++ reimplementation of OpenAI’s Whisper automatic speech recognition (ASR) model, designed for efficient, standalone transcription without external dependencies. The entire high-level implementation of the model is contained in whisper.h and whisper.cpp; the rest of the code is part of the ggml machine learning library. The project's example command downloads the base.en model converted to a custom ggml format and runs inference on all .wav samples in the samples folder. whisper.cpp also supports integer quantization of the Whisper ggml models. ...

Downloads: 378 This Week

Last Update: 2026-01-15
4

GLM-OCR

Accurate × Fast × Comprehensive

GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B), enabling deployment in high-concurrency services and edge environments. ...

Downloads: 3 This Week

Last Update: 2026-02-26
5

DeepSeek-OCR 2

Visual Causal Flow

DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents with rich spatial structure. ...

Downloads: 8 This Week

Last Update: 2026-02-03
6

SCAIL

Towards Studio-Grade Character Animation via In-Context Learning of 3D

SCAIL is a project developed by the ZAI Organization, focusing on AI-driven research initiatives. While specific documentation about SCAIL’s exact goals and implementation is limited from the repository context alone, the project appears to be part of a collection of machine learning and AI research tools that facilitate scalable model development, evaluation, or application workflows. Given its listing alongside other ZAI projects like speech recognition and text-to-speech systems, SCAIL...

Downloads: 0 This Week

Last Update: 2026-01-30
7

Unredact

A simple tool for reading in poorly redacted documents

Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and linguistic patterns to produce candidate reconstructions. It accepts a variety of input formats, automatically identifies redacted regions, and then generates text suggestions that are presented alongside visual overlays so users can choose or refine outputs.

Downloads: 50 This Week

Last Update: 2026-02-03
8

Flock

Flock is a workflow-based low-code platform for building chatbots

...The platform supports multi-agent collaboration, allowing developers to design workflows where different agents handle specialized tasks within the same system. Flock also includes features such as intent recognition, code execution nodes, and human-in-the-loop approval processes that make it suitable for production AI applications.

Downloads: 2 This Week

Last Update: 22 hours ago
9

Exclusively Dark Image Dataset

ExDARK dataset is the largest collection of low-light images

...Each image is annotated with both image-level labels and object-level bounding boxes for 12 object categories, making it suitable for detection and classification tasks. The dataset was created to address the lack of large-scale low-light datasets available for research in object detection, recognition, and enhancement. It has been widely used in studies of low-light image enhancement, deep learning approaches, and domain adaptation for vision models. Researchers can also explore its associated source code for low-light image enhancement tasks, making it an essential resource for advancing work in night-time and low-light visual recognition.

Downloads: 2 This Week

Last Update: 3 days ago
10

LLPlayer

The media player for language learning, with dual subtitles

LLPlayer is an open-source media player designed specifically for language learning through video content. Unlike traditional media players, the application focuses on advanced subtitle-related features that help learners understand and interact with foreign language media more effectively. The player supports dual subtitles so users can simultaneously view text in both the original language and their native language while watching videos. It can also automatically generate subtitles in real...

Downloads: 15 This Week

Last Update: 5 days ago
11

Ralph Wiggum Marketer

A Claude Code Plugin that provides an autonomous AI copywriter

Ralph Wiggum Marketer is a Claude Code plugin that serves as an autonomous AI copywriter tailored for SaaS content marketing, enabling automated generation of marketing copy such as landing pages, taglines, feature summaries, and promotional messaging. It leverages the Ralph Wiggum loop concept — a continuous iteration pattern named after the iconic character that symbolizes persistent, repeated refinement — to let Claude Code keep iterating on content until predefined completion criteria are met, rather than stopping after a single output. ...

Downloads: 0 This Week

Last Update: 2026-02-01
12

StoryMem

Official code for StoryMem: Multi-shot Long Video Storytelling

StoryMem is a narrative-focused memory accumulation system that lets users build, store, and reference past conversational context or story elements with an AI, effectively enabling the AI to maintain and recall personalized story memories or character arcs over time. Instead of treating each interaction as stateless, it tracks user-defined memory nodes, tags, and story threads so that future interactions can draw on established narrative context like character traits, past events, or...

Downloads: 0 This Week

Last Update: 2026-02-03
13

PRML

PRML algorithms implemented in Python

The PRML repository is a respected and well-maintained project that implements the foundational algorithms from the textbook Pattern Recognition and Machine Learning by Christopher M. Bishop, providing a practical and accessible Python reference for both students and professionals. Rather than just summarizing concepts, the repository includes working code that demonstrates linear regression and classification, kernel methods, neural networks, graphical models, mixture models with EM algorithms, approximate inference, and sequential data methods, all following the book’s structure and notation. ...

Downloads: 0 This Week

Last Update: 2026-02-16
14

Scribe.js

JavaScript OCR and text extraction for images and PDFs

Scribe.js is a JavaScript library that provides Optical Character Recognition (OCR) and text extraction capabilities for both images and PDF documents, aimed at developers who want to build OCR features directly into their applications. The library can take image files (such as PNG or JPEG) and recognize the text they contain, and it can also extract text from PDF files that either already contain text or are image-based scans, using modern web standards and WebAssembly under the hood. ...

Downloads: 0 This Week

Last Update: 2026-02-13
15

PersonaPlex

PersonaPlex code

PersonaPlex is an open-source real-time conversational speech AI model that goes beyond traditional text chat by providing full-duplex speech-to-speech interaction, meaning it can listen and talk at the same time instead of waiting for you to finish speaking before responding. This architectural approach eliminates awkward pauses and makes conversations feel much more human-like, with natural behaviors such as overlapping speech, interruptions, and fluent turn-taking, traits that traditional...

Downloads: 3 This Week

Last Update: 2026-03-02
16

minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm

minbpe is a minimal, clean implementation of byte-level Byte Pair Encoding (BPE), the tokenization approach widely used in modern language models. It operates on UTF-8 encoded bytes rather than Unicode characters, which makes it robust to arbitrary text inputs and avoids needing a language-specific character vocabulary. The repository is structured as a teaching-oriented implementation that shows how to train a tokenizer by learning merge rules, then apply those merges to encode text into...

Downloads: 2 This Week

Last Update: 2026-03-02
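
The byte-level BPE idea described for minbpe can be illustrated with a minimal sketch. This is not the minbpe code itself: the function names (`train_bpe`, `encode`, `decode`, `merge`, `pair_counts`) are hypothetical and chosen only for this example, and the sketch assumes the simplest variant, where training repeatedly merges the most frequent adjacent pair of token ids, starting from raw UTF-8 bytes (ids 0 to 255).

```python
from collections import Counter


def pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules from the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))  # start from raw bytes, not characters
    merges = {}  # (left_id, right_id) -> new_id, kept in learning order
    for step in range(num_merges):
        counts = pair_counts(ids)
        if not counts:
            break  # nothing left to merge
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        new_id = 256 + step  # ids 0..255 are reserved for single bytes
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text, merges):
    """Apply the learned merges, in learning order, to new text."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Expand token ids back into bytes and decode them as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")
```

Because the base vocabulary is the 256 possible byte values, any input string can be encoded without an unknown-token fallback; text that matches no learned merge simply stays as individual bytes.
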
17

CutLER

Code release for Cut and Learn for Unsupervised Object Detection

CutLER is an approach for unsupervised object detection and instance segmentation that trains detectors without human-annotated labels, and the repo also includes VideoCutLER for unsupervised video instance segmentation. The method follows a “Cut-and-LEaRn” recipe: bootstrap object proposals, refine them iteratively, and train detection/segmentation heads to discover objects across diverse datasets. The codebase provides training and inference scripts, model configs, and references to...

Downloads: 0 This Week

Last Update: 2025-10-09
18

HunyuanImage-3.0

A Powerful Native Multimodal Model for Image Generation

HunyuanImage-3.0 is a powerful, native multimodal text-to-image generation model released by Tencent’s Hunyuan team. It unifies multimodal understanding and generation in a single autoregressive framework, combining text and image modalities seamlessly rather than relying on separate image-only diffusion components. It uses a Mixture-of-Experts (MoE) architecture with many expert subnetworks to scale efficiently, deploying only a subset of experts per token, which allows large parameter...

1 Review

Downloads: 12 This Week

Last Update: 2026-02-03
19

International Components for Unicode

The home of the ICU project source code

ICU is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software. ICU is released under a nonrestrictive open-source license that is suitable for use with both commercial software and with other open-source or free software. Convert text data to or from Unicode and nearly any other character set or encoding....

Downloads: 13 This Week

Last Update: 2026-01-08
20

FLUX.2

Official inference repo for FLUX.2 models

FLUX.2 is a state-of-the-art open-weight image generation and editing model released by Black Forest Labs aimed at bridging the gap between research-grade capabilities and production-ready workflows. The model offers both text-to-image generation and powerful image editing, including editing of multiple reference images, with fidelity, consistency, and realism that push the limits of what open-source generative models have achieved. It supports high-resolution output (up to ~4 megapixels),...

Downloads: 27 This Week

Last Update: 2026-02-17
21

latexify

A library to generate LaTeX expression from Python code

latexify_py converts small, math-heavy pieces of Python code into human-readable LaTeX that mirrors the intent of the computation, not just its surface syntax. It parses Python functions and expressions into an abstract syntax tree (AST), applies symbolic rewrites for common mathematical constructs, and then emits LaTeX that compiles cleanly in standard environments. Typical use cases include turning analytical utilities—like probability mass functions, activation formulas, or recurrence...

Downloads: 0 This Week

Last Update: 2025-10-09
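
The parse-then-emit approach described for latexify can be sketched with Python's standard `ast` module. This is an illustration of the general technique only, not the latexify API or its implementation: `expr_to_latex` and `function_to_latex` are hypothetical names, the input is assumed to be the source of a function whose last statement is a `return`, and only a small subset of syntax (names, constants, `+ - * / **`, unary minus, and simple calls) is handled.

```python
import ast


def _needs_parens(node):
    """Sums and differences need brackets inside products and subtractions."""
    return isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Sub))


def _wrap(node):
    text = expr_to_latex(node)
    return rf"\left({text}\right)" if _needs_parens(node) else text


def expr_to_latex(node):
    """Translate a small subset of Python expression nodes into LaTeX."""
    if isinstance(node, ast.Constant):
        return str(node.value)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return "-" + _wrap(node.operand)
    if isinstance(node, ast.BinOp):
        op = node.op
        if isinstance(op, ast.Div):  # a / b becomes a fraction
            return rf"\frac{{{expr_to_latex(node.left)}}}{{{expr_to_latex(node.right)}}}"
        if isinstance(op, ast.Pow):  # a ** b becomes a superscript
            base = expr_to_latex(node.left)
            if not isinstance(node.left, (ast.Name, ast.Constant)):
                base = rf"\left({base}\right)"
            return f"{base}^{{{expr_to_latex(node.right)}}}"
        if isinstance(op, ast.Mult):
            return rf"{_wrap(node.left)} \cdot {_wrap(node.right)}"
        if isinstance(op, ast.Add):
            return f"{expr_to_latex(node.left)} + {expr_to_latex(node.right)}"
        if isinstance(op, ast.Sub):
            return f"{expr_to_latex(node.left)} - {_wrap(node.right)}"
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        args = ", ".join(expr_to_latex(arg) for arg in node.args)
        if node.func.id == "sqrt" and len(node.args) == 1:
            return rf"\sqrt{{{args}}}"  # symbolic rewrite for a known function
        return rf"\mathrm{{{node.func.id}}}\left({args}\right)"
    raise NotImplementedError(f"unsupported syntax: {ast.dump(node)}")


def function_to_latex(source):
    """Render `def name(params): return expr` as `name(params) = expr`."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a function definition")
    ret = func.body[-1]
    if not isinstance(ret, ast.Return) or ret.value is None:
        raise ValueError("expected the function to end with `return <expr>`")
    params = ", ".join(arg.arg for arg in func.args.args)
    return rf"\mathrm{{{func.name}}}({params}) = {expr_to_latex(ret.value)}"


source = "def f(x, a, b):\n    return (a + b) * x ** 2 / 2\n"
print(function_to_latex(source))
# prints: \mathrm{f}(x, a, b) = \frac{\left(a + b\right) \cdot x^{2}}{2}
```

Working on the AST rather than on the source text is what lets the output follow the structure of the computation: division becomes a fraction and brackets are added only where operator precedence requires them.
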
22

luaposix

Lua bindings for POSIX APIs

This is a POSIX binding for LuaJIT, Lua 5.1, 5.2, 5.3 and 5.4; like most libraries, it simply binds to C APIs on the underlying system, so it won't work on non-POSIX systems. However, it does try to detect the level of POSIX conformance of the underlying system and bind only available APIs. For a while, luaposix contained support for curses functionality too, but now that has its own lcurses repository again, where it is being maintained separately.

Downloads: 0 This Week

Last Update: 2025-02-16
23

Hiera

A fast, powerful, and simple hierarchical vision transformer

Hiera is a hierarchical vision transformer designed to be fast, simple, and strong across image and video recognition tasks. The core idea is to use straightforward hierarchical attention with a minimal set of architectural “bells and whistles,” achieving competitive or superior accuracy while being markedly faster at inference and often faster to train. The repository provides installation options (from source or Torch Hub), a model zoo with pre-trained checkpoints, and code for evaluation and fine-tuning on standard benchmarks. ...

Downloads: 1 This Week

Last Update: 2025-10-08
24

Airtest

UI Automation Framework for Games and Apps

Airtest provides cross-platform APIs, including app installation, simulated input, assertion and so forth. Airtest uses image recognition technology to locate UI elements so that you can automate games and apps without injecting any code. Airtest cases can be easily run on large device farms, using the command line or Python API. HTML reports with detailed info and screen recording allow you to quickly locate failure points. NetEase builds Airlab on top of the Airtest Project. ...

Downloads: 1 This Week

Last Update: 2025-12-04
25

Open Model Zoo

Pre-trained Deep Learning models and demos

Open Model Zoo is a large repository of high-quality pre-trained deep learning models and demonstration applications designed to work with the OpenVINO™ toolkit, offering a comprehensive starting point for a wide range of AI and computer vision workloads. It includes hundreds of models covering object detection, classification, segmentation, pose estimation, speech recognition, text-to-speech, and more, many of which are already converted into formats optimized for inference on CPUs, GPUs,...

Downloads: 0 This Week

Last Update: 2026-01-10