Showing 11 open source projects for "pdf ocr windows"

View related business solutions
  • Atera - an All-in-one platform for IT management Icon
    Atera - an All-in-one platform for IT management

    Ideal for IT departments and MSPs (managed service providers)

    Your IT essentials, integrated & elevated. Take your IT management from automated to autonomous, download Atera's agent to start your free trial!
    Try Atera now
  • Error to trace to log to deploy. One click. No SSH. Icon
    Error to trace to log to deploy. One click. No SSH.

    Catch the cause before the pager goes off.

    AppSignal links every error to the trace, the trace to the log, the log to the deploy that shipped it.
    Free 30 days.
  • 1
    DeepSeek-OCR 2

    DeepSeek-OCR 2

    Visual Causal Flow

    DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 2
    DeepSeek-OCR

    DeepSeek-OCR

    Contexts Optical Compression

    DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 3
    GLM-OCR

    GLM-OCR

    Accurate × Fast × Comprehensive

    GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B),...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 4
    PaddleOCR

    PaddleOCR

    Awesome multilingual OCR toolkits based on PaddlePaddle

    ...PaddleOCR is easy to install and easy to use on Windows, Linux, MacOS and other systems.
    Downloads: 63 This Week
    Last Update:
    See Project
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • 5
    Qwen3-VL

    Qwen3-VL

    Qwen3-VL, the multimodal large language model series by Alibaba Cloud

    Qwen3-VL is the latest multimodal large language model series from Alibaba Cloud’s Qwen team, designed to integrate advanced vision and language understanding. It represents a major upgrade in the Qwen lineup, with stronger text generation, deeper visual reasoning, and expanded multimodal comprehension. The model supports dense and Mixture-of-Experts (MoE) architectures, making it scalable from edge devices to cloud deployments, and is available in both instruction-tuned and...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 6
    MiniCPM-o

    MiniCPM-o

    A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

    MiniCPM-o 2.6 is a cutting-edge multimodal large language model (MLLM) designed for high-performance tasks across vision, speech, and video. Capable of running on end-side devices such as smartphones and tablets, it provides powerful features like real-time speech conversation, video understanding, and multimodal live streaming. With 8 billion parameters, MiniCPM-o 2.6 surpasses its predecessors in versatility and efficiency, making it one of the most robust models available. It supports...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    DB-GPT

    DB-GPT

    Revolutionizing Database Interactions with Private LLM Technology

    DB-GPT is an experimental open-source project that uses localized GPT large models to interact with your data and environment. With this solution, you can be assured that there is no risk of data leakage, and your data is 100% private and secure.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    DeepSeek Prover V2

    DeepSeek Prover V2

    Advancing Formal Mathematical Reasoning via Reinforcement Learning

    DeepSeek-Prover-V2 is DeepSeek’s specialized model for formal theorem proving, particularly targeting proof in Lean 4. The repository describes how they use recursive proof decomposition by prompting DeepSeek-V3 to break complex theorems into subgoals, synthesize proof sketches, and then combine them to bootstrap training data. They then fine-tune via reinforcement learning with binary correct/incorrect feedback to integrate informal reasoning with formal proof behavior. The repo releases...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 9
    Qwen3-Omni

    Qwen3-Omni

    Qwen3-omni is a natively end-to-end, omni-modal LLM

    Qwen3-Omni is a natively end-to-end multilingual omni-modal foundation model that processes text, images, audio, and video and delivers real-time streaming responses in text and natural speech. It uses a Thinker-Talker architecture with a Mixture-of-Experts (MoE) design, early text-first pretraining, and mixed multimodal training to support strong performance across all modalities without sacrificing text or image quality. The model supports 119 text languages, 19 speech input languages, and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • 10
    NuMarkdown-8B-Thinking

    NuMarkdown-8B-Thinking

    Reasoning-powered OCR VLM for converting complex documents to Markdown

    NuMarkdown-8B-Thinking is the first reasoning OCR vision-language model (VLM) designed to convert documents into clean Markdown optimized for retrieval-augmented generation (RAG). Built on Qwen 2.5-VL-7B and fine-tuned with synthetic Doc → Reasoning → Markdown examples, it generates thinking tokens before producing the final Markdown to better handle complex layouts and tables. It uses a two-phase training process: supervised fine-tuning (SFT) followed by reinforcement learning (GRPO) with a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    translategemma-4b-it

    translategemma-4b-it

    Lightweight multimodal translation model for 55 languages

    translategemma-4b-it is a lightweight, state-of-the-art open translation model from Google, built on the Gemma 3 family and optimized for high-quality multilingual translation across 55 languages. It supports both text-to-text translation and image-to-text extraction with translation, enabling workflows such as OCR-style translation of signs, documents, and screenshots. With a compact ~5B parameter footprint and BF16 support, the model is designed to run efficiently on laptops, desktops, and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Auth0 Logo