Showing 13 open source projects for "extraction"

  • 1
    DINOv2

    PyTorch code and models for the DINOv2 self-supervised learning method

    ...The core promise is that a single pretrained backbone can transfer well to many downstream tasks—from linear probing on classification to retrieval, detection, and segmentation—often requiring little or no fine-tuning. The repository includes code for training, evaluating, and feature extraction, with utilities to run k-NN or linear evaluation baselines to assess representation quality. Pretrained checkpoints cover multiple model sizes so practitioners can trade accuracy for speed and memory depending on their deployment constraints.
    Downloads: 1 This Week
  • 2
    HeartMuLa

    A Family of Open Sourced Music Foundation Models

    ...The project also includes HeartCodec, a music codec optimized for high reconstruction fidelity, enabling efficient tokenization and reconstruction workflows that are critical for training and generation pipelines. For text extraction from audio, it provides HeartTranscriptor, a Whisper-based model tuned specifically for lyrics transcription, which helps bridge generated or recorded audio back into structured text. It also introduces HeartCLAP, which aligns audio and text into a shared embedding space.
    Downloads: 17 This Week
  • 3
    DINOv3

    Reference PyTorch implementation and models for DINOv3

    DINOv3 is the third-generation iteration of Meta’s self-supervised visual representation learning framework, building upon the ideas from DINO and DINOv2. It continues the paradigm of learning strong image representations without labels using teacher–student distillation, but introduces a simplified and more scalable training recipe that performs well across datasets and architectures. DINOv3 removes the need for complex augmentations or momentum encoders, streamlining the pipeline while...
    Downloads: 15 This Week
  • 4
    GLM-OCR

    Accurate × Fast × Comprehensive

    GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B), enabling deployment in high-concurrency services and edge environments. The model’s multimodal capabilities allow it to reason across image and text content holistically, capturing structured and unstructured information from pages that include dense tables, seals, code snippets, and varied document graphics. ...
    Downloads: 20 This Week
  • 5
    FinGPT

    Open-Source Financial Large Language Models

    ...The platform typically includes tools for fine-tuning, context engineering, and prompt templating, enabling users to build specialized assistants for tasks like sentiment analysis, earnings summary generation, risk profiling, trading signal interpretation, and document extraction from financial reports.
    Downloads: 7 This Week
  • 6
    MiniCPM-o

    A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

    MiniCPM-o 2.6 is a multimodal large language model (MLLM) for vision, speech, and video tasks. It can run on edge devices such as smartphones and tablets, and it provides real-time speech conversation, video understanding, and multimodal live streaming. With 8 billion parameters, MiniCPM-o 2.6 improves on its predecessors in versatility and efficiency. It supports...
    Downloads: 0 This Week
  • 7
    PokeeResearch-7B

    Pokee Deep Research Model Open Source Repo

    PokeeResearchOSS provides an open-source, agentic “deep research” model centered on a 7B backbone that can browse, read, and synthesize current information from the web. Instead of relying only on static training data, the agent performs searches, visits pages, and extracts evidence before forming answers to complex queries. It is built to operate end-to-end: planning a research strategy, gathering sources, reasoning over conflicting claims, and writing a grounded response. The repository...
    Downloads: 0 This Week
  • 8
    FixRes

    Reproduces results of "Fixing the train-test resolution discrepancy"

    FixRes is a lightweight yet powerful training methodology for convolutional neural networks (CNNs) that addresses the common train-test resolution discrepancy problem in image classification. Developed by Facebook Research, FixRes improves model generalization by adjusting training and evaluation procedures to better align input resolutions used during different phases. The approach is simple but highly effective, requiring no architectural modifications and working across diverse CNN...
    Downloads: 0 This Week
  • 9
    translategemma-4b-it

    Lightweight multimodal translation model for 55 languages

    translategemma-4b-it is a lightweight, state-of-the-art open translation model from Google, built on the Gemma 3 family and optimized for high-quality multilingual translation across 55 languages. It supports both text-to-text translation and image-to-text extraction with translation, enabling workflows such as OCR-style translation of signs, documents, and screenshots. With a compact ~5B parameter footprint and BF16 support, the model is designed to run efficiently on laptops, desktops, and private cloud infrastructure, making advanced translation accessible without heavy hardware requirements. ...
    Downloads: 0 This Week
  • 10
    layoutlm-base-uncased

    Multimodal Transformer for document image understanding and layout

    ...The model uses a standard BERT-like architecture but enriches input with 2D positional embeddings. It achieves state-of-the-art results in form understanding and information extraction benchmarks. This model is particularly useful for document AI applications like document classification, question answering, and named entity recognition.
    Downloads: 0 This Week
  • 11
    Qwen2.5-VL-3B-Instruct

    Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video

    Qwen2.5-VL-3B-Instruct is a 3.75 billion parameter multimodal model by Qwen, designed to handle complex vision-language tasks in both image and video formats. As part of the Qwen2.5 series, it supports image-text-to-text generation with capabilities like chart reading, object localization, and structured data extraction. The model can serve as an intelligent visual agent capable of interacting with digital interfaces and understanding long-form videos by dynamically sampling resolution and frame rate. It uses a SwiGLU and RMSNorm-enhanced ViT architecture and introduces mRoPE updates for robust temporal and spatial understanding. The model supports flexible image input (file path, URL, base64) and outputs structured responses like bounding boxes or JSON, making it highly versatile in commercial and research settings. ...
    Downloads: 0 This Week
  • 12
    Bio_ClinicalBERT

    ClinicalBERT model trained on MIMIC notes for clinical NLP tasks

    ...Bio_ClinicalBERT is available through Hugging Face's Transformers library for easy integration. It supports medical AI research and applications involving electronic health record understanding, clinical decision support, and biomedical information extraction.
    Downloads: 0 This Week
  • 13
    Ministral 3 3B Base 2512

    Small 3B-base multimodal model ideal for custom AI on edge hardware

    Ministral 3 3B Base 2512 is the smallest model in the Ministral 3 family, offering a compact yet capable multimodal architecture suited for lightweight AI applications. It combines a 3.4B-parameter language model with a 0.4B vision encoder, enabling both text and image understanding in a tiny footprint. As the base pretrained model, it is not fine-tuned for instructions or reasoning, making it the ideal foundation for custom post-training, domain adaptation, or specialized downstream tasks....
    Downloads: 0 This Week