19 projects for "documents" with 2 filters applied:

  • Retool your internal operations Icon
    Retool your internal operations

    Generate secure, production-grade apps that connect to your business data. Not just prototypes, but tools your team can actually deploy.

    Build internal software that meets enterprise security standards without waiting on engineering resources. Retool connects to your databases, APIs, and data sources while maintaining the permissions and controls you need. Create custom dashboards, admin tools, and workflows from natural language prompts—all deployed in your cloud with security baked in. Stop duct-taping operations together, start building in Retool.
    Build an app in Retool
  • Outgrown Windows Task Scheduler? Icon
    Outgrown Windows Task Scheduler?

    Free diagnostic identifies where your workflow is breaking down—with instant analysis of your scheduling environment.

    Windows Task Scheduler wasn't built for complex, cross-platform automation. Get a free diagnostic that shows exactly where things are failing and provides remediation recommendations. Interactive HTML report delivered in minutes.
    Download Free Tool
  • 1
    ChatGPT Retrieval Plugin

    ChatGPT Retrieval Plugin

    The ChatGPT Retrieval Plugin lets you easily find personal documents

    The chatgpt-retrieval-plugin repository implements a semantic retrieval backend that lets ChatGPT (or GPT-powered tools) access private or organizational documents in natural language by combining vector search, embedding models, and plugin infrastructure. It can serve as a custom GPT plugin or function-calling backend so that a chat session can “look up” relevant documents based on user queries, inject those results into context, and respond more knowledgeably about a private knowledge base. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    DeepSeek-OCR

    DeepSeek-OCR

    Contexts Optical Compression

    ...It supports local deployment, enabling organizations concerned about privacy or latency to run the pipeline on-premises rather than send sensitive documents to third-party cloud services. The codebase is written in Python with a focus on modularity: you can swap preprocessing, recognition, and post-processing components as needed for custom workflows.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 3
    GLM-4.6V

    GLM-4.6V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    ...Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and can output or act via tools seamlessly, bridging perception and execution. Its architecture supports a very large context window (on the order of 128K tokens during training), which lets it handle complex multimodal inputs like long documents, multi-page reports, or video transcripts, while maintaining coherence across extended content. In benchmarks and internal evaluations, GLM-4.6V achieves state-of-the-art (SoTA) performance among models of comparable parameter scale on multimodal reasoning.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Granite TSFM

    Granite TSFM

    Foundation Models for Time Series

    granite-tsfm collects public notebooks, utilities, and serving components for IBM’s Time Series Foundation Models (TSFM), giving practitioners a practical path from data prep to inference for forecasting and anomaly-detection use cases. The repository focuses on end-to-end workflows: loading data, building datasets, fine-tuning forecasters, running evaluations, and serving models. It documents the currently supported Python versions and points users to where the core TSFM models are hosted and how to wire up service components. Issues and examples in the tracker illustrate common tasks such as slicing inference windows or using pipeline helpers that return pandas DataFrames, grounding the library in day-to-day time-series operations. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Atera all-in-one platform IT management software with AI agents Icon
    Atera all-in-one platform IT management software with AI agents

    Ideal for internal IT departments or managed service providers (MSPs)

    Atera’s AI agents don’t just assist, they act. From detection to resolution, they handle incidents and requests instantly, taking your IT management from automated to autonomous.
    Learn More
  • 5
    Qwen3-VL-Embedding

    Qwen3-VL-Embedding

    Multimodal embedding and reranking models built on Qwen3-VL

    ...The core embedding model maps such inputs into semantically rich vectors in a unified representation space, enabling similarity search, clustering, and cross-modal retrieval. The reranking model then precisely scores relevance between a given query and candidate documents, enhancing retrieval accuracy in complex multimodal tasks. Together, they support advanced information retrieval workflows such as image-text search, visual question answering (VQA), and video-text matching, while providing out-of-the-box support for more than 30 languages.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    FastVLM

    FastVLM

    This repository contains the official implementation of FastVLM

    ...Reported results highlight dramatic speedups in time-to-first-token and competitive quality versus contemporary open VLMs, including comparisons across small and larger variants. The repository documents model variants, showcases head-to-head numbers against known baselines, and explains how the encoder integrates with common LLM backbones. Apple’s research brief frames FastVLM as targeting real-time or latency-sensitive scenarios, where lowering visual token pressure is critical to interactive UX. In short, it’s a practical recipe to make VLMs fast without exotic token-selection heuristics.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    MiniMax-01

    MiniMax-01

    Large-language-model & vision-language-model based on Linear Attention

    MiniMax-01 is the official repository for two flagship models: MiniMax-Text-01, a long-context language model, and MiniMax-VL-01, a vision-language model built on top of it. MiniMax-Text-01 uses a hybrid attention architecture that blends Lightning Attention, standard softmax attention, and Mixture-of-Experts (MoE) routing to achieve both high throughput and long-context reasoning. It has 456 billion total parameters with 45.9 billion activated per token and is trained with advanced parallel...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    NuMarkdown-8B-Thinking

    NuMarkdown-8B-Thinking

    Reasoning-powered OCR VLM for converting complex documents to Markdown

    NuMarkdown-8B-Thinking is the first reasoning OCR vision-language model (VLM) designed to convert documents into clean Markdown optimized for retrieval-augmented generation (RAG). Built on Qwen 2.5-VL-7B and fine-tuned with synthetic Doc → Reasoning → Markdown examples, it generates thinking tokens before producing the final Markdown to better handle complex layouts and tables. It uses a two-phase training process: supervised fine-tuning (SFT) followed by reinforcement learning (GRPO) with a layout-centric reward for accuracy on challenging documents.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    layoutlm-base-uncased

    layoutlm-base-uncased

    Multimodal Transformer for document image understanding and layout

    layoutlm-base-uncased is a multimodal transformer model developed by Microsoft for document image understanding tasks. It incorporates both text and layout (position) features to effectively process structured documents like forms, invoices, and receipts. This base version has 113 million parameters and is pre-trained on 11 million documents from the IIT-CDIP dataset. LayoutLM enables better performance in tasks where the spatial arrangement of text plays a crucial role. The model uses a standard BERT-like architecture but enriches input with 2D positional embeddings. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Field Sales+ for MS Dynamics 365 and Salesforce Icon
    Field Sales+ for MS Dynamics 365 and Salesforce

    Maximize your sales performance on the go.

    Bring Dynamics 365 and Salesforce wherever you go with Resco’s solution. With powerful offline features and reliable data syncing, your team can access CRM data on mobile devices anytime, anywhere. This saves time, cuts errors, and speeds up customer visits.
    Learn More
  • 10
    Qwen2.5-VL-7B-Instruct

    Qwen2.5-VL-7B-Instruct

    Multimodal 7B model for image, video, and text understanding tasks

    Qwen2.5-VL-7B-Instruct is a multimodal vision-language model developed by the Qwen team, designed to handle text, images, and long videos with high precision. Fine-tuned from Qwen2.5-VL, this 7-billion-parameter model can interpret visual content such as charts, documents, and user interfaces, as well as recognize common objects. It supports complex tasks like visual question answering, localization with bounding boxes, and structured output generation from documents. The model is also capable of video understanding with dynamic frame sampling and temporal reasoning, enabling it to analyze and respond to long-form videos. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Ministral 3 8B Base 2512

    Ministral 3 8B Base 2512

    Versatile 8B-base multimodal LLM, flexible foundation for custom AI

    ...As a “base” model (i.e., not fine-tuned for instruction or reasoning), it offers a flexible starting point for custom downstream tasks or fine-tuning. The model supports a large 256k token context window, making it capable of handling long documents or extended dialogues. Because it comes from the edge-optimized Ministral 3 family, it remains deployable on reasonably powerful hardware while offering a good balance between capability and resource use. Its multilingual and multimodal pretraining enables broad applicability across languages and tasks — from generation to classification to vision-language tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    translategemma-4b-it

    translategemma-4b-it

    Lightweight multimodal translation model for 55 languages

    translategemma-4b-it is a lightweight, state-of-the-art open translation model from Google, built on the Gemma 3 family and optimized for high-quality multilingual translation across 55 languages. It supports both text-to-text translation and image-to-text extraction with translation, enabling workflows such as OCR-style translation of signs, documents, and screenshots. With a compact ~5B parameter footprint and BF16 support, the model is designed to run efficiently on laptops, desktops, and private cloud infrastructure, making advanced translation accessible without heavy hardware requirements. TranslateGemma uses a structured chat template that enforces explicit source and target language codes, ensuring consistent, deterministic behavior and reducing ambiguity in multilingual pipelines. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    bart-large-cnn

    bart-large-cnn

    Summarization model fine-tuned on CNN/DailyMail articles

    facebook/bart-large-cnn is a large-scale sequence-to-sequence transformer model developed by Meta AI and fine-tuned specifically for abstractive text summarization. It uses the BART architecture, which combines a bidirectional encoder (like BERT) with an autoregressive decoder (like GPT). Pre-trained on corrupted text reconstruction, the model was further trained on the CNN/DailyMail dataset—a collection of news articles paired with human-written summaries. It performs particularly well in...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Qwen3-Next

    Qwen3-Next

    Qwen3-Next: 80B instruct LLM with ultra-long context up to 1M tokens

    ...The model natively supports a context length of 262K tokens and can be extended up to 1 million tokens using RoPE scaling (YaRN), making it highly capable for processing large documents and extended conversations. Multi-Token Prediction (MTP) boosts both training and inference, while stability optimizations such as weight-decayed and zero-centered layernorm ensure robustness. Benchmarks show it performs comparably to larger models like Qwen3-235B on reasoning, coding, multilingual, and alignment tasks while requiring only a fraction of the training cost.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Hunyuan-A13B-Instruct

    Hunyuan-A13B-Instruct

    Efficient 13B MoE language model with long context and reasoning modes

    Hunyuan-A13B-Instruct is a powerful instruction-tuned large language model developed by Tencent using a fine-grained Mixture-of-Experts (MoE) architecture. While the total model includes 80 billion parameters, only 13 billion are active per forward pass, making it highly efficient while maintaining strong performance across benchmarks. It supports up to 256K context tokens, advanced reasoning (CoT) abilities, and agent-based workflows with tool parsing. The model offers both fast and slow...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Ministral 3 3B Base 2512

    Ministral 3 3B Base 2512

    Small 3B-base multimodal model ideal for custom AI on edge hardware

    ...It supports dozens of languages, making it practical for multilingual, global, or distributed environments. With a large 256k token context window, it can handle long documents, extended inputs, or multi-step processing workflows even at its small size.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Ministral 3 8B Reasoning 2512

    Ministral 3 8B Reasoning 2512

    Efficient 8B multimodal model tuned for advanced reasoning tasks.

    ...It supports dozens of languages, adheres reliably to system prompts, and provides native function calling and structured JSON output—key capabilities for agentic and automation workflows. The model also includes a 256k context window, allowing it to handle long documents and extended reasoning chains.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    VaultGemma

    VaultGemma

    VaultGemma: 1B DP-trained Gemma variant for private NLP tasks

    VaultGemma is a sub-1B parameter variant of Google’s Gemma family that is pre-trained from scratch with Differential Privacy (DP), providing mathematically backed guarantees that its outputs do not reveal information about any single training example. Using DP-SGD with a privacy budget across a large English-language corpus (web documents, code, mathematics), it prioritizes privacy over raw utility. The model follows a Gemma-2–style architecture, outputs text from up to 1,024 input tokens, and is intended to be instruction-tuned for downstream language understanding and generation tasks. Training ran on TPU v6e using JAX and Pathways with privacy-preserving algorithms (DP-SGD, truncated Poisson subsampling) and DP scaling laws to balance compute and privacy budgets. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Ministral 3 14B Base 2512

    Ministral 3 14B Base 2512

    Powerful 14B-base multimodal model — flexible base for fine-tuning

    ...It supports dozens of languages, making it suitable for multilingual applications around the world. With a large 256 k-token context window, Ministral 3 14B Base 2512 can handle very long inputs, complex documents, or large contexts.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next