Showing 169 open source projects for "documents"

  • 1
    Listen to RSS news (from NetNewsWire), e-mail (from Mail), web pages (from Safari), and more on your iPod! This is an AppleScript Studio application that uses Mac OS X's built-in text-to-speech technology to create audio files from these text documents.
    Downloads: 0 This Week
    See Project
  • 2
    The primary goal of Imated is the development of an OCR system for handwritten and machine-printed text. The second goal is a text editor that can import scanned documents, OCR them on the fly, edit them, and print or save them as images again.
    Downloads: 0 This Week
    See Project
  • 3
    NeuroGrid could be thought of as a "Napster for bookmarks." It lets you store data in a web-like fashion, associating bookmarks (files, documents, or anything else) with multiple keywords. See http://www.neurogrid.net
    Downloads: 0 This Week
    See Project
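The many-to-many keyword association NeuroGrid describes can be sketched as a pair of inverted maps. This is an illustrative Python sketch, not NeuroGrid's actual implementation; the class and method names are made up:

```python
from collections import defaultdict

class KeywordIndex:
    """Toy many-to-many index: bookmarks tagged with multiple keywords."""

    def __init__(self):
        self._by_keyword = defaultdict(set)   # keyword -> bookmark URLs
        self._by_bookmark = defaultdict(set)  # bookmark URL -> keywords

    def tag(self, bookmark, *keywords):
        for kw in keywords:
            self._by_keyword[kw].add(bookmark)
            self._by_bookmark[bookmark].add(kw)

    def find(self, *keywords):
        """Bookmarks associated with ALL of the given keywords."""
        sets = [self._by_keyword[kw] for kw in keywords]
        return set.intersection(*sets) if sets else set()

index = KeywordIndex()
index.tag("http://www.neurogrid.net", "p2p", "bookmarks")
index.tag("http://example.org/napster", "p2p", "music")
print(index.find("p2p", "bookmarks"))  # {'http://www.neurogrid.net'}
```

Keeping both directions indexed makes keyword queries and per-bookmark tag lookups O(1) at the cost of double bookkeeping on insert.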
  • 4
    Trellis is an interactive environment that allows users to add their observations, viewpoints, and conclusions as they analyze information by making semantic annotations to documents and other on-line resources.
    Downloads: 0 This Week
    See Project
  • 5
    KINg (KINg Is Not google!) is an effort to create a smart search engine, intended initially not for the web but for searching documents in electronic format on the local machine.
    Downloads: 0 This Week
    See Project
  • 6
    Durito is a free (as in speech) application that will manage, display, and analyse various kinds of documents in a variety of environments. Central to Durito's operation will be technologies such as XML and RDF, both cornerstones of the W3C's Semantic Web.
    Downloads: 0 This Week
    See Project
  • 7
    Anarchivist is a rewrite of the AustLII software (www.austlii.edu.au). The project seeks to produce a full-text indexing search engine (for remote and local documents) and an XML/XSLT-based document repository, among other components.
    Downloads: 0 This Week
    See Project
  • 8
    NuMarkdown-8B-Thinking

    NuMarkdown-8B-Thinking

    Reasoning-powered OCR VLM for converting complex documents to Markdown

    NuMarkdown-8B-Thinking is the first reasoning OCR vision-language model (VLM) designed to convert documents into clean Markdown optimized for retrieval-augmented generation (RAG). Built on Qwen 2.5-VL-7B and fine-tuned with synthetic Doc → Reasoning → Markdown examples, it generates thinking tokens before producing the final Markdown to better handle complex layouts and tables. It uses a two-phase training process: supervised fine-tuning (SFT) followed by reinforcement learning (GRPO) with a layout-centric reward for accuracy on challenging documents.
    Downloads: 0 This Week
    See Project
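Because NuMarkdown emits thinking tokens before the final Markdown, a RAG pipeline would typically strip the reasoning span before indexing. A minimal sketch, assuming the reasoning is wrapped in `<think>…</think>` tags (the exact delimiters the model uses may differ):

```python
import re

def extract_markdown(model_output: str) -> str:
    """Drop <think>...</think> reasoning spans, keep the final Markdown.

    The <think> tag format is an assumption for illustration; adjust the
    pattern to whatever delimiters the model actually emits.
    """
    cleaned = re.sub(r"<think>.*?</think>", "", model_output, flags=re.DOTALL)
    return cleaned.strip()

raw = "<think>Table has 2 columns...</think>\n| A | B |\n|---|---|\n| 1 | 2 |"
print(extract_markdown(raw))
```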
  • 9
    layoutlm-base-uncased

    layoutlm-base-uncased

    Multimodal Transformer for document image understanding and layout

    layoutlm-base-uncased is a multimodal transformer model developed by Microsoft for document image understanding tasks. It incorporates both text and layout (position) features to effectively process structured documents like forms, invoices, and receipts. This base version has 113 million parameters and is pre-trained on 11 million documents from the IIT-CDIP dataset. LayoutLM enables better performance in tasks where the spatial arrangement of text plays a crucial role. The model uses a standard BERT-like architecture but enriches input with 2D positional embeddings. ...
    Downloads: 0 This Week
    See Project
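LayoutLM's 2D positional embeddings expect each token's bounding box scaled to a 0–1000 grid, so OCR output in pixel or point coordinates is normalized first. A minimal sketch of that preprocessing step (the page size and box values here are made up):

```python
def normalize_bbox(bbox, page_width, page_height):
    """Scale a pixel-space box (x0, y0, x1, y1) to LayoutLM's 0-1000 grid."""
    x0, y0, x1, y1 = bbox
    return (
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    )

# A word box from a 612x792-point page (e.g. OCR output for an invoice).
print(normalize_bbox((306, 396, 459, 420), 612, 792))  # (500, 500, 750, 530)
```

The normalized box is passed to the model alongside the token ids, which is how the spatial arrangement of text enters the transformer.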
  • 10
    Qwen2.5-VL-7B-Instruct

    Qwen2.5-VL-7B-Instruct

    Multimodal 7B model for image, video, and text understanding tasks

    Qwen2.5-VL-7B-Instruct is a multimodal vision-language model developed by the Qwen team, designed to handle text, images, and long videos with high precision. Fine-tuned from Qwen2.5-VL, this 7-billion-parameter model can interpret visual content such as charts, documents, and user interfaces, as well as recognize common objects. It supports complex tasks like visual question answering, localization with bounding boxes, and structured output generation from documents. The model is also capable of video understanding with dynamic frame sampling and temporal reasoning, enabling it to analyze and respond to long-form videos. ...
    Downloads: 0 This Week
    See Project
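Visual question answering with a model like this is driven by a chat-style message list that pairs an image with a text prompt. A sketch of building such a request, following the content-list shape commonly used for Qwen-family VL models (treat the exact field names as an assumption):

```python
def vqa_messages(image_url: str, question: str):
    """Chat-style message list pairing one image with a text question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]

msgs = vqa_messages("file:///tmp/chart.png", "What is the peak value in this chart?")
print(msgs[0]["content"][1]["text"])
```

The same structure extends to video inputs by swapping the image entry for a video entry and letting the processor handle frame sampling.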
  • 11
    Ministral 3 8B Base 2512

    Ministral 3 8B Base 2512

    Versatile 8B-base multimodal LLM, flexible foundation for custom AI

    ...As a “base” model (i.e., not fine-tuned for instruction or reasoning), it offers a flexible starting point for custom downstream tasks or fine-tuning. The model supports a large 256k token context window, making it capable of handling long documents or extended dialogues. Because it comes from the edge-optimized Ministral 3 family, it remains deployable on reasonably powerful hardware while offering a good balance between capability and resource use. Its multilingual and multimodal pretraining enables broad applicability across languages and tasks — from generation to classification to vision-language tasks.
    Downloads: 0 This Week
    See Project
  • 12
    translategemma-4b-it

    translategemma-4b-it

    Lightweight multimodal translation model for 55 languages

    translategemma-4b-it is a lightweight, state-of-the-art open translation model from Google, built on the Gemma 3 family and optimized for high-quality multilingual translation across 55 languages. It supports both text-to-text translation and image-to-text extraction with translation, enabling workflows such as OCR-style translation of signs, documents, and screenshots. With a compact ~5B parameter footprint and BF16 support, the model is designed to run efficiently on laptops, desktops, and private cloud infrastructure, making advanced translation accessible without heavy hardware requirements. TranslateGemma uses a structured chat template that enforces explicit source and target language codes, ensuring consistent, deterministic behavior and reducing ambiguity in multilingual pipelines. ...
    Downloads: 0 This Week
    See Project
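The description's point about explicit source and target language codes can be sketched as a request builder. The field names below are hypothetical, invented for illustration; consult the model card for TranslateGemma's actual chat template:

```python
def translation_request(text: str, source_lang: str, target_lang: str):
    """Chat message carrying explicit source/target language codes.

    Forcing both codes into the request (rather than letting the model
    guess) is what makes the pipeline's behavior deterministic.
    """
    return {
        "role": "user",
        "content": [
            {
                "type": "text",
                "source_language": source_lang,  # hypothetical field name
                "target_language": target_lang,  # hypothetical field name
                "text": text,
            }
        ],
    }

req = translation_request("Guten Morgen", "de", "en")
print(req["content"][0]["target_language"])  # en
```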
  • 13
    bart-large-cnn

    bart-large-cnn

    Summarization model fine-tuned on CNN/DailyMail articles

    facebook/bart-large-cnn is a large-scale sequence-to-sequence transformer model developed by Meta AI and fine-tuned specifically for abstractive text summarization. It uses the BART architecture, which combines a bidirectional encoder (like BERT) with an autoregressive decoder (like GPT). Pre-trained on corrupted text reconstruction, the model was further trained on the CNN/DailyMail dataset—a collection of news articles paired with human-written summaries. It performs particularly well in...
    Downloads: 0 This Week
    See Project
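BART's encoder accepts roughly 1024 subword tokens, so long articles are commonly split into overlapping windows and summarized chunk by chunk. A word-level approximation of that chunking (the real limit is measured in tokens, not words, so the window size here is a rough assumption):

```python
def chunk_words(text: str, max_words: int = 700, overlap: int = 50):
    """Split text into overlapping word windows that fit a ~1024-token encoder."""
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

chunks = chunk_words("word " * 1500, max_words=700, overlap=50)
print(len(chunks))  # 3
```

The overlap keeps sentences that straddle a boundary visible to at least one window; the per-chunk summaries can then be concatenated or summarized again.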
  • 14
    Qwen3-Next

    Qwen3-Next

    Qwen3-Next: 80B instruct LLM with ultra-long context up to 1M tokens

    ...The model natively supports a context length of 262K tokens and can be extended up to 1 million tokens using RoPE scaling (YaRN), making it highly capable for processing large documents and extended conversations. Multi-Token Prediction (MTP) boosts both training and inference, while stability optimizations such as weight-decayed and zero-centered layernorm ensure robustness. Benchmarks show it performs comparably to larger models like Qwen3-235B on reasoning, coding, multilingual, and alignment tasks while requiring only a fraction of the training cost.
    Downloads: 0 This Week
    See Project
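Extending the native 262,144-token window toward ~1M tokens with YaRN is typically expressed as a `rope_scaling` entry in the model config. The field names below follow the common Hugging Face convention and the values are illustrative, not quoted from the model card:

```python
# YaRN RoPE scaling: multiply the original context window by `factor`.
rope_scaling = {
    "rope_type": "yarn",                        # assumed field names
    "factor": 4.0,
    "original_max_position_embeddings": 262144,
}

effective_context = int(
    rope_scaling["factor"] * rope_scaling["original_max_position_embeddings"]
)
print(effective_context)  # 1048576
```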
  • 15
    Hunyuan-A13B-Instruct

    Hunyuan-A13B-Instruct

    Efficient 13B MoE language model with long context and reasoning modes

    Hunyuan-A13B-Instruct is a powerful instruction-tuned large language model developed by Tencent using a fine-grained Mixture-of-Experts (MoE) architecture. While the total model includes 80 billion parameters, only 13 billion are active per forward pass, making it highly efficient while maintaining strong performance across benchmarks. It supports up to 256K context tokens, advanced reasoning (CoT) abilities, and agent-based workflows with tool parsing. The model offers both fast and slow...
    Downloads: 0 This Week
    See Project
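The 80B-total / 13B-active figure comes from Mixture-of-Experts routing: a router scores all experts per token but only the top-k actually run. A conceptual numpy sketch of top-k routing, not Hunyuan's actual router:

```python
import numpy as np

def route_top_k(router_logits: np.ndarray, k: int = 2):
    """Pick the k highest-scoring experts per token, softmax their weights.

    Only the selected experts execute, which is why a model with many
    experts (large total parameter count) activates only a fraction of
    its parameters per forward pass.
    """
    top = np.argsort(router_logits, axis=-1)[:, -k:]          # (tokens, k)
    picked = np.take_along_axis(router_logits, top, axis=-1)  # (tokens, k)
    weights = np.exp(picked - picked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return top, weights

logits = np.array([[0.1, 2.0, -1.0, 0.5],
                   [1.5, 0.0, 0.2, 1.4]])
experts, w = route_top_k(logits, k=2)
print(experts)  # indices of the two chosen experts per token
```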
  • 16
    Ministral 3 3B Base 2512

    Ministral 3 3B Base 2512

    Small 3B-base multimodal model ideal for custom AI on edge hardware

    ...It supports dozens of languages, making it practical for multilingual, global, or distributed environments. With a large 256k token context window, it can handle long documents, extended inputs, or multi-step processing workflows even at its small size.
    Downloads: 0 This Week
    See Project
  • 17
    Ministral 3 8B Reasoning 2512

    Ministral 3 8B Reasoning 2512

    Efficient 8B multimodal model tuned for advanced reasoning tasks.

    ...It supports dozens of languages, adheres reliably to system prompts, and provides native function calling and structured JSON output—key capabilities for agentic and automation workflows. The model also includes a 256k context window, allowing it to handle long documents and extended reasoning chains.
    Downloads: 0 This Week
    See Project
  • 18
    VaultGemma

    VaultGemma

    VaultGemma: 1B DP-trained Gemma variant for private NLP tasks

    VaultGemma is a sub-1B parameter variant of Google’s Gemma family that is pre-trained from scratch with Differential Privacy (DP), providing mathematically backed guarantees that its outputs do not reveal information about any single training example. Using DP-SGD with a privacy budget across a large English-language corpus (web documents, code, mathematics), it prioritizes privacy over raw utility. The model follows a Gemma-2–style architecture, outputs text from up to 1,024 input tokens, and is intended to be instruction-tuned for downstream language understanding and generation tasks. Training ran on TPU v6e using JAX and Pathways with privacy-preserving algorithms (DP-SGD, truncated Poisson subsampling) and DP scaling laws to balance compute and privacy budgets. ...
    Downloads: 0 This Week
    See Project
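The DP-SGD recipe (clip each example's gradient, then add Gaussian noise calibrated to the clip bound) can be sketched in a few lines. This is a conceptual illustration only, not VaultGemma's JAX/Pathways training code, which additionally uses truncated Poisson subsampling:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_mult=1.0, rng=None):
    """One DP-SGD gradient aggregation: clip each per-example gradient to
    `clip_norm`, sum, add Gaussian noise scaled by the clip bound, average."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]  # norms 5.0 and 0.5
update = dp_sgd_step(grads, clip_norm=1.0)
print(update.shape)  # (2,)
```

Clipping bounds any single example's influence on the update, and the noise masks what remains, which is the source of the per-example privacy guarantee.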
  • 19
    Ministral 3 14B Base 2512

    Ministral 3 14B Base 2512

    Powerful 14B-base multimodal model — flexible base for fine-tuning

    ...It supports dozens of languages, making it suitable for multilingual applications around the world. With a large 256k-token context window, Ministral 3 14B Base 2512 can handle very long inputs, complex documents, or large contexts.
    Downloads: 0 This Week
    See Project