Page 2 | document free download

Showing 138 open source projects for "document"

View related business solutions

Artificial Intelligence Python Clear Filters & Widen Search

MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
Gemini 3 and 200+ AI Models on One Platform
Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build generative AI apps with Vertex AI. Switch between models without switching platforms.

Start Free
1

RAGFlow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.

Downloads: 4 This Week

Last Update: 2026-02-10
See Project
2

DeepCode

DeepCode: Open Agentic Coding

...The system description highlights an orchestration layer that plans, assigns subtasks, and adapts strategies as complexity changes, rather than relying on a single monolithic prompt. It also describes document parsing capabilities aimed at extracting algorithmic and mathematical details from technical materials, translating them into implementable specifications and code.

Downloads: 8 This Week

Last Update: 2026-02-11
See Project
3

llmware

Unified framework for building enterprise RAG pipelines

...The platform focuses on building secure and private AI workflows that can run locally on laptops, edge devices, or self-hosted servers without relying exclusively on cloud APIs. It provides a unified interface for constructing retrieval-augmented generation pipelines, agent workflows, and document intelligence applications. One of the framework’s defining characteristics is its collection of small specialized language models optimized for specific tasks such as summarization, classification, and document analysis. The system supports a wide range of inference backends including PyTorch, OpenVINO, ONNX Runtime, and other optimized runtimes, allowing developers to choose the most efficient execution environment for their hardware.

Downloads: 1 This Week

Last Update: 3 days ago
See Project
4

GLM-OCR

Accurate × Fast × Comprehensive

...The model’s multimodal capabilities allow it to reason across image and text content holistically, capturing structured and unstructured information from pages that include dense tables, seals, code snippets, and varied document graphics. GLM-OCR integrates a comprehensive SDK and inference toolchain that makes it easy for developers to install, invoke, and embed into production pipelines with simple commands or APIs.

Downloads: 12 This Week

Last Update: 2026-04-08
See Project
Enterprise-grade ITSM, for every business
Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.

Try it Free
5

Claude Skills

Public repository for Agent Skills

...Rather than relying on handcrafted prompts every time, Skills teach an AI agent procedural knowledge and task-specific workflows so it can apply that expertise reliably, whether the task involves document creation, data analysis, design generation, or technical automation. Each Skill lives in its own directory with a SKILL.md file containing metadata and instructions, and can include supplemental scripts or assets that the agent uses to perform complex operations when relevant.

Downloads: 91 This Week

Last Update: 2026-04-09
See Project
6

DocTR

Library for OCR-related tasks powered by Deep Learning

...Seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents. Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters. User-friendly, 3 lines of code to load a document and extract text with a predictor. State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract. Easy integration (available templates for browser demo & API deployment). End-to-End OCR is achieved in docTR using a two-stage approach: text detection (localizing words), then text recognition (identify all characters in the word). ...

Downloads: 8 This Week

Last Update: 2026-02-04
See Project
7

PrivateGPT

Interact with your documents using the power of GPT

PrivateGPT is a production-ready, privacy-first AI system that allows querying of uploaded documents using LLMs, operating completely offline in your own environment. It provides contextual generative AI capabilities without sending data externally. Now maintained under Zylon.ai with enterprise deployment options (air gapped, cloud, or on-prem).

Downloads: 27 This Week

Last Update: 2025-07-29
See Project
8

BISHENG

BISHENG is an open LLM devops platform for next generation apps

BISHENG is an open LLM application DevOps platform, focusing on enterprise scenarios. It has been used by a large number of industry-leading organizations and Fortune 500 companies. "Bi Sheng" was the inventor of movable type printing, which played a vital role in promoting the transmission of human knowledge. We hope that BISHENG can also provide strong support for the widespread implementation of intelligent applications. Everyone is welcome to participate.

Downloads: 5 This Week

Last Update: 2026-01-28
See Project
9

LongBench

LongBench v2 and LongBench (ACL 25'&24')

...LongBench addresses this gap by providing datasets that require models to process and reason over long sequences of text across multiple tasks. The benchmark includes multiple categories such as single-document question answering, multi-document reasoning, summarization, long dialogue understanding, and code analysis. It supports bilingual evaluation in English and Chinese to assess multilingual capabilities across extended contexts. Newer versions of the benchmark introduce extremely long context windows ranging from thousands to millions of tokens, enabling researchers to test the limits of modern long-context models.

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
Go from Code to Production URL in Seconds
Cloud Run deploys apps in any language instantly. Scales to zero. Pay only when code runs.

Skip the Kubernetes configs. Cloud Run handles HTTPS, scaling, and infrastructure automatically. Two million requests free per month.

Try it free
10

GPT Academic

Research-oriented chatbot framework

GPT Academic is a research-oriented chatbot framework designed to integrate large language models (LLMs) into academic workflows. It provides tools for structured document processing, citation management, and enhanced interaction with research papers.

Downloads: 0 This Week

Last Update: 2025-03-04
See Project
11

ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs

ExtractThinker is a tool designed to facilitate the extraction and analysis of information from various data sources, aiding in data processing and knowledge discovery.

Downloads: 8 This Week

Last Update: 2025-06-09
See Project
12

PaperAI

Semantic search and workflows for medical/scientific papers

PaperAI is an open-source framework for searching and analyzing scientific papers, particularly useful for researchers looking to extract insights from large-scale document collections.

Downloads: 8 This Week

Last Update: 2025-07-01
See Project
13

BEIR

A Heterogeneous Benchmark for Information Retrieval

BEIR is a benchmark framework for evaluating information retrieval models across various datasets and tasks, including document ranking and question answering.

Downloads: 5 This Week

Last Update: 2025-06-04
See Project
14

huggingface_hub

The official Python client for the Huggingface Hub

The huggingface_hub library allows you to interact with the Hugging Face Hub, a platform democratizing open-source Machine Learning for creators and collaborators. Discover pre-trained models and datasets for your projects or play with the thousands of machine-learning apps hosted on the Hub. You can also create and share your own models, datasets, and demos with the community. The huggingface_hub library provides a simple way to do all these things with Python.

Downloads: 11 This Week

Last Update: 1 day ago
See Project
15

Chandra

OCR model for complex documents with layout-aware structured outputs

Chandra is an advanced OCR model designed to extract and structure information from complex documents such as tables, forms, handwritten notes, and mathematical content. It focuses on preserving full document layout, meaning that extracted text is accompanied by positional metadata like bounding boxes for each element. Chandra supports multiple output formats including Markdown, HTML, and JSON, making it suitable for downstream processing and integration into data pipelines. It is capable of handling over 40 languages and is optimized to read difficult inputs such as messy handwriting and multi-column layouts. ...

Downloads: 1 This Week

Last Update: 2026-03-18
See Project
16

EasyOCR

Ready-to-use OCR with 80+ supported languages

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc. EasyOCR is a python module for extracting text from image. It is a general OCR that can read both natural scene text and dense text in document. We are currently supporting 80+ languages and expanding. Second-generation models: multiple times smaller size, multiple times faster inference, additional characters and comparable accuracy to the first generation models. EasyOCR will choose the latest model by default but you can also specify which model to use. Model weights for the chosen language will be automatically downloaded or you can download them manually from the model hub. ...

Downloads: 35 This Week

Last Update: 2024-09-24
See Project
17

All-in-RAG

Big Model Application Development Practice 1

...The repository provides a structured learning path that covers both theoretical foundations and practical implementation steps for RAG systems. It explains the full development pipeline required to create knowledge-aware AI assistants, including data preparation, document indexing, vector embedding generation, and retrieval strategies. The project also explores advanced topics such as hybrid retrieval methods, query optimization, and evaluation techniques for improving system accuracy. Alongside theoretical explanations, the repository includes hands-on exercises and example projects that demonstrate how to build production-ready RAG systems. ...

Downloads: 0 This Week

Last Update: 2026-03-17
See Project
18

Auto-Deep-Research

Your Fully-Automated Personal AI Assistant

...Users provide a research topic or multifaceted goal, and the system autonomously breaks the objective down into subtasks like literature collection, critical summarization, cross-comparison, citation extraction, metric evaluation, and structured writing. Auto-Deep-Research integrates retrieval from academic and web sources, processes document corpora for relevance and key insights, and organizes outputs into coherent chapters or sections according to research standards. It also embeds validation loops, where intermediate drafts are self-checked for consistency, coverage, and alignment with sound reasoning practices, reducing reliance on raw generation alone.

Downloads: 0 This Week

Last Update: 2026-02-03
See Project
19

fireworks-tech-graph

Claude Code skill for generating production-quality SVG+PNG technical

...The project emphasizes scalability and adaptability, allowing it to handle large datasets and evolving knowledge bases. By structuring information into graph form, it enables more meaningful navigation and discovery compared to traditional document-based systems.

Downloads: 23 This Week

Last Update: 4 days ago
See Project
20

GLM-4.6V

GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

...Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and can output or act via tools seamlessly, bridging perception and execution. Its architecture supports a very large context window (on the order of 128K tokens during training), which lets it handle complex multimodal inputs like long documents, multi-page reports, or video transcripts, while maintaining coherence across extended content. ...

Downloads: 0 This Week

Last Update: 2026-04-06
See Project
21

gensim

Topic Modelling for Humans

Gensim is a Python library for topic modeling, document indexing, and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.

Downloads: 5 This Week

Last Update: 2025-10-16
See Project
22

kotaemon

An open-source RAG-based tool for chatting with your documents

An open-source clean & customizable RAG UI for chatting with your documents. Built with both end users and developers in mind. This project serves as a functional RAG UI for both end users who want to do QA on their documents and developers who want to build their own RAG pipeline.

Downloads: 4 This Week

Last Update: 2026-03-28
See Project
23

DB-GPT

Revolutionizing Database Interactions with Private LLM Technology

DB-GPT is an experimental open-source project that uses localized GPT large models to interact with your data and environment. With this solution, you can be assured that there is no risk of data leakage, and your data is 100% private and secure.

Downloads: 7 This Week

Last Update: 2026-03-27
See Project
24

LLM-Aided OCR Project

Enhances Tesseract OCR output using LLMs (local or API)

...The project is particularly useful for digitizing historical documents, research papers, and scanned materials where traditional OCR often struggles. It also includes tools for processing batches of images or documents, enabling automated document digitization workflows.

Downloads: 0 This Week

Last Update: 2026-03-22
See Project
25

Hallucination Leaderboard

Leaderboard Comparing LLM Performance at Producing Hallucinations

...The project provides a standardized benchmark that evaluates different models using a dedicated hallucination detection system known as the Hallucination Evaluation Model. Each model is tested on document summarization tasks to measure how often generated responses introduce information that is not supported by the original source material. The results are published as a leaderboard that allows researchers and developers to compare model reliability and factual consistency. By focusing on hallucination rates rather than traditional metrics such as accuracy or fluency, the benchmark highlights an important aspect of AI system safety and trustworthiness. ...

Downloads: 0 This Week

Last Update: 2026-03-20
See Project