open document free download

809 projects for "open document" with 1 filter applied:

ChromeOS

1

Open Semantic Search

Open source semantic search and text analytics for large document sets

Open Semantic Search is an open source research and analytics platform designed for searching, analyzing, and exploring large collections of documents using semantic search technologies. It provides an integrated search server combined with a document processing pipeline that supports crawling, text extraction, and automated analysis of content from many different sources.

Downloads: 3 This Week

Last Update: 5 days ago
2

Paperless-AI

AI-powered document analysis and tagging for Paperless-ngx

Paperless-AI is an AI-powered extension designed to enhance document management within Paperless-ngx by automating analysis, classification, and organization tasks. It continuously monitors incoming documents and processes them using various AI backends, enabling automatic assignment of titles, tags, document types, and correspondents. It integrates with multiple OpenAI-compatible services as well as local models, giving users flexibility in how document intelligence is handled. A key...

Downloads: 7 This Week

Last Update: 2026-03-17
3

WeKnora

LLM framework for document understanding and semantic retrieval

WeKnora is an open source framework developed for deep document understanding and semantic information retrieval using large language models. It focuses on analyzing complex and heterogeneous documents by combining multiple processing stages such as multimodal document parsing, vector indexing, and intelligent retrieval. It follows the Retrieval-Augmented Generation (RAG) paradigm, where relevant document segments are retrieved and used by language models to generate accurate, context-aware responses. ...

Downloads: 6 This Week

Last Update: 2026-05-01
4

docext

An on-premises, OCR-free toolkit for unstructured data extraction

docext is a document intelligence toolkit that uses vision-language models to extract structured information from documents such as PDFs, forms, and scanned images. The system is designed to operate entirely on-premises, allowing organizations to process sensitive documents without relying on external cloud services. Unlike traditional document processing pipelines that rely heavily on optical character recognition, docext leverages multimodal AI models capable of understanding both visual...

Downloads: 3 This Week

Last Update: 2026-03-12
5

OpenSign

The free and open source DocuSign alternative

OpenSign is an open source document e-signing solution designed to provide a secure, reliable, and free alternative to commercial e-sign platforms like DocuSign, PandaDoc, SignNow, Adobe Sign, Smartwaiver, SignRequest, HelloSign, and Zoho Sign.

Downloads: 6 This Week

Last Update: 2026-03-04
6

typst.ts

Run Typst in the JavaScript world

typst.ts is a project that brings the Typst typesetting system into the JavaScript ecosystem, enabling document compilation and rendering directly in browser and Node.js environments. It provides an implementation of Typst’s execution environment along with tools for compiling documents into various output formats, including vector graphics and web-friendly representations. The system is designed to support both client-side and server-side rendering workflows, allowing flexible deployment...

Downloads: 8 This Week

Last Update: 2026-04-08
7

text-extract-api

Document (PDF, Word, PPTX, ...) extraction and parsing API

text-extract-api is an open-source service designed to extract readable text from a wide variety of document formats through a simple API interface. The project focuses on converting complex files such as PDFs, images, scanned documents, and office files into structured plain text that can be processed by downstream applications or language models. Instead of requiring developers to integrate multiple document parsing libraries individually, the system centralizes text extraction capabilities into a unified API that standardizes the output. ...

Downloads: 1 This Week

Last Update: 2026-03-05
8

SemTools

Semantic search and document parsing tools for the command line

SemTools is an open-source command-line toolkit designed for document parsing, semantic indexing, and semantic search workflows. The project focuses on enabling developers and AI agents to process large document collections and extract meaningful semantic representations that can be searched efficiently. Built with Rust for performance and reliability, the toolchain provides fast processing of text and structured documents while maintaining low system overhead. ...

Downloads: 1 This Week

Last Update: 2026-03-13
9

A Document on Virtues

My Source of Inspiration in Creating Many Free & Open Source Projects

Downloads: 0 This Week

Last Update: 2026-01-18
10

RAG Anything

RAG-Anything: All-in-One RAG Framework

RAG-Anything is an open-source unified framework that extends the Retrieval-Augmented Generation (RAG) paradigm to fully multimodal document and knowledge retrieval, enabling systems to ingest, parse, represent, and query rich content that includes text, images, tables, formulas, and other structured or visual elements. Traditional RAG systems are typically limited to text and cannot effectively work across heterogeneous document layouts, but RAG-Anything addresses this by modeling multimodal content in ways that preserve cross-modal relationships and semantic context, often treating content elements as interconnected knowledge entities rather than separate data silos. ...

Downloads: 4 This Week

Last Update: 4 days ago
11

WordPerfect Document importer

Library for reading Corel WordPerfect(tm) documents.

2 Reviews

Downloads: 373 This Week

Last Update: 2024-08-16
12

dots.ocr

Multilingual Document Layout Parsing in a Single Vision-Language Model

dots.ocr is a cutting-edge multilingual document parsing system built on a unified vision-language model that combines layout detection, text recognition, and structural understanding into a single architecture. Unlike traditional OCR pipelines that rely on multiple specialized components, dots.ocr integrates these processes end-to-end, reducing error propagation and improving consistency across tasks. The model is designed to recognize virtually any human script, making it highly effective...

Downloads: 0 This Week

Last Update: 2026-03-24
13

MarkPDFDown

A high-quality PDF-to-Markdown tool based on large language models

MarkPDFdown is an open-source document processing tool designed to convert PDF files into structured Markdown output that can be easily used for documentation, content pipelines, and AI processing workflows. The project focuses on extracting text, formatting, and structural information from complex PDF documents and transforming that information into clean Markdown that preserves the original hierarchy of headings, paragraphs, tables, and lists.

Downloads: 4 This Week

Last Update: 2026-03-06
14

Frappe

Low code web framework for real world applications

Frappe is a full-stack, low-code web framework written in Python and JavaScript, used to build scalable and modular enterprise applications. It powers ERPNext and includes tools for REST APIs, user management, document modeling, workflows, and real-time updates. Frappe uses a "model-view-controller" approach with its own ORM and frontend system, enabling rapid development without sacrificing control or performance.

Downloads: 7 This Week

Last Update: 3 days ago
15

iText

iText for Java represents the next level of SDKs for developers

iText for Java represents the next level of SDKs for developers who want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit, and enhance PDF documents, iText can be a boon to nearly every workflow. iText Suite refers to the complete line of products comprising the open-source iText Core PDF library and its add-ons. The iText Suite is a fully-featured SDK for PDF development that allows you to seamlessly embed extensive PDF functionality into your software or workflows. ...

Downloads: 24 This Week

Last Update: 2026-03-30
16

DocETL

A system for agentic LLM-powered data processing and ETL

DocETL is an open-source system designed to build and execute data processing pipelines powered by large language models, particularly for analyzing complex collections of documents and unstructured datasets. The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data.

Downloads: 5 This Week

Last Update: 2026-03-05
17

AI-Media2Doc

AI tool converting video/audio into structured documents instantly

AI-Media2Doc is a web-based application that uses large language models to convert video and audio content into structured, readable documents in a single workflow. It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse. AI-Media2Doc emphasizes privacy by processing media locally in the browser using WebAssembly-based ffmpeg, ensuring that original video files are not...

Downloads: 6 This Week

Last Update: 2026-03-18
18

RAG Web UI

RAG Web UI is an intelligent dialogue system based on RAG

RAG Web UI is an open-source intelligent dialogue system built on retrieval-augmented generation technology, designed to enable users to create AI-powered question answering systems grounded in their own knowledge bases. It combines document retrieval with large language models to provide accurate, context-aware responses based on indexed data rather than generic model knowledge.

Downloads: 2 This Week

Last Update: 2026-04-06
19

Warracker

Self-hostable warranty tracker to monitor expirations, store receipts

Warracker is an open-source web application built to help individuals and teams track and manage product warranties in one central, easy-to-use interface. Instead of scattering receipts, expiration dates, and warranty details across paper files or spreadsheets, Warracker lets users organize all of that information with detailed records for each product, including purchase dates, durations, and associated documentation like images or PDFs. It includes proactive notifications for upcoming...

Downloads: 2 This Week

Last Update: 2026-02-03
20

Papra

The minimalistic document archiving platform

Papra is a minimalist document management and archiving platform created to help individuals and teams store, organize, and retrieve digital documents with simplicity and accessibility at its core. Papra provides basic yet essential capabilities like uploading files, managing archives, creating organizations for shared access, and performing full-text searches, all within a responsive and user-friendly interface that works across devices. The project’s focus on long-term storage and...

Downloads: 2 This Week

Last Update: 2026-04-02
21

RAPTOR

The official implementation of RAPTOR

RAPTOR is a retrieval architecture designed to improve retrieval-augmented generation systems by organizing documents into hierarchical structures that enable more effective context retrieval. Traditional RAG systems typically retrieve small text chunks independently, which can limit a model’s ability to understand broader document context. RAPTOR addresses this limitation by recursively embedding, clustering, and summarizing documents to create a tree-structured hierarchy of information....

Downloads: 0 This Week

Last Update: 2026-03-06
22

Semantra

Multi-tool for semantic search

Semantra is an open-source semantic search tool designed to help users explore large collections of documents by meaning rather than simple keyword matching. The software analyzes text and PDF documents stored locally and creates embeddings that allow queries to retrieve results based on conceptual similarity. It is primarily intended for individuals who need to extract insights from large document collections, including researchers, journalists, students, and historians. ...

Downloads: 0 This Week

Last Update: 2026-03-11
23

ChatOllama

ChatOllama is an open-source AI chatbot

...The platform also includes higher-level capabilities such as AI agents, document-backed knowledge bases, real-time voice chat, and Model Context Protocol integration for external tools. Its RAG functionality allows document upload and knowledge-base-driven interaction, while vector database support adds more scalable retrieval options. Deployment is streamlined with Docker Compose, and the project also includes internationalization and modular feature toggles for controlling what parts of the system are enabled. ...

Downloads: 1 This Week

Last Update: 2026-04-20
24

chatd

Chat with your documents using local AI

chatd is an open-source desktop application that allows users to interact with their documents through a locally running large language model. The software focuses on privacy and security by ensuring that all document processing and inference occur entirely on the user’s computer without sending data to external cloud services. It includes a built-in integration with the Ollama runtime, which provides a cross-platform environment for running large language models locally. ...

Downloads: 2 This Week

Last Update: 2026-03-09
25

GLM-OCR

Accurate × Fast × Comprehensive

GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B), enabling deployment in high-concurrency services and edge environments. ...

Downloads: 13 This Week

Last Update: 2026-04-08