documents free download

Showing 379 open source projects for "documents"

View related business solutions

Python Clear Filters & Widen Search

$300 Free Credits for Your Google Cloud Projects
Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.

Start Free Trial
Secure File Transfer for Windows with Cerberus by Redwood
Protect and share files over FTP/S, SFTP, HTTPS and SCP with the #1 rated Windows file transfer server.

Cerberus supports unlimited users and connections on a single IP, with built-in encryption, 2FA, and a browser-based web client — all deployable in under 15 minutes with a 25-day free trial.

Try for Free
1

MarkItDown

Python tool for converting files and office documents to Markdown

MarkItDown is a lightweight Python utility developed by Microsoft for converting various files and office documents to Markdown format. It is particularly useful for preparing documents for use with large language models and related text analysis pipelines.

1 Review

Downloads: 122 This Week

Last Update: 2026-05-26
See Project
2

Paperless-ngx

A community-supported supercharged version of paperless

Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.

Downloads: 22 This Week

Last Update: 2026-04-27
See Project
3

Papermerge

Open Source Document Management System for Digital Archives

Papermerge is an open source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of having piles of paper documents all over your desk, office or drawers - you can quickly scan them and configure your scanner to directly upload to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats. Instantly find relevant information using full text, tags and metadata-based search. ...

Downloads: 13 This Week

Last Update: 2025-07-24
See Project
4

Markdown package LaTeX

Package for converting and rendering markdown documents in TeX

The Markdown package converts CommonMark markup to TeX commands. The functionality is provided both as a Lua module, and as plain TeX, LaTeX, and ConTeXt macro packages that can be used to directly typeset TeX documents containing markdown markup. Unlike other convertors, the Markdown package does not require any external programs and makes it easy to redefine how each and every markdown element is rendered. Creative abuse of the markdown syntax is encouraged.

Downloads: 11 This Week

Last Update: 3 days ago
See Project
Stop vibe-debugging.
Plug Claude into your app's actual errors.

AppSignal's MCP server hands Claude, Cursor, or Zed your real errors, traces, and the deploy that shipped them. AI writes the fix; you review the diff.

Free 30 days.
5

PDF Arranger

Small python-gtk application, to merge or split PDFs

PDF Arranger is a small python-gtk application, which helps the user to merge or split PDF documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface. It is a front end for pikepdf. PDF Arranger is a fork of Konstantinos Poulios’s PDF Shuffler (see Savannah or Sourceforge). It’s a humble attempt to make the project a bit more active.

1 Review

Downloads: 418 This Week

Last Update: 2026-05-23
See Project
6

WeasyPrint

The awesome document factory

WeasyPrint is a smart solution helping people to create PDF documents. You can generate gorgeous statistical reports, invoices, tickets, and anything you want as long as you have some webdesign skills! Design your documents just as you design your websites! WeasyPrint follows the widely used HTML and CSS specifications from the W3C. You can use your usual web tools, languages and frameworks, but for print.

Downloads: 29 This Week

Last Update: 2026-06-02
See Project
7

kotaemon

An open-source RAG-based tool for chatting with your documents

An open-source clean & customizable RAG UI for chatting with your documents. Built with both end users and developers in mind. This project serves as a functional RAG UI for both end users who want to do QA on their documents and developers who want to build their own RAG pipeline.

Downloads: 11 This Week

Last Update: 2026-05-31
See Project
8

Umi-OCR

OCR software, free and offline

...It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. Users can interact with Umi-OCR through a graphical interface, command-line options, or HTTP interfaces, making it adaptable to both casual desktop usage and programmatic automation. Because the project is open source, developers can inspect, modify, and extend its capabilities, and plugins allow for different recognition engines or enhanced features.

Downloads: 3,120 This Week

Last Update: 2026-01-15
See Project
9

Semantra

Multi-tool for semantic search

...By relying on semantic embeddings and contextual analysis, the tool can identify passages that are relevant even when the query uses different wording than the source documents.

Downloads: 5 This Week

Last Update: 2026-03-11
See Project
Your monitoring isn't a stack. It's a pile. Fix that.
Errors, performance, logs, uptime. One install, one invoice, one UI.

Replace Datadog, New Relic, and Sentry without adding three more dashboards.

Free 30 days.
10

TexSoup

Fault-tolerant Python3 package for searching LaTeX documents

Navigate, Search, and Modify LaTeX Documents in Python. Easy and reliable: No C extensions, no installation dependencies, and 100% test coverage. TexSoup is a fault-tolerant, Python3 package for searching, navigating, and modifying LaTeX documents. You can skip installation and try TexSoup directly, using the pytwiddle demo.

Downloads: 0 This Week

Last Update: 2026-04-05
See Project
11

PrivateGPT

Interact with your documents using the power of GPT

PrivateGPT is a production-ready, privacy-first AI system that allows querying of uploaded documents using LLMs, operating completely offline in your own environment. It provides contextual generative AI capabilities without sending data externally. Now maintained under Zylon.ai with enterprise deployment options (air gapped, cloud, or on-prem).

Downloads: 18 This Week

Last Update: 2 days ago
See Project
12

Khoj

An AI personal assistant for your digital brain

Get more done with your open-source AI personal assistant. Khoj is a desktop application to search and chat with your notes, documents, and images. It is an offline-first, open-source AI personal assistant that is accessible from Emacs, Obsidian or your Web browser. Khoj is a thinking tool that is transparent, fun, and easy to engage with. You can build faster and better by using Khoj to search and reason across all your data sources. Khoj learns from your notes and documents to function as an extension of your brain. ...

Downloads: 4 This Week

Last Update: 2026-03-26
See Project
13

PyMuPDF

Python bindings for MuPDF's rendering library.

...It renders text with metrics and spacing accurate to within fractions of a pixel for the highest fidelity in reproducing the look of a printed page on the screen. The viewer is small, fast, yet complete. It supports many document formats, such as PDF, XPS, OpenXPS, CBZ, EPUB, and FictionBook 2. You can annotate PDF documents and fill out forms with the mobile viewers (this feature is coming soon to the desktop viewer as well). The command line tools allow you to annotate, edit, and convert documents to other formats such as HTML, SVG, PDF, and CBZ. You can also write scripts to manipulate documents using Javascript. The library is written modularly in portable C, so features can be added and removed by integrators if they so desire.

Downloads: 11 This Week

Last Update: 2026-04-24
See Project
14

DocTR

Library for OCR-related tasks powered by Deep Learning

DocTR provides an easy and powerful way to extract valuable information from your documents. Seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents. Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters. User-friendly, 3 lines of code to load a document and extract text with a predictor.

Downloads: 11 This Week

Last Update: 2026-05-21
See Project
15

docext

An on-premises, OCR-free unstructured data extraction

docext is a document intelligence toolkit that uses vision-language models to extract structured information from documents such as PDFs, forms, and scanned images. The system is designed to operate entirely on-premises, allowing organizations to process sensitive documents without relying on external cloud services. Unlike traditional document processing pipelines that rely heavily on optical character recognition, docext leverages multimodal AI models capable of understanding both visual and textual information directly from document images. ...

Downloads: 6 This Week

Last Update: 2026-03-12
See Project
16

chatd

Chat with your documents using local AI

...The application typically runs models such as Mistral-7B and allows users to load and analyze documents while asking questions in natural language. Unlike many document-chat tools that require manual installation of model servers, chatd packages the model runner with the application so that users can start interacting with documents immediately after launching the program.

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
17

LLM-Aided OCR Project

Enhances Tesseract OCR output using LLMs (local or API)

...The project is particularly useful for digitizing historical documents, research papers, and scanned materials where traditional OCR often struggles. It also includes tools for processing batches of images or documents, enabling automated document digitization workflows.

Downloads: 1 This Week

Last Update: 2026-03-22
See Project
18

WeKnora

LLM framework for document understanding and semantic retrieval

WeKnora is an open source framework developed for deep document understanding and semantic information retrieval using large language models. It focuses on analyzing complex and heterogeneous documents by combining multiple processing stages such as multimodal document parsing, vector indexing, and intelligent retrieval. It follows the Retrieval-Augmented Generation (RAG) paradigm, where relevant document segments are retrieved and used by language models to generate accurate, context-aware responses. This approach enables the system to provide more reliable answers by grounding model reasoning in the content of uploaded documents. ...

Downloads: 5 This Week

Last Update: 2026-06-10
See Project
19

abogen

Generate audiobooks from EPUBs, PDFs and text with captions

...The repository supports handling common ebook formats and generating outputs that combine audio plus caption metadata. By automating text-to-speech for arbitrary documents, abogen reduces the friction of producing audiobooks and could be integrated into larger workflows (e.g., batch converting a library of texts).

Downloads: 3 This Week

Last Update: 2026-02-06
See Project
20

Paperless-AI

AI-powered document analysis and tagging for Paperless-ngx

...Users can ask contextual questions about their files and receive precise answers based on full document understanding rather than simple keyword matching. Paperless-AI also includes a web interface for manual review and tagging, allowing greater control when handling sensitive or complex documents.

Downloads: 4 This Week

Last Update: 2026-03-17
See Project
21

MarkPDFDown

A high-quality PDF to Markdown tool based on large language model

...The software is particularly useful for developers working with technical documents, academic papers, or reports that need to be indexed, summarized, or processed by downstream AI systems.

Downloads: 0 This Week

Last Update: 2026-03-06
See Project
22

InternLM-XComposer-2.5

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System

...The model is built on top of the InternLM language model architecture and extends its capabilities to handle multimodal inputs and outputs. Instead of producing only textual responses, the system can generate visually enriched documents such as illustrated articles, presentations, and educational materials. It incorporates visual understanding modules that allow the model to analyze images and integrate them into coherent narrative outputs. The framework also supports tasks such as image captioning, multimodal reasoning, and layout generation for structured visual documents.

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
23

DeepSeek-OCR 2

Visual Causal Flow

DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents with rich spatial structure. The repository provides model code and inference scripts that let researchers and developers run and benchmark the system on both images and PDFs, with support for batch evaluation and optimized pipelines leveraging vLLM and transformers.

Downloads: 8 This Week

Last Update: 2026-02-03
See Project
24

deepdoctection

A Repo For Document AI

DeepDoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. deepdoctection is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models but enables you to build pipelines using highly acknowledged libraries for object detection, OCR and selected NLP tasks and provides an integrated frameworks for fine-tuning, evaluating and running models. ...

Downloads: 6 This Week

Last Update: 2026-06-12
See Project
25

Cognita

Open source RAG framework for building scalable modular AI apps

...Cognita provides reusable components such as parsers, data loaders, embedders, retrievers, and query controllers, allowing teams to customize each stage of the RAG pipeline independently. It includes both a backend service and a frontend interface, enabling users to upload documents, experiment with configurations, and perform question-answering tasks interactively. Cognita supports incremental indexing, meaning it processes only new or updated data to reduce computational overhead and improve efficiency.

Downloads: 1 This Week

Last Update: 2026-06-13
See Project