Showing 1605 open source projects for "open document"

  • 1
    PDFIO.jl

    PDF Reader Library for Native Julia.

    PDFIO is a native Julia implementation for reading PDF files. It's a 100% Julia implementation of the PDF specification. Other than a few well-established algorithms like flate decode (zlib library) or cryptographic operations (OpenSSL library), almost all of the APIs are written in native Julia. PDF files have been in existence for over three decades. PDF writer implementations do not always follow the specification, and they may vary significantly from vendor to vendor. Every time, you...
    Downloads: 5 This Week
  • 2
    TinyDB

    Document oriented database optimized for you

    TinyDB is a lightweight document-oriented database optimized for your happiness :) It's written in pure Python and has no external dependencies. It targets small apps that would be blown away by a SQL-DB or an external database server. The current source code has 1800 lines of code (with about 40% documentation) and 1600 lines of tests. Like MongoDB, you can store any document (represented as dict) in TinyDB. TinyDB is designed to be simple and fun to use by providing a simple and clean...
    Downloads: 5 This Week
  • 3
    Cloud Firestore

    Node.js client for Google Cloud Firestore: a NoSQL document database

    The official Firestore client for Node.js, enabling seamless interaction with Google Cloud Firestore, a NoSQL document database optimized for real-time applications.
    Downloads: 4 This Week
  • 4
    dots.ocr

    Multilingual Document Layout Parsing in a Single Vision-Language Model

    dots.ocr is a cutting-edge multilingual document parsing system built on a unified vision-language model that combines layout detection, text recognition, and structural understanding into a single architecture. Unlike traditional OCR pipelines that rely on multiple specialized components, dots.ocr integrates these processes end-to-end, reducing error propagation and improving consistency across tasks. The model is designed to recognize virtually any human script, making it highly effective...
    Downloads: 0 This Week
  • 5
    MarkPDFDown

    A high-quality PDF to Markdown tool based on large language model

    MarkPDFdown is an open-source document processing tool designed to convert PDF files into structured Markdown output that can be easily used for documentation, content pipelines, and AI processing workflows. The project focuses on extracting text, formatting, and structural information from complex PDF documents and transforming that information into clean Markdown that preserves the original hierarchy of headings, paragraphs, tables, and lists.
    Downloads: 5 This Week
  • 6
    llmware

    Unified framework for building enterprise RAG pipelines

    llmware is an open source framework designed to simplify the creation of enterprise-grade applications powered by large language models. The platform focuses on building secure and private AI workflows that can run locally on laptops, edge devices, or self-hosted servers without relying exclusively on cloud APIs. It provides a unified interface for constructing retrieval-augmented generation pipelines, agent workflows, and document intelligence applications.
    Downloads: 5 This Week
  • 7
    monolith

    CLI tool for saving complete web pages as a single HTML file

    A data hoarder’s dream come true, bundle any web page into a single HTML file. You can finally replace that gazillion of open tabs with a gazillion of .html files stored somewhere on your precious little drive. Unlike the conventional “Save page as”, monolith not only saves the target document, it embeds CSS, image, and JavaScript assets all at once, producing a single HTML5 document that is a joy to store and share. If compared to saving websites with wget -mpk, this tool embeds all assets as data URLs and therefore lets browsers render the saved page exactly the way it was on the Internet, even when no network connection is available.
    Downloads: 3 This Week
  • 8
    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 30 This Week
  • 9
    CouchDB.NET

    EF Core-like CouchDB experience for .NET

    CouchDB.NET is a .NET client library for Apache CouchDB, providing developers with a simple way to interact with the database using C#. It enables document storage, querying, and replication in .NET applications.
    Downloads: 6 This Week
  • 10
    Frappe

    Low code web framework for real world applications

    Frappe is a full-stack, low-code web framework written in Python and JavaScript, used to build scalable and modular enterprise applications. It powers ERPNext and includes tools for REST APIs, user management, document modeling, workflows, and real-time updates. Frappe uses a "model-view-controller" approach with its own ORM and frontend system, enabling rapid development without sacrificing control or performance.
    Downloads: 8 This Week
  • 11
    Nitrite Database

    NoSQL embedded document store for Java

    Nitrite is an embedded NoSQL database for Java applications, offering lightweight document storage with indexing and query capabilities.
    Downloads: 1 This Week
  • 12
    WordPerfect Document importer
    Library for reading Corel WordPerfect(tm) documents.
    Downloads: 404 This Week
  • 13
    Warracker

    Self-hostable warranty tracker to monitor expirations, store receipts

    Warracker is an open-source web application built to help individuals and teams track and manage product warranties in one central, easy-to-use interface. Instead of scattering receipts, expiration dates, and warranty details across paper files or spreadsheets, Warracker lets users organize all of that information with detailed records for each product, including purchase dates, durations, and associated documentation like images or PDFs. It includes proactive notifications for upcoming...
    Downloads: 4 This Week
  • 14
    AnythingLLM

    The all-in-one Desktop & Docker AI application with full RAG and AI

    A full-stack application that enables you to turn any document, resource, or piece of content into a context that any LLM can use as references during chatting. This application allows you to pick and choose which LLM or Vector Database you want to use as well as supporting multi-user management and permissions. AnythingLLM is a full-stack application where you can use commercial off-the-shelf LLMs or popular open-source LLMs and vectorDB solutions to build a private ChatGPT with no compromises that you can run locally as well as host remotely and be able to chat intelligently with any documents you provide it. ...
    Downloads: 106 This Week
  • 15
    DocETL

    A system for agentic LLM-powered data processing and ETL

    DocETL is an open-source system designed to build and execute data processing pipelines powered by large language models, particularly for analyzing complex collections of documents and unstructured datasets. The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data.
    Downloads: 6 This Week
  • 16
    JSONView

    A web extension that helps you view JSON documents in the browser

    A web extension that helps you view JSON documents in the browser. Normally when encountering a JSON document (content type application/json), Firefox simply prompts you to download the view. With the JSONView extension, JSON documents are shown in the browser similar to how XML documents are shown. The document is formatted, highlighted, and arrays and objects can be collapsed. Even if the JSON document contains errors, JSONView will still show the raw text. JSONView is a Web extension...
    Downloads: 9 This Week
  • 17
    AI-Media2Doc

    AI tool converting video/audio into structured documents instantly

    AI-Media2Doc is a web-based application that uses large language models to convert video and audio content into structured, readable documents in a single workflow. It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse. AI-Media2Doc emphasizes privacy by processing media locally in the browser using WebAssembly-based ffmpeg, ensuring that original video files are not...
    Downloads: 7 This Week
  • 18
    PHP7

    PHP7 / Laravel Multi-format Streaming Parser

    When it comes to parsing XML/CSV/JSON/... documents, there are two approaches to consider. DOM loading loads the whole document into memory, making it easy to navigate and parse, and as such provides maximum flexibility for developers. Streaming iterates through the document like a cursor, stopping at each element on its way, thus avoiding memory overkill. So, when it comes to big files, callbacks are executed while the file is still downloading, which is much more efficient as far as...
    Downloads: 16 This Week
  • 19
    RAGFlow

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.
    Downloads: 7 This Week
  • 20
    iText

    iText for Java represents the next level of SDKs for developers

    iText for Java represents the next level of SDKs for developers who want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit, and enhance PDF documents, iText can be a boon to nearly every workflow. iText Suite refers to the complete line of products comprising the open-source iText Core PDF library and its add-ons. The iText Suite is a fully-featured SDK for PDF development that allows you to seamlessly embed extensive PDF functionality into your software or workflows. ...
    Downloads: 24 This Week
  • 21
    ArangoDB JavaScript Driver

    The official ArangoDB JavaScript driver

    ArangoJS is the official JavaScript client for ArangoDB, a multi-model NoSQL database that supports document, key-value, and graph data models. This client provides a powerful yet simple API to interact with ArangoDB from Node.js or browser-based applications.
    Downloads: 4 This Week
  • 22
    OnlyOffice Web

    Perform common file preview and editing via the web

    OnlyOffice Web is a browser-based document editing platform built on top of OnlyOffice that allows users to view and edit files entirely on the client side without requiring a backend server. It is designed with a privacy-first approach, ensuring that all document processing occurs locally in the browser, which prevents sensitive data from being uploaded or stored externally. The application supports a wide range of file formats, including DOCX, XLSX, PPTX, and CSV, making it versatile for...
    Downloads: 5 This Week
  • 23
    Papra

    The minimalistic document archiving platform

    Papra is a minimalist document management and archiving platform created to help individuals and teams store, organize, and retrieve digital documents with simplicity and accessibility at its core. Papra provides basic yet essential capabilities like uploading files, managing archives, creating organizations for shared access, and performing full-text searches, all within a responsive and user-friendly interface that works across devices. The project’s focus on long-term storage and...
    Downloads: 3 This Week
  • 24
    OfficeCLI

    OfficeCLI is the first and best command-line tool

    OfficeCLI is a command-line productivity tool designed to bring AI-powered automation into everyday office workflows, enabling users to perform tasks such as document generation, data processing, and communication management directly from the terminal. It focuses on simplifying repetitive business operations by translating natural language commands into structured actions. The system likely integrates with common office tools and formats, allowing seamless interaction with documents,...
    Downloads: 26 This Week
  • 25
    Craft Agents

    Work effectively with agents

    The Craft Agents project from lukilabs is an open-source desktop application and workflow environment built around agent interaction and document-centric tasks, designed to help users work with AI assistants more effectively across multiple information sources. This repository extends the idea of “agents” by providing a user-friendly interface that integrates APIs, multitasking workflows, and session sharing so that you can easily orchestrate multiple AI interactions and retrieve context from your sources. ...
    Downloads: 21 This Week
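
The DOM-versus-streaming distinction described in entry 18 (the multi-format streaming parser) can be illustrated with a minimal sketch. This is not that project's PHP API; it is an assumption-laden Python example that uses only the standard library's `xml.etree.ElementTree`, and the names `SAMPLE`, `dom_ids`, and `stream_ids` are hypothetical, chosen for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document: five <item> elements under one root.
SAMPLE = (
    b"<items>"
    + b"".join(b"<item id='%d'>v%d</item>" % (i, i) for i in range(5))
    + b"</items>"
)


def dom_ids(data):
    """DOM style: build the whole tree in memory, then navigate it."""
    root = ET.fromstring(data)
    return [item.get("id") for item in root.findall("item")]


def stream_ids(source):
    """Streaming style: handle each element as soon as it is complete."""
    ids = []
    # iterparse reads the file object incrementally and yields an "end"
    # event when an element's closing tag has been seen.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            ids.append(elem.get("id"))
            # Discard the element's text and attributes once processed,
            # so memory use does not grow with the document.
            elem.clear()
    return ids


if __name__ == "__main__":
    print(dom_ids(SAMPLE))
    print(stream_ids(io.BytesIO(SAMPLE)))
```

Both functions return the same ids; the difference is that the streaming version can start work before the whole input has been read, which is the property the entry highlights for large files.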