Showing 2020 open source projects for "documents"

  • 1
    DeepSeek-OCR 2

    Visual Causal Flow

    DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents with rich spatial structure. The repository provides model code and inference scripts that let researchers and developers run and benchmark the system on both images and PDFs, with support for batch evaluation and optimized pipelines leveraging vLLM and transformers.
    Downloads: 8 This Week
  • 2
    whatsapp-web.js

    WhatsApp library for NodeJS that connects through the browser app

    A WhatsApp client library for NodeJS that connects through the WhatsApp Web browser app. Programmatically control WhatsApp whether you're running user or business accounts. It uses Puppeteer to run a real instance of WhatsApp Web to avoid getting blocked. whatsapp-web.js connects to an official version of WhatsApp Web under the hood, reducing ban risks. The object-oriented approach makes it easy to get...
    Downloads: 12 This Week
  • 3
    fess

    Open source enterprise search server for websites, files, and data

    ...It enables organizations to quickly deploy a scalable search environment without requiring deep knowledge of underlying search technologies. Fess is built on top of OpenSearch and offers an integrated solution for crawling, indexing, and searching documents from websites, file systems, and various data stores. Fess includes a built-in crawler that can collect content from sources such as databases, CSV files, and shared storage, making it suitable for centralized knowledge discovery. It supports indexing and searching across many document formats including office documents, PDFs, and compressed archives. ...
    Downloads: 2 This Week
  • 4
    pagedown

    Paginate the HTML Output of R Markdown with CSS for Print

    ...You only need a modern web browser (e.g., Google Chrome or Microsoft Edge) to generate PDFs. No need to install LaTeX to get beautiful PDFs. This R package stands on the shoulders of two giants to support typesetting with CSS for R Markdown documents: Paged.js and ReLaXed (we only borrowed some CSS from the ReLaXed repo and didn't really use the Node package).
    Downloads: 0 This Week
  • 5
    MinDoc

    Document management system developed for the IT team

    ...It was developed because the company's IT department needed a simple, practical system for managing documents and sharing project interface documentation. Its functions and interface are modeled on kancloud. It can store everyday interface documents, database dictionaries, manuals and other documents. Built-in project management, user management, permission management and other functions meet the document management needs of most small and medium teams.
    Downloads: 2 This Week
  • 6
    Cognita

    Open source RAG framework for building scalable modular AI apps

    ...Cognita provides reusable components such as parsers, data loaders, embedders, retrievers, and query controllers, allowing teams to customize each stage of the RAG pipeline independently. It includes both a backend service and a frontend interface, enabling users to upload documents, experiment with configurations, and perform question-answering tasks interactively. Cognita supports incremental indexing, meaning it processes only new or updated data to reduce computational overhead and improve efficiency.
    Downloads: 1 This Week
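
The incremental indexing idea in the Cognita entry (process only new or updated data) can be illustrated with a small sketch. This is not Cognita's code or API: the function names `content_hash` and `plan_incremental_index`, and the use of SHA-256 content hashes to detect changes, are assumptions made purely for illustration.

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint a document's content (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_incremental_index(documents, indexed_hashes):
    """Return (ids that need (re)indexing, updated hash map).

    documents:      {doc_id: text} as currently seen at the data source
    indexed_hashes: {doc_id: hash} recorded by the previous indexing run
    """
    to_index = []
    new_hashes = {}
    for doc_id, text in documents.items():
        digest = content_hash(text)
        new_hashes[doc_id] = digest
        # New documents have no stored hash; updated ones have a different hash.
        if indexed_hashes.get(doc_id) != digest:
            to_index.append(doc_id)
    return to_index, new_hashes


docs = {"a.txt": "alpha", "b.txt": "beta"}
first_run, state = plan_incremental_index(docs, {})

docs["b.txt"] = "beta v2"   # updated document
docs["c.txt"] = "gamma"     # new document
second_run, state = plan_incremental_index(docs, state)

print(first_run)   # ['a.txt', 'b.txt']
print(second_run)  # ['b.txt', 'c.txt']
```

Only `b.txt` and `c.txt` are scheduled on the second run; the unchanged `a.txt` is skipped, which is where the reduced computational overhead would come from.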
  • 7
    gm

    R Package for Music Score and Audio Generation

    Create music easily, and show musical scores and audio files in R Markdown documents, R Jupyter Notebooks and RStudio.
    Downloads: 0 This Week
  • 8
    PapersGPT

    A powerful Zotero AI and MCP plugin with ChatGPT, Gemini 3.1, Claude

    PapersGPT is an AI-powered plugin that integrates directly into Zotero to transform how researchers interact with academic papers and literature collections. It enables users to chat with individual PDFs or entire collections, allowing them to extract insights, generate summaries, and explore connections between documents without leaving the Zotero environment. The plugin supports a wide range of state-of-the-art language models, including GPT, Claude, Gemini, and open-source alternatives, giving users flexibility in choosing performance, cost, and privacy trade-offs. One of its most powerful features is its ability to process large volumes of academic content quickly, enabling tasks such as literature reviews, theoretical analysis, and research synthesis to be completed significantly faster. ...
    Downloads: 10 This Week
  • 9
    Unstructured.IO

    Open source libraries and APIs to build custom preprocessing pipelines

    The unstructured library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more. Its use cases revolve around streamlining and optimizing the data processing workflow for LLMs. Its modular bricks and connectors form a cohesive system that simplifies data ingestion and pre-processing, adapts to different platforms, and efficiently transforms unstructured data into structured outputs.
    Downloads: 4 This Week
  • 10
    Nyxt

    The hacker's power-browser

    Out of the box, Nyxt ships with dozens of features that allow you to quickly analyze, navigate, and extract information from the Internet. Plus, Nyxt is fully hackable: all of its source code can be introspected, modified, and tweaked to your exact specification. Navigate large documents with ease. Run commands against multiple objects to avoid repeating yourself; for example, you can select and close all buffers that match the string "ele". Fuzzy-search relevant commands to run them instantly. No more digging through menus. Use fuzzy search to instantly switch between buffers. No more hunting! Use link hinting to quickly jump around. ...
    Downloads: 5 This Week
  • 11
    swift-html

    Swift DSL for type-safe, extensible, and transformable HTML documents

    A Swift DSL for type-safe, extensible, and transformable HTML documents. The popular choice for rendering HTML in Swift these days is to use templating languages, but they expose your application to runtime errors and invalid HTML. Our library catches these issues at compile time by embedding HTML directly into Swift’s powerful type system. Underneath the hood, tag functions such as html, body, and h1 just create and nest instances of a Node type, which is a simple Swift enum. ...
    Downloads: 0 This Week
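
The "tag functions build a tree of nodes" idea from the swift-html entry can be sketched in a few lines. The sketch below is a Python analogue, not swift-html's API: the `Text` and `Element` classes, the `render` function, and the `_tag` helper are all hypothetical names, and Python offers none of the compile-time guarantees the Swift library is described as providing.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Text:
    """Leaf node holding raw text (hypothetical stand-in for a text case)."""
    value: str


@dataclass
class Element:
    """Tag node with nested children (hypothetical stand-in for an element case)."""
    tag: str
    children: list = field(default_factory=list)


def render(node) -> str:
    """Turn a node tree into an HTML string, escaping text content."""
    if isinstance(node, Text):
        return escape(node.value)
    inner = "".join(render(child) for child in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


def _tag(name):
    """Build a tag function that wraps its arguments in an Element."""
    def build(*children):
        return Element(name, [Text(c) if isinstance(c, str) else c for c in children])
    return build


# Tag functions only create and nest nodes; nothing is rendered yet.
html, body, h1 = _tag("html"), _tag("body"), _tag("h1")

page = html(body(h1("Tom & Jerry")))
print(render(page))  # <html><body><h1>Tom &amp; Jerry</h1></body></html>
```

Because the page is plain data until `render` is called, it can be inspected or transformed before output, which is the property the entry calls "transformable".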
  • 12
    Asciidoc Editor based on JavaFX 20

    Asciidoc Editor and Toolchain written with JavaFX 19

    AsciidocFX is a WYSIWYG editor for the Asciidoc markup language. You can build PDF, EPUB, and HTML books, documents, and slides. The project's list of supported operating systems and builds links to the available downloads, including the latest version. AsciidocFX converts documents via the AsciidoctorJ library.
    Downloads: 2 This Week
  • 13
    GBrain

    Garry's Opinionated OpenClaw/Hermes Agent Brain

    ...GBrain introduces a hybrid retrieval model that combines embeddings with ranking strategies to improve relevance when querying large datasets. It also organizes knowledge into structured documents with summaries and timelines, helping agents maintain context and track changes in information.
    Downloads: 9 This Week
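
As a rough illustration of what combining embeddings with a ranking strategy can look like, the sketch below fuses an embedding-similarity ranking with a keyword ranking using reciprocal rank fusion. The GBrain entry does not say which strategy the project uses, so the fusion method, the toy two-dimensional vectors, and every function name here are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_hits(query, text):
    """Count words in the text that also appear in the query."""
    terms = set(query.lower().split())
    return sum(1 for word in text.lower().split() if word in terms)


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists; each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Toy corpus: doc_id -> (text, made-up embedding)
docs = {
    "timeline": ("project timeline and release dates", [0.9, 0.1]),
    "summary": ("summary of the notes", [0.95, 0.05]),
    "changelog": ("release changelog", [0.2, 0.8]),
}
query, query_vec = "release timeline", [1.0, 0.0]

by_embedding = sorted(docs, key=lambda d: cosine(query_vec, docs[d][1]), reverse=True)
by_keyword = sorted(docs, key=lambda d: keyword_hits(query, docs[d][0]), reverse=True)
fused = reciprocal_rank_fusion([by_embedding, by_keyword])

print(by_embedding)  # ['summary', 'timeline', 'changelog']
print(by_keyword)    # ['timeline', 'changelog', 'summary']
print(fused)         # ['timeline', 'summary', 'changelog']
```

Neither ranking alone puts "timeline" and "summary" in the fused order: the document that does well in both lists rises to the top.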
  • 14
    Hermes Agent

    The agent that grows with you

    Hermes Agent is a fully open-source autonomous AI agent designed to run persistently on your own machine or server, becoming more capable the longer it operates by learning from experience and building reusable procedural skills. Rather than functioning as a stateless chatbot, it maintains long-term memory across sessions and can generate searchable “Skill Documents” that capture how it solved complex tasks so it doesn’t start from scratch each time. The agent interfaces with messaging platforms like Telegram, Discord, Slack, and WhatsApp through a single gateway process, and also offers an interactive terminal user interface with history, autocomplete, and streamable tool output. It supports scheduled automation in natural language, allowing users to set up recurring tasks such as daily briefings or system audits that it runs unattended.
    Downloads: 119 This Week
  • 15
    iLovePDF Api

    iLovePDF Rest Api - PHP Library

    ...We offer a simple and concise API Reference and Guide as well as API Libraries with their own docs too. Our infrastructure uses the best PDF technology for processing PDF files. Merge and split documents with a variety of custom options. Remove, extract or organize PDF pages as you need. Reduce the size of your PDF while maintaining its original quality and formatting. Easily convert Images, MS Word, PowerPoint and Excel files into non-editable PDF documents. Convert PDF documents to JPG images or to PDF/A format.
    Downloads: 3 This Week
  • 16
    Paperless-AI

    AI-powered document analysis and tagging for Paperless-ngx

    ...Users can ask contextual questions about their files and receive precise answers based on full document understanding rather than simple keyword matching. Paperless-AI also includes a web interface for manual review and tagging, allowing greater control when handling sensitive or complex documents.
    Downloads: 1 This Week
  • 17
    tinypdf

    Minimal PDF creation library

    ...The library supports essential primitives like writing text, drawing basic shapes, and placing JPEG images, which covers common needs such as invoices, receipts, tickets, and simple reports. It also supports clickable links so generated documents can include interactive URLs, and it can create multi-page documents with custom page sizes. A notable convenience is built-in markdown-to-PDF conversion for common structures like headers and lists, letting you go from formatted text to a PDF layout quickly.
    Downloads: 1 This Week
  • 18
    iText Core/Community

    iText for .NET is the .NET version of the iText library

    iText Core/Community (previously known as iTextSharp) is a high-performance, battle-tested library for creating, adapting, inspecting, and maintaining PDF documents, so you can add PDF functionality to your software projects with ease. It is also available for Java. For more advanced examples, refer to the project's Knowledge Base or the main Examples repo. C# equivalents of the Java signing examples are also available; the Java code is very similar since both share the same API. Some of the output PDF files are displayed incorrectly by the GitHub previewer, so download them to see the correct results. ...
    Downloads: 1 This Week
  • 19
    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 106 This Week
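
For readers who want to see what adding a text layer looks like in practice, here is a minimal sketch using OCRmyPDF's Python entry point `ocrmypdf.ocr`. It assumes the `ocrmypdf` package and its external dependencies (such as Tesseract) are installed; the wrapper name `add_text_layer` and the choice to enable `deskew` are illustrative only.

```python
import sys


def add_text_layer(input_pdf: str, output_pdf: str) -> None:
    """Write a searchable copy of a scanned PDF (hypothetical wrapper)."""
    # Imported lazily so this file can be loaded even where the
    # third-party package is not installed.
    import ocrmypdf

    # deskew=True asks OCRmyPDF to straighten crooked scans before OCR.
    ocrmypdf.ocr(input_pdf, output_pdf, deskew=True)


if __name__ == "__main__":
    # Usage: python add_text_layer.py scanned.pdf searchable.pdf
    if len(sys.argv) == 3:
        add_text_layer(sys.argv[1], sys.argv[2])
```

The same operation is available from the command line as `ocrmypdf input.pdf output.pdf`.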
  • 20
    OpenSign

    🔥 The free & Open Source DocuSign alternative

    OpenSign is an open source document e-signing solution designed to provide a secure, reliable and free alternative to commercial e-sign platforms like DocuSign, PandaDoc, SignNow, Adobe Sign, Smartwaiver, SignRequest, HelloSign and Zoho Sign. Its mission is to democratize the document signing process, making it accessible and straightforward for everyone.
    Downloads: 3 This Week
  • 21
    latex-action

    GitHub Action to compile LaTeX documents

    GitHub Action to compile LaTeX documents. It runs in a Docker container with a full TeXLive environment installed.
    Downloads: 0 This Week
  • 22
    EzXML.jl

    XML/HTML handling tools for primates

    EzXML.jl is a package to handle XML/HTML documents for primates. This package depends on libxml2, which is installed automatically as an artifact via XML2_jll.jl if you use Julia 1.3 or later. Windows, Linux, macOS, and FreeBSD are currently supported.
    Downloads: 0 This Week
  • 23
    PDFMathTranslate

    PDF scientific paper translation with preserved formats

    PDFMathTranslate is a Python-based tool that uses AI translation to convert academic PDFs into bilingual (e.g. Chinese-English) documents while preserving formatting, including math notation. It supports OCR-enhanced content and offers CLI, GUI, Docker, and Zotero integration under AGPL v3.
    Downloads: 43 This Week
  • 24
    Homebox

    Inventory and organization system built for the Home User

    ...Written in Go with a web-based UI, Homebox emphasizes low resource usage and portable deployment, making it ideal for self-hosting with a single Docker container or a compiled binary. Users can organize inventory into categories, locations, and tags, attach images and documents, and track purchase dates, prices, warranties, and maintenance schedules to keep all home information in one place. The embedded web UI is responsive across devices from desktops to smartphones, providing powerful search and filters that let users find items quickly. Homebox supports custom fields for extended metadata and uses SQLite for easy setup and backups, facilitating straightforward deployment without complex infrastructure.
    Downloads: 4 This Week
  • 25
    Open Notebook

    An Open Source implementation of Notebook LM with more flexibility

    ...The platform supports 16+ AI providers—including OpenAI, Anthropic, Ollama, Google, and LM Studio—allowing flexible model choice and cost optimization. Open Notebook enables users to organize and analyze multi-modal content such as PDFs, videos, audio files, web pages, and Office documents. It combines full-text and vector search with context-aware AI chat to deliver insights grounded in your own research materials. With advanced features like multi-speaker podcast generation, customizable content transformations, and a comprehensive REST API, Open Notebook provides a powerful and extensible research environment.
    Downloads: 22 This Week