Showing 2020 open source projects for "documents"

  • 1
    Readest

    Readest is a modern, feature-rich ebook reader

    ...The goal appears to be to let users feed in arbitrary reading material and then interact with it (highlighting, translation, lookup, maybe TTS or summarization) more comfortably. Because of that, it's oriented towards learners, researchers, or people dealing with multilingual documents — especially when they need to rapidly digest or reference large amounts of text. The design seems to prioritize flexible input formats, possibly OCR or uploaded documents, and interactive tools to navigate or annotate them.
    Downloads: 12 This Week
  • 2
    TeXtidote

    Spelling, grammar and style checking on LaTeX documents

    If so, you probably know that the process is far from simple. Since LaTeX documents contain special commands and keywords (the so-called "markup") that are not part of the "real" text, you cannot run a grammar checker directly on these files: it cannot tell the difference between markup and text. The other option is to remove all this markup, leaving only the "clear" text; however, when a grammar tool points to a problem at a specific line in this clear text, it becomes hard to retrace that location in the original LaTeX file. ...
    Downloads: 77 This Week
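    The position-tracking problem described above can be sketched as follows. The idea is to strip the markup but record, for every surviving character, where it came from, so that an error reported in the clean text maps back to a line and column in the .tex file. This is a hypothetical, heavily simplified Python sketch (the names `strip_markup` and `to_line_col` are illustrative), not TeXtidote's actual algorithm; it only handles control words, braces, escaped symbols, and % comments.

```python
import re

# Simplified control word such as \emph or \section* (hypothetical pattern).
CONTROL_WORD = re.compile(r"\\[a-zA-Z]+\*?")


def strip_markup(source: str) -> tuple[str, list[int]]:
    """Drop simple LaTeX markup and return (clean_text, offsets), where
    offsets[k] is the position in `source` of clean_text[k]."""
    clean: list[str] = []
    offsets: list[int] = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            m = CONTROL_WORD.match(source, i)
            if m:
                # Drop the command name; the text of any {argument} is kept below.
                i = m.end()
                continue
            if i + 1 < len(source):
                # Escaped symbol such as \% or \&: keep the symbol itself.
                clean.append(source[i + 1])
                offsets.append(i + 1)
                i += 2
                continue
        elif ch == "%":
            # Comment: skip to (but not past) the end of the line.
            end = source.find("\n", i)
            i = len(source) if end == -1 else end
            continue
        elif ch in "{}":
            i += 1
            continue
        clean.append(ch)
        offsets.append(i)
        i += 1
    return "".join(clean), offsets


def to_line_col(source: str, offset: int) -> tuple[int, int]:
    """Convert a 0-based offset in `source` into a 1-based (line, column)."""
    line = source.count("\n", 0, offset) + 1
    col = offset - (source.rfind("\n", 0, offset) + 1) + 1
    return line, col


src = "Hello \\emph{wrold}.\n% note\nSecond line with eror."
clean, offs = strip_markup(src)
# A checker that flags "eror" at clean.index("eror") can be mapped back:
print(to_line_col(src, offs[clean.index("eror")]))
```

    Because every braced argument is kept, `\begin{itemize}` would leak the word "itemize" into the clean text; a real checker has to treat environments, math, and similar constructs specially.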
  • 3
    PyMuPDF

    Python bindings for MuPDF's rendering library.

    ...It renders text with metrics and spacing accurate to within fractions of a pixel for the highest fidelity in reproducing the look of a printed page on the screen. The viewer is small, fast, yet complete. It supports many document formats, such as PDF, XPS, OpenXPS, CBZ, EPUB, and FictionBook 2. You can annotate PDF documents and fill out forms with the mobile viewers (this feature is coming soon to the desktop viewer as well). The command line tools allow you to annotate, edit, and convert documents to other formats such as HTML, SVG, PDF, and CBZ. You can also write scripts to manipulate documents using JavaScript. The library is written modularly in portable C, so integrators can add and remove features if they so desire.
    Downloads: 12 This Week
  • 4
    QuestPDF

    A library that can help you with generating PDF documents

    Quickly design and generate PDF documents with an open-source, modern, and battle-tested C# library. Forget about limitations, feel confident, enjoy your task and efficiently deliver professional products. QuestPDF is a progressive library that can help you with generating PDF documents in your .NET application by offering a friendly, discoverable and predictable C# fluent API.
    Downloads: 1 This Week
  • 5
    abogen

    Generate audiobooks from EPUBs, PDFs and text with captions

    ...The repository supports handling common ebook formats and generating outputs that combine audio plus caption metadata. By automating text-to-speech for arbitrary documents, abogen reduces the friction of producing audiobooks and could be integrated into larger workflows (e.g., batch converting a library of texts).
    Downloads: 6 This Week
  • 6
    Genji

    Document-oriented, embedded SQL database

    Genji is an embedded database written in Go that aims to simplify dealing with data in the modern world. It combines the power of SQL with the versatility of documents to provide maximum flexibility without compromise. Run powerful queries on rich documents with an expressive SQL syntax. Create tables with strict schemas, partial schemas, or no schemas at all to control your data the way you want. It offers fully serializable transactions, an in-memory mode, memory usage control, and more. Genji was designed with simplicity in mind. ...
    Downloads: 0 This Week
  • 7
    kotaemon

    An open-source RAG-based tool for chatting with your documents

    A clean, customizable, open-source RAG UI for chatting with your documents, built with both end users and developers in mind. This project serves as a functional RAG UI both for end users who want to do question answering (QA) on their documents and for developers who want to build their own RAG pipeline.
    Downloads: 1 This Week
  • 8
    CryptPad
    CryptPad is an open-source, end-to-end encrypted collaborative office suite developed by XWiki SAS. It offers a privacy-focused alternative to mainstream cloud-based productivity tools, enabling users to create, edit, and share documents securely without compromising data privacy. All content is encrypted client-side, ensuring that only authorized users can access the information. CryptPad supports various applications, including rich text documents, spreadsheets, presentations, code editing, kanban boards, polls, and whiteboards, facilitating real-time collaboration among teams. ...
    Downloads: 8 This Week
  • 9
    chatd

    Chat with your documents using local AI

    ...The application typically runs models such as Mistral-7B and allows users to load and analyze documents while asking questions in natural language. Unlike many document-chat tools that require manual installation of model servers, chatd packages the model runner with the application so that users can start interacting with documents immediately after launching the program.
    Downloads: 0 This Week
  • 10
    Docspell

    Assist in organizing your piles of documents

    ...It is aimed at home use (families and households) and also at smaller groups and companies. You can assign tags, set correspondents, and add many other predefined and custom metadata fields. Once your documents carry such metadata, you can quickly find them later using the search feature. However, adding it manually is tedious. Docspell can help by suggesting correspondents, guessing tags, and finding dates using machine learning. It can learn metadata from existing documents and find things using NLP, which makes adding metadata to your documents much easier. ...
    Downloads: 0 This Week
  • 11
    LLM-Aided OCR Project

    Enhances Tesseract OCR output using LLMs (local or API)

    ...The project is particularly useful for digitizing historical documents, research papers, and scanned materials where traditional OCR often struggles. It also includes tools for processing batches of images or documents, enabling automated document digitization workflows.
    Downloads: 1 This Week
  • 12
    LibreSign

    Nextcloud app to sign PDF documents

    ...Beyond offering agility and security in digital signatures and document management, LibreSign features functionalities that adapt to the specific needs of your organization. Keep your documents secure with end-to-end encryption and multi-factor authentication, ensuring protection throughout the electronic document signing process. Hybrid signatures streamline negotiation processes, offering flexibility in choosing between personal or system-generated digital certificates for signing documents digitally with LibreSign. ...
    Downloads: 0 This Week
  • 13
    PdfPig

    Read and extract text and other content from PDFs in C#

    This project allows users to read and extract text and other content from PDF files. In addition, the library can be used to create simple PDF documents containing text and geometric shapes.
    Downloads: 5 This Week
  • 14
    Open Semantic Search

    Open source semantic search and text analytics for large document sets

    Open Semantic Search is an open source research and analytics platform designed for searching, analyzing, and exploring large collections of documents using semantic search technologies. It provides an integrated search server combined with a document processing pipeline that supports crawling, text extraction, and automated analysis of content from many different sources. Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and enrich the data with extracted information such as named entities and metadata. ...
    Downloads: 7 This Week
  • 15
    MiroFish

    A Simple and Universal Swarm Intelligence Engine

    MiroFish is a next-generation artificial intelligence prediction engine that leverages multi-agent technology and swarm-intelligence simulation to model, simulate, and forecast complex real-world scenarios. The system extracts “seed” information from sources such as breaking news, policy documents, and market signals to construct a high-fidelity digital parallel world populated by thousands of virtual agents with independent memory and behavior rules. Users can inject variables or conditions into this simulated environment from a “god’s eye view,” enabling iterative prediction of future trends under different assumptions, which can be useful for decision support, scenario planning, or creative exploration. ...
    Downloads: 397 This Week
  • 16
    MarkPDFDown

    A high-quality PDF-to-Markdown tool based on large language models

    ...The software is particularly useful for developers working with technical documents, academic papers, or reports that need to be indexed, summarized, or processed by downstream AI systems.
    Downloads: 0 This Week
  • 17
    Symfony DomCrawler

    Eases DOM navigation for HTML and XML documents

    Symfony DomCrawler is a PHP component that provides powerful tools for navigating and extracting data from HTML and XML documents. It allows developers to parse, filter, and manipulate web pages using CSS selectors and XPath expressions. DomCrawler is widely used for web scraping, testing, and processing structured content, and integrates well with other Symfony components like BrowserKit.
    Downloads: 0 This Week
  • 18
    WebLink Component

    Manages links between resources

    The WebLink component manages links between resources. It is particularly useful for advising clients to preload and prefetch documents through HTTP headers and HTTP/2 push. This component implements the W3C HTML5 Links, Preload, and Resource Hints specifications. It can also be used with extensions defined in the HTML5 link type extensions wiki.
    Downloads: 0 This Week
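    Preload and prefetch hints of this kind are typically carried in an HTTP `Link` response header (RFC 8288), for example `Link: </app.css>; rel="preload"; as="style"`. The Python sketch below only illustrates that header format. It is not the WebLink component's PHP API, `link_header` is a hypothetical helper, and it does no escaping of unusual characters in URLs or parameter values.

```python
def link_header(links: list[tuple[str, str, dict[str, str]]]) -> str:
    """Serialize (href, rel, extra_params) triples into one Link header value.

    Each link becomes `<href>; rel="..."; name="value"`, and multiple links
    are joined with ", " as the Link header allows.
    """
    parts = []
    for href, rel, params in links:
        fields = [f"<{href}>", f'rel="{rel}"']
        fields += [f'{name}="{value}"' for name, value in params.items()]
        parts.append("; ".join(fields))
    return ", ".join(parts)


value = link_header([
    ("/app.css", "preload", {"as": "style"}),  # fetch now, needed for this page
    ("/next.html", "prefetch", {}),            # likely needed for a later navigation
])
print("Link: " + value)
```

    The `as` parameter tells the browser what kind of resource a preload is for, so it can apply the right priority and request headers; prefetch hints usually omit it.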
  • 19
    Taplo

    A TOML toolkit written in Rust

    A versatile, feature-rich TOML toolkit. This is the repository for Taplo, a TOML v1.0.0 toolkit; more details are on the website. Validate TOML documents syntactically or against JSON schemas. Formatter with fine-grained options. Embeddable language server with features based on JSON schemas. Available wherever Rust compiles. Taplo CLI aims to be a one-stop shop for working with TOML files from the command line. Its features include validating, formatting, and querying TOML documents in a jq-like fashion.
    Downloads: 0 This Week
  • 20
    Paper2Slides

    From Paper to Presentation in One Click

    Paper2Slides is an automation tool that converts research papers, reports, and other documents into polished slide decks and posters with minimal manual effort. It is designed to replace the repetitive work of turning dense technical documents into presentation-friendly structure by extracting key points, figures, and data into a coherent visual narrative. The system supports multiple input formats, so you can process PDFs and common office documents rather than being locked to a single file type. ...
    Downloads: 5 This Week
  • 21
    ScrapeGraphAI

    Python scraper based on AI

    Extracts content from websites and local documents using LLMs. ScrapeGraphAI is a Python web scraping library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
    Downloads: 3 This Week
  • 22
    Nokogiri

    Tool to work with XML and HTML from Ruby

    Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant because it relies on native parsers such as libxml2 (C) and xerces (Java). Its guiding principles are to be secure by default, treating all documents as untrusted, and to be as thin a layer as reasonable on top of the underlying parsers, without attempting to fix behavioral differences between them. "Native gems" contain pre-compiled libraries for a specific machine architecture. ...
    Downloads: 2 This Week
  • 23
    Semantra

    Multi-tool for semantic search

    ...By relying on semantic embeddings and contextual analysis, the tool can identify passages that are relevant even when the query uses different wording than the source documents.
    Downloads: 0 This Week
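    A common way to match passages by meaning rather than wording is to rank them by the cosine similarity between each passage's embedding and the query's embedding. The Python sketch below shows only that ranking step. The 3-dimensional vectors are invented toy values standing in for real model embeddings, and `cosine` and `rank` are illustrative names, not Semantra's API.

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity dot(a, b) / (|a| * |b|); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: Vector, passages: list[tuple[str, Vector]]) -> list[tuple[str, float]]:
    """Return (passage, score) pairs sorted from most to least similar."""
    scored = [(text, cosine(query, vec)) for text, vec in passages]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy "embeddings"; a real system would obtain these from a model.
passages = [
    ("Quarterly revenue grew by a third.", [0.0, 0.2, 0.95]),
    ("The car would not start this morning.", [0.9, 0.1, 0.0]),
]
query = [0.8, 0.2, 0.1]  # stands in for the embedding of "vehicle breakdown"

best_text, best_score = rank(query, passages)[0]
print(best_text, round(best_score, 3))
```

    The query shares no words with the top passage; it ranks first only because the (toy) vectors point in a similar direction, which is the property real embeddings are trained to have.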
  • 24
    InternLM-XComposer-2.5

    InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System

    ...The model is built on top of the InternLM language model architecture and extends its capabilities to handle multimodal inputs and outputs. Instead of producing only textual responses, the system can generate visually enriched documents such as illustrated articles, presentations, and educational materials. It incorporates visual understanding modules that allow the model to analyze images and integrate them into coherent narrative outputs. The framework also supports tasks such as image captioning, multimodal reasoning, and layout generation for structured visual documents.
    Downloads: 0 This Week
  • 25
    PHP7

    PHP7 / Laravel Multi-format Streaming Parser

    When it comes to parsing XML/CSV/JSON/... documents, there are two approaches to consider. DOM loading reads the entire document into memory, which makes it easy to navigate and parse and gives developers maximum flexibility. Streaming iterates through the document like a cursor, stopping at each element in turn, which avoids excessive memory use. As a result, for big files, callbacks are executed while the file is still downloading, which is much more memory-efficient.
    Downloads: 7 This Week
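    The streaming approach described above can be sketched in Python with the standard library's `xml.etree.ElementTree.iterparse`, which yields each element as soon as its closing tag has been read. This is not the PHP7 parser's API; `stream_items` and the `<orders>` sample are hypothetical, chosen only to show the callback-per-element pattern.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Callable


def stream_items(source: IO[bytes], tag: str,
                 callback: Callable[[ET.Element], None]) -> int:
    """Call `callback` on every completed <tag> element while reading
    `source` incrementally, instead of building the whole tree first."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            callback(elem)  # the element and its children are complete here
            count += 1
            # Free this record's children and text; an empty stub stays
            # attached to the root, so per-record memory stays small.
            elem.clear()
    return count


data = b"""<orders>
  <order id="1"><total>9.50</total></order>
  <order id="2"><total>3.25</total></order>
</orders>"""

totals: list[float] = []
n = stream_items(io.BytesIO(data), "order",
                 lambda e: totals.append(float(e.findtext("total"))))
print(n, totals)
```

    Because `iterparse` reads from any binary file-like object in chunks, the same loop should also work on a response body as it arrives, so parsing can overlap with downloading.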