Showing 265 open source projects for "open pdf"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 1
    PDFMathTranslate

    PDFMathTranslate

    PDF scientific paper translation with preserved formats

    PDFMathTranslate is a Python-based tool that uses AI translation to convert academic PDFs into bilingual (e.g. Chinese-English) documents while preserving formatting, including math notation. It supports OCR-enhanced content and offers CLI, GUI, Docker, and Zotero integration under AGPL v3.
    Downloads: 53 This Week
    Last Update:
    See Project
  • 2
    Blackbird

    Blackbird

    OSINT tool for finding accounts across 600+ sites by username or email

    ...Blackbird also includes an optional AI-powered profiling feature that analyzes discovered sites to generate behavioral and technical insights about a user’s online presence. Results from searches can be exported in formats such as PDF, CSV, or JSON for documentation or reporting purposes.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 3
    Merge PDF files instantly with this simple, free Windows tool by Franklin Ogot.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 4
    Docling

    Docling

    Get your documents ready for gen AI

    Docling is an open-source document processing toolkit built to prepare diverse content types for modern generative AI and data workflows. The project focuses on converting and parsing many document formats into a unified structured representation that downstream systems can easily consume. It supports advanced PDF understanding, including layout detection, table extraction, and reading order analysis, enabling high-fidelity document intelligence pipelines.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 5
    Memvid

    Memvid

    Video-based AI memory library. Store millions of text chunks in MP4

    Memvid encodes text chunks as QR codes within MP4 frames to build a portable “video memory” for AI systems. This innovative approach uses standard video containers and offers millisecond-level semantic search across large corpora with dramatically less storage than vector DBs. It's self-contained—no DB needed—and supports features like PDF indexing, chat integration, and cloud dashboards.
    Downloads: 50 This Week
    Last Update:
    See Project
  • 6
    PDF-utility

    PDF-utility

    PDF Utility is a tool designed to efficiently manipulate PDF files

    Digna PDF Utility is a tool designed to efficiently manipulate PDF documents. It offers a range of functionalities including adding page numbers, deleting unwanted pages, merging multiple PDFs into a single file, converting PDF to DOCX and vice versa, protect a PDF file with password and displaying PDF content.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 7
    Papermerge

    Papermerge

    Open Source Document Management System for Digital Archives

    Papermerge is an open source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of having piles of paper documents all over your desk, office or drawers - you can quickly scan them and configure your scanner to directly upload to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats.
    Downloads: 20 This Week
    Last Update:
    See Project
  • 8
    WingetUI

    WingetUI

    A better UI for your package managers

    WingetUI is an intuitive, responsive and nice-looking user interface for the most common (command-line) package managers for Windows 10 and Windows 11. With this program, you'll be able to easily download, install, update and uninstall any software that's published on those package managers. The available packages range from well-known software such as browsers, pdf readers and other common programs to other less-known utilities such as vim for windows, which I bet you didn't know it...
    Downloads: 96 This Week
    Last Update:
    See Project
  • 9
    Pix2Text

    Pix2Text

    Open-Source Python3 tool for recognizing layouts, tables, and math

    An Open-Source Python3 tool for recognizing layouts, tables, math formulas, and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported. Pix2Text (P2T) aims to be a free and open-source Python alternative to Mathpix, and it can already accomplish Mathpix's core functionality.
    Downloads: 13 This Week
    Last Update:
    See Project
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 10
    Zerox OCR

    Zerox OCR

    PDF to Markdown with vision models

    A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc. The vision models just make sense. ZeroX is an open-source machine learning framework designed for fast experimentation and production deployment, optimized for speed and ease of use.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 11
    Remarkable for Linux

    Remarkable for Linux

    The Markdown Editor for Linux

    With Live Preview you can see your changes as you make them. There is no need to export first to check your syntax. This is accompanied by synchronized scrolling. Remarkable has Github Flavoured Markdown. This has a simple, easy-to-learn syntax with features like checklists, highlighting, links, images and more. Remarkable allows you to export your files to PDF and HTML from within the app. The HTML code is even prettified and PDFs have a TOC. You can style your markdown documents however...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    PageIndex

    PageIndex

    Document Index for Vectorless, Reasoning-based RAG

    ...This reasoning-driven retrieval aligns more naturally with how humans explore complex texts, improving relevance and traceability, especially in professional domains like financial reports, legal contracts, and technical manuals. The project includes example notebooks, scripts for tree generation and search, and support for multiple document formats including PDF and markdown, with tools designed to preserve context and semantic boundaries.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    ArchiveBox

    ArchiveBox

    Open source self-hosted web archiving

    ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline. Without active preservation effort, everything on the internet eventually disappears or degrades. Archive.org does a great job as a centralized service, but saved URLs have to be public, and they can't save every type of content. ArchiveBox is an open source tool that lets organizations & individuals archive both public & private web content while retaining control over their data....
    Downloads: 13 This Week
    Last Update:
    See Project
  • 14
    Pdf_tools
    ✅ Image to PDF Convert multiple image files into a single PDF. Supports formats: JPG, JPEG, PNG, BMP, TIFF. ✅ PDF Merger Merge multiple PDF files into one. Reorder PDF files before merging. ✅ PDF Splitter Split PDF files by range or into individual pages. ✅ Page Remover Remove specific pages from a PDF. ✅ Fill & Sign Add text and signature to a PDF. Preview the PDF and adjust signature position.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 15
    DB-GPT

    DB-GPT

    Revolutionizing Database Interactions with Private LLM Technology

    DB-GPT is an experimental open-source project that uses localized GPT large models to interact with your data and environment. With this solution, you can be assured that there is no risk of data leakage, and your data is 100% private and secure.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 16
    abogen

    abogen

    Generate audiobooks from EPUBs, PDFs and text with captions

    abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 17
    Semantra

    Semantra

    Multi-tool for semantic search

    Semantra is an open-source semantic search tool designed to help users explore large collections of documents by meaning rather than simple keyword matching. The software analyzes text and PDF documents stored locally and creates embeddings that allow queries to retrieve results based on conceptual similarity. It is primarily intended for individuals who need to extract insights from large document collections, including researchers, journalists, students, and historians. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    DeepSeek-OCR 2

    DeepSeek-OCR 2

    Visual Causal Flow

    DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 19
    Mercury

    Mercury

    Convert Python notebook to web app and share with non-technical users

    Turn Python notebooks to web applications with open-source Mercury framework. Hide code and add interactive widgets. Non-technical users can tweak widgets and execute notebook with new parameters. The core of Mercury is Open Source under AGPLv3. We provide Mercury Pro with additional features, dedicated support and friendly commercial license. Mercury is a perfect tool to convert Python notebook to interactive web application and share with non-programmers.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 20
    word_cloud

    word_cloud

    A little word cloud generator in Python

    A little word cloud generator in Python. The code is tested against Python 2.7, 3.4, 3.5, 3.6 and 3.7. If you are using conda, you can install from the conda-forge channel. wordcloud depends on numpy and pillow. To save the wordcloud into a file, matplotlib can also be installed. If there are no wheels available for your version of python, installing the package requires having a C compiler set up. Before installing a compiler, report an issue describing the version of python and operating...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 21
    Pysheeet

    Pysheeet

    Python Cheat Sheet

    Pysheeet is a community-driven collection of Python code snippets covering common patterns and tasks like sockets, file I/O, data structures, and more. Each snippet is concise and battle-tested, designed to save coding time and reduce boilerplate. With documentation hosted on Read the Docs and an active GitHub repo, it’s a go-to resource for Python developers.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    LlamaParse

    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured, and semi-structured, to structured data (API's, PDFs, documents, SQL, etc.) Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL db providers.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 23
    ChatGPT Academic

    ChatGPT Academic

    ChatGPT extension for scientific research work

    ChatGPT extension for scientific research work, specially optimized academic paper polishing experience, supports custom shortcut buttons, supports custom function plug-ins, supports markdown table display, double display of Tex formulas, complete code display function, new local Python/C++/Go project tree Analysis function/Project source code self-translation ability, newly added PDF and Word document batch summary function/PDF paper full-text translation function. All buttons are...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Khoj

    Khoj

    An AI personal assistant for your digital brain

    Get more done with your open-source AI personal assistant. Khoj is a desktop application to search and chat with your notes, documents, and images. It is an offline-first, open-source AI personal assistant that is accessible from Emacs, Obsidian or your Web browser. Khoj is a thinking tool that is transparent, fun, and easy to engage with. You can build faster and better by using Khoj to search and reason across all your data sources. Khoj learns from your notes and documents to function as...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 25
    deepdoctection

    deepdoctection

    A Repo For Document AI

    DeepDoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. deepdoctection is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models but enables you to build pipelines using highly acknowledged libraries for object detection, OCR and selected NLP tasks and provides an integrated frameworks for...
    Downloads: 3 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB