Showing 16 open source projects for "pdf to text"

View related business solutions
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    PyPDF

    PyPDF

    A pure-python PDF library capable of splitting, merging, cropping

    pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s an actively maintained fork of PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 2
    zpdf

    zpdf

    Zero-copy PDF text extraction library written in Zig

    ...It implements multiple PDF decompression filters and handles common font encoding pathways, which are essential for turning raw PDF content streams into readable text. It also understands both classic cross-reference tables and newer cross-reference streams, including PDF 1.5+ features, and it offers configurable strict vs permissive error handling depending on whether you prioritize correctness or robustness.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    borb

    borb

    borb is a library for reading, creating and manipulating PDF files

    borb is a library for creating and manipulating PDF files in python. borb is a pure python library to read, write, and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries and primitives (numbers, string, booleans, etc) This is currently a one-man project, so the focus will always be to support those use-cases that are more common in favor of those that are rare.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    xhtml2pdf

    xhtml2pdf

    A library for converting HTML into PDFs using ReportLab

    xhtml2pdf enables users to generate PDF documents from HTML content easily and with automated flow control such as pagination and keeping text together. The Python module can be used in any Python environment, including Django. The Command line tool is a stand-alone program that can be executed from the command line.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 5
    Papermerge

    Papermerge

    Open Source Document Management System for Digital Archives

    ...Each user can be assigned different permissions to perform only a specific kind of action e.g. view only documents from a specific folder. OCR technology is vital part of Papermerge. It extracts text information from scanned documents, PDF, JPEG, TIFF files.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 6
    Google Open Source Project Style Guide

    Google Open Source Project Style Guide

    Chinese version of Google open source project style guide

    ...If the project you are modifying originates from Google, you may be directed to the English version of the project page to understand the style used by the project. The Chinese version of the project uses reStructuredText plain text markup syntax, and uses Sphinx to generate document formats such as HTML / CHM / PDF.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    Sphinx

    Sphinx

    Main repository for the Sphinx documentation builder

    ...It was originally created for the Python documentation, and it has excellent facilities for the documentation of software projects in a range of languages. Of course, this site is also created from reStructuredText sources using Sphinx! HTML (including Windows HTML Help), LaTeX (for printable PDF versions), ePub, Texinfo, manual pages, plain text. Semantic markup and automatic links for functions, classes, citations, glossary terms and similar pieces of information. Easy definition of a document tree, with automatic links to siblings, parents and children. General index as well as a language-specific module index. Automatic highlighting using the Pygments highlighter. ...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 8
    PageIndex

    PageIndex

    Document Index for Vectorless, Reasoning-based RAG

    ...The project includes example notebooks, scripts for tree generation and search, and support for multiple document formats including PDF and markdown, with tools designed to preserve context and semantic boundaries.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Jina

    Jina

    Build cross-modal and multimodal applications on the cloud

    ...Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP, GraphQL protocols with TLS. Intuitive design pattern for high-performance microservices. Seamless Docker container integration: sharing, exploring, sandboxing, versioning and dependency control via Jina Hub. Fast deployment to Kubernetes, Docker Compose and Jina Cloud. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 10
    bridgex

    bridgex

    Convert files like docx, xlsx, pptx, html, and more to MarkDown

    ... - Support for multiple input formats. - Lightweight editing prior to saving. Supported Formats 📂 Bridgex supports conversion of the following file formats: - PDF (.pdf) - Word (.docx) - PowerPoint (.pptx) - Excel (.xlsx, .xls, .csv) - Outlook Messages (.msg) - Text (.txt, .text) - Markdown (.md, .markdown) - JSON (.json, .jsonl) - XML (.xml) - RSS/Atom (.rss, .atom) - HTML/MHTML (.html, .htm, .mhtml) - ePub (.epub) - Compressed files (.zip) - Jupyter Notebooks (.ipynb) - Other formats supported by Markitdown Bridgex is not an IDE, text editor, Markdown editor, or document viewer
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    Text-ly

    Text-ly

    Text.ly - An alternative for Notepad.

    LOOKING FOR Text Editor? You've Come At The Right Place! Editing Your text for your simplicity A Text editor for Editing Text....! Just download and install and use as an alternative for typical Notepad. This application is compiled from the Pyinstaller library so don't mind there is a vulnerability or something the antivirus program might show it as malware or trojan this happens with most of the apps compiled from the Pyinstaller library. So No worries There is not any malware or virus...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    Paperless-ng

    Paperless-ng

    A supercharged version of paperless, scan, index and archive docs

    Paperless is a simple Django application running in two parts, a Consumer (the thing that does the indexing) and a Web server (the part that lets you search & download already-indexed documents). Paper is a nightmare. Environmental issues aside, there’s no excuse for it in the 21st century. It takes up space, collects dust, doesn’t support any form of a search feature, indexing is tedious, it’s heavy and prone to damage & loss. I wrote this to make “going paperless” easier. I do not have to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13

    Dvipdfm tool for SCons

    SCons tool to cooperate with dvipdfm program

    SCons is a make replacement providing a range of enhanced features such as automated dependency generation and built in compilation cache support. SCons rule sets are Python scripts so as well as the features it provides itself SCons allows you to use the full power of Python to control compilation. This is a SCons extension (tool) which enables usage of the dvipdfm program to convert dvi files to pdf.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    openPLM - open source PLM
    open source PLM system - Product Structure management (BOM management) system and Electronic documents management or Entreprise Content Management (ECM) system
    Downloads: 14 This Week
    Last Update:
    See Project
  • 15
    uListen is a TTS(Text To Speech) application. It can TALK you the web pages, chm files, pdf files and word files and plain text files.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    POST (Python Obviously Simple Text) provides support for simple, flexible dynamic document generation in multiple output formats. Supports inputs in text or XML, outputs in HTML, PDF, RTF, LaTeX source, nroff source, postscript, and plain text.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB