Showing 20 open source projects for "pdf to text"

  • 1
    borb

    borb is a library for reading, creating and manipulating PDF files

    borb is a pure Python library for reading, writing, and manipulating PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries, and primitives (numbers, strings, booleans, etc.). This is currently a one-man project, so the focus will always be on supporting common use cases over rare ones.
    Downloads: 2 This Week
  • 2
    Papermerge

    Open Source Document Management System for Digital Archives

    ...Each user can be assigned permissions that allow only specific kinds of actions, e.g. viewing only the documents in a specific folder. OCR technology is a vital part of Papermerge. It extracts text from scanned documents and from PDF, JPEG, and TIFF files.
    Downloads: 8 This Week
  • 3
    kb

    A minimalist command line knowledge base manager

    kb is a minimalist command-line knowledge base manager that gives users a fast, organized way to collect, store, search, and retrieve notes, documents, cheatsheets, procedures, and other artifacts directly from the terminal. It was created to solve the common problem of scattered text files and reference materials on disk that are hard to search or categorize, and it exposes a simple CLI with intuitive commands for adding, viewing, editing, and deleting knowledge items. Each...
    Downloads: 0 This Week
  • 4
    Tarjamento de Dados Pessoais e Sigilosos

    Tool for redacting personal and confidential data

    Open-source desktop tool for redacting sensitive data in PDFs. It automatically detects CPF numbers, e-mail addresses, and phone numbers, and also allows manual marking. It saves the result as a non-searchable PDF, preventing recovery of the redacted data. v1.6.3: multi-level "Undo" (30 steps), thumbnail panel, centered document, zoom indicator, new phone number patterns, and a rewritten marking engine. ⚠️ Some antivirus programs may report false positives because of the PyInstaller packaging. O...
    Downloads: 0 This Week
  • 5
    Pdf_tools
    ✅ Image to PDF: Convert multiple image files into a single PDF. Supports formats: JPG, JPEG, PNG, BMP, TIFF.
    ✅ PDF Merger: Merge multiple PDF files into one. Reorder PDF files before merging.
    ✅ PDF Splitter: Split PDF files by range or into individual pages.
    ✅ Page Remover: Remove specific pages from a PDF.
    ✅ Fill & Sign: Add text and signature to a PDF.
    Downloads: 7 This Week
  • 6
    Scribus

    Powerful desktop publishing software

    Scribus is an Open Source program that brings professional page layout to Linux, BSD UNIX, Solaris, OpenIndiana, GNU/Hurd, Mac OS X, OS/2 Warp 4, eComStation, and Windows desktops with a combination of press-ready output and new approaches to page design. Underneath a modern and user-friendly interface, Scribus supports professional publishing features, such as color separations, CMYK and spot colors, ICC color management, and versatile PDF creation.
    Downloads: 13,629 This Week
  • 7
    bridgex

    Convert files like docx, xlsx, pptx, html, and more to Markdown

    ...
    - Support for multiple input formats.
    - Lightweight editing prior to saving.

    Supported Formats 📂
    Bridgex supports conversion of the following file formats:
    - PDF (.pdf)
    - Word (.docx)
    - PowerPoint (.pptx)
    - Excel (.xlsx, .xls, .csv)
    - Outlook Messages (.msg)
    - Text (.txt, .text)
    - Markdown (.md, .markdown)
    - JSON (.json, .jsonl)
    - XML (.xml)
    - RSS/Atom (.rss, .atom)
    - HTML/MHTML (.html, .htm, .mhtml)
    - ePub (.epub)
    - Compressed files (.zip)
    - Jupyter Notebooks (.ipynb)
    - Other formats supported by Markitdown

    Bridgex is not an IDE, text editor, Markdown editor, or document viewer.
    Downloads: 5 This Week
  • 8
    pdf combiner merger converter splitter

    PDF Combiner is a user-friendly, GUI-based tool built in

    PDF Combiner is a user-friendly, free and open source GUI-based tool for combining and splitting PDF files. It also handles PDF to Excel, PDF to Word, and image to PDF conversion, zipping and unzipping, and annotation. It is easy to use, supports inserting, deleting, and processing multiple files, and lets you adjust the order of files before combining.
    Downloads: 3 This Week
  • 9
    pdf-editor

    Edit your PDFs without needing a subscription or creating accounts

    ...Add a parser for the command line to do multiple commands at once e.g. merge (cut pdf1) pdf2. Tested working with Python 3.8.5. Install venv (py -3.8 -m pip install virtualenv). PDF and Word documents are binary files, which makes them much more complex than plaintext files. In addition to text, they store lots of font, color, and layout information. If you want your programs to read or write to PDFs or Word documents, you’ll need to do more than simply pass their filenames to open().
    Downloads: 5 This Week
  • 10
    PDF-Shuffler
    PDF-Shuffler is a small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface. It is a frontend for python-pyPdf.
    Downloads: 47 This Week
  • 11
    TexLexAn is an open source text analyser for Linux, able to estimate the readability and reading time, to classify and summarize texts. It has some learning abilities and accepts html, doc, pdf, ppt, odt and txt documents. Written in C and Python.
    Downloads: 0 This Week
  • 12
    openPLM - open source PLM
    Open source PLM system: a product structure management (BOM management) system and an electronic document management or Enterprise Content Management (ECM) system.
    Downloads: 9 This Week
  • 13
    Whyteboard is a painting whiteboard application for Linux and Windows that allows the annotation of PDF and PostScript documents and image files with common drawing tools.
    Downloads: 3 This Week
  • 14
    Create standard PDF resumes by adding, editing, and deleting applicant information. You can create your own templates by just creating a folder and a text file. It runs on Windows and Linux. For more info, read the included user manual.
    Downloads: 0 This Week
  • 15
    pyPdf-GUI
    pyPdf-GUI is a Python-based graphical user interface for the pure-Python PDF library pyPdf, allowing the user to easily manipulate PDF files. It can extract pages, merge several files into a single one, rotate pages in a file, extract text, etc.
    Downloads: 0 This Week
  • 16
    PyTioga is for creating figures and plots with high-quality text and graphics in PDF format. Text is processed directly by TeX (not an emulation), and the graphics cover a broad range of PDF features including images, curves, clipping, and transparency.
    Downloads: 0 This Week
  • 17
    The aim of this project is to develop a Portable Document Format (PDF) importer for OpenOffice.org Writer based on XPDF. This project was inspired by the PDF importer within KWord.
    Downloads: 0 This Week
  • 18
    Wyneken is a content-oriented text processor that makes your life as a student easier by allowing you to create and manage digital notebooks. Wyneken also allows you to create PDF presentations, letters, articles, and reports. In 2015, Wyneken may or may not work with the latest Linux distributions, but you can use it for building pdfs by pulling our docker repo indicated in the homepage field of this site.
    Downloads: 0 This Week
  • 19
    Process OpenOffice.org Writer files and transform them to PDF without installing OpenOffice.org. What is PyOpenOffice?
    * It is a class library, written in the Python language.
    * It is a platform-independent command-line utility (many abilitie
    Downloads: 0 This Week
  • 20
    Search and index text in an archive of pdf files.
    Downloads: 9 This Week