Showing 202 open source projects for "apache pdf"

  • 1
    NeMo Retriever Library

    Document content and metadata extraction microservice

    NeMo Retriever Library is a scalable microservice framework designed for extracting, structuring, and enriching content from documents to support downstream generative AI applications. It processes various document types by splitting them into components such as text, tables, charts, and images, and then applies OCR and contextual analysis to convert them into structured data formats. The system is built on NVIDIA NIM microservices, enabling high-performance parallel processing and efficient...
    Downloads: 2 This Week
  • 2
    Extractous

    Fast and efficient unstructured data extraction

    Extractous is a Rust-based unstructured data extraction library focused on fast local parsing of documents and other content-heavy files. Its purpose is to extract text and metadata efficiently from formats such as PDF, Word, HTML, email archives, images, and more, without depending on external APIs or separate parsing servers. The project emphasizes performance and low memory usage, and its maintainers describe it as a local-first alternative to heavier extraction stacks. For broader format support, the system combines its Rust core with ahead-of-time compiled Apache Tika shared libraries, which allows it to extend parsing coverage while still avoiding traditional server-based overhead. ...
    Downloads: 0 This Week
  • 3
    Outline

    Fastest wiki and knowledge base for growing teams

    A modern team knowledge base for your internal documentation, product specs, support answers, meeting notes, onboarding, & more. An intuitive editor with markdown support, slash commands, rich embeds, and more. Beautiful documents, without even trying. Search and share documents without ever leaving your team chat. Nest documents in a hierarchy, automatically build a network of backlinks, and search across everything. Onboard new team members easily through internal guides, resources, and...
    Downloads: 3 This Week
  • 4
    QPDF

    PDF transformation/manipulation program + library

    QPDF is a C++ library and set of programs that inspect and manipulate the structure of PDF files. It can encrypt and linearize files, expose the internals of a PDF file, and do many other operations useful to end users and PDF developers.
    Downloads: 1,022 This Week
  • 5
    PRDownloader

    A file downloader library for Android with pause and resume support

    A file downloader library for Android with pause and resume support. PRDownloader can be used to download any type of file, such as images, videos, PDFs, and APKs. The library supports pausing and resuming a download in progress, handles large files, and offers a simple interface for making download requests. The status of a download can be checked with its download ID. PRDownloader gives callbacks for everything like onProgress, onCancel, onStart,...
    Downloads: 0 This Week
  • 6
    ArXiv MCP Server

    A Model Context Protocol server for searching and analyzing arXiv

    arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and...
    Downloads: 1 This Week
  • 7
    picocli

    Framework for building GraalVM-enabled command line apps

    Picocli is a one-file framework for creating Java command-line applications with almost zero code. It supports a variety of command-line syntax styles including POSIX, GNU, MS-DOS and more. It generates highly customizable usage help messages that use ANSI colors and styles to contrast important elements and reduce the cognitive load on the user. Picocli-based applications can have command line TAB completion showing available options, option parameters, and subcommands, for any level of...
    Downloads: 2 This Week
  • 8
    PaperQA2

    High accuracy RAG for answering questions from scientific documents

    PaperQA2 is a package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our recent 2024 paper to see examples of PaperQA2's superhuman performance in scientific tasks like question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata - including citation counts and a retraction check, then parse and cache PDFs into a...
    Downloads: 0 This Week
  • 9
    word_cloud

    A little word cloud generator in Python

    ...Before installing a compiler, report an issue describing the version of Python and the operating system being used. The wordcloud_cli tool can be used to generate word clouds directly from the command line. If you're dealing with PDF files, then pdftotext, included by default with many Linux distributions, comes in handy. Use wordcloud_cli --help to see all available options. The wordcloud library is MIT licensed, but contains DroidSansMono.ttf, a TrueType font by Google, that is Apache licensed.
    Downloads: 2 This Week
  • 10
    Controllable-RAG-Agent

    This repository provides an advanced RAG

    Controllable-RAG-Agent is an advanced Retrieval-Augmented Generation (RAG) system designed specifically for complex, multi-step question answering over your own documents. Instead of relying solely on simple semantic search, it builds a deterministic control graph that acts as the “brain” of the agent, orchestrating planning, retrieval, reasoning, and verification across many steps. The pipeline ingests PDFs, splits them into chapters, cleans and preprocesses text, then constructs vector...
    Downloads: 0 This Week
  • 11
    Jina

    Build cross-modal and multimodal applications on the cloud

    Jina is a framework that empowers anyone to build cross-modal and multi-modal applications on the cloud. It uplifts a PoC into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP,...
    Downloads: 0 This Week
  • 12
    Hypernomicon

    Hypertext-infused philosophy personal database software

    Hypernomicon is a personal productivity/database application for researchers that combines structured note-taking, mind-mapping, management of files (e.g., PDFs) and folders, and reference management into an integrated environment that organizes all of the above into semantic networks or hierarchies in terms of debates, positions, arguments, labels, terminology/concepts, and user-defined keywords by means of database relations and automatically generated hyperlinks (hence ‘Hyper’ in the...
    Downloads: 29 This Week
  • 13
    Rolemaster Office
    PC and NPC character generator for the Rolemaster RMFRP roleplaying system (from Iron Crown Enterprises). The program calculates all bonuses and generates a nice PDF character sheet that also contains additional pages. The program does not provide in-game support.
    Downloads: 11 This Week
  • 14

    FOray

    Modular XSL-FO Implementation for Java.

    FOray is an open-source XSL-FO publishing system that is suitable for converting XML content into PDF and other document formats. Although not yet fully conformant with the XSL-FO standard, it is very useful for many applications.
    Downloads: 2 This Week
  • 15
    DataExtract

    Extracts Data Types Like Email Addresses From All Kinds Of Files

    DataExtract is a program that scans files of many different types (text, PDF, Word, Excel, etc.) and extracts structured patterns, such as email addresses and phone numbers, from them.
    Downloads: 1 This Week
  • 16
    Provides optical character recognition (OCR) solutions for Vietnamese language.
    Downloads: 188 This Week
  • 17
    neoHort iText&OpenPdf&JExcel&ApachePOI

    neoHort: Java PDF & XLSX runtime builder based on iText, JExcelAPI & Apache POI

    neoHort: Java PDF & XLSX runtime builder, based on the iText 2.1.7, OpenPdf, JExcelAPI, and POI libraries. XML-based input source with integrated WebJava environment objects. Includes dynamic tag structures. Demo: https://neohort.herokuapp.com/ https://neohort4ape.appspot.com GitHub: https://github.com/surban1974/neohort neoHort5 migrated to https://sourceforge.net/p/neohort5 Maven: https://github.com/surban1974/neohort/blob/master/README.md
    Downloads: 2 This Week
  • 18
    ant4docbook

    ANT4DOCBOOK is an ANT task for DOCBOOK

    ANT4DOCBOOK is an ANT task for DOCBOOK, a semantic markup language for technical documentation.
    Downloads: 3 This Week
  • 19
    Multiuser HylaFAX PHP/MySQL Web interface for viewing faxes online, downloading & emailing in PDF format, and categorizing & archiving all sent and received faxes.
    Downloads: 38 This Week
  • 20
    InventarVerwaltung

    A simple, fast, local inventory manager for the desktop

    ... * Free to use (private & commercial) * Redistribution permitted (unmodified) * Modification and sale not permitted 📄 For details see: `LICENSE_DE.txt, LICENSE_EN.txt` ## ⚙️ System requirements * Windows / Linux ## 📚 Third-party components This software uses the following libraries: * FlatLaf (Apache 2.0) * Apache POI (Apache 2.0) * SQLite JDBC (Apache 2.0) * and others 📄 For details see: `THIRD_PARTY_LICENSES.txt`
    Downloads: 1 This Week
  • 21
    WKFsuite

    WKFsuite - Quick and easy leave and vacation management

    # ITALIAN ## Main features - Management of company leave and vacation - Local database: data stays within the company - Access from PC and smartphone over the company Wi-Fi - Multi-user: Admin, Manager, Employee ## FREE version - Complete leave management system - Dashboard and leave calendar - Requests with approval/rejection - PDF export ## PRO version (€20 one-time) - Automatic approval/rejection emails - Advanced analytics charts - Priority support :...
    Downloads: 0 This Week
  • 22
    C# ECG Toolkit

    ECG Toolkit support for: SCP-ECG, DICOM, HL7 aECG, ISHNE & MUSE-XML

    C# ECG Toolkit is an open source software toolkit to convert, view and print electrocardiograms. The toolkit is developed using C# .NET Framework 2.0 and later (code also supports netstandard2.0). Support for ECG formats: SCP-ECG, DICOM, HL7 aECG, ISHNE, MUSE-XML and OmronECG.
    Downloads: 19 This Week
  • 23
    crystal-facet-uml

    Create consistent UML diagrams

    As a software architect, you create a set of diagrams describing use cases, requirements, structural views, and behavioral and deployment views. crystal_facet_uml keeps element names and element hierarchies consistent. It exports diagrams in SVG, PDF, PS, and PNG formats for use in text processing systems such as DocBook, HTML, and LaTeX. The tool runs on your local PC and is based on glib, gdk, gtk, cairo, pango, and sqlite.
    Downloads: 10 This Week
  • 24
    MyBox

    Easy tools for PDF, image, file, network, data, and media

    javafx-desktop-apps pdf image ocr icc barcode color-palette text bytes markdown html archive compress digest video audio editor converter media https://github.com/Mararsh/MyBox Self-contained packages require neither a Java environment nor installation. JAR packages require Java 16 or higher.
    Downloads: 0 This Week
  • 25
    BimmerLink Log Analysis

    BMW / MINI Bimmerlink Log Visualizer

    What is Bimmerlink? https://www.bimmerlink.app/ Bimmerlink is an application that reads real-time sensor data and vehicle telemetry from BMW and MINI cars through the OBD port. It allows users to capture time-series logs directly from the vehicle's electronic control units. In this project, I aimed to visualize the log files generated by Bimmerlink (exported as .csv) and convert them into a structured PDF report with charts and insights. Disclaimer This is a small utility...
    Downloads: 0 This Week