Page 3 | pdf-jp free download

Showing 305 open source projects for "pdf-jp"

View related business solutions

Python Clear Filters & Widen Search

$300 in Free Credit Towards Top Cloud Services
Build VMs, containers, AI, databases, storage—all in one place.

Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.

Get Started
Gemini 3 and 200+ AI Models on One Platform
Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build generative AI apps with Vertex AI. Switch between models without switching platforms.

Start Free
1

deepdoctection

A Repo For Document AI

DeepDoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. deepdoctection is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models but enables you to build pipelines using highly acknowledged libraries for object detection, OCR and selected NLP tasks and provides an integrated frameworks for...

Downloads: 3 This Week

Last Update: 4 days ago
See Project
2

abogen

Generate audiobooks from EPUBs, PDFs and text with captions

abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on...

Downloads: 5 This Week

Last Update: 2026-02-06
See Project
3

DB-GPT

Revolutionizing Database Interactions with Private LLM Technology

DB-GPT is an experimental open-source project that uses localized GPT large models to interact with your data and environment. With this solution, you can be assured that there is no risk of data leakage, and your data is 100% private and secure.

Downloads: 2 This Week

Last Update: 2026-03-27
See Project
4

LlamaParse

Parse files for optimal RAG

LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured, and semi-structured, to structured data (API's, PDFs, documents, SQL, etc.) Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL db providers.

Downloads: 1 This Week

Last Update: 2026-02-13
See Project
Full-stack observability with actually useful AI | Grafana Cloud
Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.

Create free account
5

ArchiveBox

Open source self-hosted web archiving

ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline. Without active preservation effort, everything on the internet eventually disappears or degrades. Archive.org does a great job as a centralized service, but saved URLs have to be public, and they can't save every type of content. ArchiveBox is an open source tool that lets organizations & individuals archive both public & private web content while retaining control over their data....

Downloads: 3 This Week

Last Update: 2024-12-15
See Project
6

Tarjamento de Dados Pessoais e Sigilosos

Ferramenta de Tarjamento de Dados Pessoais e Sigilosos

...Com a Ferramenta de Tarjamento de Dados Pessoais e Sigilosos, o usuário consegue selecionar dados que devem ser tarjados, facilitando seu trabalho. Ao final do procedimento, a ferramenta salva o arquivo como um PDF não pesquisável, mantendo a proteção do dado que foi tarjado. Desenvolvido por: Fábio Silva de Moraes (Linkedin @fsmoraesrj) e Alexandre Jammel (Linkedin @alexandre-jammel)

Downloads: 0 This Week

Last Update: 2024-06-12
See Project
7

Perf Book

The book "Performance Analysis and Tuning on Modern CPU"

This project is a practical guide to performance analysis and tuning on modern CPUs, bridging microarchitecture details with hands-on profiling. It explains how caches, TLBs, prefetchers, branch predictors, and out-of-order execution influence real program speed, then connects those concepts to concrete optimization strategies. Readers learn how to design trustworthy benchmarks, avoid measurement traps (warmup, turbo, frequency scaling), and interpret hardware performance counters. The book...

Downloads: 2 This Week

Last Update: 2025-09-23
See Project
8

myGPTReader

AI Slack bot for reading, summarizing, and chatting with content

myGPTReader is an AI-powered Slack bot designed to help users read, summarize, and interact with various types of digital content through conversational interfaces. It enables users to quickly understand web pages, documents, and even video content by transforming them into interactive discussions rather than static reading experiences. myGPTReader supports a wide range of file formats, including eBooks, PDFs, and text-based documents, making it flexible for both casual and professional use...

Downloads: 1 This Week

Last Update: 3 days ago
See Project
9

NeMo Retriever Library

Document content and metadata extraction microservice

NeMo Retriever Library is a scalable microservice framework designed for extracting, structuring, and enriching content from documents to support downstream generative AI applications. It processes various document types by splitting them into components such as text, tables, charts, and images, and then applies OCR and contextual analysis to convert them into structured data formats. The system is built on NVIDIA NIM microservices, enabling high-performance parallel processing and efficient...

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
Fully Managed MySQL, PostgreSQL, and SQL Server
Automatic backups, patching, replication, and failover. Focus on your app, not your database.

Cloud SQL handles your database ops end to end, so you can focus on your app.

Try Free
10

Mercury

Convert Python notebook to web app and share with non-technical users

...Mercury is a perfect tool to convert Python notebook to interactive web application and share with non-programmers. You define interactive widgets for your notebook with the YAML header. Your users can change the widgets values, execute the notebook and save result (as PDF or html file). You can hide your code to not scare your (non-coding) collaborators. Easily deploy to any server. Mercury is dual-licensed. Looking for dedicated support, a commercial-friendly license, and more features? The Mercury Pro is for you.

Downloads: 0 This Week

Last Update: 2026-01-15
See Project
11

PaperQA2

High accuracy RAG for answering questions from scientific documents

PaperQA2 is a package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our recent 2024 paper to see examples of PaperQA2's superhuman performance in scientific tasks like question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata - including citation counts and a retraction check, then parse and cache PDFs into a...

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
12

word_cloud

A little word cloud generator in Python

...Before installing a compiler, report an issue describing the version of python and operating system being used. The wordcloud_cli tool can be used to generate word clouds directly from the command-line. If you're dealing with PDF files, then pdftotext, included by default with many Linux distribution, comes in handy. Use wordcloud_cli --help so see all available options. The wordcloud library is MIT licenced, but contains DroidSansMono.ttf, a true type font by Google, that is apache licensed.

Downloads: 0 This Week

Last Update: 2026-01-22
See Project
13

DocsGPT

Private AI platform for agents, enterprise search and RAG pipelines

DocsGPT is an open-source AI platform for deploying private RAG pipelines, AI agents, and enterprise search on your own infrastructure. Connect any data source (PDFs, DOCX, CSV, Excel, HTML, audio, GitHub, databases, URLs) and get accurate, hallucination-free answers with source citations. Choose your LLM: OpenAI, Anthropic, Google Gemini, or local models. Works with Qdrant, MongoDB, and Elasticsearch and more. Deploy via Docker or Kubernetes with full data sovereignty. Build...

Downloads: 3 This Week

Last Update: 1 day ago
See Project
14

text-extract-api

Document (PDF, Word, PPTX ...) extraction and parse API

text-extract-api is an open-source service designed to extract readable text from a wide variety of document formats through a simple API interface. The project focuses on converting complex files such as PDFs, images, scanned documents, and office files into structured plain text that can be processed by downstream applications or language models. Instead of requiring developers to integrate multiple document parsing libraries individually, the system centralizes text extraction...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
15

kb

A minimalist command line knowledge base manager

kb is a minimalist command-line knowledge base manager that gives users a fast, organized way to collect, store, search, and retrieve notes, documents, cheatsheets, procedures, and other artifacts directly from the terminal. It was created to solve the common problem of having scattered text files or reference materials on disk that are hard to search or categorize, and it surfaces a simple CLI interface with intuitive commands for adding, viewing, editing, and deleting knowledge items. Each...

Downloads: 0 This Week

Last Update: 2026-02-16
See Project
16

ArXiv MCP Server

A Model Context Protocol server for searching and analyzing arXiv

arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and...

Downloads: 0 This Week

Last Update: 1 day ago
See Project
17

PdfBooklet

PdfBooklet is a Python Gtk application which allows to make books or booklets from existing pdf files. It can also adjust margins, rotate, scale, merge files or extract pages.

15 Reviews

Downloads: 203 This Week

Last Update: 2025-12-07
See Project
18

Controllable-RAG-Agent

This repository provides an advanced RAG

Controllable-RAG-Agent is an advanced Retrieval-Augmented Generation (RAG) system designed specifically for complex, multi-step question answering over your own documents. Instead of relying solely on simple semantic search, it builds a deterministic control graph that acts as the “brain” of the agent, orchestrating planning, retrieval, reasoning, and verification across many steps. The pipeline ingests PDFs, splits them into chapters, cleans and preprocesses text, then constructs vector...

Downloads: 0 This Week

Last Update: 2025-11-13
See Project
19

LLMStack

No-code multi-agent framework to build LLM Agents, workflows

LLMStack is a no-code platform for building generative AI agents, workflows and chatbots, connecting them to your data and business processes. Build tailor-made generative AI agents, applications and chatbots that cater to your unique needs by chaining multiple LLMs. Seamlessly integrate your own data, internal tools and GPT-powered models without any coding experience using LLMStack's no-code builder. Trigger your AI chains from Slack or Discord. Deploy to the cloud or on-premise.

Downloads: 0 This Week

Last Update: 2024-11-05
See Project
20

Jina

Build cross-modal and multimodal applications on the cloud

...Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP, GraphQL protocols with TLS. Intuitive design pattern for high-performance microservices. Seamless Docker container integration: sharing, exploring, sandboxing, versioning and dependency control via Jina Hub. Fast deployment to Kubernetes, Docker Compose and Jina Cloud. ...

Downloads: 0 This Week

Last Update: 2024-11-12
See Project
21

pdf2wordx

Convertir "pdf" a documentos ".docx"

`pip install pdf2wordx` Este proyecto usa "Tkinter" V8.6 y usa "pdf2docx" V0.5.8 para realizar las conversiones de PDF a DOCX. El programa es fácil de usar, solo se dene seleccionar el archivo PDF, Bucar la carpeta donde se guardará el documento DOCX, finalmente de click en el botón "Convertir", el documento se convertirá y guardará en la ruta especificada.

Downloads: 8 This Week

Last Update: 2025-04-28
See Project
22

LangChain Extract

Did you say you like data?

LangChain Extract is an open-source reference application designed to demonstrate how large language models can be used to extract structured data from unstructured text and document files. The project implements a lightweight web service that allows developers to define extraction schemas and apply them to various sources such as plain text, HTML, or PDF documents. Built using FastAPI and the LangChain framework, the application exposes a REST API that can process documents and return structured outputs that match user-defined JSON schemas. Developers can create reusable “extractors” that define what type of information should be pulled from a document, along with example prompts that improve extraction quality through in-context learning.

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
23

pdf combiner merger converter splitter

PDF Combiner is a user-friendly, GUI-based tool built in

PDF Combiner is a user-friendly open source free to use, GUI-based tool for combining, pdf to excel, pdf to word, image to pdf, zip, unzip annotate and splitting PDF files. It is easy to use, supports multiple file insert and delete and process, and allows you to adjust the order of files before combining.

1 Review

Downloads: 3 This Week

Last Update: 2024-05-03
See Project
24

patchworkPDF

An Open Source PDF editor for Windows

An Open Source PDF editor for Windows. Written in Python. Includes source and can be built for MAC or Linux. Author: Gareth Finch

Downloads: 5 This Week

Last Update: 20 hours ago
See Project
25

Create Index from PDF

PDF Indexing Script: Searches PDF for words, records page numbers

This Python script helps automate the process of creating an index for a PDF document. It reads a list of words from a text file, searches through each page of the PDF, and records the page numbers where each word appears. The script accounts for the first 24 pages of the PDF that use Roman numerals (i-xxiv) and adjusts the page numbers accordingly. It is designed to be case-insensitive, ensuring that variations in capitalization do not affect the search results. ...

Downloads: 0 This Week

Last Update: 2025-03-03
See Project