pdf to free download - SourceForge

Showing 129 open source projects for "pdf to"


Filters: Artificial Intelligence, Linux

1

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files

OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.

Downloads: 109 This Week

Last Update: 2 days ago
2

MinerU

A high-quality tool for converting PDF to Markdown and JSON

MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.

Downloads: 28 This Week

Last Update: 6 hours ago
3

AI PDF Chatbot LangChain

AI PDF chatbot agent built with LangChain & LangGraph

AI PDF Chatbot LangChain is a full-stack template for building conversational agents that can ingest and answer questions about PDF documents. The project demonstrates how to combine LangChain and LangGraph with a vector database to enable retrieval-augmented question answering over user-provided files. It includes both frontend and backend components, making it suitable as a production starting point rather than just a minimal demo.

Downloads: 0 This Week

Last Update: 2026-03-27
4

GROBID

Machine learning software for extracting information from scholarly documents

...The extraction covers the usual bibliographic information (e.g., title, abstract, authors, affiliations, and keywords). Reference extraction and parsing from PDF articles reaches an F1-score of around 0.87 on an independent PubMed Central set of 1,943 PDFs containing 90,125 references, and around 0.89 on a similar bioRxiv set of 2,000 PDFs (using the deep learning citation model). All the usual publication metadata are covered, including DOI and PMID.
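For readers unfamiliar with the metric, the F1-score quoted above is, in its standard general definition, the harmonic mean of precision P and recall R, computed from true positives (TP), false positives (FP), and false negatives (FN). This is the textbook formula, not a description of GROBID's exact evaluation procedure:

```latex
\[
  F_1 = \frac{2PR}{P + R}, \qquad
  P = \frac{TP}{TP + FP}, \qquad
  R = \frac{TP}{TP + FN}
\]
```

For example, if precision and recall were both 0.87, then F1 = (2 × 0.87 × 0.87) / (0.87 + 0.87) = 0.87; a low value in either one pulls F1 down.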

Downloads: 6 This Week

Last Update: 2026-04-07
5

Tesseract OCR

Open Source OCR Engine

...Tesseract can recognize over 100 languages out of the box and can be trained to recognize other languages. It supports various output formats, including plain text, HTML, PDF, and more, and it has Unicode (UTF-8) support.

5 Reviews

Downloads: 10,111 This Week

Last Update: 2025-12-26
6

Umi-OCR

OCR software, free and offline

...It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. Users can interact with Umi-OCR through a graphical interface, command-line options, or HTTP interfaces, making it adaptable to both casual desktop usage and programmatic automation. Because the project is open source, developers can inspect, modify, and extend its capabilities, and plugins allow for different recognition engines or enhanced features.

Downloads: 74 This Week

Last Update: 2026-01-15
7

Local-NotebookLM

Google's NotebookLM, but local

Local-NotebookLM is a local AI tool for turning PDF documents into generated audio content. It works like a self-hosted alternative to NotebookLM-style document-to-audio workflows. The system extracts and processes PDF text, sends the content through an LLM, and converts the result into speech with configurable voices. Users can generate podcasts, summaries, interviews, lectures, debates, tutorials, news reports, executive briefs, and other formats.

Downloads: 7 This Week

Last Update: 4 days ago
8

MarkPDFDown

A high-quality PDF to Markdown tool based on large language models

MarkPDFDown is an open-source document processing tool designed to convert PDF files into structured Markdown output that can be easily used for documentation, content pipelines, and AI processing workflows. The project focuses on extracting text, formatting, and structural information from complex PDF documents and transforming that information into clean Markdown that preserves the original hierarchy of headings, paragraphs, tables, and lists.

Downloads: 2 This Week

Last Update: 2026-03-06
9

Unlimited OCR Works

Welcome the Era of One-shot Long-horizon Parsing

...It is designed to push OCR beyond short, isolated image recognition and into longer document understanding workflows. The project supports single-image parsing as well as multi-page and PDF-style parsing by converting pages into images. It provides inference paths for Hugging Face Transformers, vLLM, and SGLang, which gives users several deployment options. The repository also includes example code for batch inference over image folders or PDF inputs. Overall, it is useful for researchers and developers who need advanced OCR, long-document parsing, and model-based extraction from complex visual documents.

Downloads: 4 This Week

Last Update: 4 days ago
10

Ollama RAG Chatbot

Chat with multiple PDFs locally

...The main value of the project is its ability to process multiple PDF inputs and turn them into a question-answering workflow centered on document retrieval. With Docker support, script-based setup, optional ngrok exposure, and a clear local run path, it serves as a compact starter project for people who want a hands-on, self-hosted PDF chat system.

Downloads: 0 This Week

Last Update: 2026-04-20
11

Scribe.js

JavaScript OCR and text extraction for images and PDFs

Scribe.js is a JavaScript library that provides Optical Character Recognition (OCR) and text extraction capabilities for both images and PDF documents, aimed at developers who want to build OCR features directly into their applications. The library can take image files (such as PNG or JPEG) and recognize the text they contain, and it can also extract text from PDF files that either already contain text or are image-based scans, using modern web standards and WebAssembly under the hood. ...

Downloads: 0 This Week

Last Update: 2026-05-27
12

Paperless-ngx

A community-supported supercharged version of paperless

Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.

Downloads: 14 This Week

Last Update: 2026-04-27
13

Docling

Get your documents ready for gen AI

...The project focuses on converting and parsing many document formats into a unified structured representation that downstream systems can easily consume. It supports advanced PDF understanding, including layout detection, table extraction, and reading order analysis, enabling high-fidelity document intelligence pipelines. Docling is designed to run efficiently on commodity hardware and can be used both as a Python API and a command-line tool. Its modular architecture allows developers to extend functionality and integrate specialized models for tasks such as OCR and audio transcription. ...

Downloads: 4 This Week

Last Update: 2 days ago
14

canvas-editor

Canvas-based WYSIWYG rich text editor with advanced layout tools

...Its architecture is modular, allowing developers to extend functionality through plugins, custom commands, and event hooks. It includes support for page-based layouts with headers, footers, pagination, and print-ready output, including PDF generation. It also provides interactive components such as form controls and context menus, making it suitable for building complex document editing systems.

Downloads: 8 This Week

Last Update: 10 hours ago
15

MetaScreener

AI-powered tool for efficient abstract and PDF screening

...Instead of manually reviewing hundreds or thousands of documents, researchers can use MetaScreener to apply machine learning techniques that assist with classification and prioritization of candidate papers. The platform can analyze both abstracts and full PDF documents, enabling automated filtering based on research criteria defined by the user. By incorporating natural language processing techniques, the system can identify potentially relevant studies and reduce the workload associated with manual screening.

Downloads: 0 This Week

Last Update: 2026-05-08
16

Open CoDesign

Open-source Claude Design alternative

Open CoDesign is an open-source, desktop AI design tool that transforms natural language prompts into fully structured design artifacts such as prototypes, slide decks, and marketing assets. It is designed as a local-first alternative to cloud-based design tools, allowing users to run everything on their own machine while bringing their own AI model and API keys. The system supports multiple model providers and integrates directly with existing developer tools, enabling seamless workflows...

Downloads: 127 This Week

Last Update: 2026-05-23
17

Desktop Commander MCP

AI-powered MCP server for desktop file and terminal automation

...It allows users to run terminal commands with streaming output, manage long-running processes, and even execute code in memory without saving files. It also supports working with structured and document formats such as Excel, PDF, and DOCX, enabling AI to read, modify, and generate these files directly.

Downloads: 6 This Week

Last Update: 7 days ago
18

Semantra

Multi-tool for semantic search

Semantra is an open-source semantic search tool designed to help users explore large collections of documents by meaning rather than simple keyword matching. The software analyzes text and PDF documents stored locally and creates embeddings that allow queries to retrieve results based on conceptual similarity. It is primarily intended for individuals who need to extract insights from large document collections, including researchers, journalists, students, and historians. The system runs from the command line and automatically launches a local web interface where users can perform interactive searches and examine document passages related to a query. ...
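Searching "by meaning" usually comes down to comparing embedding vectors. The sketch below shows the general idea with cosine similarity; it is not Semantra's code or API, and the three-dimensional vectors are hand-written placeholders standing in for embeddings that a real model would produce.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passages, top_k=2):
    """Return the top_k (passage, score) pairs, most similar first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in passages]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy "embeddings": hand-written vectors, not output of a real model.
passages = [
    ("The treaty ended the war.", [0.9, 0.1, 0.0]),
    ("Peace talks began in spring.", [0.8, 0.3, 0.1]),
    ("The recipe calls for flour.", [0.0, 0.2, 0.9]),
]
query = [0.85, 0.2, 0.05]  # placeholder for the embedding of a query like "end of the conflict"

for text, score in rank_passages(query, passages):
    print(f"{score:.3f}  {text}")
```

Neither top result shares a keyword with the query text it stands in for; they rank highly only because their vectors point in a similar direction, which is what distinguishes semantic search from keyword matching.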

Downloads: 3 This Week

Last Update: 2026-06-21
19

Easy DataSet

A powerful tool for creating datasets for LLM fine-tuning

Easy DataSet is a comprehensive open-source tool designed to make creating high-quality datasets for large language model fine-tuning, retrieval-augmented generation (RAG), and evaluation as easy and automated as possible by providing intuitive interfaces and powerful parsing, segmentation, and labeling tools. It supports ingesting domain-specific documents in a wide range of formats — including PDF, Markdown, DOCX, EPUB, and plain text — and can intelligently segment, clean, and structure content into rich datasets tailored for downstream LLM training needs. The system includes automated question-generation capabilities, hierarchical label trees, and answer generation pipelines that use LLM APIs to produce coherent paired data with customizable templates. ...
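Segmentation is typically the first step before question generation. The description above says Easy DataSet segments content "intelligently"; the sketch below swaps in the simplest common baseline instead, fixed-size character windows with overlap, purely to illustrate the idea. The function name and default sizes are hypothetical and are not taken from the project.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text cut at one
    boundary still appears whole in the neighbouring chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Real pipelines usually split on sentence, paragraph, or heading boundaries rather than raw character counts, but the overlap trade-off is the same: more overlap preserves context across chunk edges at the cost of duplicated text.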

Downloads: 3 This Week

Last Update: 2026-04-10
20

ChatGPT Academic

ChatGPT extension for scientific research work

ChatGPT extension for scientific research work, with an experience specially optimized for polishing academic papers. It supports custom shortcut buttons, custom function plug-ins, Markdown table display, dual display of TeX formulas, and complete code display. Newer features include project-tree analysis of local Python/C++/Go projects, self-translation of project source code, batch summarization of PDF and Word documents, and full-text translation of PDF papers. All buttons are generated dynamically by reading functional.py, so you can add custom functions at will and stop copying and pasting through the clipboard. Markdown tables output by GPT are supported. If the output contains a formula, it is displayed in both TeX source and rendered form at the same time, which makes it convenient to copy and read.

Downloads: 4 This Week

Last Update: 2024-12-19
21

Hiring Agent

AI agent to evaluate and score resumes

Hiring Agent is an AI-powered resume evaluation pipeline for screening technical candidates. It reads a resume PDF and converts the content into Markdown-like text. It then uses a local or hosted language model to extract structured candidate information into sectioned JSON. The system can enrich that resume data with GitHub profile and repository signals when a profile is available. After the data is collected, it produces an explainable evaluation with category scores, supporting evidence, bonus points, and deductions. ...

Downloads: 0 This Week

Last Update: 2026-06-24
22

Magic Resume

Free online AI resume editor

...It supports customizable themes and layouts, enabling users to tailor the design to different industries or personal branding preferences. Magic Resume also includes export functionality for generating polished PDF documents directly in the browser, making it practical for job applications.

Downloads: 0 This Week

Last Update: 2026-06-01
23

Papermerge

Open Source Document Management System for Digital Archives

...Each user can be assigned different permissions to perform only specific kinds of actions, e.g., viewing only the documents in a specific folder. OCR technology is a vital part of Papermerge: it extracts text from scanned documents in PDF, JPEG, and TIFF formats.

Downloads: 5 This Week

Last Update: 2025-07-24
24

AnythingLLM

The all-in-one Desktop & Docker AI application with full RAG and AI

A full-stack application that enables you to turn any document, resource, or piece of content into context that any LLM can use as a reference during chats. The application lets you pick and choose which LLM or vector database to use, and it supports multi-user management and permissions. AnythingLLM is a full-stack application where you can use commercial off-the-shelf LLMs or popular open-source LLMs and vector database solutions to build a private ChatGPT with no...

Downloads: 68 This Week

Last Update: 2026-06-25
25

NAPS2 - Not Another PDF Scanner

Scan documents to PDF and other file types, as simply as possible.

...NAPS2 is a document scanning application with a focus on simplicity and ease of use. Scan your documents from WIA- and TWAIN-compatible scanners, organize the pages as you like, and save them as PDF, TIFF, JPEG, PNG, and other file formats. Available on Windows, Mac, and Linux. NAPS2 is currently available in over 40 different languages. Want to see NAPS2 in your preferred language? Help translate! See the wiki for more details.

149 Reviews

Downloads: 712 This Week

Last Update: 2026-01-10