pdf metadata free download

Showing 20 open source projects for "pdf metadata"

1

MinerU

A high-quality tool for converting PDFs to Markdown and JSON

MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It uses OCR and layout analysis to preserve semantic structure and metadata, which makes it well suited to research and data science workflows.

Downloads: 12 This Week

Last Update: 2026-06-18
See Project
2

PyPDF

A pure-python PDF library capable of splitting, merging, cropping

pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s the actively maintained successor to PyPDF2, with improved performance, compatibility, and support for modern PDF standards. pypdf is suitable for both automation scripts and full-featured applications, and it handles PDFs without requiring external dependencies.

Downloads: 15 This Week

Last Update: 2 days ago
See Project
3

PaperQA2

High accuracy RAG for answering questions from scientific documents

PaperQA2 is a package for high-accuracy retrieval-augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our recent 2024 paper for examples of PaperQA2's superhuman performance in scientific tasks such as question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata (including citation counts and a retraction check), then parse and cache PDFs into a...

Downloads: 1 This Week

Last Update: 2026-03-18
See Project
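The retrieval half of RAG can be illustrated in a few lines. The sketch below is not PaperQA2's code or API: it scores a few made-up text chunks against a question using plain term-frequency cosine similarity from the standard library, whereas a production system such as PaperQA2 is far more elaborate. `tokenize`, `cosine`, and `retrieve` are hypothetical helper names.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question, best first."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


# Illustrative chunks, not taken from any real paper.
chunks = [
    "The protein folds into a beta sheet at low temperature.",
    "Citation counts were retrieved from an external index.",
    "The retraction check flagged no papers in this corpus.",
]
print(retrieve("Which papers were flagged by the retraction check?", chunks, k=1))
```

The retrieved chunks would then be handed to a language model as context for the answer; that generation step is omitted here.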
4

Papermerge

Open Source Document Management System for Digital Archives

...Instead of having piles of paper documents all over your desk, office, or drawers, you can quickly scan them and configure your scanner to upload directly to Papermerge DMS. Store, organize, and index scanned documents in PDF, JPEG, and TIFF formats. Instantly find relevant information using full-text, tag, and metadata-based search. Papermerge is free and open-source software, which means that transparency is the core value of our software development. Source code can be reviewed and improved by anyone from anywhere. Papermerge supports multiple users. ...

Downloads: 10 This Week

Last Update: 2025-07-24
See Project
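Combining full-text, tag, and metadata search can be sketched with an inverted index plus simple filters. This is a minimal in-memory illustration, not Papermerge's implementation; the `Document` class, `build_index`, and `search` are hypothetical names, and the OCR text is assumed to have been extracted already.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: int
    text: str  # e.g. text recovered by OCR from a scanned page
    tags: set[str] = field(default_factory=set)
    metadata: dict[str, str] = field(default_factory=dict)


def build_index(docs: list[Document]) -> dict[str, set[int]]:
    """Inverted index: each lowercase word maps to the ids of documents containing it."""
    index: dict[str, set[int]] = {}
    for doc in docs:
        for word in re.findall(r"\w+", doc.text.lower()):
            index.setdefault(word, set()).add(doc.doc_id)
    return index


def search(docs, index, words=(), tags=(), **metadata) -> list[int]:
    """Ids of documents containing all words, carrying all tags, and matching all metadata."""
    by_id = {doc.doc_id: doc for doc in docs}
    candidates = set(by_id)
    for word in words:
        candidates &= index.get(word.lower(), set())
    return sorted(
        doc_id for doc_id in candidates
        if set(tags) <= by_id[doc_id].tags
        and all(by_id[doc_id].metadata.get(key) == value for key, value in metadata.items())
    )


docs = [
    Document(1, "Electricity invoice for March", {"invoice"}, {"year": "2024"}),
    Document(2, "Insurance contract renewal", {"contract"}, {"year": "2024"}),
    Document(3, "Electricity invoice for April", {"invoice"}, {"year": "2023"}),
]
index = build_index(docs)
print(search(docs, index, words=["invoice"], tags=["invoice"], year="2024"))  # [1]
```

Word lookups narrow the candidate set first, so the tag and metadata checks only run on documents that already contain every search word.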
5

shuyuan

Reading book source

...It likely supports different input formats (text, HTML, PDF), and may integrate optional translation or text normalization tools.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
6

kb

A minimalist command line knowledge base manager

kb is a minimalist command-line knowledge base manager that gives users a fast, organized way to collect, store, search, and retrieve notes, documents, cheatsheets, procedures, and other artifacts directly from the terminal. It was created to solve the common problem of having scattered text files or reference materials on disk that are hard to search or categorize, and it surfaces a simple CLI interface with intuitive commands for adding, viewing, editing, and deleting knowledge items. Each...

Downloads: 0 This Week

Last Update: 2026-02-16
See Project
7

ArXiv MCP Server

A Model Context Protocol server for searching and analyzing arXiv

arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and...

Downloads: 0 This Week

Last Update: 2026-04-26
See Project
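A search tool like this one can wrap arXiv's public query API, which returns results as an Atom XML feed. The sketch below is not arxiv-mcp-server's code: it assumes the documented `export.arxiv.org/api/query` endpoint with its `search_query`, `start`, and `max_results` parameters, and it only builds the URL and parses a hand-written sample feed, so it runs without network access. `build_query_url` and `parse_entries` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

API_URL = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"


def build_query_url(query: str, start: int = 0, max_results: int = 10) -> str:
    """Build an arXiv API search URL; fetching it returns an Atom XML feed."""
    params = {"search_query": query, "start": start, "max_results": max_results}
    return f"{API_URL}?{urlencode(params)}"


def parse_entries(feed_xml: str) -> list[dict[str, str]]:
    """Extract id, title, and abstract (summary) from each Atom <entry>."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        entries.append({
            "id": entry.findtext(f"{ATOM}id", default="").strip(),
            # Titles can contain line breaks; collapse all whitespace runs.
            "title": " ".join(entry.findtext(f"{ATOM}title", default="").split()),
            "summary": entry.findtext(f"{ATOM}summary", default="").strip(),
        })
    return entries


# Hand-written, illustrative feed (not real arXiv output) to exercise the parser.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>An Illustrative
      Title</title>
    <summary>A placeholder abstract.</summary>
  </entry>
</feed>"""

print(build_query_url("all:metadata", max_results=5))
print(parse_entries(SAMPLE_FEED))
```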
8

deepdoctection

A Repo For Document AI

DeepDoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. deepdoctection is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models itself; instead, it lets you build pipelines using well-established libraries for object detection, OCR, and selected NLP tasks, and provides an integrated framework for...

Downloads: 1 This Week

Last Update: 2026-06-12
See Project
9

abogen

Generate audiobooks from EPUBs, PDFs and text with captions

abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on...

Downloads: 4 This Week

Last Update: 2026-02-06
See Project
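Caption timing of this sort can be sketched as follows, assuming the captions are written in the common SubRip (SRT) format and that the duration of each synthesized text segment is already known. Neither assumption comes from abogen's documentation; `build_srt` and `srt_timestamp` are hypothetical helpers.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(segments) -> str:
    """segments: iterable of (text, duration_in_seconds) in playback order."""
    blocks = []
    start = 0.0
    for number, (text, duration) in enumerate(segments, start=1):
        end = start + duration
        blocks.append(f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
        start = end  # the next caption begins where this one ends
    return "\n".join(blocks)


print(build_srt([("Chapter one.", 2.5), ("It was a dark night.", 3.0)]))
```

In a real pipeline the durations would come from the generated audio for each segment rather than being supplied by hand.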
10

NeMo Retriever Library

Document content and metadata extraction microservice

NeMo Retriever Library is a scalable microservice framework designed for extracting, structuring, and enriching content from documents to support downstream generative AI applications. It processes various document types by splitting them into components such as text, tables, charts, and images, and then applies OCR and contextual analysis to convert them into structured data formats. The system is built on NVIDIA NIM microservices, enabling high-performance parallel processing and efficient...

Downloads: 1 This Week

Last Update: 2026-05-29
See Project
11

Perf Book

The book "Performance Analysis and Tuning on Modern CPU"

This project is a practical guide to performance analysis and tuning on modern CPUs, bridging microarchitecture details with hands-on profiling. It explains how caches, TLBs, prefetchers, branch predictors, and out-of-order execution influence real program speed, then connects those concepts to concrete optimization strategies. Readers learn how to design trustworthy benchmarks, avoid measurement traps (warmup, turbo, frequency scaling), and interpret hardware performance counters. The book...

Downloads: 0 This Week

Last Update: 2025-09-23
See Project
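The warmup pitfall can be made concrete with a small timing harness. This is a hypothetical Python sketch, not code from the book (which targets native code and hardware counters): it discards a few untimed warm-up calls, then reports the minimum and median of repeated measurements. Reporting those two statistics is a choice made here, and the harness does nothing about turbo or frequency scaling, which have to be controlled outside the process.

```python
import statistics
import time


def benchmark(fn, *, warmup: int = 3, repeats: int = 10) -> tuple[float, float]:
    """Call fn() `warmup` times untimed, then `repeats` times timed.

    Returns (minimum, median) wall-clock seconds per call.
    """
    for _ in range(warmup):
        fn()  # untimed: lets caches, allocators, and lazy initialization settle
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return min(samples), statistics.median(samples)


best, median = benchmark(lambda: sum(range(100_000)))
print(f"min {best * 1e3:.3f} ms, median {median * 1e3:.3f} ms")
```

The minimum approximates the undisturbed cost, while a median far above it hints at noise such as interrupts or frequency changes.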
12

CiteFlow

Desktop research workspace for PDFs, notes, citations, bibliographies.

CiteFlow is a focused desktop research workspace for students, researchers, and academic writers who want to manage PDFs, notes, citations, and bibliographies in one place. Create project-based workspaces for essays, articles, reports, literature reviews, and long-form research. Import PDFs, read them inside the app, search within documents, compare files side by side, highlight key passages, and add page-based notes. CiteFlow can assist with DOI metadata detection, keeps citation history...

Downloads: 0 This Week

Last Update: 2026-05-28
See Project
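DOI detection in extracted PDF text can be approximated with a regular expression. This sketch uses the pattern Crossref recommends for matching most modern DOIs (`10.`, a 4 to 9 digit registrant code, a slash, and a suffix); it is not CiteFlow's method, and `find_dois` is a hypothetical helper. The sample DOIs use the `10.1000` example prefix.

```python
import re

# Crossref's suggested pattern for modern DOIs; some older, unusual DOIs won't match.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def find_dois(text: str) -> list[str]:
    """Return DOIs found in text, with trailing sentence punctuation removed."""
    return [m.group(0).rstrip(".,;") for m in DOI_PATTERN.finditer(text)]


print(find_dois("See doi:10.1000/xyz123. The handbook is https://doi.org/10.1000/182, too."))
```

Stripping trailing `.`, `,`, and `;` handles DOIs that end a sentence; a detected DOI could then be resolved against a metadata service to fill in the citation.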
13

Nostalgic Photo DataBase (platform)

Active repository of jpeg & pdf files with customizable tags.

NPDB offers a comprehensive platform for creating and maintaining a database of both old, digitized photos and new snapshots captured by smartphones. This versatile system allows users to organize and search through their collection using customizable tags, catering to images of any vintage. PDF files are supported as well. NPDB's flexible tagging system allows users to categorize their files using an arbitrary set of tags tailored to their preferences. This intuitive approach...

Downloads: 1 This Week

Last Update: 6 hours ago
See Project
14

QuickPlot

Simple user interface for gnuplot aimed at reflectometry data

A graphical user interface for gnuplot for creating publication-quality figures very quickly. It supports templates for fast formatting of graphics, different plot styles, insets, and axis and label options. One important feature is that it stores metadata in PNG and PDF files, which can be used to reload any graph saved with QuickPlot.

Downloads: 1 This Week

Last Update: 2022-07-26
See Project
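The source does not say how QuickPlot stores its metadata. As a general illustration of the mechanism on the PNG side, the format defines `tEXt` chunks that hold keyword/text pairs alongside the image data. The sketch builds a 1x1 grayscale PNG carrying one such chunk using only `struct` and `zlib`, then reads it back; `Comment` is one of the PNG specification's predefined keywords, the stored text is a made-up example, and the function names are hypothetical. PDF files use a different mechanism (the document information dictionary or XMP metadata), which this sketch does not cover.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(kind: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 of type+data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def tiny_png_with_text(keyword: str, text: str) -> bytes:
    """Build a 1x1 8-bit grayscale PNG carrying one tEXt metadata chunk."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte 0, then one black pixel
    text_data = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    return (PNG_SIGNATURE + chunk(b"IHDR", ihdr) + chunk(b"tEXt", text_data)
            + chunk(b"IDAT", idat) + chunk(b"IEND", b""))


def read_text_chunks(png: bytes) -> dict[str, str]:
    """Collect keyword/text pairs from all tEXt chunks."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos, found = len(PNG_SIGNATURE), {}
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if kind == b"tEXt":
            key, _, value = data.partition(b"\x00")
            found[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
        if kind == b"IEND":
            break
    return found


png = tiny_png_with_text("Comment", "plot settings: style=lines")
print(read_text_chunks(png))
```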
15

Reminiscence

Self-Hosted Bookmark And Archive Manager

Bookmark links and edit their metadata (such as title, tags, and summary) via the web interface. Archive linked content in HTML, PDF, or full-page PNG format. Links to non-HTML content such as PDF, JPG, or TXT files are archived automatically: bookmarking such a link via the web interface saves the file itself on the server. Media elements of a web page can also be archived using third-party download managers.

Downloads: 0 This Week

Last Update: 2022-08-31
See Project
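Deciding whether a bookmarked link points at non-HTML content can be approximated from the URL's file extension. This is a guess at the general idea, not Reminiscence's code; `archive_mode` and its return labels are hypothetical. A more robust approach would inspect the server's `Content-Type` response header, since many URLs carry no extension.

```python
import mimetypes
from urllib.parse import urlparse


def archive_mode(url: str) -> str:
    """Guess from the URL path whether to save the raw file or archive the page."""
    mime, _ = mimetypes.guess_type(urlparse(url).path)
    if mime is None or mime in ("text/html", "application/xhtml+xml"):
        return "render-page"    # treat as a web page: archive as HTML, PDF, or PNG
    return "download-file"      # non-HTML content such as pdf, jpg, or txt


print(archive_mode("https://example.com/papers/report.pdf"))  # download-file
print(archive_mode("https://example.com/blog/post"))          # render-page
```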
16

TensorFlow-ZH

Chinese version of the official document of TensorFlow

...The repo mirrors the structure of the original English docs: chapters, sections, code examples, API references, and supplementary content like configuration and build guides. It includes additional files like a PDF version (compiled LaTeX/TeX sources), table of contents mappings, and translation metadata to track contributions. Over time, the repo has evolved to stay in sync with upstream changes, providing versioned snapshots of the translated content.

Downloads: 0 This Week

Last Update: 2025-10-02
See Project
17

i-Map - Plot Geolocation from Images

Automatically plots the latitude and longitude from images on Google Maps.

...To generate a report, you can export this data to a PDF or Excel file according to your requirements.

Downloads: 2 This Week

Last Update: 2017-11-26
See Project
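Geotagged photos typically record their position in EXIF GPS tags as degrees, minutes, and seconds (stored as rational numbers) plus a hemisphere reference, while map services expect signed decimal degrees. The conversion is decimal = degrees + minutes/60 + seconds/3600, negated for S or W. The sketch assumes the EXIF values have already been read by some EXIF library (not shown); `dms_to_decimal` is a hypothetical helper, not i-Map's code.

```python
from fractions import Fraction


def dms_to_decimal(degrees, minutes, seconds, ref: str) -> float:
    """Convert degrees/minutes/seconds plus an N/S/E/W reference to decimal degrees.

    Each component may be an int, a Fraction, or a decimal string; EXIF stores
    them as rational numbers, which map naturally onto Fraction.
    """
    value = Fraction(degrees) + Fraction(minutes) / 60 + Fraction(seconds) / 3600
    # Southern latitudes and western longitudes are negative by convention.
    return float(-value if ref.upper() in ("S", "W") else value)


lat = dms_to_decimal(48, 51, Fraction(2964, 100), "N")
lon = dms_to_decimal(2, 17, Fraction(402, 10), "E")
print(f"{lat:.6f}, {lon:.6f}")
```

Using `Fraction` keeps the arithmetic exact until the final conversion to `float`.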
18

openPLM - open source PLM

Open source PLM system: a product structure management (BOM management) system and an electronic document management or Enterprise Content Management (ECM) system

1 Review

Downloads: 8 This Week

Last Update: 2015-05-01
See Project
19

sort-photorec-datarecovery

Sort PhotoRec files and pictures from a data recovery by date

Python script that sorts pictures and files from a data recovery made with PhotoRec. Recovered files are moved, based on date created, date taken, and date last modified, into a folder structure of extension/year/month. Useful for data recovery from HDDs, RAID arrays, or memory cards, where you get folders with mixed file types, as with PhotoRec. Supports pictures (JPG, RAW formats) and office documents (DOCX, DOC, XLSX, PDF, PPTX, and more).

Downloads: 0 This Week

Last Update: 2022-03-18
See Project
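The extension/year/month layout can be sketched with the standard library. This is not the project's script: it is a hypothetical `sort_by_date` helper that uses only each file's last-modified time (reading the date taken would require an EXIF parser) and renames files on collision instead of overwriting them.

```python
import shutil
from datetime import datetime
from pathlib import Path


def sort_by_date(source: Path, target: Path) -> list[Path]:
    """Move every file under source into target/<extension>/<year>/<month>/.

    Uses the file's last-modified time only; "date taken" is not read here.
    """
    moved = []
    # Materialize the file list first so moving files cannot disturb the walk.
    files = sorted(p for p in source.rglob("*") if p.is_file())
    for path in files:
        stamp = datetime.fromtimestamp(path.stat().st_mtime)
        ext = path.suffix.lstrip(".").lower() or "no_extension"
        dest_dir = target / ext / f"{stamp.year:04d}" / f"{stamp.month:02d}"
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / path.name
        counter = 1
        while dest.exists():  # never overwrite a file that is already sorted
            dest = dest_dir / f"{path.stem}_{counter}{path.suffix}"
            counter += 1
        shutil.move(str(path), str(dest))
        moved.append(dest)
    return moved
```

For example, `sort_by_date(Path("recup_dir.1"), Path("sorted"))` would move a JPEG last modified in July 2021 to `sorted/jpg/2021/07/`; the directory names are illustrative.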
20

PyShelf

FOSS ebook server with no windowing requirements

PyShelf is an open source, Python-based ebook server that does not, and never will, require a windowing system.

Downloads: 0 This Week

Last Update: 2020-07-29
See Project