Showing 20 open source projects for "pdf metadata"

  • 1
    MinerU

    A high-quality tool for converting PDFs to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 12 This Week
  • 2
    PyPDF

    A pure-Python PDF library capable of splitting, merging, cropping

    pypdf is a pure-Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It is the actively maintained successor to PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.
    Downloads: 15 This Week
  • 3
    PaperQA2

    High accuracy RAG for answering questions from scientific documents

    PaperQA2 is a package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our recent 2024 paper to see examples of PaperQA2's superhuman performance in scientific tasks like question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata - including citation counts and a retraction check, then parse and cache PDFs into a...
    Downloads: 1 This Week
  • 4
    Papermerge

    Open Source Document Management System for Digital Archives

    ...Instead of having piles of paper documents all over your desk, office or drawers - you can quickly scan them and configure your scanner to directly upload to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats. Instantly find relevant information using full text, tags and metadata-based search. Papermerge is free and open-source software which means that transparency is the core value of our software development. Source code can be reviewed and improved by anyone from anywhere. Papermerge supports multiple users. ...
    Downloads: 10 This Week
  • 5
    shuyuan

    Reading book source

    ...It likely supports different input formats (text, HTML, PDF), and may integrate optional translation or text normalization tools.
    Downloads: 0 This Week
  • 6
    kb

    A minimalist command line knowledge base manager

    kb is a minimalist command-line knowledge base manager that gives users a fast, organized way to collect, store, search, and retrieve notes, documents, cheatsheets, procedures, and other artifacts directly from the terminal. It was created to solve the common problem of having scattered text files or reference materials on disk that are hard to search or categorize, and it surfaces a simple CLI interface with intuitive commands for adding, viewing, editing, and deleting knowledge items. Each...
    Downloads: 0 This Week
  • 7
    ArXiv MCP Server

    A Model Context Protocol server for searching and analyzing arXiv

    arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and...
    Downloads: 0 This Week
  • 8
    deepdoctection

    A Repo For Document AI

    deepdoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. It is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models itself but lets you build pipelines using well-regarded libraries for object detection, OCR, and selected NLP tasks, and provides an integrated framework for...
    Downloads: 1 This Week
  • 9
    abogen

    Generate audiobooks from EPUBs, PDFs and text with captions

    abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on...
    Downloads: 4 This Week
  • 10
    NeMo Retriever Library

    Document content and metadata extraction microservice

    NeMo Retriever Library is a scalable microservice framework designed for extracting, structuring, and enriching content from documents to support downstream generative AI applications. It processes various document types by splitting them into components such as text, tables, charts, and images, and then applies OCR and contextual analysis to convert them into structured data formats. The system is built on NVIDIA NIM microservices, enabling high-performance parallel processing and efficient...
    Downloads: 1 This Week
  • 11
    Perf Book

    The book "Performance Analysis and Tuning on Modern CPUs"

    This project is a practical guide to performance analysis and tuning on modern CPUs, bridging microarchitecture details with hands-on profiling. It explains how caches, TLBs, prefetchers, branch predictors, and out-of-order execution influence real program speed, then connects those concepts to concrete optimization strategies. Readers learn how to design trustworthy benchmarks, avoid measurement traps (warmup, turbo, frequency scaling), and interpret hardware performance counters. The book...
    Downloads: 0 This Week
  • 12
    CiteFlow

    Desktop research workspace for PDFs, notes, citations, bibliographies.

    CiteFlow is a focused desktop research workspace for students, researchers, and academic writers who want to manage PDFs, notes, citations, and bibliographies in one place. Create project-based workspaces for essays, articles, reports, literature reviews, and long-form research. Import PDFs, read them inside the app, search within documents, compare files side by side, highlight key passages, and add page-based notes. CiteFlow can assist with DOI metadata detection, keeps citation history...
    Downloads: 0 This Week
  • 13
    Nostalgic Photo DataBase (platform)

    Active repository of JPEG and PDF files with customizable tags.

    NPDB offers a comprehensive platform for creating and maintaining a database of both old, digitized photos and new snapshots captured by smartphones. This versatile system allows users to organize and search through their collection using customizable tags, catering to images of any vintage. PDF files are also supported. NPDB's flexible tagging system allows users to categorize their files using an arbitrary set of tags tailored to their preferences. This intuitive approach...
    Downloads: 1 This Week
  • 14
    QuickPlot

    Simple user interface for gnuplot aimed at reflectometry data

    Graphical user interface for gnuplot to create publication-quality figures very quickly. It supports templates for fast formatting of graphics, different plot styles, insets, and axis and label options. One important feature is storing metadata in PNG and PDF files, which can be used to reload any graph saved with QuickPlot.
    Downloads: 1 This Week
  • 15
    Reminiscence

    Self-Hosted Bookmark And Archive Manager

    Bookmark links and edit their metadata (such as title, tags, and summary) via a web interface. Archive linked content in HTML, PDF, or full-page PNG format. Links to non-HTML content such as PDF, JPG, or TXT files are archived automatically: bookmarking such a link via the web interface saves the file on the server. Supports archival of media elements of a web page using third-party download managers.
    Downloads: 0 This Week
  • 16
    TensorFlow-ZH

    Chinese version of the official TensorFlow documentation

    ...The repo mirrors the structure of the original English docs: chapters, sections, code examples, API references, and supplementary content like configuration and build guides. It includes additional files like a PDF version (compiled LaTeX/TeX sources), table of contents mappings, and translation metadata to track contributions. Over time, the repo has evolved to stay in sync with upstream changes, providing versioned snapshots of the translated content.
    Downloads: 0 This Week
  • 17
    i-Map - Plot Geolocation from Images

    Automatically plots latitude and longitude from images on Google Maps.

    ...To generate a report, you can export this data to a PDF or Excel file according to your requirements.
    Downloads: 2 This Week
  • 18
    openPLM - open source PLM
    Open source PLM system - Product Structure management (BOM management) system and Electronic documents management or Enterprise Content Management (ECM) system
    Downloads: 8 This Week
  • 19
    sort-photorec-datarecovery

    Sort PhotoRec files and pictures from a data recovery by date

    Python script that sorts pictures and files from a data recovery made with PhotoRec. Recovered files are moved according to date created / date taken and date last modified into a folder structure extension/year/month. Useful for data recovery from HDDs, RAID arrays, or memory cards where you get folders with mixed file types, as with PhotoRec. Supports pictures (JPG, RAW formats) and office documents (DOCX, DOC, XLSX, PDF, PPTX and more).
    Downloads: 0 This Week
  • 20
    PyShelf

    FOSS ebook server with no windowing requirements

    PyShelf is an open-source, Python-based ebook server that does not and never will require a windowing system.
    Downloads: 0 This Week