Showing 14 open source projects for "pdf metadata"

  • 1
    OpenDataLoader PDF

    PDF Parser for AI-ready data. Automate PDF accessibility

    OpenDataLoader PDF is an open-source document processing system designed to convert complex PDF files into structured, AI-ready formats such as Markdown, JSON, and HTML while preserving layout, hierarchy, and semantic meaning. It focuses on enabling downstream use cases like retrieval-augmented generation (RAG), knowledge extraction, and document intelligence pipelines by maintaining accurate reading order and spatial metadata through bounding boxes.
    Downloads: 4 This Week
  • 2
    GROBID

    A machine learning software for extracting information

    ...The extraction here covers the usual bibliographical information (e.g. title, abstract, authors, affiliations, keywords, etc.). References extraction and parsing from articles in PDF format, with an F1-score of around .87 on an independent PubMed Central set of 1943 PDFs containing 90,125 references, and around .89 on a similar bioRxiv set of 2000 PDFs (using the Deep Learning citation model). All the usual publication metadata are covered (including DOI, PMID, etc.).
    Downloads: 5 This Week
  • 3
    Grimmory

    Grimmory is the successor to Booklore

    Grimmory is a self-hosted digital library management platform designed to help users organize, read, and manage their entire book collection in a centralized and fully controlled environment. As the successor to Booklore, it expands on the idea of personal knowledge ownership by allowing users to store and interact with books without relying on third-party cloud services. The platform supports a wide range of formats, including eBooks, PDFs, comics, and audiobooks, making it versatile for...
    Downloads: 3 This Week
  • 4
    CERCA

    CERCA – Citation Extraction & Reference Checking Assistant

    CERCA is an open-source research tool that supports the verification of bibliographic references in scientific manuscripts. It extracts references from PDF files and checks their existence and consistency against authoritative metadata sources, producing explainable diagnostics, audit logs, and reproducible reports. It is intended for researchers performing final manuscript checks, reviewers assessing reference consistency, editors supporting editorial quality control, and meta-research and reproducibility workflows. CERCA is an experimental tool. ...
    Downloads: 5 This Week
  • 5
    Xena - Digital Preservation Software

    Xena transforms files into open data formats

    Xena transforms files into open data formats for long-term digital preservation, encoding content in Base64 and wrapping it in XML metadata. Formats supported include MBOX, PST, MSG, DOC, XLS, PPT, RTF, PNG, XML, PDF, JPG, TIFF, PCX, WAV, MP3 and more. NO LONGER MAINTAINED, NO LONGER SUPPORTED
    Downloads: 0 This Week
  • 6
    OpenSearchServer Search Engine

    An open source search engine with RESTful API and crawlers

    OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API, Ruby, Rails, Node.js, PHP, Perl), you will be able to quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on...
    Downloads: 0 This Week
  • 7
    CMIS Input plugin for Pentaho

    Allows querying Content Management Systems that use the CMIS.

    ...With this goal (the extraction and analysis of data), the CMIS Input plugin for Pentaho Data Integration (Kettle) was designed and developed; it allows querying Content Management Systems that use the CMIS interoperability standard. Once extracted, the data can be stored, analyzed, and presented in customized reports published in various formats for the end user (PDF, Excel, etc.).
    Downloads: 0 This Week
  • 8
    Topiary Explorer
    TopiaryExplorer has moved to GitHub. Find the new project info page here: https://github.com/qiime/Topiary-Explorer. If you need help or would like to add a bug/feature request, please do so there.
    Downloads: 0 This Week
  • 9
    Atarrabi

    A web-based workflow application for publishing environmental data

    Atarrabi is a web-based workflow application used for preparing meteorological research data for persistent identifier registration. This software will not run out-of-the-box. Please visit our web site and contact us to learn more.
    Downloads: 0 This Week
  • 10
    Armchair File Manager

    Remote control your home theater PC from across the room

    Use the Armchair File Manager to control your Windows home theater PC using its remote control. Perform light-duty computing tasks from across the room without a keyboard or mouse. Armchair works best with a PC connected to a widescreen television.
    Downloads: 0 This Week
  • 11
    DocInfoRetriever is a web-based full-text document search engine based on Lucene. It allows you to search the contents and metadata of documents. Supported document formats include doc, xls, pdf, odt, jpg, etc., as well as torrent files.
    Downloads: 0 This Week
  • 12
    Adds XMP metadata to PDFs representing scholarly articles. The metadata is sourced from an OpenURL query to CrossRef.
    Downloads: 0 This Week
  • 13
    A Semantic Web personal digital library for sharing, classifying, and storing digital literature, combined with a social network system. The software allows users to classify digital literature (PDF, MS Office) in a library managed by Semantic Web
    Downloads: 0 This Week
  • 14
    Yadoda is a personal digital library: users can create their own ontology and a database of digital documents (PDF, PS, MP3, images) that can be enriched with metadata (author, date, title). Users can create semantic relations between documents and navigate them.
    Downloads: 0 This Week