Showing 59 open source projects for "pdf to text"

  • 1
    PDF Editor

    Offline PDF editor. Add images, signatures, text to PDF in your browser.
    Downloads: 16 This Week
  • 2
    Scribe.js

    JavaScript OCR and text extraction for images and PDFs

    Scribe.js is a JavaScript library that provides Optical Character Recognition (OCR) and text extraction capabilities for both images and PDF documents, aimed at developers who want to build OCR features directly into their applications. The library can take image files (such as PNG or JPEG) and recognize the text they contain, and it can also extract text from PDF files that either already contain text or are image-based scans, using modern web standards and WebAssembly under the hood. ...
    Downloads: 1 This Week
  • 3
    PDFCraft

    PDFCraft is a free, privacy-focused PDF toolkit

    PDFCraft is an extensible toolkit for creating, editing, and transforming PDF documents with both a graphical interface and a scripting API, making it useful for users ranging from casual editors to automated document processors. At its core, the project provides a clean, modern UI where you can rearrange pages, annotate text, insert images, fill forms, and export to multiple formats, all without needing a heavyweight commercial PDF suite.
    Downloads: 5 This Week
  • 4
    npm-pdfreader

    Parse text and tables from PDF files.

    npm-pdfreader is a Node.js library for reading text and parsing tables from PDF files. It supports tabular data with automatic column detection and rule-based parsing, making it useful for extracting structured data from PDFs.
    Downloads: 0 This Week
  • 5
    pdfmake

    Client/server side PDF printing in pure JavaScript

    Print PDFs directly in the browser or delegate it to your NodeJS backend. Use the same document definition in both cases. Forget about manual x, y calculations. Declare document structure and let pdfmake do the rest. Use paragraphs, columns, lists, tables, canvas, etc. Declare your own styles, use custom fonts, build a DSL and extend the framework. Provides a set of options to disable font layout cache and to control when pages are flushed to the output file. Pdfmake is runnable in browser...
    Downloads: 9 This Week
  • 6
    canvas-editor

    Canvas-based WYSIWYG rich text editor with advanced layout tools

    canvas-editor is a browser-based rich text editor that renders content using HTML5 Canvas and SVG instead of traditional DOM-based approaches. It is designed to provide a WYSIWYG editing experience similar to word processors, enabling precise control over layout, rendering, and document structure. canvas-editor supports a wide range of formatting and document features, including text styling, tables, images, and embedded elements, all managed through a structured data model. ...
    Downloads: 6 This Week
  • 7
    Marpit

    The skinny framework for creating slide decks from Markdown

    Marpit /mɑːrpɪt/ is the skinny framework for creating slide decks from Markdown. It transforms Markdown and CSS theme(s) into a slide deck composed of static HTML and CSS, producing a web page that can be converted to a slide PDF by printing. Marpit is designed to output minimal assets for the slide deck. You can use the bare assets as a logicless slide deck, but it is mainly intended to be integrated with other tools and applications.
    Downloads: 0 This Week
  • 8
    JupyterLab

    JupyterLab computational environment

    ...JupyterLab understands many file formats (images, CSV, JSON, Markdown, PDF, Vega, Vega-Lite, etc.) and can also display rich kernel output in these formats. See File and Output Formats for more information. To navigate the user interface, JupyterLab offers customizable keyboard shortcuts and the ability to use key maps from vim, emacs, and Sublime Text in the text editor.
    Downloads: 29 This Week
  • 9
    Open Semantic Search

    Open source semantic search and text analytics for large document sets

    Open Semantic Search is an open source research and analytics platform designed for searching, analyzing, and exploring large collections of documents using semantic search technologies. It provides an integrated search server combined with a document processing pipeline that supports crawling, text extraction, and automated analysis of content from many different sources. Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and enrich the data with extracted information such as named entities and metadata. It also supports optical character recognition to extract text from images and scanned documents, including images embedded inside PDF files. ...
    Downloads: 3 This Week
  • 10
    Collabora Online

    Collabora Online is a collaborative online office suite

    Collabora Online is a powerful online office suite that you can integrate into your own infrastructure or access via one of our trusted hosting Partners. Your digital sovereignty is our priority. We provide you with all the tools to keep your data secure, without compromising on features. Collabora Online’s text document editor provides a true WYSIWYG editing experience, making visualizing your document layout incredibly easy. Open any document, add comments and track changes from anywhere,...
    Downloads: 9 This Week
  • 11
    bookdown

    Authoring Books and Technical Documents with R Markdown

    An open-source (GPL-3) R package to facilitate writing books and long-form articles/reports with R Markdown. Generate printer-ready books and ebooks from R Markdown documents. Uses a markup language that is easier to learn than LaTeX for writing elements such as section headers, lists, quotes, figures, tables, and citations. Multiple choices of output formats: PDF, LaTeX, HTML, EPUB, and Word. Possibility of including dynamic graphics and interactive applications (HTML widgets and Shiny apps). Support...
    Downloads: 0 This Week
  • 12
    Easy DataSet

    A powerful tool for creating datasets for LLM fine-tuning

    Easy DataSet is a comprehensive open-source tool designed to make creating high-quality datasets for large language model fine-tuning, retrieval-augmented generation (RAG), and evaluation as easy and automated as possible by providing intuitive interfaces and powerful parsing, segmentation, and labeling tools. It supports ingesting domain-specific documents in a wide range of formats — including PDF, Markdown, DOCX, EPUB, and plain text — and can intelligently segment, clean, and structure content into rich datasets tailored for downstream LLM training needs. The system includes automated question-generation capabilities, hierarchical label trees, and answer generation pipelines that use LLM APIs to produce coherent paired data with customizable templates. ...
    Downloads: 1 This Week
  • 13
    Browserless

    The headless Chrome/Chromium driver on top of Puppeteer

    Browserless is an open-source headless browser automation library and service built on top of Puppeteer that simplifies the process of running and scaling Chromium-based browser tasks in production environments. It provides a high-level API for interacting with headless Chrome, allowing developers to perform operations such as generating PDFs, capturing screenshots, extracting text or HTML, and automating web navigation. The project is designed to act as a production-ready abstraction layer...
    Downloads: 1 This Week
  • 14
    LandPPT

    An LLM-based presentation generation platform

    LandPPT is an open-source AI platform that automatically generates professional presentation slides using large language models. The system allows users to create complete PowerPoint presentations simply by entering a topic or uploading source documents such as PDFs, Word files, or Markdown notes. Using natural language processing and structured content generation, the platform produces presentation outlines and converts them into fully formatted slide decks. The application integrates...
    Downloads: 10 This Week
  • 15
    Fidus Writer

    Fidus Writer is an online collaborative editor for academics

    Fidus Writer is an online collaborative editor especially made for academics who need to use citations and/or formulas. The editor focuses on the content rather than the layout, so that with the same text, you can later on publish it in multiple ways: On a website, as a printed book, or as an ebook. In each case, you can choose from a number of layouts that are adequate for the medium of choice.
    Downloads: 0 This Week
  • 16
    PandaWiki

    AI-powered open source platform for building intelligent wiki bases

    PandaWiki is an open source knowledge base system designed to help users build intelligent documentation platforms powered by large language models. It combines traditional wiki functionality with modern AI capabilities, allowing teams and individuals to create and manage product documentation, technical manuals, FAQs, and blog-style knowledge resources. PandaWiki provides tools for managing knowledge bases through an administrative interface while also generating public-facing wiki sites...
    Downloads: 3 This Week
  • 17
    myGPTReader

    AI Slack bot for reading, summarizing, and chatting with content

    myGPTReader is an AI-powered Slack bot designed to help users read, summarize, and interact with various types of digital content through conversational interfaces. It enables users to quickly understand web pages, documents, and even video content by transforming them into interactive discussions rather than static reading experiences. myGPTReader supports a wide range of file formats, including eBooks, PDFs, and text-based documents, making it flexible for both casual and professional use...
    Downloads: 1 This Week
  • 18
    DOCX Document Converter

    Convert .docx to .md/.txt and .html. Free, unlimited, fast.

    A simple, free, unlimited, secure web-based tool that converts Microsoft Word documents (.docx) into Markdown (.md/.txt) and HTML files. Perfect for developers, writers, and anyone who needs to transform .docx MS Office Word documents into web-friendly or AI-context-friendly formats. Unlike those other jerks on the web that charge many dollars per month for this, I made it free, unlimited and open source. This is a better version of 'convert docx to txt' since .md files can be opened...
    Downloads: 12 This Week
  • 19
    OmegaT - multiplatform CAT tool

    The free computer aided translation (CAT) tool for professionals

    OmegaT is a free and open source multiplatform Computer Assisted Translation tool with fuzzy matching, translation memory, keyword search, glossaries, and translation leveraging into updated projects.
    Downloads: 1,557 This Week
  • 20
    ONLYOFFICE Desktop Editors

    Free office suite for working with text, spreadsheets and presentations

    ONLYOFFICE Desktop Editors is an open source and 100% free office suite, combining text, spreadsheet and presentation editors for working on documents offline. The application features all types of formatting options and allows users to edit complex documents. Collaboration features such as reviewing and real-time co-editing are available as well. The editors offer 100% compatibility with MS Office and support other popular document formats including OpenDocument. The application also...
    Downloads: 20 This Week
  • 21
    MyBox

    Easy tools for PDF, image, file, network, data, and media

    Tags: javafx-desktop-apps, pdf, image, ocr, icc, barcode, color-palette, text, bytes, markdown, html, archive, compress, digest, video, audio, editor, converter, media. Source: https://github.com/Mararsh/MyBox. Self-contained packages need neither a Java environment nor installation. Jar packages need Java 16 or higher.
    Downloads: 1 This Week
  • 22
    Kiwix

    Wikipedia offline & more

    Kiwix is an offline reader for Web content. It's especially intended to make Wikipedia available offline. With Kiwix, you can enjoy Wikipedia on a boat, in the middle of nowhere... or in jail. Kiwix manages to do that by reading ZIM files, a highly compressed open format with additional metadata.
    Downloads: 371 This Week
  • 23
    Markdown PDF

    Markdown converter for Visual Studio Code

    This extension converts Markdown files to PDF, HTML, PNG or JPEG files.
    Downloads: 2 This Week
  • 24
    QwikTape

    Do calculations and annotate them as you would on paper, "qwikly".

    Downloads: 1 This Week
  • 25
    Quiz/Survey/Test - QST

    A free, complete, enterprise-grade, open source exam management system

    QST is the world's unparalleled open source online/LAN assessment software. From a quick quiz on your phone to very large scale, high stakes, proctored desktop testing, we make it easy, secure, and economical. Our intuitive design contains features (Immediate detailed results, Create/Export/Import/Convert Questions, WYSIWYG/Math-Chemistry/Basic Editors, Question/Item Bank, Multiple Question Types, Multiple Delivery Styles, Multiple Delivery/Results Options, Adaptive/Branching Questions, Randomly...
    Downloads: 41 This Week