Showing 58 open source projects for "text pdf extrator"

View related business solutions
  • Our Free Plans just got better! | Auth0 by Okta Icon
    Our Free Plans just got better! | Auth0 by Okta

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your secuirty. Auth0 now, thank yourself later.
    Try free now
  • Bright Data - All in One Platform for Proxies and Web Scraping Icon
    Bright Data - All in One Platform for Proxies and Web Scraping

    Say goodbye to blocks, restrictions, and CAPTCHAs

    Bright Data offers the highest quality proxies with automated session management, IP rotation, and advanced web unlocking technology. Enjoy reliable, fast performance with easy integration, a user-friendly dashboard, and enterprise-grade scaling. Powered by ethically-sourced residential IPs for seamless web scraping.
    Get Started
  • 1
    OCRmyPDF

    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 34 This Week
    Last Update:
    See Project
  • 2
    Papermerge

    Papermerge

    Open Source Document Management System for Digital Archives

    Papermerge is an open source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of having piles of paper documents all over your desk, office or drawers - you can quickly scan them and configure your scanner to directly upload to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats. Instantly find relevant information using full text, tags and metadata-based search. Papermerge is free and open...
    Downloads: 23 This Week
    Last Update:
    See Project
  • 3
    PaperQA2

    PaperQA2

    High accuracy RAG for answering questions from scientific documents

    PaperQA2 is a package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our recent 2024 paper to see examples of PaperQA2's superhuman performance in scientific tasks like question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata - including citation counts and a retraction check, then parse and cache PDFs into a full-text...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 4
    Pix2Text

    Pix2Text

    Open-Source Python3 tool for recognizing layouts, tables, and math

    An Open-Source Python3 tool for recognizing layouts, tables, math formulas, and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported. Pix2Text (P2T) aims to be a free and open-source Python alternative to Mathpix, and it can already accomplish Mathpix's core functionality. Pix2Text (P2T) can recognize layouts, tables, images, text, and mathematical...
    Downloads: 5 This Week
    Last Update:
    See Project
  • Enterprise and Small Business CRM Solution | Clear C2 C2CRM Icon
    Enterprise and Small Business CRM Solution | Clear C2 C2CRM

    Voted Best CRM System with Top Ranked Customer Support. CRM Management includes Sales, Marketing, Relationship Management, and Help Desk.

    C2CRM consists of four modules that integrate to provide a comprehensive CRM solution: Relationship Management, Sales Automation, Marketing Automation, and Customer Service. Only buy what each user needs.
    Learn More
  • 5
    PyMuPDF

    PyMuPDF

    Python bindings for MuPDF's rendering library.

    MuPDF is a lightweight PDF, XPS, and E-book viewer. MuPDF consists of a software library, command line tools, and viewers for various platforms. The renderer in MuPDF is tailored for high-quality anti-aliased graphics. It renders text with metrics and spacing accurate to within fractions of a pixel for the highest fidelity in reproducing the look of a printed page on the screen. The viewer is small, fast, yet complete. It supports many document formats, such as PDF, XPS, OpenXPS, CBZ, EPUB...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    changedetection.io

    changedetection.io

    The best free open source website change detection and restock service

    Loved by smart shoppers, data journalists, research engineers, data scientists, security researchers, and more. From simply monitoring website pages that have a change (such as watching prices, and restocking notifications), to deep inspection such as PDF text support, JSON and XML monitoring, and extensive text triggers. Monitor out-of-stock products and get alerts when those products are back in stock, get restock alerts via Discord, Slack, email, and many other platforms. Using the browser...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 7
    Sphinx

    Sphinx

    Main repository for the Sphinx documentation builder

    Sphinx is a tool that makes it easy to create intelligent and beautiful documentation, written by Georg Brandl and licensed under the BSD license. It was originally created for the Python documentation, and it has excellent facilities for the documentation of software projects in a range of languages. Of course, this site is also created from reStructuredText sources using Sphinx! HTML (including Windows HTML Help), LaTeX (for printable PDF versions), ePub, Texinfo, manual pages, plain text...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    borb

    borb

    borb is a library for reading, creating and manipulating PDF files

    borb is a library for creating and manipulating PDF files in python. borb is a pure python library to read, write, and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries and primitives (numbers, string, booleans, etc) This is currently a one-man project, so the focus will always be to support those use-cases that are more common in favor of those that are rare.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    ChatGPT Academic

    ChatGPT Academic

    ChatGPT extension for scientific research work

    ChatGPT extension for scientific research work, specially optimized academic paper polishing experience, supports custom shortcut buttons, supports custom function plug-ins, supports markdown table display, double display of Tex formulas, complete code display function, new local Python/C++/Go project tree Analysis function/Project source code self-translation ability, newly added PDF and Word document batch summary function/PDF paper full-text translation function. All buttons are dynamically...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Automated quote and proposal software for IT solution providers. | ConnectWise CPQ Icon
    Automated quote and proposal software for IT solution providers. | ConnectWise CPQ

    Create IT quote templates, automate workflows, add integrations & price catalogs to save time & reduce errors on manual data entry & updates.

    ConnectWise CPQ, formerly ConnectWise Sell, is a professional quote and proposal automation software for IT solution providers. ConnectWise CPQ offers a wide range of tools that enables IT solution providers to save time, quote more, and win big. Top features include professional quote or proposal templates, product catalog and sourcing, workflow automation, sales reporting, and integrations with best-in-breed solutions like Cisco, Dell, HP, and Salesforce.
    Learn More
  • 10
    xhtml2pdf

    xhtml2pdf

    A library for converting HTML into PDFs using ReportLab

    xhtml2pdf enables users to generate PDF documents from HTML content easily and with automated flow control such as pagination and keeping text together. The Python module can be used in any Python environment, including Django. The Command line tool is a stand-alone program that can be executed from the command line.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Remarkable for Linux

    Remarkable for Linux

    The Markdown Editor for Linux

    With Live Preview you can see your changes as you make them. There is no need to export first to check your syntax. This is accompanied by synchronized scrolling. Remarkable has Github Flavoured Markdown. This has a simple, easy-to-learn syntax with features like checklists, highlighting, links, images and more. Remarkable allows you to export your files to PDF and HTML from within the app. The HTML code is even prettified and PDFs have a TOC. You can style your markdown documents however you...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    LlamaParse

    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured, and semi-structured, to structured data (API's, PDFs, documents, SQL, etc.) Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL db providers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Paperless-ngx

    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Google Open Source Project Style Guide

    Google Open Source Project Style Guide

    Chinese version of Google open source project style guide

    .... If the project you are modifying originates from Google, you may be directed to the English version of the project page to understand the style used by the project. The Chinese version of the project uses reStructuredText plain text markup syntax, and uses Sphinx to generate document formats such as HTML / CHM / PDF.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Jina

    Jina

    Build cross-modal and multimodal applications on the cloud

    Jina is a framework that empowers anyone to build cross-modal and multi-modal applications on the cloud. It uplifts a PoC into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    pdf-editor

    pdf-editor

    Edit your PDFs without needing a subscription or creating accounts

    Edit your PDFs without needing a subscription or creating accounts. Add a GUI/Turn it into a web application. Add a parser for the command line to do multiple commands at once e.g. merge (cut pdf1) pdf2. Tested working with Python 3.8.5. Install venv (py -3.8 -m pip install virtualenv). PDF and Word documents are binary files, which makes them much more complex than plaintext files. In addition to text, they store lots of font, color, and layout information. If you want your programs to read...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 17
    pdf combiner merger converter splitter

    pdf combiner merger converter splitter

    PDF Combiner is a user-friendly, GUI-based tool built in

    PDF Combiner is a user-friendly open source free to use, GUI-based tool for combining, pdf to excel, pdf to word, image to pdf, zip, unzip annotate and splitting PDF files. It is easy to use, supports multiple file insert and delete and process, and allows you to adjust the order of files before combining.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 18
    Scribus

    Scribus

    Powerful desktop publishing software

    Scribus is an Open Source program that brings professional page layout to Linux, BSD UNIX, Solaris, OpenIndiana, GNU/Hurd, Mac OS X, OS/2 Warp 4, eComStation, and Windows desktops with a combination of press-ready output and new approaches to page design. Underneath a modern and user-friendly interface, Scribus supports professional publishing features, such as color separations, CMYK and spot colors, ICC color management, and versatile PDF creation.
    Leader badge
    Downloads: 12,218 This Week
    Last Update:
    See Project
  • 19
    MediaWiki To LaTeX converts MediaWiki markup to LaTeX and generates a PDF. So it provides an export from MediaWiki to LaTeX. It works with any project running MediaWiki, especially Wikipedia and Wikibooks.
    Leader badge
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    DeepSearch5Plus

    DeepSearch5Plus

    Search recursively all files, text inside files, and bookmarks

    Files, text, and bookmarks html/json search tool. It is able to merge duplicated bookmark links from multiple html/json files and export to html. The user interface is designed to be used by kyeboard in any aspect of its features scrolling list, select input field, menu bar, check and drop-down menu and so on. It's able to printout search list to pdf or system printer or save as text file. It allows to rename single file or change upper/lower case, join spaces in bulk mode with multiple...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21

    SimpleTextFormatter

    STF automatically generates documentation

    STF is a system of automatically generating documentation under control of a program or a script. It is frequently used to automatically generate test reports. STF is also used to clean up the output of a process and turn it into a nice looking report.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Text-ly

    Text-ly

    Text.ly - An alternative for Notepad.

    LOOKING FOR Text Editor? You've Come At The Right Place! Editing Your text for your simplicity A Text editor for Editing Text....! Just download and install and use as an alternative for typical Notepad. This application is compiled from the Pyinstaller library so don't mind there is a vulnerability or something the antivirus program might show it as malware or trojan this happens with most of the apps compiled from the Pyinstaller library. So No worries There is not any malware or virus...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Paperless-ng

    Paperless-ng

    A supercharged version of paperless, scan, index and archive docs

    Paperless is a simple Django application running in two parts, a Consumer (the thing that does the indexing) and a Web server (the part that lets you search & download already-indexed documents). Paper is a nightmare. Environmental issues aside, there’s no excuse for it in the 21st century. It takes up space, collects dust, doesn’t support any form of a search feature, indexing is tedious, it’s heavy and prone to damage & loss. I wrote this to make “going paperless” easier. I do not have to...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    HistogramsApp

    HistogramsApp

    Application that generates KDE-PDP plots from geochronological data

    HistogramsApp is a Python 3.6 application that generates (KDE and PDP) from geochronological data .HistogramsApp allows to interactively setup plot parameters such as the bandwidth and the peak detection sensibility. To cite the application please refer to: 1) https://www.tandfonline.com/doi/abs/10.1080/00206814.2021.1954556?journalCode=tigr20 Rodriguez-Corcho, A. F., Rojas-Agramonte, Y., Barrera-Gonzalez, J. A., Marroquin-Gomez, M. P., Bonilla-Correa, S., Izquierdo-Camacho, D.,...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25
    Carlo Acutis Novena

    Carlo Acutis Novena

    Copyleft novena text and illustrations to meditate

    French, Italian and Spanish copyleft novena text and illustrations to meditate with Carlo Acutis, a defunct Italian catholic teenager who documented eucharistic miracles and liked dolphins.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next