Search Results for "python pdf to text files"

Showing 654 open source projects for "python pdf to text files"

View related business solutions
  • All-in-One Payroll and HR Platform Icon
    All-in-One Payroll and HR Platform

    For small and mid-sized businesses that need a comprehensive payroll and HR solution with personalized support

    We design our technology to make workforce management easier. APS offers core HR, payroll, benefits administration, attendance, recruiting, employee onboarding, and more.
  • Manage Properties Better For Free Icon
    Manage Properties Better For Free

    For small to mid-sized landlords and property managers

    Innago is a free and easy-to-use property management solution. Whether you have 1 unit or 1000, student housing, or commercial properties, Innago is built for you. Our software is designed to save you time and money, so you can spend more time doing the things that matter most.
  • 1
    pdf-extractor

    pdf-extractor

    Node.js module for rendering pdf pages to images, svgs and HTML files

    Pdf-extractor is a wrapper around pdf.js to generate images, svgs, html files, text files and json files from a pdf on node.js. A DOM Canvas is used to render and export the graphical layer of the pdf. Canvas exports *.png as a default but can be extended to export to other file types like .jpg. Pdf objects are converted to svg using the SVGGraphics parser of pdf.js. Pdf text is converted to HTML. This can be used as a (transparent) layer over the image to enable text selection. Pdf text...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 2
    Super-PDF-Editor

    Super-PDF-Editor

    World's most comprehensive, powerful, process-based PDF editor

    World's most comprehensive, powerful, process-based and lighting fast PDF reader, editor and batch processor. PDF editing with 60+ features rich tools and function like OCR pdf and images and produce output like searchable PDF, Text, Hocr, Box, Unlv. Also, improve image enhancement before OCR operation for better OCR performance. pdf Imposition, etc. Super PDF Editor is best for bulk pdf processing, especially for the printing industry. Easy pdf imposition, booklet, n ups pages, and more. OCR...
    Downloads: 51 This Week
    Last Update:
    See Project
  • 3
    Super-PDF-Editor-Lite

    Super-PDF-Editor-Lite

    World's most comprehensive, powerful, process-based PDF editor

    World's most comprehensive, powerful, process-based and lighting fast PDF reader, editor and batch processor. Includes features like Create PDF from Images, HTML, Text files. Create a processing log file. Extract Page, Split Page, Rotate Page, Merge Page, Duplicate page, Move Page, Printing, and Compress Page. Improve image enhancement before OCR operation for better OCR performance. pdf Imposition, etc. Super PDF Editor is best for bulk pdf processing, especially for the printing industry...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 4
    OCRmyPDF

    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 19 This Week
    Last Update:
    See Project
  • Free CRM Software With Something for Everyone Icon
    Free CRM Software With Something for Everyone

    216,000+ customers in over 135 countries grow their businesses with HubSpot

    Think CRM software is just about contact management? Think again. HubSpot CRM has free tools for everyone on your team, and it’s 100% free. Here’s how our free CRM solution makes your job easier.
  • 5
    PDFsam

    PDFsam

    PDFsam, a desktop application to split, merge, mix, rotate PDF files

    PDFsam Basic is our free and open-source desktop application to split, merge, extract pages, rotate and mix PDF files. PDFsam Visual is a powerful tool to visually compose PDF files, reorder pages, delete pages, split, merge, rotate, encrypt, decrypt, extract text, convert to grayscale, crop PDF files. PDFsam Basic is written using JavaFX. Since version 4 it is released as a self-contained application and bundles a jlinked JDK while version 3 requires a Java Runtime Environment 8 with JavaFx...
    Downloads: 44 This Week
    Last Update:
    See Project
  • 6
    Video-subtitle-extractor

    Video-subtitle-extractor

    A GUI tool for extracting hard-coded subtitle (hardsub) from videos

    Video hard subtitle extraction, generate srt file. There is no need to apply for a third-party API, and text recognition can be implemented locally. A deep learning-based video subtitle extraction framework, including subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files. Use local OCR recognition, no need to set up and call any API, and do not need to access online OCR services such as Baidu...
    Downloads: 65 This Week
    Last Update:
    See Project
  • 7
    JupyterLab

    JupyterLab

    JupyterLab computational environment

    ... new workflows for interactive computing. JupyterLab also offers a unified model for viewing and handling data formats. JupyterLab understands many file formats (images, CSV, JSON, Markdown, PDF, Vega, Vega-Lite, etc.) and can also display rich kernel output in these formats. See File and Output Formats for more information. To navigate the user interface, JupyterLab offers customizable keyboard shortcuts and the ability to use key maps from vim, emacs, and Sublime Text in the text editor.
    Downloads: 61 This Week
    Last Update:
    See Project
  • 8
    PaperQA2

    PaperQA2

    High accuracy RAG for answering questions from scientific documents

    PaperQA2 is a package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our recent 2024 paper to see examples of PaperQA2's superhuman performance in scientific tasks like question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata - including citation counts and a retraction check, then parse and cache PDFs into a full-text...
    Downloads: 27 This Week
    Last Update:
    See Project
  • 9
    KOReader

    KOReader

    An ebook reader application supporting PDF, DjVu, EPUB, FB2, etc.

    KOReader is a document viewer for E Ink devices. Supported fileformats include EPUB, PDF, DjVu, XPS, CBT, CBZ, FB2, PDB, TXT, HTML, RTF, CHM, DOC, MOBI and ZIP files. It’s available for Kindle, Kobo, PocketBook, Android and desktop Linux. Runs on embedded devices (Cervantes, Kindle, Kobo, PocketBook, reMarkable), Android and Linux computers. Developers can run a KOReader emulator in Linux and MacOS. Multi-lingual user interface with a highly customizable reader view and many typesetting options...
    Downloads: 47 This Week
    Last Update:
    See Project
  • The Secure Workspace for Remote Work Icon
    The Secure Workspace for Remote Work

    Venn isolates and protects work from any personal use on the same computer, whether BYO or company issued.

    Venn is a secure workspace for remote work that isolates and protects work from any personal use on the same computer. Work lives in a secure local enclave that is company controlled, where all data is encrypted and access is managed. Within the enclave – visually indicated by the Blue Border around these applications – business activity is walled off from anything that happens on the personal side. As a result, work and personal uses can now safely coexist on the same computer.
  • 10
    Visual Studio Code

    Visual Studio Code

    Visual Studio Code

    ... is updated monthly with new features and bug fixes. You can download it for Windows, macOS, and Linux on Visual Studio Code's website. To get the latest releases every day, install the Insiders build. Debug code right from the editor. Launch or attach to your running apps and debug with break points, call stacks, and an interactive console. Working with Git and other SCM providers has never been easier. Review diffs, stage files, and make commits right from the editor.
    Downloads: 57 This Week
    Last Update:
    See Project
  • 11
    Termux application

    Termux application

    Terminal emulator application for Android OS extendible

    Termux is an Android terminal application and Linux environment. At first start a small base system is downloaded, desired packages can then be installed using the apt package manager known from the Debian and Ubuntu Linux distributions. Access the built-in help by long-pressing anywhere on the terminal and selecting the Help menu option to learn more. Allows the app to view information about network connections such as which networks exist and are connected. Allows the app to create network...
    Downloads: 46 This Week
    Last Update:
    See Project
  • 12
    Papermerge

    Papermerge

    Open Source Document Management System for Digital Archives

    ...-source software which means that transparency is the core value of our software development. Source code can be reviewed and improved by anyone from anywhere. Papermerge supports multiple users. Each user can be assigned different permissions to perform only a specific kind of action e.g. view only documents from a specific folder. OCR technology is vital part of Papermerge. It extracts text information from scanned documents, PDF, JPEG, TIFF files.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 13
    SOPS

    SOPS

    Simple and flexible tool for managing secrets

    sops is an editor of encrypted files that supports YAML, JSON, ENV, INI and BINARY formats and encrypts with AWS KMS, GCP KMS, Azure Key Vault, age, and PGP. For the adventurous, unstable features are available in the develop branch, which you can install from source. To use sops as a library, take a look at the decrypt package. We rewrote Sops in Go to solve a number of deployment issues, but the Python branch still exists under python-sops. We will keep maintaining it for a while, and you can...
    Downloads: 21 This Week
    Last Update:
    See Project
  • 14
    Frescobaldi

    Frescobaldi

    LilyPond sheet music text editor

    Frescobaldi is a free and open source LilyPond sheet music text editor. Designed to be powerful yet lightweight and easy-to-use, Frescobaldi offers great functionality and a host of useful features such as music view with advanced two-way Point & Click, Midi capturing to enter music, a Snippet Manager and many more. Frescobaldi is named after Girolamo Frescobaldi (1583-1643), an Italian composer of keyboard music in the late Renaissance and early Baroque period.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 15
    ripgrep

    ripgrep

    Regex pattern directory search tool that respects your .gitignore

    ... could be PDF text extraction, less supported decompression, decrypting, automatic encoding detection and so on. In other words, use ripgrep if you like speed, filtering by default, fewer bugs and Unicode support.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 16
    borb

    borb

    borb is a library for reading, creating and manipulating PDF files

    borb is a library for creating and manipulating PDF files in python. borb is a pure python library to read, write, and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries and primitives (numbers, string, booleans, etc) This is currently a one-man project, so the focus will always be to support those use-cases that are more common in favor of those that are rare.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Sphinx

    Sphinx

    Main repository for the Sphinx documentation builder

    Sphinx is a tool that makes it easy to create intelligent and beautiful documentation, written by Georg Brandl and licensed under the BSD license. It was originally created for the Python documentation, and it has excellent facilities for the documentation of software projects in a range of languages. Of course, this site is also created from reStructuredText sources using Sphinx! HTML (including Windows HTML Help), LaTeX (for printable PDF versions), ePub, Texinfo, manual pages, plain text...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 18
    WeasyPrint

    WeasyPrint

    The awesome document factory

    ..., tables of contents, links, annotations, optimized images, attachments, WeasyPrint provides many features out of the box, and even gives you the possibility to add your own ways to customize your PDF files. Digital fonts are finely tuned pieces of artwork. To give to your documents the subtle touch they deserve, carefully choose the options you want, kerning, ligatures, old-style numbers, tabular figures, ordinals, etc.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 19
    Back In Time

    Back In Time

    An easy-to-use backup tool for GNU Linux using rsync in the back

    Back In Time is an easy-to-use tool to backup files and folders. It runs on GNU Linux (not on Windows or OS X/macOS) and provides a command line tool backintime and a GUI backintime-qt both written in Python3. It uses rsync to take manual or scheduled snapshots and stores them locally or remotely through SSH. Each snapshot is in its own folder with copies of the original files, but unchanged files are hard-linked between snapshots to save storage space. It was inspired by FlyBack.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 20
    StoryTeller

    StoryTeller

    Multimodal AI Story Teller, built with Stable Diffusion, GPT, etc.

    ... commit. The final video will be saved as /out/out.mp4, alongside other intermediate images, audio files, and subtitles. For more advanced use cases, you can also directly interface with Story Teller in Python code.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 21
    PdfPig

    PdfPig

    Read and extract text and other content from PDFs in C#

    This project allows users to read and extract text and other content from PDF files. In addition the library can be used to create simple PDF documents containing text and geometrical shapes.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    Code App

    Code App

    Building a full-fledged code editor for iPad

    ...) Local Python Runtime. Local Clang compiler. Git Version Control. Package manager support (Pip and NPM) and Remote connection support (Files and terminal). While we want to make the editing experience as close as a desktop offers, Code App is still bounded by iOS's limitations. For example, you cannot download arbitrary commands or modules with native components. Spawning subprocesses is also not possible.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 23
    RecoverPy

    RecoverPy

    Interactively find and recover deleted or overwritten files

    RecoverPy is a powerful tool that leverages your system capabilities to recover lost files. Unlike others, you can not only recover deleted files but also overwritten data. Every block of your partition will be scanned. You can even find a string in binary files.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 24
    Logseq

    Logseq

    A privacy-first, open-source platform for knowledge management

    Logseq is a privacy-first, open-source knowledge base that works on top of local plain-text Markdown and Org-mode files. Use it to write, organize and share your thoughts, keep your to-do list, and build your own digital garden. Logseq is a platform for knowledge management and collaboration. It focuses on privacy, longevity, and user control. The server will never store or analyze your private notes. Your data are plain text files and we currently support both Markdown and Emacs Org-mode (more...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 25
    xhtml2pdf

    xhtml2pdf

    A library for converting HTML into PDFs using ReportLab

    xhtml2pdf enables users to generate PDF documents from HTML content easily and with automated flow control such as pagination and keeping text together. The Python module can be used in any Python environment, including Django. The Command line tool is a stand-alone program that can be executed from the command line.
    Downloads: 2 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next