108 projects for "python pdf scaper" with 1 filter applied:

  • 1
    PyPDF

    A pure-python PDF library capable of splitting, merging, cropping

    pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s an actively maintained fork of PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.
    Downloads: 3 This Week
  • 2
    zpdf

    Zero-copy PDF text extraction library written in Zig

    zpdf is a high-performance PDF text extraction library written in Zig that focuses on speed, low overhead, and modern parsing techniques. It leans heavily on memory-mapped file reading and zero-copy patterns where possible, so it can scan large PDFs without repeatedly copying data around in memory. The library supports streaming extraction using efficient arena allocation, making it well suited for workloads that need to process big documents quickly or in batches. It implements multiple PDF...
    Downloads: 2 This Week
  • 3
    Unredact

    A simple tool for reading in poorly redacted documents

    Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and...
    Downloads: 41 This Week
  • 4
    Pysheeet

    Python Cheat Sheet

    Pysheeet is a community-driven collection of Python code snippets covering common patterns and tasks like sockets, file I/O, data structures, and more. Each snippet is concise and battle-tested, designed to save coding time and reduce boilerplate. With documentation hosted on Read the Docs and an active GitHub repo, it’s a go-to resource for Python developers.
    Downloads: 1 This Week
  • 5
    PageIndex

    Document Index for Vectorless, Reasoning-based RAG

    PageIndex is an innovative open-source framework that reimagines retrieval-augmented generation (RAG) by eliminating conventional vector similarity search and instead building hierarchical semantic indexes that mirror a document’s natural structure. Rather than chunking text and embedding it into a vector database, PageIndex constructs a tree-structured index — similar to a detailed, AI-enhanced table of contents — that a large language model can traverse to locate the most relevant sections...
    Downloads: 2 This Week
  • 6
    DeepSeek-OCR 2

    Visual Causal Flow

    DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...
    Downloads: 9 This Week
  • 7
    Jupyter Notebook Tools for Sphinx

    Sphinx source parser for Jupyter notebooks

    nbsphinx is a Sphinx extension that provides a source parser for *.ipynb files. Custom Sphinx directives are used to show Jupyter Notebook code cells (and of course their results) in both HTML and LaTeX output. Un-evaluated notebooks – i.e. notebooks without stored output cells – will be automatically executed during the Sphinx build process.
    Downloads: 0 This Week
  • 8
    Libros de Programación en Español

    List of programming books in Spanish for free

    Libros de Programación en Español is a curated list of free programming books in Spanish, organized by topic and technology so learners can find high-quality materials at no cost. The README is structured as an index with general programming books, followed by sections for specific languages such as JavaScript, TypeScript, Python, Ruby, Rust, PHP, Haskell, Go, Kotlin, Java, and R. Each entry includes the book title, author, and a link to the official or legal free version (PDF, HTML, eBook, etc.), focusing on resources that are legitimately available. Beyond languages, the list also covers frameworks and libraries (like React and Qwik), tools (such as Git), and databases (SQL), grouping them in separate sections for easier browsing. ...
    Downloads: 1 This Week
  • 9
    ArXiv MCP Server

    A Model Context Protocol server for searching and analyzing arXiv

    arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and...
    Downloads: 2 This Week
  • 10
    Controllable-RAG-Agent

    This repository provides an advanced RAG

    Controllable-RAG-Agent is an advanced Retrieval-Augmented Generation (RAG) system designed specifically for complex, multi-step question answering over your own documents. Instead of relying solely on simple semantic search, it builds a deterministic control graph that acts as the “brain” of the agent, orchestrating planning, retrieval, reasoning, and verification across many steps. The pipeline ingests PDFs, splits them into chapters, cleans and preprocesses text, then constructs vector...
    Downloads: 1 This Week
  • 11
    MComix

    GTK+ comic book viewer.

    MComix is a user-friendly, customizable image viewer. It is specifically designed to handle comic books (both Western comics and manga) and supports a variety of container formats (including CBR, CBZ, CB7, CBT, LHA and PDF). MComix is a fork of Comix.
    Downloads: 678 This Week
  • 12
    Scribus

    Powerful desktop publishing software

    Scribus is an Open Source program that brings professional page layout to Linux, BSD UNIX, Solaris, OpenIndiana, GNU/Hurd, Mac OS X, OS/2 Warp 4, eComStation, and Windows desktops with a combination of press-ready output and new approaches to page design. Underneath a modern and user-friendly interface, Scribus supports professional publishing features, such as color separations, CMYK and spot colors, ICC color management, and versatile PDF creation.
    Downloads: 14,739 This Week
  • 13
    EasyABC

    EasyABC is an open source ABC editor

    EasyABC allows the user to create, edit, view, play, and convert music written in the ABC music notation language. The program was originally written in Python 2.7 and WxPython by Nils Liberg and runs on Windows, OSX, and Linux. Jan Wybren de Jong has converted it to run on Python 3.8 or higher. Frédéric Aupépin has been supporting EasyABC on OSX. EasyABC depends on external programs such as abc2midi, abcm2ps, and fluidsynth. If you install the Windows or Mac executables most of these programs...
    Downloads: 276 This Week
  • 14
    Apache OpenOffice

    The free and Open Source productivity suite

    Free alternative to Office productivity tools: Apache OpenOffice, formerly known as OpenOffice.org, is an open-source office productivity software suite containing word processor, spreadsheet, presentation, graphics, formula editor, and database management applications. OpenOffice is available in many languages, works on all common computers, stores data in ODF (the international open standard format), and is able to read and write files in other formats, including the format used by the...
    Downloads: 298,066 This Week
  • 15
    Small Python library with various utilities such as configuration file parsing (in Python syntax) and HTML and PDF parsing. Used in my other projects.
    Downloads: 0 This Week
  • 16
    MediaWiki To LaTeX converts MediaWiki markup to LaTeX and generates a PDF. So it provides an export from MediaWiki to LaTeX. It works with any project running MediaWiki, especially Wikipedia and Wikibooks.
    Downloads: 4 This Week
  • 17

    littleutils

    Various small and useful command-line utilities

    The littleutils include duplicate file finders (repeats, repeats.pl, repeats.py), image optimizers (opt-jpg, opt-png, opt-gif, recomp-jpg), file rename tools (lowercase, uppercase, pren), archive recompressors (to-gzip, to-bzip2, to-bzip3, to-7zip, to-lzma, to-lzip, to-xz), a tempfile utility (tempname), file property tools (filedate, filemode, filenode, fileown, filesize, and lrealpath), and others. See the README file for more details.
    Downloads: 10 This Week
  • 18
    Impressive
    Impressive is a program that displays PDF presentation slides with style. Smooth alpha-blended slide transitions are provided for the sake of eye candy, but in addition to this, Impressive offers some unique tools that are very useful for presentations.
    Downloads: 40 This Week
  • 19

    SimpleTextFormatter

    STF automatically generates documentation

    STF is a system for automatically generating documentation under the control of a program or script. It is frequently used to generate test reports automatically. STF is also used to clean up the output of a process and turn it into a nice-looking report.
    Downloads: 0 This Week
  • 20
    QuickPlot

    Simple user interface for gnuplot aimed at reflectometry data

    Graphical user interface for gnuplot to create publication-quality figures very quickly. It supports templates for fast formatting of graphics, different plot styles, insets, and axis and label options. One important feature is storing metadata in PNG and PDF files that can be used to reload any graph saved with QuickPlot.
    Downloads: 0 This Week
  • 21
    Celestial Precomputation

    Sight Reduction for Air Navigation with Python

    Warning: I found an error that occurs when the DR position is in the southern hemisphere. The PUB249 PDF files did not show this error and are correct (as far as I checked). All Revisions 8+ and the Windows release have been fixed. This Python TK application calculates a good set of stars for a 3 star fix and performs all calculations in a manner similar to the FAA's Celestial Computation Sheet (see FAA doc FAA-H-8083-18). Also, you can create a PDF file that generates Pub 249 Vol 1 for any epoch you specify. ...
    Downloads: 1 This Week
  • 22
    Fryktelig fin faktura

    Create, review and administer Norwegian invoices.

    Create, review and administer invoices for use in Norway, and print on the F60 faktura form or send by email as PDF. «Fryktelig Fin Faktura» is an invoicing program for Norwegian conditions. Invoices are produced as PDF or on form F60. If you can help out, we are open to more developers. Releases are signed with this GnuPG/PGP key fingerprint: «7F09 D1F8 1C3E 1758 4AC9 C61A 4F5A D64D FA68 7324»
    Downloads: 0 This Week
  • 23
    DocBook to LaTeX Publishing transforms your SGML/XML DocBook documents to DVI, PostScript or PDF by first translating them into pure LaTeX. MathML 2.0 markup is supported too. It started as a clone of DB2LaTeX.
    Downloads: 98 This Week
  • 24
    Linux-Intelligent-Ocr-Solution

    Easy-OCR solution and Tesseract trainer for GNU/Linux

    Linux-Intelligent-Ocr-Solution (Lios) is free and open source software for converting print into text using either a scanner or a camera. It can also produce text from scanned images from other sources such as PDFs, images, folders containing images, or screenshots. The program is fully accessible to visually impaired users. A Tesseract Trainer GUI is also shipped with this package. Forum : https://groups.google.com/forum/#!forum/lios Video Tutorial :...
    Downloads: 6 This Week
  • 25
    QtiPlot
    QtiPlot is a user-friendly, platform-independent data analysis and visualization application similar to the non-free Windows program Origin.
    Downloads: 53 This Week