Showing 21 open source projects for "python pdf scaper"

View related business solutions
  • Build on Google Cloud with $300 in Free Credit Icon
    Build on Google Cloud with $300 in Free Credit

    New to Google Cloud? Get $300 in free credit to explore Compute Engine, BigQuery, Cloud Run, Vertex AI, and 150+ other products.

    Start your next project with $300 in free Google Cloud credit. Spin up VMs, run containers, query exabytes in BigQuery, or build AI apps with Vertex AI and Gemini. Once your credits are used, keep building with 20+ products with free monthly usage, including Compute Engine, Cloud Storage, GKE, and Cloud Run functions. Sign up to start building right away.
    Start Free Trial
  • 99.99% Uptime for MySQL and PostgreSQL on Google Cloud Icon
    99.99% Uptime for MySQL and PostgreSQL on Google Cloud

    Enterprise Plus edition delivers sub-second maintenance downtime and 2x read/write performance. Built for critical apps.

    Cloud SQL Enterprise Plus gives you a 99.99% availability SLA with near-zero downtime maintenance—typically under 10 seconds. Get 2x better read/write performance, intelligent data caching, and 35 days of point-in-time recovery. Supports MySQL, PostgreSQL, and SQL Server with built-in vector search for gen AI apps. New customers get $300 in free credit.
    Try Cloud SQL Free
  • 1
    Pix2Text

    Pix2Text

    Open-Source Python3 tool for recognizing layouts, tables, and math

    ...A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported. Pix2Text (P2T) aims to be a free and open-source Python alternative to Mathpix, and it can already accomplish Mathpix's core functionality. Pix2Text (P2T) can recognize layouts, tables, images, text, and mathematical formulas, and integrate all of these contents into Markdown format. P2T can also convert an entire PDF file (which can contain scanned images or any other format) into Markdown format.
    Downloads: 18 This Week
    Last Update:
    See Project
  • 2
    Zerox OCR

    Zerox OCR

    PDF to Markdown with vision models

    A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc. The vision models just make sense. ZeroX is an open-source machine learning framework designed for fast experimentation and production deployment, optimized for speed and ease of use.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 3
    Remarkable for Linux

    Remarkable for Linux

    The Markdown Editor for Linux

    With Live Preview you can see your changes as you make them. There is no need to export first to check your syntax. This is accompanied by synchronized scrolling. Remarkable has Github Flavoured Markdown. This has a simple, easy-to-learn syntax with features like checklists, highlighting, links, images and more. Remarkable allows you to export your files to PDF and HTML from within the app. The HTML code is even prettified and PDFs have a TOC. You can style your markdown documents however...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    bookdown

    bookdown

    Authoring Books and Technical Documents with R Markdown

    ...A markup language easier to learn than LaTeX, and to write elements such as section headers, lists, quotes, figures, tables, and citations. Multiple choices of output formats: PDF, LaTeX, HTML, EPUB, and Word. Possibility of including dynamic graphics and interactive applications (HTML widgets and Shiny apps) Support for languages other than R, including C/C++, Python, and SQL, etc. LaTeX equations, theorems, and proofs work for all output formats. Can be published to GitHub, bookdown.org, and any web servers. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Deploy Apps in Seconds with Cloud Run Icon
    Deploy Apps in Seconds with Cloud Run

    Host and run your applications without the need to manage infrastructure. Scales up from and down to zero automatically.

    Cloud Run is the fastest way to deploy containerized apps. Push your code in Go, Python, Node.js, Java, or any language and Cloud Run builds and deploys it automatically. Get fast autoscaling, pay only when your code runs, and skip the infrastructure headaches. Two million requests free per month. And new customers get $300 in free credit.
    Try Cloud Run Free
  • 5
    Scribus

    Scribus

    Powerful desktop publishing software

    Scribus is an Open Source program that brings professional page layout to Linux, BSD UNIX, Solaris, OpenIndiana, GNU/Hurd, Mac OS X, OS/2 Warp 4, eComStation, and Windows desktops with a combination of press-ready output and new approaches to page design. Underneath a modern and user-friendly interface, Scribus supports professional publishing features, such as color separations, CMYK and spot colors, ICC color management, and versatile PDF creation.
    Leader badge
    Downloads: 14,492 This Week
    Last Update:
    See Project
  • 6
    Apache OpenOffice

    Apache OpenOffice

    The free and Open Source productivity suite

    Free alternative for Office productivity tools: Apache OpenOffice - formerly known as OpenOffice.org - is an open-source office productivity software suite containing word processor, spreadsheet, presentation, graphics, formula editor, and database management applications. OpenOffice is available in many languages, works on all common computers, stores data in ODF - the international open standard format - and is able to read and write files in other formats, included the format used by the...
    Leader badge
    Downloads: 288,280 This Week
    Last Update:
    See Project
  • 7
    MediaWiki To LaTeX converts MediaWiki markup to LaTeX and generates a PDF. So it provides an export from MediaWiki to LaTeX. It works with any project running MediaWiki, especially Wikipedia and Wikibooks.
    Leader badge
    Downloads: 6 This Week
    Last Update:
    See Project
  • 8

    SimpleTextFormatter

    STF automatically generates documentation

    STF is a system of automatically generating documentation under control of a program or a script. It is frequently used to automatically generate test reports. STF is also used to clean up the output of a process and turn it into a nice looking report.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    DocBook to LaTeX Publishing transforms your SGML/XML DocBook documents to DVI, PostScript or PDF by translating them in pure LaTeX as a first process. MathML 2.0 markups are supported too. It started as a clone of DB2LaTeX.
    Leader badge
    Downloads: 102 This Week
    Last Update:
    See Project
  • Run Any Workload on Compute Engine VMs Icon
    Run Any Workload on Compute Engine VMs

    From dev environments to AI training, choose preset or custom VMs with 1–96 vCPUs and industry-leading 99.95% uptime SLA.

    Compute Engine delivers high-performance virtual machines for web apps, databases, containers, and AI workloads. Choose from general-purpose, compute-optimized, or GPU/TPU-accelerated machine types—or build custom VMs to match your exact specs. With live migration and automatic failover, your workloads stay online. New customers get $300 in free credits.
    Try Compute Engine
  • 10
    Canorus

    Canorus

    Music score editor

    Canorus is a free cross-platform music score editor. It supports an unlimited number and length of staffs, polyphony, a MIDI playback of notes, chord markings, lyrics, import/export filters to formats like MIDI, MusicXML, ABC Music, MusiXTeX and LilyPond
    Leader badge
    Downloads: 18 This Week
    Last Update:
    See Project
  • 11
    PDF-Shuffler
    PDF-Shuffler is a small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface. It is a frontend for python-pyPdf.
    Downloads: 57 This Week
    Last Update:
    See Project
  • 12

    Dvipdfm tool for SCons

    SCons tool to cooperate with dvipdfm program

    SCons is a make replacement providing a range of enhanced features such as automated dependency generation and built in compilation cache support. SCons rule sets are Python scripts so as well as the features it provides itself SCons allows you to use the full power of Python to control compilation. This is a SCons extension (tool) which enables usage of the dvipdfm program to convert dvi files to pdf.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Cute Klava
    Cute Klava is now part of LibreEngineering suite: http://sourceforge.net/projects/libreeng/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    xml2txt is a text formatter for XMl in the same way the FO is a PDF formatter. It uses python to convert an XML document to well-formatted text, wtih borders, indents, and tables.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15
    The aim of this project is to develop a Portable Document Format (PDF) importer for OpenOffice.org Writer based on XPDF. This project was inspired by the PDF importer within KWord.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    Wyneken is a content-oriented text processor that makes your life as a student easier by allowing you to create and manage digital notebooks. Wyneken also allows you to create PDF presentations, letters, articles, and reports. In 2015, Wyneken may or may not work with the latest Linux distributions, but you can use it for building pdfs by pulling our docker repo indicated in the homepage field of this site. The docker site has the information on how to invoke the container, which is...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 17
    WikiPDF is a mediawiki extension based on Wiki2PDF that adds PDF/LaTeX features to mediawiki. Wiki2PDF is a python script to convert multiple articles of a mediawiki based wiki (pre-configured to use with www.wikipedia.org) to a single LaTeX or PDF file.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Typewriter converts plain text files to PDF. It allows users to create underlined text and superscript. The PDF created looks exactly as the text, with the same line and page breaks, allowing users the simplicity and control of a typewriter.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    pdfplayground is a collection of tools written in python to read/write PDF files.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Process OpenOffice.org Writer Files and transform them to PDF without installing OpenOffice.org What is PyOpenOffice? * It is a class library, written in the Python Language. * It is a platform-independent command-line utility (many abilitie
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    xtopdf: Tools to convert other formats (x) to PDF; x as in math. - solve for x :-) Currently x == {.txt, .DBF}. Others to follow. Benefits: all those of PDF (better cross-platform viewing/printing, read-only, etc.)
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB
Gen AI apps are built with MongoDB Atlas
Atlas offers built-in vector search and global availability across 125+ regions. Start building AI apps faster, all in one place.
Try Free →