python text free download

Showing 56 open source projects for "python text"

1

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files

OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.

Downloads: 120 This Week

Last Update: 2026-07-01
See Project
2

Pix2Text

Open-Source Python3 tool for recognizing layouts, tables, and math

An open-source Python 3 tool for recognizing layouts, tables, images, text, and mathematical formulas in images and converting them into Markdown format. Pix2Text (P2T) aims to be a free and open-source Python alternative to Mathpix, and it can already accomplish Mathpix's core functionality. 80+ languages are supported. ...

Downloads: 6 This Week

Last Update: 2026-02-07
See Project
3

isort

A Python utility / library to sort imports

isort is a Python utility/library that sorts imports alphabetically and automatically separates them into sections and by type. It provides a command-line utility, a Python library, and plugins for various editors to quickly sort all your imports. It requires Python 3.6+ to run but supports formatting Python 2 code too. Several plugins have been written that make it possible to use isort from within a variety of text editors.

Downloads: 5 This Week

Last Update: 2026-02-28
See Project
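
To make the isort entry above concrete, here is a minimal sketch of the general idea of sorting imports alphabetically and splitting them into sections. It is not isort's implementation and does not reproduce its exact ordering rules; the function name `sort_imports` and the two-section split (standard library first, everything else second) are assumptions made for illustration.

```python
import sys

# Standard-library module names; sys.stdlib_module_names needs Python 3.10+.
# The tiny fallback set is an assumption so the sketch still runs on older versions.
STDLIB = getattr(sys, "stdlib_module_names", frozenset({"json", "os", "sys"}))


def sort_imports(lines):
    """Group import lines into two sections and sort each alphabetically."""

    def module_name(line):
        # "import os.path" -> "os.path"; "from json import loads" -> "json"
        return line.split()[1]

    stdlib, other = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        top_level = module_name(line).split(".")[0]
        (stdlib if top_level in STDLIB else other).append(line)

    # Sort each non-empty section by module name, then join sections
    # with a blank line between them.
    sections = [
        sorted(section, key=lambda l: module_name(l).lower())
        for section in (stdlib, other)
        if section
    ]
    return "\n\n".join("\n".join(section) for section in sections)


print(sort_imports(["import sys", "import requests", "from os import path", "import json"]))
```

The real tool handles many more cases (multi-line imports, comments, configuration profiles), so treat this only as an illustration of the sectioning idea.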
4

Memvid

Video-based AI memory library. Store millions of text chunks in MP4

Memvid encodes text chunks as QR codes within MP4 frames to build a portable “video memory” for AI systems. This innovative approach uses standard video containers and offers millisecond-level semantic search across large corpora with dramatically less storage than vector DBs. It's self-contained—no DB needed—and supports features like PDF indexing, chat integration, and cloud dashboards.

Downloads: 7 This Week

Last Update: 2026-05-27
See Project
5

Nano PDF Editor

Edit PDF files with Nano Banana

Nano PDF Editor is a minimalist, portable PDF viewer and toolkit that focuses on simplicity, speed, and ease of integration for applications that need basic PDF rendering without heavy dependencies. It provides core functionality such as page navigation, zooming, text selection, and rendering directly to native graphics surfaces, making it suitable for lightweight PDF viewing scenarios on desktop or embedded platforms. Designed to be easily embedded into larger software projects, Nano-PDF...

Downloads: 22 This Week

Last Update: 2026-02-05
See Project
6

PackageDev

Tools to ease the creation of snippets, syntax definitions, etc.

PackageDev provides syntax highlighting and other helpful utilities for Sublime Text resource files. Resource files are ways of configuring the Sublime Text editor to various extents, including but not limited to: custom syntax definitions, context menus (and the main menu), and key bindings. Thus, this package is ideal for package developers, but even normal users of Sublime Text who want to configure it to their liking should find it very useful.

Downloads: 6 This Week

Last Update: 13 hours ago
See Project
7

Unredact

A simple tool for reading in poorly redacted documents

Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and...

Downloads: 11 This Week

Last Update: 2026-02-03
See Project
8

pdfly

CLI tool to extract (meta)data from PDF and manipulate PDF files

A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.

Downloads: 10 This Week

Last Update: 2025-10-13
See Project
9

Texify

Math OCR model that outputs LaTeX and markdown

Texify is an OCR model that converts images or PDFs containing math into Markdown and LaTeX that can be rendered by MathJax ($$ and $ are delimiters). It can run on CPU, GPU, or MPS.

Downloads: 3 This Week

Last Update: 2024-10-31
See Project
10

py-pdf-parser

A Python tool to help extract information from structured PDFs

py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.

Downloads: 4 This Week

Last Update: 2025-04-28
See Project
11

TexText

Re-editable LaTeX/typst graphics for Inkscape

Re-editable LaTeX and typst graphics for Inkscape. TexText is a Python extension for the vector graphics editor Inkscape that lets you add LaTeX- and typst-generated SVG elements to your drawing and re-edit them later.

Downloads: 11 This Week

Last Update: 2026-01-06
See Project
12

Extract TOTP/HOTP secrets

Extract one time password (OTP) secrets from QR codes

The Python script extract_otp_secrets.py extracts one-time password (OTP) secrets from QR codes exported by two-factor authentication (2FA) apps such as "Google Authenticator".

1 Review

Downloads: 0 This Week

Last Update: 2026-03-08
See Project
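
As a rough illustration of what "extracting an OTP secret" from a decoded QR code can involve, the sketch below parses a plain `otpauth://` key URI with the standard library. It is not taken from extract_otp_secrets.py: the function name `parse_otpauth_uri` and the sample URI are hypothetical, the QR-decoding step is assumed to have already happened, and export formats used by specific authenticator apps may differ from this simple URI form.

```python
from urllib.parse import parse_qs, unquote, urlparse


def parse_otpauth_uri(uri):
    """Split an otpauth:// key URI into its type, label, secret, and issuer."""
    parsed = urlparse(uri)
    if parsed.scheme != "otpauth":
        raise ValueError("not an otpauth URI")
    params = parse_qs(parsed.query)
    if "secret" not in params:
        raise ValueError("URI has no secret parameter")
    return {
        "type": parsed.netloc,                      # "totp" or "hotp"
        "label": unquote(parsed.path.lstrip("/")),  # e.g. "Issuer:account"
        "secret": params["secret"][0],              # Base32-encoded shared secret
        "issuer": params.get("issuer", [None])[0],  # optional
    }


# Hypothetical sample URI, for illustration only.
sample = "otpauth://totp/Example:alice%40example.com?secret=JBSWY3DPEHPK3PXP&issuer=Example"
print(parse_otpauth_uri(sample))
```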
13

DocArray

The data structure for multimodal data

DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API. Door to multimodal world: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data. The foundation data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt etc. Data...

Downloads: 0 This Week

Last Update: 2025-03-21
See Project
14

realwatermark

A Python application to add watermarks (text or image) to PDF files

A Python application that adds watermarks (text or image) to PDF files, converting them into images and back to PDF, with options for OCR and compression.

Downloads: 0 This Week

Last Update: 2025-01-27
See Project
15

Asymptote

2D & 3D TeX-Aware Vector Graphics Language

Asymptote is a powerful descriptive vector graphics language for technical drawing, inspired by MetaPost but with an improved C++-like syntax. Asymptote provides for figures the same high-quality typesetting that LaTeX does for scientific text.

42 Reviews

Downloads: 548 This Week

Last Update: 2 days ago
See Project
16

Create Index from PDF

PDF Indexing Script: Searches PDF for words, records page numbers

This Python script helps automate the process of creating an index for a PDF document. It reads a list of words from a text file, searches through each page of the PDF, and records the page numbers where each word appears. The script accounts for the first 24 pages of the PDF that use Roman numerals (i-xxiv) and adjusts the page numbers accordingly.

Downloads: 1 This Week

Last Update: 2025-03-03
See Project
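
The indexing approach described in the entry above (search each page for each word, then adjust for Roman-numeral front matter) can be sketched as follows. This is not the project's script: the names `to_roman`, `page_label`, and `build_index` are hypothetical, the page texts are assumed to have been extracted from the PDF already, matching is whole-word and case-insensitive by assumption, and the sketch assumes Arabic numbering restarts at 1 after the front matter.

```python
import re


def to_roman(n):
    """Convert a positive integer to a lowercase Roman numeral."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_label(physical_page, front_matter_pages=24):
    """Map a 1-based physical page number to its printed label."""
    if physical_page <= front_matter_pages:
        return to_roman(physical_page)
    # Assumption: body pages restart at 1 after the front matter.
    return str(physical_page - front_matter_pages)


def build_index(page_texts, words, front_matter_pages=24):
    """Return {word: [page labels]} for every page on which the word appears."""
    index = {word: [] for word in words}
    for number, text in enumerate(page_texts, start=1):
        tokens = set(re.findall(r"\w+", text.lower()))
        for word in words:
            if word.lower() in tokens:
                index[word].append(page_label(number, front_matter_pages))
    return index


pages = ["alpha beta", "beta", "gamma alpha"]
print(build_index(pages, ["alpha", "beta"], front_matter_pages=2))
```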
17

PII-Blackout

100% offline, AI-powered PDF redaction

PII Blackout: 100% offline, automated PDF and image redaction. Protecting sensitive data shouldn't mean risking your privacy by uploading files to online tools. PII Blackout is a powerful, local-first desktop application that automates the tedious process of finding and masking Personally Identifiable Information (PII) with military-grade security. Key features: smart auto-detection and redaction. Manually searching for emails, phone numbers, tax IDs, and names is a thing of the past. PII...

Downloads: 0 This Week

Last Update: 2026-06-24
See Project
18

LovelyPlots

Matplotlib style sheets to nicely format figures for scientific papers

LovelyPlots is a repository containing matplotlib style sheets to nicely format figures for scientific papers, theses, and presentations while keeping them fully editable in Adobe Illustrator. Additionally, .svg export options allow figures to automatically adapt their font to your document's font. For example, .svg figures imported in a .tex file will automatically be generated with the text font used in your .tex file.

Downloads: 4 This Week

Last Update: 2024-05-09
See Project
19

EpiDoc: Epigraphic Documents in TEI XML

XML text markup for ancient documents

The EpiDoc Collaborative is developing specifications and tools for standards-based, digital publication and interchange of scholarly and educational editions of documentary and literary texts like inscriptions and papyri. The link below will take you to the EpiDoc home page on this site.

2 Reviews

Downloads: 9 This Week

Last Update: 2023-06-15
See Project
20

SPyQL

Query data on the command line with SQL-like SELECTs powered by Python

SQL with Python in the middle. SPyQL is a query language that combines the simplicity and structure of SQL with the power and readability of Python. SPyQL offers a command-line interface for running SPyQL queries on top of text data (e.g. CSV, JSON). Data can come from files but also from data streams, such as Kafka, or from databases such as PostgreSQL.

Downloads: 0 This Week

Last Update: 2023-10-19
See Project
21

pdf-editor

Edit your PDFs without needing a subscription or creating accounts

Edit your PDFs without needing a subscription or creating accounts. Add a GUI/Turn it into a web application. Add a parser for the command line to do multiple commands at once e.g. merge (cut pdf1) pdf2. Tested working with Python 3.8.5. Install venv (py -3.8 -m pip install virtualenv). PDF and Word documents are binary files, which makes them much more complex than plaintext files. In addition to text, they store lots of font, color, and layout information. If you want your programs to read or write to PDFs or Word documents, you’ll need to do more than simply pass their filenames to open().

Downloads: 0 This Week

Last Update: 2023-02-02
See Project
22

ADFILT

Web filter lists for countless different topics

This is the place where I, Imre Kristoffer Eilertsen, host my web filter lists for countless different topics, for use in adblock tools and the like. GitHub was in mid-2017 by far the easiest way for laymen like me to store pure text files, which is a necessity to create subscribable lists. This is a hobby project of mine, in which I work just as much on these lists and this repo as I feel like. But don't be fooled by the appearance, as these are nevertheless some lists that I've placed lots...

Downloads: 0 This Week

Last Update: 2023-10-13
See Project
23

PersonGen

A minor project in Python that uses the RandomUser API.

A small Python program that uses the RandomUser API to generate random person data.

Downloads: 1 This Week

Last Update: 2021-12-23
See Project
24

pylatexenc

Simple LaTeX parser providing latex-to-unicode and unicode-to-latex

Simple LaTeX parser providing latex-to-unicode and unicode-to-latex conversion. Python 3.4 or 2.7. The library is designed to be as backward-compatible as reasonably possible and is able to run on old Python versions should it be necessary. (Use the setup.py script directly if you have Python 3.7, poetry doesn't seem to work with old Python versions.) The pylatexenc.latexencode module provides a function unicode_to_latex() which converts a Unicode string into LaTeX text and escape sequences. ...

Downloads: 0 This Week

Last Update: 2024-05-14
See Project
25

jsonfield

A reusable Django model field for storing ad-hoc JSON data

jsonfield is a reusable model field that allows you to store validated JSON, automatically handling serialization to and from the database. To use, add jsonfield.JSONField to one of your models. Note: django.contrib.postgres now supports PostgreSQL's jsonb type, which includes extended querying capabilities. If you're an end user of PostgreSQL and want full-featured JSON support, then it is recommended that you use the built-in JSONField. However, jsonfield is still useful when your app...

Downloads: 2 This Week

Last Update: 2022-09-02
See Project