Showing 1495 open source projects for "python text parser"

  • 1
    tika-python

    Python binding to the Apache Tika™ REST services

    A Python port of the Apache Tika library that makes Tika available through the Tika REST Server, so Apache Tika can be used as a Python library installable via Setuptools, Pip, and Easy Install. To use this library, you need Java 7+ installed on your system, because tika-python starts the Tika REST server in the background. To get this working in a disconnected environment, download a Tika server file (both tika-server.jar and tika-server.jar.md5, which can be found here) and set...
    Downloads: 0 This Week
  • 2
    py-pdf-parser

    A Python tool to help extract information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface for defining parsing rules and extracting data from PDF documents.
    Downloads: 1 This Week
  • 3
    ElevenLabs Python

    The official Python SDK for the ElevenLabs API

    elevenlabs-python is the official Python SDK for the ElevenLabs API, giving developers a convenient way to access ElevenLabs’ high-quality, lifelike voices. The library wraps the HTTP API in a typed Python client, so you can perform text-to-speech, streaming, voice cloning, voice management, and agent-related operations with simple method calls.
    Downloads: 0 This Week
  • 4
    python-bibtexparser v2

    BibTeX parser for Python 3

    Welcome to python-bibtexparser, a parser for .bib files with a long history and wide adoption. Bibtexparser is available in two versions: v1 and v2. For new projects, we recommend using v2, which in the long run will provide an overall more robust and faster experience. For now, however, note that v2 is an early beta and does not contain all features of v1.
    Downloads: 0 This Week
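To make concrete what a .bib parser such as python-bibtexparser has to do, here is a minimal, stdlib-only sketch that splits entries of the form `@type{key, field = {value}, ...}` into dictionaries. It is not bibtexparser's API and handles far less: the simplified grammar, the function name `parse_bib`, and the `entry_type`/`key` dictionary keys are assumptions for illustration, and `@string`, `@comment`, `@preamble`, and `#` concatenation are not handled.

```python
import re

# Simplified grammar (assumption): @type{key, name = {value} | "value" | bare, ...}
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
FIELD_NAME = re.compile(r"\s*(\w+)\s*=\s*")


def _read_value(text, i):
    """Read a {braced}, "quoted", or bare value at text[i]; return (value, next_index)."""
    if text[i] == "{":
        depth = 0
        for j in range(i, len(text)):
            if text[j] == "{":
                depth += 1
            elif text[j] == "}":
                depth -= 1
                if depth == 0:
                    return text[i + 1:j], j + 1  # nested braces such as {P} are kept
        raise ValueError("unbalanced braces in field value")
    if text[i] == '"':
        end = text.index('"', i + 1)
        return text[i + 1:end], end + 1
    bare = re.match(r"[^,}]*", text[i:])  # e.g. year = 1984
    return bare.group(0).strip(), i + bare.end()


def parse_bib(text):
    """Return one dict per entry, with lower-cased field names."""
    entries, pos = [], 0
    while (m := ENTRY_START.search(text, pos)):
        entry = {"entry_type": m.group(1).lower(), "key": m.group(2)}
        i = m.end()
        while True:
            while i < len(text) and text[i] in " \t\r\n,":  # skip separators
                i += 1
            if i >= len(text):
                raise ValueError("unterminated entry")
            if text[i] == "}":  # closing brace of the entry
                i += 1
                break
            field = FIELD_NAME.match(text, i)
            if field is None:
                raise ValueError(f"malformed field at offset {i}")
            value, i = _read_value(text, field.end())
            entry[field.group(1).lower()] = value
        entries.append(entry)
        pos = i
    return entries
```

Scanning braces by depth, rather than with a single regular expression, is what lets values like `{Literate {P}rogramming}` survive intact.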
  • 5
    Python Progressbar

    Progressbar 2 - A progress bar for Python 2 and Python 3

    A text progress bar is typically used to display the progress of a long-running operation, providing a visual cue that processing is underway. This package is based on the old Python progressbar package that was published on the now-defunct Google Code. Since that project was completely abandoned by its developer, who did not respond to my email, I decided to fork the package.
    Downloads: 1 This Week
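The core mechanism behind a text progress bar like the one Python Progressbar provides is redrawing a single terminal line in place with a carriage return (`\r`). The following stdlib-only sketch shows that idea; it is not the progressbar2 API, and the names `render_bar` and `track` are hypothetical.

```python
import sys
import time


def render_bar(done, total, width=30):
    """Return one line of text, e.g. '[#####-----]  50%' for done=5, total=10, width=10."""
    fraction = done / total if total else 1.0
    filled = int(round(width * fraction))
    return "[" + "#" * filled + "-" * (width - filled) + f"] {fraction:4.0%}"


def track(iterable, total, stream=None):
    """Yield each item, then redraw the bar on the same line using a carriage return."""
    if stream is None:
        stream = sys.stderr
    for done, item in enumerate(iterable, start=1):
        yield item  # the caller processes the item before the bar advances
        stream.write("\r" + render_bar(done, total))
        stream.flush()
    stream.write("\n")


if __name__ == "__main__":
    for _ in track(range(20), total=20):
        time.sleep(0.05)  # stand-in for real work
```

Writing to stderr by default keeps the bar out of any data the program prints to stdout.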
  • 6
    MCP Text Editor

    Provides line-oriented text file editing capabilities

    The MCP Text Editor Server provides line-oriented text file editing capabilities through a standardized API, optimized for integration with Large Language Models (LLMs). It enables efficient partial file access, minimizing token usage while ensuring safe concurrent editing.
    Downloads: 0 This Week
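Partial file access combined with safe concurrent editing, as described for MCP Text Editor, can be illustrated with a common pattern: read only a range of lines, return a hash of that range, and refuse a later edit if the range no longer hashes to the same value (optimistic concurrency). This stdlib-only sketch assumes that pattern; it is not the MCP Text Editor Server's actual API or protocol, and `read_lines`, `replace_lines`, and the choice of SHA-256 are hypothetical.

```python
import hashlib
from pathlib import Path


def _digest(lines):
    """Hash a list of lines so a later edit can detect concurrent changes."""
    return hashlib.sha256("".join(lines).encode("utf-8")).hexdigest()


def read_lines(path, start, end):
    """Return lines start..end (1-based, inclusive) and a hash of exactly that range."""
    lines = Path(path).read_text(encoding="utf-8").splitlines(keepends=True)
    chunk = lines[start - 1:end]
    return "".join(chunk), _digest(chunk)


def replace_lines(path, start, end, expected_hash, new_text):
    """Replace lines start..end only if they still hash to expected_hash."""
    p = Path(path)
    lines = p.read_text(encoding="utf-8").splitlines(keepends=True)
    if _digest(lines[start - 1:end]) != expected_hash:
        raise RuntimeError("range changed since it was read; re-read and retry")
    new_lines = new_text.splitlines(keepends=True)
    p.write_text("".join(lines[:start - 1] + new_lines + lines[end:]), encoding="utf-8")
```

Returning only the requested range keeps the text sent to a model small. Note that the check-then-write here is not atomic across processes; a real server would also need locking or atomic file replacement.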
  • 7
    Text Generation Inference

    Large Language Model Text Generation Inference

    Text Generation Inference is a high-performance inference server for text generation models, optimized for Hugging Face's Transformers. It is designed to serve large language models efficiently with optimizations for performance and scalability.
    Downloads: 0 This Week
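Text Generation Inference is queried over HTTP; its README shows a `POST /generate` endpoint that accepts JSON of the form `{"inputs": ..., "parameters": {"max_new_tokens": ...}}` and returns a `generated_text` field. The sketch below builds such a request with the standard library. The base URL `http://127.0.0.1:8080` is an assumption (use wherever your server listens), and the full set of supported parameters should be checked against the current TGI documentation.

```python
import json
import urllib.request


def build_generate_request(prompt, max_new_tokens=20, base_url="http://127.0.0.1:8080"):
    """Build a POST request for TGI's /generate endpoint (base_url is an assumption)."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["generated_text"]
```

With a server running, `generate("What is Deep Learning?", max_new_tokens=20)` returns the completion as a string; building the request separately makes it easy to inspect or test without a server.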
  • 8
    Text Generation Web UI

    A Gradio web UI for running Large Language Models like LLaMA

    A Gradio web UI for running Large Language Models such as LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA. Features include a dropdown menu for switching between models, a notebook mode that resembles OpenAI's playground, a chat mode for conversation and role playing, an instruct mode compatible with the Alpaca and Open Assistant formats, nice HTML output for GPT-4chan, Markdown output for GALACTICA (including LaTeX rendering), custom chat characters, and advanced chat features (send images, get audio responses with TTS)...
    Downloads: 10 This Week
  • 9
    Python Client For NLP Cloud

    NLP Cloud serves high performance pre-trained or custom models for NER

    NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, dialogue summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, blog post generation, source code generation, question answering, automatic speech recognition, machine translation, language detection, semantic search,...
    Downloads: 0 This Week
  • 10
    mistletoe

    A fast, extensible and spec-compliant Markdown parser in pure Python

    mistletoe is a Markdown parser in pure Python, designed to be fast, spec-compliant, and fully customizable. Apart from being the fastest CommonMark-compliant Markdown parser implementation in pure Python, mistletoe also makes it easy to define custom tokens. Parsing Markdown into an abstract syntax tree also allows us to swap out renderers for different output formats without touching any of the core components.
    Downloads: 1 This Week
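The design point in the mistletoe description, that parsing Markdown into an abstract syntax tree lets you swap renderers without touching the parser, can be shown with a toy example. This is not mistletoe's API (its real tokens and renderer classes are far richer); the sketch handles only ATX headings and paragraphs, leaves inline markup such as `*text*` untouched, and the classes `Heading`, `Paragraph`, `HtmlRenderer`, and `PlainTextRenderer` are hypothetical.

```python
import html
import re
from dataclasses import dataclass


@dataclass
class Heading:
    level: int
    text: str


@dataclass
class Paragraph:
    text: str


def parse(markdown):
    """Turn Markdown into a flat list of block nodes (headings and paragraphs only)."""
    nodes, buffer = [], []

    def flush():
        # Close the current paragraph, joining its lines with spaces.
        if buffer:
            nodes.append(Paragraph(" ".join(buffer)))
            buffer.clear()

    for line in markdown.splitlines():
        m = re.match(r"(#{1,6})\s+(.*)", line)
        if m:
            flush()
            nodes.append(Heading(len(m.group(1)), m.group(2).strip()))
        elif line.strip():
            buffer.append(line.strip())
        else:
            flush()  # a blank line ends a paragraph
    flush()
    return nodes


class HtmlRenderer:
    def render(self, nodes):
        out = []
        for n in nodes:
            if isinstance(n, Heading):
                out.append(f"<h{n.level}>{html.escape(n.text)}</h{n.level}>")
            else:
                out.append(f"<p>{html.escape(n.text)}</p>")
        return "\n".join(out)


class PlainTextRenderer:
    def render(self, nodes):
        return "\n\n".join(n.text.upper() if isinstance(n, Heading) else n.text for n in nodes)


if __name__ == "__main__":
    tree = parse("# Hello\n\nSome text.")
    for renderer in (HtmlRenderer(), PlainTextRenderer()):
        print(renderer.render(tree))
```

Both renderers consume the same tree, so adding an output format means adding a class, not changing `parse`.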
  • 11
    Selectolax

    Python binding to Modest and Lexbor engines

    A fast HTML5 parser with CSS selectors, using the Modest and Lexbor engines. Selectolax supports two backends: Modest and Lexbor. By default, all examples use the Modest backend. Most features are nearly identical between the two backends, but there are still some differences. Currently, the Lexbor backend is in beta and missing some features. To use Lexbor, just import its parser and use it in a similar way to HTMLParser.
    Downloads: 1 This Week
  • 12
    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured and semi-structured to structured data (APIs, PDFs, documents, SQL, etc.). Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL database providers.
    Downloads: 3 This Week
  • 13
    Think Python 2

    LaTeX source and supporting code for Think Python, 2nd edition

    ThinkPython2 is the repository for the second edition of Allen Downey’s Think Python textbook, which teaches programming fundamentals in Python to beginners. The code includes all of the example programs, exercises, and supplementary files referenced in the book, allowing learners to run the examples, experiment, and extend them. The repository contains clean, well-commented Python scripts that are easy to follow and map directly to chapters of the text, covering topics like variables, control flow, functions, recursion, data structures (lists, dictionaries), classes and objects, file I/O, and algorithmic thinking. ...
    Downloads: 0 This Week
  • 14
    Bot Framework SDK for Python

    Build and connect intelligent bots that interact naturally

    This repository contains code for the Python version of the Microsoft Bot Framework SDK, which is part of the Microsoft Bot Framework - a comprehensive framework for building enterprise-grade conversational AI experiences. This SDK enables developers to model conversation and build sophisticated bot applications using Python. SDKs for JavaScript and .NET are also available.
    Downloads: 1 This Week
  • 15
    MeloTTS

    High-quality multi-lingual text-to-speech library by MyShell.ai

    MeloTTS is an open-source text-to-speech (TTS) system that generates natural-sounding speech from text input. It utilizes advanced machine-learning models to produce high-quality audio outputs.
    Downloads: 10 This Week
  • 16
    DeepSeek-OCR

    Contexts Optical Compression

    DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body text, interpreting tables, or recognizing handwritten versus printed words. ...
    Downloads: 19 This Week
  • 17
    GDScript Toolkit

    Independent set of GDScript tools - parser, linter and formatter

    An independent set of GDScript tools: a parser, a linter, and a formatter. This project provides a set of tools for daily work with GDScript. At the moment it provides a parser that produces a parse tree for debugging and educational purposes, a linter that performs static analysis according to a predefined configuration, a formatter that formats code according to predefined rules, and a code metrics calculator that calculates the cyclomatic complexity of functions and classes. To install...
    Downloads: 0 This Week
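Cyclomatic complexity, which the GDScript Toolkit metrics calculator reports, is commonly computed as 1 plus the number of decision points in a function. The sketch below is a rough approximation that counts decision keywords in GDScript source text after blanking out strings and `#` comments. It is not how gdtoolkit computes the metric (a real implementation would walk the parse tree), and the keyword list is an assumption; `match` arms, for example, are ignored.

```python
import re

# Assumed decision-point tokens; `match` arms and other constructs are ignored here.
DECISION_TOKENS = re.compile(r"\b(?:if|elif|for|while|and|or)\b|&&|\|\|")
# Double- or single-quoted strings, or a '#' comment running to the end of the line.
STRINGS_AND_COMMENTS = re.compile(r'"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'|#.*')


def approximate_complexity(function_source):
    """Estimate cyclomatic complexity as 1 + decision tokens outside strings and comments."""
    code = STRINGS_AND_COMMENTS.sub(" ", function_source)
    return 1 + len(DECISION_TOKENS.findall(code))
```

Blanking strings and comments first keeps words like "or" inside a message, or "if" inside a comment, from inflating the count.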
  • 18
    Pix2Text

    Open-Source Python3 tool for recognizing layouts, tables, and math

    An open-source Python 3 tool for recognizing layouts, tables, math formulas, and text in images and converting them into Markdown. Pix2Text (P2T) aims to be a free, open-source Python alternative to Mathpix, and it can already accomplish Mathpix's core functionality: recognizing layouts, tables, images, text, and mathematical formulas, and integrating all of this content into Markdown. More than 80 languages are supported. ...
    Downloads: 11 This Week
  • 19
    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 121 This Week
  • 20
    JC

    CLI tool and Python library

    ...The JC parsers can also be used as Python modules. In this case, the output will be a Python dictionary, or a list of dictionaries, instead of JSON. Two representations of the data are available. The default representation uses a strict schema per parser and converts known numbers to int/float JSON values. Certain known values of None are converted to JSON null, known boolean values are converted, and, in some cases, additional semantic context fields are added.
    Downloads: 3 This Week
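The schema-driven conversion described for JC's default representation (known numbers become int/float, known null markers become None and thus JSON null, known booleans are converted) can be sketched for a made-up `Key: value` text format. This is not a real jc parser and not jc's API; the schema, the null and boolean markers, and the function name `parse_key_values` are assumptions, and the `raw` flag only loosely mirrors the two representations mentioned above.

```python
import json

# Hypothetical schema for a made-up "Key: value" text format (not a real jc parser).
SCHEMA = {"pid": int, "cpu_percent": float, "running": bool, "command": str}
NULL_MARKERS = {"", "-", "n/a"}
TRUE_VALUES = {"yes", "true"}
FALSE_VALUES = {"no", "false"}


def _convert(field, value):
    """Convert one string value per SCHEMA; values that do not fit stay strings."""
    lowered = value.lower()
    if lowered in NULL_MARKERS:
        return None  # serialized as JSON null
    kind = SCHEMA.get(field, str)
    if kind is bool:
        if lowered in TRUE_VALUES:
            return True
        if lowered in FALSE_VALUES:
            return False
        return value
    try:
        return kind(value)
    except ValueError:
        return value


def parse_key_values(text, raw=False):
    """Parse 'Key: value' lines into a dict; raw=True keeps every value as a string."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a key/value separator
        key = key.strip().lower().replace(" ", "_")
        value = value.strip()
        result[key] = value if raw else _convert(key, value)
    return result


if __name__ == "__main__":
    print(json.dumps(parse_key_values("PID: 1\nParent: -"), indent=2))
```

Keeping the raw strings available alongside the converted values lets callers fall back to the original text when a schema guess is wrong.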
  • 21
    novelWriter

    Open source plain text editor designed for writing novels

    A markdown-like text editor designed for writing novels and larger projects of many smaller plain text documents. It is designed to be a simple text editor that allows for easy organization of text files and notes, with a metadata syntax for comments, synopsis, and cross-referencing between files, and built on plain text files for robustness. The project storage is suitable for version control software, and also well suited for file synchronisation tools. All text is saved as plain text...
    Downloads: 3 This Week
  • 22
    pywinauto

    Windows GUI Automation with Python (based on text properties)

    pywinauto is a set of Python modules to automate the Microsoft Windows GUI. At its simplest it allows you to send mouse and keyboard actions to Windows dialogs and controls, but it has support for more complex actions like getting text data.
    Downloads: 6 This Week
  • 23
    ART ASCII Library

    ASCII art library for Python

    ASCII art is also known as "computer text art". It involves the smart placement of typed special characters or letters to make a visual shape that is spread over multiple lines of text. ART is a Python library for converting text into fancy ASCII art.
    Downloads: 1 This Week
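The technique described for ART, placing characters so that one shape spans several lines, comes down to storing each glyph as a list of rows and joining the glyphs row by row. This stdlib-only sketch uses a hypothetical three-row font that covers only a few characters; it is not ART's API or one of its fonts, and unsupported characters raise `KeyError`.

```python
# Hypothetical 3-row font; within each glyph, all rows share one width.
FONT = {
    "H": ["#  #", "####", "#  #"],
    "I": ["###", " # ", "###"],
    "T": ["###", " # ", " # "],
    " ": ["  ", "  ", "  "],
}
ROWS = 3


def text_to_ascii_art(text, spacing=1):
    """Join glyphs row by row; raises KeyError for characters missing from FONT."""
    glyphs = [FONT[ch] for ch in text.upper()]
    gap = " " * spacing
    lines = [gap.join(glyph[row] for glyph in glyphs) for row in range(ROWS)]
    return "\n".join(line.rstrip() for line in lines)


if __name__ == "__main__":
    print(text_to_ascii_art("Hi"))
```

Padding every row of a glyph to the same width is what keeps the letters aligned once the rows are concatenated.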
  • 24
    MegaParse

    File Parser optimised for LLM Ingestion with no loss

    MegaParse is a file parser optimized for Large Language Model (LLM) ingestion, ensuring no loss of information. It efficiently parses various document formats, such as PDFs, DOCX, and PPTX, converting them into formats ideal for processing by LLMs. This tool is essential for applications that require accurate and comprehensive data extraction from diverse document types.
    Downloads: 3 This Week
  • 25
    FastAPI

    FastAPI framework, high performance, easy to learn, fast to code

    FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints. Great editor support. Completion everywhere. Less time debugging. Designed to be easy to use and learn. Less time reading docs. Minimize code duplication. Multiple features from each parameter declaration. Fewer bugs. Get production-ready code. With automatic interactive documentation. Based on (and fully compatible with) the open standards for APIs: OpenAPI...
    Downloads: 43 This Week