Showing 16 open source projects for "python text parser"

  • 1
    nanoGPT

    The simplest, fastest repository for training/finetuning models

    NanoGPT is a minimalistic yet powerful reimplementation of GPT-style transformers created by Andrej Karpathy for educational and research use. It distills the GPT architecture into a few hundred lines of Python code, making it far easier to understand than large, production-scale implementations. The repo is organized with a training pipeline (dataset preprocessing, model definition, optimizer, training loop) and inference script so you can train a small GPT on text datasets like Shakespeare or custom corpora. It emphasizes readability and clarity: the training loop is cleanly written, and the code avoids heavy abstractions, letting students follow the architecture step by step. ...
    Downloads: 8 This Week
    See Project
  • 2
    Concordia

    Crowdsourcing platform for full text transcription and tagging

    Concordia is a platform for crowdsourcing transcription and tagging of text in digitized images. It was developed by the Library of Congress so that volunteers of all backgrounds can transcribe and tag digitized images of manuscripts and typed materials from the Library’s collections that optical character recognition cannot otherwise handle.
    Downloads: 1 This Week
    See Project
  • 3
    Stanza

    Stanford NLP Python library for many human languages

    Stanza is a collection of accurate and efficient tools for the linguistic analysis of many human languages. Starting from raw text to syntactic analysis and entity recognition, Stanza brings state-of-the-art NLP models to languages of your choosing. Stanza is a Python natural language analysis package. It contains tools, which can be used in a pipeline, to convert a string containing human language text into lists of sentences and words, to generate base forms of those words, their parts of speech and morphological features, to give a syntactic structure dependency parse, and to recognize named entities. ...
    Downloads: 2 This Week
    See Project
  • 4
    Libro

    An interactive program for statistical analysis of texts

    A cross-platform text analysis program written in Python and Free Pascal/Lazarus. It scans a whole text file (plain text, HTML, EPUB, or ODT) and ranks every word by frequency, performing a quantitative analysis of the text using the Shannon-Weaver information statistic and the Zipf power-law function. It counts words, sentences, characters, spaces, and syllables.
    Downloads: 2 This Week
    See Project
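
The kind of analysis the Libro entry describes (ranking words by frequency and computing a Shannon information statistic) can be sketched in a few lines of Python. This is an illustrative sketch only, not Libro's code: the function names `rank_words` and `shannon_entropy`, the tokenizing regular expression, and the sample sentence are all assumptions, and the Zipf power-law fit is left out.

```python
import math
import re
from collections import Counter


def rank_words(text):
    """Return (word, count) pairs sorted by descending frequency."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


def shannon_entropy(ranked):
    """Shannon entropy, in bits per word, of the word distribution."""
    total = sum(count for _, count in ranked)
    return -sum((c / total) * math.log2(c / total) for _, c in ranked)


# Hypothetical sample text, used only to exercise the functions.
sample = "the cat and the dog and the bird"
ranked = rank_words(sample)
entropy = shannon_entropy(ranked)
print(ranked[:2], round(entropy, 3))
```

A real tool would also need format-specific readers for HTML, EPUB, and ODT before this step; the sketch assumes the text has already been extracted.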
  • 5
    Zettel

    Zettel allows taking notes from several references and organizing them

    Zettel is a program for taking notes from bibliographic references. Instead of marking the text on paper and then going crazy looking for where the copy ended up, the notes are saved in a database, linked to the reference they were copied from. Notes can be tagged and retrieved in several ways. ...
    Downloads: 0 This Week
    See Project
  • 6
    GoodByeCatpcha

    Free ReCAPTCHA v2 solver

    An async Python library that automates solving ReCAPTCHA v2 image and audio challenges using Mozilla's DeepSpeech, PocketSphinx, and the speech-to-text APIs of Microsoft Azure, Google Speech, and Amazon Transcribe. It also uses image recognition to detect the object suggested in the captcha. Built with Pyppeteer (a Chrome automation framework similar to Puppeteer), PyDub for converting MP3 files to WAV, aiohttp for a minimal async web server, and Python’s built-in asyncio.
    Downloads: 0 This Week
    See Project
  • 7
    AngelReader

    An E-book, Audio-book, & Library Loader in One Application

    AngelReader: A minimalist but powerful GUI application that has the capacity to load [1] E-books in plain text format with the least use of both software and hardware resources. It can also load [2] Audio-books with the basic functions of play, stop, pause, and resume with the same minimalist economy that doesn't hog computer resources. When used in integration with the AngelReader Library Selector, it can function as a mini library management system for books in electronic formats. It's in...
    Downloads: 0 This Week
    See Project
  • 8
    Manifest Maker

    NO LONGER MAINTAINED

    NO LONGER MAINTAINED, NO LONGER SUPPORTED. Manifest Maker is a graphical Python application that takes a file or group of files and creates a plain-text manifest listing each item. The manifest includes the file name (including directory structure) as well as a checksum of the file.
    Downloads: 0 This Week
    See Project
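
A manifest of the kind the Manifest Maker entry describes (relative file path plus a checksum, one item per line) might be produced as in the sketch below. It is not the project's code: the listing does not say which checksum algorithm or line layout Manifest Maker uses, so SHA-256 and the `checksum  path` layout are assumptions, as are the names `file_checksum` and `build_manifest`.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256"):
    """Hash a file in chunks so large files are not read into memory."""
    # Assumption: SHA-256; the original tool's algorithm is not stated.
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Return 'checksum  relative/path' lines for every file under root."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            lines.append(f"{file_checksum(full)}  {rel}")
    return lines


# Demonstration on a throwaway directory containing one small file.
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "hello.txt"), "wb") as f:
        f.write(b"hello")
    manifest = build_manifest(demo_root)

print("\n".join(manifest))
```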
  • 9

    isbntools

    A command line tool to extract, transform and get metadata for ISBNs

    As of 2015-06-02, this project is no longer under active development.
    Downloads: 0 This Week
    See Project
  • 10
    Newspaper3k

    News, full-text, and article metadata extraction in Python 3

    Inspired by requests for its simplicity and powered by lxml for its speed, Newspaper is a Python 3 library for extracting and curating articles, delivering Instapaper-style article extraction. It works in 10+ languages, including English, Chinese, German, and Arabic. If you are certain that an entire news source is in one language, go ahead and use the same API. On Python 3 you must install newspaper3k, not newspaper; newspaper is the Python 2 library. ...
    Downloads: 0 This Week
    See Project
  • 11

    Language Constructor

    Complete tool for constructing/manipulating languages in digital form

    With this tool you can easily design a new language, digitize an existing one, or incrementally reconstruct an ancient language. It allows free experimentation with all aspects of the language, so the language does not have to be made consistent on paper first. You can edit script, syntax, grammar, morphology, lexicon and phonology, as well as write documents in the language, as it might be too complex to be handled by current font technology. The information is stored in XML format for easy...
    Downloads: 0 This Week
    See Project
  • 12
    MarcXimiL is a flexible multi-platform bibliographic similarity analysis framework. Features: deduplication, information monitoring, visual analysis, plagiarism detection. Supported: MARCXML, OAI-PMH2 harvesting, and importation of text MARC.
    Downloads: 0 This Week
    See Project
  • 13
    note taking simplified
    *nts* provides a simple format for using text files to store notes, a command line interface for viewing notes in a variety of convenient ways and a cross-platform, wx(python)-based GUI for creating and modifying notes as well as viewing them.
    Downloads: 0 This Week
    See Project
  • 14
    Python module for reading and writing MARC records in both transport (z39.2) and plain-text mnemonic formats. Also includes simple command-line tools for translation between these formats.
    Downloads: 1 This Week
    See Project
  • 15
    GutenPy is a comfortable text reader and catalog browser for Project Gutenberg. It features handy bookmarking, word definition lookups, and a powerful catalog browser that uses regular expression filtering.
    Downloads: 0 This Week
    See Project
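
Regular expression filtering of a catalog, as mentioned in the GutenPy entry, could look roughly like the sketch below. It is an assumption-laden illustration rather than GutenPy's implementation: `filter_catalog` is a hypothetical helper, the catalog is a hand-written list of well-known titles, and case-insensitive matching is a design choice made here.

```python
import re


def filter_catalog(titles, pattern):
    """Return the titles that match a case-insensitive regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [title for title in titles if regex.search(title)]


# Hypothetical in-memory catalog; a real browser would load the full index.
catalog = [
    "Moby Dick; Or, The Whale",
    "Pride and Prejudice",
    "The Adventures of Sherlock Holmes",
    "Adventures of Huckleberry Finn",
]

# Titles that start with "Adventures of", with or without a leading "The".
matches = filter_catalog(catalog, r"^(the )?adventures of")
print(matches)
```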
  • 16

    open-tamil

    Tamil Tools, Tamil Library for Python 2, 3

    Open-Tamil is a full-featured Tamil text processing library in Python. It works in both Python 2 and Python 3 and is published on the Python Package Index, so it can be installed with pip. See: https://pypi.python.org/pypi/Open-Tamil/0.67
    Downloads: 0 This Week
    See Project