Showing 18 open source projects for "python text"

View related business solutions
  • Our Free Plans just got better! | Auth0 by Okta Icon
    Our Free Plans just got better! | Auth0 by Okta

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your secuirty. Auth0 now, thank yourself later.
    Try free now
  • Never Get Blocked Again | Enterprise Web Scraping Icon
    Never Get Blocked Again | Enterprise Web Scraping

    Enterprise-Grade Proxies • Built-in IP Rotation • 195 Countries • 20K+ Companies Trust Us

    Get unrestricted access to public web data with our ethically-sourced proxy network. Automated session management and advanced unblocking handle the hard parts. Scale from 1 to 1M requests with zero blocks. Built for developers with ready-to-use APIs, serverless functions, and complete documentation. Used by 20,000+ companies including Fortune 500s. SOC2 and GDPR compliant.
    Get Started
  • 1

    Tokenized Text Aligner

    Aligns tokens in two versions of a text with differing tokenization.

    This tool performs token-by-token alignment of two versions of a text with differing tokenization by interpreting the results of a file diff (https://docs.python.org/3/library/difflib.html). It is intended for use in the preparation of annotated linguistic corpora, where differences in tokenization may arise (i) following corrections or modifications to the source text or (ii) through the creation of different layers of annotation (part-of-speech, treebank) requiring different tokenization...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    WordCount

    WordCount

    Count frequency of single, 2-word and 3-word clusters in a text

    The program can read a text file and count the occurrences of single words and clusters of 2 and 3 words. The resulting list will be sorted in descending order (highest frequency on top).
    Downloads: 3 This Week
    Last Update:
    See Project
  • 3

    MITRE Annotation Toolkit

    A toolkit for managing and manipulating text annotations

    The MITRE Annotation Toolkit (MAT) is a suite of tools which can be used for automated and human tagging of annotations. Annotation is a process, used mostly by researchers in natural language processing, of enhancing documents with information about the various phrase types the documents contain. MAT supports both UI interaction and command-line interaction, and provides various levels of control over the overall annotation process. It can be customized for specific tasks (e.g.,...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4

    Safe Harbor Deidentification

    Safe Harbor Deidentification for medical documents

    Phalanx - Deidentify Safe Harbor Deidentification Mode of Phalanx is an abridged pipeline of NLP annotators culminating in NER annotators which write output of text offsets. It uses the Safe Harbor deidentification method.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Payroll Services for Small Businesses | QuickBooks Icon
    Payroll Services for Small Businesses | QuickBooks

    Save up to 50% on QuickBooks Online! Keep the Accounting and Book Keeping for your Small Business up to date!

    Easily pay your team and access powerful tools, employee benefits, and supportive experts with the #1 online payroll service provider. Manage payroll and access HR and employee services in one place. Pay your team automatically once your payroll setup is complete. We'll calculate, file, and pay your payroll taxes automatically.
    Learn More
  • 5

    Arabic Corpus

    Text categorization, arabic language processing, language modeling

    The Arabic Corpus {compiled by Dr. Mourad Abbas ( http://sites.google.com/site/mouradabbas9/corpora ) The corpus Khaleej-2004 contains 5690 documents. It is divided to 4 topics (categories). The corpus Watan-2004 contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora would mention the two main references: (1) For Watan-2004 corpus ---------------------- M. Abbas, K. Smaili, D. Berkani, (2011) Evaluation of Topic Identification Methods on...
    Leader badge
    Downloads: 5 This Week
    Last Update:
    See Project
  • 6

    Presage

    the intelligent predictive text entry platform

    Presage (formerly Soothsayer) is an intelligent predictive text entry system. Presage generates predictions by modelling natural language as a combination of redundant information sources. Presage computes probabilities for words which are most likely to be entered next by merging predictions generated by the different predictive algorithms. Presage's modular and extensible architecture allows its language model to be extended and customized to utilize statistical, syntactic, and semantic...
    Leader badge
    Downloads: 36 This Week
    Last Update:
    See Project
  • 7

    dadosSemiotica

    Collecter and manager of semiotica annalisis data

    This program is a web application to collect and organize data of text analysis. It works with sets of texts and the analysis are done on portions of the length of a sentence. One of the preprocessing modules is based on CoGroo (A LibreOffice & OpenOffice.org Portuguese Grammar Checker).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8

    TEES

    Turku Event Extraction System

    Turku Event Extraction System (TEES) is a free and open source natural language processing system developed for the extraction of events and relations from biomedical text. It is written mostly in Python, and should work in generic Unix/Linux environments. Currently, the TEES source code repository still remains on GitHub at http://jbjorne.github.com/TEES/ where there is also a wiki with more information.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9

    BioC

    We describe a simple XML format to share text documents and annotation

    A minimalist approach to share text documents and data annotations. Allows a large number of different annotations to be represented. Project files contain: - simple code to hold/read/write data and perform sample processing. - BioC-formatted corpora - BioC tools that work with BioC corpora BioC goals - simplicity - interoperability - broad use - reuse There should be little investment required to learn to use a format or a software module to process that format. We...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Manage printing in a cost-efficient and eco-friendly way with Gelato. Icon
    Manage printing in a cost-efficient and eco-friendly way with Gelato.

    Gelato offers an extensive catalog of custom products, a zero-inventory business model, and free designing tools—all in one place.

    The world's largest print on demand network with 140+ production partners across 32 countries. Gelato offers end-to-end design, production and logistics for individuals looking to start their own business today!
    Sign up for Free
  • 10
    Part-of-speech tagging is the task of assigning symbols from a particular set to words in a natural language text. ACOPOST implements and extends well-known machine learning techniques and provides a uniform environment for testing.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11

    mwetoolkit

    THIS PROJECT MIGRATED TO https://gitlab.com/mwetoolkit/mwetoolkit3/

    ... to virtually any text collection, language, and MWE type. It is a command-line tool written mostly in Python. Its development started in 2010 as a PhD thesis but the project keeps active (see the SVN logs). Up-to-date documentation and details about the tool can be found on the mwetoolkit website: http://mwetoolkit.sourceforge.net/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Mishkal: Arabic Text Vocalization

    Mishkal: Arabic Text Vocalization

    Arabic Text Vocalization system

    Automatic system of vocalization of arabic text.
    Leader badge
    Downloads: 108 This Week
    Last Update:
    See Project
  • 13

    Language Constructor

    Complete tool for constructing/manipulating languages in digital form

    With this tool you can easily design a new language, digitize an existing one or incrementally reconstruct an ancient language. It allows for free experimentation of all aspects of the language, so it does not have to be made consistent on paper first. You can edit script, syntax, grammar, morphology, lexicon and phonology, as well as write documents in the language, as it might be too complex to be handled by current font technology. The information is stored in xml format for easy...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14

    t2t-pipe

    automatic alignment pipeline for parallel treebanks

    The *Tree-to-Tree (t2t) Alignment Pipe* is a collection of python scripts, co-ordinating the process of automatic alignment of parallel treebanks from plain text files with a single call from a unix command line. Supported Languages: DE, FR, EN
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Redundancy due to cut-paste operations in text creates bias in machine learning for NLP. This module takes a directory and produces a subset of the files in that directory (in a list) with an upper bound on similarity between two files.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Little Cohesion Helper (LCH), alias TraglWeck, semi-automates the annotation of lexical-cohesion in a given text. Input is a raw text file and this software generates a bunch of XML files which can be used with MMAX2.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    This tool translates the text of the selected language to the language of your choice in the status bar and add the translated text in a note. Esta ferramenta traduz o texto do idioma selecionado para o idioma de sua escolha na barra de status..
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Pylero
    Pylero is an open-source Python-based text generator.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next