Search Results for "python text parser"

Showing 22 open source projects for "python text parser"

1

Tokenized Text Aligner

Aligns tokens in two versions of a text with differing tokenization.

This tool performs token-by-token alignment of two versions of a text with differing tokenization by interpreting the results of a file diff (https://docs.python.org/3/library/difflib.html). It is intended for use in the preparation of annotated linguistic corpora, where differences in tokenization may arise (i) following corrections or modifications to the source text or (ii) through the creation of different layers of annotation (part-of-speech, treebank) requiring different tokenization....

Downloads: 0 This Week

Last Update: 2024-07-31
See Project
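
The diff-based alignment idea described for Tokenized Text Aligner can be illustrated with Python's `difflib`. This is a minimal sketch, not the project's own code: the function name `align_tokens` and the choice to group every non-matching stretch into a single many-to-many pair are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def align_tokens(tokens_a, tokens_b):
    """Align two tokenizations of the same text (hypothetical helper).

    Returns a list of (indices_in_a, indices_in_b) pairs. Identical tokens
    are aligned one-to-one; each differing stretch is kept as one group.
    """
    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    alignment = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Matching runs have equal length, so pair tokens one by one.
            for offset in range(i2 - i1):
                alignment.append(([i1 + offset], [j1 + offset]))
        else:
            # replace / insert / delete: keep the whole stretch as one group.
            alignment.append((list(range(i1, i2)), list(range(j1, j2))))
    return alignment


if __name__ == "__main__":
    version_a = ["I", "can't", "go", "."]
    version_b = ["I", "ca", "n't", "go", "."]
    for a_idx, b_idx in align_tokens(version_a, version_b):
        print([version_a[i] for i in a_idx], "<->", [version_b[j] for j in b_idx])
```

Here the token `can't` in the first version lines up with the pair `ca`, `n't` in the second, while the unchanged tokens are aligned one-to-one.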
2

WordCount

Count frequency of single, 2-word and 3-word clusters in a text

The program can read a text file and count the occurrences of single words and clusters of 2 and 3 words. The resulting list will be sorted in descending order (highest frequency on top).

Downloads: 1 This Week

Last Update: 2025-02-01
See Project
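
Counting single words and 2- and 3-word clusters, as WordCount is described to do, can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `count_clusters`, the lowercasing, and the tokenizing regular expression are not taken from the project.

```python
import re
from collections import Counter


def count_clusters(text, n):
    """Count n-word clusters in text, highest frequency first (hypothetical helper)."""
    # Assumption: words are runs of letters, digits, underscores or apostrophes.
    words = re.findall(r"[\w']+", text.lower())
    # Slide a window of n consecutive words over the text.
    clusters = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(cluster) for cluster in clusters).most_common()


if __name__ == "__main__":
    sample = "the cat sat on the mat the cat ran"
    for size in (1, 2, 3):
        print(size, count_clusters(sample, size)[:3])
```

`Counter.most_common()` already returns the entries in descending order of frequency, which matches the sorted output the description mentions.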
3

Color to Word

Turn colors into words

The program will turn a color into a list of 10 words, obtained according to a custom-designed algorithm based on letter shape and position in the alphabet.
- Click inside the frame on the left to pick a color through the color chooser window
- The program will match the color with the colors corresponding to a list of all the English words contained in the file wordcolor.txt
- The first 10 matches will appear in the frame on the right
- Right-click - Copy to copy the word...

Downloads: 1 This Week

Last Update: 2024-09-27
See Project
4

MITRE Annotation Toolkit

A toolkit for managing and manipulating text annotations

The MITRE Annotation Toolkit (MAT) is a suite of tools which can be used for automated and human tagging of annotations. Annotation is a process, used mostly by researchers in natural language processing, of enhancing documents with information about the various phrase types the documents contain. MAT supports both UI interaction and command-line interaction, and provides various levels of control over the overall annotation process. It can be customized for specific tasks (e.g.,...

Downloads: 0 This Week

Last Update: 2023-04-19
See Project
5

yabasta

Yet Another BAsic Scraper and Text Analysis

YA BASTA! is a Python/R application for lyrics web scraping and text analysis. Web scraping is implemented in Python; text analysis is implemented in R and run as Python subprocesses. YA BASTA! has only been tested on Windows. To run YA BASTA!, type at the Windows command prompt: python.exe yabasta.py

Downloads: 0 This Week

Last Update: 2020-11-27
See Project
6

Safe Harbor Deidentification

Safe Harbor Deidentification for medical documents

Phalanx - Deidentify. The Safe Harbor Deidentification Mode of Phalanx is an abridged pipeline of NLP annotators culminating in NER annotators that write text offsets as output. It uses the Safe Harbor deidentification method.

Downloads: 1 This Week

Last Update: 2019-09-10
See Project
7

Arabic Corpus

Text categorization, Arabic language processing, language modeling

The Arabic Corpus (compiled by Dr. Mourad Abbas, http://sites.google.com/site/mouradabbas9/corpora). The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora should cite the two main references: (1) For the Watan-2004 corpus: M. Abbas, K. Smaili, D. Berkani, (2011) Evaluation of Topic Identification Methods on...

Downloads: 9 This Week

Last Update: 2019-03-05
See Project
8

Presage

the intelligent predictive text entry platform

Presage (formerly Soothsayer) is an intelligent predictive text entry system. Presage generates predictions by modelling natural language as a combination of redundant information sources. Presage computes probabilities for words which are most likely to be entered next by merging predictions generated by the different predictive algorithms. Presage's modular and extensible architecture allows its language model to be extended and customized to utilize statistical, syntactic, and semantic...

3 Reviews

Downloads: 301 This Week

Last Update: 2018-10-11
See Project
9

dadosSemiotica

Collector and manager of semiotic analysis data

This program is a web application for collecting and organizing text analysis data. It works with sets of texts, and the analyses are done on sentence-length portions. One of the preprocessing modules is based on CoGroo (a LibreOffice & OpenOffice.org Portuguese grammar checker).

Downloads: 0 This Week

Last Update: 2018-11-01
See Project
10

TEES

Turku Event Extraction System

Turku Event Extraction System (TEES) is a free and open source natural language processing system developed for the extraction of events and relations from biomedical text. It is written mostly in Python, and should work in generic Unix/Linux environments. Currently, the TEES source code repository still remains on GitHub at http://jbjorne.github.com/TEES/ where there is also a wiki with more information.

Downloads: 0 This Week

Last Update: 2017-05-23
See Project
11

BioC

We describe a simple XML format to share text documents and annotation

A minimalist approach to share text documents and data annotations. Allows a large number of different annotations to be represented.
Project files contain:
- simple code to hold/read/write data and perform sample processing
- BioC-formatted corpora
- BioC tools that work with BioC corpora
BioC goals:
- simplicity
- interoperability
- broad use
- reuse
There should be little investment required to learn to use a format or a software module to process that format. We are...

Downloads: 4 This Week

Last Update: 2016-08-08
See Project
12

ACOPOST - a collection of POS taggers

Part-of-speech tagging is the task of assigning symbols from a particular set to words in a natural language text. ACOPOST implements and extends well-known machine learning techniques and provides a uniform environment for testing.

1 Review

Downloads: 0 This Week

Last Update: 2016-02-26
See Project
13

mwetoolkit

THIS PROJECT MIGRATED TO https://gitlab.com/mwetoolkit/mwetoolkit3/

The Multiword Expressions toolkit aids in the automatic identification and extraction of multiword units in running text. These include idioms (kick the bucket), noun compounds (cable car), phrasal verbs (take off, give up), etc. Even though it focuses on multiword expressions, the framework is quite complete and can also be useful in any corpus-based study in computational linguistics. The mwetoolkit can be applied to virtually any text collection, language, and MWE type. ...

1 Review

Downloads: 1 This Week

Last Update: 2019-05-01
See Project
14

AsiEs

AsiEs stands for Asistente de Escritura (writing assistant). It provides word prediction and autocomplete for fast writing. Designed for people who have difficulty typing on a keyboard, it improves writing speed by saving the user up to 50% of the keystrokes needed to write, and it avoids orthographic errors. Made by Fundación Teletón Uruguay (http://www.teleton.org.uy/home/)

Downloads: 0 This Week

Last Update: 2015-06-17
See Project
15

Mishkal: Arabic Text Vocalization

Arabic Text Vocalization system

Automatic vocalization system for Arabic text.

5 Reviews

Downloads: 136 This Week

Last Update: 2017-10-29
See Project
16

RusMorph

Russian morphology tagger. Parses text(s) and outputs an XML representation of the text(s) with grammatical annotation.

Downloads: 0 This Week

Last Update: 2016-10-04
See Project
17

t2t-pipe

Automatic alignment pipeline for parallel treebanks

The *Tree-to-Tree (t2t) Alignment Pipe* is a collection of Python scripts that coordinate the automatic alignment of parallel treebanks from plain text files with a single call from a Unix command line. Supported languages: DE, FR, EN

Downloads: 0 This Week

Last Update: 2014-01-07
See Project
18

Language Constructor

Complete tool for constructing/manipulating languages in digital form

With this tool you can easily design a new language, digitize an existing one or incrementally reconstruct an ancient language. It allows for free experimentation with all aspects of the language, so it does not have to be made consistent on paper first. You can edit script, syntax, grammar, morphology, lexicon and phonology, as well as write documents in the language, as it might be too complex to be handled by current font technology. The information is stored in XML format for easy...

Downloads: 0 This Week

Last Update: 2013-12-19
See Project
19

Corpus redundancy manager

Redundancy due to cut-paste operations in text creates bias in machine learning for NLP. This module takes a directory and produces a subset of the files in that directory (in a list) with an upper bound on similarity between two files.

Downloads: 0 This Week

Last Update: 2014-06-30
See Project
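
The idea behind Corpus redundancy manager, keeping a subset of files in which no two files exceed a similarity bound, can be sketched as a greedy filter. Everything specific here is an assumption for illustration and not the module's actual method: the names `shingles`, `jaccard`, and `select_subset`, the use of Jaccard similarity over 3-word shingles, and passing texts in a dictionary instead of reading a directory.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a text (hypothetical helper)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def select_subset(docs, max_similarity=0.5, k=3):
    """Greedily keep documents whose similarity to every kept one is within the bound.

    docs maps a file name to its text; names are visited in sorted order.
    """
    kept = []  # list of (name, shingle set) for the documents kept so far
    for name in sorted(docs):
        candidate = shingles(docs[name], k)
        if all(jaccard(candidate, other) <= max_similarity for _, other in kept):
            kept.append((name, candidate))
    return [name for name, _ in kept]


if __name__ == "__main__":
    corpus = {
        "a.txt": "the patient was admitted with chest pain and fever",
        "b.txt": "the patient was admitted with chest pain and fever",
        "c.txt": "follow up visit scheduled in two weeks for review",
    }
    print(select_subset(corpus))
```

A greedy pass like this depends on the visiting order, so a different ordering of the files can keep a different (equally valid) subset.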
20

Little Cohesion Helper

Little Cohesion Helper (LCH), alias TraglWeck, semi-automates the annotation of lexical cohesion in a given text. The input is a raw text file, from which the software generates a set of XML files that can be used with MMAX2.

Downloads: 0 This Week

Last Update: 2013-04-11
See Project
21

TradutorOOoNote

This tool translates text from the selected language into the language of your choice in the status bar and adds the translated text in a note.

Downloads: 0 This Week

Last Update: 2013-04-09
See Project
22

Pylero

Pylero is an open-source Python-based text generator.

Downloads: 0 This Week

Last Update: 2014-12-26
See Project



© 2025 Slashdot Media. All Rights Reserved.
