batch text processing free download

pyVideoTrans

Translate a video from one language to another and embed dubbing

...The tool supports both command-line and GUI modes, making it accessible to developers and creatives needing batch or automated processing.

Downloads: 10 This Week

Last Update: 2026-02-17


multinotes

...Furthermore, dynamic interactive documents can be useful for presenting complicated interdependencies to the reader more clearly, far beyond conventional paper publication. The multiNotes text architecture and processing pipeline is based on d2d and standard technologies (XSLT, ECMAScript, LilyPond, PostScript, etc.) and addresses these issues. An overview of the software architecture and its operation is given in: Journal of the Text Encoding Initiative, Open Issue 18/2024: "Using d2d for Writing XML --- The multiNotes Text Architecture for Musical Analysis" https://doi.org/10.4000/132ex

Downloads: 0 This Week

Last Update: 2025-02-04


TXM

Unicode XML TEI text analysis platform

TXM is a free and open-source, cross-platform, Unicode- and XML-based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard-compliant web portal (GWT based) with access control built in. Download the latest version of TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP...

Downloads: 13 This Week

Last Update: 2024-12-09


modnlp-plugins

External plugins for modnlp/teccli

This is a general project for modnlp/teccli plugins, with a focus on text visualization.

Downloads: 0 This Week

Last Update: 2023-05-06


MITRE Annotation Toolkit

A toolkit for managing and manipulating text annotations

The MITRE Annotation Toolkit (MAT) is a suite of tools which can be used for automated and human tagging of annotations. Annotation is a process, used mostly by researchers in natural language processing, of enhancing documents with information about the various phrase types the documents contain. MAT supports both UI interaction and command-line interaction, and provides various levels of control over the overall annotation process. It can be customized for specific tasks (e.g.,...

Downloads: 0 This Week

Last Update: 2023-04-19


Live Transcribe Speech Engine

Live Transcribe is an Android application

Live Transcribe Speech Engine provides on-device speech recognition components that power real-time transcription for accessibility and everyday voice-first experiences. Its design prioritizes latency and robustness in noisy, far-field environments, enabling continuous transcription with low delay on mobile hardware. The engine manages audio front-end processing—such as noise suppression and voice activity detection—before feeding audio into compact, accurate acoustic and language models....

Downloads: 0 This Week

Last Update: 2025-10-10


Leseratte

Leseratte is a Java parser for German written language. Currently, it contains a German lexicon (based on the Wiktionary), inflexion rules, a grammar and a parser. (Semantics component planned.) Usable as a Java library, also provides a graphical UI.

Downloads: 0 This Week

Last Update: 2020-10-03


KSUCCA Corpus

A 50-million-token corpus of Classical Arabic.

King Saud University Corpus of Classical Arabic (KSUCCA) is a pioneering 50-million-token annotated corpus of Classical Arabic texts from the pre-Islamic era until the fourth Hijri century (equivalent to the period from the seventh until the early eleventh century CE), which is the period of pure Classical Arabic. The main aim of this corpus is to support the study of the distributional lexical semantics of the words of the Quran. However, it can be used for other research purposes, such...

Downloads: 13 This Week

Last Update: 2020-02-19


TIES

A smart search engine for medical documents

TIES (Text Information Extraction System) is a clinical text search engine that uses Natural Language Processing techniques to extract medical concepts from free-text clinical reports. It provides secure, de-identified access to this information and has built-in collaboration tools and honest broker functionality. It is licensed for academic use under the BSD license.

1 Review

Downloads: 0 This Week

Last Update: 2019-09-09


Safe Harbor Deidentification

Safe Harbor Deidentification for medical documents

Phalanx - Deidentify. The Safe Harbor Deidentification Mode of Phalanx is an abridged pipeline of NLP annotators culminating in NER annotators that write text offsets as output. It uses the Safe Harbor deidentification method.

Downloads: 0 This Week

Last Update: 2019-09-10


Arabic Corpus

Text categorization, Arabic language processing, language modeling

The Arabic Corpus (compiled by Dr. Mourad Abbas, http://sites.google.com/site/mouradabbas9/corpora ). The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora should cite the two main references: (1) For the Watan-2004 corpus: M. Abbas, K. Smaili, D. Berkani, (2011) Evaluation of Topic Identification Methods on...

Downloads: 7 This Week

Last Update: 2019-03-05


Ghawwas_V4

An open source system for Arabic corpora processing

Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions: a. Frequency list for single word and N-Grams b. Concordance c. Collocation (MI, CHI Squared, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient) d. Lexical patterns search e. Two corpora frequency profile comparison based on MI, CHI, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient f. Accept Windows and UTF-8 character...

1 Review

Downloads: 11 This Week

Last Update: 2018-12-09


IceNLP

IceNLP is an open source Natural Language Processing (NLP) toolkit for analyzing and processing Icelandic text. The toolkit is implemented in Java.

1 Review

Downloads: 0 This Week

Last Update: 2018-04-13


TEES

Turku Event Extraction System

Turku Event Extraction System (TEES) is a free and open source natural language processing system developed for the extraction of events and relations from biomedical text. It is written mostly in Python, and should work in generic Unix/Linux environments. Currently, the TEES source code repository remains on GitHub at http://jbjorne.github.com/TEES/ where there is also a wiki with more information.

Downloads: 0 This Week

Last Update: 2017-05-23


Welsh Natural Language Toolkit

The project supports the Welsh Language Technology domain with a set of NLP tools that drive innovation and advance the development of sophisticated textual analysis solutions. The WNLT project delivers four core NLP modules: a) Word Segmentation for separating text into words, b) Sentence Boundary Disambiguation for finding sentence boundaries, c) Part of Speech Tagger for determining the part of speech of each word, d) Morphological Analyser for identifying the root form (lemma) of words....

Downloads: 0 This Week

Last Update: 2017-05-26


Discriminative Language Editor

Discriminative language editor based on ontologies

Text editor in Java that is able to detect discriminative expressions while the user is typing. When the internal ontology-based analyzer detects a potential discriminative expression the user is advised by underscoring the related words in the text. A descriptive message about the issue is also shown to the user when the cursor is placed over the potential discriminative expression.

Downloads: 0 This Week

Last Update: 2016-10-30


BioC

We describe a simple XML format to share text documents and annotations

A minimalist approach to share text documents and data annotations. Allows a large number of different annotations to be represented.

Project files contain:
- simple code to hold/read/write data and perform sample processing
- BioC-formatted corpora
- BioC tools that work with BioC corpora

BioC goals:
- simplicity
- interoperability
- broad use
- reuse

There should be little investment required to learn to use a format or a software module to process that format. ...

Downloads: 1 This Week

Last Update: 2016-08-08


Welsh Natural Language Toolkit

WNLT is a suite of open source natural language modules for the Welsh

The project supports the Welsh Language Technology domain with a set of NLP tools that drive innovation and advance the development of sophisticated textual analysis solutions. The WNLT project delivers four core NLP modules: a) Word Segmentation for separating text into words, b) Sentence Boundary Disambiguation for finding sentence boundaries, c) Part of Speech Tagger for determining the part of speech of each word, d) Morphological Analyser for identifying the root form (lemma) of words....

Downloads: 0 This Week

Last Update: 2016-11-29


bnf2xml

simple BNF parser makes xml markup of matches

bnf2xml is a simple BNF parser that takes text as input, searches according to a BNF query file, and outputs text marked up with XML labels that show context. bnf2xml is as simple to use as any text-processing binary, e.g. awk(1) or grep(1). bnf2xml does not require a C API because it outputs simple XML labeling. The README is visible on the file download page. EXAMPLE: $ echo "hi" | bnf2xml patternfile <word><alph>h</alph><alph>i</alph></word> or <gas>hydrogen iodide</gas> patternfile says how to find...

Downloads: 0 This Week

Last Update: 2016-04-08


Morfologik

ATTENTION! Morfologik is now at GitHub: https://github.com/morfologik/

1 Review

Downloads: 0 This Week

Last Update: 2015-09-10


Virastyar

Virastyar is a spell checker for low-resource languages

Virastyar is a free and open-source (FOSS) spell checker. It stands upon the shoulders of many free/libre/open-source (FLOSS) libraries developed for processing low-resource languages, especially Persian and RTL languages. Publications: Kashefi, O., Nasri, M., & Kanani, K. (2010). Towards Automatic Persian Spell Checking. SCICT. Kashefi, O., Sharifi, M., & Minaie, B. (2013). A novel string distance metric for ranking Persian respelling suggestions. Natural Language Engineering,...

14 Reviews

Downloads: 402 This Week

Last Update: 2020-03-05


JInsect

The JINSECT toolkit is a Java-based toolkit and library that supports and demonstrates the use of n-gram graphs within Natural Language Processing applications, ranging from summarization and summary evaluation to text classification and indexing.

3 Reviews

Downloads: 0 This Week

Last Update: 2015-08-25


ArabicDiacritizer

An automatic restoration of Arabic diacritic marks

This is software for restoring Arabic diacritical marks. It is based mainly on deep architectures using deep neural networks. The algorithm generates diacritized text with determined end case. The algorithm is described in detail in: Ilyes Rebai and Yassine BenAyed, 'Text-to-speech synthesis system with Arabic diacritic recognition system', Computer Speech & Language, 2015. We would appreciate it very much if you cite our related work. Installation...

Downloads: 0 This Week

Last Update: 2014-12-16


Lingala NLP

This project is devoted to the development of natural language processing tools and resources for the Lingala language, which is spoken by tens of millions of people in central Africa.

Downloads: 0 This Week

Last Update: 2014-11-13


Khawas

An Arabic Corpora Processing Tool

The new version is available at https://sourceforge.net/projects/ghawwasv4/

Downloads: 3 This Week

Last Update: 2014-08-02


Search Results for "batch text processing"

Showing 41 open source projects for "batch text processing"

pyVideoTrans

multinotes

TXM

modnlp-plugins

MITRE Annotation Toolkit

Live Transcribe Speech Engine

Leseratte

KSUCCA Corpus

TIES

Safe Harbor Deidentification

Arabic Corpus

Ghawwas_V4

IceNLP

TEES

Welsh Natural Language Toolkit

Discriminative Language Editor

BioC

Welsh Natural Language Toolkit

bnf2xml

Morfologik

Virastyar

JInsect

ArabicDiacritizer

Lingala NLP

Khawas

