Search Results for "text processing" - Page 2

Showing 144 open source projects for "text processing"

1

Leseratte

Leseratte is a Java parser for written German. It currently contains a German lexicon (based on Wiktionary), inflection rules, a grammar, and a parser. (A semantics component is planned.) It is usable as a Java library and also provides a graphical UI.

Downloads: 0 This Week

Last Update: 2020-10-03
See Project
2

KSUCCA Corpus

A 50-million-token corpus of Classical Arabic.

King Saud University Corpus of Classical Arabic (KSUCCA) is a pioneering 50-million-token annotated corpus of Classical Arabic texts from the pre-Islamic era until the fourth Hijri century (equivalent to the period from the seventh until the early eleventh century CE), which is the period of pure Classical Arabic. The main aim of this corpus is to support the study of the distributional lexical semantics of the words of the Quran. However, it can be used for other research purposes, such...

Downloads: 3 This Week

Last Update: 2020-02-19
See Project
3

AhoTTS Multilingual, a Multilingual TTS

Text-to-Speech TTS for Basque, Spanish, Catalan, Galician and English

Text-to-Speech converter for Basque, Spanish, Catalan, Galician and English. It includes linguistic processing and built voices for all of these languages. Its acoustic engine is based on hts_engine and it uses a high-quality vocoder called AhoCoder. Developed by Aholab Signal Processing Laboratory: https://aholab.ehu.es/aholab/ http://aholab.ehu.es/ahocoder/

1 Review

Downloads: 0 This Week

Last Update: 2019-11-29
See Project
4

TIES

A smart search engine for medical documents

TIES (Text Information Extraction System) is a clinical text search engine that uses Natural Language Processing techniques to extract medical concepts from free-text clinical reports. It provides secure, de-identified access to this information and has built-in collaboration tools and honest broker functionality. It is licensed for academic use under the BSD license.

1 Review

Downloads: 0 This Week

Last Update: 2019-09-09
See Project
5

Safe Harbor Deidentification

Safe Harbor Deidentification for medical documents

Phalanx - Deidentify. The Safe Harbor Deidentification mode of Phalanx is an abridged pipeline of NLP annotators culminating in NER annotators that write out text offsets. It uses the Safe Harbor deidentification method.

Downloads: 0 This Week

Last Update: 2019-09-10
See Project
6

SmGen

Verilog Finite State Machine (FSM) Code Generator

SmGen is a finite state machine (FSM) generator for Verilog. It is not, however, an FSM entry tool. The input is behavioral Verilog with clock boundaries explicitly set by the designer. SmGen unrolls this behavioral code and generates an FSM from it in synthesizable Verilog. Because the designer provides the clock boundaries, there is good control over the expected timing.

Downloads: 0 This Week

Last Update: 2019-04-25
See Project
7

Arabic Corpus

Text categorization, arabic language processing, language modeling

The Arabic Corpus, compiled by Dr. Mourad Abbas ( http://sites.google.com/site/mouradabbas9/corpora ). The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora should cite the two main references: (1) For the Watan-2004 corpus: M. Abbas, K. Smaili, D. Berkani, (2011) Evaluation of Topic Identification Methods on...

Downloads: 9 This Week

Last Update: 2019-03-05
See Project
8

Ghawwas_V4

An open source system for Arabic corpora processing

Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions: a. Frequency list for single word and N-Grams b. Concordance c. Collocation (MI, CHI Squared, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient) d. Lexical patterns search e. Two corpora frequency profile comparison based on MI, CHI, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient f. Accept Windows and UTF-8 character...

1 Review

Downloads: 4 This Week

Last Update: 2018-12-09
See Project
9

Mavscript

Calculations in a text document

Mavscript allows the user to do calculations in a text document. Plain text, LaTeX and OpenOffice Writer files (.odt) are supported. The calculation is done by the algebra system Yacas (default), Jasymca or by the Java interpreter BeanShell.

Downloads: 0 This Week

Last Update: 2018-11-05
See Project
10

lottie vectors

Create, display and process 2D vectors in a 3D window.

Lottie Vectors is an application for Matlab that allows you to do some pretty neat things with vectors, more exactly, displaying them in ways that hopefully will allow you to explore and better understand your vector data. The basic idea is simple: take a vector defined in one of a few different types of data formats and map it on the screen. Add another vector and you start to form a 'route'. Each route or position vector can be accompanied by a 'force' vector. This can be used to...

Downloads: 2 This Week

Last Update: 2018-11-12
See Project
11

IceNLP

IceNLP is an open source Natural Language Processing (NLP) toolkit for analyzing and processing Icelandic text. The toolkit is implemented in Java.

1 Review

Downloads: 0 This Week

Last Update: 2018-04-13
See Project
12

High-Throughput Tabular Data Processor

...Citation: Madanecki P, Bałut M, Buckley PG, Ochocka JR, Bartoszewski R, Crossman DK, et al. (2018) High-Throughput Tabular Data Processor – Platform independent graphical tool for processing large data sets. PLoS ONE 13(2): e0192858. https://doi.org/10.1371/journal.pone.0192858

Downloads: 0 This Week

Last Update: 2018-04-24
See Project
13

CapsimTMK

Capsim(r) C Text Mode Kernel (TMK), DSP and communication blocks, topologies, libraries and tools for the development of high-performance block diagram digital signal processing and communications systems, built-in interpreter for scripting. SystemC support.

Downloads: 1 This Week

Last Update: 2018-02-16
See Project
14

PSIworx

Data processing for the PSI fluorometer

PSIworxR (R) and PSIworx (MATLAB) are a collection of functions and scripts to analyze data from the PSI SuperHead Fast Fluorometer series (www.psi.cz). These fluorometers are used in limnology and oceanography research communities. The program retrieves parameters from single turnover induction and relaxation. Results are returned as text files and PDF figures.

Downloads: 0 This Week

Last Update: 2017-08-23
See Project
15

TEES

Turku Event Extraction System

Turku Event Extraction System (TEES) is a free and open source natural language processing system developed for the extraction of events and relations from biomedical text. It is written mostly in Python, and should work in generic Unix/Linux environments. Currently, the TEES source code repository still remains on GitHub at http://jbjorne.github.com/TEES/ where there is also a wiki with more information.

Downloads: 1 This Week

Last Update: 2017-05-23
See Project
16

Welsh Natural Language Toolkit

The project supports the Welsh Language Technology domain with a set of NLP tools that drive innovation and advance the development of sophisticated textual analysis solutions. The WNLT project delivers four core NLP modules; a) Word Segmentation for separating text into words b) Sentence Boundary Disambiguation for finding sentence boundaries c) Part of Speech Tagger for determining the part of speech of each word d) Morphological Analyser for identifying the root form (lemma) of words....

Downloads: 0 This Week

Last Update: 2017-05-26
See Project
17

Discriminative Language Editor

Discriminative language editor based on ontologies

Text editor in Java that is able to detect discriminative expressions while the user is typing. When the internal ontology-based analyzer detects a potential discriminative expression the user is advised by underscoring the related words in the text. A descriptive message about the issue is also shown to the user when the cursor is placed over the potential discriminative expression.

Downloads: 0 This Week

Last Update: 2016-10-30
See Project
18

BioC

We describe a simple XML format to share text documents and annotation

A minimalist approach to share text documents and data annotations. Allows a large number of different annotations to be represented. Project files contain: - simple code to hold/read/write data and perform sample processing. - BioC-formatted corpora - BioC tools that work with BioC corpora BioC goals - simplicity - interoperability - broad use - reuse There should be little investment required to learn to use a format or a software module to process that format. ...

Downloads: 16 This Week

Last Update: 2016-08-08
See Project
19

Visualization of Protein-Ligand Graphs

Compute protein graphs. Moved to https://github.com/MolBIFFM/PTGLtools

NOTE: Project moved to https://github.com/MolBIFFM/PTGLtools. The Visualization of Protein-Ligand Graphs (VPLG) software package computes and visualizes protein graphs. It works on the super-secondary structure level and uses the atom coordinates from PDB files and the SSE assignments of the DSSP algorithm. VPLG is command line software. If you do not like typing commands, try our PTGL web server: http://ptgl.uni-frankfurt.de/

Downloads: 0 This Week

Last Update: 2024-03-20
See Project
20

Welsh Natural Language Toolkit

WNLT is a suite of open source natural language modules for the Welsh

The project supports the Welsh Language Technology domain with a set of NLP tools that drive innovation and advance the development of sophisticated textual analysis solutions. The WNLT project delivers four core NLP modules; a) Word Segmentation for separating text into words b) Sentence Boundary Disambiguation for finding sentence boundaries c) Part of Speech Tagger for determining the part of speech of each word d) Morphological Analyser for identifying the root form (lemma) of words....

Downloads: 0 This Week

Last Update: 2016-11-29
See Project
21

PLP

Powerful pre-processor

Powerful Verilog Preprocessor. PLP stands for Perl Pre-processor. Perl is used as a "control language" that is embedded in the Verilog code (or any other code) to generate code on the fly. It is commonly used as a Verilog pre-processor but can be used with any target/output language (C, C++, Java, VHDL, plain text, etc.).

Downloads: 0 This Week

Last Update: 2016-03-10
See Project
22

bnf2xml

simple BNF parser makes xml markup of matches

bnf2xml is a simple BNF parser that takes text as input, searches according to a BNF query file, and outputs text marked up with XML labels that show context. bnf2xml is as simple to use as any text utility such as awk(1) or grep(1). bnf2xml does not require a C API because it outputs simple XML labeling. The README is visible on the file download page. EXAMPLE: $ echo "hi" | bnf2xml patternfile <word><alph>h</alph><alph>i</alph></word> or <gas>hydrogen iodide</gas> patternfile says how to find...

Downloads: 0 This Week

Last Update: 2016-04-08
See Project
23

Morfologik

ATTENTION! Morfologik is now at GitHub: https://github.com/morfologik/

1 Review

Downloads: 0 This Week

Last Update: 2015-09-10
See Project
24

Virastyar

Virastyar is a spell checker for low-resource languages

Virastyar is a free and open-source (FOSS) spell checker. It stands upon the shoulders of many free/libre/open-source (FLOSS) libraries developed for processing low-resource languages, especially Persian and RTL languages. Publications: Kashefi, O., Nasri, M., & Kanani, K. (2010). Towards Automatic Persian Spell Checking. SCICT. Kashefi, O., Sharifi, M., & Minaie, B. (2013). A novel string distance metric for ranking Persian respelling suggestions. Natural Language Engineering,...

14 Reviews

Downloads: 56 This Week

Last Update: 2020-03-05
See Project
25

Modular Audio Recognition Framework

MARF is a general cross-platform framework with a collection of algorithms for audio (voice, speech, and sound) and natural language text analysis and recognition along with sample applications (identification, NLP, etc.) of its use, implemented in Java.

3 Reviews

Downloads: 0 This Week

Last Update: 2015-10-06
See Project

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Opt Out Advertise