Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Scientific/Engineering
Linguistics Software
Search Results

Search Results for "text based"

x

Sort By:

Relevance

Clear All Filters

OS

Windows 26
Linux 24
Mac 23
More...
BSD 13
ChromeOS 13
Desktop Operating Systems 1
Mobile Operating Systems 1

Category

Scientific/Engineering 27
Artificial Intelligence 11
Education 5
Text Editors 4
Business 2
Multimedia 2
Software Development 2
Database 1
Religion and Philosophy 1
Social sciences 1
System 1

License

OSI-Approved Open Source 19
Creative Commons Attribution License 3
Other License 2
Public Domain 1

Translations

English 9
German 3
French 2
Chinese (Simplified) 1
More...
Indonesian 1
Japanese 1
Portuguese 1
Russian 1
Spanish 1

Programming Language

Java 27
C 1
Flex 1
Groovy 1
JavaScript 1
More...
Perl 1
S/R 1
XSL (XSLT/XPath/XSL-FO) 1

Status

Alpha 10
Production/Stable 8
Beta 5
Pre-Alpha 2

Showing 27 open source projects for "text based"

View related business solutions

Linguistics Java Clear Filters & Widen Search

Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now
Go From AI Idea to AI App Fast
One platform to build, fine-tune, and deploy ML models. No MLOps team required.

Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.

Try Free
1

multinotes

Text architecture for music theory.

...Furthermore, dynamic interactive documents can be useful for presenting complicated interdependencies to the reader more clearly, far beyond conventional paper publication. The mulitNotes text architecture and processing pipeline is based on d2d and standard technologies (XSLT, ECMAScript. LilyPond, PostScript, etc.) and addresses these issues. An overview about the software architecture and its operation is given in: Journal of the Text Encoding Initiative, Open Issue 18/2024: "Using d2d for Writing XML --- The multiNotes Text Architecture for Musical Analysis" https://doi.org/10.4000/132ex

Downloads: 0 This Week

Last Update: 2025-02-04
See Project
2

LaBB-CAT

A linguistic annotation store

LABB-CAT is a browser-based linguistics research tool that stores recordings and regular-expression searchable text transcripts of interviews. The search results, entire transcripts, and media, can be viewed or exported in a variety of format

Downloads: 1 This Week

Last Update: 1 day ago
See Project
3

TXM

Unicode XML TEI text analysis platform

TXM is a free and open-source cross-platform Unicode & XML based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. DOWNLOAD LATEST VERSION OF TXM : http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerfull CQP full text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, cooccurrency analysis, etc.) based on R packages (http://www.r-project.org). ...

Downloads: 16 This Week

Last Update: 2024-12-09
See Project
4

korpus

Corpus Linguistics Software

Some software for Corpus Linguistics, which includes Corpus Text Editor, Web-based search, etc. This project created for Belarusian Corpus, but can be used for other languages with some adaption.

Downloads: 0 This Week

Last Update: 2021-02-02
See Project
Custom VMs From 1 to 96 vCPUs With 99.95% Uptime
General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.

Try Free
5

Leseratte

Leseratte is a Java parser for German written language. Currently, it contains a German lexicon (based on the Wiktionary), inflexion rules, a grammar and a parser. (Semantics component planned.) Usable as a Java library, also provides a graphical UI.

Downloads: 0 This Week

Last Update: 2020-10-03
See Project
6

SimpleLemmatizer

This program is for text lemmatization

It lemmatizes texts based on supplied model. The base model is for slovak texts and is created from Slovak National Corpus, copyright by Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences

Downloads: 0 This Week

Last Update: 2020-03-22
See Project
7

TIES

A smart search engine for medical documents

TIES (Text Information Extraction System) is a clinical text search engine that uses Natural Language Processing techniques to extract medical concepts from free text clinical reports. It provides secure de-identified access to this information and has in built collaboration tools and honest broker functionality. It is licensed for academic use under the BSD license. For commercial use please contact Nexi at http://nexihub.com *** NOTICE: this software and forum are no longer...

1 Review

Downloads: 0 This Week

Last Update: 2019-09-09
See Project
8

Ghawwas_V4

An open source system for Arabic corpora processing

Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions: a. Frequency list for single word and N-Grams b. Concordance c. Collocation (MI, CHI Squared, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient) d. Lexical patterns search e. Two corpora frequency profile comparison based on MI, CHI, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient f. Accept Windows and UTF-8 character...

1 Review

Downloads: 4 This Week

Last Update: 2018-12-09
See Project
9

Corpus Toolkit

A text management tool for linguistic purposes...

Downloads: 0 This Week

Last Update: 2017-11-23
See Project
Compliant and Reliable File Transfers Backed by Top Security Certifications
Cerberus FTP Server delivers SOC 2 Type II certified security and FIPS 140-2 validated encryption.

Stop relying on non-certified, legacy file transfer tools that creak under the weight of modern security demands. Get full audit trails, advanced access controls and more supported by an award-winning team of experts. Start your free 25-day trial today.

Start Free Trial
10

HermeneutiX

Your graphical tool for Syntactic/Semantic Structure Analysis of texts

HermeneutiX is a tool for diagramming syntactic and semantic structures of complex (not necessarily foreign-language) texts (e.g. bible or other historical excerpts). HermeneutiX is now part of SciToS (the scientific tool set). Starting with version 2.0.0, HermeneutiX can be found on GitHub. Please check out the release summary: https://github.com/scientific-tool-set/scitos/releases For an introduction, check out this video: https://youtu.be/uQjewyG0Ad8 PS: To run a Java...

Downloads: 0 This Week

Last Update: 2017-09-28
See Project
11

Musaheb

An Arabic collocation extraction tool

“Musaheb”, an Arabic collocation extraction tool that has been designed and implemented to overcome the limitations of existing collocation extraction tools. “Musaheb” is able to extract n-gram collocations up to 5-gram, in addition to extracting the collocates of the nodes (the word-types we are looking for its collocates) within a window size of zero to 15 words. Moreover, it provides eight collocation statistics to calculate the strength of the collocation, and permits the input of...

Downloads: 0 This Week

Last Update: 2017-08-22
See Project
12

Discriminative Language Editor

Discriminative language editor based on ontologies

Text editor in Java that is able to detect discriminative expressions while the user is typing. When the internal ontology-based analyzer detects a potential discriminative expression the user is advised by underscoring the related words in the text. A descriptive message about the issue is also shown to the user when the cursor is placed over the potential discriminative expression.

Downloads: 0 This Week

Last Update: 2016-10-30
See Project
13

sgmweka

Weka wrapper for the SGM toolkit for text classification and modeling.

Weka wrapper for the SGM toolkit for text classification and modeling. Provides Sparse Generative Models for scalable and accurate text classification and modeling for use in high-speed and large-scale text mining. Has lower time complexity of classification than comparable software due to inference based on sparse model representation and use of an inverted index. The provided .zip file is in the Weka package format, giving access to text classification. ...

Downloads: 17 This Week

Last Update: 2016-06-23
See Project
14

JInsect

The JINSECT toolkit is a Java-based toolkit and library that supports and demonstrates the use of n-gram graphs within Natural Language Processing applications, ranging from summarization and summary evaluation to text classiﬁcation and indexing.

3 Reviews

Downloads: 0 This Week

Last Update: 2015-08-25
See Project
15

Stemmer Gujarati

Offline stemmer for Gujarati , which is one of 22 Indian languages.

...There has been lot of significant work in the development and evaluation of stemmer for non-Indian languages, but very less or no significant work has been done on Indian front especially for Gujarati language.The code of this stemmer is based on algorithm designed under guidance of Prof. Nikita Desai, India. It takes input file of type .txt containing Gujarati text encoded as UTF-8 and then removes stop words which are unessential. After processing rest of the words, it outputs corresponding file containing all stem words plus other details.

Downloads: 0 This Week

Last Update: 2015-04-05
See Project
16

FALCON - Text Search Java Project

JSON based text search Java Project

----------------- - What is it? - ----------------- The "Falcon Search" is a JAVA API and tool to search inside the documents. It was originally started to search the content in pdf files under the project "HAWK Search". Searching with this tool is query-based not word-based as in most of the document search tools OR document readers. It also takes care of jumbling of words within query and spelling mistakes. Commonly used techniques in this project are Natural Language...

Downloads: 0 This Week

Last Update: 2014-04-18
See Project
17

Texalyzer

Text analyzer

Analyzes text document using TF-IDF and optionally stopword list, and extracts important keywords.

Downloads: 0 This Week

Last Update: 2017-04-04
See Project
18

Transformation-Based Learning in Java

Java application for training and deploying text processing applications such as part-of-speech taggers, based on a re-implementation of Brill's algorithm in Java.

Downloads: 0 This Week

Last Update: 2014-04-23
See Project
19

gannu

Java API and tools for performing NLP and other AI tasks

Java API and tools for performing a wide range of AI tasks such as: word sense disambiguation (released), optimization (5 Evolutionary Algorithms Implemented ETA February 2014), opinion mining (ETA November 2014) and text wikification (ETA July 2014). Gannu includes some graphical interfaces for scientific purposes. When using Gannu please cite: *Jiménez, F. V., Gelbukh, A. F. & Sidorov, G. (2013). Simple Window Selection Strategies for the Simplified Lesk Algorithm for Word Sense...

Downloads: 0 This Week

Last Update: 2013-12-16
See Project
20

TML - Text Mining Library for LSA & CMM

TML is a Java Library for LSA and extracting Concept Maps from text

TML has moved to http://www.villalon.cl/tml.html and the code to https://github.com/villalon/tml

3 Reviews

Downloads: 0 This Week

Last Update: 2013-08-05
See Project
21

EyeMap - Eye Movement Data Analyzer

EyeMap is a visualization and analysis tool for text reading eye movement data. It can process Unicode, proportion/non-proportion and spaced/unspaced reading materials, which supports various languages and experiment methods.

1 Review

Downloads: 0 This Week

Last Update: 2013-08-10
See Project
22

FullFiller

Data Base Benchmarking tool

A very simple tool to automate benchmarking tests on MySQL DBs. It fills MySQL tables columns; perform customized tests; and outputs the results on CSV format. It uses Xeger, a java package for generating random text from regular expressions (http://code.google.com/p/xeger/). Xeger uses dk.brics.automaton java package developed by Anders Møller (http://cs.au.dk/~amoeller/automaton/index.html).

Downloads: 0 This Week

Last Update: 2012-06-17
See Project
23

BioEvent

This is a Java-based project for complex event extraction from text and co-reference resolution. Currently the code can read BioNLP shared task format (http://2011.bionlp-st.org/) and i2b2 Natural Language Processing for Clinical Data shared task format (https://www.i2b2.org/NLP/DataSets/Main.php). Event extraction includes finding events and the parameters for an event in a text.

Downloads: 0 This Week

Last Update: 2013-04-25
See Project
24

oopinyinguide

OO Pinyin Guide is a Java extension for OpenOffice 3 or higher. It enables the user to add pinyin transliteration over Chinese characters inside a text document. This tool can be useful for people learning or teaching Chinese.

3 Reviews

Downloads: 9 This Week

Last Update: 2013-04-29
See Project
25

TextSegFault

Linear text segmentation algorithm based on a sliding window technique.

Downloads: 0 This Week

Last Update: 2013-04-11
See Project

Previous
You're on page 1
2
Next

Related Searches

ghawwas

image gallery

tmx

corpus linguistics

search text

niv bible file

arabic collocations

editor ontology

sgm

text summarization

Related Categories

Scientific/Engineering

Artificial Intelligence

Education

Text Editors

Business

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Opt Out Advertise