Best Open Source Java Linguistics Software

Browse free open source Java Linguistics Software and projects below.

1

WordNetSQL

WordNet database in various SQL formats

2 Reviews

Downloads: 19 This Week

Last Update: 2014-02-16
See Project
2

Wordcorr

Data management for comparative linguistics

Wordcorr automates the tedious and risky process of tabulating and managing the sound correspondences used in working out the historical development of natural languages. Initial support was from NSF.

4 Reviews

Downloads: 10 This Week

Last Update: 2013-01-05
See Project
3

oopinyinguide

OO Pinyin Guide is a Java extension for OpenOffice 3 or higher. It enables the user to add pinyin transliteration over Chinese characters inside a text document. This tool can be useful for people learning or teaching Chinese.

3 Reviews

Downloads: 9 This Week

Last Update: 2013-04-29
See Project
4

Ghawwas_V4

An open source system for Arabic corpora processing

Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions:

a. Frequency lists for single words and N-grams
b. Concordance
c. Collocation (MI, Chi-squared, LL, T-Score, Z-Score, Dice, Log Dice, Weirdness Coefficient)
d. Lexical pattern search
e. Frequency profile comparison of two corpora based on MI, Chi-squared, LL, T-Score, Z-Score, Dice, Log Dice, and Weirdness Coefficient
f. Accepts Windows and UTF-8 character encodings
g. Accepts TXT, DOC, DOCX, RTF, and HTML formats
h. Exports the processing results in CSV format
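As a rough illustration of what an N-gram frequency list and a few of the collocation scores named above involve, here is a minimal Java sketch. It is not taken from Ghawwas; the class name `CollocationSketch` and its methods are hypothetical, and the scores use the common textbook definitions (MI as log2 of observed over expected co-occurrence, Dice as 2·f(x,y) / (f(x) + f(y)), Log Dice as 14 + log2 of Dice), which may differ from the tool's own variants.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not Ghawwas code, just the textbook formulas.
public class CollocationSketch {

    /** Counts contiguous n-grams (tokens joined by one space) in a token list. */
    static Map<String, Integer> ngramCounts(List<String> tokens, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String gram = String.join(" ", tokens.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    /** Mutual information: log2 of observed pair frequency over the expected one. */
    static double mutualInformation(long pair, long f1, long f2, long total) {
        return Math.log((double) pair * total / ((double) f1 * f2)) / Math.log(2);
    }

    /** Dice coefficient: 2 * f(x,y) / (f(x) + f(y)). */
    static double dice(long pair, long f1, long f2) {
        return 2.0 * pair / (f1 + f2);
    }

    /** Log Dice: 14 + log2(Dice). */
    static double logDice(long pair, long f1, long f2) {
        return 14 + Math.log(dice(pair, f1, f2)) / Math.log(2);
    }

    public static void main(String[] args) {
        List<String> tokens =
                Arrays.asList("the", "red", "car", "and", "the", "red", "bus");
        Map<String, Integer> unigrams = ngramCounts(tokens, 1);
        Map<String, Integer> bigrams = ngramCounts(tokens, 2);

        long pair = bigrams.get("the red");
        long f1 = unigrams.get("the");
        long f2 = unigrams.get("red");

        System.out.println("MI       = " + mutualInformation(pair, f1, f2, tokens.size()));
        System.out.println("Dice     = " + dice(pair, f1, f2));
        System.out.println("Log Dice = " + logDice(pair, f1, f2));
    }
}
```

A real corpus tool would also need tokenization suited to Arabic script and a configurable co-occurrence window; the sketch only counts adjacent words.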

1 Review

Downloads: 8 This Week

Last Update: 2018-12-09
See Project
5

LaBB-CAT

A linguistic annotation store

LaBB-CAT is a browser-based linguistics research tool that stores recordings and regular-expression-searchable text transcripts of interviews. The search results, entire transcripts, and media can be viewed or exported in a variety of formats.

Downloads: 12 This Week

Last Update: 2026-04-28
See Project
6

LINNAEUS

Entity recognition and normalization software for biomedical text

Downloads: 10 This Week

Last Update: 2016-05-05
See Project
7

BioC

We describe a simple XML format to share text documents and annotations

A minimalist approach to sharing text documents and data annotations. Allows a large number of different annotations to be represented.

Project files contain:
- simple code to hold/read/write data and perform sample processing
- BioC-formatted corpora
- BioC tools that work with BioC corpora

BioC goals:
- simplicity
- interoperability
- broad use
- reuse

There should be little investment required to learn to use a format or a software module to process that format. We are interested in reuse, and we focus on common NLP tasks that are broadly useful for text mining.

Downloads: 8 This Week

Last Update: 2016-08-08
See Project
8

sgmweka

Weka wrapper for the SGM toolkit for text classification and modeling.

Provides Sparse Generative Models for scalable and accurate text classification and modeling for use in high-speed and large-scale text mining. Has lower classification time complexity than comparable software, due to inference based on a sparse model representation and the use of an inverted index. The provided .zip file is in the Weka package format, giving access to text classification. Other functions are usable either through Java command-line commands or by including the classes in Java projects.

Downloads: 7 This Week

Last Update: 2016-06-23
See Project
9

srt-translator

Subtitle translator from one natural language to another.

Translates subtitles in SubRip format from one natural language to another. It is based on Google Translate without the API and therefore without payment. The translator has automatic and manual spell checkers.
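For readers unfamiliar with SubRip, a file is a series of cues separated by blank lines, each made up of a sequence number, a timing line such as `00:00:01,000 --> 00:00:03,000`, and one or more lines of text. A translator only needs to touch the text lines. The Java sketch below shows one way to split a file into cues; it is not code from srt-translator, and the class name `SrtSketch` and its `Cue` type are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not srt-translator code.
public class SrtSketch {

    /** One subtitle cue: sequence number, timing line, and the text to translate. */
    static final class Cue {
        final int index;
        final String timing;
        final String text;

        Cue(int index, String timing, String text) {
            this.index = index;
            this.timing = timing;
            this.text = text;
        }
    }

    /** Splits SubRip content into cues; blocks with fewer than three lines are skipped. */
    static List<Cue> parse(String srt) {
        List<Cue> cues = new ArrayList<>();
        // Normalize line endings, then split on one or more blank lines.
        String normalized = srt.replace("\r\n", "\n").trim();
        for (String block : normalized.split("\n\\s*\n")) {
            // Line 1: number, line 2: timing, remainder: text (may span lines).
            String[] lines = block.split("\n", 3);
            if (lines.length < 3) {
                continue;
            }
            // A non-numeric first line would throw NumberFormatException here.
            int index = Integer.parseInt(lines[0].trim());
            cues.add(new Cue(index, lines[1].trim(), lines[2]));
        }
        return cues;
    }

    public static void main(String[] args) {
        String srt = "1\n00:00:01,000 --> 00:00:03,000\nHello there.\n\n"
                   + "2\n00:00:04,000 --> 00:00:06,500\nHow are\nyou?\n";
        for (Cue cue : parse(srt)) {
            System.out.println(cue.index + " | " + cue.timing + " | "
                    + cue.text.replace('\n', ' '));
        }
    }
}
```

After translation, the cues can be written back with the original numbers and timing lines unchanged, so the subtitles stay in sync with the video.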

Downloads: 7 This Week

Last Update: 2016-07-19
See Project
10

GramLab

The Gramlab project aims to provide companies with free, open source software tools that can be used by developers who are not specialists in language processing. Note: the GLabCorpus Manager tool requires a SolR server to be installed. To download it and for more information, please go to the Files section.

Downloads: 5 This Week

Last Update: 2016-03-10
See Project
11

Helsinki Finite-State Technology

The Helsinki Finite-State Transducer toolkit is intended for processing natural language morphologies. The toolkit is demonstrated by wide-coverage implementations of a number of languages of varying morphological complexity.

Downloads: 4 This Week

Last Update: 2017-09-14
See Project
12

Korean Analyzer Rhino

Parsing Korean words by morpheme and part-of-speech

RHINO parses Korean words by morpheme and part-of-speech. Its dictionaries are based on the Korean Modern Tagged Corpus (on the scale of 12 million phrases), which was produced by the Korean government, so it can analyze many cases of stems and endings. The newly developed Dynamic Dictionary Technology lets words react to their context; in other words, it is a programmed database. For more information, see the files in the help folder.

Downloads: 4 This Week

Last Update: 2020-10-11
See Project
13

IceNLP

IceNLP is an open source Natural Language Processing (NLP) toolkit for analyzing and processing Icelandic text. The toolkit is implemented in Java.

1 Review

Downloads: 2 This Week

Last Update: 2018-04-13
See Project
14

HermeneutiX

Your graphical tool for Syntactic/Semantic Structure Analysis of texts

HermeneutiX is a tool for diagramming syntactic and semantic structures of complex (not necessarily foreign-language) texts (e.g. the Bible or other historical excerpts). HermeneutiX is now part of SciToS (the scientific tool set). Starting with version 2.0.0, HermeneutiX can be found on GitHub. Please check out the release summary: https://github.com/scientific-tool-set/scitos/releases For an introduction, check out this video: https://youtu.be/uQjewyG0Ad8 PS: To run a Java application such as HermeneutiX (i.e. SciToS) you need a Java Runtime Environment (JRE). HermeneutiX is currently built to be compatible down to JRE version 6. You may download the current JRE here: http://www.java.com/en/download

Downloads: 3 This Week

Last Update: 2017-09-28
See Project
15

ISO GrAF

Experimental Java library for reading and writing GrAF/XML files.

The Graph Annotation Framework (GrAF) models linguistic annotations using a data model based on graph theory and algorithms. The GrAF standard is a work product of ISO TC37SC4 Working Group 1. This Java library is NOT part of the GrAF standard, and standoff annotation files produced by the library may not be GrAF compliant.

Downloads: 2 This Week

Last Update: 2015-03-07
See Project
16

TXM

Unicode XML TEI text analysis platform

TXM is a free and open-source cross-platform Unicode & XML based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. Download the latest version of TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP full text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, co-occurrence analysis, etc.) based on R packages (http://www.r-project.org). Read the scientific background at the Textométrie project web site http://textometrie.ens-lyon.fr/?lang=en. Read a full description at the TEI Tools wiki http://wiki.tei-c.org/index.php/TXM.

Downloads: 2 This Week

Last Update: 2024-12-09
See Project
17

EyeMap - Eye Movement Data Analyzer

EyeMap is a visualization and analysis tool for eye movement data from text reading. It can process Unicode, proportional/non-proportional, and spaced/unspaced reading materials, so it supports various languages and experiment methods.

1 Review

Downloads: 1 This Week

Last Update: 2013-08-10
See Project
18

Bermuda Text-to-Speech

This project includes basic NLP and DSP techniques for Text-to-Speech

See the TTS demo at: http://rslp.racai.ro/index.php?page=tts This project is written entirely in Java and includes a set of tools and methods designed to enable multilingual Text-to-Speech (TTS) synthesis. We currently support English and Romanian, but we will soon train more models and make them available for download. If you want to read more about our other NLP and TTS tools, check out http://nlptools.racai.ro.

Downloads: 1 This Week

Last Update: 2014-03-24
See Project
19

Communication Supporting System

Downloads: 1 This Week

Last Update: 2015-03-26
See Project
20

Khawas

An Arabic Corpora Processing Tool

The new version is available at https://sourceforge.net/projects/ghawwasv4/

Downloads: 1 This Week

Last Update: 2014-08-02
See Project
21

NetBeans Dictionaries

Additional dictionary files for the NetBeans spellchecker.

Downloads: 1 This Week

Last Update: 2013-03-16
See Project
22

Pacx

Platform for Annotated Corpora in XML. An integrated tool for corpus linguists, built on Eclipse, Vex, Subversive, etc., for creating and editing transcriptions and annotations, querying, managing version-controlled data, and building a shippable corpus.

Downloads: 1 This Week

Last Update: 2014-03-15
See Project
23

jWords

jWords is a Java port of WORDS, a free Latin-to-English dictionary program written in Ada by William Whitaker. In addition, the dictionary will be translated into German.

Downloads: 1 This Week

Last Update: 2012-09-14
See Project
24

ARARSS

Downloads: 0 This Week

Last Update: 2019-01-01
See Project
25

Annoschemer

Annoschemer is a small tool for easy editing of MMAX2 annotation schemes.

Downloads: 0 This Week

Last Update: 2014-07-15
See Project