Java Linguistics Software

Browse free open source Java Linguistics Software and projects below.

  • 1
    WordNet Database in various SQL formats
    Downloads: 35 This Week
  • 2

    Wordcorr

    Data management for comparative linguistics

    Wordcorr automates the tedious and risky process of tabulating and managing the sound correspondences used in working out the historical development of natural languages. Initial support was from NSF.
    Downloads: 14 This Week
  • 3

    sgmweka

    Weka wrapper for the SGM toolkit for text classification and modeling.

    Provides Sparse Generative Models for scalable and accurate text classification and modeling in high-speed, large-scale text mining. Classification has lower time complexity than in comparable software because inference uses a sparse model representation and an inverted index. The provided .zip file is in the Weka package format and gives access to text classification. Other functions are available through Java command-line commands or by including the classes in Java projects.
    Downloads: 16 This Week
  • 4

    srt-translator

    Subtitle translator from one natural language to another.

    Translates subtitles in SubRip format from one natural language to another. It is based on Google Translate without the API, and therefore without payment. The translator has automatic and manual spell checkers.
    Downloads: 14 This Week
  • 5
    oopinyinguide
    OO Pinyin Guide is a Java extension for OpenOffice 3 or higher. It enables the user to add pinyin transliteration over Chinese characters inside a text document. This tool can be useful for people learning or teaching Chinese.
    Downloads: 8 This Week
  • 6

    TXM

    Unicode XML TEI text analysis platform

    TXM is a free and open-source, cross-platform text analysis environment and graphical client based on Unicode and XML, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard-compliant web portal (GWT-based) with built-in access control. TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP full-text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, co-occurrence analysis, etc.) based on R packages (http://www.r-project.org).

    Download the latest version of TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en
    Scientific background (Textométrie project web site): http://textometrie.ens-lyon.fr/?lang=en
    Full description (TEI Tools wiki): http://wiki.tei-c.org/index.php/TXM
    Downloads: 12 This Week
  • 7

    Ghawwas_V4

    An open source system for Arabic corpora processing

    Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions:
    a. Frequency lists for single words and N-grams
    b. Concordance
    c. Collocation (MI, Chi-squared, LL, T-Score, Z-Score, Dice, Log Dice, Weirdness Coefficient)
    d. Lexical pattern search
    e. Frequency profile comparison of two corpora based on MI, Chi-squared, LL, T-Score, Z-Score, Dice, Log Dice, Weirdness Coefficient
    f. Accepts Windows and UTF-8 character encodings
    g. Accepts TXT, DOC, DOCX, RTF and HTML formats
    h. Exports the processing results in CSV file format
    Downloads: 2 This Week
  • 8
    ATTENTION! Morfologik is now at GitHub: https://github.com/morfologik/
    Downloads: 2 This Week
  • 9

    BioC

    A simple XML format for sharing text documents and annotations

    A minimalist approach to sharing text documents and data annotations that allows a large number of different annotations to be represented.

    Project files contain:
    - simple code to hold/read/write data and perform sample processing
    - BioC-formatted corpora
    - BioC tools that work with BioC corpora

    BioC goals:
    - simplicity
    - interoperability
    - broad use
    - reuse

    There should be little investment required to learn to use a format or a software module to process that format. We are interested in reuse, and we focus on common NLP tasks that are broadly useful for text mining.
    Downloads: 3 This Week
  • 10

    BioContext

    Software for extraction of biomedical information from literature

    Downloads: 3 This Week
  • 11
    Entity recognition and normalization software for biomedical text
    Downloads: 3 This Week
  • 12

    LaBB-CAT

    A linguistic annotation store

    LaBB-CAT is a browser-based linguistics research tool that stores recordings of interviews together with text transcripts that can be searched using regular expressions. The search results, entire transcripts, and media can be viewed or exported in a variety of formats.
    Downloads: 3 This Week
  • 13
    The Gramlab project aims to provide companies with free, open-source software tools that can be used by developers who are not specialists in natural language processing. Note: the GLabCorpus Manager tool requires a SolR server to be installed. To download it and for more information, please go to the Files section.
    Downloads: 2 This Week
  • 14

    Korean Analyzer Rhino

    Parsing Korean words by morpheme and part-of-speech

    RHINO parses Korean words by morpheme and part of speech. Its dictionaries are based on the Korean Modern Tagged Corpus (on the scale of 12 million phrases), which was created by the Korean government, so it can analyze many cases of stems and endings. The newly developed Dynamic Dictionary Technology lets words react to their context; that is, it acts as a programmed database. For more information, see the files in the help folder.
    Downloads: 2 This Week
  • 15

    iLastic

    Query, integrate and manipulate data using natural languages.

    iLastic is an open-source framework for querying, integrating and manipulating any type of data using plain English. Extract, transform and merge information from the web, databases, files or any other data repository using a language you already know: English.
    Downloads: 2 This Week
  • 16
    NOTE: For latest version, please visit https://github.com/ispasic/FlexiTerm. FlexiTerm is an open-source software tool for automatic term recognition. FlexiTerm uses a range of methods to neutralise the main sources of term variation. FlexiTerm is robust enough for less formally structured texts, such as those found in patient blogs or medical notes.
    Downloads: 1 This Week
  • 17

    NetBeans Dictionaries

    Additional dictionary files for the NetBeans spellchecker.
    Downloads: 1 This Week
  • 18
    Downloads: 0 This Week
  • 19
    Annoschemer is a small tool for easy editing of MMAX2 annotation schemes.
    Downloads: 0 This Week
  • 20

    AraRooter

    Find Arabic Root Word

    Using machine learning, AraRooter finds the three-letter root of any Arabic lemma with around 84% accuracy.
    Downloads: 0 This Week
  • 21
    Arabic Morphology & Syntax coding
    This project aimed to create a framework and a binary data format for an etymological Arabic system. It will no longer be hosted on SourceForge because the terms of use designate me as an enemy, so I am prohibited from using SourceForge services.
    Downloads: 0 This Week
  • 22

    Autshumato MTWS

    Autshumato Machine Translation Web Service

    Web service providing access to the Autshumato Machine Translation (MT) and other Moses Statistical MT systems. Functionality includes:
    - Automatic sentence, document, and web page translation.
    - Improvements for translations.
    - Reviewer requests and an interface to review improvements.
    - Connection to the latest version of the Autshumato ITE: post-edits made to inserted automatic translations are automatically submitted to the MTWS.
    - Administration interface to add users, reviewers and MT systems.
    - Exposed API for all of the services.
    - Ability to log into the system using your Google or Facebook ID.
    - All requests are logged by IP.

    Licensed under the GNU GPL v3 (or later): http://www.gnu.org/licenses/gpl-3.0.txt
    Downloads: 0 This Week
  • 23
    A utility application for updating and integrating, over a network, translation memories created by the Autshumato ITE. Licensed under the TMate Open Source License and free for anyone to download and use.
    Downloads: 0 This Week
  • 24
    BANAL - Banal And Not A Language. A prototyping notation compatible with Java and C# (via the largest possible common footprint between the two).
    Downloads: 0 This Week
  • 25
    BANNER is a named entity recognition system intended primarily for biomedical text. It uses conditional random fields as the primary recognition engine and includes a wide survey of the best techniques described in recent literature.
    Downloads: 0 This Week