Java Linguistics Software

View 2709 business solutions

Browse free open source Java Linguistics Software and projects below. Use the toggles on the left to filter open source Java Linguistics Software by OS, license, language, programming language, and project status.

  • Get the most trusted enterprise browser Icon
    Get the most trusted enterprise browser

    Advanced built-in security helps IT prevent breaches before they happen

    Defend against security incidents with Chrome Enterprise. Create customizable controls, manage extensions and set proactive alerts to keep your data and employees protected without slowing down productivity.
    Download Chrome
  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    Build gen AI apps with an all-in-one modern database: MongoDB Atlas

    MongoDB Atlas provides built-in vector search and a flexible document model so developers can build, scale, and run gen AI apps without stitching together multiple databases. From LLM integration to semantic search, Atlas simplifies your AI architecture—and it’s free to get started.
    Start Free
  • 1
    WordNet Database in various SQL format
    Leader badge
    Downloads: 39 This Week
    Last Update:
    See Project
  • 2

    Wordcorr

    Data management for comparative linguistics

    Wordcorr automates the tedious and risky process of tabulating and managing the sound correspondences used in working out the historical development of natural languages. Initial support was from NSF.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 3
    Live Transcribe Speech Engine

    Live Transcribe Speech Engine

    Live Transcribe is an Android application

    Live Transcribe Speech Engine provides on-device speech recognition components that power real-time transcription for accessibility and everyday voice-first experiences. Its design prioritizes latency and robustness in noisy, far-field environments, enabling continuous transcription with low delay on mobile hardware. The engine manages audio front-end processing—such as noise suppression and voice activity detection—before feeding audio into compact, accurate acoustic and language models. Partial hypotheses stream as words are recognized, then stabilize with minimal jitter as confidence increases, which is crucial for usability. The code emphasizes efficient use of CPU and neural accelerators to balance battery life with responsiveness. Deployed in accessibility contexts, it aims for dependable behavior across accents, environments, and intermittent connectivity, with graceful degradation when resources are constrained.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4

    sgmweka

    Weka wrapper for the SGM toolkit for text classification and modeling.

    Weka wrapper for the SGM toolkit for text classification and modeling. Provides Sparse Generative Models for scalable and accurate text classification and modeling for use in high-speed and large-scale text mining. Has lower time complexity of classification than comparable software due to inference based on sparse model representation and use of an inverted index. The provided .zip file is in the Weka package format, giving access to text classification. Other functions are usable through either Java command-line commands or class inclusion into Java projects.
    Leader badge
    Downloads: 28 This Week
    Last Update:
    See Project
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 5
    srt-translator

    srt-translator

    Subtitle translator from one natural language to other.

    Translating subtitles in format SubRip from one natural language to other. It is based on Google Translate without API and therefore without payment. Translator have automatic and manual spell checkers.
    Downloads: 24 This Week
    Last Update:
    See Project
  • 6

    BioC

    We describe a simple XML format to share text documents and annotation

    A minimalist approach to share text documents and data annotations. Allows a large number of different annotations to be represented. Project files contain: - simple code to hold/read/write data and perform sample processing. - BioC-formatted corpora - BioC tools that work with BioC corpora BioC goals - simplicity - interoperability - broad use - reuse There should be little investment required to learn to use a format or a software module to process that format. We are interested in reuse, and we focus on common NLP tasks that are broadly useful for textmining.
    Leader badge
    Downloads: 22 This Week
    Last Update:
    See Project
  • 7
    TXM

    TXM

    Unicode XML TEI text analysis platform

    TXM is a free and open-source cross-platform Unicode & XML based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. DOWNLOAD LATEST VERSION OF TXM : http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerfull CQP full text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, cooccurrency analysis, etc.) based on R packages (http://www.r-project.org). Read the scientific background at the Textométrie project web site http://textometrie.ens-lyon.fr/?lang=en. Read a full description at the TEI Tools wiki http://wiki.tei-c.org/index.php/TXM.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 8
    LaBB-CAT

    LaBB-CAT

    A linguistic annotation store

    LABB-CAT is a browser-based linguistics research tool that stores recordings and regular-expression searchable text transcripts of interviews. The search results, entire transcripts, and media, can be viewed or exported in a variety of format
    Downloads: 15 This Week
    Last Update:
    See Project
  • 9
    Helsinki Finite-State Technology
    The Helsinki Finite-State Transducer toolkit is intended for processing natural language morphologies. The toolkit is demonstrated by wide-coverage implementations of a number of languages of varying morphological complexity.
    Downloads: 14 This Week
    Last Update:
    See Project
  • Photo and Video Editing APIs and SDKs Icon
    Photo and Video Editing APIs and SDKs

    Trusted by 150 million+ creators and businesses globally

    Unlock Picsart's full editing suite by embedding our Editor SDK directly into your platform. Offer your users the power of a full design suite without leaving your site.
    Learn More
  • 10
    CoSyne Integrated Prototype
    Multilingual Content Synchronization with Wikis: CoSyne is a Research and Technological Development project co-funded by the European Union. Details: http://cosyne.eu
    Downloads: 9 This Week
    Last Update:
    See Project
  • 11
    oopinyinguide
    OO Pinyin Guide is a Java extension for OpenOffice 3 or higher. It enables the user to add pinyin transliteration over Chinese characters inside a text document. This tool can be useful for people learning or teaching Chinese.
    Leader badge
    Downloads: 5 This Week
    Last Update:
    See Project
  • 12

    ISO GrAF

    Experimental Java library for reading and writing GrAF/XML files.

    The Graph Annotation Framework (GrAF) models linguistic annotations using a data model based on Graph theory and algorithms. The GrAF standard is a work product of ISO TC37SC4 Working Group 1. This Java library is NOT part of the GrAF standard and standoff annotation files produced by the library may not be GrAF compliant.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 13
    ELIA(Eyegaze Language Integration Analysis) supports the analysis of eye-tracking data for studies in language processing. ELIA eases early analysis of data to enable iterative development of experiments in response to spoken language.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    Entity recognition and normalization software for biomedical text
    Downloads: 5 This Week
    Last Update:
    See Project
  • 15

    JAVA Arabic Stemmer

    A JAVA class with a small functionality that is stemming Arabic words

    A JAVA Arabic stemmer that is based on Shereen Khoja algorithm. This java class offers a function called stemWrod which takes an arabic word and return the stem of it.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16

    BioLemmatizer

    Lemmatization tool for morphological analysis of biomedical literature

    The BioLemmatizer is a domain-specific lemmatization tool for the morphological analysis of biomedical literature. It is tailored to the biological domain through integration of several published lexical resources related to molecular biology. It focuses on the inflectional morphology of English, including the plural form of nouns, the conjugations of verbs, and the comparative and superlative form of adjectives and adverbs. README: https://sourceforge.net/projects/biolemmatizer/files/ The BioLemmatizer 1.2 release adds an optional functionality to normalize British English spellings into American English spellings and then retrieve corresponding lemmas. If you use the BioLemmatizer to support academic research, please cite the following paper: Haibin Liu, Tom Christiansen, William A Baumgartner Jr, and Karin Verspoor BioLemmatizer: a lemmatization tool for morphological processing of biomedical text Journal of Biomedical Semantics 2012, 3:3.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 17
    FREJ
    FREJ stands for "Fuzzy Regular Expressions for Java" - it is a command-line tool and library which allow you easily compare strings with patterns disregarding nasty typos and considering several variants (like "Barack Obama", "B.H.Obama" etc.) Project sources are moved to github: https://github.com/RodionGork/FREJ
    Downloads: 3 This Week
    Last Update:
    See Project
  • 18
    Le projet Gramlab vise à mettre à disposition des entreprises des outils logiciels OpenSource et gratuits, qui peuvent être mis en oeuvre par des développeurs qui ne sont pas spécialistes du traitement des langues. Note : L'outil GLabCorpus Manager nécessite l'installation d'un serveur SolR. Pour le télécharger et plus d'information, veuillez vous rendre dans la section Files.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 19
    HanNanum - Korean POS Tagger
    HanNanum is a Korean Morphological Analyzer and POS Tagger. A plug-in component-based architecture is adapted to the new Java version for flexible use. You can find the work flow for morphological analysis, POS tagging, noun extraction, etc. Contact: kschoi@kaist.ac.kr hjjeong@world.kaist.ac.kr
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20

    RDRPOSTagger

    A Rule-based Part-of-Speech and Morphological Tagging Toolkit

    RDRPOSTagger is a robust, easy-to-use and language-independent rule-based toolkit for Part-of-Speech (POS) and morphological tagging. RDRPOSTagger obtains fast performance in both learning and tagging process. RDRPOSTagger also achieves a very competitive accuracy in comparison to the state-of-the-art results. RDRPOSTagger now supports pre-trained POS and morphological tagging models for Bulgarian, Czech, Dutch, English, French, German, Hindi, Italian, Portuguese, Spanish, Swedish, Thai and Vietnamese. Additionally, RDRPOSTagger supports the pre-trained Universal POS tagging models for 40 languages. See the full usage of RDRPOSTagger at: http://rdrpostagger.sourceforge.net/
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    Corpus Toolkit

    Corpus Toolkit

    A text management tool for linguistic purposes...

    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    InGen is a java-based tool that automatically extracts keywords from a given LaTeX-Document and creates an index for those keywords.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23

    OPTIMA cidoc-crm Semantic Annotation

    Semantic annotation of archaeology reports with respect to CIDOC-CRM

    The semantic annotation system OPTIMA is the result of Andreas Vlachidis PhD work, (supervised by Prof. Douglas Tudhope, University of Glamorgan, UK). OPTIMA performs the NLP tasks of Named Entity Recognition, Relation Extraction, Negation Detection and Word Sense Disambiguation using hand-crafted rules and SKOS terminological resources (English Heritage Thesauri and Glossaries). The resulted semantic annotations are associated with classes of the (ISO 21127:2006) CIDOC Conceptual Reference Model (CRM) and its archaeological extension, CRM-EH. OPTIMA is also targeted at the detection and recognition of contextual relations between CRM entities. Such relations are modeled with respect to the CRM-EH archaeology extension. The pipeline targets the CIDOC-CRM entities; E19.Physical_Object, E53.Place, E49.Time_Appellation and E57.Material and the CRM-EH entities; EHE1001.Context_Event, EHE1002.Production_Event, EHE1004.Deposition_Event and P45.consists_of material property
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    The program creates OWL ontology files that describe relationships between entities. Basis are definitions found by searching Wikipedia articles for specific lexico-syntactic patterns.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25
    EyeMap - Eye Movement Data Analyzer
    EyeMap is a visualization and analysis tool for text reading eye movement data. It can process Unicode, proportion/non-proportion and spaced/unspaced reading materials, which supports various languages and experiment methods.
    Downloads: 1 This Week
    Last Update:
    See Project
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.