Open Source Java Natural Language Processing (NLP) Tools

Java Natural Language Processing (NLP) Tools

View 189 business solutions

Browse free open source Java Natural Language Processing (NLP) Tools and projects below. Use the toggles on the left to filter open source Java Natural Language Processing (NLP) Tools by OS, license, language, programming language, and project status.

  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    The database for AI-powered applications.

    MongoDB Atlas is the developer-friendly database used to build, scale, and run gen AI and LLM-powered apps—without needing a separate vector database. Atlas offers built-in vector search, global availability across 115+ regions, and flexible document modeling. Start building AI apps faster, all in one place.
    Start Free
  • Get the most trusted enterprise browser Icon
    Get the most trusted enterprise browser

    Advanced built-in security helps IT prevent breaches before they happen

    Defend against security incidents with Chrome Enterprise. Create customizable controls, manage extensions and set proactive alerts to keep your data and employees protected without slowing down productivity.
    Download Chrome
  • 1
    MARF is a general cross-platform framework with a collection of algorithms for audio (voice, speech, and sound) and natural language text analysis and recognition along with sample applications (identification, NLP, etc.) of its use, implemented in Java.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 2
    VnCoreNLP

    VnCoreNLP

    A Vietnamese natural language processing toolkit

    VnCoreNLP is a Java-based natural language processing toolkit tailored for Vietnamese. It offers a fast and accurate pipeline for essential NLP tasks, facilitating research and application development in Vietnamese language processing. ​
    Downloads: 3 This Week
    Last Update:
    See Project
  • 3
    OpenNLP provides the organizational structure for coordinating several different projects which approach some aspect of Natural Language Processing. OpenNLP also defines a set of Java interfaces and implements some basic infrastructure for NLP compon
    Leader badge
    Downloads: 20 This Week
    Last Update:
    See Project
  • 4
    Ansj Chinese word segmentation

    Ansj Chinese word segmentation

    Ansj word segmentation

    The real java implementation of ict. The word segmentation effect is faster than the open source version of ict. Chinese word segmentation, name recognition, part-of-speech tagging, user-defined dictionary. This is a java implementation of Chinese word segmentation based on n-Gram+CRF+HMM. The word segmentation speed reaches about 2 million words per second (tested under mac air), and the accuracy rate can reach more than 96%. At present, it has realized the functions of Chinese word segmentation, Chinese name recognition, user-defined dictionary, keyword extraction, automatic summarization, and keyword tagging. It can be applied to natural language processing and other aspects, and is suitable for various projects that require high word segmentation effects.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    Apache OpenNLP

    Apache OpenNLP

    Apache OpenNLP

    Apache OpenNLP is a machine learning-based NLP library that provides tools for text-processing tasks such as tokenization, sentence segmentation, and named entity recognition.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    Common Resource Grep - crgrep

    Common Resource Grep - crgrep

    Common Resource Grep

    CRGREP searches for matching text in databases, various document formats, archives and other difficult to access resources. A command line tool for name and content text matching in database tables, plain files, MS Office documents, PDF, archives, MP3 audio, image meta-data, scanned documents, maven dependencies and web resources. CRGREP will search resources within resources of any arbitrary combination or depth, so text within a document within a zip archive, and so on. Here you will find binary downloads and discussion (https://sourceforge.net/p/crgrep/discussion/) . The actual development and issue tracking can be found here: https://bitbucket.org/cryanfuse/crgrep
    Downloads: 6 This Week
    Last Update:
    See Project
  • 7

    BioC

    We describe a simple XML format to share text documents and annotation

    A minimalist approach to share text documents and data annotations. Allows a large number of different annotations to be represented. Project files contain: - simple code to hold/read/write data and perform sample processing. - BioC-formatted corpora - BioC tools that work with BioC corpora BioC goals - simplicity - interoperability - broad use - reuse There should be little investment required to learn to use a format or a software module to process that format. We are interested in reuse, and we focus on common NLP tasks that are broadly useful for textmining.
    Leader badge
    Downloads: 18 This Week
    Last Update:
    See Project
  • 8
    masmt

    masmt

    A frame work for Multi agent system development

    MaSMT is a java based multi-agent system development framework, especially designed for development of English to Sinhala machine translation system. MaSMT also capable to develop any multi-agent based system through its architecture. Reference: B. Hettige, A. S. Karunananda, G. Rzevski, Multi-agent solution for managing complexity in English to Sinhala Machine Translation, International Journal of Design & Nature and Ecodynamics, Volume 11, Issue 2, 2016, 88 – 96. B. Hettige, A. S. Karunananda, G. Rzevski, ” MaSMT: A Multi-agent System Development Framework for English-Sinhala Machine Translation”, International Journal of Computational Linguistics and Natural Language Processing (IJCLNLP), Volume 2 Issue 7 July 2013.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 9
    TXM

    TXM

    Unicode XML TEI text analysis platform

    TXM is a free and open-source cross-platform Unicode & XML based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. DOWNLOAD LATEST VERSION OF TXM : http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerfull CQP full text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, cooccurrency analysis, etc.) based on R packages (http://www.r-project.org). Read the scientific background at the Textométrie project web site http://textometrie.ens-lyon.fr/?lang=en. Read a full description at the TEI Tools wiki http://wiki.tei-c.org/index.php/TXM.
    Downloads: 15 This Week
    Last Update:
    See Project
  • Photo and Video Editing APIs and SDKs Icon
    Photo and Video Editing APIs and SDKs

    Trusted by 150 million+ creators and businesses globally

    Unlock Picsart's full editing suite by embedding our Editor SDK directly into your platform. Offer your users the power of a full design suite without leaving your site.
    Learn More
  • 10
    This ohnlp project has released "pipelines" that were contributed by members of the OHNLP Consortium. The pipelines are based on the Apache UIMA framework. medKAT/P, MedCoref, MedTagger, MedXN, and cTAKES are licensed under Apache License V2.0. MedTime is licensed under GNU General Public License version 3.0 (GPLv3). cTAKES development has moved to apache.org. See http://ctakes.apache.org/
    Downloads: 5 This Week
    Last Update:
    See Project
  • 11
    Open Pandora's Box

    Open Pandora's Box

    Pandora is an artificial intelligent web based bot

    Pandora is an artificial intelligent web based bot written in Java. Pandora is a component based AI architecture including, database memory, XML, voice, voice rec, chat, IRC, HTTP, Wiktionary, Freebase, consciousness, language, GUI, applet, web, jsp, Android
    Downloads: 6 This Week
    Last Update:
    See Project
  • 12
    Next Generation Programming

    Next Generation Programming

    Compose Software Without Writing Any Programing Code

    "Next Generation Programming - Programming Without Coding Software" is a drag-drop wizard for creating simple or complex applications without writing any programming language code The Software is coded/designed with "Java Programming Language" for novice/expert programmers; Programmers can write softwares with visual tools : drag-drop components;visual editors... Programmers can use the software to compose of simple/complex applications : Database programs, circuit design, generate code and upload to chip for designed circuits (ESP8266, ESP32 chips) The Software in question is much simpler to use than PWCT (https://sourceforge.net/projects/doublesvsoop/) software. The Software has more features than PWCT software such as SCADA. Please start by looking at examples from the website first. In this way, you can learn the features of the software and how to use the software in a very short time. More Information (Documents, Videos, Examples ...) : negep.epizy.com
    Downloads: 13 This Week
    Last Update:
    See Project
  • 13

    Service Grid - Language Grid Base System

    SOA infrastracture initially developed by NICT Language Grid Project

    Service Grid is an infrastructure for accumulating and sharing Web services. Resources with complicated intellectual property issues are wrapped as Web services and shared on the Service Grid. If you release your software by using the software of this project, please include the following description in the documents or on the website. * This software uses the [SOFTWARE] by the Language Grid project (http://langrid.org/). [SOFTWARE] is one of: * Service Grid Server Software (http://langrid.org/oss-project/en/service_grid.html) * Language Service Development Libraries (http://langrid.org/oss-project/en/language_service.html) * Language Grid Toolbox (http://langrid.org/oss-project/en/toolbox.html) If you publish a paper by using the software of this project, please cite the following book. * Toru Ishida Ed. The Language Grid: Service-Oriented Collective Intelligence for Laguage Resource Interoperability. Springer, 2011. ISBN 978-3-642-21177-5.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 14
    The BioNLP UIMA Component Repository provides UIMA wrappers for novel and well-known 3rd-party NLP tools used in biomedical text prosessing, such as tokenizers, parsers, named entity taggers, and tools for evaluation.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 15
    AminePlatform

    AminePlatform

    Amine is a Multi-Layer Platform for the dev. of Intelligent Systems

    Amine is an Artificial Intelligence Multi-Layer Java Open Source Platform dedicated to the development of various kinds of Intelligent Systems and Agents (Knowledge-Based, Ontology-Based, Conceptual Graph -CG- Based, NLP, Reasoning and Learning, Natural Language Processing, etc.). Ontology, KB can be created and manipulated with various processes. CG theory is used as the main knowledge representation language. Amine provides two languages: PROLOG+CG which extends PROLOG with CG and Amine modules, and SYNERGY which is a visual activation/propagation based language. CGs are considered by SYNERGY as activable/executable graphs. See for more detail: //amine-platform.sourceforge.net/
    Leader badge
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    JInsect
    The JINSECT toolkit is a Java-based toolkit and library that supports and demonstrates the use of n-gram graphs within Natural Language Processing applications, ranging from summarization and summary evaluation to text classification and indexing.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 17
    Maximum entropy is a powerful method for constructing statistical models of classification tasks, such as part of speech tagging in Natural Language Processing. Several example applications using maxent can be found in the OpenNLP Tools Library.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 18
    This project aims to build a suite of Natural Language Processing tools. Modules will include corpus indexing and access tools, a part-of-speech tagger, tokenisers, text classification software, etc.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 19
    The nlpFarm is a Natural Language Processing (NLP) resource where early research prototypes (Java) can evolve into robust and useful open source. Our farmstead collaborates under the OpenNLP initiative, in order to make NLP software publically available.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 20
    JWNL is a Java API for accessing the WordNet relational dictionary. WordNet is widely used for developing NLP applications, and a Java API such as JWNL will allow developers to more easily use Java for building NLP applications.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    The Wikipedia Miner toolkit provides simplified access to Wikipedia. This open encyclopedia represents a vast, constantly evolving multilingual database of concepts and semantic relations; a promising resource for nlp and related research.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    The Aikernel is an intelligence server and cell runtime environment that uses natural language processing and other pattern matching with Activators, Contexts, Concepts to allow multi tasking between installed cells.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    This is a toolkit for medical natural language processing (NLP). The core engine is general enough to be used in a variety of text processing domains, though the toolkit includes specific support for medical reports and patient de-identification.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    NLP4J library is a toolset written in Java for Natural Language Processing. This version is oriented to Document Classification and uses Naive Bayes, TF-IDF, etc. There are also pre-processing tools.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25

    OPTIMA cidoc-crm Semantic Annotation

    Semantic annotation of archaeology reports with respect to CIDOC-CRM

    The semantic annotation system OPTIMA is the result of Andreas Vlachidis PhD work, (supervised by Prof. Douglas Tudhope, University of Glamorgan, UK). OPTIMA performs the NLP tasks of Named Entity Recognition, Relation Extraction, Negation Detection and Word Sense Disambiguation using hand-crafted rules and SKOS terminological resources (English Heritage Thesauri and Glossaries). The resulted semantic annotations are associated with classes of the (ISO 21127:2006) CIDOC Conceptual Reference Model (CRM) and its archaeological extension, CRM-EH. OPTIMA is also targeted at the detection and recognition of contextual relations between CRM entities. Such relations are modeled with respect to the CRM-EH archaeology extension. The pipeline targets the CIDOC-CRM entities; E19.Physical_Object, E53.Place, E49.Time_Appellation and E57.Material and the CRM-EH entities; EHE1001.Context_Event, EHE1002.Production_Event, EHE1004.Deposition_Event and P45.consists_of material property
    Downloads: 2 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • Next
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.