Open Source BSD Natural Language Processing (NLP) Tools

Natural Language Processing (NLP) Tools for BSD

Browse free open source Natural Language Processing (NLP) tools and projects for BSD below. Use the toggles on the left to filter open source Natural Language Processing (NLP) tools by OS, license, language, programming language, and project status.

  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 1
    MeCab is a fast and customizable Japanese morphological analyzer. MeCab is designed for generic purpose and applied to variety of NLP tasks, such as Kana-Kanji conversion. MeCab provides parameter estimation functionalities based on CRFs and HMM
    Leader badge
    Downloads: 2,174 This Week
    Last Update:
    See Project
  • 2
    Virastyar

    Virastyar

    Virastyar is an spell checker for low-resource languages

    Virastyar is a free and open-source (FOSS) spell checker. It stands upon the shoulders of many free/libre/open-source (FLOSS) libraries developed for processing low-resource languages, especially Persian and RTL languages Publications: Kashefi, O., Nasri, M., & Kanani, K. (2010). Towards Automatic Persian Spell Checking. SCICT. Kashefi, O., Sharifi, M., & Minaie, B. (2013). A novel string distance metric for ranking Persian respelling suggestions. Natural Language Engineering, 19(2), 259-284. Rasooli, M. S., Kahefi, O., & Minaei-Bidgoli, B. (2011). Effect of adaptive spell checking in Persian. In NLP-KE Contributors: Omid Kashefi Azadeh Zamanifar Masoumeh Mashaiekhi Meisam Pourafzal Reza Refaei Mohammad Hedayati Kamiar Kanani Mehrdad Senobari Sina Iravanin Mohammad Sadegh Rasooli Mohsen Hoseinalizadeh Mitra Nasri Alireza Dehlaghi Fatemeh Ahmadi Neda PourMorteza
    Downloads: 48 This Week
    Last Update:
    See Project
  • 3
    Chinese-XLNet

    Chinese-XLNet

    Chinese XLNet pre-trained model

    Chinese-XLNet is a Chinese language pre-trained model based on the XLNet architecture, providing an advanced foundation for natural language processing tasks in Mandarin and other Chinese dialects. Unlike traditional masked language modeling, XLNet uses a permutation language modeling objective that captures bidirectional context more effectively by training over all possible token orderings, yielding richer contextual representations. This model is trained on large-scale Chinese text datasets to learn linguistic patterns, long-range dependencies, and semantic nuance typical of Chinese writing, making it useful for tasks like text classification, question answering, named entity recognition, and language generation. Chinese-XLNet offers an alternative to models like BERT by emphasizing autoregressive and permutation-based learning, which can lead to performance improvements on certain benchmarks and tasks.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 4
    OpenNN - Open Neural Networks Library

    OpenNN - Open Neural Networks Library

    Machine learning algorithms for advanced analytics

    OpenNN is a software library written in C++ for advanced analytics. It implements neural networks, the most successful machine learning method. Some typical applications of OpenNN are business intelligence (customer segmentation, churn prevention…), health care (early diagnosis, microarray analysis…) and engineering (performance optimization, predictive maitenance…). OpenNN does not deal with computer vision or natural language processing. The main advantage of OpenNN is its high performance. This library outstands in terms of execution speed and memory allocation. It is constantly optimized and parallelized in order to maximize its efficiency. The documentation is composed by tutorials and examples to offer a complete overview about the library. OpenNN is developed by Artelnics, a company specialized in artificial intelligence.
    Downloads: 10 This Week
    Last Update:
    See Project
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 5
    Common Resource Grep - crgrep

    Common Resource Grep - crgrep

    Common Resource Grep

    CRGREP searches for matching text in databases, various document formats, archives and other difficult to access resources. A command line tool for name and content text matching in database tables, plain files, MS Office documents, PDF, archives, MP3 audio, image meta-data, scanned documents, maven dependencies and web resources. CRGREP will search resources within resources of any arbitrary combination or depth, so text within a document within a zip archive, and so on. Here you will find binary downloads and discussion (https://sourceforge.net/p/crgrep/discussion/) . The actual development and issue tracking can be found here: https://bitbucket.org/cryanfuse/crgrep
    Downloads: 6 This Week
    Last Update:
    See Project
  • 6
    AminePlatform

    AminePlatform

    Amine is a Multi-Layer Platform for the dev. of Intelligent Systems

    Amine is an Artificial Intelligence Multi-Layer Java Open Source Platform dedicated to the development of various kinds of Intelligent Systems and Agents (Knowledge-Based, Ontology-Based, Conceptual Graph -CG- Based, NLP, Reasoning and Learning, Natural Language Processing, etc.). Ontology, KB can be created and manipulated with various processes. CG theory is used as the main knowledge representation language. Amine provides two languages: PROLOG+CG which extends PROLOG with CG and Amine modules, and SYNERGY which is a visual activation/propagation based language. CGs are considered by SYNERGY as activable/executable graphs. See for more detail: //amine-platform.sourceforge.net/
    Leader badge
    Downloads: 4 This Week
    Last Update:
    See Project
  • 7
    OpenNLP provides the organizational structure for coordinating several different projects which approach some aspect of Natural Language Processing. OpenNLP also defines a set of Java interfaces and implements some basic infrastructure for NLP compon
    Leader badge
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    Open Pandora's Box

    Open Pandora's Box

    Pandora is an artificial intelligent web based bot

    Pandora is an artificial intelligent web based bot written in Java. Pandora is a component based AI architecture including, database memory, XML, voice, voice rec, chat, IRC, HTTP, Wiktionary, Freebase, consciousness, language, GUI, applet, web, jsp, Android
    Downloads: 4 This Week
    Last Update:
    See Project
  • 9
    masmt

    masmt

    A frame work for Multi agent system development

    MaSMT is a java based multi-agent system development framework, especially designed for development of English to Sinhala machine translation system. MaSMT also capable to develop any multi-agent based system through its architecture. Reference: B. Hettige, A. S. Karunananda, G. Rzevski, Multi-agent solution for managing complexity in English to Sinhala Machine Translation, International Journal of Design & Nature and Ecodynamics, Volume 11, Issue 2, 2016, 88 – 96. B. Hettige, A. S. Karunananda, G. Rzevski, ” MaSMT: A Multi-agent System Development Framework for English-Sinhala Machine Translation”, International Journal of Computational Linguistics and Natural Language Processing (IJCLNLP), Volume 2 Issue 7 July 2013.
    Downloads: 4 This Week
    Last Update:
    See Project
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • 10

    Darkbot

    The IRC's Talking Robot

    [ Please read https://sourceforge.net/p/darkbot/news/2014/01/darkbots-revitalization/ ] Darkbot is a portable IRC chat robot written in the C language that can be taught responses to user inquiries, and even have conversations with them. Darkbot was originally created by Jason Hamilton as an aid for help channels on Intenet Relay Chat.
    Leader badge
    Downloads: 6 This Week
    Last Update:
    See Project
  • 11
    JVnSegmenter is a Java-based and open-source Vietnamese word segmentation tool. The segmentation model was trained on about 8,000 sentences using Conditional Random Fields (FlexCRFs). This tool would be useful for Vietnamese NLP community.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 12

    Hermes Natural Language Processing

    A repository of software, documentation and data for NLP

    Hermes is a repository of software, documentation and data for NLP. I am currently adding corpora extracted from Wikipedia (mostrly in Romance languages).
    Downloads: 4 This Week
    Last Update:
    See Project
  • 13
    JWNL is a Java API for accessing the WordNet relational dictionary. WordNet is widely used for developing NLP applications, and a Java API such as JWNL will allow developers to more easily use Java for building NLP applications.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    The Wikipedia Miner toolkit provides simplified access to Wikipedia. This open encyclopedia represents a vast, constantly evolving multilingual database of concepts and semantic relations; a promising resource for nlp and related research.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    OpenCCG, the OpenNLP CCG Library, is a collection of natural language processing components and tools which provide support for parsing and realization with Combinatory Categorial Grammar (CCG).
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    OpenPR
    OpenPR stands for Open Pattern Recognition project and is intended to be an open source library for algorithms of image processing, computer vision, natural language processing, pattern recognition, machine learning and the related fields.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 17

    Service Grid - Language Grid Base System

    SOA infrastracture initially developed by NICT Language Grid Project

    Service Grid is an infrastructure for accumulating and sharing Web services. Resources with complicated intellectual property issues are wrapped as Web services and shared on the Service Grid. If you release your software by using the software of this project, please include the following description in the documents or on the website. * This software uses the [SOFTWARE] by the Language Grid project (http://langrid.org/). [SOFTWARE] is one of: * Service Grid Server Software (http://langrid.org/oss-project/en/service_grid.html) * Language Service Development Libraries (http://langrid.org/oss-project/en/language_service.html) * Language Grid Toolbox (http://langrid.org/oss-project/en/toolbox.html) If you publish a paper by using the software of this project, please cite the following book. * Toru Ishida Ed. The Language Grid: Service-Oriented Collective Intelligence for Laguage Resource Interoperability. Springer, 2011. ISBN 978-3-642-21177-5.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 18
    Grok is a library of natural language processing components, including support for parsing with categorial grammars and various preprocessing tasks such as part-of-speech tagging, sentence detection, and tokenization.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    AutoSummary uses Natural Language Processing to generate a contextually-relevant synopsis of plain text. It uses statistical and rule-based methods for part-of-speech tagging, word sense disambiguation, sentence deconstruction and semantic analysis.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20

    Bermuda Text-to-Speech

    This project includes basic NLP and DSP techniques for Text-to-Speech

    See TTS demo at: http://rslp.racai.ro/index.php?page=tts This is an entirely written in JAVA project which includes a set of tools and methods designed to enable Multilingual Text-to-Speech (TTS) synthesis. We currently support English and Romanian but we will soon train more models and make them available for download. If you want to read more about our other NLP and TTS tools check out http://nlptools.racai.ro.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    The BioNLP UIMA Component Repository provides UIMA wrappers for novel and well-known 3rd-party NLP tools used in biomedical text prosessing, such as tokenizers, parsers, named entity taggers, and tools for evaluation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    CRFSharp

    CRFSharp

    CRFSharp is a .NET(C#) implementation of Conditional Random Field

    CRFSharp(aka CRF#) is a .NET(C#) implementation of Conditional Random Fields, an machine learning algorithm for learning from labeled sequences of examples. It is widely used in Natural Language Process (NLP) tasks, for example: word breaker, postagging, named entity recognized, query chunking and so on. CRF#'s mainly algorithm is the same as CRF++ written by Taku Kudo. It encodes model parameters by L-BFGS. Moreover, it has many significant improvement than CRF++, such as totally parallel encoding, optimizing memory usage and so on. Currently, when training corpus, compared with CRF++, CRF# can make full use of multi-core CPUs and only uses very low memory, and memory grow is very smoothly and slowly while amount of training corpus, tags increase. with multi-threads process, CRF# is more suitable for large data and tags training than CRF++ now. For example, in machine with 64GB, CRF# encodes model with more than 4.5 hundred million features quickly.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Chinese-LLaMA-Alpaca 2

    Chinese-LLaMA-Alpaca 2

    Chinese LLaMA-2 & Alpaca-2 Large Model Phase II Project

    This project is developed based on the commercially available large model Llama-2 released by Meta. It is the second phase of the Chinese LLaMA&Alpaca large model project. The Chinese LLaMA-2 base model and the Alpaca-2 instruction fine-tuning large model are open-sourced. These models expand and optimize the Chinese vocabulary on the basis of the original Llama-2, use large-scale Chinese data for incremental pre-training, and further improve the basic semantics and command understanding of Chinese. Performance improvements. The related model supports FlashAttention-2 training, supports 4K context and can be extended up to 18K+ through the NTK method.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Chonkie

    Chonkie

    The no-nonsense RAG chunking library

    Chonkie is an AI-powered framework designed for building conversational agents and chatbots with natural language understanding and multi-turn conversation support.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    CoPT, Corpus Processing Tools, is a set of java classes intended to assist field linguists, NLP researchers and developers, students and software developers in all corpus-related processing.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
MongoDB Logo MongoDB