Open Source Linux Natural Language Processing (NLP) Tools

Natural Language Processing (NLP) Tools for Linux

Browse free open source Natural Language Processing (NLP) tools and projects for Linux below.

  • 1
    MeCab is a fast and customizable Japanese morphological analyzer. It is designed for general-purpose use and has been applied to a variety of NLP tasks, such as Kana-Kanji conversion. MeCab provides parameter estimation functionality based on CRFs and HMMs.
    Downloads: 1,812 This Week
  • 2
    Virastyar

    Virastyar is a spell checker for low-resource languages

    Virastyar is a free and open-source (FOSS) spell checker. It stands upon the shoulders of many free/libre/open-source (FLOSS) libraries developed for processing low-resource languages, especially Persian and RTL languages.

    Publications:
    Kashefi, O., Nasri, M., & Kanani, K. (2010). Towards Automatic Persian Spell Checking. SCICT.
    Kashefi, O., Sharifi, M., & Minaie, B. (2013). A novel string distance metric for ranking Persian respelling suggestions. Natural Language Engineering, 19(2), 259-284.
    Rasooli, M. S., Kashefi, O., & Minaei-Bidgoli, B. (2011). Effect of adaptive spell checking in Persian. In NLP-KE.

    Contributors: Omid Kashefi, Azadeh Zamanifar, Masoumeh Mashaiekhi, Meisam Pourafzal, Reza Refaei, Mohammad Hedayati, Kamiar Kanani, Mehrdad Senobari, Sina Iravanin, Mohammad Sadegh Rasooli, Mohsen Hoseinalizadeh, Mitra Nasri, Alireza Dehlaghi, Fatemeh Ahmadi, Neda PourMorteza
    Downloads: 546 This Week
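The idea of ranking respelling suggestions by string distance can be sketched as follows. This is a minimal illustration that uses the standard Levenshtein edit distance, not the novel metric from the publications listed above and not Virastyar's actual code; the `LEXICON` word list and the `suggest` function are hypothetical.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, and substitutions cost 1."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def suggest(word, lexicon, max_distance=2):
    """Return lexicon entries within max_distance, closest first (ties alphabetical)."""
    scored = sorted((levenshtein(word, cand), cand) for cand in lexicon)
    return [cand for dist, cand in scored if dist <= max_distance]


# Hypothetical toy lexicon, for illustration only.
LEXICON = ["spell", "shell", "spill", "checker", "check"]

if __name__ == "__main__":
    print(suggest("spel", LEXICON))
```

A real spell checker would typically weight edits by keyboard layout, phonetics, or word frequency rather than treating every edit as equally likely.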
  • 3
    OpenVINO

    OpenVINO™ Toolkit repository

    OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. Boost deep learning performance in computer vision, automatic speech recognition, natural language processing and other common tasks. Use models trained with popular frameworks like TensorFlow, PyTorch and more. Reduce resource demands and efficiently deploy on a range of Intel® platforms from edge to cloud. This open-source version includes several components: namely Model Optimizer, OpenVINO™ Runtime, Post-Training Optimization Tool, as well as CPU, GPU, MYRIAD, multi device and heterogeneous plugins to accelerate deep learning inferencing on Intel® CPUs and Intel® Processor Graphics. It supports pre-trained models from the Open Model Zoo, along with 100+ open source and public models in popular formats such as TensorFlow, ONNX, PaddlePaddle, MXNet, Caffe, Kaldi.
    Downloads: 20 This Week
  • 4
    ChatGLM.cpp

    C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

    ChatGLM.cpp is a C++ implementation of the ChatGLM-6B model, enabling efficient local inference without requiring a Python environment. It is optimized for running on consumer hardware.
    Downloads: 12 This Week
  • 5
    Docspell

    Assist in organizing your piles of documents

    Docspell is a personal document organizer, sometimes called a "Document Management System" (DMS). You'll need a scanner to convert your papers into files. Docspell can then assist in organizing the resulting mess. It can unify your files from scanners, emails, and other sources. It is targeted at home use, i.e. families and households, and also at smaller groups and companies. You can associate tags, set correspondents, and add lots of other predefined and custom metadata. If your documents are associated with such metadata, you can quickly find them later using the search feature. However, adding this manually is a tedious task. Docspell can help by suggesting correspondents, guessing tags, or finding dates using machine learning. It can learn metadata from existing documents and find things using NLP. This makes adding metadata to your documents a lot easier. For machine learning, it relies on the free (GPL) Stanford CoreNLP library.
    Downloads: 12 This Week
  • 6
    DeepSparse

    Sparsity-aware deep learning inference runtime for CPUs

    A sparsity-aware enterprise inferencing system for AI models on CPUs. Maximize your CPU infrastructure with DeepSparse to run performant computer vision (CV), natural language processing (NLP), and large language models (LLMs).
    Downloads: 11 This Week
  • 7
    BotSharp

    AI Multi-Agent Framework in .NET

    Conversation as a platform (CaaP) is the future, so BotSharp offers .NET developers a complete toolkit for building one with the BotSharp AI bot platform builder. It opens up as much learning power as possible for your own bots and lets you precisely control every step of the AI processing pipeline. BotSharp is an open source machine learning framework for building AI bot platforms. The project involves natural language understanding, computer vision, and audio processing technologies, and aims to promote the development and application of intelligent robot assistants in information systems. Out-of-the-box machine learning algorithms allow ordinary programmers to develop artificial intelligence applications faster and more easily. It is written in C# and runs on .NET Core, a fully cross-platform framework. C# is an enterprise-grade programming language that is widely used to code business logic in information management systems.
    Downloads: 8 This Week
  • 8
    HanLP

    Han Language Processing

    HanLP is a multilingual Natural Language Processing (NLP) library composed of a series of models and algorithms. Built on TensorFlow 2.0, it was designed to advance state-of-the-art deep learning techniques and popularize the application of natural language processing in both academia and industry. HanLP is capable of lexical analysis (Chinese word segmentation, part-of-speech tagging, named entity recognition), syntax analysis, text classification, and sentiment analysis. It comes with pretrained models for numerous languages including Chinese and English. It offers efficient performance, clear structure and customizable features, with plenty more amazing features to look forward to on the roadmap.
    Downloads: 8 This Week
  • 9
    NVIDIA NeMo

    Toolkit for conversational AI

    NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI architectures are typically large and require a lot of data and compute for training. NeMo uses PyTorch Lightning for easy and performant multi-GPU/multi-node mixed-precision training. Supported models: Jasper, QuartzNet, CitriNet, Conformer-CTC, Conformer-Transducer, Squeezeformer-CTC, Squeezeformer-Transducer, ContextNet, LSTM-Transducer (RNNT), LSTM-CTC. NGC collection of pre-trained speech processing models.
    Downloads: 8 This Week
  • 10
    spaCy

    Industrial-strength Natural Language Processing (NLP)

    spaCy is a library built on the very latest research for advanced Natural Language Processing (NLP) in Python and Cython. Since its inception, it has been designed for real-world applications: building real products and gathering real insights. It comes with pretrained statistical models and word vectors, convolutional neural network models, easy deep learning integration, and much more. spaCy is the fastest syntactic parser in the world according to independent benchmarks, with an accuracy within 1% of the best available. It's fast, easy to install, and comes with a simple and productive API.
    Downloads: 7 This Week
  • 11
    Ciphey

    Decrypt encryptions without knowing the key or cipher

    Ciphey is a fully automated decryption, decoding, and cracking tool that uses natural language processing and artificial intelligence, along with some common sense. You don't need to know the key or the cipher; you only need to suspect that the text is encrypted, and Ciphey will figure out the rest. Ciphey can solve most things in 3 seconds or less. It aims to automate many decryptions and decodings, such as multiple base encodings, classical ciphers, hashes, or more advanced cryptography. If you don't know much about cryptography, or you want to quickly check the ciphertext before working on it yourself, Ciphey is for you. On the technical side, Ciphey uses a custom-built artificial intelligence module (AuSearch) with a Cipher Detection Interface to approximate what something is encrypted with. It then uses a custom-built, customizable natural language processing Language Checker Interface to detect when the given text becomes plaintext.
    Downloads: 6 This Week
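The combination of a decoder search and a language checker described above can be sketched as follows. This is a minimal sketch under stated assumptions, not Ciphey's AuSearch or its Language Checker Interface: it only tries Base64 and Caesar shifts, searches breadth-first, and decides that text is plaintext by counting hits against a tiny hypothetical word list (`COMMON_WORDS`). All function names are invented for illustration.

```python
import base64
import binascii
import string

# Hypothetical, tiny word list; a real checker would use a full dictionary.
COMMON_WORDS = {"the", "and", "is", "to", "of", "a", "in",
                "hello", "world", "attack", "at", "dawn"}


def looks_like_english(text, threshold=0.5):
    """Language check: at least `threshold` of the words are known words."""
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return False
    hits = sum(1 for w in words if w in COMMON_WORDS)
    return hits / len(words) >= threshold


def caesar_shift(text, shift):
    """Rotate ASCII letters by `shift` positions; leave other characters alone."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


def try_base64(text):
    """Return the decoded string, or None if the input is not valid Base64 text."""
    try:
        return base64.b64decode(text, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None


def candidates(text):
    """Yield (label, decoded) pairs for a single decoding step."""
    decoded = try_base64(text)
    if decoded is not None:
        yield "base64", decoded
    for shift in range(1, 26):
        yield f"caesar-{shift}", caesar_shift(text, -shift)


def crack(text, max_depth=2):
    """Breadth-first search over decoders; return (steps, plaintext) or None."""
    frontier = [([], text)]
    for _ in range(max_depth):
        next_frontier = []
        for steps, current in frontier:
            for label, decoded in candidates(current):
                path = steps + [label]
                if looks_like_english(decoded):
                    return path, decoded
                next_frontier.append((path, decoded))
        frontier = next_frontier
    return None


if __name__ == "__main__":
    print(crack(caesar_shift("attack at dawn", 3)))
```

The search stops at the first candidate that the language checker accepts, which is why the quality of the checker matters more than the number of decoders.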
  • 12
    franc

    Natural language detection

    Franc is a lightweight language detection library for JavaScript that supports multiple languages and scripts. It is designed for detecting text language efficiently in various applications.
    Downloads: 5 This Week
  • 13
    AWS Toolkit for Visual Studio Code

    Local Lambda debug, CodeWhisperer, SAM/CFN syntax, etc.

    The AWS Toolkit extension for Visual Studio Code enables you to interact with Amazon Web Services (AWS). Try the AWS Code Sample Catalog to start coding with the AWS SDK. The AWS Explorer provides access to the AWS services that you can work with when using the Toolkit. To see the AWS Explorer, choose the AWS icon in the Activity bar. The Developer Tools panel is a section for developer-focused tooling curated for working in an IDE. It can be found underneath the AWS Explorer when the AWS icon is selected in the Activity bar. The AWS CDK Explorer enables you to work with AWS Cloud Development Kit (CDK) applications. It shows a top-level view of your CDK applications that have been synthesized in your workspace. Amazon CodeWhisperer provides inline code suggestions using machine learning and natural language processing on the contents of your current file. Supported languages include Java, Python, and JavaScript.
    Downloads: 4 This Week
  • 14
    Obsei

    Obsei is a low code AI powered automation tool

    Obsei is an automated no-code/low-code AI-powered text observation and analysis framework, designed for extracting insights from unstructured text data such as social media, reviews, and logs.
    Downloads: 4 This Week
  • 15
    gse

    Go efficient multilingual NLP and text segmentation

    Efficient multilingual NLP and text segmentation in Go, with support for English, Chinese, Japanese, and other languages. gse implements jieba in Go and aims to add NLP support and more features. It supports multiple word segmentation modes: common, search engine, full, precise, and HMM. It supports user and embedded dictionaries, part-of-speech (POS) tagging, segment info analysis, and stop words and trim words. It supports Traditional Chinese. The HMM mode cuts text using the Viterbi algorithm. NLP support via TensorFlow and Named Entity Recognition are in progress. It works with Elasticsearch and bleve, and can run as a JSON-RPC service.
    Downloads: 4 This Week
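Dictionary-based word segmentation of the kind listed above can be illustrated with a small dynamic program that picks the most probable split under unigram word frequencies. This is a sketch in Python, not gse's Go API or its actual algorithm; the `FREQ` dictionary, the unknown-character penalty, and the `segment` function are all hypothetical, and the HMM mode is not covered here.

```python
import math

# Hypothetical toy dictionary: word -> frequency count.
FREQ = {
    "natural": 50, "language": 40, "processing": 30,
    "nat": 2, "ural": 1, "lang": 3, "process": 20, "ing": 5,
}


def segment(text, freq=FREQ, max_word_len=10):
    """Split text into the word sequence with the highest total log-probability."""
    n = len(text)
    total = sum(freq.values())
    # Assumed penalty for emitting a single character that is not in the dictionary.
    unknown = math.log(1 / (total * 10))
    # best[i] holds (best score for text[:i], start index of the last word).
    best = [(0.0, 0)] + [(-math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in freq:
                score = math.log(freq[word] / total)
            elif end - start == 1:
                score = unknown
            else:
                continue  # multi-character strings must be dictionary words
            candidate = best[start][0] + score
            if candidate > best[end][0]:
                best[end] = (candidate, start)
    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


if __name__ == "__main__":
    print(segment("naturallanguageprocessing"))
```

Because single unknown characters are always allowed, every input can be segmented, and joining the output always reproduces the input.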
  • 16
    Machine Learning PyTorch Scikit-Learn

    Code Repository for Machine Learning with PyTorch and Scikit-Learn

    Initially, this project started as the 4th edition of Python Machine Learning. However, after putting so much passion and hard work into the changes and new topics, we thought it deserved a new title. So, what's new? There is a lot of new content, including the switch from TensorFlow to PyTorch, new chapters on graph neural networks and transformers, a new section on gradient boosting, and many more additions that I will detail in a separate blog post. For those who are interested in knowing what this book covers in general, I'd describe it as a comprehensive resource on the fundamental concepts of machine learning and deep learning. The first half of the book introduces readers to machine learning using scikit-learn, the de facto approach for working with tabular datasets. The second half focuses on deep learning, including applications to natural language processing and computer vision.
    Downloads: 3 This Week
  • 17
    SparseML

    Libraries for applying sparsification recipes to neural networks

    SparseML is an optimization toolkit for training and deploying deep learning models using sparsification techniques like pruning and quantization to improve efficiency.
    Downloads: 3 This Week
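The two sparsification techniques named above, pruning and quantization, can be sketched on a plain list of weights. This is a generic illustration of magnitude pruning and symmetric int8 quantization, not SparseML's recipe format or API; every function name and the sample weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    k = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric linear quantization to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Map quantized integers back to approximate floating-point weights."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -0.1, 0.05, -1.27, 0.3, 0.02]  # made-up example values
    pruned = magnitude_prune(weights, sparsity=0.5)
    quantized, scale = quantize_int8(pruned)
    print(pruned, quantized, dequantize(quantized, scale))
```

In practice, pruning is usually applied gradually during training so that the remaining weights can adapt, which this one-shot sketch does not attempt.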
  • 18
    Stanza

    Stanford NLP Python library for many human languages

    Stanza is a collection of accurate and efficient tools for the linguistic analysis of many human languages. Starting from raw text to syntactic analysis and entity recognition, Stanza brings state-of-the-art NLP models to languages of your choosing. Stanza is a Python natural language analysis package. It contains tools, which can be used in a pipeline, to convert a string containing human language text into lists of sentences and words, to generate base forms of those words, their parts of speech and morphological features, to give a syntactic structure dependency parse, and to recognize named entities. The toolkit is designed to be parallel among more than 70 languages, using the Universal Dependencies formalism. Stanza is built with highly accurate neural network components that also enable efficient training and evaluation with your own annotated data.
    Downloads: 3 This Week
  • 19
    Transformers4Rec

    Transformers4Rec is a flexible and efficient library

    Transformers4Rec is an advanced recommendation system library that leverages Transformer models for sequential and session-based recommendations. The library works as a bridge between natural language processing (NLP) and recommender systems (RecSys) by integrating with one of the most popular NLP frameworks, Hugging Face Transformers (HF). Transformers4Rec makes state-of-the-art transformer architectures available for RecSys researchers and industry practitioners. Traditional recommendation algorithms usually ignore the temporal dynamics and the sequence of interactions when trying to model user behavior. Generally, the next user interaction is related to the sequence of the user's previous choices. In some cases, it might be a repeated purchase or song play. User interests can also suffer from interest drift because preferences can change over time. Those challenges are addressed by the sequential recommendation task.
    Downloads: 3 This Week
  • 20
    Subliminal Blaster 4

    Subliminal Blaster Powered 4 - Change your habits

    Subliminal Blaster is NLP software that displays subliminal text messages on your computer screen while you use it normally for your activities. It reprograms your mind at a subconscious level while you exercise your conscious mind with activities like browsing, working, watching videos, and others. WE ARE NOW ON VERSION 4! Please support the project by donating bitcoins 1GRYGnSmpuU1ZuXodn2H9UVEpVRBx5CTL2 Or dogecoins! DBfkGrdLvmpbYQzcRCm9KLUuPk9Zigjjod Would you like to contribute? Go to our Facebook page! https://www.facebook.com/SubliminalBlasterIntl/
    Downloads: 27 This Week
  • 21
    OpenNLP provides the organizational structure for coordinating several different projects which approach some aspect of Natural Language Processing. OpenNLP also defines a set of Java interfaces and implements some basic infrastructure for NLP compon
    Downloads: 24 This Week
  • 22
    MARF is a general cross-platform framework with a collection of algorithms for audio (voice, speech, and sound) and natural language text analysis and recognition along with sample applications (identification, NLP, etc.) of its use, implemented in Java.
    Downloads: 18 This Week
  • 23
    Apache OpenNLP

    Apache OpenNLP is a machine learning-based NLP library that provides tools for text-processing tasks such as tokenization, sentence segmentation, and named entity recognition.
    Downloads: 2 This Week
  • 24
    Bolt NLP

    Bolt is a deep learning library with high performance

    Bolt is a high-performance deep learning inference framework developed by Huawei Noah's Ark Lab. It is designed to optimize and accelerate the deployment of deep learning models across various hardware platforms. As a lightweight, universal deployment tool for all kinds of neural networks, Bolt aims to automate the deployment pipeline and achieve extreme acceleration. It has been widely deployed in many Huawei departments, such as 2012 Laboratory, CBG, and Huawei product lines. If you have questions or suggestions, you can submit an issue.
    Downloads: 2 This Week
  • 25
    Datasets

    Hub of ready-to-use datasets for ML models

    Datasets is a library for easily accessing and sharing datasets and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks. Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Backed by the Apache Arrow format, it processes large datasets with zero-copy reads and without memory constraints, for optimal speed and efficiency. We also feature a deep integration with the Hugging Face Hub, allowing you to easily load and share a dataset with the wider NLP community. There are currently over 2658 datasets and more than 34 metrics available. Datasets frees the user from RAM limitations: all datasets are memory-mapped using an efficient zero-serialization-cost backend (Apache Arrow). Smart caching means you never wait for your data to be processed several times.
    Downloads: 2 This Week