Page 5 | Best Open Source Python Natural Language Processing (NLP) Tools

TextRank

TextRank implementation for Python 3

TextRank is an implementation of the TextRank algorithm for extractive text summarization and keyword extraction, inspired by Google’s PageRank.

Downloads: 0 This Week

Last Update: 2025-01-24

See Project

TorchDistill

A coding-free framework built on PyTorch

torchdistill (formerly kdkit) offers various state-of-the-art knowledge distillation methods and enables you to design (new) experiments simply by editing a declarative yaml config file instead of Python code. Even when you need to extract intermediate representations in teacher/student models, you will NOT need to reimplement the models, which often change the interface of the forward, but instead specify the module path(s) in the yaml file. In addition to knowledge distillation, this framework helps you design and perform general deep learning experiments (WITHOUT coding) for reproducible deep learning studies. i.e., it enables you to train models without teachers simply by excluding teacher entries from a declarative yaml config file.

Downloads: 0 This Week

Last Update: 2025-05-11

See Project

Transformers-Interpret

Model explainability that works seamlessly with Hugging Face

Transformers-Interpret is an interpretability tool for Transformer-based NLP models, providing insights into attention mechanisms and feature importance.

Downloads: 0 This Week

Last Update: 2025-01-24

See Project

Transformers4Rec

Transformers4Rec is a flexible and efficient library

Transformers4Rec is an advanced recommendation system library that leverages Transformer models for sequential and session-based recommendations. The library works as a bridge between natural language processing (NLP) and recommender systems (RecSys) by integrating with one of the most popular NLP frameworks, Hugging Face Transformers (HF). Transformers4Rec makes state-of-the-art transformer architectures available for RecSys researchers and industry practitioners. Traditional recommendation algorithms usually ignore the temporal dynamics and the sequence of interactions when trying to model user behavior. Generally, the next user interaction is related to the sequence of the user's previous choices. In some cases, it might be a repeated purchase or song play. User interests can also suffer from interest drift because preferences can change over time. Those challenges are addressed by the sequential recommendation task.

Downloads: 0 This Week

Last Update: 2025-01-24

See Project

Underthesea

Underthesea - Vietnamese NLP Toolkit

Underthesea is a Vietnamese NLP toolkit providing various text processing capabilities, including word segmentation, part-of-speech tagging, and named entity recognition.

Downloads: 0 This Week

Last Update: 2025-10-02

See Project

UniEM

Unified embedding model

UniEM is a unified embedding model designed to create high-quality text embeddings for various natural language processing tasks.

Downloads: 0 This Week

Last Update: 2025-01-30

See Project

WebArena

Code repo for "WebArena to build Autonomous Agents

WebArena is a realistic web environment designed for building and testing autonomous agents, providing a platform for developing web-based AI agents.

Downloads: 0 This Week

Last Update: 2025-01-30

See Project

WikiChat

WikiChat is an improved RAG

WikiChat is a chatbot framework designed to interactively retrieve and summarize Wikipedia information, allowing users to ask questions and get context-aware responses?

Downloads: 0 This Week

Last Update: 2025-04-29

See Project

Wikipedia2Vec

A tool for learning vector representations of words and entities

Wikipedia2Vec is an embedding learning tool that creates word and entity vector representations from Wikipedia, enabling NLP models to leverage structured and contextual knowledge.

Downloads: 0 This Week

Last Update: 2025-01-24

See Project

XLM (Cross-lingual Language Model)

PyTorch original implementation of Cross-lingual Language Model

XLM (Cross-lingual Language Model) is a family of multilingual pretraining methods that align representations across languages to enable strong zero-shot transfer. It popularized objectives like Masked Language Modeling (MLM) across many languages and Translation Language Modeling (TLM) that jointly trains on parallel sentence pairs to tighten cross-lingual alignment. Using a shared subword vocabulary, XLM learns language-agnostic features that work well for classification and sequence labeling tasks such as XNLI, NER, and POS without target-language supervision. The repository provides preprocessing pipelines, training code, and fine-tuning scripts so you can reproduce benchmark results or adapt models to your own multilingual corpora. Pretrained checkpoints cover dozens of languages and multiple model sizes, balancing quality and compute needs.

Downloads: 0 This Week

Last Update: 2025-10-07

See Project

aseryla

Aseryla code repositories

This project describes a model of how the semantic human memory represents the information relevant to the objects of the world in text format. It provides a system and a GUI application capable of extracting and managing concepts and relations from English texts. https://aseryla2.sourceforge.io/

Downloads: 0 This Week

Last Update: 2021-10-29

See Project

cocoNLP

A Chinese information extraction tool

cocoNLP is a lightweight natural-language processing toolkit geared toward practical information extraction from raw text, especially for Chinese and mixed Chinese–English content. Instead of requiring a heavy pipeline, it focuses on quick wins such as extracting names, places, organizations, emails, phone numbers, and dates directly from unstructured sentences. The project blends pattern-based methods with NLP heuristics, giving developers dependable results for real-world texts like chats, comments, and user-generated content. Its API is intentionally simple, so you can drop it into scripts, ETL jobs, or dashboards without deep ML expertise. Because it aims at utility over complexity, it’s useful for prototyping data products or building lightweight text analytics where large models would be overkill. The repository also includes examples and test snippets to help you understand expected inputs and typical outputs, which shortens the learning curve for newcomers.

Downloads: 0 This Week

Last Update: 2025-11-05

See Project

fastNLP

fastNLP: A Modularized and Extensible NLP Framework

fastNLP is a lightweight framework for natural language processing (NLP), the goal is to quickly implement NLP tasks and build complex models. A unified Tabular data container simplifies the data preprocessing process. Built-in Loader and Pipe for multiple datasets, eliminating the need for preprocessing code. Various convenient NLP tools, such as Embedding loading (including ELMo and BERT), intermediate data cache, etc.. Provide a variety of neural network components and recurrence models (covering tasks such as Chinese word segmentation, named entity recognition, syntactic analysis, text classification, text matching, metaphor resolution, summarization, etc.). Trainer provides a variety of built-in Callback functions to facilitate experiment recording, exception capture, etc. Automatic download of some datasets and pre-trained models.

Downloads: 0 This Week

Last Update: 2022-08-05

See Project

find-similar

User-friendly library to find similar objects

The mission of the FindSimilar project is to provide a powerful and versatile open source library that empowers developers to efficiently find similar objects and perform comparisons across a variety of data types. Whether dealing with texts, images, audio, or more, our project aims to simplify the process of identifying similarities and enhancing decision-making. https://github.com/findsimilar/find-similar - GitHub repo http://demo.findsimilar.org/ - Demo project and tutorial https://docs.findsimilar.org/ - Documentation

1 Review

Downloads: 0 This Week

Last Update: 2023-11-12

See Project

funNLP

Resources, corpora, and tools for Chinese natural language processing

FunNLP is a large, curated collection of resources, corpora, and tools for Chinese natural language processing (NLP). It aggregates datasets, lexicons, wordlists, sentiment dictionaries, knowledge graphs, and pretrained model references, serving as a one-stop resource hub for Chinese NLP practitioners. The repository is organized into categories such as sentiment analysis, text classification, named entity recognition, knowledge graphs, and various lexicons (e.g. sensitive words, emotion dictionaries, stopwords). It also includes links to academic papers, open-source model implementations, and practical utilities like word segmentation or text cleaning scripts. The project is highly community-oriented, frequently updated with contributions and new resources, and it’s widely used in both academic and applied NLP research. Its value lies in providing not just tools but also curated, domain-specific data, which can be hard to find elsewhere.

Downloads: 0 This Week

Last Update: 2025-10-01

See Project

jiant

jiant is an nlp toolkit

Jiant is a multitask NLP framework for fine-tuning transformer-based models on multiple natural language understanding (NLU) tasks.

Downloads: 0 This Week

Last Update: 2025-01-22

See Project

lazynlp

Library to scrape and clean web pages to create massive datasets

LazyNLP is a lightweight tool for collecting and curating large-scale text datasets for machine learning and NLP applications with minimal manual effort.

Downloads: 0 This Week

Last Update: 2025-01-22

See Project

textacy

NLP, before and after spaCy

textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals, tokenization, part-of-speech tagging, dependency parsing, etc., delegated to another library, textacy focuses primarily on the tasks that come before and follow after.

Downloads: 0 This Week

Last Update: 2025-01-22

See Project

tnets

Theano-based (Deep) Neural Networks for Speech Research

Alpha tool build on the top of Theano and delibaretly tuned for ASR/NLP tasks.

Downloads: 0 This Week

Last Update: 2015-11-19

See Project

torchtext

Data loaders and abstractions for text and NLP

We recommend Anaconda as a Python package management system. Please refer to pytorch.org for the details of PyTorch installation. LTS versions are distributed through a different channel than the other versioned releases. Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK). You have to install SacreMoses. To build torchtext from source, you need git, CMake and C++11 compiler such as g++. When building from source, make sure that you have the same C++ compiler as the one used to build PyTorch. A simple way is to build PyTorch from source and use the same environment to build torchtext. If you are using the nightly build of PyTorch, check out the environment it was built with conda (here) and pip (here). Text classification: SST2, AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB, etc.

Downloads: 0 This Week

Last Update: 2024-04-16

See Project

txtai

Build AI-powered semantic search applications

txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. Traditional search systems use keywords to find data. Semantic search applications have an understanding of natural language and identify results that have the same meaning, not necessarily the same keywords. Backed by state-of-the-art machine learning models, data is transformed into vector representations for search (also known as embeddings). Innovation is happening at a rapid pace, models can understand concepts in documents, audio, images and more. Machine-learning pipelines to run extractive question-answering, zero-shot labeling, transcription, translation, summarization and text extraction. Cloud-native architecture that scales out with container orchestration systems (e.g. Kubernetes). Applications range from similarity search to complex NLP-driven data extractions to generate structured databases. The following applications are powered by txtai.

Downloads: 0 This Week

Last Update: 2025-12-15

See Project

webXcreta

webXcreta users natural language processing to create grammatical averages of textual communication and then generate original content based on these statistics.

Downloads: 0 This Week

Last Update: 2014-05-06

See Project

Open Source Python Natural Language Processing (NLP) Tools - Page 5

Python Natural Language Processing (NLP) Tools

TextRank

TorchDistill

Transformers-Interpret

Transformers4Rec

Underthesea

UniEM

WebArena

WikiChat

Wikipedia2Vec

XLM (Cross-lingual Language Model)

aseryla

cocoNLP

fastNLP

find-similar

funNLP

jiant

lazynlp

textacy

tnets

torchtext

txtai

webXcreta

Related Searches