
Search Results for "training"

Showing 35 open source projects for "training"

Active filters: Natural Language Processing (NLP), Python

1

Diffgram

Training data (data labeling, annotation, workflow) for all data types

From ingesting data to exploring it, annotating it, and managing workflows, Diffgram is a single application that will improve your data labeling and bring all aspects of training data under a single roof. Diffgram is the world’s first truly open source training data platform that focuses on giving its users an unlimited experience. It aims to reduce your data labeling bills and increase your training data quality. Training data is the art of supervising machines through data. This includes the activities of annotation, which produces structured data ready to be consumed by a machine learning model. ...

Downloads: 5 This Week

Last Update: 2024-10-14
See Project
2

Pyreft

ReFT: Representation Finetuning for Language Models

Pyreft is a library from Stanford NLP for fine-tuning transformer models, with an emphasis on efficient, resource-conserving training and customizability for NLP tasks.

Downloads: 0 This Week

Last Update: 2025-02-04
See Project
3

LightAutoML

Fast and customizable framework for automatic ML model creation

LightAutoML is an automated machine learning (AutoML) framework optimized for efficient model training and hyperparameter tuning, focusing on both tabular and text data.

Downloads: 0 This Week

Last Update: 2025-12-04
See Project
4

DOLMA

Data and tools for generating and inspecting OLMo pre-training data

DOLMA provides an open pre-training dataset and a toolkit for curating and inspecting the large-scale text data used to train language models such as OLMo.

Downloads: 0 This Week

Last Update: 2025-06-25
See Project
5

PaddleNLP

Easy-to-use and powerful NLP library with Awesome model zoo

PaddleNLP is a natural language processing development library for PaddlePaddle. Its three major features are an easy-to-use text API, application examples for multiple scenarios, and high-performance distributed training. It aims to improve the modeling and development efficiency of PaddlePaddle developers working with text, and provides rich examples of NLP applications. It offers industry-level preset task capabilities through Taskflow and a text API that covers the whole workflow: the Dataset API supports loading a rich set of Chinese datasets, the Data API handles data preprocessing flexibly and efficiently, the Embedding API presets 60+ pre-trained word vectors, and the Transformer API provides 100+ pre-trained models. Together these can greatly improve the efficiency of NLP task modeling.

Downloads: 0 This Week

Last Update: 2025-05-21
See Project
6

ModelScope

Bring the notion of Model-as-a-Service to life

...It seeks to bring together the most advanced machine learning models from the AI community and to streamline the process of leveraging AI models in real-world applications. The core ModelScope library open-sourced in this repository provides the interfaces and implementations that allow developers to perform model inference, training and evaluation. In particular, with rich layers of API abstraction, the ModelScope library offers a unified experience to explore state-of-the-art models spanning domains such as CV, NLP, Speech, Multi-Modality, and Scientific-computation. Model contributors of different areas can integrate models into the ModelScope ecosystem through the layered APIs, allowing easy and unified access to their models. ...

Downloads: 2 This Week

Last Update: 2026-03-27
See Project
7

NVIDIA NeMo

Toolkit for conversational AI

...Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI architectures are typically large and require a lot of data and compute for training. NeMo uses PyTorch Lightning for easy and performant multi-GPU/multi-node mixed-precision training. Supported models: Jasper, QuartzNet, CitriNet, Conformer-CTC, Conformer-Transducer, Squeezeformer-CTC, Squeezeformer-Transducer, ContextNet, LSTM-Transducer (RNNT), LSTM-CTC. NGC collection of pre-trained speech processing models.

Downloads: 2 This Week

Last Update: 2026-03-23
See Project
8

Adapters

A Unified Library for Parameter-Efficient Learning

Adapters is an add-on library to HuggingFace's Transformers, integrating 10+ adapter methods into 20+ state-of-the-art Transformer models with minimal coding overhead for training and inference. Adapters provides a unified interface for efficient fine-tuning and modular transfer learning. It supports features such as full-precision or quantized training (e.g. Q-LoRA, Q-Bottleneck Adapters, or Q-PrefixTuning), adapter merging via task arithmetic, and the composition of multiple adapters via composition blocks, enabling advanced research in parameter-efficient transfer learning for NLP tasks.

Downloads: 0 This Week

Last Update: 2025-05-20
See Project
9

Colossal-AI

Making large AI models cheaper, faster and more accessible

...It is never ideal to train large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine. There is an urgent demand to train models in a distributed environment. However, distributed training, especially model parallelism, often requires domain expertise in computer systems and architecture. It remains a challenge for AI researchers to implement complex distributed training solutions for their models. Colossal-AI provides a collection of parallel components for you. We aim to let you write distributed deep learning models the same way you write a model on your laptop.

Downloads: 1 This Week

Last Update: 2025-05-28
See Project
10

NNCF

Neural Network Compression Framework for enhanced OpenVINO

NNCF (Neural Network Compression Framework) is an optimization toolkit for deep learning models, designed to apply quantization, pruning, and other techniques to improve inference efficiency.

Downloads: 1 This Week

Last Update: 2026-02-24
See Project
11

SparseML

Libraries for applying sparsification recipes to neural networks

SparseML is an optimization toolkit for training and deploying deep learning models using sparsification techniques like pruning and quantization to improve efficiency.

Downloads: 0 This Week

Last Update: 2025-06-02
See Project
12

SetFit

Efficient few-shot learning with Sentence Transformers

...It achieves high accuracy with little labeled data - for instance, with only 8 labeled examples per class on the Customer Reviews sentiment dataset, SetFit is competitive with fine-tuning RoBERTa Large on the full training set of 3k examples.

Downloads: 0 This Week

Last Update: 2025-08-05
See Project
13

Stanza

Stanford NLP Python library for many human languages

...The toolkit is designed to be parallel among more than 70 languages, using the Universal Dependencies formalism. Stanza is built with highly accurate neural network components that also enable efficient training and evaluation with your own annotated data.

Downloads: 1 This Week

Last Update: 2026-02-26
See Project
14

Chinese-XLNet

Chinese XLNet pre-trained model

Chinese-XLNet is a Chinese language pre-trained model based on the XLNet architecture, providing an advanced foundation for natural language processing tasks in Mandarin and other Chinese dialects. Unlike traditional masked language modeling, XLNet uses a permutation language modeling objective that captures bidirectional context more effectively by training over all possible token orderings, yielding richer contextual representations. This model is trained on large-scale Chinese text datasets to learn linguistic patterns, long-range dependencies, and semantic nuance typical of Chinese writing, making it useful for tasks like text classification, question answering, named entity recognition, and language generation. ...

Downloads: 0 This Week

Last Update: 2026-01-15
See Project
15

torchtext

Data loaders and abstractions for text and NLP

We recommend Anaconda as a Python package management system. Please refer to pytorch.org for the details of PyTorch installation. LTS versions are distributed through a different channel than the other versioned releases. Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK). You have to install SacreMoses. To build torchtext from source, you need git, CMake, and a C++11 compiler such as g++. When building from source, make sure that you have the same C++...

Downloads: 0 This Week

Last Update: 2024-04-16
See Project
16

Datasets

Hub of ready-to-use datasets for ML models

Datasets is a library for easily accessing and sharing datasets and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks. Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Backed by the Apache Arrow format, it processes large datasets with zero-copy reads without any memory constraints for optimal speed and efficiency. We also feature a deep integration with the Hugging Face Hub, allowing you to easily load and share a dataset with the wider NLP community. There are currently over 2658 datasets, and more than 34 metrics available. ...

Downloads: 0 This Week

Last Update: 2026-03-23
See Project
17

Chinese-LLaMA-Alpaca 2

Chinese LLaMA-2 & Alpaca-2 Large Model Phase II Project

...The Chinese LLaMA-2 base model and the instruction-tuned Alpaca-2 model are open-sourced. These models expand and optimize the Chinese vocabulary of the original Llama-2, use large-scale Chinese data for incremental pre-training, and further improve basic Chinese semantic and instruction understanding. The models support FlashAttention-2 training and a 4K context that can be extended up to 18K+ through the NTK method.

Downloads: 0 This Week

Last Update: 2024-01-23
See Project
18

WebArena

Code repo for WebArena, a web environment for building autonomous agents

WebArena is a realistic web environment designed for building and testing autonomous agents, providing a platform for developing web-based AI agents.

Downloads: 0 This Week

Last Update: 2025-01-30
See Project
19

AllenNLP

An open-source NLP research library, built on PyTorch

AllenNLP makes it easy to design and evaluate new deep learning models for nearly any NLP problem, along with the infrastructure to easily run them in the cloud or on your laptop. AllenNLP includes reference implementations of high quality models for both core NLP problems (e.g. semantic role labeling) and NLP applications (e.g. textual entailment). AllenNLP supports loading "plugins" dynamically. A plugin is just a Python package that provides custom registered classes or additional...

Downloads: 1 This Week

Last Update: 2022-10-18
See Project
20

MITRE Annotation Toolkit

A toolkit for managing and manipulating text annotations

...It can be customized for specific tasks (e.g., named entity identification, de-identification of medical records). The goal of MAT is not to help you configure your training engine (in the default case, the Carafe CRF system) to achieve the best possible performance on your data. MAT is for "everything else": all the tools you end up wishing you had.

Downloads: 0 This Week

Last Update: 2023-04-19
See Project
21

SimCSE

SimCSE: Simple Contrastive Learning of Sentence Embeddings

SimCSE (Simple Contrastive Learning of Sentence Embeddings) is a machine learning framework for training sentence embeddings using contrastive learning. It improves representation learning for NLP tasks.

Downloads: 0 This Week

Last Update: 2025-01-21
See Project
22

SRU

Training RNNs as Fast as CNNs

...In this work, we propose the Simple Recurrent Unit (SRU), a light recurrent unit that balances model capacity and scalability. SRU is designed to provide expressive recurrence, enable highly parallelized implementation, and comes with careful initialization to facilitate the training of deep models. We demonstrate the effectiveness of SRU on multiple NLP tasks. SRU achieves a 5-9x speed-up over cuDNN-optimized LSTM on classification and question answering datasets, and delivers stronger results than LSTM and convolutional models. We also obtain an average of 0.7 BLEU improvement over the Transformer model on translation by incorporating SRU into the architecture. ...

Downloads: 0 This Week

Last Update: 2022-08-09
See Project
23

XLM (Cross-lingual Language Model)

PyTorch original implementation of Cross-lingual Language Model

...Using a shared subword vocabulary, XLM learns language-agnostic features that work well for classification and sequence labeling tasks such as XNLI, NER, and POS without target-language supervision. The repository provides preprocessing pipelines, training code, and fine-tuning scripts so you can reproduce benchmark results or adapt models to your own multilingual corpora. Pretrained checkpoints cover dozens of languages and multiple model sizes, balancing quality and compute needs.

Downloads: 0 This Week

Last Update: 2025-10-07
See Project
24

TextBrewer

A PyTorch-based knowledge distillation toolkit

TextBrewer is a PyTorch-based model distillation toolkit for natural language processing. It includes various distillation techniques from both the NLP and CV fields and provides an easy-to-use distillation framework. This lets users quickly experiment with state-of-the-art distillation methods to compress a model with a relatively small sacrifice in performance, increasing inference speed and reducing memory usage.

Downloads: 0 This Week

Last Update: 2025-01-22
See Project
25

NLP Architect

A model library for exploring state-of-the-art deep learning

...The library includes our past and ongoing NLP research and development efforts as part of Intel AI Lab. NLP Architect is designed to be flexible for adding new models, neural network components, and data handling methods, and for easily training and running models. NLP Architect is a model-oriented library designed to showcase novel and different neural network optimizations. The library contains NLP/NLU-related models per task, different neural network topologies (which are used in models), procedures for simplifying workflows in the library, pre-defined data processors and dataset loaders, and miscellaneous utilities. ...

Downloads: 0 This Week

Last Update: 2022-08-05
See Project
