Audience
Developers interested in large language models
About GPT-NeoX
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
This repository contains EleutherAI's library for training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models, and to accelerate research into large-scale training.
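The listing itself includes no code, but a toy illustration of the headline technique may help. The sketch below is plain PyTorch with made-up sizes, not GPT-NeoX code: it shows the core idea of tensor model parallelism, splitting one linear layer's weight into column shards so that each device can compute an independent partial output.

# Illustrative sketch (not GPT-NeoX code): how tensor model parallelism
# splits a single linear layer across devices. Sizes are hypothetical.
import torch

torch.manual_seed(0)
d_model, d_ff, n_shards = 8, 16, 2

x = torch.randn(4, d_model)       # a batch of activations
w = torch.randn(d_model, d_ff)    # full weight of one feed-forward layer

# Column parallelism: each "device" holds a slice of the output columns.
shards = torch.chunk(w, n_shards, dim=1)
partial_outputs = [x @ shard for shard in shards]  # computed independently
y_parallel = torch.cat(partial_outputs, dim=1)     # an all-gather in practice

assert torch.allclose(y_parallel, x @ w, atol=1e-6)

In the real library each shard lives on a different GPU, and the final concatenation is a collective communication operation rather than a local torch.cat.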
Other Popular Alternatives & Related Software
T5
With T5, we propose reframing all NLP tasks into a unified text-to-text format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.
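As a concrete illustration of the text-to-text interface (not part of this listing), the following sketch assumes the Hugging Face transformers library and the public t5-small checkpoint:

# A small sketch of T5's text-to-text interface, assuming the Hugging Face
# transformers library and the public "t5-small" checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is expressed as input text -> output text via a task prefix.
inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Swapping the prefix (for example "summarize:" instead of "translate English to German:") changes the task without changing the model, loss, or decoding loop.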
DeepSpeed
DeepSpeed is an open source deep learning optimization library for PyTorch. It is designed to reduce compute and memory use and to train large distributed models with better parallelism on existing hardware. DeepSpeed is optimized for low-latency, high-throughput training.
DeepSpeed can train deep learning models with over a hundred billion parameters on the current generation of GPU clusters, and models with up to 13 billion parameters on a single GPU.
DeepSpeed is developed by Microsoft and aims to offer distributed training for large-scale models. It is built on top of PyTorch, whose native distributed training support centers on data parallelism.
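A minimal sketch of how a training script adopts DeepSpeed, assuming the deepspeed Python package; the stand-in model and the config values below are illustrative placeholders, not recommendations:

# A minimal sketch of wrapping a PyTorch model with DeepSpeed; the model
# and config values are placeholders chosen for illustration.
import torch
import deepspeed

model = torch.nn.Linear(512, 512)  # stand-in for a real network

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},  # ZeRO partitions optimizer state
}

# deepspeed.initialize returns an engine that handles the distributed details
# (data-parallel communication, mixed precision, ZeRO partitioning).
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

In practice such a script is launched with DeepSpeed's launcher so that one process runs per GPU and the engine coordinates them.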
OPT
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.
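For example, the released checkpoints can be loaded through the Hugging Face transformers library; the snippet below assumes that library and the public facebook/opt-125m weights:

# A minimal sketch of loading one of the released OPT checkpoints, assuming
# the Hugging Face transformers library and the "facebook/opt-125m" weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("Large language models are", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))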
Pythia
Pythia combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.
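A short sketch of the checkpoint access this kind of analysis is built around, assuming the Hugging Face transformers library; step3000 is one of the published intermediate-training revisions of the Pythia suite:

# A sketch of loading a Pythia model at an intermediate training step,
# assuming the Hugging Face transformers library.
from transformers import AutoTokenizer, GPTNeoXForCausalLM

# The same model can be loaded at different points in training via revisions,
# which is what makes studying knowledge development over training possible.
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m",
    revision="step3000",
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")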
Pricing
Starting Price: Free
Pricing Details: Open source
Free Version: Available
Company Information
EleutherAI
Founded: 2020
github.com/EleutherAI/gpt-neox
Product Details
Platforms Supported
Cloud
On-Premises
Training
Documentation