Implementation of model parallel autoregressive transformers on GPUs (see the model-parallel sketch after this list)
Framework and no-code GUI for fine-tuning LLMs
Ongoing research on training transformer models at scale
State-of-the-art Parameter-Efficient Fine-Tuning (see the LoRA sketch after this list)
Training and serving large-scale neural networks
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model (see the quantization sketch after this list)
BISHENG is an open LLM DevOps platform for next-generation applications
LLM training code for MosaicML foundation models
An implementation of model parallel GPT-2 and GPT-3-style models
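
Three of the entries above describe model-parallel transformer training. As a rough, framework-agnostic sketch of the core idea (not code from any of those projects), the snippet below splits a linear layer's weight matrix column-wise into shards, computes each partial output separately, and concatenates the results; in a real multi-GPU setup each shard lives on its own device and the concatenation is a collective operation. All sizes are made-up illustration values.

```python
# Conceptual sketch of a column-parallel (model-parallel) linear layer.
# In real frameworks each shard sits on its own GPU and the results are
# gathered with collective ops; here everything stays on one device.
import torch

hidden, n_shards = 8, 2                      # illustration sizes only
full_weight = torch.randn(hidden, hidden)    # the layer we want to split

# Partition the weight column-wise: each shard owns hidden // n_shards outputs.
shards = torch.chunk(full_weight, n_shards, dim=1)

x = torch.randn(4, hidden)                   # a batch of activations

# Each "device" computes only its slice of the output...
partial_outputs = [x @ w for w in shards]

# ...and the slices are concatenated (an all-gather in a multi-GPU setting).
y_parallel = torch.cat(partial_outputs, dim=1)

# Matches the unsplit computation, which is the point of tensor parallelism.
assert torch.allclose(y_parallel, x @ full_weight, atol=1e-5)
```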
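The Parameter-Efficient Fine-Tuning entry reads like the tagline of the Hugging Face peft library; assuming that is what it refers to, a minimal LoRA setup looks roughly like the sketch below (the base model name and the LoRA hyperparameters are placeholder choices, not recommendations).

```python
# Minimal LoRA sketch with the Hugging Face peft library (assumed to be the
# library this entry describes); model name and hyperparameters are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,  # wrap the model for causal-LM fine-tuning
    r=8,                           # rank of the low-rank update matrices
    lora_alpha=16,                 # scaling applied to the update
    lora_dropout=0.05,
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the small LoRA matrices are trainable
```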
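The RWKV entry is about low-precision CPU inference. As a generic illustration of the idea behind INT8 weight quantization only (that project's actual INT4/INT5/INT8 formats differ in detail), the sketch below quantizes a toy weight matrix with one scale per row and compares the dequantized matrix product against the full-precision result.

```python
# Generic per-row symmetric INT8 weight quantization, shown only to illustrate
# the idea behind low-precision inference; real formats differ in detail.
import numpy as np

rng = np.random.default_rng(0)
weight = rng.standard_normal((16, 16)).astype(np.float32)  # toy weight matrix
x = rng.standard_normal((4, 16)).astype(np.float32)        # toy activations

# One scale per output row so each row uses the full INT8 range.
scale = np.abs(weight).max(axis=1, keepdims=True) / 127.0
q_weight = np.clip(np.round(weight / scale), -127, 127).astype(np.int8)

# At inference time the INT8 weights are dequantized (or the matmul is done
# in integer arithmetic and rescaled afterwards).
y_quant = x @ (q_weight.astype(np.float32) * scale).T
y_exact = x @ weight.T

print("max abs error:", np.abs(y_quant - y_exact).max())  # small, not zero
```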