Run any Llama 2 model locally with a Gradio UI, on GPU or CPU, from anywhere
Run 100B+ language models at home, BitTorrent-style
Inference code for Llama models
Inference code and configs for the ReplitLM model family
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
Explore large language models in 512MB of RAM
Implementation of "Tree of Thoughts"
Implementation of model parallel autoregressive transformers on GPUs
Code for the paper "Fine-Tuning Language Models from Human Preferences"
An implementation of model parallel GPT-2 and GPT-3-style models