MemoryOS is designed to provide a memory operating system
Parallax is a distributed model serving framework
LLM training in simple, raw C/CUDA
AirLLM: 70B inference with a single 4GB GPU
DepGraph: Towards Any Structural Pruning
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
Code and models for the ICML 2024 paper NExT-GPT
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
The official implementation of RAPTOR
On the Structural Pruning of Large Language Models
Driving with Graph Visual Question Answering
A tension reasoning engine over 131 S-class problems
Open-weight, large-scale hybrid-attention reasoning model
Serving multiple LoRA-finetuned LLMs as one
An open-source framework for training large multimodal models