MemoryOS is designed to provide a memory operating system
Project aimed at extracting, exporting, and analyzing chat records
Parallax is a distributed model serving framework
LLM training in simple, raw C/CUDA
AirLLM: 70B inference on a single 4GB GPU
Learn How LLM Transformer Models Work with Interactive Visualization
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
Apple Intelligence from the command line
On the Structural Pruning of Large Language Models
DepGraph: Towards Any Structural Pruning
A powerful tool for creating datasets for LLM fine-tuning
The official implementation of RAPTOR
Based on the LangChain/LangGraph framework
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
Mooncake is the serving platform for Kimi
UCCL is an efficient communication library for GPUs
Driving with Graph Visual Question Answering
A high-performance inference engine for AI models
A tension reasoning engine over 131 S-class problems
Code and models for the ICML 2024 paper NExT-GPT
Open-weight, large-scale hybrid-attention reasoning model
Web-based tool that converts GitHub repository contents
An LLM-based presentation generation platform
Serving multiple LoRA-finetuned LLMs as one