Accessible large language models via k-bit quantization for PyTorch
Agentic, Reasoning, and Coding (ARC) foundation models
Universal LLM Deployment Engine with ML Compilation
Utilities intended for use with Llama models
A high-performance ML model serving framework that offers dynamic batching
Phi-3.5 for Mac: Locally-run Vision and Language Models
The official Meta Llama 3 GitHub site
Synthetic data curation for post-training and data extraction
[NeurIPS2025 Spotlight] Quantized Attention
The official repo of Qwen chat & pretrained large language model
Use PEFT or full-parameter training to CPT/SFT/DPO/GRPO 600+ LLMs
Research code artifacts for Code World Model (CWM)
Open-source large language model family from Tencent Hunyuan
LLM training code for MosaicML foundation models
Tensor search for humans
Unified framework for building enterprise RAG pipelines
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Large-language-model & vision-language-model based on Linear Attention
Bringing BERT into modernity via both architecture changes and scaling
Set of tools to assess and improve LLM security
A large-scale model for medical consultation in Chinese
Cache-Augmented Generation: A Simple, Efficient Alternative to RAG
Implementation for MatMul-free LM
DepGraph: Towards Any Structural Pruning
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI