Gemma open-weight LLM library, from Google DeepMind
Open-weight, large-scale hybrid-attention reasoning model
Alibaba's high-performance LLM inference engine for diverse applications
On the Structural Pruning of Large Language Models
UCCL is an efficient communication library for GPUs
Implementation of the MatMul-free LM
DeepSeek LLM: Let there be answers
Flagship MoE model for advanced reasoning, coding, and agents
Efficient MoE reasoning model for coding and math workloads
Open multimodal model for coding, agents, and long-context tasks
4-bit Command A+ model for enterprise agents and multilingual tasks
FP8 Qwen model for efficient multimodal coding and agent tasks