User toolkit for analyzing and interfacing with Large Language Models
Implement CPU from scratch and play with large model deployments
Learn to build your Second Brain AI assistant with LLMs
Ongoing research training transformer models at scale
Real-time multi-AI collaboration: Claude, Codex & Gemini
Scalable RL solution for advanced reasoning of language models
Overcoming Group Chat Scenarios with LLM-based Technical Assistance
The Security Toolkit for LLM Interactions
slime is an LLM post-training framework for RL Scaling
Inference code for CodeLlama models
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Automatic question answering for local knowledge bases based on LLM
Tensor search for humans
Generative AI reference workflows
From Paper to Presentation in One Click
Unify Efficient Fine-tuning of RAG Retrieval, including Embedding
Large-language-model & vision-language-model based on Linear Attention
A Survey of Large Language Models
An efficient forwarding service designed for LLMs
Test-Time Reinforcement Learning
Minimal reproduction of OneRec
MoBA: Mixture of Block Attention for Long-Context LLMs
[NeurIPS 2025 Spotlight] Quantized Attention
General technology for enabling AI capabilities with LLMs and MLLMs
Llama Chinese community, real-time aggregation