950-line, minimal, extensible LLM inference engine built from scratch
Adding guardrails to large language models
Seamlessly integrate LLMs into scikit-learn
State-of-the-art Parameter-Efficient Fine-Tuning
TokenSpeed is a speed-of-light LLM inference engine
Multi-source content processor for NotebookLM
Test-Time Reinforcement Learning
Bridging LLM and Recommender System
Semi-Structured Agentic Framework. Workflows build themselves
A Gym environment for web task automation
Parallax is a distributed model serving framework
Minimal reproduction of OneRec
Redundancy-aware KV Cache Compression for Reasoning Models
AI-powered tool for efficient abstract and PDF screening
The official implementation of RAPTOR
AI-driven multi-agent research assistant automating hypothesis
Synthetic data curation for post-training and data extraction
A high-quality PDF to Markdown tool based on large language models
Search all of YouTube from the command line
Specify a github or local repo, github pull request
From nobody to big model (LLM) hero
Deploy your agentic workflows to production
Mastering Applied AI, One Concept at a Time
Modular AI runtime for robots
[NeurIPS2025 Spotlight] Quantized Attention