Persistent context and multi-instance coordination
Block Diffusion for Ultra-Fast Speculative Decoding
Multimodal embedding and reranking models built on Qwen3-VL
SimpleMem: Efficient Lifelong Memory for LLM Agents
A New Axis of Sparsity for Large Language Models
General plug-and-play inference library for Recursive Language Models
Z80-μLM is a 2-bit quantized language model
Anthropic's original performance take-home, now open for you to try
PersonaPlex code
Socket.IO integration for Flask applications
The async Python driver for MongoDB and Tornado or asyncio
Simplifies the local serving of AI models from any source
Collection of Gemma 3 variants that are trained for performance
Frameworks for language model reinforcement learning environments
Collection of reference environments for offline reinforcement learning
Simple and easily configurable grid world environments
Cryptocurrency exchange rates via curl
Enables the best performance on NVIDIA RTX Graphics Cards
Spanish-language course repository that teaches the fundamentals of SQL
A minimal, modern Python project template
LLM training in simple, raw C/CUDA
High-quality implementations of standard and SOTA methods
bsuite is a collection of carefully-designed experiments
A static type analyzer for Python code
Fast and accurate AI-powered file content type detection