From zero to large-model (LLM) hero
MoBA: Mixture of Block Attention for Long-Context LLMs
Mastering Applied AI, One Concept at a Time
The terminal client for Ollama
How to optimize some algorithms in CUDA
[NeurIPS 2025 Spotlight] Quantized Attention
A simple, easy-to-hack GraphRAG implementation
Open-source evaluation toolkit for large multi-modality models (LMMs)
General technology for enabling AI capabilities with LLMs and MLLMs
Llama Chinese community, real-time aggregation
A from-scratch tutorial on large language model principles and practice
Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
Full-stack AI Red Teaming platform
Open-source industrial-grade ASR models
A list of free LLM inference resources accessible via API
AI memory OS for LLM and Agent systems
Qwen3-ASR is an open-source series of ASR models
AI-Powered Wiki Generator for GitHub/GitLab/Bitbucket Repositories
Spark-TTS Inference Code
A frontier, first-principles handbook
Analyzing Hacker News discussions from a decade ago in hindsight
Making RAG Simpler with Small and Open-Sourced Language Models
Marrying Grounding DINO with Segment Anything & Stable Diffusion
Fast-stable-diffusion + DreamBooth
A Pragmatic VLA Foundation Model