A list of free LLM inference resources accessible via API
A high-throughput and memory-efficient inference and serving engine
From Paper to Presentation in One Click
Unified KV Cache Compression Methods for Auto-Regressive Models
Redundancy-aware KV Cache Compression for Reasoning Models
Seamlessly integrate LLMs into scikit-learn
LLM abstractions that aren't obstructions
UCCL is an efficient communication library for GPUs
Cache-Augmented Generation: A Simple, Efficient Alternative to RAG
A git prepare-commit-msg hook for authoring commit messages with GPT-3
Full-stack Open-source Self-Evolving General AI Agent
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
Masks sensitive data and secrets before they reach AI
The best way to use and work with blocks
Summarize Prompt & LLM papers, open source data & models
Extract and convert data from any document, images, PDFs, Word docs
Collection of tutorials for Prompt Engineering techniques
An efficient forwarding service designed for LLMs
Research and application of technologies such as natural language processing
[NeurIPS2025 Spotlight] Quantized Attention
Demystify RAG by building it from scratch
Towards Efficient Self-Evolving Agent System
Sharing knowledge about big models that everyone can understand
The paper list of the 86-page SCIS cover paper
95% token savings. 155x faster queries. 16 languages