A list of free LLM inference resources accessible via API
A high-throughput and memory-efficient inference and serving engine
From Paper to Presentation in One Click
Unified KV Cache Compression Methods for Auto-Regressive Models
Redundancy-aware KV Cache Compression for Reasoning Models
Seamlessly integrate LLMs into scikit-learn
Free ChatGPT & DeepSeek API Key
LLM abstractions that aren't obstructions
Cache-Augmented Generation: A Simple, Efficient Alternative to RAG
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
[NeurIPS2025 Spotlight] Quantized Attention
An efficient forwarding service designed for LLMs
Towards Efficient Self-Evolving Agent System
95% token savings. 155x faster queries. 16 languages
The collaborative spreadsheet for AI
Implement CPU from scratch and play with large model deployments
This repository provides an advanced RAG
Implementation of "Tree of Thoughts"