Seamlessly integrate LLMs into scikit-learn
A high-throughput and memory-efficient inference and serving engine
An efficient forwarding service designed for LLMs
LLM abstractions that aren't obstructions
Unified KV Cache Compression Methods for Auto-Regressive Models
Cache-Augmented Generation: A Simple, Efficient Alternative to RAG
From Paper to Presentation in One Click
The collaborative spreadsheet for AI
Free ChatGPT & DeepSeek API Key
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
[NeurIPS2025 Spotlight] Quantized Attention
Implement a CPU from scratch and play with large model deployments
Towards Efficient Self-Evolving Agent System
95% token savings. 155x faster queries. 16 languages
This repository provides an advanced RAG
Implementation of "Tree of Thoughts"