Deep learning optimization library that makes distributed training easy
Unified KV Cache Compression Methods for Auto-Regressive Models
Implementation of TurboQuant (ICLR 2026)
Redundancy-aware KV Cache Compression for Reasoning Models
The highest-scoring AI memory system ever benchmarked
SOTA discrete acoustic codec models with 40/75 tokens per second
14-stage Fusion Pipeline for LLM token compression
Running large language models on a single GPU
A tension reasoning engine over 131 S-class problems
From-scratch PyTorch implementation of Google's TurboQuant
Contexts Optical Compression
LCM (Lossless Context Management) plugin for OpenClaw
DepGraph: Towards Any Structural Pruning
Python SDK for the Computer Use model Lux, developed by OpenAGI
Advanced RAG cookbooks for building accurate LLM applications
On the Structural Pruning of Large Language Models
Advanced techniques for RAG systems
An implementation of a deep learning recommendation model (DLRM)
The official repository for ERNIE 4.5 and ERNIEKit
ADAMS is a workflow engine for building complex knowledge workflows
10x faster matrix and vector operations
Real-time big data tool for bit strings up to 2^63, based on an AVL forest