Powerful AI language model (MoE) optimized for efficiency and performance
Easy token price estimates for 400+ LLMs. TokenOps
Open-source, high-performance AI model with advanced reasoning
Compress tool outputs, logs, files, and RAG chunks
Minimal reproduction of OneRec
Real-time multi-AI collaboration: Claude, Codex & Gemini
MoBA: Mixture of Block Attention for Long-Context LLMs
User toolkit for analyzing and interfacing with Large Language Models
An efficient forwarding service designed for LLMs
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Agentic, Reasoning, and Coding (ARC) foundation models
Large-language-model & vision-language-model based on Linear Attention
Open-weight, large-scale hybrid-attention reasoning model
Qwen3 is the large language model series developed by the Qwen team
Uncertainty Quantification for Language Models, a Python package
Performance-optimized AI inference on your GPUs
High-performance Inference and Deployment Toolkit for LLMs and VLMs
A Telegram bot for Large Language Models
A course on LLM inference serving on Apple Silicon
LightLLM is a Python-based LLM (Large Language Model) inference
TokenSpeed is a speed-of-light LLM inference engine
Korea Investment & Securities Open API GitHub
Unified KV Cache Compression Methods for Auto-Regressive Models
Traditional Mandarin LLMs for Taiwan
Autoregressive Model Beats Diffusion