A high-throughput and memory-efficient inference and serving engine
A 950-line, minimal, extensible LLM inference engine built from scratch
Parallax is a distributed model serving framework
Alibaba's high-performance LLM inference engine for diverse apps
Low-latency REST API for serving text-embeddings
Open-source large language model family from Tencent Hunyuan
A high-performance inference engine for AI models
A simple, performant, and scalable JAX LLM
An efficient forwarding service designed for LLMs
Open-Source Analytics Infrastructure
High-performance Inference and Deployment Toolkit for LLMs and VLMs
slime is an LLM post-training framework for RL Scaling
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
Mooncake is the serving platform for Kimi
Tensor search for humans
Large language model and vision-language model based on Linear Attention
Serving multiple LoRA fine-tuned LLMs as one
Building Mixture-of-Experts from LLaMA with Continual Pre-training