A high-throughput and memory-efficient inference and serving engine
950-line, minimal, extensible LLM inference engine built from scratch
Parallax is a distributed model serving framework
Low-latency REST API for serving text embeddings
Open-source large language model family from Tencent Hunyuan
A simple, performant and scalable JAX LLM
An efficient forwarding service designed for LLMs
High-performance Inference and Deployment Toolkit for LLMs and VLMs
slime is an LLM post-training framework for RL Scaling
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
Tensor search for humans
Large language model and vision-language model based on linear attention
Serving multiple LoRA-finetuned LLMs as one
Building Mixture-of-Experts from LLaMA with Continual Pre-training