A library for accelerating Transformer models on NVIDIA GPUs
LM Studio Apple MLX engine
A real-time inference engine for temporal logic specifications
DeepEP: an efficient expert-parallel communication library
High-performance reactive message-passing-based Bayesian inference engine
Ling is a MoE LLM provided and open-sourced by InclusionAI
A high-throughput and memory-efficient inference and serving engine
A 950-line, minimal, extensible LLM inference engine built from scratch
Open-source large language model family from Tencent Hunyuan
A high-performance inference engine for AI models
A lightweight vLLM implementation built from scratch
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
Deep learning optimization library that makes distributed training easy
Jlama is a modern LLM inference engine for Java
Blazing fast, instant realtime GraphQL APIs on your DB
Lightweight, standalone C++ inference engine for Google's Gemma models
RGBD video generation model conditioned on camera input
Alibaba's high-performance LLM inference engine for diverse apps
Code for running inference and finetuning with SAM 3 model
High-performance inference framework for large language models
A Powerful Native Multimodal Model for Image Generation
Code for running inference with the SAM 3D Body Model (3DB)
OCR expert VLM powered by Hunyuan's native multimodal architecture
Offline inference engine for art and real-time voice conversations
Inference Llama 2 in one file of pure C