A library for accelerating Transformer models on NVIDIA GPUs
LM Studio Apple MLX engine
DeepEP: an efficient expert-parallel communication library
A real-time inference engine for temporal logical specifications
High-performance reactive, message-passing-based Bayesian engine
Ling is an MoE LLM provided and open-sourced by InclusionAI
A high-throughput and memory-efficient inference and serving engine
A 950-line, minimal, extensible LLM inference engine built from scratch
Open-source large language model family from Tencent Hunyuan
A high-performance inference engine for AI models
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
Deep learning optimization library: makes distributed training easy
A lightweight vLLM implementation built from scratch
Jlama is a modern LLM inference engine for Java
Lightweight, standalone C++ inference engine for Google's Gemma models
Alibaba's high-performance LLM inference engine for diverse apps
High-performance inference framework for large language models
Code for running inference with the SAM 3D Body Model (3DB)
RGBD video generation model conditioned on camera input
Code for running inference and finetuning with the SAM 3 model
Blazing fast, instant realtime GraphQL APIs on your DB
A Powerful Native Multimodal Model for Image Generation
OCR expert VLM powered by Hunyuan's native multimodal architecture
Offline inference engine for art, real-time voice conversations
Inference Llama 2 in one file of pure C