LM Studio Apple MLX engine
A real time inference engine for temporal logical specifications
DeepSeek 4 Flash local inference engine for Metal
950-line, minimal, extensible LLM inference engine built from scratch
TokenSpeed is a speed-of-light LLM inference engine
A high-performance inference engine for AI models
A lightweight vLLM implementation built from scratch
Alibaba's high-performance LLM inference engine for diverse apps
High-performance inference framework for large language models
Code for running inference and fine-tuning with the SAM 3 model
Fast Multimodal LLM on Mobile Devices
RGBD video generation model conditioned on camera input
Universal LLM Deployment Engine with ML Compilation
Offline inference engine for art and real-time voice conversations
User-friendly AI Interface
Mooncake is the serving platform for Kimi
LightLLM is a Python-based LLM (Large Language Model) inference
Run a 1-billion-parameter LLM on a $10 board with 256MB RAM
WebAssembly binding for llama.cpp - Enabling in-browser LLM inference
Parallax is a distributed model serving framework
Multi-Agent daTa geneRation Infra and eXperimentation framework
Extensible workflow development framework
Superduper: Integrate AI models and machine learning workflows
Running large language models on a single GPU
Fully private LLM chatbot that runs entirely within a browser