Open-source tool to run local LLMs on any device
Port of Facebook's LLaMA model in C/C++
Ready-to-use OCR with 80+ supported languages
A high-throughput and memory-efficient inference and serving engine
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
Optimizing inference proxy for LLMs
Bring the notion of Model-as-a-Service to life
FlashInfer: Kernel Library for LLM Serving
Visual Instruction Tuning: Large Language-and-Vision Assistant
Library for OCR-related tasks powered by Deep Learning
A high-performance ML model serving framework that offers dynamic batching
Large Language Model Text Generation Inference
Libraries for applying sparsification recipes to neural networks
Replace OpenAI GPT with another LLM in your app
Sparsity-aware deep learning inference runtime for CPUs
Phi-3.5 for Mac: Locally-run Vision and Language Models
The easiest and laziest way to build multi-agent LLM applications
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
Neural Network Compression Framework for enhanced OpenVINO inference
Run any Llama 2 model locally with a Gradio UI, on GPU or CPU, from anywhere
LLM training code for MosaicML foundation models
Open platform for training, serving, and evaluating language models
20+ high-performance LLMs with recipes to pretrain and finetune at scale
Tensor search for humans
Powering Amazon custom machine learning chips