Run local LLMs on any device; open-source
A high-throughput and memory-efficient inference and serving engine
FlashInfer: Kernel Library for LLM Serving
A library for accelerating Transformer models on NVIDIA GPUs
Ready-to-use OCR with 80+ supported languages
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
20+ high-performance LLMs with recipes to pretrain and finetune at scale
Tensor search for humans
Build your chatbot within minutes on your favorite device
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods
Simplifies the local serving of AI models from any source
State-of-the-art diffusion models for image and audio generation
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
PyTorch domain library for recommendation systems
Large Language Model Text Generation Inference
PyTorch library of curated Transformer models and their components
Unified Model Serving Framework
Standardized Serverless ML Inference Platform on Kubernetes
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
High quality, fast, modular reference implementation of SSD in PyTorch
OpenMMLab Model Deployment Framework
A computer vision framework to create and deploy apps in minutes
A framework dedicated to neural data processing
Database system for building simpler and faster AI-powered applications
Serve machine learning models within a Docker container
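The serving pattern shared by several projects above (expose a model behind an HTTP endpoint, then containerize it) can be sketched with Python's standard library alone. This is a toy stand-in, not any listed project's actual API: the `predict` stub and the request shape are assumptions for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(prompt: str) -> str:
    # Placeholder for real inference (e.g. model.generate(...) on a GPU).
    return prompt.upper()

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body: {"prompt": "..."}
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"output": predict(payload["prompt"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging for a cleaner demo.
        pass

def serve(port: int = 0) -> HTTPServer:
    # port=0 asks the OS for a free port; handy for tests and local runs.
    server = HTTPServer(("127.0.0.1", port), InferenceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Real serving frameworks add batching, model lifecycle management, and GPU scheduling on top of this request/response skeleton; packaging the script with a base image and an `EXPOSE`d port is what the Docker-based entries automate.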