Official inference framework for 1-bit LLMs
Ready-to-use OCR with 80+ supported languages
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
A library for accelerating Transformer models on NVIDIA GPUs
Unified Model Serving Framework
Bring the notion of Model-as-a-Service to life
The official Python client for the Hugging Face Hub
A Customizable Image-to-Video Model based on HunyuanVideo
Inference Llama 2 in one file of pure C
Training and deploying machine learning models on Amazon SageMaker
AirLLM: 70B inference with a single 4GB GPU
Neural Network Compression Framework for enhanced OpenVINO inference
A set of Docker images for training and serving models in TensorFlow
Performance-optimized AI inference on your GPUs
Bayesian Modeling and Probabilistic Programming in Python
Efficient few-shot learning with Sentence Transformers
Sparsity-aware deep learning inference runtime for CPUs
Code for running inference and finetuning with the SAM 3 model
A 950-line, minimal, extensible LLM inference engine built from scratch
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Faster Whisper transcription with CTranslate2
Minimal Python framework for building scalable AI inference servers, fast
Parallax is a distributed model serving framework
GLM-4.5: Open-source LLM for intelligent agents by Z.ai