A library for accelerating Transformer models on NVIDIA GPUs
Lightweight Python library for adding real-time multi-object tracking
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
Libraries for applying sparsification recipes to neural networks
Replace OpenAI GPT with another LLM in your app
LLM training code for MosaicML foundation models
An MLOps framework to package, deploy, monitor, and manage models
Low-latency REST API for serving text embeddings
OpenAI-style API for open large language models
Go from images to inference with no manual labeling
Probabilistic reasoning and statistical analysis in TensorFlow
Library for serving Transformers models on Amazon SageMaker
Open platform for training, serving, and evaluating language models
Unified Model Serving Framework
Fast inference engine for Transformer models
GPU environment management and cluster orchestration
A Unified Library for Parameter-Efficient Learning
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
Deep learning optimization library that makes distributed training easy
State-of-the-art Parameter-Efficient Fine-Tuning (see the LoRA sketch after this list)
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
High-quality, fast, modular reference implementation of SSD in PyTorch
Uncover insights, surface problems, monitor, and fine-tune your LLM
A toolkit for Keras & TensorFlow to optimize ML models for deployment
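To illustrate the parameter-efficient fine-tuning entries in the list, here is a minimal sketch that wraps a causal language model with a LoRA adapter using the Hugging Face peft API (LoraConfig, get_peft_model). The base model name and the LoRA hyperparameters (r, lora_alpha, lora_dropout, target_modules) are placeholder choices for illustration, not values prescribed by any of the listed projects.

```python
# Minimal LoRA sketch with Hugging Face peft.
# Model name and hyperparameters below are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "facebook/opt-350m"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA injects small trainable low-rank matrices into the chosen projection
# layers, so only a small fraction of parameters receives gradients.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in OPT
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # reports trainable vs. total parameters
```

The wrapped model can then be trained with a standard PyTorch or Trainer loop; only the adapter weights are updated, which keeps the memory footprint far below full fine-tuning.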