Multilingual Automatic Speech Recognition with word-level timestamps
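A minimal word-timestamp sketch, assuming the `openai-whisper` package (timestamped forks expose a similar option); the model size and audio path are placeholders:

```python
import whisper  # pip install openai-whisper

model = whisper.load_model("base")  # model size is an illustrative choice

# word_timestamps=True makes Whisper align a start/end time to each word.
result = model.transcribe("speech.mp3", word_timestamps=True)

for segment in result["segments"]:
    for word in segment["words"]:
        print(f"{word['start']:6.2f}s-{word['end']:6.2f}s {word['word']}")
```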
State-of-the-art diffusion models for image and audio generation
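For image generation, a minimal text-to-image sketch assuming the Hugging Face `diffusers` API; the checkpoint id is an assumption and any compatible model works:

```python
import torch
from diffusers import StableDiffusionPipeline

# Checkpoint id is an assumption; any diffusers-compatible model works.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")
```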
PyTorch extensions for fast R&D prototyping and Kaggle farming
The Triton Inference Server provides an optimized cloud and edge inferencing solution
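Client-side, a request to a running Triton server might look like this sketch, using the HTTP client from `tritonclient`; the model name, tensor names, and shape are placeholders that must match your model's config.pbtxt:

```python
import numpy as np
import tritonclient.http as httpclient  # pip install tritonclient[http]

client = httpclient.InferenceServerClient(url="localhost:8000")

# "my_model", "INPUT0", "OUTPUT0" must match your model's config.pbtxt.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0").shape)
```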
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale
Data manipulation and transformation for audio signal processing
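A small sketch of the load-then-transform pattern with `torchaudio`; the file path and feature settings are illustrative:

```python
import torchaudio
import torchaudio.transforms as T

waveform, sample_rate = torchaudio.load("speech.wav")  # path is a placeholder

# Resample to 16 kHz, then compute an 80-bin mel spectrogram.
resample = T.Resample(orig_freq=sample_rate, new_freq=16000)
mel = T.MelSpectrogram(sample_rate=16000, n_mels=80)

features = mel(resample(waveform))
print(features.shape)  # (channels, n_mels, frames)
```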
GPU environment management and cluster orchestration
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
Phi-3.5 for Mac: Locally-run Vision and Language Models
A Unified Library for Parameter-Efficient Learning
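The usual parameter-efficient pattern is to freeze the base model and train small adapters; the sketch below uses the Hugging Face `peft` package as a representative example (an assumption, not necessarily this library's own API):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Freeze the base model and inject small trainable LoRA adapters.
model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative model
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"])
model = get_peft_model(model, config)

model.print_trainable_parameters()  # a tiny fraction of the full model
```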
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods
MII makes low-latency and high-throughput inference possible
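A minimal sketch, assuming MII's non-persistent pipeline API from recent releases; the model id is a placeholder:

```python
import mii  # pip install deepspeed-mii

# Non-persistent pipeline; the model id is an assumption.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
responses = pipe(["DeepSpeed is"], max_new_tokens=64)
print(responses[0].generated_text)
```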
PyTorch library of curated Transformer models and their components
An MLOps framework to package, deploy, monitor and manage models
Create HTML profiling reports from pandas DataFrame objects
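A two-liner in practice; this sketch assumes the `ydata-profiling` package (successor to `pandas-profiling`) and a placeholder CSV:

```python
import pandas as pd
from ydata_profiling import ProfileReport  # successor to pandas-profiling

df = pd.read_csv("data.csv")  # placeholder input

profile = ProfileReport(df, title="Data Overview")
profile.to_file("report.html")
```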
Database system for building simpler and faster AI-powered applications
Fast inference engine for Transformer models
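CTranslate2 runs converted models on pre-tokenized input; a translation sketch, assuming a converted model directory and its SentencePiece tokenizer (both paths are placeholders):

```python
import ctranslate2
import sentencepiece as spm

# Both paths are placeholders: a converted CTranslate2 model and its tokenizer.
translator = ctranslate2.Translator("ende_ctranslate2/", device="cpu")
sp = spm.SentencePieceProcessor("sentencepiece.model")

tokens = sp.encode("Hello world!", out_type=str)
results = translator.translate_batch([tokens])
print(sp.decode(results[0].hypotheses[0]))
```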
High-quality, fast, modular reference implementation of SSD in PyTorch
A toolkit to optimize Keras & TensorFlow ML models for deployment
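The toolkit's techniques follow a wrap-then-fit pattern; a quantization-aware-training sketch with a toy Keras model:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])

# Wrap for quantization-aware training; pruning and clustering
# follow the same wrap-then-compile-then-fit pattern.
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(optimizer="adam", loss="mse")
qat_model.summary()
```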
Unified Model Serving Framework
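A minimal service sketch in the BentoML 1.x style (the service name and echo logic are placeholders), served with `bentoml serve service:svc`:

```python
import bentoml
from bentoml.io import JSON

svc = bentoml.Service("echo")  # service name is a placeholder

@svc.api(input=JSON(), output=JSON())
def predict(payload: dict) -> dict:
    # A real service would call a model runner here.
    return {"echo": payload}
```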
Low-latency REST API for serving text embeddings
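Such servers typically expose an OpenAI-style /embeddings endpoint; a request sketch, assuming a local server on its default port and a placeholder model id:

```python
import requests

# Host, port, and model id are assumptions for a locally running server.
resp = requests.post(
    "http://localhost:7997/embeddings",
    json={"model": "BAAI/bge-small-en-v1.5", "input": ["hello world"]},
)
print(len(resp.json()["data"][0]["embedding"]))  # embedding dimension
```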
A library for accelerating Transformer models on NVIDIA GPUs
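A sketch of the FP8 pattern with Transformer Engine's PyTorch API: swap in `te.Linear` and run the forward pass under `fp8_autocast` (shapes and recipe values are illustrative):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Drop-in replacement for torch.nn.Linear with FP8 support.
layer = te.Linear(1024, 1024).cuda()
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

x = torch.randn(16, 1024, device="cuda")
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```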
Standardized Serverless ML Inference Platform on Kubernetes
Deep learning optimization library that makes distributed training easy
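The core pattern is wrapping a model with `deepspeed.initialize`; a minimal sketch with illustrative config values, normally launched via the `deepspeed` CLI:

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real network

# Config values are illustrative; ZeRO stage 2 shards optimizer state.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```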
Replace OpenAI GPT with another LLM in your app
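The usual mechanism is an OpenAI-compatible endpoint: point the stock client at a different base_url. The URL and model name below are placeholders for whatever backend you run:

```python
from openai import OpenAI

# base_url and model name are placeholders for your local backend.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="my-local-llm",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(reply.choices[0].message.content)
```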