A unified framework for scalable computing
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Libraries for applying sparsification recipes to neural networks
Sparsity-aware deep learning inference runtime for CPUs
An easy-to-use LLM quantization package with user-friendly APIs
Operating LLMs in production
Integrate, train and manage any AI models and APIs with your database
Database system for building simpler and faster AI-powered applications
Lightweight Python library for adding real-time multi-object tracking
PyTorch domain library for recommendation systems
LLMFlows - Simple, Explicit and Transparent LLM Apps
Framework for Accelerating LLM Generation with Multiple Decoding Heads
Build your chatbot within minutes on your favorite device
Phi-3.5 for Mac: Locally-run Vision and Language Models
Neural Network Compression Framework for enhanced OpenVINO
OpenAI-style API for open large language models
A Unified Library for Parameter-Efficient Learning
Large Language Model Text Generation Inference
Superduper: Integrate AI models and machine learning workflows
Images to inference with no labeling
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods
GPU environment management and cluster orchestration
A high-performance ML model serving framework offering dynamic batching
PyTorch library of curated Transformer models and their components
Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere