A unified framework for scalable computing
Large Language Model Text Generation Inference
Low-latency REST API for serving text embeddings
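A minimal sketch of how such an embedding service is typically queried over REST; the port, route name, and payload shape below are assumptions about a local deployment, not a documented interface.

```python
# Hedged sketch: query a locally hosted embedding server over REST.
# The URL, route, and JSON payload shape are deployment-specific assumptions.
import requests

resp = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": "What is retrieval-augmented generation?"},
    timeout=10,
)
resp.raise_for_status()
embedding = resp.json()[0]   # one vector per input string
print(len(embedding))        # embedding dimensionality
```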
Replace OpenAI GPT with another LLM in your app
Integrate, train, and manage any AI model or API with your database
PyTorch domain library for recommendation systems
Simplifies the local serving of AI models from any source
Visual Instruction Tuning: Large Language-and-Vision Assistant
Lightweight Python library for adding real-time multi-object tracking
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Uncover insights, surface problems, monitor, and fine-tune your LLM
Superduper: Integrate AI models and machine learning workflows
A high-performance ML model serving framework that offers dynamic batching
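To make the dynamic-batching idea concrete, here is a toy sketch of the pattern such servers implement: requests arriving within a short window are grouped and run through the model as one batch. The window length, batch cap, and the stand-in predict_batch() function are illustrative assumptions, not any particular framework's API.

```python
# Hedged sketch of dynamic batching: buffer concurrent requests briefly,
# run them as one batch, then resolve each caller's future individually.
import asyncio

MAX_WAIT_S = 0.01   # how long to wait for more requests to join a batch
MAX_BATCH = 32      # upper bound on batch size

def predict_batch(inputs):
    # Stand-in for a real batched model forward pass.
    return [f"result for {x}" for x in inputs]

async def batcher(queue: asyncio.Queue):
    loop = asyncio.get_running_loop()
    while True:
        batch = [await queue.get()]           # block until at least one request
        deadline = loop.time() + MAX_WAIT_S
        while len(batch) < MAX_BATCH:
            timeout = deadline - loop.time()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        inputs, futures = zip(*batch)
        for fut, out in zip(futures, predict_batch(list(inputs))):
            fut.set_result(out)               # hand each caller its own result

async def infer(queue: asyncio.Queue, x):
    fut = asyncio.get_running_loop().create_future()
    await queue.put((x, fut))
    return await fut                          # resolves once the batch has run

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    asyncio.create_task(batcher(queue))
    print(await asyncio.gather(*(infer(queue, i) for i in range(5))))

asyncio.run(main())
```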
Libraries for applying sparsification recipes to neural networks
An easy-to-use LLM quantization package with user-friendly APIs
Phi-3.5 for Mac: Locally-run Vision and Language Models
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods
State-of-the-art Parameter-Efficient Fine-Tuning
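As an illustration of the parameter-efficient idea, the sketch below wraps a causal language model with a LoRA adapter via the peft library; the base model ("gpt2") and the rank/alpha/dropout values are arbitrary assumptions, not recommendations.

```python
# Hedged sketch: attach a LoRA adapter so only a small set of weights is trained.
# Base model and hyperparameters are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,              # low-rank dimension of the adapter matrices
    lora_alpha=16,    # scaling applied to the adapter output
    lora_dropout=0.05,
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # adapter weights are a small fraction of the total
```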
Optimizing inference proxy for LLMs
Neural Network Compression Framework for enhanced OpenVINO inference
OpenAI-style API for open large language models
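The point of an OpenAI-style API is that existing client code keeps working when the backend is swapped. Below is a minimal sketch using the official openai Python client pointed at a locally hosted server; the base URL, API key, and model name are assumptions about the deployment.

```python
# Hedged sketch: reuse the standard OpenAI client against an OpenAI-compatible local server.
# base_url, api_key, and the model name are deployment-specific assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="local-model",  # whatever model name the server registers
    messages=[{"role": "user", "content": "Summarize LoRA in one sentence."}],
)
print(resp.choices[0].message.content)
```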
Sparsity-aware deep learning inference runtime for CPUs
Images to inference with no labeling
A library for accelerating Transformer models on NVIDIA GPUs
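A rough sketch of the FP8 usage pattern such a library exposes, modeled on Transformer Engine's PyTorch API; the layer sizes and recipe settings are illustrative assumptions, and running it requires an FP8-capable NVIDIA GPU.

```python
# Hedged sketch: run a Transformer Engine linear layer under FP8 autocast.
# Shapes and recipe settings are illustrative assumptions; needs an FP8-capable GPU.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

layer = te.Linear(768, 3072, bias=True).cuda()
x = torch.randn(32, 768, device="cuda")

fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)          # forward pass runs in FP8
y.sum().backward()        # backward pass outside the autocast context
```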
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction