Large Language Model Text Generation Inference
Official inference library for Mistral models
Images to inference with no labeling: use foundation models to train supervised models
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods
GPU environment management and cluster orchestration
PyTorch library of curated Transformer models and their composable components
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Run any Llama 2 locally with a Gradio UI on GPU or CPU from anywhere
Visual Instruction Tuning: Large Language-and-Vision Assistant
Open-source tool designed to improve workload efficiency
State-of-the-art Parameter-Efficient Fine-Tuning
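LoRA, one of the parameter-efficient methods such libraries implement, freezes a d x k weight matrix and trains only two low-rank factors B (d x r) and A (r x k), so the trainable count drops from d*k to r*(d+k). A minimal sketch of that arithmetic (the helper name is hypothetical, not a library API):

```python
def lora_trainable_params(d, k, r):
    # Full fine-tuning updates all d*k weights of the matrix;
    # LoRA trains only the factors B (d x r) and A (r x k).
    return r * (d + k)

full = 4096 * 4096                          # 16,777,216 weights
lora = lora_trainable_params(4096, 4096, 8) # 65,536 weights (~0.4%)
```

With rank r = 8 on a 4096 x 4096 layer, the trainable parameters shrink by roughly 250x, which is why these methods fit on commodity GPUs.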
Superduper: Integrate AI models and machine learning workflows
A high-performance ML model serving framework that offers dynamic batching
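A server applying dynamic batching typically blocks for the first request, then drains the queue until either a size cap or a time budget is hit, trading a little latency for much higher throughput. A minimal stdlib sketch of that idea (function name and parameters are hypothetical, not the framework's actual API):

```python
import queue
import time

def collect_batch(requests_q, max_batch=8, max_wait=0.01):
    # Block until at least one request arrives.
    batch = [requests_q.get()]
    # Keep collecting until the batch is full or the deadline passes.
    deadline = time.monotonic() + max_wait
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(requests_q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch
```

The batch is then handed to the model in one forward pass; the `max_wait` knob bounds the extra latency any single request can pay.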
Framework dedicated to making neural data processing pipelines simple and fast
MII makes low-latency and high-throughput inference possible
Trainable models and NN optimization tools
PyTorch extensions for fast R&D prototyping and Kaggle farming
Probabilistic reasoning and statistical analysis in TensorFlow
Low-latency REST API for serving text embeddings
A library for accelerating Transformer models on NVIDIA GPUs
Standardized Serverless ML Inference Platform on Kubernetes
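Platforms of this kind deploy a model by declaring a Kubernetes custom resource rather than writing server code; a minimal manifest in the shape KServe documents for an `InferenceService` (the model name and storage URI here are illustrative):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris            # illustrative service name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      # illustrative artifact location; point at your own model store
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```

Applying the manifest with `kubectl apply -f` provisions the serving pod, autoscaling, and an HTTP prediction endpoint.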
Replace OpenAI GPT with another LLM in your app
LLM training code for MosaicML foundation models
Open platform for training, serving, and evaluating language models
Uncover insights, surface problems, monitor, and fine-tune your LLM