A library for accelerating Transformer models on NVIDIA GPUs
LLM training code for MosaicML foundation models
Tensor search for humans
DoWhy is a Python library for causal inference
A library to communicate with ChatGPT, Claude, Copilot, Gemini
AI interface for tinkerers (Ollama, Haystack RAG, Python)
Swift async text-to-image for SwiftUI apps using the OpenAI API
An easy-to-use LLM quantization package with user-friendly APIs
Python Package for ML-Based Heterogeneous Treatment Effects Estimation
Operating LLMs in production
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere
LLMFlows - Simple, Explicit and Transparent LLM Apps
Framework for Accelerating LLM Generation with Multiple Decoding Heads
Phi-3.5 for Mac: Locally-run Vision and Language Models
Efficient few-shot learning with Sentence Transformers
A Unified Library for Parameter-Efficient Learning
Official inference library for Mistral models
Toolbox of models, callbacks, and datasets for AI/ML researchers
Images to inference with no labeling
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods
GPU environment management and cluster orchestration
PyTorch library of curated Transformer models and their components
A framework dedicated to neural data processing
MII makes low-latency and high-throughput inference possible