INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
Standardized Serverless ML Inference Platform on Kubernetes
Optimizing inference proxy for LLMs
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
Large Language Model Text Generation Inference
Trainable models and NN optimization tools
Probabilistic reasoning and statistical analysis in TensorFlow
Build your chatbot within minutes on your favorite device
Easiest and laziest way to build multi-agent LLM applications
Efficient few-shot learning with Sentence Transformers
Create HTML profiling reports from pandas DataFrame objects
Official inference library for Mistral models
Open-source tool designed to enhance the efficiency of workloads
A library for accelerating Transformer models on NVIDIA GPUs
20+ high-performance LLMs with recipes to pretrain, finetune at scale
GPU environment management and cluster orchestration
Phi-3.5 for Mac: Locally-run Vision and Language Models
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods
Library for serving Transformers models on Amazon SageMaker
Deep learning optimization library: makes distributed training easy
PyTorch library of curated Transformer models and their components
State-of-the-art Parameter-Efficient Fine-Tuning
Open platform for training, serving, and evaluating language models
Easy-to-use Speech Toolkit including Self-Supervised Learning model
OpenMMLab Model Deployment Framework