Build your chatbot within minutes on your favorite device
GPU environment management and cluster orchestration
A toolkit for optimizing Keras & TensorFlow ML models for deployment
Simplifies the local serving of AI models from any source
Official inference library for Mistral models
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
A Unified Library for Parameter-Efficient Learning
Replace OpenAI GPT with another LLM in your app
The Triton Inference Server provides an optimized cloud and edge inferencing solution
Probabilistic reasoning and statistical analysis in TensorFlow
Standardized Serverless ML Inference Platform on Kubernetes
Libraries for applying sparsification recipes to neural networks
State-of-the-art Parameter-Efficient Fine-Tuning
MII makes low-latency and high-throughput inference possible
A unified framework for scalable computing
Sparsity-aware deep learning inference runtime for CPUs
Large Language Model Text Generation Inference
Multilingual Automatic Speech Recognition with word-level timestamps
Python Package for ML-Based Heterogeneous Treatment Effects Estimation
Easiest and laziest way to build multi-agent LLM applications
Efficient few-shot learning with Sentence Transformers
20+ high-performance LLMs with recipes to pretrain and finetune at scale
Data manipulation and transformation for audio signal processing
A Pythonic framework to simplify AI service building
Adversarial Robustness Toolbox (ART) - Python Library for ML security