Run Local LLMs on Any Device. Open-source
A high-throughput and memory-efficient inference and serving engine
Everything you need to build state-of-the-art foundation models
Uncover insights, surface problems, monitor, and fine-tune your LLM
A toolkit to optimize Keras and TensorFlow ML models for deployment
A library for accelerating Transformer models on NVIDIA GPUs
GPU environment management and cluster orchestration
Training and deploying machine learning models on Amazon SageMaker
Operating LLMs in production
Multilingual Automatic Speech Recognition with word-level timestamps
Replace OpenAI GPT with another LLM in your app
A set of Docker images for training and serving models in TensorFlow
OpenAI-style API for open large language models
Phi-3.5 for Mac: Locally-run Vision and Language Models
A Unified Library for Parameter-Efficient Learning
State-of-the-art Parameter-Efficient Fine-Tuning
Python Package for ML-Based Heterogeneous Treatment Effects Estimation
State-of-the-art diffusion models for image and audio generation
A unified framework for scalable computing
Deep learning optimization library: makes distributed training easy
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Libraries for applying sparsification recipes to neural networks
Gaussian processes in TensorFlow
Single-cell analysis in Python
FlashInfer: Kernel Library for LLM Serving