Probabilistic reasoning and statistical analysis in TensorFlow
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
PyTorch library of curated Transformer models and their components
LLM training code for MosaicML foundation models
Low-latency REST API for serving text-embeddings
Optimizing inference proxy for LLMs
Easiest and laziest way to build multi-agent LLM applications
20+ high-performance LLMs with recipes to pretrain and finetune at scale
Tensor search for humans
Powering Amazon custom machine learning chips
A general-purpose probabilistic programming system
Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere
LLMFlows - Simple, Explicit and Transparent LLM Apps
Run 100B+ language models at home, BitTorrent-style
Framework for Accelerating LLM Generation with Multiple Decoding Heads
A computer vision framework to create and deploy apps in minutes
Implementation of "Tree of Thoughts"
Implementation of model parallel autoregressive transformers on GPUs
The deep learning toolkit for speech-to-text
CPU/GPU inference server for Hugging Face transformer models