Operating LLMs in production
Run 100B+ language models at home, BitTorrent-style
OpenAI-style API for open large language models (illustrated in the sketch after this list)
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
Visual Instruction Tuning: Large Language-and-Vision Assistant
Libraries for applying sparsification recipes to neural networks
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Neural Network Compression Framework for enhanced OpenVINO inference
Efficient few-shot learning with Sentence Transformers
A Unified Library for Parameter-Efficient Learning
PyTorch library of curated Transformer models and their components
Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere
Unofficial Python package that returns responses from Google Bard
Open platform for training, serving, and evaluating language models
A high-performance ML model serving framework with dynamic batching
Framework dedicated to neural data processing
LLMFlows - Simple, Explicit and Transparent LLM Apps
Build your chatbot within minutes on your favorite device
20+ high-performance LLMs with recipes to pretrain and finetune at scale
Low-latency REST API for serving text embeddings
Implementation of "Tree of Thoughts: Deliberate Problem Solving with Large Language Models"
Implementation of model-parallel autoregressive transformers on GPUs
A computer vision framework to create and deploy apps in minutes
The deep learning toolkit for speech-to-text
CPU/GPU inference server for Hugging Face transformer models
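Several entries above expose an "OpenAI-style API". As a rough illustration of what that means, the sketch below sends a chat-completions request to a hypothetical local server; the URL, model name, and API key are placeholders, not any specific project's defaults.

```python
# Illustrative only: the shape of an OpenAI-compatible chat-completions request.
# The endpoint URL and model name are placeholders, not a particular project's API.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",    # hypothetical local server
    headers={"Authorization": "Bearer not-needed"},  # many local servers ignore the key
    json={
        "model": "my-open-llm",                      # placeholder model identifier
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Summarize what dynamic batching does."},
        ],
        "temperature": 0.2,
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```

Because most OpenAI-compatible servers accept this same JSON schema, switching between them usually only requires changing the base URL and model name.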