OpenAI-style API for open large language models
Run local LLMs on any device; open source
Low-latency REST API for serving text embeddings
The Triton Inference Server provides an optimized cloud and edge inferencing solution
The easiest and laziest way to build multi-agent LLM applications
An optimizing inference proxy for LLMs
An unofficial Python package that returns responses from Google Bard
A library for accelerating Transformer models on NVIDIA GPUs
Large Language Model Text Generation Inference
Replace OpenAI GPT with another LLM in your app
Operating LLMs in production
Simplifies the local serving of AI models from any source
Bring the notion of Model-as-a-Service to life
Data manipulation and transformation for audio signal processing
Unified Model Serving Framework
Python Package for ML-Based Heterogeneous Treatment Effects Estimation
Library for OCR-related tasks powered by Deep Learning
A high-performance ML model serving framework that offers dynamic batching
A framework dedicated to neural data processing
Run any Llama 2 model locally with a Gradio UI, on GPU or CPU, from anywhere
Run 100B+ language models at home, BitTorrent-style
Implementation of "Tree of Thoughts"
Training and implementation of chatbots leveraging a GPT-like architecture