Fast inference engine for Transformer models
Replace OpenAI GPT with another LLM in your app
An RWKV management and startup tool: fully automated, only 8 MB
C#/.NET bindings for llama.cpp, including LLaMA/GPT model inference
A library for accelerating Transformer models on NVIDIA GPUs
An easy-to-use LLM quantization package with user-friendly APIs
Visual Instruction Tuning: Large Language-and-Vision Assistant
A graphical interface for managing your LLMs with ollama
Implementation of model parallel autoregressive transformers on GPUs
Training and implementation of chatbots built on GPT-like architectures
Fast and user-friendly runtime for transformer inference