Official inference framework for 1-bit LLMs
Z80-μLM is a 2-bit quantized language model
AIMET is a library that provides advanced quantization and compression
Oobabooga - The definitive Web UI for local AI, with powerful features
A state-of-the-art open visual language model
Accessible large language models via k-bit quantization for PyTorch
An easy-to-use LLM quantization package with user-friendly APIs
A library for accelerating Transformer models on NVIDIA GPUs
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
PyTorch library of curated Transformer models and their components
Open Source Document Management System for Digital Archives
[NeurIPS2025 Spotlight] Quantized Attention
Official implementation of Watermark Anything with Localized Messages
Low-code framework for building custom LLMs, neural networks
High-performance Inference and Deployment Toolkit for LLMs and VLMs
ChatGLM3 series: Open Bilingual Chat LLMs
100–200× Acceleration for Video Diffusion Models
Capable of understanding text, audio, vision, video
Open platform for training, serving, and evaluating language models
Visual Instruction Tuning: Large Language-and-Vision Assistant
Implementation of Recurrent Interface Network (RIN)
Text-to-Speech Utility
A graphical manager for Ollama that can manage your LLMs
Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere
Basaran, an open-source alternative to the OpenAI text completion API