Fast inference engine for Transformer models
C++ image processing and machine learning library using SIMD
oneAPI Deep Neural Network Library (oneDNN)
BitNet: Scaling 1-bit Transformers for Large Language Models
Official inference framework for 1-bit LLMs
Z80-μLM is a 2-bit quantized language model
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
Oobabooga - The definitive Web UI for local AI, with powerful features
A state-of-the-art open visual language model
Wan2.2: Open and Advanced Large-Scale Video Generative Model
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
FlashMLA: Efficient Multi-head Latent Attention Kernels
Accessible large language models via k-bit quantization for PyTorch
A Powerful Native Multimodal Model for Image Generation
A scientific machine learning (SciML) wrapper for the FEniCS finite element library
Low-code framework for building custom LLMs, neural networks, and other AI models
ChatGLM3 series: Open Bilingual Chat LLMs
100–200× Acceleration for Video Diffusion Models
The leading agent orchestration platform for Claude
Official implementation of Watermark Anything with Localized Messages
Open Source Document Management System for Digital Archives
A library for accelerating Transformer models on NVIDIA GPUs
[NeurIPS 2025 Spotlight] Quantized Attention
PhantomBot is an actively developed open source interactive Twitch bot
Package that makes it trivial to create and evaluate machine learning models