Redundancy-aware KV Cache Compression for Reasoning Models
Spark-TTS Inference Code
Efficient few-shot learning with Sentence Transformers
Integrate, train and manage any AI models and APIs with your database
PArallel Distributed Deep LEarning: Machine Learning Framework
OpenVINO™ Toolkit repository
Code for running inference with the SAM 3D Body Model 3DB
The free, Open Source alternative to OpenAI, Claude and others
MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation
Open standard for machine learning interoperability
AudioCraft is a library for audio processing and generation
Automatic Speech Recognition with Word-level Timestamps
Technical principles related to large models
Pure C++ implementation of several models for real-time chatting
On-device AI across mobile, embedded and edge for PyTorch
PyTorch library of curated Transformer models and their components
Multilingual Automatic Speech Recognition with word-level timestamps
Python Package for ML-Based Heterogeneous Treatment Effects Estimation
NVIDIA plugin for secure installation of OpenClaw
Wan2.2: Open and Advanced Large-Scale Video Generative Model
A program that can do anything to earn money without human operators
Alibaba's high-performance LLM inference engine for diverse apps
Run a 1-billion parameter LLM on a $10 board with 256MB RAM
Unified KV Cache Compression Methods for Auto-Regressive Models
Accessible large language models via k-bit quantization for PyTorch