An AI-powered security review GitHub Action using Claude
Let us control diffusion models
PyTorch code and models for the DINOv2 self-supervised learning method
Clean and efficient FP8 GEMM kernels with fine-grained scaling
Official implementation of DreamCraft3D
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Controllable & emotion-expressive zero-shot TTS
Foundation Models for Time Series
OCR expert VLM powered by Hunyuan's native multimodal architecture
Tencent Hunyuan multimodal diffusion transformer (MM-DiT) model
A Customizable Image-to-Video Model based on HunyuanVideo
Unified Multimodal Understanding and Generation Models
Audio foundation model excelling in audio understanding
A Pragmatic VLA Foundation Model
Per-Pixel Classification is Not All You Need for Semantic Segmentation
Collection of Gemma 3 variants that are trained for performance
A collection of high-quality models for the MuJoCo physics engine
Code for Mesh R-CNN, ICCV 2019
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Memory-efficient and performant finetuning of Mistral's models
Tiny vision language model
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Inference script for Oasis 500M
A Production-ready Reinforcement Learning AI Agent Library
State-of-the-art Image & Video CLIP, Multimodal Large Language Models