Block Diffusion for Ultra-Fast Speculative Decoding
Official implementation of Watermark Anything with Localized Messages
Multimodal Diffusion with Representation Alignment
HY-Motion model for 3D character animation generation
Generate Any 3D Scene in Seconds
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
LLM-based reinforcement learning audio editing model
GLM-4 series: Open Multilingual Multimodal Chat LMs
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Inference code for scalable emulation of protein equilibrium ensembles
An Efficient Agentic Model for Computer Use
Audio foundation model excelling in audio understanding
Long-form streaming TTS system for multi-speaker dialogue generation
Fast-stable-diffusion + DreamBooth
A Pragmatic VLA Foundation Model
Collection of Gemma 3 variants that are trained for performance
LTX-Video Support for ComfyUI
VMZ: Model Zoo for Video Modeling
Video understanding codebase from FAIR for reproducing video models
CLIP: predict the most relevant text snippet given an image
Project Lyra: Open Generative 3D World Models
Inference script for Oasis 500M
A Production-ready Reinforcement Learning AI Agent Library
Hackable and optimized Transformers building blocks