Code and models for the ICML 2024 paper NExT-GPT
Inference script for Oasis 500M
Official PyTorch Implementation
State-of-the-art (SoTA) text-to-video pre-trained model
Modular AI image and video generation web UI with extensible tools
A Unified Framework for Image Customization
A SOTA open-source image editing model
High-Fidelity and Controllable Generation of Textured 3D Assets
Marrying Grounding DINO with Segment Anything & Stable Diffusion
Virtual AI anchor that combines state-of-the-art technology
A PyTorch library for implementing flow matching algorithms
Official inference repo for FLUX.1 models
A Powerful Native Multimodal Model for Image Generation
Text-to-video and image-to-video generation: CogVideoX (2024) and CogVideo
A fast TTS architecture with conditional flow matching
Generating Immersive, Explorable, and Interactive 3D Worlds
State-of-the-art Parameter-Efficient Fine-Tuning
An open-source text-to-speech system built by inverting Whisper
MII makes low-latency and high-throughput inference possible
Qwen3-Omni is a natively end-to-end, omni-modal LLM
DepGraph: Towards Any Structural Pruning
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Global weather forecasting model using graph neural networks and JAX
A Universal Customization Method for Single and Multi Conditioning
Flexible Photo Recrafting While Preserving Your Identity